code « dale lane

Archive for the ‘code’ Category

Avoiding monitor contention in Java’s Double parseDouble

Saturday, November 30th, 2013

Overview

You can call Double.parseDouble in Java to convert a String representation of a number, like “1.234567”, into a double.
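
As a minimal sketch of that call (the class and helper names here are made up for illustration), this parses a String and falls back to NaN when the text isn't a valid number:

```java
final class ParseDoubleExample {
    /** Parses a decimal String; returns NaN if the text is not a valid number. */
    static double parseOrNaN(String text) {
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException e) {
            // parseDouble signals malformed input with NumberFormatException
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNaN("1.234567"));
        System.out.println(parseOrNaN("not a number"));
    }
}
```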

I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.

In this post, I’ll explain why and what I did about it.

Background (skip this if you don’t care why I had this problem!)

I’ve mentioned UIMA before: an Apache framework for doing text analytics that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).

For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, and parse and process the contents. The contents include scores from the various analytics operations that are run on the CAS:

<myElement myRawScore="1.2345678" myThisScore="2.46801357" myThatScore="1.35792468" ...

Thousands of XML files, each containing several thousand numbers in String form.

As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.

I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.
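
To give a feel for the shape of that workload, here is a rough sketch using a fixed thread pool from java.util.concurrent. It is not the UIMA code: the class name, the batches of Strings standing in for CAS files, and the summing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelParseSketch {
    /** Parses each batch of Strings on a worker thread and returns the total of all values. */
    static double parseAll(List<List<String>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> results = new ArrayList<>();
            for (List<String> batch : batches) {
                // One task per batch (a stand-in for one file's worth of numbers)
                Callable<Double> task = () -> {
                    double sum = 0;
                    for (String s : batch) {
                        sum += Double.parseDouble(s);
                    }
                    return sum;
                };
                results.add(pool.submit(task));
            }
            double total = 0;
            for (Future<Double> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < 64; i++) {
            batches.add(Arrays.asList("1.2345678", "2.46801357", "1.35792468"));
        }
        System.out.println(parseAll(batches, 8));
    }
}
```

Timing a run like this with different thread counts is one way to see whether adding threads actually helps.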

This took *ages*.

(more…)

Tags: java, uima
Posted in code | Comments Closed

Using UIMA-AS to run UIMA annotators in parallel

Friday, August 24th, 2012

Overview

UIMA stands for Unstructured Information Management Architecture. It’s an Apache technology that provides a framework and standard for building text analytics applications. I’ve mentioned it before.

In this post, I want to talk about an area of UIMA which isn’t covered well in the documentation.

I couldn’t find practical getting-started instructions for running UIMA-AS annotators in parallel. In this post I want to discuss why you might want to do it, and share some simple sample code to show how.

Background – the UIMA pipeline

UIMA provides a framework for managing a text analytics application. You break up the analytics functionality into discrete pieces called annotators. UIMA takes care of moving a text document through an analytics engine: a pipeline containing a series of annotators.

A document goes in one end of the pipeline, passes through a number of annotators, each of which adds some metadata to the document. What comes out the other side of the pipeline is an annotated copy of the document.

By default, you get UIMA to run these annotators one at a time – one after another.

Background – annotators in parallel

What if your annotators are quite slow – perhaps they take several seconds to run?

If there is no dependency between any or all of your annotators, then maybe running them one at a time isn’t the most efficient approach.

You can run all of them at the same time, in parallel. UIMA will merge the output from all of the annotators into a single annotated document.
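
The idea can be sketched in plain Java. This is not the UIMA-AS API, just java.util.concurrent: each “annotator” here is a hypothetical function from the document text to a list of annotation strings, and the merge is a simple concatenation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelAnnotatorsSketch {
    /** Runs every annotator on the same document at once and merges their annotations. */
    static List<String> annotateInParallel(String document,
                                           List<Function<String, List<String>>> annotators) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, annotators.size()));
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Function<String, List<String>> annotator : annotators) {
                Callable<List<String>> task = () -> annotator.apply(document);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order so the merged output is predictable
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                merged.addAll(result.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Function<String, List<String>> length = d -> Arrays.asList("length=" + d.length());
        Function<String, List<String>> upper = d -> Arrays.asList("upper=" + d.toUpperCase());
        System.out.println(annotateInParallel("hello", Arrays.asList(length, upper)));
    }
}
```

This only works because the annotators don’t depend on each other’s output. If one needed another’s annotations, it would have to run afterwards.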

My sample code

I’ve written two sample UIMA apps. Each demonstrates one of these approaches, to compare and contrast.

They are divided into three Eclipse projects. You can import them into an Eclipse IDE.

The UIMA Eclipse plugins are very helpful if you want to make changes to the XML configuration files, but they’re not essential. If you want them, there are instructions on how to install them at uima.apache.org.

I’ve added comments to the sample code to explain how the apps work, but I’ll give an overview here.

(more…)

Tags: java, jms, uima, uima-as
Posted in code | 1 Comment »

Implementing a text box for entering tags in a dojo web app

Sunday, August 19th, 2012

I needed a text box for entering tags on a dojo web app. I ended up making my own – it was only a hundred or so lines of code, but I’m sharing it here as it might be useful to others.

The text box needed to provide auto-complete when you start typing something that matches an existing tag. Dojo already has a text box widget that does auto-complete: dijit.form.ComboBox. So I started from there, modifying its behaviour so that:

the options it offers are based on the current tag you’re typing (instead of the whole contents of the text box)
if you pick one of the options, it only replaces the current tag you’re typing with what you select (instead of replacing the whole contents)
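
The logic behind those two changes is small. Here is a sketch of it, written in Java rather than as dojo widget code, and assuming tags are separated by commas (the names and the separator are my assumptions for illustration, not taken from the widget):

```java
final class TagInputSketch {
    /** Returns the partial tag being typed: the text after the last comma. */
    static String currentTag(String boxContents) {
        int lastSeparator = boxContents.lastIndexOf(',');
        return boxContents.substring(lastSeparator + 1).trim();
    }

    /** Replaces only the partial tag with the chosen option, keeping earlier tags. */
    static String acceptOption(String boxContents, String chosen) {
        int lastSeparator = boxContents.lastIndexOf(',');
        String earlierTags = boxContents.substring(0, lastSeparator + 1);
        return earlierTags.isEmpty() ? chosen : earlierTags + " " + chosen;
    }

    public static void main(String[] args) {
        System.out.println(currentTag("java, ui"));           // the text used for lookups
        System.out.println(acceptOption("java, ui", "uima")); // earlier tags are preserved
    }
}
```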

See it in action in this short video clip.

Because I’ve based it on dijit.form.ComboBox, I also get a bunch of features for free, including that options it offers are based on the contents of a data store, which can be backed by a REST API.

This supports paging, which means my REST API doesn’t have to return all of the tags – just enough to populate the visible bit of the drop-down list. I’m using Lucene to implement filtering in the REST API, so it can quickly return a subset of tags that matches what the user has started typing. I don’t need to download everything and filter it client-side – it can be smarter and more efficient than that.

That said, this might be overkill for some needs – you can easily create a client-side store in memory, without needing to write a REST API to back it.

(more…)

Tags: combobox, dijit, dojo, tags, ui, web
Posted in code | Comments Closed

Preventing Internet Explorer from using Compatibility View

Wednesday, August 15th, 2012

I’ve had some trouble with Internet Explorer recently.

I was making a new web tool which looked fine in all browsers. Except Internet Explorer, where it looked a bit squiffy.

Internet Explorer has “Compatibility View”. Compatibility View makes IE behave like the older versions of Internet Explorer, the ones before Microsoft started paying more attention to web standards.

It makes sense – there are a lot of websites out there that were written to render well on old versions of Internet Explorer, and Microsoft needed to make the move to standards compliance in a way that doesn’t break all of them.

The problem is, Compatibility View can be a little… insistent.

It kept turning on, even though I didn’t want it, even though my site worked fine in new shiny standards mode, and looked horribly broken in Compatibility View.

You can manually disable it, but I don’t want to have to make users do that. As the web developer, I want to be able to disable it – to tell IE that I want the site to be rendered in standards mode.

It was a bit fiddly. Here’s how I did it.

(more…)

Tags: html, internet explorer
Posted in code | 4 Comments »

Using JMX to monitor UIMA running in a servlet

Wednesday, August 1st, 2012

Overview

A quick how-to if you’re running UIMA in a servlet and want to monitor the performance of your AEs (analysis engines) using JMX.

Background

I’ve mentioned JMX before. Basically, a Java app can expose information and methods through a standard interface. Tools like jconsole, which come with Java, can then be used to monitor and administer the Java app.
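
As a minimal sketch of what “exposing information through a standard interface” looks like, this registers a standard MBean with the platform MBean server and reads one of its attributes back. The DocumentCounter class, its attribute, and the object name are hypothetical and have nothing to do with UIMA’s own MBeans.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxSketch {
    /** Standard MBean interface: getters become attributes, other methods become operations. */
    public interface DocumentCounterMBean {
        long getDocumentsProcessed();
        void reset();
    }

    /** The implementation class name plus "MBean" must match the interface name. */
    public static class DocumentCounter implements DocumentCounterMBean {
        private final AtomicLong processed = new AtomicLong();

        void recordDocument() { processed.incrementAndGet(); }

        @Override public long getDocumentsProcessed() { return processed.get(); }

        @Override public void reset() { processed.set(0); }
    }

    /** Registers a new counter under the given object name and returns it. */
    static DocumentCounter register(String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            DocumentCounter counter = new DocumentCounter();
            server.registerMBean(counter, new ObjectName(objectName));
            return counter;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads an attribute the same way a JMX client would, by object name. */
    static Object readAttribute(String objectName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "example:type=DocumentCounter"; // hypothetical object name
        DocumentCounter counter = register(name);
        counter.recordDocument();
        System.out.println(readAttribute(name, "DocumentsProcessed"));
    }
}
```

With the application running, the same attribute should show up in jconsole’s MBeans tab under the “example” domain.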

UIMA (Unstructured Information Management Architecture) is an Apache project, providing a standards-based way to perform analytics on unstructured text. It hosts a pipeline of annotators: individual components each performing a specific text analytics task. As a document moves down the pipeline, UIMA runs each of the annotators on the document. Each annotator adds its own annotations for the things it looks for in the text.

UIMA and JMX

UIMA supports JMX. UIMA registers an MBean for each annotator, letting you see the performance info for each annotator. In a pipeline of several annotators, it lets you see (amongst other things) how much time your document is spending in each annotator.

jconsole

In a stand-alone UIMA application, you basically get this for free. Start the application with the standard Java -D property for enabling JMX:

-Dcom.sun.management.jmxremote

The application is then ready for jconsole to connect to it.

(more…)

Tags: java, jmx, servlet, tomcat, uima
Posted in code | Comments Closed

Has today been a good day?

Monday, April 16th, 2012

Last week, I came up with a quick hack, explained quite neatly by @crouchingbadger:

Dale Lane’s TV watches him. It knows if he’s happy or surprised or sad. This is amazing. dalelane.co.uk/blog/?p=2092 (via @libbymiller)

— Ben Ward (@crouchingbadger) April 13, 2012

It was a bit of fun, even if it did seem to convince a group of commenters on engadget that I was a rage-fuelled XBox gamer. 🙂

There’s one big limitation with the hack, though: I don’t spend that much of my day in front of the TV.

It’s interesting to use it to measure my reactions to specific TV programmes or games. But thinking bigger, it’d be cool to try a hack that monitors me throughout the day to measure what kind of day I’m having.

I don’t spend much time in front of the TV, but I do spend a *lot* of time in front of my MacBook. And it has a camera, too!

What if my MacBook could look out for my face, and whenever it can see it, monitor what facial expression I have and whether I’m smiling? And while I’m at it, as I’ve been playing with sentiment analysis recently, add in whether the tweets I post sound positive or neutral.

Add that together, and could I make a reasonable automated estimate as to whether I’m having a good day?

(more…)

Tags: face, face.com, isight, last.fm, qtjava
Posted in code | 3 Comments »

Smile!

Tuesday, April 3rd, 2012

The visualisations on this page need Flash and JavaScript. Apologies if that means most of this page doesn’t work for you!

This is my mood (as identified from my facial expressions) over time while watching Never Mind the Buzzcocks.

The green areas are times where I looked happy.

This shows my mood while playing XBox Live. Badly.

The red areas are times where I looked cross.

I smile more while watching comedies than when getting shot in the head. Shocker, eh?

(more…)

Tags: eightbar, face, python, webcam
Posted in code | 21 Comments »

Avoiding my Lucene TooManyClauses exceptions

Tuesday, March 20th, 2012

Before I start, I should point out that I’m not a Lucene expert. This post isn’t a definitive “you should do things this way” commandment from a Lucene mage. Think of it more as “I had this problem, and this seemed to work for me. I’m sharing it in case it helps you, too”.

I’m using Lucene to implement searches. Recently, as my Lucene index has grown (a lot), I was getting a lot of these errors when I tried to do a search:

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:163)
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:154)
    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:63)
    at org.apache.lucene.search.WildcardQuery.rewrite(WildcardQuery.java:54)
    at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:383)
    at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:383)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
    at org.apache.lucene.search.Query.weight(Query.java:94)
    at org.apache.lucene.search.Searcher.createWeight(Searcher.java:185)
    at org.apache.lucene.search.Searcher.search(Searcher.java:86)

I’m guessing that TooManyClauses is a common problem for people getting going with Lucene.

It’s mentioned in the FAQ, and there are a few StackOverflow threads around about it.

But I couldn’t find a straightforward “you need to follow these steps to fix it” post anywhere, so I’ll add my experience here.

(more…)

Tags: lucene, maxclausecount, toomanyclauses
Posted in code | Comments Closed
