Archive for August, 2012

Using UIMA-AS to run UIMA annotators in parallel

Friday, August 24th, 2012


UIMA stands for Unstructured Information Management Architecture. It’s an Apache technology that provides a framework and standard for building text analytics applications. I’ve mentioned it before.

In this post, I want to talk about an area of UIMA that isn’t covered well in the documentation.

I couldn’t find practical getting-started instructions for running UIMA-AS annotators in parallel, so I want to discuss why you might want to do it, and share some simple sample code to show how.

Background – the UIMA pipeline

UIMA provides a framework for managing a text analytics application. You break up the analytics functionality into discrete pieces called annotators. UIMA takes care of moving a text document through an analytics engine: a pipeline containing a series of annotators.

A document goes in one end of the pipeline and passes through a number of annotators, each of which adds some metadata to the document. What comes out the other side of the pipeline is an annotated copy of the document.

By default, you get UIMA to run these annotators one at a time – one after another.
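
To make the one-at-a-time model concrete, here is a minimal sketch in plain Java. It is not the UIMA API: the `Annotator` interface, the `run` method and the two toy annotators are all hypothetical names made up for illustration. It only shows the shape of the idea, with each annotator seeing the document in turn and adding its own metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: this is plain Java, not the UIMA API.
final class SequentialPipelineSketch {

    /** Stand-in for an annotator: reads the text and appends any annotations it finds. */
    interface Annotator {
        void process(String text, List<String> annotations);
    }

    /** Runs each annotator in order against the same document, one after another. */
    static List<String> run(String text, List<Annotator> pipeline) {
        List<String> annotations = new ArrayList<>();
        for (Annotator annotator : pipeline) {
            annotator.process(text, annotations);
        }
        return annotations;
    }

    public static void main(String[] args) {
        // Two toy annotators, invented for this example.
        Annotator wordCount = (text, out) -> out.add("words=" + text.trim().split("\\s+").length);
        Annotator charCount = (text, out) -> out.add("chars=" + text.length());

        System.out.println(run("UIMA runs annotators in order", Arrays.asList(wordCount, charCount)));
    }
}
```

The total time here is the sum of the time spent in every annotator, which is what makes slow annotators painful.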

Background – annotators in parallel

What if your annotators are quite slow – perhaps they take several seconds to run?

If some or all of your annotators don’t depend on each other’s output, then maybe running them one at a time isn’t the most efficient approach.

You can run all of them at the same time, in parallel. UIMA will merge the output from all of the annotators into a single annotated document.
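
The fan-out-and-merge idea can be sketched with nothing more than a thread pool. This is plain Java rather than UIMA-AS, and `annotateInParallel` and the toy annotators are hypothetical names; UIMA-AS does the equivalent work through its own configuration and messaging. The sketch assumes the annotators are independent, so none of them needs to see another’s output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical names throughout: this illustrates the idea, not the UIMA-AS API.
final class ParallelAnnotatorsSketch {

    /** Runs every annotator on the same text at once, then merges their annotations. */
    static List<String> annotateInParallel(String text,
                                           List<Function<String, List<String>>> annotators)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, annotators.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Function<String, List<String>> annotator : annotators) {
                tasks.add(() -> annotator.apply(text));
            }
            List<String> merged = new ArrayList<>();
            // invokeAll waits for all tasks and returns the futures in task order.
            for (Future<List<String>> result : pool.invokeAll(tasks)) {
                merged.addAll(result.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two toy annotators, invented for this example.
        Function<String, List<String>> upper = t -> Arrays.asList("upper:" + t.toUpperCase());
        Function<String, List<String>> length = t -> Arrays.asList("length:" + t.length());

        System.out.println(annotateInParallel("uima", Arrays.asList(upper, length)));
    }
}
```

With this shape, the elapsed time is roughly that of the slowest annotator rather than the sum of all of them.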

My sample code

I’ve written two sample UIMA apps. Each demonstrates one of these approaches, to compare and contrast.

They are divided into three Eclipse projects. You can import them into an Eclipse IDE.

The UIMA Eclipse plugins are very helpful if you want to make changes to the XML configuration files, but they’re not essential. If you want them, there are instructions on how to install them at

I’ve added comments to the sample code to explain how the apps work, but I’ll give an overview here.


Implementing a text box for entering tags in a dojo web app

Sunday, August 19th, 2012

I needed a text box for entering tags on a dojo web app. I ended up making my own – it was only a hundred or so lines of code, but I’m sharing it here as it might be useful to others.

The text box needed to provide auto-complete when you start typing something that matches an existing tag. Dojo already has a text box widget that does auto-complete: dijit.form.ComboBox – so I started from there, modifying its behaviour so that

  • the options it offers are based on the current tag you’re typing (instead of the whole contents of the text box)
  • if you pick one of the options, it only replaces the current tag you’re typing with what you select (instead of replacing the whole contents)
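
The two behaviours above boil down to finding the tag around the caret. Here is a small sketch of that logic, written in Java for illustration rather than as the Dojo widget code itself. It assumes tags are separated by commas, which is a guess on my part, and `currentTag` and `replaceCurrentTag` are hypothetical names.

```java
// Hypothetical helper: the comma separator and method names are assumptions.
final class TagBoxSketch {

    static final char SEPARATOR = ',';

    /** Index of the first character of the tag that contains the caret. */
    static int tagStart(String text, int caret) {
        return text.lastIndexOf(SEPARATOR, caret - 1) + 1;
    }

    /** Index just past the last character of the tag that contains the caret. */
    static int tagEnd(String text, int caret) {
        int end = text.indexOf(SEPARATOR, caret);
        return end < 0 ? text.length() : end;
    }

    /** The tag being typed: this, not the whole box, is what the options are matched against. */
    static String currentTag(String text, int caret) {
        return text.substring(tagStart(text, caret), tagEnd(text, caret)).trim();
    }

    /** Swaps only the tag under the caret for the selected option, leaving other tags alone. */
    static String replaceCurrentTag(String text, int caret, String selected) {
        int start = tagStart(text, caret);
        int end = tagEnd(text, caret);
        String padding = start > 0 ? " " : ""; // keep a space after the previous separator
        return text.substring(0, start) + padding + selected + text.substring(end);
    }

    public static void main(String[] args) {
        String box = "java, uim";
        System.out.println(currentTag(box, box.length()));
        System.out.println(replaceCurrentTag(box, box.length(), "uima"));
    }
}
```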

See it in action in this short video clip.

Because I’ve based it on dijit.form.ComboBox, I also get a bunch of features for free, including that the options it offers come from a data store, which can be backed by a REST API.

This supports paging, which means my REST API doesn’t have to return all of the tags – just enough to populate the visible bit of the drop-down list. I’m using Lucene to implement filtering in the REST API, so it can quickly return a subset of tags that matches what the user has started typing. I don’t need to download everything and filter it client-side – it can be smarter and more efficient than that.

That said, this might be overkill for some needs – you can easily create a client-side store in memory, without needing to write a REST API to back it.
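
To illustrate the paging idea, here is a rough sketch of the kind of query a tag service answers: given what the user has typed, return one page of matching tags. It swaps Lucene for a sorted in-memory set purely to stay self-contained, and `page`, `start` and `count` are hypothetical names, not the parameters of any real store or REST API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for the server side: a sorted set instead of a Lucene index.
final class TagPageSketch {

    /** Returns up to 'count' tags beginning with 'prefix', skipping the first 'start' matches. */
    static List<String> page(NavigableSet<String> tags, String prefix, int start, int count) {
        List<String> page = new ArrayList<>();
        int index = 0;
        // In a sorted set, all tags sharing a prefix sit together, starting at the prefix itself.
        for (String tag : tags.tailSet(prefix, true)) {
            if (!tag.startsWith(prefix)) {
                break; // walked past the last match
            }
            if (index++ < start) {
                continue; // belongs to an earlier page
            }
            if (page.size() == count) {
                break; // page is full
            }
            page.add(tag);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> tags =
                new TreeSet<>(Arrays.asList("java", "javascript", "jmx", "json", "uima"));
        System.out.println(page(tags, "j", 0, 2)); // first page of tags starting with "j"
        System.out.println(page(tags, "j", 2, 2)); // second page
    }
}
```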


Preventing Internet Explorer from using Compatibility View

Wednesday, August 15th, 2012

I’ve had some trouble with Internet Explorer recently.

I was making a new web tool which looked fine in all browsers. Except Internet Explorer, where it looked a bit squiffy.

Internet Explorer has “Compatibility View”. Compatibility View makes IE behave like the older versions of Internet Explorer, the ones before Microsoft started paying more attention to web standards.

It makes sense – there are a lot of websites out there that were written to render well on old versions of Internet Explorer, and Microsoft needed to make the move to standards compliance in a way that didn’t break all of them.

The problem is, Compatibility View can be a little… insistent.

It kept turning on, even though I didn’t want it, even though my site worked fine in new shiny standards mode, and looked horribly broken in Compatibility View.

You can manually disable it, but I don’t want to have to make users do that. As the web developer, I want to be able to disable it – to tell IE that I want the site to be rendered in standards mode.

It was a bit fiddly. Here’s how I did it.
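
One widely documented way to ask IE for standards mode is the X-UA-Compatible response header (or the equivalent meta tag in the page head). The sketch below is a general illustration of that header, not a record of the exact steps used for this site. It uses the small HTTP server bundled with the JDK, and the class name, port and page content are made up for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo server: names, port and page content are assumptions.
final class StandardsModeSketch {

    /** Starts a tiny server whose every response asks IE for its newest rendering mode. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = ("<!DOCTYPE html><html><head><title>Demo</title></head>"
                    + "<body>Hello</body></html>").getBytes(StandardCharsets.UTF_8);
            // IE=edge requests the most recent standards mode the browser supports.
            exchange.getResponseHeaders().set("X-UA-Compatible", "IE=edge");
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // keeps serving until the process is stopped
    }
}
```

In a real web app the same header would more likely be set in the web server configuration or a servlet filter, so that every page gets it.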


Grace’s Olympics Scratch

Sunday, August 12th, 2012

Grace has been starting to get to grips with Scratch – a visual programming tool aimed at children.

She seems to like it so far, and has made a couple of little animal games. She’s also made a more topical animation:

Grace’s Olympics animation


  • Green flag means Play from the beginning.
  • Red dot means Stop.
  • The full-screen button in the top-left seems to need you to Shift-click to work.

Scratch’s web player is Flash-based. If you’re on a mobile or other non-Flash-friendly device, sorry – you’re missing something awesome 😉

It doesn’t seem to like Internet Explorer very much. If you’re on IE, what you’re seeing is pretty broken. There is a Java applet version that seems to work better on IE though.


How to make a phone call from “Microsoft”

Saturday, August 11th, 2012

Step 1: Establish trust
Reassure the person you’ve phoned by saying that you’re calling from Microsoft and that you’re Microsoft Certified.

Step 2: Introduce a problem
Make them a little nervous by saying that, as you’re calling from Microsoft, because of the “international routing” you can tell their computer is infected with “malacious” viruses. Explain that you’ve been receiving error reports from “the computer” at this phone number and that it is urgent that you fix it.

Step 3: Panic
Scare the crap out of the person you’ve called by getting them to navigate to C:\Windows\inf. Explain that “inf” stands for infected, and that these are viruses. Exclaim in horror that, as it has so many files and folders in there, this machine is badly infected.

Step 4: Save the day
Start to sound reassuring by reminding the person that you’re Microsoft certified and can fix it.


Using a screen reader

Saturday, August 4th, 2012

What might it be like to read the BBC News website with a screen reader?
I thought this was interesting.

Using a screen reader :: YouTube video

Imagine if you needed to rely on a screen reader to use the Internet. Seriously – give it a try.

Start the video playing, put the volume up a bit, and *shut your eyes*.

Try and follow along.

What’s it like? Imagine if you’d not seen the page before, and had to try and figure out the structure of the page from what is read out.

Choose a story that you’d want to click on, and without looking, try and work out how many times you’d need to press up/down/tab to get to it.


Using JMX to monitor UIMA running in a servlet

Wednesday, August 1st, 2012


A quick how-to for when you’re running UIMA in a servlet and want to monitor your AE (analysis engine) performance using JMX.


I’ve mentioned JMX before. Basically, a Java app can expose information and methods through a standard interface. Tools like jconsole, which come with Java, can then be used to monitor and administer the Java app.

UIMA (Unstructured Information Management Architecture) is an Apache project, providing a standards-based way to perform analytics on unstructured text. It hosts a pipeline of annotators: individual components each performing a specific text analytics task. As a document moves down the pipeline, UIMA runs each of the annotators on the document. Each annotator adds its own annotations for the things it looks for in the text.


UIMA supports JMX. UIMA registers an MBean for each annotator, letting you see the performance info for each annotator. In a pipeline of several annotators, it lets you see (amongst other things) how much time your document is spending in each annotator.


In a stand-alone UIMA application, you basically get this for free. Start the application with the standard Java -D property for enabling JMX:

It is ready to let jconsole connect to it.
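
The usual JVM property for switching on the built-in JMX agent is -Dcom.sun.management.jmxremote; I’m assuming that is the one meant here. jconsole is the easy way to browse what gets registered, but the same information is available from code. The sketch below lists the MBeans in the current JVM that match an ObjectName pattern. The class and method names are hypothetical, and it defaults to the JVM’s own java.lang MBeans so that it runs anywhere; for UIMA you would pass whatever domain your annotator MBeans appear under in jconsole.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper: lists MBeans registered with this JVM's platform MBean server.
final class ListMBeansSketch {

    /** Returns the sorted names of all registered MBeans matching an ObjectName pattern. */
    static Set<String> namesMatching(String pattern) throws MalformedObjectNameException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> names = new TreeSet<>();
        for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the JVM's built-in MBeans; pass another pattern as the first argument.
        String pattern = args.length > 0 ? args[0] : "java.lang:*";
        for (String name : namesMatching(pattern)) {
            System.out.println(name);
        }
    }
}
```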