Text analytics in BlueMix using UIMA

April 13th, 2014

In this post, I want to explain how to create a text analytics application in BlueMix using UIMA, and share sample code to show how to get started.

First, some background if you’re unfamiliar with the jargon.

What is UIMA?

UIMA (Unstructured Information Management Architecture) is an Apache framework for building analytics applications for unstructured information and the OASIS standard for content analytics.

I’ve written about it before, having used it on a few projects when I was in ETS and on other side projects since, such as building a conversational interface to web pages.

It’s perhaps better known for providing the architecture for the question answering system IBM Watson.

What is BlueMix?

BlueMix is IBM’s new Platform-as-a-Service (PaaS) offering, built on top of Cloud Foundry to provide a cloud development platform.

It’s in open beta at the moment, so you can sign up and have a play.

I’ve never used BlueMix before, or Cloud Foundry for that matter, so this was a chance for me to write my first app for it.

A UIMA “Hello World” for BlueMix

I’ve written a small sample to show how UIMA and BlueMix can work together. It provides a REST API that you can submit text to, and get back a JSON response with some attributes found in the text (long words, capitalised words, and strings that look like email addresses).
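To give a flavour of how simple those three checks are, here is a minimal plain-Java sketch of them. It is not the code from the sample (which runs inside a UIMA pipeline): the class name, the eight-letter threshold for a "long" word, and the loose email pattern are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextFeatures {
    // Assumption: a "long" word has at least this many letters.
    static final int LONG_WORD_LENGTH = 8;

    // A word is any run of letters.
    static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Deliberately loose: matches strings that merely look like email addresses.
    static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Collects every match of the pattern, in the order it appears in the text.
    private static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    static List<String> longWords(String text) {
        List<String> result = new ArrayList<>();
        for (String word : findAll(WORD, text)) {
            if (word.length() >= LONG_WORD_LENGTH) {
                result.add(word);
            }
        }
        return result;
    }

    static List<String> capitalisedWords(String text) {
        List<String> result = new ArrayList<>();
        for (String word : findAll(WORD, text)) {
            if (Character.isUpperCase(word.charAt(0))) {
                result.add(word);
            }
        }
        return result;
    }

    static List<String> emails(String text) {
        return findAll(EMAIL, text);
    }
}
```

A REST layer would then only need to call these three methods on the submitted text and serialise the lists as JSON.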

The “analytics” that the app is doing is trivial at best, but this is just a Hello World. For now my aim isn’t to produce a useful analytics solution, but to walk through the configuration needed to define a UIMA analytics pipeline, wrap it in a REST API using Wink, and deploy it as a BlueMix application.

When I get a chance, I’ll write a follow-up post on making something more useful.

You can try out the sample on BlueMix, as it’s deployed to bluemix.net.

The source is on GitHub at github.com/dalelane/bluemixuima.

In the rest of this post, I’ll walk through some of the implementation details.


Starting a Code Club

February 20th, 2014

This year I started a Code Club at my local primary school.

It’s still early days for me (I’ve only run four sessions of the Club so far) so I’m obviously not an expert on this stuff. But I thought I’d share some of my first impressions as a volunteer.


What is Code Club about?

If you’ve not heard of it before, Code Club is about giving children aged 9 – 11 a chance to try computer programming.

“A nationwide network of volunteer-led after school coding clubs for children aged 9-11”

It isn’t something that they normally cover in primary school (in theory, this should all change from September 2014 with Year of Code, but we’ll see how that works out), so Code Club is an attempt to introduce programming at primary school, rather than wait until it gets introduced in secondary school.


Dear Fitbit, I lost my tracker…

February 9th, 2014

I lost my Fitbit today. It fell off my trousers when I was out for a long walk while the kids rode their bikes, and I didn’t notice. Boo. :-(

But I found it. Yay! :-)

A few thoughts for Fitbit about this:

Belt clips

I don’t like the Fitbit One’s belt clip as much as the Ultra’s belt holster. The clip is stronger, and less likely to snap (which is what happened to my Ultra and why I ended up having to get the One). But it’s not as effective. It’s hard to attach to many of my clothes, which was never the case with the Ultra. And it falls off. This isn’t the first time it’s fallen off, although it’s the first time it’s happened without me noticing.

Design-wise, I think it needs reconsidering.

Knowing when I lost it

We’d been out for hours. I had no idea when it had gone missing.

But the Fitbit app on my Nexus 4 background syncs with my Fitbit. I checked my phone.

Last synced: 40 minutes ago.

That was a big clue. I knew where we’d been and could retrace my steps. Knowing how fast we’d been going and that my phone had last seen the Fitbit 40 minutes before gave me a rough idea of where it might be.

But Fitbit, if your app stored a location with each sync, and could show me a map, that would’ve been so much better! I guess you need to think about the battery implications for my phone, but even a rough large-radius location estimate would’ve been appreciated.

Using my phone as a fitbit-detector

When I tried to manually get the app to sync with the Fitbit, it threw a “Tracker not found” error.

I retraced my steps, repeatedly hitting the sync button. My idea was that once I was within Bluetooth range (What is Bluetooth’s range outside? A dozen metres?) my phone would sync, and I’d know I was close.


Creating an iCalendar from online timetables

January 19th, 2014

I’m a member of my local swimming pool, Fleming Park. I’m trying to swim a lot at the moment (as it’s a big help for my back).

I don’t have a regular schedule; I just try to squeeze in a swim any time I can spare. This means I’ve not learned the pool’s schedule and frequently have to check their website to find when the pool is available.

I’m checking it so frequently that it’s one of my Most Visited thumbnails in Chrome.

This isn’t efficient, particularly as it’s normally on my phone, making me switch between the browser and Calendar apps. It’d be quicker and easier if I had the timetable in my calendar alongside my appointments, so I could easily see when I’m free and the pool is open.

The leisure centre doesn’t provide a feed that I could subscribe to and add their schedule to my calendar.

So I made my own.

dalelane.co.uk/…/swimflemingpark.ics

If you use the Fleming Park pool, import this in your Calendar app (or subscribe to it from Google Calendar) and the next week’s pool timetable will be kept up to date in your calendar.
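For anyone who hasn’t looked inside an .ics file, the sketch below shows roughly what such a feed contains: a VCALENDAR wrapper around one VEVENT per timetable entry. It is an illustration rather than the code behind my feed. The class name, UID, PRODID and session details are made up, times are treated as UTC to keep it short, and it does not escape commas or semicolons in the summary text.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class PoolCalendar {
    // iCalendar date-time form, e.g. 20140120T070000 (a trailing Z marks UTC).
    private static final DateTimeFormatter ICAL =
            DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss");

    // Renders one timetable entry as a VEVENT. Lines end with CRLF.
    static String event(String uid, String summary,
                        LocalDateTime startUtc, LocalDateTime endUtc,
                        LocalDateTime stampUtc) {
        return "BEGIN:VEVENT\r\n"
                + "UID:" + uid + "\r\n"
                + "DTSTAMP:" + ICAL.format(stampUtc) + "Z\r\n"
                + "DTSTART:" + ICAL.format(startUtc) + "Z\r\n"
                + "DTEND:" + ICAL.format(endUtc) + "Z\r\n"
                + "SUMMARY:" + summary + "\r\n"
                + "END:VEVENT\r\n";
    }

    // Wraps already-rendered events in the VCALENDAR envelope.
    static String calendar(String... events) {
        StringBuilder ics = new StringBuilder();
        ics.append("BEGIN:VCALENDAR\r\n")
           .append("VERSION:2.0\r\n")
           .append("PRODID:-//example//pool timetable sketch//EN\r\n"); // hypothetical id
        for (String e : events) {
            ics.append(e);
        }
        return ics.append("END:VCALENDAR\r\n").toString();
    }

    public static void main(String[] args) {
        String swim = event("swim-1@example.invalid", "Public swim",
                LocalDateTime.of(2014, 1, 20, 7, 0),
                LocalDateTime.of(2014, 1, 20, 9, 0),
                LocalDateTime.of(2014, 1, 19, 12, 0));
        System.out.print(calendar(swim));
    }
}
```

Regenerating the file from the pool’s website on a schedule is what would keep a subscribed calendar up to date.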


A scheduler for Remember The Milk

January 4th, 2014

A quick tool I made for setting due dates of tasks in Remember The Milk by dragging them onto a calendar.

I’ve mentioned Remember The Milk (RTM) before – the online to-do list manager. I’ve been using it for years.

My workflow has settled into:

  1. Capture
    Any time I think of something that I’ll probably need to do, it gets thrown into RTM. Then I relax knowing it won’t get forgotten.
  2. Plan
    Periodically review everything in RTM, working out what needs to be done soon, what can wait, and so on.

The RTM web application interface doesn’t suit my approach to scheduling tasks. I need a different view for planning and triaging.

So I made one.

Remember The Milk scheduler


Standing desks

December 10th, 2013

I’ve switched to a standing desk for work, and thought I’d share my experiences with it.

I tend to get involved in a variety of things at work, but I’m primarily a developer. I’m a code monkey.

Traditionally, this has involved a lot of sitting. Not just a lot of sitting overall, but for long periods, too. I couldn’t tell you the number of times that I’ve been hunched over a laptop and lost track of time… looking up in surprise hours later.

There has been a lot of press about how bad this is for me: headlines saying that excessive sitting is lethal, and that sitting for an hour does more to shorten your life than smoking a cigarette.

I figured that as long as I went for a run and kept my weight down, that would make up for it. There is also research that says this doesn’t work and that sitting causes harm which isn’t undone by a bit of exercise every day.

I’d ignored more or less all of this.

But then I screwed up my back and sitting for long periods wasn’t an option. Working on a standing desk started as a necessity, but now that I’ve gotten into it, I wish I’d started years ago.

I don’t want to sound like a zealot or try to convert people to it. I just want to share what it’s been like.


Avoiding monitor contention in Java’s Double.parseDouble

November 30th, 2013

Overview

You can call Double.parseDouble in Java to convert String representations of numbers like “1.234567” into a number representation.
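As a quick reminder of what that call looks like, here is a small sketch. The ScoreParser class and the parseOrDefault helper are hypothetical names for illustration; the only real API used is Double.parseDouble, which throws NumberFormatException for text that isn’t a number and NullPointerException for null.

```java
class ScoreParser {
    // Converts a score attribute to a double, using the fallback when the
    // text is missing or is not a valid number.
    static double parseOrDefault(String text, double fallback) {
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException | NullPointerException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        double score = parseOrDefault("1.234567", 0.0);
        System.out.println(score);
    }
}
```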

I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.

In this post, I’ll explain why and what I did about it.

Background (skip this if you don’t care why I had this problem!)

I’ve mentioned UIMA before: an Apache framework for doing text analytics that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).

For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, then parse and process the contents. The contents include scores from the various analytics operations that are done on the contents of the CAS:

<myElement
    myRawScore="1.2345678"
    myThisScore="2.46801357"
    myThatScore="1.35792468"
    ...

Thousands of XML files, each containing several thousand numbers in String form.

As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.

I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.
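The shape of that workload can be sketched as below. This is a simplified stand-in, not my actual code: each "file" is just a list of score strings rather than a real CAS file deserialized by UIMA, and ParallelParse, sumScores and processAll are hypothetical names. It only shows the pattern of one task per file on a fixed pool of threads, with every task calling Double.parseDouble repeatedly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelParse {
    // Stand-in for deserializing one CAS file: parse every score string in it.
    static double sumScores(List<String> scores) {
        double total = 0;
        for (String s : scores) {
            total += Double.parseDouble(s);
        }
        return total;
    }

    // Submits one task per "file" to a fixed pool and collects results in order.
    static List<Double> processAll(List<List<String>> files, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> pending = new ArrayList<>();
            for (List<String> file : files) {
                Callable<Double> task = () -> sumScores(file);
                pending.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : pending) {
                results.add(f.get()); // blocks until that task has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 64 cores, the thread count passed to processAll would be 64. Whether the threads then scale well depends on what each call does internally, which is the subject of this post.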

This took *ages*.


Making handwritten notes on an iPad mini

November 28th, 2013

I’ve had a Bamboo stylus for my iPad mini for a while now. I’ve used it for sketching and rough diagrams but only recently started using it for making handwritten notes.

It’s not immediately obvious that it’d really work. The iPad touchscreen is designed for use by pudgy human fingers, so that’s what the stylus mimics. You don’t get a fine point for precision drawing, you get a big fat rounded end. (As an aside, this is something that the Surface gets right – a proper active pressure-sensitive stylus is very cool. But anyway…) So your handwriting has to end up really big – like trying to make notes with a child’s chunky crayon.

And the iPad mini screen is so small that you don’t have much room to write.

I ended up carrying a Moleskine notebook and pen around as well – making handwritten notes in that and then taking photos of it with the iPad. It’s a bit of a kludgy and time-consuming workaround.

I’ve started using Penultimate instead, and it’s pretty neat. It makes up for the limitations of handwriting by giving you a zoomed-in view of a bit of the screen, and scrolling that view around for you automatically as you write – matching the speed of your handwriting. And it’s reasonably good at knowing how to ignore a wrist resting on the screen.


Camera work by Grace, book to copy provided by Faith :-)
