Archive for the ‘code’ Category

Watson News Companion

Friday, December 12th, 2014

We recently ran a hackathon at work: people within IBM were invited to try building a mobile app aimed at consumers using Watson services. It was a fun chance to try out some new ideas, as well as to build something using our APIs – dogfooding is always a good thing.

I worked on a hack with David which we submitted on Wednesday. This is what we came up with, and how we built it.

The idea

A mobile app that will help users to digest the news by explaining references in stories and providing greater context.

Background

It’s difficult to find the time nowadays to properly read and understand what’s going on in the world. We rarely have the time to sit and read through a newspaper. Instead, we might quickly read news stories online from our smartphones and tablets. But that often makes it difficult to understand the broader context that a story is in. There might be references in the story to people, places, organisations or events that are unfamiliar.

Watson could help. It could be an assistant as you read the news, explaining unfamiliar references and the broader context.

Features

Our Watson News Companion demo is a mobile news reader app that:

  • anticipates questions and suggests areas where it can help improve understanding
  • provides answers to questions without users losing their place in the story
  • allows the user to dig deeper with their own follow-up questions


A video walkthrough of the hack

(more…)

Comparing XML files ignoring order of attributes and child elements

Monday, October 6th, 2014

I need to diff some XML files.

For these particular XML files, order is not important. The XML is being used to contain a set of things, not a list – the order of the elements has no significance. Similarly, the order of the attributes within each element isn’t significant.

For example, for my purposes, these two XML files are equivalent:

<myroot>
    <mychild id="123">
        <fruit>apple</fruit>
        <test hello="world" brackets="angled" question="answers"/>
        <comment>This is a comment</comment>
    </mychild>
    <mychild id="456">
        <fruit>banana</fruit>
    </mychild>
    <mychild id="789">
        <fruit>orange</fruit>
        <test brackets="round" hello="greeting">
            <number>111</number>
        </test>
        <dates>
              <modified>123</modified>
              <created>253</created>
              <accessed>44</accessed>
        </dates>
    </mychild>
</myroot>

<myroot>
    <mychild id="789">
        <fruit>orange</fruit>
        <test hello="greeting" brackets="round">
            <number>111</number>
        </test>
        <dates>
              <accessed>44</accessed>    
              <modified>123</modified>
              <created>253</created>
        </dates>
    </mychild>
    <mychild id="123">
        <test question="answers" hello="world" brackets="angled"/>
        <comment>This is a comment</comment>
        <fruit>apple</fruit>
    </mychild>
    <mychild id="456">
        <fruit>banana</fruit>
    </mychild>
</myroot>

I needed to compare some large XML files, which have big differences in the order of elements, and I couldn’t find a tool that would do the job. So I wrote a bit of Python to do it for me.
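
As a rough illustration of the idea (a minimal sketch, not the script from this post), one way to compare XML while ignoring order is to reduce each element to a canonical form with its attributes and children sorted, and then compare the results. The function names `canonical` and `equivalent` are made up for this example, and it assumes that whitespace around text is not significant and that text trailing a child element can be ignored.

```python
import xml.etree.ElementTree as ET


def canonical(element):
    """Return a nested tuple that ignores attribute and child order."""
    # Sorting the (name, value) pairs removes attribute order.
    attributes = tuple(sorted(element.attrib.items()))
    # Assumption: leading and trailing whitespace in text is not significant.
    text = (element.text or "").strip()
    # Sorting the canonical forms of the children removes child order;
    # repr() is used as the key so any two subtrees can be ordered.
    children = tuple(sorted((canonical(child) for child in element), key=repr))
    return (element.tag, attributes, text, children)


def equivalent(xml_a, xml_b):
    """True when two XML strings differ only in attribute or child order."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


if __name__ == "__main__":
    first = '<myroot><mychild id="1" x="2"><fruit>apple</fruit><test/></mychild></myroot>'
    second = '<myroot><mychild x="2" id="1"><test/><fruit>apple</fruit></mychild></myroot>'
    print(equivalent(first, second))
```

This only answers whether two documents are equivalent. Producing a readable diff would need more work, such as writing each canonical form back out as sorted, indented text and running an ordinary line diff over the two outputs.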

(more…)

Using Node.js to create a REST API around a SQL database

Sunday, June 15th, 2014

A few code snippets showing how you can quickly stand up a SQL database and provide a REST API for DB reads and writes.

I was helping out a team at a hackday hosted at Hursley last week. One of the things they wanted for their hack was a SQL database to put sensor data in, which they could access via a REST API. And they wanted it in node.js.

I’d never used Node before, so I used this as a chance to give myself a first crash-course.

I’m not saying this is the way to do this in Node, as it’s the result of my first hour’s tinkering. But it worked, and I mostly wanted to share how quick and easy it was.

(more…)

Text analytics in BlueMix using UIMA

Sunday, April 13th, 2014

In this post, I want to explain how to create a text analytics application in BlueMix using UIMA, and share sample code to show how to get started.

First, some background if you’re unfamiliar with the jargon.

What is UIMA?

UIMA (Unstructured Information Management Architecture) is an Apache framework for building analytics applications for unstructured information and the OASIS standard for content analytics.

I’ve written about it before, having used it on a few projects when I was in ETS, and on side projects since, such as building a conversational interface to web pages.

It’s perhaps better known for providing the architecture for the question answering system IBM Watson.

What is BlueMix?

BlueMix is IBM’s new Platform-as-a-Service (PaaS) offering, built on top of Cloud Foundry to provide a cloud development platform.

It’s in open beta at the moment, so you can sign up and have a play.

I’ve never used BlueMix before, or Cloud Foundry for that matter, so this was a chance for me to write my first app for it.

A UIMA “Hello World” for BlueMix

I’ve written a small sample to show how UIMA and BlueMix can work together. It provides a REST API that you can submit text to, and get back a JSON response with some attributes found in the text (long words, capitalised words, and strings that look like email addresses).
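
To give a feel for how trivial these “analytics” are, here is a minimal sketch of the same three checks in plain Python. It is not the UIMA sample and uses no UIMA APIs. The JSON field names, the eight-letter threshold for a “long” word, and the deliberately loose email pattern are all assumptions made for this illustration.

```python
import json
import re

# Deliberately loose pattern: "strings that look like email addresses".
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
WORD_PATTERN = re.compile(r"[A-Za-z]+")
LONG_WORD_LENGTH = 8  # assumed threshold for a "long" word


def analyse(text):
    """Return the three kinds of attribute found in the text."""
    emails = EMAIL_PATTERN.findall(text)
    # Blank out the email addresses so their parts are not counted as words.
    remaining = EMAIL_PATTERN.sub(" ", text)
    words = WORD_PATTERN.findall(remaining)
    return {
        "longWords": [w for w in words if len(w) >= LONG_WORD_LENGTH],
        "capitalisedWords": [w for w in words if w[0].isupper()],
        "emails": emails,
    }


if __name__ == "__main__":
    sample = "Contact Dale at dale@example.com about analytics"
    print(json.dumps(analyse(sample), indent=2))
```

In the real sample each of these checks would be a UIMA annotator in a pipeline, with the REST layer turning the resulting annotations into the JSON response.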

The “analytics” that the app is doing is trivial at best, but this is just a Hello World. For now my aim isn’t to produce a useful analytics solution, but to walk through the configuration needed to define a UIMA analytics pipeline, wrap it in a REST API using Wink, and deploy it as a BlueMix application.

When I get a chance, I’ll write a follow-up post on making something more useful.

You can try out the sample on BlueMix, as it’s deployed to bluemix.net.

The source is on GitHub at github.com/dalelane/bluemixuima.

In the rest of this post, I’ll walk through some of the implementation details.

(more…)

Creating an iCalendar from online timetables

Sunday, January 19th, 2014

I’m a member of my local swimming pool, Fleming Park. I’m trying to swim a lot at the moment (as it’s a big help for my back).

I don’t have a regular schedule; I just try to squeeze in a swim whenever I can spare the time. This means I’ve not learned the pool’s schedule, and I frequently have to check their website to find out when the pool is available.

I’m checking it so frequently that it’s one of my Most Visited thumbnails in Chrome.

This isn’t efficient, particularly as I’m normally on my phone, which means switching between the browser and Calendar apps. It’d be quicker and easier if I had the timetable in my calendar alongside my appointments, so I could easily see when I’m free and the pool is open.

The leisure centre doesn’t provide a feed that I can subscribe to in order to add their schedule to my calendar.

So I made my own.

dalelane.co.uk/…/swimflemingpark.ics

If you use the Fleming Park pool, import this in your Calendar app (or subscribe to it from Google Calendar) and the next week’s pool timetable will be kept up to date in your calendar.
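
For anyone curious what such a file contains, here is a minimal sketch of generating an iCalendar feed from a list of sessions. It is an illustration under assumptions, not the code behind the feed above: the session name and times are invented, the `PRODID` and `UID` values are placeholders, fetching and parsing the timetable page is left out entirely, and escaping of special characters in the summary is omitted.

```python
from datetime import datetime, timezone


def local_stamp(moment):
    """Format a naive datetime as a floating (local) iCalendar date-time."""
    return moment.strftime("%Y%m%dT%H%M%S")


def build_calendar(sessions, generated_at):
    """Build iCalendar text from (summary, start, end) tuples.

    generated_at is assumed to be a UTC datetime; it is used for DTSTAMP.
    """
    stamp = generated_at.strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//pool timetable sketch//EN",  # placeholder value
    ]
    for index, (summary, start, end) in enumerate(sessions):
        lines += [
            "BEGIN:VEVENT",
            # Placeholder UID scheme: unique per event within this feed.
            f"UID:session-{index}-{local_stamp(start)}@example.invalid",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{local_stamp(start)}",
            f"DTEND:{local_stamp(end)}",
            f"SUMMARY:{summary}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical session, purely for illustration.
    sessions = [("Lane swimming", datetime(2014, 1, 20, 7, 0), datetime(2014, 1, 20, 9, 0))]
    print(build_calendar(sessions, datetime.now(timezone.utc)))
```

Serving the output at a stable URL and regenerating it on a schedule is what lets a calendar app subscribe and stay up to date.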

(more…)

A scheduler for Remember The Milk

Saturday, January 4th, 2014

A quick tool I made for setting due dates of tasks in Remember The Milk by dragging them onto a calendar.

I’ve mentioned Remember The Milk (RTM) before – the online to-do list manager. I’ve been using it for years.

My workflow has settled into:

  1. Capture
    Any time I think of something that I’ll probably need to do, it gets thrown into RTM. Then I relax knowing it won’t get forgotten.
  2. Plan
    Periodically review everything in RTM, working out what needs to be done soon, what can wait, and so on.

The RTM web application interface doesn’t suit my approach to scheduling tasks. I need a different view for planning and triaging.

So I made one.

Remember The Milk scheduler

(more…)

Avoiding monitor contention in Java’s Double.parseDouble

Saturday, November 30th, 2013

Overview

You can call Double.parseDouble in Java to convert String representations of numbers like “1.234567” into a double.

I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.

In this post, I’ll explain why and what I did about it.

Background (skip this if you don’t care why I had this problem!)

I’ve mentioned UIMA before: an Apache framework for doing text analytics that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).

For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, and parse and process the contents. The contents include scores from the various analytics operations that are done on the contents of the CAS:

<myElement
    myRawScore="1.2345678"
    myThisScore="2.46801357"
    myThatScore="1.35792468"
    ...

Thousands of XML files, each containing several thousand numbers in String form.

As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.

I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.

This took *ages*.

(more…)

Using UIMA-AS to run UIMA annotators in parallel

Friday, August 24th, 2012

Overview

UIMA stands for Unstructured Information Management Architecture. It’s an Apache technology that provides a framework and standard for building text analytics applications. I’ve mentioned it before.

In this post, I want to talk about an area of UIMA which isn’t covered well in the documentation.

I couldn’t find practical getting-started instructions for running UIMA-AS annotators in parallel. In this post I want to discuss why you might want to do it, and share some simple sample code to show how.

Background – the UIMA pipeline

UIMA provides a framework for managing a text analytics application. You break up the analytics functionality into discrete pieces called annotators. UIMA takes care of moving a text document through an analytics engine: a pipeline containing a series of annotators.

A document goes in one end of the pipeline and passes through a number of annotators, each of which adds some metadata to the document. What comes out the other side of the pipeline is an annotated copy of the document.

By default, you get UIMA to run these annotators one at a time – one after another.

Background – annotators in parallel

What if your annotators are quite slow – perhaps they take several seconds to run?

If there is no dependency between any or all of your annotators, then maybe running them one at a time isn’t the most efficient approach.

You can run all of them at the same time, in parallel. UIMA will merge the output from all of the annotators into a single annotated document.
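
The difference between the two approaches can be sketched in a few lines of plain Python. This is a conceptual illustration only: it does not use UIMA or UIMA-AS, and the two toy annotators and the `(type, text)` annotation format are invented for the example. It relies on the annotators being independent of each other, as described above.

```python
from concurrent.futures import ThreadPoolExecutor


def find_capitalised(text):
    """Toy annotator: mark words that start with a capital letter."""
    return [("Capitalised", word) for word in text.split() if word[:1].isupper()]


def find_numbers(text):
    """Toy annotator: mark words made up only of digits."""
    return [("Number", word) for word in text.split() if word.isdigit()]


ANNOTATORS = [find_capitalised, find_numbers]


def run_in_sequence(text):
    """Run each annotator one after another, collecting the annotations."""
    annotations = []
    for annotator in ANNOTATORS:
        annotations.extend(annotator(text))
    return annotations


def run_in_parallel(text):
    """Run every annotator at the same time, then merge the annotations."""
    with ThreadPoolExecutor(max_workers=len(ANNOTATORS)) as pool:
        results = pool.map(lambda annotator: annotator(text), ANNOTATORS)
        # map() yields results in annotator order, so the merged output
        # matches the sequential version.
        return [annotation for result in results for annotation in result]


if __name__ == "__main__":
    document = "Watson answered 3 questions"
    print(run_in_sequence(document))
    print(run_in_parallel(document))
```

With slow annotators, the parallel version takes roughly as long as the slowest annotator rather than the sum of all of them. In UIMA-AS the merging is handled by the framework rather than by code like this.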

My sample code

I’ve written two sample UIMA apps. Each demonstrates one of these approaches, to compare and contrast.

They are divided into three Eclipse projects, which you can import into the Eclipse IDE.

The UIMA Eclipse plugins are very helpful if you want to make changes to the XML configuration files, but they’re not essential. If you want them, there are instructions on how to install them at uima.apache.org.

I’ve added comments to the sample code to explain how the apps work, but I’ll give an overview here.

(more…)