Posts Tagged ‘eightbar’

Text analytics in BlueMix using UIMA

Sunday, April 13th, 2014

In this post, I want to explain how to create a text analytics application in BlueMix using UIMA, and share sample code to show how to get started.

First, some background if you’re unfamiliar with the jargon.

What is UIMA?

UIMA (Unstructured Information Management Architecture) is an Apache framework for building applications that analyse unstructured information, and the OASIS standard for content analytics.

I’ve written about it before, having used it on a few projects when I was in ETS, and on other side projects since, such as building a conversational interface to web pages.

It’s perhaps better known for providing the architecture for the question answering system IBM Watson.

What is BlueMix?

BlueMix is IBM’s new Platform-as-a-Service (PaaS) offering: a cloud development platform built on top of Cloud Foundry.

It’s in open beta at the moment, so you can sign up and have a play.

I’ve never used BlueMix before, or Cloud Foundry for that matter, so this was a chance for me to write my first app for it.

A UIMA “Hello World” for BlueMix

I’ve written a small sample to show how UIMA and BlueMix can work together. It provides a REST API: you submit text to it and get back a JSON response describing attributes found in the text (long words, capitalised words, and strings that look like email addresses).
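To give a flavour of what’s involved, here is a minimal sketch of a UIMA annotator that finds email-address-like strings with a regular expression. To be clear, the class name, the EmailAnnotation type and the pattern are my own illustration rather than code from the sample, and the sketch assumes that the EmailAnnotation JCas class has been generated (with JCasGen) from the pipeline’s type system descriptor.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.jcas.JCas;

    // A trivial UIMA annotator: it scans the document text for strings that
    // look like email addresses and adds an annotation for each match.
    public class EmailAnnotator extends JCasAnnotator_ImplBase {

        // deliberately simple pattern - good enough for a Hello World,
        // not an RFC-compliant email address matcher
        private static final Pattern EMAIL =
                Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.-]+");

        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            Matcher matcher = EMAIL.matcher(jcas.getDocumentText());
            while (matcher.find()) {
                // EmailAnnotation is assumed to be a JCas class generated by
                // JCasGen from the pipeline's type system descriptor
                EmailAnnotation annotation = new EmailAnnotation(jcas);
                annotation.setBegin(matcher.start());
                annotation.setEnd(matcher.end());
                annotation.addToIndexes();
            }
        }
    }

The Java is only half of it: each annotator also needs an XML descriptor, and the annotators are chained together by an aggregate descriptor, which is most of the configuration involved in defining a UIMA pipeline.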

The “analytics” that the app does is trivial at best, but this is just a Hello World. For now, my aim isn’t to produce a useful analytics solution, but to walk through the configuration needed to define a UIMA analytics pipeline, wrap it in a REST API using Wink, and deploy it as a BlueMix application.
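On the Wink side, it helps to know that Apache Wink is an implementation of JAX-RS, so wrapping the pipeline in a REST API essentially means writing a resource class with JAX-RS annotations that feeds the submitted text through a UIMA analysis engine. The sketch below is illustrative rather than the actual sample code: the path, class name, descriptor location and JSON layout are all assumptions, and a real application would create the analysis engine once at startup rather than on every request.

    import javax.ws.rs.Consumes;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.cas.FSIterator;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.tcas.Annotation;
    import org.apache.uima.resource.ResourceSpecifier;
    import org.apache.uima.util.XMLInputSource;

    @Path("/analyse")
    public class AnalyseResource {

        // path to the analysis engine XML descriptor - an assumption for this sketch
        private static final String DESCRIPTOR = "descriptors/pipeline.xml";

        @POST
        @Consumes(MediaType.TEXT_PLAIN)
        @Produces(MediaType.APPLICATION_JSON)
        public String analyse(String text) throws Exception {
            // parse the descriptor and instantiate the UIMA analysis engine
            // (a real app would do this once, not on every request)
            ResourceSpecifier spec = UIMAFramework.getXMLParser()
                    .parseResourceSpecifier(new XMLInputSource(DESCRIPTOR));
            AnalysisEngine engine = UIMAFramework.produceAnalysisEngine(spec);

            // run the submitted text through the pipeline
            JCas jcas = engine.newJCas();
            jcas.setDocumentText(text);
            engine.process(jcas);

            // turn the annotations that were found into a very simple JSON reply
            // (note: this index also includes UIMA's built-in DocumentAnnotation)
            StringBuilder json = new StringBuilder("{ \"annotations\" : [ ");
            FSIterator<Annotation> iter = jcas.getAnnotationIndex().iterator();
            boolean first = true;
            while (iter.hasNext()) {
                Annotation annotation = iter.next();
                if (!first) {
                    json.append(", ");
                }
                json.append("{ \"type\" : \"")
                    .append(annotation.getType().getShortName())
                    .append("\", \"text\" : \"")
                    .append(annotation.getCoveredText())
                    .append("\" }");
                first = false;
            }
            json.append(" ] }");

            engine.destroy();
            return json.toString();
        }
    }

From there, deploying to BlueMix is mostly a matter of registering the resource class with Wink, packaging everything as a web application, and pushing it with the Cloud Foundry command-line tool.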

When I get a chance, I’ll write a follow-up post on making something more useful.

You can try out the sample for yourself, as it’s deployed to BlueMix at bluemix.net.

The source is on GitHub at github.com/dalelane/bluemixuima.

In the rest of this post, I’ll walk through some of the implementation details.

Why am I still at IBM?

Tuesday, August 6th, 2013

Ten years ago.

6 August 2003.

I was a recent University graduate, arriving at IBM’s R&D site in Hursley for the first time. I remember walking into Reception.


Reception – the view that greeted me when I arrived

Ten years.

It was a Wednesday.

I’m still at the same company. I’m still at the same site. I still do the same drive to work, more or less.

For a *decade*.

How did that happen?

It was never The Plan. The Plan (as cynical as it sounds in hindsight) was that I’d stay for two or three years. I figured that would be long enough to get some experience, and then I’d leave to work at a small, nimble start-up, which was where all the “cool” work was.

The Plan never happened. A few years passed, and then another few… I kept saying that I’d leave “later”, and before I knew it a ten-year milestone had kind of snuck up on me.

I think I’m more surprised than anyone. I’ve never been at any place this long. I was at Uni for five years. The longest I was at any school was four years.

It’s a serious commitment, and one I never realised that I had made. I’ve not even been married for as long as I’ve been with IBM.

So why? Why am I still here?

I live here.

W4A : Accessibility of the web

Thursday, June 6th, 2013

This is the last of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

Several presentations looked at how accessible the web is.

Web Accessibility Snapshot

In 2006, Nomensa performed an audit for the United Nations, reviewing 100 popular websites for conformance to accessibility guidelines.

The results weren’t positive: 97% of sites didn’t meet WCAG level 1.

Obviously, conformance to guidelines doesn’t mean a site is accessible, but it’s an important factor: necessary, though not sufficient. Conformance can’t prove that a website is accessible, but there are some guidelines which, if not followed, would certainly break accessibility. So they are at least a useful starting point.

However, 2006 is a long time ago now, and the Internet has changed a lot since then. One project, from colleagues of mine at IBM, is creating a more up-to-date picture of the state of the web. They analysed a thousand of the most popular websites (according to Alexa) as well as a random sample of a thousand other sites.

(Interestingly, they found no statistically significant difference in conformance between the most popular websites and the randomly selected ones.)

Their intention is to perform this regularly, creating a Web Accessibility Snapshot, with regular updates on the status of accessibility of the web. It looks like it could become a valuable source of information.

W4A2013 – Web Accessibility Snapshot: An Effort to Reveal Coding Guidelines Conformance, by Vagner Santana

Dyslexia at W4A

Wednesday, June 5th, 2013

This is the third of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

There were a few sessions presenting work done to improve understanding of how to better support people with dyslexia.

One interesting study investigated the effect of font size and line spacing on the readability of Wikipedia articles.

This was assessed in a variety of ways: some based on the reader’s opinions, others on measurements taken of the reader while reading and on their understanding of the content afterwards. The underlying question (can we make Wikipedia easier to read for dyslexics?) was compelling. It was also interesting to see this done not with abstract passages of text, but in the context of using an actual website.

W4A : Future of screen readers

Tuesday, June 4th, 2013

This is the second of four posts sharing some of the things I saw while at the International World Wide Web Conference for w4a.

Several of the projects that I saw showed glimpses of a possible future for screen readers.

I’ve written about screen readers before, and some of the challenges with using them.

Interactive SIGHT

One project interpreted pictures of charts or graphs and created a textual summary of the information shown in them.

I’m still amazed at this. It takes a picture of a graph, not the original raw data, and generates sensible summaries of what it shows.

For example, given this image:

It can generate:

This graphic is about United States. The graphic shows that United States at 35 thousand dollars is the third highest with respect to the dollar value of gross domestic product per capita 2001 among the countries listed. Luxembourg at 44.2 thousand dollars is the highest

Web technologies I saw at W4A

Monday, June 3rd, 2013

WWW2013

Last month I went to the International World Wide Web Conference for w4a. I saw a lot of cool web technologies and accessibility projects while I was there, so I thought I would share links to some of the more interesting bits.

There are too many to put in a single post, so I’ll write a few posts to cover them all.

Subtitles

Subtitles and transcripts came up a few times. One study presented looked at online video, comparing single-line subtitle captions overlaid on the video with multi-line off-screen transcripts adjacent to it.

It examined which is more effective from a variety of perspectives, including readability, reader enjoyment, the effect on understanding and so on. In summary, it found that overlaid captions are generally better, although transcripts are better for content which is more technical.

Real-time transcription from a stenographer at W4A

We had subtitles for all the talks and presentations. Impressively, a separate screen projected a live transcription of each talk. This let deaf attendees follow what the speaker was saying, and for talks given in Portuguese, the English subtitles allowed non-Portuguese speakers like me to understand.

They did this by having live stenographers listen to an audio feed from the talks. This is apparently expensive, as stenography is a highly skilled job and needs to be scheduled in advance, so it’s perhaps only practical for larger conferences.

Everybody Technology

Friday, November 30th, 2012

This afternoon I went to Everybody Technology, an event to discuss the need for technology to be inclusive and made in a way that is “so smart, so simple and so powerful it works for everybody”.

A highlight of the afternoon was Stephen Hawking – perhaps one of the best examples of the power of technology to enable someone to reach their potential. He also supported the event by lending his voice to a promotional video which explains the idea better than I can.


“Who is Technology Made For?” (YouTube)

There were several speakers. I won’t do them justice, but I did jot a few notes…

Conversational Internet : A prototype

Wednesday, September 12th, 2012

tl;dr

We’ve built a prototype to show how we could interact with the Internet using a command-driven approach:

  • A screen reader, but one that uses machine learning and natural language processing to better understand both what the user wants to do and what the web page says.
  • One that can offer a conversational interface instead of just reading out everything on the page.

It’s a proof-of-concept, but it’s an exciting idea with a lot of potential and we’ve got a demo that shows it in action.

I wrote yesterday about what it was like going to the BBC to talk about a project I’ve been working on this summer, but I didn’t talk about the project itself. Here’s an overview.
