Posts Tagged ‘java’

Avoiding monitor contention in Java’s Double parseDouble

Saturday, November 30th, 2013

Overview

You can call Double.parseDouble in Java to convert String representations of numbers like “1.234567” into a double.
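As a minimal sketch of that call: Double.parseDouble is the standard JDK method, while the class name and the parseOrDefault helper below are made up for illustration and are not from the original code.

```java
class ParseDoubleExample {
    // Hypothetical helper: returns the parsed value, or the fallback
    // when the text is null or not a valid number.
    static double parseOrDefault(String text, double fallback) {
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException | NullPointerException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // The plain call: String in, primitive double out.
        double score = Double.parseDouble("1.234567");
        System.out.println(score);

        // Bad input falls back instead of throwing.
        System.out.println(parseOrDefault("not a number", -1.0));
    }
}
```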

I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.

In this post, I’ll explain why and what I did about it.

Background (skip this if you don’t care why I had this problem!)

I’ve mentioned UIMA before: an Apache framework for doing text analytics, that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).

For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, and parse and process the contents. The contents include scores from the various analytics operations that are done on the contents of the CAS:

<myElement
    myRawScore="1.2345678"
    myThisScore="2.46801357"
    myThatScore="1.35792468"
    ...

Thousands of XML files, each containing several thousand numbers in String form.

As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.

I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.

This took *ages*.
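The shape of that workload can be sketched roughly as follows. This is not the UIMA code. It assumes each "file" is just a list of numeric strings, and the class and method names are invented for illustration. It shows one task per file on a fixed-size thread pool, with every task calling Double.parseDouble repeatedly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelParseSketch {
    // Stand-in for processing one file: parse every value and sum them.
    static double parseAll(List<String> values) {
        double sum = 0;
        for (String value : values) {
            sum += Double.parseDouble(value);
        }
        return sum;
    }

    // Submits one task per file to a fixed-size pool and waits for all results.
    static double parseFilesInParallel(List<List<String>> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> results = new ArrayList<>();
            for (List<String> file : files) {
                Callable<Double> task = () -> parseAll(file);
                results.add(pool.submit(task));
            }
            double total = 0;
            for (Future<Double> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("1.2345678", "2.46801357"),
                Arrays.asList("1.35792468"));
        // One thread per core, as in the setup described above.
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(parseFilesInParallel(files, threads));
    }
}
```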


Using UIMA-AS to run UIMA annotators in parallel

Friday, August 24th, 2012

Overview

UIMA stands for Unstructured Information Management Architecture. It’s an Apache technology that provides a framework and standard for building text analytics applications. I’ve mentioned it before.

In this post, I want to talk about an area of UIMA which isn’t covered well in the documentation.

I couldn’t find practical getting-started instructions for running UIMA-AS annotators in parallel. In this post I want to discuss why you might want to do it, and share some simple sample code to show how.

Background – the UIMA pipeline

UIMA provides a framework for managing a text analytics application. You break up the analytics functionality into discrete pieces called annotators. UIMA takes care of moving a text document through an analytics engine: a pipeline containing a series of annotators.

A document goes in one end of the pipeline, passes through a number of annotators, each of which adds some metadata to the document. What comes out the other side of the pipeline is an annotated copy of the document.

By default, you get UIMA to run these annotators one at a time – one after another.

Background – annotators in parallel

What if your annotators are quite slow – perhaps they take several seconds to run?

If some or all of your annotators don’t depend on each other’s output, then maybe running them one at a time isn’t the most efficient approach.

You can run all of them at the same time, in parallel. UIMA will merge the output from all of the annotators into a single annotated document.
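To illustrate the difference between the two approaches without any UIMA dependencies, here is a plain-Java sketch. It does not use the UIMA or UIMA-AS APIs. The "annotators" are hypothetical functions that take the document text and return a list of annotation strings, and all names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class ParallelAnnotatorsSketch {
    // Hypothetical annotator: marks words that start with a capital letter.
    static List<String> findCapitalised(String document) {
        List<String> found = new ArrayList<>();
        for (String word : document.split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                found.add("capitalised:" + word);
            }
        }
        return found;
    }

    // Hypothetical annotator: marks words made up only of digits.
    static List<String> findNumbers(String document) {
        List<String> found = new ArrayList<>();
        for (String word : document.split("\\s+")) {
            if (word.matches("\\d+")) {
                found.add("number:" + word);
            }
        }
        return found;
    }

    static List<Function<String, List<String>>> sampleAnnotators() {
        List<Function<String, List<String>>> annotators = new ArrayList<>();
        annotators.add(ParallelAnnotatorsSketch::findCapitalised);
        annotators.add(ParallelAnnotatorsSketch::findNumbers);
        return annotators;
    }

    // One at a time: each annotator finishes before the next one starts.
    static List<String> runSequentially(String document,
            List<Function<String, List<String>>> annotators) {
        List<String> merged = new ArrayList<>();
        for (Function<String, List<String>> annotator : annotators) {
            merged.addAll(annotator.apply(document));
        }
        return merged;
    }

    // In parallel: start every annotator, then merge results in annotator order.
    static List<String> runInParallel(String document,
            List<Function<String, List<String>>> annotators) {
        List<CompletableFuture<List<String>>> running = new ArrayList<>();
        for (Function<String, List<String>> annotator : annotators) {
            running.add(CompletableFuture.supplyAsync(() -> annotator.apply(document)));
        }
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> future : running) {
            merged.addAll(future.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(runInParallel("Call Bob on 555", sampleAnnotators()));
    }
}
```

This only works because neither annotator reads the other's output. With slow annotators, the parallel version takes roughly as long as the slowest one rather than the sum of all of them.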

My sample code

I’ve written two sample UIMA apps. Each demonstrates one of these approaches, to compare and contrast.

They are divided into three Eclipse projects. You can import them into an Eclipse IDE.

The UIMA Eclipse plugins are very helpful if you want to make changes to the XML configuration files, but they’re not essential. If you want them, there are instructions on how to install them at uima.apache.org.

I’ve added comments to the sample code to explain how the apps work, but I’ll give an overview here.


Using JMX to monitor UIMA running in a servlet

Wednesday, August 1st, 2012

Overview

A quick how-to for running UIMA in a servlet and monitoring your AE (analysis engine) performance using JMX.

Background

I’ve mentioned JMX before. Basically, a Java app can expose information and methods through a standard interface. Tools like jconsole, which come with Java, can then be used to monitor and administer the Java app.

UIMA (Unstructured Information Management Architecture) is an Apache project, providing a standards-based way to perform analytics on unstructured text. It hosts a pipeline of annotators: individual components each performing a specific text analytics task. As a document moves down the pipeline UIMA runs each of the annotators on the document. Each annotator adds its own annotations for the things it looks for in the text.

UIMA and JMX

UIMA supports JMX. UIMA registers an MBean for each annotator, letting you see the performance info for each annotator. In a pipeline of several annotators, it lets you see (amongst other things) how much time your document is spending in each annotator.
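If you haven’t seen an MBean before, the sketch below shows roughly what registering one looks like using only the standard javax.management API. This is not UIMA’s own MBean. The AnnotatorStats class, its attributes, and the "example.sketch" object name domain are all invented here to illustrate the idea of one MBean per annotator exposing timing information.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSketch {
    // Standard MBean convention: interface name = implementing class name + "MBean".
    public interface AnnotatorStatsMBean {
        long getDocumentsProcessed();
        long getTotalTimeMillis();
    }

    // Hypothetical per-annotator statistics, exposed as read-only attributes.
    public static class AnnotatorStats implements AnnotatorStatsMBean {
        private final AtomicLong documents = new AtomicLong();
        private final AtomicLong totalMillis = new AtomicLong();

        public void record(long millis) {
            documents.incrementAndGet();
            totalMillis.addAndGet(millis);
        }

        @Override
        public long getDocumentsProcessed() {
            return documents.get();
        }

        @Override
        public long getTotalTimeMillis() {
            return totalMillis.get();
        }
    }

    // Registers the stats object with the platform MBean server,
    // replacing any earlier registration under the same name.
    static ObjectName register(String annotatorName, AnnotatorStats stats) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(
                    "example.sketch:type=AnnotatorStats,name=" + annotatorName);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(stats, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads an attribute back through the MBean server, as a JMX tool would.
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once something like this is registered, a tool such as jconsole lists it under the MBeans tab and shows the attribute values.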

jconsole

In a stand-alone UIMA application, you basically get this for free. Start the application with the standard Java -D property for enabling JMX:

-Dcom.sun.management.jmxremote

The application is then ready for jconsole to connect to it.


Generating a list of REST APIs in JAX-RS

Saturday, January 14th, 2012

Overview

Using Java Reflection to generate a list of REST endpoints defined in JAX-RS code

Background – JAX-RS

I’ve been working on a project that uses JAX-RS – the Java API for RESTful web services. If you don’t know JAX-RS, you write web services in Java using annotations to specify what REST endpoint a Java method implements.

For example, you can use @Path annotations on a class to define the root URI for methods in the class, and then use annotations like @GET, @Produces(MediaType.APPLICATION_JSON) and @Path on the individual class methods to define the endpoints that they implement.

The problem?

Reading from code to the web service is straightforward enough. By which I mean, if I’m looking at a Java method, it’s easy enough to look at it and know what endpoint it is implementing.

Going the other way can be a little trickier.

Once a project gets bigger, you can have REST endpoints spread around a large number of classes. And methods can inherit attributes from other classes than the one they’re in, through annotations like @Parent.

What if I’m using one of the project’s REST APIs, and want to look at the source for the method that’s handling it, whether to extend it or fix a bug? How can I remember which method in which class is responsible for the REST endpoint I’m using?

Using Reflection

Documentation is one way. As I develop the code, maintain a list of the mapping of Java methods to web services endpoints. And keep that up-to-date as I make any changes to the code.

But that’s very manual, and doesn’t seem very smart.

This got me thinking yesterday evening. I’d not used Java Reflection before, but thought it must be possible to work it out from the Java annotations in the same way that my JAX-RS provider must.

So I spent a bit of time trying it out and thought it might be useful to share what I came up with. It’s not terribly elegant or efficient. It’s the result of a few hours tinkering. But it shows the basic idea, and that seems useful enough to warrant sharing.
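The basic idea can be sketched in a few lines. This is not the code from the post. To keep it runnable without a JAX-RS jar, it declares local stand-ins for the @Path and @GET annotations (the real ones live in javax.ws.rs), and BookResource is a made-up resource class. A real version would need to handle the other HTTP method annotations and any inherited paths as well.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EndpointListerSketch {
    // Local stand-in for javax.ws.rs.Path (assumption: same shape, one String value).
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Path {
        String value();
    }

    // Local stand-in for javax.ws.rs.GET.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface GET {
    }

    // Hypothetical resource class, annotated the way a JAX-RS resource would be.
    @Path("/books")
    static class BookResource {
        @GET
        public String listBooks() {
            return "[]";
        }

        @GET
        @Path("/{id}")
        public String getBook() {
            return "{}";
        }

        // Not an endpoint: no HTTP method annotation.
        public void helper() {
        }
    }

    // Combines the class-level path with each method-level path for @GET methods.
    static List<String> listEndpoints(Class<?> resource) {
        List<String> endpoints = new ArrayList<>();
        Path root = resource.getAnnotation(Path.class);
        String rootPath = (root == null) ? "" : root.value();
        for (Method method : resource.getDeclaredMethods()) {
            if (!method.isAnnotationPresent(GET.class)) {
                continue;
            }
            Path sub = method.getAnnotation(Path.class);
            String path = rootPath + (sub == null ? "" : sub.value());
            endpoints.add("GET " + path + " -> "
                    + resource.getSimpleName() + "." + method.getName());
        }
        // getDeclaredMethods has no guaranteed order, so sort for stable output.
        Collections.sort(endpoints);
        return endpoints;
    }

    public static void main(String[] args) {
        for (String endpoint : listEndpoints(BookResource.class)) {
            System.out.println(endpoint);
        }
    }
}
```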


Using MQTT in Android mobile applications

Tuesday, February 1st, 2011

Overview

How to receive push notifications using MQTT in an Android mobile application

Background

I’ve written before about MQTT as a technology for doing push notifications to mobile. When I wrote that, I gave an example Android project. However, it was the first time I’d ever done Android development, and while it was an okay Java MQTT sample, it was a poor Android sample – I didn’t know anything about how Android works as a platform.

I’ve since written other Android MQTT apps, such as a hackday app for pushing updates from websites to your phone, and learnt a lot about how to do it properly. Well… if not properly, at least a little better.

But Google is still directing people to my old, and probably unhelpful, sample. So it’s about time that I share something more useful.

I’ve put the full source for a sample implementation below. (Note that I’m using the Java J2SE client library from ibm.com). Hopefully the comments in it are clear enough, but here are a few of the key points.


CurrentCost hacking – starting to identify appliance power usage

Tuesday, June 3rd, 2008

I needed a break from work tonight, so went back to playing with the CurrentCost meter – a chance to try a few new things.

The objective
I want to make a start on identifying how much electricity different things in my house use. To begin, I’m going to start with a very manual user-driven approach:

Subscribe to updates from the CurrentCost meter, and when a significant change in usage occurs, ask me what I’ve just switched on or off, and collect that information to build up a record of how much electricity different devices use.
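The "significant change" check at the heart of that can be sketched on its own, leaving out the MQTT subscription and the prompt. The class name, the use of whole watts, and the threshold value are all assumptions for illustration. The post doesn’t say what counts as significant.

```java
class UsageChangeDetector {
    private final int thresholdWatts;
    private Integer lastWatts; // null until the first reading arrives

    public UsageChangeDetector(int thresholdWatts) {
        this.thresholdWatts = thresholdWatts;
    }

    // Compares each reading with the previous one.
    // Returns the change in watts when it is at least the threshold
    // (positive = something switched on, negative = switched off), otherwise 0.
    public int onReading(int watts) {
        if (lastWatts == null) {
            lastWatts = watts;
            return 0;
        }
        int delta = watts - lastWatts;
        lastWatts = watts;
        return Math.abs(delta) >= thresholdWatts ? delta : 0;
    }

    public static void main(String[] args) {
        UsageChangeDetector detector = new UsageChangeDetector(100);
        int[] readings = {300, 320, 2320, 300};
        for (int reading : readings) {
            int change = detector.onReading(reading);
            if (change != 0) {
                // This is where the app would ask what was just switched on or off.
                System.out.println("Usage changed by " + change + "W. What did you switch?");
            }
        }
    }
}
```

Comparing consecutive readings keeps the sketch simple, but it will miss appliances that ramp up slowly over several readings.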

How?
It’s already quite late, so I just wanted to hack a quick first version together. I decided to write it as a small Java app.

As I’ve mentioned before, I’m publishing the CurrentCost readings to a small broker running on my home server. The plan was to write a Java application that uses MQTT to subscribe to updates from the broker.

Why? Because I’ve not used Java on the Slug before, or with MQTT. (Is that not a good enough reason? 🙂 )

I’ve written it as a command-line app, because it’s a quick way to run it from different devices around the house. (That is, by cheating 🙂 I’m actually running the app on the home server, using PuTTY / PocketPuTTY / SSH etc. to run it from my ThinkPad, PDAs, mobile, EEE PC, etc.).
