Avoiding monitor contention in Java’s Double.parseDouble

Overview

You can call Double.parseDouble in Java to convert String representations of numbers like “1.234567” into a double.

I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.

In this post, I’ll explain why and what I did about it.

Background (skip this if you don’t care why I had this problem!)

I’ve mentioned UIMA before: it’s an Apache framework for doing text analytics that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).

For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, and parse and process the contents. The contents include scores from the various analytics operations that are done on the contents of the CAS:

<myElement
    myRawScore="1.2345678"
    myThisScore="2.46801357"
    myThatScore="1.35792468"
    ...

Thousands of XML files, each containing several thousand numbers in String form.

As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.

I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.

This took *ages*.

The problem

I used Health Center to have a look at what was going on. (Disclaimer: It’s an IBM tool, but it’s free and is awesome, so I don’t feel bad promoting it!)

It showed that although I had 64 worker threads, only one of them was running at any given moment while the other 63 were blocked.

It showed that I was spending the majority of my time inside Double.parseDouble, because one of the methods it calls (FloatingDecimal.big5pow) is synchronized.

The implementation of Double.parseDouble is fast when used once, but it’s not well suited to being called hundreds of millions of times from multiple threads concurrently – the locking makes this very, very slow.

The fix

It turns out that this is a known issue. There is a bug filed against Java (JI-9004591 – Monitor contention when calling Double.parseDouble from multiple threads), and it’s been closed as fixed in Java 8.

That would be helpful if Java 8 had actually been released yet. I couldn’t wait for it.

The workaround

I’ve written my own Double parser.

It’s a kludgy workaround to be honest, based on the fact that I noticed that Integer.parseInt and Long.parseLong aren’t synchronized. I decided to try and use them instead.

Consider the following decimal number:

1234.5678

Another way of expressing this is

12345678 x 10^-4

I can parse this using Integer.parseInt.

Integer.parseInt("12345678") * Math.pow(10.0, -4)

So to parse any decimal number, I need to:

  1. Do a bit of String munging
    Look for where the decimal point is, move it to the end of the String and count how many places I’ve moved it by.
  2. Parse the munged String as an int (or a long if it’s too big to fit in an int)
  3. Compensate by multiplying by an exponent that reverses the change
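As a rough illustration of those three steps (this is not the actual workaround source), the idea could be sketched like this. The class name SimpleDecimalParser is made up for the example, and it assumes plain decimal input with no exponent, no whitespace and few enough digits to fit in a long.

```java
/** Hypothetical sketch of the three steps above; not the real workaround code. */
final class SimpleDecimalParser {

    /**
     * Parses plain decimal strings such as "1234.5678" or "-0.25".
     * Assumption: no exponent, no whitespace, and at most 18 digits in total,
     * so that the digits always fit in a long.
     */
    static double parse(String s) {
        int point = s.indexOf('.');
        if (point < 0) {
            // No decimal point, so the string is already a whole number.
            return Long.parseLong(s);
        }
        // Step 1: remove the decimal point and count the digits that followed it.
        String digits = s.substring(0, point) + s.substring(point + 1);
        int places = s.length() - point - 1;
        // Step 2: parse the munged string as a whole number.
        long mantissa = Long.parseLong(digits);
        // Step 3: compensate by multiplying by the matching negative power of ten.
        return mantissa * Math.pow(10.0, -places);
    }
}
```

Calling SimpleDecimalParser.parse("1234.5678") parses "12345678" as a long and multiplies it by 10^-4.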

I did a little more. For example, to handle numbers that already have an exponent, I add/subtract the number of places I moved the decimal point to the existing exponent. But this is the basic idea.
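The exponent handling might look something like the following sketch. Again, the names are invented for illustration and this is an assumption about how such a parser could be written, not a copy of the workaround itself. It assumes the exponent marker is "e" or "E" and that the digits fit in a long.

```java
/** Hypothetical sketch: folds the decimal point shift into an existing exponent. */
final class ExponentAwareParser {

    static double parse(String s) {
        // Split off an existing exponent, if any ("1.5e3" or "1.5E3").
        int marker = Math.max(s.indexOf('e'), s.indexOf('E'));
        int exponent = 0;
        String number = s;
        if (marker >= 0) {
            exponent = Integer.parseInt(s.substring(marker + 1));
            number = s.substring(0, marker);
        }
        int point = number.indexOf('.');
        if (point >= 0) {
            // Moving the point N places to the right is undone by
            // subtracting N from the exponent.
            exponent -= number.length() - point - 1;
            number = number.substring(0, point) + number.substring(point + 1);
        }
        // Assumption: the remaining digits fit in a long.
        long mantissa = Long.parseLong(number);
        return mantissa * Math.pow(10.0, exponent);
    }
}
```

For "1.5e3" the existing exponent is 3 and the point moves one place, so the value is computed as 15 x 10^2.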

The source code for my workaround is here.

Performance of the workaround

This is massively faster. I wrote a small test app which kicks off a bunch of worker threads and parses millions of Strings into doubles. It measures the time it takes to complete the work.
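A test along these lines could be sketched as below. This is not the timing app linked further down, just a minimal illustration with made-up names (ParseTimer, sampleInputs), and the input strings are arbitrary. It avoids lambdas so that it compiles on a pre-Java 8 JDK, which is where the contention matters; on a JVM where the bug has been fixed, the slowdown shouldn’t show up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical timing sketch: many threads parsing the same inputs. */
final class ParseTimer {

    /** Each of the worker threads parses every input once; returns elapsed milliseconds. */
    static long timeParseDouble(int threads, final String[] inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
        for (int t = 0; t < threads; t++) {
            tasks.add(new Callable<Double>() {
                @Override
                public Double call() {
                    double sum = 0.0;
                    for (String input : inputs) {
                        // Swap in a different parser here to compare timings.
                        sum += Double.parseDouble(input);
                    }
                    return sum; // returned so the loop cannot be optimised away
                }
            });
        }
        long start = System.nanoTime();
        try {
            for (Future<Double> result : pool.invokeAll(tasks)) {
                result.get(); // surfaces any exception thrown by a worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    /** Builds arbitrary decimal strings such as "1.1000000", "1.1000001", ... */
    static String[] sampleInputs(int count) {
        String[] inputs = new String[count];
        for (int i = 0; i < count; i++) {
            inputs[i] = "1." + (1000000 + i);
        }
        return inputs;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long millis = timeParseDouble(threads, sampleInputs(1000000));
        System.out.println(threads + " threads: " + millis + " ms");
    }
}
```

The inputs are built before the clock starts so that only the parsing is timed.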

           my workaround    Java’s Double.parseDouble
run 1      211              32378
run 2      219              34861
run 3      229              28398
run 4      221              26312
run 5      216              26979
run 6      216              33115
run 7      214              34543
run 8      212              31081
run 8      215              37758
run 9      215              26236
run 10     217              26901
run 11     227              29884
run 12     217              34944
run 13     216              32981

average    218              31,169                       seconds
           3.63             519.49                       minutes
           0.06             8.66                         hours

Eight hours of work can be done in a few minutes. This is the sort of difference it makes, and the reason why I can’t wait for Java 8 to fix this properly.

The source code for my timing test app is here.

It’s not a fix

Finally, I should highlight that this is absolutely a workaround. My approach involves multiplying doubles. Any time you do arithmetic with doubles, you can lose precision, because each result is rounded to the nearest value a double can represent.

For example, if I do:
    1.2345 * 10
I get:
    12.344999999999999

There will be times when my double parser returns a number that isn’t exactly what Java’s Double.parseDouble would’ve returned. But the differences are going to be very small – they’ll round to the same value. I have more arithmetic to do with the value anyway, and for the range of numbers I know I’ll be getting this will not result in a significant error.
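One way to check how small the differences are is to measure them in ulps (units in the last place) against Double.parseDouble. The sketch below is only an illustration with invented names (PrecisionCheck, viaMultiply, ulpsApart), and it takes the already-munged digits and place count as arguments rather than doing the String handling.

```java
/** Hypothetical check: how far is the multiply-based result from Double.parseDouble? */
final class PrecisionCheck {

    /** The basic idea from the workaround: digits x 10^-places. */
    static double viaMultiply(long digits, int places) {
        return digits * Math.pow(10.0, -places);
    }

    /** Distance between two doubles, measured in ulps of the reference value. */
    static double ulpsApart(double value, double reference) {
        return Math.abs(value - reference) / Math.ulp(reference);
    }

    public static void main(String[] args) {
        double reference = Double.parseDouble("1234.5678");
        double approx = viaMultiply(12345678L, 4);
        System.out.println(ulpsApart(approx, reference) + " ulps apart");
    }
}
```

A result of 0 means the two parsers agree exactly for that input. Small non-zero values are the kind of difference described above.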

Speed is more important for my particular use case, and the loss of precision from my parse implementation is acceptable for my purposes.

This won’t be true in all situations, so if you need precision without a synchronized method then you’re looking at having to implement your own double parser from first principles. Algorithms for doing this are well documented, so I wouldn’t be too put off doing that if that’s what you need.
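As one example of those documented techniques, there is a well-known exact fast path. If the digits fit in 53 bits and the power of ten is no bigger than 10^22, both values are exactly representable as doubles, so a single multiplication or division gives the correctly rounded answer. The sketch below is a different technique from my workaround, which multiplies by an inexact negative power of ten. The names are made up, and a real parser would still need a slower fallback for inputs outside this range.

```java
/** Hypothetical sketch of the exact fast path for scaling digits by a power of ten. */
final class ExactFastPath {

    // 10^0 .. 10^22 are all exactly representable as doubles.
    private static final double[] POW10 = new double[23];
    static {
        POW10[0] = 1.0;
        for (int i = 1; i < POW10.length; i++) {
            POW10[i] = POW10[i - 1] * 10.0; // exact at every step up to 10^22
        }
    }

    // Whole numbers up to 2^53 in magnitude convert to double without rounding.
    private static final long MAX_EXACT = 1L << 53;

    /**
     * Returns digits x 10^exponent, correctly rounded.
     * Throws if the input is outside the range where one operation is enough.
     */
    static double scale(long digits, int exponent) {
        if (digits > MAX_EXACT || digits < -MAX_EXACT
                || exponent > 22 || exponent < -22) {
            throw new IllegalArgumentException("outside the exact fast path");
        }
        // Dividing by 10^n (exact) rounds once; multiplying by 10^-n
        // (inexact) would round twice.
        return exponent < 0 ? digits / POW10[-exponent] : digits * POW10[exponent];
    }
}
```

With this, ExactFastPath.scale(12345678L, -4) divides by 10000 rather than multiplying by 10^-4, so the only rounding happens in the final division.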
