Avoiding monitor contention in Java’s Double.parseDouble

Overview

You can call Double.parseDouble in Java to convert String representations of numbers like “1.234567” into a number representation.
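For reference, a minimal sketch of the call in question. The class and wrapper method names are made up for this example; only Double.parseDouble itself is the real API.

```java
class ParseDoubleExample {
    // Hypothetical wrapper, only here so the call is easy to exercise.
    static double parse(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        double score = parse("1.234567");
        // Once parsed, the value can be used in ordinary arithmetic.
        System.out.println(score * 2);
    }
}
```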

I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.

In this post, I’ll explain why and what I did about it.

Background (skip this if you don’t care why I had this problem!)

I’ve mentioned UIMA before: it’s an Apache framework for doing text analytics that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).

For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, and parse and process the contents. The contents include scores from the various analytics operations that are done on the contents of the CAS:

<myElement myRawScore="1.2345678" myThisScore="2.46801357" myThatScore="1.35792468" ...

Thousands of XML files, each containing several thousand numbers in String form.

As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.

I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.
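Roughly, that setup looks like the sketch below. This is not the real job: processCasFile is a hypothetical stand-in that only parses a list of score strings, where the real work went through UIMA’s deserializer. It is written without lambdas, since this was before Java 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CasWorkerPool {
    // Hypothetical stand-in for deserializing one CAS file:
    // parses every score and returns how many were handled.
    static int processCasFile(List<String> scores) {
        int parsed = 0;
        for (String score : scores) {
            Double.parseDouble(score);
            parsed++;
        }
        return parsed;
    }

    // Submits one task per file to a fixed pool of worker threads.
    static int processAll(List<List<String>> files, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> results = new ArrayList<Future<Integer>>();
            for (final List<String> file : files) {
                results.add(pool.submit(new Callable<Integer>() {
                    @Override
                    public Integer call() {
                        return processCasFile(file);
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that file is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With workers set to 64, each thread picks up one file at a time, and every one of them ends up inside Double.parseDouble.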

This took *ages*.

The problem

I used Health Center to have a look at what was going on. (Disclaimer: It’s an IBM tool, but it’s free and is awesome, so I don’t feel bad promoting it!)

It showed that although I had 64 worker threads, only one of them was running at any given moment while the other 63 were blocked.

It showed that I was spending the majority of my time inside Double.parseDouble, because one of the methods it calls (FloatingDecimal.big5pow) is synchronized.

The implementation of Double.parseDouble is fast when used once, but it’s not well suited to being called hundreds of millions of times from multiple threads concurrently – the locking makes this very, very slow.
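To illustrate the pattern (this is not the JDK source, just a sketch with made-up names), a static synchronized helper that caches powers of five looks something like this. Because the lock belongs to the class, every thread that calls it has to queue up, even when the value is already cached.

```java
import java.math.BigInteger;

class SharedPowerCache {
    private static final BigInteger[] CACHE = new BigInteger[64];

    // One class-level lock guards the cache, so only one thread at a time
    // can be inside this method. Assumes p >= 0.
    static synchronized BigInteger pow5(int p) {
        if (p >= CACHE.length) {
            return BigInteger.valueOf(5).pow(p); // too large to cache
        }
        if (CACHE[p] == null) {
            CACHE[p] = BigInteger.valueOf(5).pow(p);
        }
        return CACHE[p];
    }
}
```

Called once, the lock costs next to nothing. Called hundreds of millions of times from 64 threads, the threads spend most of their time waiting for each other.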

The fix

It turns out that this is a known issue. There is a bug filed against Java (JI-9004591 – Monitor contention when calling Double.parseDouble from multiple threads), and it’s been closed as fixed in Java 8.

That would be helpful if Java 8 had actually been released yet. I couldn’t wait for that.

The workaround

I’ve written my own Double parser.

It’s a kludgy workaround to be honest, based on the fact that I noticed that Integer.parseInt and Long.parseLong aren’t synchronized. I decided to try and use them instead.

Consider the following decimal number:

1234.5678

Another way of expressing this is

12345678 x 10^-4

I can parse this using Integer.parseInt.

Integer.parseInt("12345678") * Math.pow(10.0, -4)

So to parse any decimal number, I need to:

Do a bit of String munging
Look for where the decimal point is, move it to the end of the String and count how many places I’ve moved it by.
Parse the munged String as an int (or a long if it’s too big to fit in an int)
Compensate by multiplying by an exponent that reverses the change
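
A minimal sketch of those steps, not the actual workaround source. The class and method names are invented for the example, and it assumes well-formed input with no exponent and few enough digits to fit in a long.

```java
class SimpleDecimalParser {
    // Parses plain decimals such as "1234.5678" or "-0.25".
    static double parse(String text) {
        int point = text.indexOf('.');
        if (point < 0) {
            return Long.parseLong(text); // no decimal point, nothing to move
        }
        // Moving the point to the end is the same as removing it.
        String digits = text.substring(0, point) + text.substring(point + 1);
        // How many places the point moved to the right.
        int placesMoved = text.length() - point - 1;
        // Compensate with the matching negative power of ten.
        return Long.parseLong(digits) * Math.pow(10.0, -placesMoved);
    }

    public static void main(String[] args) {
        System.out.println(parse("1234.5678"));
    }
}
```

Neither Long.parseLong nor Math.pow takes a shared lock, which is the whole point of doing it this way.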

I did a little more. For example, to handle numbers that already have an exponent, I add/subtract the number of places I moved the decimal point to the existing exponent. But this is the basic idea.
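
The exponent handling can be sketched like this. Again, this is an illustration with invented names rather than the real source, and it assumes well-formed input such as "1.5e3" or "12.5E-2" with a mantissa that fits in a long.

```java
class ExponentAwareParser {
    static double parse(String text) {
        // Split off an existing exponent, if there is one.
        int e = Math.max(text.indexOf('e'), text.indexOf('E'));
        String mantissa = (e < 0) ? text : text.substring(0, e);
        int exponent = (e < 0) ? 0 : Integer.parseInt(text.substring(e + 1));

        int point = mantissa.indexOf('.');
        if (point >= 0) {
            int placesMoved = mantissa.length() - point - 1;
            mantissa = mantissa.substring(0, point) + mantissa.substring(point + 1);
            // Moving the point right by n places lowers the exponent by n.
            exponent -= placesMoved;
        }
        return Long.parseLong(mantissa) * Math.pow(10.0, exponent);
    }
}
```

For "1.5e3" the point moves one place, so the value is computed as 15 x 10^2.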

The source code for my workaround is here.

Performance of the workaround

This is massively faster. I wrote a small test app which kicks off a bunch of worker threads and parses millions of Strings into doubles. It measures the time it takes to complete the work.
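
The shape of such a test is roughly as follows. This is a simplified sketch, not the actual test app: the Parser interface and all names are hypothetical, and it gives only a rough wall-clock number rather than a rigorous benchmark.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParseTimer {
    // Hypothetical interface so different parsers can be timed the same way.
    interface Parser {
        double parse(String text);
    }

    // Starts `threads` workers that each parse `text` `repeats` times and
    // returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(final Parser parser, final String text,
                           int threads, final int repeats) {
        final double[] sums = new double[threads];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    double sum = 0;
                    for (int i = 0; i < repeats; i++) {
                        sum += parser.parse(text);
                    }
                    sums[worker] = sum; // keep the result so the loop has an effect
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Running it once with a Parser that wraps Double.parseDouble and once with a Parser that wraps the workaround gives the two columns to compare.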

	my workaround (seconds)	Java’s Double.parseDouble (seconds)
run 1	211	32378
run 2	219	34861
run 3	229	28398
run 4	221	26312
run 5	216	26979
run 6	216	33115
run 7	214	34543
run 8	212	31081
run 9	215	37758
run 10	215	26236
run 11	217	26901
run 12	227	29884
run 13	217	34944
run 14	216	32981
average (seconds)	218	31169
average (minutes)	3.63	519.49
average (hours)	0.06	8.66

Eight hours of work can be done in a few minutes. This is the sort of difference it makes, and the reason why I can’t wait for Java 8 to fix this properly.

The source code for my timing test app is here.

It’s not a fix

Finally, I should highlight that this is absolutely a workaround. My approach involves multiplying doubles. Any time you do arithmetic with doubles, you lose precision.

For example, if I do:
1.2345 * 10
I get:
12.344999999999999
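
A small sketch that makes the difference visible by comparing the scaled value against parsing the scaled text directly. The class and method names are made up for the example.

```java
class PrecisionDemo {
    // Difference between scaling a double and parsing the scaled text.
    static double scalingError() {
        double scaled = 1.2345 * 10;                  // arithmetic on an already-rounded double
        double parsed = Double.parseDouble("12.345"); // rounded once, from the decimal text
        return scaled - parsed;
    }

    public static void main(String[] args) {
        System.out.println(1.2345 * 10);
        System.out.println(scalingError());
        // Size of one unit in the last place for doubles near 12.345.
        System.out.println(Math.ulp(12.345));
    }
}
```

The error is tiny, on the order of the last representable digit, but it is not zero.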

There will be times when my double parser returns a number that isn’t exactly what Java’s Double.parseDouble would’ve returned. But the differences are going to be very small – they’ll round to the same value. I have more arithmetic to do with the value anyway, and for the range of numbers I know I’ll be getting this will not result in a significant error.

Speed is more important for my particular use case, and the loss of precision from my parse implementation is acceptable for my purposes.

This won’t be true in all situations, so if you need precision without a synchronized method then you’re looking at having to implement your own double parser from first principles. Algorithms for doing this are well documented, so I wouldn’t be too put off doing that if that’s what you need.

Tags: java, uima

This entry was posted on Saturday, November 30th, 2013 at 3:15 pm and is filed under code.

dale lane