Overview
You can call Double.parseDouble in Java to convert String representations of numbers, like “1.234567”, into doubles.
I needed to do this a lot of times, from a lot of threads, and it was horrendously slow.
In this post, I’ll explain why and what I did about it.
Background (skip this if you don’t care why I had this problem!)
I’ve mentioned UIMA before: an Apache framework for doing text analytics that I use at work. One of the ways it stores and moves around units of work is as XML files (called CAS files).
For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize them, then parse and process their contents, which include scores from the various analytics operations that are run on the CAS:
<myElement
myRawScore="1.2345678"
myThisScore="2.46801357"
myThatScore="1.35792468"
...
Thousands of XML files, each containing several thousand numbers in String form.
As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.
I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.
This took *ages*.
The problem
I used Health Center to have a look at what was going on. (Disclaimer: It’s an IBM tool, but it’s free and is awesome, so I don’t feel bad promoting it!)
It showed that although I had 64 worker threads, only one of them was running at any given moment while the other 63 were blocked.
It showed that I was spending the majority of my time inside Double.parseDouble, because one of the methods it calls (FloatingDecimal.big5pow) is synchronized.
The implementation of Double.parseDouble is fast when used once, but it’s not well suited to being called hundreds of millions of times from multiple threads concurrently – the locking makes this very, very slow.
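To make the locking concrete, here is a small hypothetical class that mimics the pattern: a static synchronized helper guarding a shared cache. SharedCacheDemo and pow5 are invented names; this is not the JDK’s FloatingDecimal source. Every call from every thread has to acquire the same class-level monitor, so however many threads you start, only one of them can be inside pow5 at a time.

```java
import java.math.BigInteger;

public final class SharedCacheDemo {

    // Hypothetical lazily filled cache of 5^n, shared by every thread.
    private static final BigInteger[] CACHE = new BigInteger[64];

    // `static synchronized` locks the SharedCacheDemo.class monitor, so all
    // callers in all threads queue for the same lock. Valid for 0 <= n < 64.
    public static synchronized BigInteger pow5(int n) {
        if (CACHE[n] == null) {
            CACHE[n] = BigInteger.valueOf(5).pow(n);
        }
        return CACHE[n];
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < 1_000_000; k++) {
                    pow5(k % 64);  // every call takes and releases the same lock
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}
```

Adding threads to this program adds waiting, not throughput, which is the shape of what Health Center showed.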
The fix
It turns out that this is a known issue. There is a bug filed against Java (JI-9004591 – Monitor contention when calling Double.parseDouble from multiple threads), but it has only been fixed in Java 8.
That would be helpful if Java 8 had actually been released. I couldn’t wait for it.
The workaround
I’ve written my own Double parser.
To be honest, it’s a kludgy workaround. It’s based on noticing that Integer.parseInt and Long.parseLong aren’t synchronized, so I decided to try using them instead.
Consider the following decimal number:
1234.5678
Another way of expressing this is
$12345678 \times 10^{-4}$
I can parse this using Integer.parseInt:
Integer.parseInt("12345678") * Math.pow(10.0, -4)
So to parse any decimal number, I need to:
- Do a bit of String munging: look for where the decimal point is, move it to the end of the String, and count how many places I’ve moved it by.
- Parse the munged String as an int (or a long if it’s too big to fit in an int).
- Compensate by multiplying by a power of ten that reverses the change.
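A minimal sketch of those three steps might look like the following. It is a simplified illustration rather than the workaround source itself: MungedDoubleParser is a hypothetical name, it always uses Long.parseLong instead of choosing between int and long, and it assumes plain decimal input with no exponent and no "NaN"/"Infinity".

```java
public final class MungedDoubleParser {

    // Assumes plain decimal input such as "-1234.5678" with at most 18 digits,
    // so the munged string always fits in a long.
    public static double parse(String s) {
        int dot = s.indexOf('.');
        if (dot < 0) {
            return Long.parseLong(s);  // already an integer: nothing to move
        }
        // Step 1: drop the '.', i.e. move it to the end, counting the places moved.
        int placesMoved = s.length() - dot - 1;
        String digits = s.substring(0, dot) + s.substring(dot + 1);
        // Step 2: parse with Long.parseLong, which is not synchronized.
        long mantissa = Long.parseLong(digits);
        // Step 3: multiply by 10^-placesMoved to undo the move.
        return mantissa * Math.pow(10.0, -placesMoved);
    }

    public static void main(String[] args) {
        System.out.println(parse("1234.5678"));
        System.out.println(Double.parseDouble("1234.5678"));
    }
}
```

The two printed values may differ in the last digit, for the reasons discussed under “It’s not a fix”.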
I did a little more. For example, to handle numbers that already have an exponent, I add/subtract the number of places I moved the decimal point to the existing exponent. But this is the basic idea.
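Folding an existing exponent into the calculation could be sketched like this. Again, this is a hypothetical, simplified class (ExponentAwareParser is an invented name), not the workaround source: the exponent after 'e' or 'E' is parsed with Integer.parseInt, and the number of places the decimal point moved is subtracted from it.

```java
public final class ExponentAwareParser {

    // Handles inputs such as "1.5e3" or "-2.25E-4". Assumes the digits before
    // the exponent fit in a long and the input is not "NaN"/"Infinity".
    public static double parse(String s) {
        int exponent = 0;
        String mantissaPart = s;
        int e = Math.max(s.indexOf('e'), s.indexOf('E'));  // -1 if there is no exponent
        if (e >= 0) {
            exponent = Integer.parseInt(s.substring(e + 1));
            mantissaPart = s.substring(0, e);
        }
        int dot = mantissaPart.indexOf('.');
        if (dot >= 0) {
            int placesMoved = mantissaPart.length() - dot - 1;
            mantissaPart = mantissaPart.substring(0, dot) + mantissaPart.substring(dot + 1);
            // Moving the point right by placesMoved multiplies the digits by
            // 10^placesMoved, so subtract that from the exponent to compensate.
            exponent -= placesMoved;
        }
        return Long.parseLong(mantissaPart) * Math.pow(10.0, exponent);
    }

    public static void main(String[] args) {
        System.out.println(parse("1.5e3"));  // 15 x 10^2, prints 1500.0
    }
}
```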
The source code for my workaround is here.
Performance of the workaround
This is massively faster. I wrote a small test app which kicks off a bunch of worker threads and parses millions of Strings into doubles. It measures the time it takes to complete the work.
| run | my workaround (s) | Java’s Double.parseDouble (s) |
|---|---|---|
| run 1 | 211 | 32378 |
| run 2 | 219 | 34861 |
| run 3 | 229 | 28398 |
| run 4 | 221 | 26312 |
| run 5 | 216 | 26979 |
| run 6 | 216 | 33115 |
| run 7 | 214 | 34543 |
| run 8 | 212 | 31081 |
| run 9 | 215 | 37758 |
| run 10 | 215 | 26236 |
| run 11 | 217 | 26901 |
| run 12 | 227 | 29884 |
| run 13 | 217 | 34944 |
| run 14 | 216 | 32981 |
| average (seconds) | 218 | 31,169 |
| average (minutes) | 3.63 | 519.49 |
| average (hours) | 0.06 | 8.66 |
Work that took over eight and a half hours with Double.parseDouble finished in under four minutes with my workaround. That is the size of the difference, and it’s why I couldn’t wait for Java 8 to fix this properly.
The source code for my timing test app is here.
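This is not that app, just a hypothetical sketch of the same idea: a fixed pool of worker threads, each parsing the same list of Strings, timed end to end. ParseTimer, time and the generated inputs are assumptions. On Java 8 or later, where the contention bug is fixed, you shouldn’t expect to reproduce the gap in the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

public final class ParseTimer {

    // Submits `jobs` tasks to a pool of `threads` workers; each task parses every
    // string in `inputs` with `parser`. Returns elapsed wall-clock milliseconds.
    public static long time(ToDoubleFunction<String> parser, List<String> inputs,
                            int threads, int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<Double>> pending = new ArrayList<>();
            for (int j = 0; j < jobs; j++) {
                pending.add(pool.submit(() -> {
                    double sum = 0;  // use the results so the JIT cannot drop the work
                    for (String s : inputs) {
                        sum += parser.applyAsDouble(s);
                    }
                    return sum;
                }));
            }
            for (Future<Double> f : pending) {
                f.get();  // wait for every task to finish
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            inputs.add(i + "." + (i * 7919 % 100_000));  // e.g. "3.23757"
        }
        int threads = Runtime.getRuntime().availableProcessors();
        long ms = time(Double::parseDouble, inputs, threads, threads * 4);
        System.out.println("Double.parseDouble: " + ms + " ms");
    }
}
```

To compare against a custom parser, call time again with a method reference to its parse method and the same inputs and thread count.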
It’s not a fix
Finally, I should stress that this is absolutely a workaround, not a fix. My approach involves multiplying doubles, and any time you do arithmetic with doubles you can lose precision, because each result is rounded to the nearest representable double.
For example, if I do:
1.2345 * 10
I get:
12.344999999999999
There will be times when my double parser returns a number that isn’t exactly what Java’s Double.parseDouble would’ve returned. But the differences are going to be very small, confined to the last few bits of the double, so they’ll round to the same value. I have more arithmetic to do with the value anyway, and for the range of numbers I know I’ll be getting, this will not result in a significant error.
Speed is more important for my particular use case, and the loss of precision from my parse implementation is acceptable for my purposes.
This won’t be true in all situations. If you need full precision without a synchronized method, you’re looking at implementing your own double parser from first principles. Algorithms for doing this are well documented, so I wouldn’t be too put off doing that if that’s what you need.
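As one example of such an algorithm, the classic exact “fast path” (often attributed to William Clinger) avoids the rounding error introduced by multiplying by an inexact power of ten. When the digits fit below 2^53 and the power of ten is exactly representable, a single division of two exact doubles is correctly rounded, so it matches Double.parseDouble exactly. The sketch below is hypothetical (FastPathParser is an invented name). It keeps the unsynchronized Long.parseLong in the common case and falls back to Double.parseDouble for everything else, so on an affected JVM those fallback inputs would still hit the lock.

```java
public final class FastPathParser {

    // 10^0 .. 10^15; every power of ten up to 10^22 is exactly representable as a double.
    private static final double[] EXACT_POW10 = new double[16];

    static {
        EXACT_POW10[0] = 1.0;
        for (int i = 1; i < EXACT_POW10.length; i++) {
            EXACT_POW10[i] = EXACT_POW10[i - 1] * 10.0;  // exact, since the result is representable
        }
    }

    // Fast path for plain decimals such as "-1234.5678". A string of at most
    // 16 characters containing '.' leaves at most 15 digits, so the mantissa is
    // below 2^53 and converts to a double exactly; dividing two exact doubles
    // then gives the correctly rounded result, the same one Double.parseDouble returns.
    public static double parse(String s) {
        int dot = s.indexOf('.');
        if (dot >= 0 && s.length() <= 16) {
            int places = s.length() - dot - 1;
            String digits = s.substring(0, dot) + s.substring(dot + 1);
            try {
                long mantissa = Long.parseLong(digits);  // not synchronized
                if (mantissa != 0) {  // zero goes to the fallback so "-0.0" keeps its sign
                    return mantissa / EXACT_POW10[places];
                }
            } catch (NumberFormatException notPlainDecimal) {
                // e.g. "1.5e3" or " 2.5": leave these to the JDK below
            }
        }
        return Double.parseDouble(s);  // exponents, long inputs, zero, etc.
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234.5678", "-0.1", "1.5e3"}) {
            System.out.println(s + " -> " + parse(s) + " (parseDouble: " + Double.parseDouble(s) + ")");
        }
    }
}
```

The design trade-off is that this stays exact only inside a narrow range; anything outside it needs the slower, fully general algorithms.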