Overview
You can call Double.parseDouble in Java to convert String representations of numbers like “1.234567” into a number representation.
I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.
In this post, I’ll explain why and what I did about it.
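For anyone who hasn't used it, here is a minimal sketch of the call in question. The class name ParseDoubleExample and the parse helper are placeholders for illustration only; Double.parseDouble itself is the standard library method.

```java
public class ParseDoubleExample {
    // Hypothetical helper (not from any library): wraps the standard call.
    static double parse(String text) {
        // Throws NumberFormatException if text is not a valid number.
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        double value = parse("1.234567");
        System.out.println(value);
    }
}
```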
Background (skip this if you don’t care why I had this problem!)
I’ve mentioned UIMA before: an Apache framework for doing text analytics that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).
For a particular task at work, I have a lot of these. Thousands of them. I need to deserialize them, then parse and process the contents. The contents include scores from the various analytics operations that are done on the contents of the CAS:
<myElement
myRawScore="1.2345678"
myThisScore="2.46801357"
myThatScore="1.35792468"
...
Thousands of XML files, each containing several thousand numbers in String form.
As part of deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.
I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing one XML CAS file at a time.
This took *ages*.
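The shape of that workload can be sketched roughly as follows. This is a stand-in under stated assumptions, not the real code: the actual parsing happened inside UIMA's deserializer, whereas here each "file" is just a list of score strings, and the names ParallelParseSketch, parseAll and run are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelParseSketch {
    // Stand-in for deserializing one CAS file: parse every score string in it.
    static double parseAll(List<String> scores) {
        double sum = 0.0;
        for (String s : scores) {
            sum += Double.parseDouble(s);
        }
        return sum;
    }

    // Submit one task per "file" to a fixed pool of worker threads.
    static double run(List<List<String>> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> results = new ArrayList<>();
            for (List<String> file : files) {
                Callable<Double> task = () -> parseAll(file);
                results.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> result : results) {
                total += result.get(); // blocks until that file is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> file = Arrays.asList("1.2345678", "2.46801357", "1.35792468");
        List<List<String>> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add(file);
        }
        System.out.println(run(files, 64)); // 64 threads, as in my setup
    }
}
```

Each worker thread spends its time in Double.parseDouble, which is the call pattern described above.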