Overview
You can call Double.parseDouble in Java to convert String representations of numbers like “1.234567” into a number representation.
I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow.
In this post, I’ll explain why and what I did about it.
Background (skip this if you don’t care why I had this problem!)
I’ve mentioned UIMA before : an Apache framework for doing text analytics, that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files).
For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, and parse and process the contents. The contents includes scores from the various analytics operations that are done on the contents of the CAS:
<myElement
myRawScore="1.2345678"
myThisScore="2.46801357"
myThatScore="1.35792468"
...
Thousands of XML files, each containing several thousand numbers in String form.
As part of the deserializing the CAS files, the UIMA library (specifically org.apache.uima.cas.impl.CASImpl) was calling Double.parseDouble 500,000,000 times or more.
I’ve got 64 processor cores and lots of memory, so I kicked off 64 threads – each one processing an XML CAS file at a time.
This took *ages*.