{"id":2936,"date":"2013-11-30T15:15:03","date_gmt":"2013-11-30T15:15:03","guid":{"rendered":"http:\/\/dalelane.co.uk\/blog\/?p=2936"},"modified":"2013-12-01T11:05:40","modified_gmt":"2013-12-01T11:05:40","slug":"avoiding-monitor-contention-in-javas-double-parsedouble","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=2936","title":{"rendered":"Avoiding monitor contention in Java&#8217;s Double parseDouble"},"content":{"rendered":"<p><strong>Overview<\/strong><\/p>\n<p>You can call <code>Double.parseDouble<\/code> in Java to convert String representations of numbers like &#8220;1.234567&#8221; into a number representation. <\/p>\n<p>I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow. <\/p>\n<p>In this post, I&#8217;ll explain why and what I did about it. <\/p>\n<p><strong>Background <em>(skip this if you don&#8217;t care why I had this problem!)<\/em><\/strong><\/p>\n<p>I&#8217;ve <a href=\"http:\/\/dalelane.co.uk\/blog\/?tag=uima\">mentioned UIMA before<\/a>: an Apache framework for text analytics that I use at work. One of the ways that it stores and moves around units of work is in XML files (called CAS files). <\/p>\n<p>For a particular task at work, I will have a lot of these. Thousands of them. I need to deserialize these, and parse and process the contents. The contents include scores from the various analytics operations that are done on the contents of the CAS:<\/p>\n<blockquote style=\"border: thin black solid; padding: 1em; background-color: #efefef; color: black;\"><p><code>&lt;myElement<br \/>\n &nbsp; &nbsp; myRawScore=\"1.2345678\"<br \/>\n &nbsp; &nbsp; myThisScore=\"2.46801357\"<br \/>\n &nbsp; &nbsp; myThatScore=\"1.35792468\"<br \/>\n &nbsp; &nbsp; ... <\/code><\/p><\/blockquote>\n<p>Thousands of XML files, each containing several thousand numbers in String form. 
<\/p>\n<p>As part of deserializing the CAS files, the UIMA library (specifically <code>org.apache.uima.cas.impl.CASImpl<\/code>) was calling <code>Double.parseDouble<\/code> 500,000,000 times or more. <\/p>\n<p>I&#8217;ve got 64 processor cores and lots of memory, so I kicked off 64 threads &#8211; each one processing an XML CAS file at a time. <\/p>\n<p>This took *ages*.<\/p>\n<p><!--more--><\/p>\n<p><strong>The problem<\/strong><\/p>\n<p>I used <a href=\"http:\/\/www.ibm.com\/developerworks\/java\/jdk\/tools\/healthcenter\/\">Health Center<\/a> to have a look at what was going on. <em>(Disclaimer: It&#8217;s an IBM tool, but it&#8217;s free and is awesome, so I don&#8217;t feel bad promoting it!)<\/em> <\/p>\n<p>It showed that although I had 64 worker threads, only one of them was running at any given moment while the other 63 were blocked. <\/p>\n<p><a href=\"http:\/\/dalelane.co.uk\/blog\/post-images\/131130-healthcenter-threads.jpg\"><img decoding=\"async\" src=\"http:\/\/dalelane.co.uk\/blog\/post-images\/131130-healthcenter-threads.jpg\" width=\"440\"\/><\/a><\/p>\n<p>It showed that I was spending the majority of my time inside <code>Double.parseDouble<\/code>, because one of the methods it calls (<code>FloatingDecimal.big5pow<\/code>) is synchronized. <\/p>\n<p><a href=\"http:\/\/dalelane.co.uk\/blog\/post-images\/131130-healthcenter-profiler.jpg\"><img decoding=\"async\" src=\"http:\/\/dalelane.co.uk\/blog\/post-images\/131130-healthcenter-profiler.jpg\" width=\"440\"\/><\/a><\/p>\n<p>The implementation of <code>Double.parseDouble<\/code> is fast when used once, but it&#8217;s not well suited to being called hundreds of millions of times from multiple threads concurrently &#8211; the locking makes this very, very slow.  <\/p>\n<p><strong>The fix<\/strong><\/p>\n<p>It turns out that this is a known issue. 
There is a bug filed against Java (<a href=\"http:\/\/bugs.sun.com\/view_bug.do?bug_id=7032154\">JI-9004591 &#8211; Monitor contention when calling Double.parseDouble from multiple threads<\/a>), but it&#8217;s been closed as fixed in Java 8. <\/p>\n<p>That would be helpful if Java 8 had actually been released yet. I couldn&#8217;t wait for this. <\/p>\n<p><strong>The workaround<\/strong><\/p>\n<p>I&#8217;ve written my own Double parser. <\/p>\n<p>It&#8217;s a kludgy workaround to be honest, based on the fact that I noticed that <code>Integer.parseInt<\/code> and <code>Long.parseLong<\/code> aren&#8217;t synchronized. I decided to try and use them instead. <\/p>\n<p>Consider the following decimal number:<\/p>\n<p><font size=\"5em\">1234.5678<\/font><\/p>\n<p>Another way of expressing this is<\/p>\n<p><font size=\"5em\">12345678 x 10<sup>-4<\/sup><\/font><\/p>\n<p>\nI can parse this using <code>Integer.parseInt<\/code>.<\/p>\n<blockquote style=\"border: thin black solid; padding: 0.5em; background-color: #efefef; color: black;\"><p><code>Integer.parseInt(\"12345678\") * Math.pow(10.0, -4)<\/code><\/p><\/blockquote>\n<p>So to parse any decimal number, I need to:<\/p>\n<ol>\n<li>Do a bit of String munging <br \/>Look for where the decimal point is, move it to the end of the String and count how many places I&#8217;ve moved it by.\n<\/li>\n<li>Parse the munged String as an int (or a long if it&#8217;s too big to fit in an int)\n<\/li>\n<li>Compensate by multiplying by an exponent that reverses the change<\/li>\n<\/ol>\n<p>I did a little more. For example, to handle numbers that already have an exponent, I add\/subtract the number of places I moved the decimal point to the existing exponent. But this is the basic idea. 
<\/p>\n<p>The <a href=\"http:\/\/dalelane.co.uk\/files\/DoubleParser.java\">source code for my workaround is here<\/a>.<\/p>\n<p><script src=\"https:\/\/gist.github.com\/dalelane\/7720269.js\"><\/script><\/p>\n<p><strong>Performance of the workaround<\/strong><\/p>\n<p>This is massively faster. I wrote a small test app which kicks off a bunch of worker threads and parses millions of Strings into doubles. It measures the time it takes to complete the work. <\/p>\n<table border=1 cellpadding=3>\n<tr>\n<td><\/td>\n<th>my workaround<\/th>\n<th>Java&#8217;s <code>Double.parseDouble<\/code><\/th>\n<\/tr>\n<tr>\n<td>run 1 <\/td>\n<td>211<\/td>\n<td>32378<\/td>\n<\/tr>\n<tr>\n<td>run 2 <\/td>\n<td>219<\/td>\n<td>34861<\/td>\n<\/tr>\n<tr>\n<td>run 3 <\/td>\n<td>229<\/td>\n<td>28398<\/td>\n<\/tr>\n<tr>\n<td>run 4 <\/td>\n<td>221<\/td>\n<td>26312<\/td>\n<\/tr>\n<tr>\n<td>run 5 <\/td>\n<td>216<\/td>\n<td>26979<\/td>\n<\/tr>\n<tr>\n<td>run 6 <\/td>\n<td>216<\/td>\n<td>33115<\/td>\n<\/tr>\n<tr>\n<td>run 7 <\/td>\n<td>214<\/td>\n<td>34543<\/td>\n<\/tr>\n<tr>\n<td>run 8 <\/td>\n<td>212<\/td>\n<td>31081<\/td>\n<\/tr>\n<tr>\n<td>run 9 <\/td>\n<td>215<\/td>\n<td>37758<\/td>\n<\/tr>\n<tr>\n<td>run 10 <\/td>\n<td>215<\/td>\n<td>26236<\/td>\n<\/tr>\n<tr>\n<td>run 11 <\/td>\n<td>217<\/td>\n<td>26901<\/td>\n<\/tr>\n<tr>\n<td>run 12 <\/td>\n<td>227<\/td>\n<td>29884<\/td>\n<\/tr>\n<tr>\n<td>run 13 <\/td>\n<td>217<\/td>\n<td>34944<\/td>\n<\/tr>\n<tr>\n<td>run 14 <\/td>\n<td>216<\/td>\n<td>32981<\/td>\n<\/tr>\n<tr><\/tr>\n<tr>\n<td rowspan=4><em>average<\/em><\/td>\n<td><strong>218<\/strong><\/td>\n<td><strong>31,169<\/strong><\/td>\n<td>seconds<\/td>\n<\/tr>\n<tr><\/tr>\n<tr>\n<td>3.63<\/td>\n<td>519.49<\/td>\n<td>minutes<\/td>\n<\/tr>\n<tr>\n<td>0.06<\/td>\n<td>8.66<\/td>\n<td>hours<\/td>\n<\/tr>\n<\/table>\n<p>Eight hours of work can be done in a few minutes. 
This is the sort of difference it makes, and the reason why I can&#8217;t wait for Java 8 to fix this properly.<\/p>\n<p>The source code for <a href=\"http:\/\/dalelane.co.uk\/files\/DoubleParserTest.java\">my timing test app is here<\/a>.<\/p>\n<p><strong>It&#8217;s not a fix<\/strong><\/p>\n<p>Finally, I should highlight that this is absolutely a workaround. My approach involves multiplying doubles. Any time you do arithmetic with doubles, you risk losing precision. <\/p>\n<p>For example, if I do:<br \/>\n<code>&nbsp; &nbsp; 1.2345 * 10<\/code><br \/>\nI get:<br \/>\n<code>&nbsp; &nbsp; 12.344999999999999<\/code><\/p>\n<p>There will be times when my double parser returns a number that isn&#8217;t exactly what Java&#8217;s <code>Double.parseDouble<\/code> would&#8217;ve returned. But the differences are going to be very small &#8211; they&#8217;ll round to the same value. I have more arithmetic to do with the value anyway, and for the range of numbers I know I&#8217;ll be getting this will not result in a significant error.  <\/p>\n<p>Speed is more important for my particular use case, and the loss of precision from my parse implementation is acceptable for my purposes. <\/p>\n<p>This won&#8217;t be true in all situations, so if you need precision without a synchronized method then you&#8217;re looking at having to implement your own double parser from first principles. Algorithms for doing this are <a href=\"http:\/\/www.exploringbinary.com\/how-strtod-works-and-sometimes-doesnt\/\">well documented<\/a>, so I wouldn&#8217;t be too put off doing that if that&#8217;s what you need.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview You can call Double.parseDouble in Java to convert String representations of numbers like &#8220;1.234567&#8221; into a number representation. I needed to do this, a lot of times, from a lot of threads. And it was horrendously slow. In this post, I&#8217;ll explain why and what I did about it. 
Background (skip this if you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[219,533],"class_list":["post-2936","post","type-post","status-publish","format-standard","hentry","category-code","tag-java","tag-uima"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2936","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2936"}],"version-history":[{"count":0,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2936\/revisions"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2936"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2936"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2936"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}