{"id":3808,"date":"2019-07-28T12:07:32","date_gmt":"2019-07-28T12:07:32","guid":{"rendered":"https:\/\/dalelane.co.uk\/blog\/?p=3808"},"modified":"2019-07-28T12:44:58","modified_gmt":"2019-07-28T12:44:58","slug":"using-openwhisk-in-machine-learning-for-kids","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=3808","title":{"rendered":"Using OpenWhisk in Machine Learning for Kids"},"content":{"rendered":"<p><strong>I&#8217;ve moved a couple of bits of <a href=\"https:\/\/machinelearningforkids.co.uk\">Machine Learning for Kids<\/a> into <a href=\"https:\/\/openwhisk.apache.org\/\">OpenWhisk<\/a> functions. In this post, I&#8217;ll describe what I&#8217;m trying to solve by doing this, and what I&#8217;ve done.<\/strong><\/p>\n<h3>Background<\/h3>\n<p>I&#8217;ve talked before about <a href=\"https:\/\/dalelane.co.uk\/blog\/?tag=mlforkids-tech\">how I implemented Machine Learning for Kids<\/a>, but the short version is that most of it is a Node.js app, hosted in Cloud Foundry so I can easily run multiple instances of it.<\/p>\n<p>The most computationally expensive work the site does is for projects that train a machine learning model to recognize images.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/1444788\/62005589-6f45c980-b12d-11e9-860e-58aeb11242e8.png\" style=\"border: thin black solid\"\/><\/p>\n<p>In particular, the expensive bit is when a student clicks on the <strong>Train new machine learning model<\/strong> button for a project to train the computer to recognize images.<\/p>\n<p><!--more-->When they do that, at a high level, this is what has to happen:<\/p>\n<ol>\n<li><strong>All of their images are downloaded<\/strong>\n<ol>\n<li>Images they created themselves (e.g. images they\u2019ve drawn, photos they\u2019ve taken with the webcam, etc.) 
are retrieved from Cloud Object Storage<\/li>\n<li>Images they found on the web (which I only store as URLs) are downloaded from their original websites<\/li>\n<\/ol>\n<\/li>\n<li>Downloaded <strong>images are resized<\/strong> to a size suitable for training (<em>images created and stored in Cloud Object Storage will already be the right size<\/em>)<\/li>\n<li>A new <strong>zip file is created<\/strong> with all of the images, ready for uploading to the Watson Visual Recognition service (<em>sometimes more than once using different Watson credentials<\/em>)<\/li>\n<\/ol>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/1444788\/62006309-ff3c4100-b136-11e9-81de-bd32a1d5c775.png\" style=\"border: thin black solid\"\/><\/p>\n<h3>The problems<\/h3>\n<p>Compared with everything else that the site has to do, resizing large images and creating large zip files are both fairly expensive in terms of memory and CPU.<\/p>\n<p>This is compounded by the fact that it tends to happen in bursts.<\/p>\n<p>If I&#8217;m unlucky, a class of thirty students all click their <strong>Train ML<\/strong> button at roughly the same time.<\/p>\n<p>When I\u2019m really unlucky, a STEM event of a hundred students all click the <strong>Train ML<\/strong> button at the same time!<\/p>\n<p>(Teachers often get their classes to go through projects together, so they all move on to the next step at the same time. It is more common than you&#8217;d expect for dozens of training requests to hit my site at once.)<\/p>\n<p>This means I had two problems:<\/p>\n<h4>Problem 1 : The site was horribly over-provisioned<\/h4>\n<p>The rest of the site server is efficient and light-weight, using hardly any CPU, memory or local disk.<\/p>\n<p>But I had to provision it to cope with the worst possible spikes without crashing with out-of-memory errors. 
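To make the cost concrete, the train-time work described above boils down to an orchestration step along these lines (a simplified sketch, not the actual implementation &#8211; the names are illustrative, binary image data is represented as strings to keep it self-contained, and the download, resize, and zip steps are injected as functions so the flow is easy to see):

```typescript
// Simplified sketch of the train-time pipeline the site server runs for
// every "Train new machine learning model" click. All names here are
// illustrative; binary image data is represented as strings to keep the
// sketch self-contained.

type Bytes = string;

interface TrainingImage {
    id: string;
    imageurl: string;
    // images created in the site are stored in Cloud Object Storage at
    // the right size already; only web images need resizing
    needsResize: boolean;
}

async function createTrainingZip(
    images: TrainingImage[],
    download: (url: string) => Promise<Bytes>,
    resize: (data: Bytes) => Promise<Bytes>,
    zip: (files: Map<string, Bytes>) => Promise<Bytes>,
): Promise<Bytes> {
    const files = new Map<string, Bytes>();
    // processed one at a time here - parallelising this is exactly
    // where the memory and CPU spikes come from
    for (const image of images) {
        const raw = await download(image.imageurl);
        const prepared = image.needsResize ? await resize(raw) : raw;
        files.set(image.id + '.jpg', prepared);
    }
    return zip(files);
}
```

Every downloaded and resized image is held in memory until the zip is built, which is why a burst of training requests is so expensive.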
So I scaled up the memory footprint of the site instances to a level far beyond what the site needs most of the time.<\/p>\n<p>I&#8217;m running the site on a shoestring budget, so I can&#8217;t afford to run it with far more resource than it needs!<\/p>\n<h4>Problem 2 : I had to throttle it to be slower than it could be<\/h4>\n<p>To cope with the worst spikes without blowing up with out-of-memory errors, I was fairly aggressive about avoiding parallelising the task described above.<\/p>\n<p>For example, instead of processing all of a student&#8217;s images at once in parallel, I&#8217;d <a href=\"https:\/\/github.com\/IBM\/taxinomitis\/blob\/85b3d1c59a8580b54875d62cc7d5c844dd0bd27b\/src\/lib\/utils\/downloadAndZip.ts#L196\">limit it to do a couple at a time<\/a>. That wasn&#8217;t necessary if only one or two students had clicked the <strong>Train<\/strong> button &#8211; it would&#8217;ve worked okay to just do lots, if not all, of their images in parallel. But to protect against a possible spike of loads of students, I was doing it slower for everyone.<\/p>\n<p>Training image models is already annoyingly slow. Making it even slower isn&#8217;t great for a tool used by classes of impatient children!<\/p>\n<h4>Both of these problems were only getting worse.<\/h4>\n<p>The site continues to get busier, with more and more schools and code clubs using it. I was rapidly approaching a point where I&#8217;d have to scale up the site app instances even more to avoid the risk of out-of-memory errors.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/1444788\/62005626-eaa77b00-b12d-11e9-8463-69db7ef4d8a4.png\" style=\"border: thin black solid\"\/><\/p>\n<h3>The solution<\/h3>\n<p>You can probably guess from the fact that I&#8217;ve been <a href=\"https:\/\/dalelane.co.uk\/blog\/?tag=openwhisk\">talking about OpenWhisk recently<\/a> (or just the title and intro to this post!) 
that I think these problems are well suited to serverless.<\/p>\n<ul>\n<li>It&#8217;s a very spiky workload &#8211; demand typically comes in bursts.<\/li>\n<li>It&#8217;s infrequent.<\/li>\n<li>When it happens, it is resource intensive &#8211; an order of magnitude more than the usual base workload level.<\/li>\n<\/ul>\n<p>This is just the sort of <a href=\"https:\/\/dalelane.co.uk\/blog\/?p=3769\">thing that I&#8217;ve talked about<\/a> as being ideal for serverless.<\/p>\n<p>I decided to move the task of creating training data zip files for image projects to a couple of new OpenWhisk functions.<\/p>\n<p>That means that now when a student clicks on the <strong>Train new machine learning model<\/strong> button&#8230;<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/user-images.githubusercontent.com\/1444788\/62006085-f8f89580-b133-11e9-9c80-f0a9e6ab21d7.png\" style=\"border: thin black solid\"\/><\/p>\n<ol>\n<li>The main Machine Learning for Kids API server collects the list of training images from the training data database.<\/li>\n<li>The list of training images is POSTed to an OpenWhisk function called <code>CreateZip<\/code>.<\/li>\n<li>The <code>CreateZip<\/code> function invokes a second function, <code>ResizeImage<\/code>, for every image that needs to be downloaded and resized.<\/li>\n<li>The <code>CreateZip<\/code> function creates a zip file with the responses from each of the <code>ResizeImage<\/code> function invocations, and returns it.<\/li>\n<li>The main Machine Learning for Kids API server submits the zip file to the Watson Visual Recognition service.<\/li>\n<\/ol>\n<p>Splitting it into two functions means I don&#8217;t need to throttle the image resizing. 
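The fan-out inside <code>CreateZip<\/code> boils down to something like this sketch (the invocation itself is injected here &#8211; in the real functions it's a blocking OpenWhisk action invocation, and the names and parameter shapes are illustrative):

```typescript
// Sketch of the CreateZip fan-out: every image gets its own ResizeImage
// invocation, all started at once with Promise.all. The invoke function
// is injected; in OpenWhisk it would be a blocking action invocation
// (e.g. via the openwhisk npm client). Names and shapes are illustrative.

interface ResizeParams {
    url: string;
}

interface ResizedImage {
    url: string;
    data: string;   // e.g. base64-encoded resized image
}

async function fanOutResize(
    urls: string[],
    invokeResizeImage: (params: ResizeParams) => Promise<ResizedImage>,
): Promise<ResizedImage[]> {
    // no throttling needed: each invocation runs in its own
    // ResizeImage function instance
    return Promise.all(urls.map((url) => invokeResizeImage({ url })));
}
```
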
I can do all of them in parallel, without needing to scale the <code>CreateZip<\/code> function to cope with it, because each image will be handled by a separate function instance of <code>ResizeImage<\/code>.<\/p>\n<h3>The code<\/h3>\n<p>So that&#8217;s the design. I&#8217;ve been tinkering with this on and off for a week. Look at the <a href=\"https:\/\/github.com\/IBM\/taxinomitis\/pull\/212\">pull request with the initial implementation<\/a> to see more about how I approached it.<\/p>\n<p>I&#8217;m assuming it&#8217;ll change a bit over the first few weeks as I learn about all the things I did wrong (<em>I&#8217;m still fairly new to OpenWhisk!<\/em>) so you can find the current version of the functions in the <a href=\"https:\/\/github.com\/IBM\/taxinomitis\/tree\/master\/serverless-functions\">serverless-functions folder in github.com\/IBM\/taxinomitis<\/a>.<\/p>\n<p>I wrote it in TypeScript for consistency with the rest of the site, and it wasn&#8217;t without challenges.<\/p>\n<p>For example, the <a href=\"https:\/\/github.com\/IBM\/taxinomitis\/blob\/b7c10d03842c82b3c957baec19bbd623de43b0f4\/src\/lib\/utils\/download.ts#L132-L149\">previous implementation for resizing images in the main site app<\/a> used <a href=\"https:\/\/sharp.pixelplumbing.com\/\">sharp<\/a>.<\/p>\n<p><code>sharp<\/code> has some native code bits that I couldn&#8217;t figure out how to get into a Node.js function. The OpenWhisk Node.js runtime <a href=\"https:\/\/github.com\/apache\/incubator-openwhisk-runtime-nodejs\/blob\/7c8461c99390aff12e7bad33a2d79f65150b9d03\/core\/nodejs10Action\/Dockerfile#L20-L21\">already includes imagemagick<\/a>, so I <a href=\"https:\/\/github.com\/IBM\/taxinomitis\/blob\/b7c10d03842c82b3c957baec19bbd623de43b0f4\/serverless-functions\/src\/Resize.ts#L67-L76\">switched to using gm for the serverless implementation<\/a>. It&#8217;s noticeably slower, so that&#8217;s frustrating.<\/p>\n<p>Plus it was a little fiddly to package. 
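One way to package a Node.js action is to bundle everything into a single self-contained JavaScript file, which a minimal webpack config along these lines can do (a sketch with illustrative file names, not the project's actual config):

```javascript
// Minimal sketch of a webpack config for bundling a Node.js OpenWhisk
// action into one file. Entry and output paths are illustrative; see
// serverless-functions/webpack.config.js in the taxinomitis repo for
// the real configuration.
const path = require('path');

module.exports = {
    entry: './dist/CreateZip.js',   // compiled TypeScript entry point (illustrative)
    target: 'node',                 // bundle for Node.js, not the browser
    mode: 'production',
    output: {
        path: path.resolve(__dirname, 'functions'),
        filename: 'createzip.js',
        // OpenWhisk loads the bundle with require() and calls the
        // exported main function, so emit a CommonJS module
        libraryTarget: 'commonjs2',
    },
};
```
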
I ended up using <a href=\"https:\/\/github.com\/IBM\/taxinomitis\/blob\/master\/serverless-functions\/webpack.config.js\">webpack to get it into a state that would fit<\/a> with OpenWhisk. The <a href=\"https:\/\/github.com\/IBM-Cloud\/openwhisk-typescript\/\">openwhisk-typescript<\/a> project was super useful in learning how to do it. But arguably this could&#8217;ve been better implemented in a different language that can do image processing faster.<\/p>\n<h3>No-one will notice!<\/h3>\n<p>That&#8217;s it. I&#8217;ve protected against future growth of the site, and hopefully reduced my hosting costs a bit. But if I&#8217;ve done this right, no-one will notice any difference. \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve moved a couple of bits of Machine Learning for Kids into OpenWhisk functions. In this post, I&#8217;ll describe what I&#8217;m trying to solve by doing this, and what I&#8217;ve done. Background I&#8217;ve talked before how I implemented Machine Learning for Kids, but the short version is that most of it is a Node.js app, 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3818,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[587,592],"class_list":["post-3808","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-code","tag-mlforkids-tech","tag-openwhisk"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3808","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3808"}],"version-history":[{"count":0,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3808\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/media\/3818"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3808"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3808"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3808"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}