{"id":3606,"date":"2018-05-30T01:24:16","date_gmt":"2018-05-30T01:24:16","guid":{"rendered":"http:\/\/dalelane.co.uk\/blog\/?p=3606"},"modified":"2018-05-30T21:43:18","modified_gmt":"2018-05-30T21:43:18","slug":"machine-learning-for-kids-outage-report","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=3606","title":{"rendered":"Machine Learning for Kids outage report"},"content":{"rendered":"<p><strong><a href=\"https:\/\/machinelearningforkids.co.uk\/\">Machine Learning for Kids<\/a> was unavailable for most of 29th May 2018. I wanted to share what happened and what I\u2019m doing about it.<\/strong> <\/p>\n<p><!--more-->The site is hosted in IBM Cloud. I have run multiple instances of the site in parallel for reliability and availability since I first launched the site. However, all of these were deployed into the same location &#8211; the <a href=\"https:\/\/www.ibm.com\/cloud-computing\/bluemix\/data-centers\">\u201cUS South\u201d region in Dallas, US<\/a>. <\/p>\n<p>At approximately 10am on 29th May (UK time), a <a href=\"https:\/\/console.bluemix.net\/status\/notification\/60be54fd98ee2cb2ba9892446e29c0da\">major routing failure hit applications running in the US South region in IBM Cloud<\/a>. This meant that although the Machine Learning for Kids application instances were still running, the requests from people\u2019s web browsers weren\u2019t getting routed to them. The application was essentially cut off from the outside world. <\/p>\n<p>I can see from the logs from the Machine Learning for Kids application (as it kept running throughout the day) that virtually no web requests made it to the application. This situation remained for the rest of the day, until very late in the evening (UK time). <\/p>\n<p>Routing was restored before midnight, and the application seems to be accessible again now. <\/p>\n<p>I have no way of knowing how many people tried to access the site during this time. 
I know of many who emailed me to ask what was going on, but I imagine there will be many others who just gave up without telling me. To all of them, I\u2019d like to say that I am very, very sorry for any inconvenience that this will have caused. I know what it\u2019s like to plan an activity to run with a school class only to have it derailed without notice or warning &#8211; and I feel very disappointed and frustrated to know that I caused this for some. I want schools and code clubs to feel that they can rely on the site as a resource. <\/p>\n<p>Although this was started by an infrastructure failure, I made the site vulnerable to this sort of outage by putting all of the instances of the application into the same physical region. I\u2019m working on fixing this now. In future, I will run instances of the application in multiple regions &#8211; to start with, US (Dallas) and UK (London). I\u2019m setting up DNS failover using Cloudflare so that, in future, web requests will be automatically routed to a working region. In the event of a future IBM Cloud region outage like the one I saw here, the site should remain accessible as long as at least one of the IBM Cloud regions is still functional. <\/p>\n<p>This is something I should\u2019ve done months ago. I\u2019m sorry that it took a major failure like today\u2019s to push me to do it. <\/p>\n<p>There might be some intermittent weird behaviour over the next 24 hours as I transfer management of the tools\u2019 DNS to Cloudflare. I\u2019ll do my best to keep this to an absolute minimum. <\/p>\n<p>If you have any questions about any of this, please don\u2019t hesitate to get in touch. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Machine Learning for Kids was unavailable for most of 29th May 2018. 
I wanted to share what happened and what I\u2019m doing about it.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"class_list":["post-3606","post","type-post","status-publish","format-standard","hentry","category-misc"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3606","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3606"}],"version-history":[{"count":0,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3606\/revisions"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3606"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3606"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3606"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}