{"id":311,"date":"2008-11-13T22:40:34","date_gmt":"2008-11-13T22:40:34","guid":{"rendered":"http:\/\/dalelane.co.uk\/blog\/?p=311"},"modified":"2009-05-17T18:51:09","modified_gmt":"2009-05-17T18:51:09","slug":"posting-to-twitter-carefully","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=311","title":{"rendered":"Posting to Twitter&#8230; carefully"},"content":{"rendered":"<p>I&#8217;ve recently picked up the code for my <a href=\"http:\/\/dalelane.co.uk\/page.php?id=1047\" target=\"_blank\">Windows Mobile Twitter client<\/a> again. <\/p>\n<p>It was originally <a href=\"http:\/\/dalelane.co.uk\/blog\/?p=244\">written back in April as a hackday idea<\/a>. The code posts Twitter updates using a variation on the <a href=\"http:\/\/www.sakana.fr\/blog\/2007\/03\/18\/scripting-twitter-with-curl\/\" target=\"_blank\">twitter-from-curl<\/a> approach of HTTP POSTing &#8220;status=MyTweet&#8221; to the <a target=\"_blank\" href=\"http:\/\/twitter.com\/statuses\/update.xml\">Twitter update URL<\/a>. <\/p>\n<p>I started with the update URL, and appended the message I wanted to tweet. This is fine for a quick hackday demo, but it did mean that you could end up with a URL like:<\/p>\n<p><a href=\"http:\/\/twitter.com\/statuses\/update.xml?status=Hello (twitter) world! Special chars = a problem?\">http:\/\/twitter.com\/statuses\/update.xml?status=Hello (twitter) world! Special chars = a problem?<\/a><\/p>\n<p>This fails if you want to post accented characters, or characters which have special meaning in URLs, such as + ? \/ &#038; and so on.  <\/p>\n<p>I was encouraged by a number of users to have another look at this, which I&#8217;ve done now, and hopefully <a href=\"http:\/\/dalelane.co.uk\/files\/TwitToday.CAB\">version 1.1<\/a> solves the problems. 
<\/p>\n<p>A quick Google turned up that a number of <a href=\"http:\/\/getsatisfaction.com\/twitpic\/topics\/german_umlaute_using_with_api\" target=\"_blank\">other Twitter apps share at least some of the same problems<\/a> that mine had, so I thought I&#8217;d share the fix here. <\/p>\n<p><!--more--><strong>Step 1 &#8211; URL-encode<\/strong><\/p>\n<p>Okay, so technically I did this on the night of HackDay (<em>it might have been a hack, but even I know that spaces in a URL aren&#8217;t the best idea!<\/em>) but I&#8217;m including it here for completeness. <\/p>\n<p>My app was written in C++, so I used <a target=\"_blank\" href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/aa925749.aspx\"><code>InternetCanonicalizeUrl<\/code><\/a> to turn the tweet-posting URL into something a little safer &#8211; a percent-encoded URI. For example, this turned spaces in the message into <code>%20<\/code>. <\/p>\n<p><strong>Step 2 &#8211; Specifying a content type<\/strong><\/p>\n<p>A few Twitter users such as <a href=\"http:\/\/twitter.com\/reVoid\/statuses\/980186829\" target=\"_blank\">@reVoid<\/a> and <a href=\"http:\/\/twitter.com\/michaelmcmillan\/statuses\/982281350\" target=\"_blank\">@michaelmcmillan<\/a> reported problems when they tried to tweet \u00c6\u00d8\u00c5 characters.<\/p>\n<p>After a bit of experimentation, the answer turned out to be to add <code>charset=utf-8<\/code> to the Content-Type header I send when I post.<\/p>\n<pre style=\"overflow: scroll; font-size: 1.1em; border: thin solid silver; background-color: #eeeeee; padding: 0.8em\">TCHAR header[]  = TEXT(\"Content-Type:application\/x-www-form-urlencoded;charset=utf-8\");<\/pre>\n<p>A quick play with accented characters like \u00e9 and \u00e1 seemed to suggest that this would get accents working.<\/p>\n<p><strong>Step 3 &#8211; UTF-8 encoding<\/strong><\/p>\n<p>This appeared to fix most accented characters. 
But then <a href=\"http:\/\/twitter.com\/walti\/statuses\/998679994\" target=\"_blank\">@walti<\/a> pointed out that my code still broke when posting certain characters with German umlauts. Either the next character after each umlaut would be lost when posted, or sometimes the whole remainder of the tweet after an umlaut would be lost. <\/p>\n<p>Googling showed that other apps such as <a href=\"http:\/\/getsatisfaction.com\/twitpic\/topics\/german_umlaute_using_with_api\" target=\"_blank\">Twitpic<\/a>, <a href=\"http:\/\/getsatisfaction.com\/pingfm\/topics\/german_umlauts_breaking_things\" target=\"_blank\">Ping.fm<\/a> and <a href=\"http:\/\/www.32hours.com\/2008\/02\/01\/new-features-added-to-betwittered\/#comment-201\" target=\"_blank\">betwittered<\/a> shared this bug. <\/p>\n<p>I didn&#8217;t figure this one out for myself; the answer was given to me by <a href=\"http:\/\/groups.google.com\/group\/twitter-development-talk\/browse_frm\/thread\/965492b2159e19f1\/170838fbc7f93fe3\" target=\"_blank\">a helpful user on the Twitter API Google Group<\/a>. <\/p>\n<p>It seems that URL encoding on its own isn&#8217;t sufficient. The characters also need to be UTF-8 encoded. For example, when posting \u00f6, it wasn&#8217;t enough for me to URL-encode it and send it as <code>%F6<\/code>. I also needed to UTF-8 encode it, so that it is sent as <code>%C3%B6<\/code>. <\/p>\n<p>I wrote a quick-and-dirty encoder to handle this:<\/p>\n<pre style=\"overflow: scroll; font-size: 1.1em; border: thin solid silver; background-color: #eeeeee; padding: 0.8em\">\/\/---------------------------------------------------\r\n\/\/ doing a little UTF-8 encoding... 
\r\n\/\/---------------------------------------------------\r\n\r\n\/\/ The input is already percent-encoded, with each accented character\r\n\/\/ escaped as a single Latin-1 byte (e.g. %F6). Every escape of %80 or\r\n\/\/ above is rewritten as its two-byte UTF-8 equivalent (e.g. %C3%B6).\r\n\/\/ The worst case is 3 characters growing to 6, so 3x the input is ample.\r\nLPWSTR utf8encodedTweet = new TCHAR[dwNewSz * 3];\r\nnewptr = 0;\r\nfor (int i=0; i < dwNewSz; i++)\r\n{\r\n    if (lpszEncTweetMessage[i] == '%')\r\n    {\r\n        \/\/ first hex digit of the escaped byte\r\n        TCHAR hi = lpszEncTweetMessage[i+1];\r\n\r\n        if (hi == '8' || hi == '9' || hi == 'A' || hi == 'B')\r\n        {\r\n            \/\/ %80-%BF become %C2 followed by the original escape,\r\n            \/\/ which the copy at the bottom of the loop emits unchanged\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = 'C';\r\n            utf8encodedTweet[newptr++] = '2';\r\n        }\r\n        else if (hi == 'C')\r\n        {\r\n            \/\/ %Cx becomes %C3%8x - skip \"%C\" so only x is copied below\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = 'C';\r\n            utf8encodedTweet[newptr++] = '3';\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = '8';\r\n            i += 2;\r\n        }\r\n        else if (hi == 'D')\r\n        {\r\n            \/\/ %Dx becomes %C3%9x\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = 'C';\r\n            utf8encodedTweet[newptr++] = '3';\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = '9';\r\n            i += 2;\r\n        }\r\n        else if (hi == 'E')\r\n        {\r\n            \/\/ %Ex becomes %C3%Ax\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = 'C';\r\n            utf8encodedTweet[newptr++] = '3';\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = 'A';\r\n            i += 2;\r\n        }\r\n        else if (hi == 'F')\r\n        {\r\n            \/\/ %Fx becomes %C3%Bx\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = 'C';\r\n            utf8encodedTweet[newptr++] = '3';\r\n            utf8encodedTweet[newptr++] = '%';\r\n            utf8encodedTweet[newptr++] = 'B';\r\n            i += 2;\r\n        }\r\n    }\r\n\r\n    \/\/ copy the current character (or the final hex digit of a rewritten escape)\r\n    utf8encodedTweet[newptr++] = lpszEncTweetMessage[i];\t\t\r\n}\r\nutf8encodedTweet[newptr] = '\\0';<\/pre>\n<p><strong>Is that 
it?<\/strong><\/p>\n<p>I think that covers all the bases... <\/p>\n<p>No doubt someone will point out otherwise before too long. :-)  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve recently picked up the code for my Windows Mobile Twitter client again. It was originally written back in April as a hackday idea. The code posts Twitter updates using a variation on the twitter-from-curl approach of HTTP POSTing &#8220;status=MyTweet&#8221; to the twitter update url. I started with the update URL, and appended the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[295,151,243,298,296,297,299,19,43],"class_list":["post-311","post","type-post","status-publish","format-standard","hentry","category-code","tag-internetcanonicalizeurl","tag-twitter","tag-twittoday","tag-url-encode","tag-utf-8","tag-utf8","tag-widechartomultibyte","tag-windows-mobile","tag-windowsmobile"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/311","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=311"}],"version-history":[{"count":0,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/311\/revisions"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=311"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=31
1"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=311"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}