{"id":3225,"date":"2014-10-06T02:07:37","date_gmt":"2014-10-06T02:07:37","guid":{"rendered":"http:\/\/dalelane.co.uk\/blog\/?p=3225"},"modified":"2014-10-06T02:26:30","modified_gmt":"2014-10-06T02:26:30","slug":"comparing-xml-files-ignoring-order-of-attributes-and-child-elements","status":"publish","type":"post","link":"https:\/\/dalelane.co.uk\/blog\/?p=3225","title":{"rendered":"Comparing XML files ignoring order of attributes and child elements"},"content":{"rendered":"<p>I need to diff some XML files. <\/p>\n<p>For these particular XML files, order is not important. The XML is being used to contain a set of things, not a list &#8211; the order of the elements has no significance. Similarly, the order of the attributes within each element isn&#8217;t significant. <\/p>\n<p>For example, for my purposes, these two XML files are equivalent:<\/p>\n<pre style=\"border: thin solid silver; background-color: #eeeeee; padding: 0.7em; font-size: 1.2em; overflow: auto;\">&lt;myroot&gt;\r\n    &lt;mychild id=\"123\"&gt;\r\n        &lt;fruit&gt;apple&lt;\/fruit&gt;\r\n        &lt;test hello=\"world\" brackets=\"angled\" question=\"answers\"\/&gt;\r\n        &lt;comment&gt;This is a comment&lt;\/comment&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"456\"&gt;\r\n        &lt;fruit&gt;banana&lt;\/fruit&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"789\"&gt;\r\n        &lt;fruit&gt;orange&lt;\/fruit&gt;\r\n        &lt;test brackets=\"round\" hello=\"greeting\"&gt;\r\n            &lt;number&gt;111&lt;\/number&gt;\r\n        &lt;\/test&gt;\r\n        &lt;dates&gt;\r\n              &lt;modified&gt;123&lt;\/modified&gt;\r\n              &lt;created&gt;253&lt;\/created&gt;\r\n              &lt;accessed&gt;44&lt;\/accessed&gt;\r\n        &lt;\/dates&gt;\r\n    &lt;\/mychild&gt;\r\n&lt;\/myroot&gt;<\/pre>\n<pre style=\"border: thin solid silver; background-color: #eeeeee; padding: 0.7em; font-size: 1.2em; overflow: auto;\">&lt;myroot&gt;\r\n    &lt;mychild 
id=\"789\"&gt;\r\n        &lt;fruit&gt;orange&lt;\/fruit&gt;\r\n        &lt;test hello=\"greeting\" brackets=\"round\"&gt;\r\n            &lt;number&gt;111&lt;\/number&gt;\r\n        &lt;\/test&gt;\r\n        &lt;dates&gt;\r\n              &lt;accessed&gt;44&lt;\/accessed&gt;    \r\n              &lt;modified&gt;123&lt;\/modified&gt;\r\n              &lt;created&gt;253&lt;\/created&gt;\r\n        &lt;\/dates&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"123\"&gt;\r\n        &lt;test question=\"answers\" hello=\"world\" brackets=\"angled\"\/&gt;\r\n        &lt;comment&gt;This is a comment&lt;\/comment&gt;\r\n        &lt;fruit&gt;apple&lt;\/fruit&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"456\"&gt;\r\n        &lt;fruit&gt;banana&lt;\/fruit&gt;\r\n    &lt;\/mychild&gt;\r\n&lt;\/myroot&gt;<\/pre>\n<p>I needed to compare some large XML files that differed widely in the order of their elements, and I couldn&#8217;t find a tool that would do the job. So I wrote a bit of Python to do it for me.<\/p>\n<p><!--more--><\/p>\n<h3>How it works<\/h3>\n<p>I cheated. <\/p>\n<p>Diff tools are complex, and I was in a hurry, without the time to implement one. <\/p>\n<p>Instead, to compare two of my XML files, my approach is to sort the elements and attributes in both so they have a consistent order, and then diff the sorted files using an existing visual diff tool. (On Windows, I prefer <code>vsdiff<\/code> from <a href=\"http:\/\/www.slickedit.com\/\">SlickEdit<\/a>. On Mac, I prefer <a href=\"https:\/\/sourcegear.com\/diffmerge\/\">diffmerge<\/a>. 
My approach works with either of these.)<\/p>\n<h3>Example<\/h3>\n<p>For example, consider the following simple test files:<\/p>\n<p>testA.xml<\/p>\n<pre style=\"border: thin solid silver; background-color: #eeeeee; padding: 0.7em; font-size: 1.1em; overflow: auto;\">&lt;myroot&gt;\r\n    &lt;mychild id=\"123\"&gt;\r\n        &lt;fruit&gt;apple&lt;\/fruit&gt;\r\n        &lt;test hello=\"world\" testing=\"removed\" brackets=\"angled\" question=\"answers\"\/&gt;\r\n        &lt;comment&gt;This is a comment&lt;\/comment&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"456\"&gt;\r\n        &lt;fruit&gt;banana&lt;\/fruit&gt;\r\n        &lt;comment&gt;This will be removed&lt;\/comment&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"789\"&gt;\r\n        &lt;fruit&gt;orange&lt;\/fruit&gt;\r\n        &lt;test brackets=\"round\" hello=\"greeting\"&gt;\r\n            &lt;number&gt;111&lt;\/number&gt;\r\n        &lt;\/test&gt;\r\n        &lt;dates&gt;\r\n              &lt;modified&gt;123&lt;\/modified&gt;\r\n              &lt;created&gt;880&lt;\/created&gt;\r\n              &lt;accessed&gt;44&lt;\/accessed&gt;\r\n        &lt;\/dates&gt;\r\n    &lt;\/mychild&gt;\r\n&lt;\/myroot&gt;<\/pre>\n<p>testB.xml<\/p>\n<pre style=\"border: thin solid silver; background-color: #eeeeee; padding: 0.7em; font-size: 1.1em; overflow: auto;\">&lt;myroot&gt;\r\n    &lt;mychild id=\"789\"&gt;\r\n        &lt;fruit&gt;orange&lt;\/fruit&gt;\r\n        &lt;test hello=\"greeting\" brackets=\"round\"&gt;\r\n            &lt;number&gt;111&lt;\/number&gt;\r\n        &lt;\/test&gt;\r\n        &lt;dates&gt;\r\n              &lt;accessed&gt;49&lt;\/accessed&gt;    \r\n              &lt;modified&gt;123&lt;\/modified&gt;\r\n              &lt;created&gt;253&lt;\/created&gt;\r\n        &lt;\/dates&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"123\"&gt;\r\n        &lt;test question=\"answers\" hello=\"world\" brackets=\"angled\"\/&gt;\r\n        &lt;comment&gt;This is a comment&lt;\/comment&gt;\r\n  
      &lt;fruit&gt;apple&lt;\/fruit&gt;\r\n    &lt;\/mychild&gt;\r\n    &lt;mychild id=\"456\"&gt;\r\n        &lt;fruit&gt;banana&lt;\/fruit&gt;\r\n    &lt;\/mychild&gt;\r\n&lt;\/myroot&gt;<\/pre>\n<p>On Mac, I run:<br \/>\n<code>$ python xmldiff.py diffmerge testA.xml testB.xml<\/code><\/p>\n<p>And get:<br \/>\n<a href=\"https:\/\/www.flickr.com\/photos\/dalelane\/15267322110\" title=\"Screen Shot 2014-10-06 at 02.48.02 by Dale Lane, on Flickr\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/farm6.staticflickr.com\/5597\/15267322110_f006f44aee.jpg\" width=\"450\" height=\"171\" alt=\"Screen Shot 2014-10-06 at 02.48.02\"\/><\/a><\/p>\n<p>On Windows, I run:<br \/>\n<code>$ python xmldiff.py vsdiff testA.xml testB.xml<\/code><\/p>\n<p>And get:<br \/>\n<a href=\"https:\/\/www.flickr.com\/photos\/dalelane\/15431016366\" title=\"screenshot-windows-20141006-0255 by Dale Lane, on Flickr\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/farm6.staticflickr.com\/5600\/15431016366_ffb17d7b34.jpg\" width=\"450\" height=\"188\" alt=\"screenshot-windows-20141006-0255\"\/><\/a><\/p>\n<h3>Source<\/h3>\n<p>The source showing how this works is available in a gist at<br \/>\n<a href=\"https:\/\/gist.github.com\/dalelane\/a0514b2e283a882d9ef3\">gist.github.com\/dalelane<\/a>. <\/p>\n<p>It&#8217;s a quick hack to let me compare a handful of files, so it&#8217;s not been rigorously tested. But it&#8217;s a very simple little tool, and was good enough for my purposes tonight!<\/p>\n<p><script src=\"https:\/\/gist.github.com\/dalelane\/a0514b2e283a882d9ef3.js\"><\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I need to diff some XML files. For these particular XML files, order is not important. The XML is being used to contain a set of things, not a list &#8211; the order of the elements has no significance. Similarly, the order of the attributes within each element isn&#8217;t significant. 
For example, for my purposes, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[566,212,565],"class_list":["post-3225","post","type-post","status-publish","format-standard","hentry","category-code","tag-diff","tag-python","tag-xml"],"_links":{"self":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3225","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3225"}],"version-history":[{"count":0,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/3225\/revisions"}],"wp:attachment":[{"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3225"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3225"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalelane.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3225"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}