I need to diff some XML files.
For these particular XML files, order is not important. The XML is being used to contain a set of things, not a list – the order of the elements has no significance. Similarly, the order of the attributes within each element isn’t significant.
For example, for my purposes, these two XML files are equivalent:
<myroot>
  <mychild id="123">
    <fruit>apple</fruit>
    <test hello="world" brackets="angled" question="answers"/>
    <comment>This is a comment</comment>
  </mychild>
  <mychild id="456">
    <fruit>banana</fruit>
  </mychild>
  <mychild id="789">
    <fruit>orange</fruit>
    <test brackets="round" hello="greeting">
      <number>111</number>
    </test>
    <dates>
      <modified>123</modified>
      <created>253</created>
      <accessed>44</accessed>
    </dates>
  </mychild>
</myroot>

<myroot>
  <mychild id="789">
    <fruit>orange</fruit>
    <test hello="greeting" brackets="round">
      <number>111</number>
    </test>
    <dates>
      <accessed>44</accessed>
      <modified>123</modified>
      <created>253</created>
    </dates>
  </mychild>
  <mychild id="123">
    <test question="answers" hello="world" brackets="angled"/>
    <comment>This is a comment</comment>
    <fruit>apple</fruit>
  </mychild>
  <mychild id="456">
    <fruit>banana</fruit>
  </mychild>
</myroot>
I needed to compare some large XML files, which have big differences in the order of elements, and I couldn’t find a tool that would do the job. So I wrote a bit of Python to do it for me.
How it works
I cheated.
Diff tools are complex, and I was in a hurry, with no time to implement one.
Instead, to compare two of my XML files, my approach is to sort them both so they have a consistent order, and then diff the sorted files using an existing visual diff tool. (On Windows, I prefer vsdiff from SlickEdit. On Mac, I prefer diffmerge. My approach works with either of these.)
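A minimal sketch of the sorting step, to make the idea concrete. This is not the code from the gist: it uses the standard library's xml.etree.ElementTree rather than lxml, and the names sort_key, sort_element and sorted_xml are made up for illustration. It assumes Python 3.9 or later (for ET.indent) and XML without mixed content, since whitespace between elements is normalised.

```python
import xml.etree.ElementTree as ET


def sort_key(elem):
    """Order siblings by tag, then attributes, then text, then child keys."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        (elem.text or "").strip(),
        [sort_key(child) for child in elem],
    )


def sort_element(elem):
    """Recursively sort attributes and child elements in place."""
    # Re-insert attributes in name order; serialisation keeps
    # insertion order on Python 3.8+.
    attrs = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(attrs)
    # Sort each child's contents first, so sibling keys are stable.
    for child in elem:
        sort_element(child)
    elem[:] = sorted(elem, key=sort_key)


def sorted_xml(text):
    """Return a consistently ordered, consistently indented copy of the XML."""
    root = ET.fromstring(text)
    sort_element(root)
    ET.indent(root)  # normalises whitespace-only text between elements
    return ET.tostring(root, encoding="unicode")
```

Two files that differ only in element or attribute order should then produce identical text, so an ordinary line-based diff tool only shows the real differences.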
Example
For example, consider the following simple test files:
testA.xml
<myroot>
  <mychild id="123">
    <fruit>apple</fruit>
    <test hello="world" testing="removed" brackets="angled" question="answers"/>
    <comment>This is a comment</comment>
  </mychild>
  <mychild id="456">
    <fruit>banana</fruit>
    <comment>This will be removed</comment>
  </mychild>
  <mychild id="789">
    <fruit>orange</fruit>
    <test brackets="round" hello="greeting">
      <number>111</number>
    </test>
    <dates>
      <modified>123</modified>
      <created>880</created>
      <accessed>44</accessed>
    </dates>
  </mychild>
</myroot>
testB.xml
<myroot>
  <mychild id="789">
    <fruit>orange</fruit>
    <test hello="greeting" brackets="round">
      <number>111</number>
    </test>
    <dates>
      <accessed>49</accessed>
      <modified>123</modified>
      <created>253</created>
    </dates>
  </mychild>
  <mychild id="123">
    <test question="answers" hello="world" brackets="angled"/>
    <comment>This is a comment</comment>
    <fruit>apple</fruit>
  </mychild>
  <mychild id="456">
    <fruit>banana</fruit>
  </mychild>
</myroot>
On Mac, I run:
$ python xmldiff.py diffmerge testA.xml testB.xml
On Windows, I run:
$ python xmldiff.py vsdiff testA.xml testB.xml
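For the curious, here is a rough sketch of what the hand-off to the diff tool could look like. It is an illustration rather than the gist itself: write_temp_copy, build_diff_command and launch_diff are hypothetical names, and it assumes the diff tool accepts two file paths as positional arguments.

```python
import subprocess
import tempfile


def write_temp_copy(text, suffix=".xml"):
    """Write text to a named temporary file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
        return handle.name


def build_diff_command(tool, path_a, path_b):
    """Argument list for the diff tool (assumed form: tool fileA fileB)."""
    return [tool, path_a, path_b]


def launch_diff(tool, sorted_text_a, sorted_text_b):
    """Save both sorted documents to temp files and open them in the tool."""
    path_a = write_temp_copy(sorted_text_a)
    path_b = write_temp_copy(sorted_text_b)
    return subprocess.call(build_diff_command(tool, path_a, path_b))
```

The temporary copies are left on disk on purpose (delete=False), because the diff tool needs to read them after the script has written them.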
Source
The source showing how this works is available in a gist at
gist.github.com/dalelane.
It’s a quick hack to let me compare a handful of files, so it’s not been rigorously tested. But it’s a very simple little tool, and was good enough for my purposes tonight!


Thanks for this, Dale. We use a product that has the annoying knack of reordering attributes in its config files when an upgrade is applied, so on an initial diff it looks like a lot of things have changed.
On running xmldiff, however, we can see the important things that have changed (down to 3 different lines from about 80!).
One thing to note: I needed to pip install lxml before it would work. This was on a pretty much new install of OS X on the Mac, so a clean Python install.
Another thank you from me, Dale! I was about to write something similar, but found your elegant and simple solution that did exactly what I was planning to do (and more: I did not plan to directly integrate the diff tool, but why not!). Cheers!
That was quite informative! I am trying to do something similar too. But the XML files I’m working on are 4-5 GB in size, so an entire XML file won’t fit into memory. Will this method work for them? Or do you have any ideas for how to implement it?
Hiya – Sorry, no, I didn’t do it in a streaming way. I was in a hurry so just read it and sorted it in memory.