Comparing XML files ignoring order of attributes and child elements

Monday, October 6th, 2014

I need to diff some XML files.

For these particular XML files, order is not important. The XML is being used to contain a set of things, not a list – the order of the elements has no significance. Similarly, the order of the attributes within each element isn’t significant.

For example, for my purposes, these two XML files are equivalent:

<myroot>
    <mychild id="123">
        <fruit>apple</fruit>
        <test hello="world" brackets="angled" question="answers"/>
        <comment>This is a comment</comment>
    </mychild>
    <mychild id="456">
        <fruit>banana</fruit>
    </mychild>
    <mychild id="789">
        <fruit>orange</fruit>
        <test brackets="round" hello="greeting">
            <number>111</number>
        </test>
        <dates>
              <modified>123</modified>
              <created>253</created>
              <accessed>44</accessed>
        </dates>
    </mychild>
</myroot>

<myroot>
    <mychild id="789">
        <fruit>orange</fruit>
        <test hello="greeting" brackets="round">
            <number>111</number>
        </test>
        <dates>
              <accessed>44</accessed>    
              <modified>123</modified>
              <created>253</created>
        </dates>
    </mychild>
    <mychild id="123">
        <test question="answers" hello="world" brackets="angled"/>
        <comment>This is a comment</comment>
        <fruit>apple</fruit>
    </mychild>
    <mychild id="456">
        <fruit>banana</fruit>
    </mychild>
</myroot>

I needed to compare some large XML files whose elements appear in very different orders, and I couldn’t find a tool that would do the job. So I wrote a bit of Python to do it for me.
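To give a feel for the approach, here is a minimal sketch of an order-insensitive comparison. It is an illustration only, not the script itself: the function names `canonical` and `xml_equivalent` are made up for this example, and it assumes that whitespace around element text is insignificant and that text following a closing tag (the "tail") can be ignored.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Return a nested tuple describing elem, independent of attribute and child order.

    Hypothetical helper for illustration; assumes element text can be
    whitespace-stripped and tail text ignored.
    """
    # Sort attributes by name so their order in the source does not matter.
    attrs = tuple(sorted(elem.attrib.items()))
    # Strip surrounding whitespace so indentation does not affect the result.
    text = (elem.text or "").strip()
    # Canonicalise children recursively, then sort them so sibling order does not matter.
    children = tuple(sorted(canonical(child) for child in elem))
    return (elem.tag, attrs, text, children)


def xml_equivalent(xml_a, xml_b):
    """Compare two XML strings as sets of elements rather than ordered lists."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


if __name__ == "__main__":
    first = '<r><c id="1" x="2"><f>apple</f><g/></c><c id="2"/></r>'
    second = '<r><c id="2"/><c x="2" id="1"><g/><f>apple</f></c></r>'
    print(xml_equivalent(first, second))
```

Under those assumptions, the two example documents above should compare as equivalent. A sketch like this only answers "same or different"; it does not report which elements differ, and it would need more care for mixed content or namespaces.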
