The XML differencing algorithm implemented in the Compare tool in XMLmind XML Editor - Online Help may be described as follows:
Begin by comparing the root element of the original document to the root element of the revised document.
If the element in the original document (let's call it the original element) and the element in the revised document (let's call it the revised element) have the same serial number, then compare their contents. Otherwise consider that these elements are completely different.
Trivially compare the attributes of the original element to the attributes of the revised element.
The child nodes of an element are converted to a sequence of comparable items prior to be compared:
A text item is added to the sequence for each word contained in the element[7].
A serial number item is added for each child element contained in the element.
A comment item is added for each XML comment contained in the element.
A processing-instruction item is added for each processing-instruction contained in the element.
An inclusion directive item is added for each range of included nodes contained in the element.
Example:
<p>The <i>quick <b>brown</b></i> fox jumps over the <b>lazy</b> dog.</p>
gives:
"The ", element_7223, " fox", " jumps", " over", " the ", element_10087, " dog."
The sequence of items of the original element is compared to the sequence of items of the revised element using the well-known Unix diff algorithm[8] here applied to comparable items rather than to text lines:
Two text items are equal if they contain exactly the same text.
Two serial number items are equal if they have the same serial number.
Two comment items are equal if they contain exactly the same text.
Two processing-instructions items are equal if they have the same target and contain exactly the same text.
Two inclusion directive items are equal they have exactly the same XML contents. For example, <xi:include href="vars.xml" xpointer="copyright"/>
and <xi:include xpointer="copyright" href="vars.xml"/>
are equal, while <xi:include href="vars.xml" xpointer="copyright"/>
and <xi:include href="vars.xml" xpointer="notice"/>
differ.
Compare each child element of the original element to the child element element having the same serial number in the revised element. Proceed as explained starting from the Compare attributes step.
Notes:
The above algorithm is fast and 100% accurate by design.
The comparison of attributes, comments and processing-instructions is not as fine-grained as the comparison of elements. For example, if attribute class
is "ui-widget
" in the original element and "ui-widget ui-state-highlight
" in the revised element, the algorithm will tell you that attribute class
has changed. It will not tell you that word "ui-state-highlight
" has been added at the end of attribute class
.
Included contents (also called transcluded contents) found in the original element and in the revised element are never compared. Instead, the corresponding inclusion directives (xi:include
, DITA conref
, etc) implicitly[9] found in the original element and in the revised element are compared.
[7] If the element has or inherits xml:space="preserve"
, a text item is added for each text line contained in the element.
[8] "A file comparison program" by Webb Miller and Eugene W. Myers, 1985.
[9] By default, inclusion directives are always transcluded by XXE. Hence such directives do not really exist in the document being edited. Instead, they are recreated by XXE each time the document is saved to disk.