what’s the difference?

One of the problems with using a real VCS for normal-human-being content management is that nobody understands diffs. Even for those who do, the standard line-based differencing algorithms are useless or problematic for most of the valuable use cases. Here is a brief survey of the options for those use cases.

  • XML/XHTML differencing. DaisyDiff seems to address this problem. XML dialects rule structured documents these days, but line-based comparison gives you bad results and often produces malformed documents. This good survey of XML differencing got me to DaisyDiff, and it links to other relevant issues.
  • Image differencing. Perceptual Diff compares raster images. Image diff is a nicely simple PHP image differencer. I can find nothing on higher-level raster image differencing (comparing layers and whatnot) and nothing at all about comparing vector images (though SVG might be amenable to XML differencing).
  • Video differencing. Not much here. Some papers, but no open-source software I can find (not a surprise, as most of what I can find is related to digital restriction management and copyright enforcement). Current takeaway: if the videos in a collection aren’t all in the same format, you’ve not a prayer.
  • PDF differencing. Adobe offers PDF differencing, but it is not clear whether it can readily be used in a server environment. The other tools I could find were Windows-only and not geared towards generating the differences as a product in itself.
  • “Office document” differencing. There are some scripts for OpenOffice document differencing. I suspect that using a good XML differ would be sufficient.
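
To make the XML complaint above concrete, here is a minimal sketch in Python using only the standard library. It is not how DaisyDiff works; the `tree_diff` helper, the sample documents and the path notation are all made up for illustration. The two documents differ only in whitespace, attribute order and one character of text, yet the line-based diff marks every line as changed, while a naive tree walk reports the single real edit.

```python
import difflib
import xml.etree.ElementTree as ET

# Two hypothetical versions of one document. NEW has been pretty-printed,
# has its attributes reordered, and has one real text change.
OLD = '<doc><p class="a" id="1">Hello</p><p>World</p></doc>'
NEW = '<doc>\n  <p id="1" class="a">Hello</p>\n  <p>World!</p>\n</doc>'


def line_diff(old, new):
    """Standard line-based unified diff, as a VCS would show it."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))


def tree_diff(old, new, path=None):
    """Naive structural comparison of two elements (illustrative only).

    Compares tag, attributes and stripped text, then recurses into children
    pairwise by position. It ignores tail text and does not detect moved or
    inserted nodes, which a real XML differ would have to handle.
    """
    here = path or "/" + old.tag
    if old.tag != new.tag:
        return [(here, f"tag {old.tag!r} -> {new.tag!r}")]
    changes = []
    if old.attrib != new.attrib:  # dict comparison, so attribute order is irrelevant
        changes.append((here, "attributes changed"))
    old_text, new_text = (old.text or "").strip(), (new.text or "").strip()
    if old_text != new_text:
        changes.append((here, f"text {old_text!r} -> {new_text!r}"))
    old_kids, new_kids = list(old), list(new)
    for i, (a, b) in enumerate(zip(old_kids, new_kids)):
        changes.extend(tree_diff(a, b, f"{here}/{a.tag}[{i}]"))
    if len(old_kids) != len(new_kids):
        changes.append((here, f"child count {len(old_kids)} -> {len(new_kids)}"))
    return changes


if __name__ == "__main__":
    print("\n".join(line_diff(OLD, NEW)))
    for where, what in tree_diff(ET.fromstring(OLD), ET.fromstring(NEW)):
        print(where, what)
```

Even this toy version shows where the work is: the comparison itself is short, and the hard part is turning a list of changed nodes into something a non-programmer can read.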

General differencing

Current Conclusion

  • Biggest takeaway: constrain differencing to supported formats. I’m pretty close to convinced that we should not offer the ability to manage documents in unsupported formats at all, given how much the user experience of version control degrades when no differencing ability is present. Transcoding all input into supported formats for the purpose of differencing is another option.
  • Most of the rich document comparison that chit (can has it together) should do can be done via XML. The devil, as it were, is in making meaningful user interfaces to the difference files thus generated.
  • PDF differencing can be bought, but no good OSS solution appears to exist. UI issues would still apply.
  • Video differencing is a rough field. It is probably possible to buy algorithm engines for a big chunk o’ change. UI is an issue that I haven’t seen addressed anywhere.
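
The first takeaway above (accept only formats we can diff, or transcode into one) could be enforced at intake with something like the sketch below. Everything in it is an assumption for illustration: the extension tables, the differ names and the `plan_intake` function are hypothetical, and no actual transcoding is performed.

```python
from pathlib import Path

# Hypothetical table: file extension -> name of the differ that handles it.
SUPPORTED = {
    ".xml": "xml",
    ".xhtml": "xml",
    ".svg": "xml",
    ".png": "image",
}

# Hypothetical table: unsupported extension -> supported extension to transcode into.
TRANSCODE = {
    ".bmp": ".png",
}


class UnsupportedFormat(ValueError):
    """Raised when a file can be neither diffed nor transcoded."""


def plan_intake(filename):
    """Decide what to do with an incoming file, based only on its extension.

    Returns (action, stored_extension, differ_name), where action is
    "store" or "transcode". Raises UnsupportedFormat otherwise.
    """
    ext = Path(filename).suffix.lower()
    if ext in SUPPORTED:
        return ("store", ext, SUPPORTED[ext])
    if ext in TRANSCODE:
        target = TRANSCODE[ext]
        return ("transcode", target, SUPPORTED[target])
    raise UnsupportedFormat(f"no differ available for {ext or 'files without an extension'}")


if __name__ == "__main__":
    for name in ["report.XHTML", "scan.bmp", "talk.mov"]:
        try:
            print(name, plan_intake(name))
        except UnsupportedFormat as err:
            print(name, "rejected:", err)
```

Rejecting at the door is the blunt version of the policy; the transcoding table is the softer one, at the cost of storing something other than what the user uploaded.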