Genii Weblog

How to tell if a round trip (RT -> HTML -> RT) worked

Wed 15 Mar 2006, 11:01 PM

by Ben Langhinrichs
As I test Version 1.3 of CoexEdit and Version 3.61 of the Midas products (which share some of the same engine), I try to find ways to check whether conversions from Notes rich text to HTML, and from HTML back to rich text, work properly, but it tends to be a fairly manual job.  I also have some automated tools to convert either way, but those are mostly useful for detecting crash-type situations or corrupt data.  While I can easily convert each of the 40,000+ documents in the Business Partner forum for 2002, for example, or every HTML file in my browser cache, I can't manually look at each one to verify that it converted properly.  So, what can I do to at least reassure myself that the conversion worked in either direction?

Even for me, and even with Midas, it is very hard to compare two rich text fields and see if they are "essentially the same".  By that I mean that they would look and act the same, since two fields do not have to be identical internally to look and act the same.  There are numerous small differences, including different PABIDs, explicit versus implicit paragraph definitions, extended table flags that happen to be nonessential, etc.  So, comparing rich text directly to rich text is very difficult.

But comparing HTML to HTML is less difficult.  While technically you can have different HTML that looks the same, I can control for that since I am generating the HTML.  So, even though what I really want to ensure is that the rich text looks and acts "right", I am left comparing HTML.  Basically, if I start with rich text, convert to HTML, then convert back to rich text, then convert to HTML again, will the HTML produced by the second and fourth steps be the same?  This does not guarantee fidelity to the original rich text, but it does indicate a form of stability: if the HTML makes it through two iterations unchanged, it will likely not change no matter how many more iterations follow.  So, if I run my agent against the 40,000+ documents in the Partner Forum for 2002, I can readily discover the trouble spots where RT1 -> HTML1 -> RT2 -> HTML2 produces an HTML1 that differs from HTML2.  Then, I can look and see what caused the discrepancy.  It is a long, slow, difficult effort, but the benefits for CoexEdit and Midas users are pretty clear.  The smaller the set of discrepancies, the better, and I can then move on to the Designer Help db and other more difficult cases.  Hence the late nights, even when I am in the middle of moving office and home.
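The stability check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual CoexEdit/Midas agent: the converters `rt_to_html` and `html_to_rt` are hypothetical stand-ins supplied by the caller, and rich text is treated as an opaque value. The sketch runs RT1 -> HTML1 -> RT2 -> HTML2, treats identical HTML1 and HTML2 as stable, and otherwise keeps a unified diff so the discrepancy can be inspected later.

```python
import difflib
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical converter signatures; the real conversion engine is not shown
# in this post, so callers pass in whatever functions they actually have.
RtToHtml = Callable[[Any], str]
HtmlToRt = Callable[[str], Any]


def round_trip_diff(rt1: Any, rt_to_html: RtToHtml, html_to_rt: HtmlToRt) -> List[str]:
    """Run RT1 -> HTML1 -> RT2 -> HTML2 and diff HTML1 against HTML2.

    Returns an empty list when the HTML is stable across the round trip,
    otherwise the unified diff lines showing where it changed.
    """
    html1 = rt_to_html(rt1)
    rt2 = html_to_rt(html1)
    html2 = rt_to_html(rt2)
    if html1 == html2:
        return []
    return list(difflib.unified_diff(
        html1.splitlines(), html2.splitlines(),
        fromfile="HTML1", tofile="HTML2", lineterm=""))


def find_unstable(docs: Iterable[Tuple[str, Any]],
                  rt_to_html: RtToHtml,
                  html_to_rt: HtmlToRt) -> List[Tuple[str, List[str]]]:
    """Return (doc_id, diff) for every document whose HTML1 and HTML2 differ."""
    trouble = []
    for doc_id, rt in docs:
        diff = round_trip_diff(rt, rt_to_html, html_to_rt)
        if diff:
            trouble.append((doc_id, diff))
    return trouble
```

With toy converters (say, one that renders each paragraph as `<p>text</p>` and a deliberately lossy parser that drops empty paragraphs), only the documents containing empty paragraphs end up in the trouble list. Comparing exact strings relies on the generator always emitting one canonical form of HTML, as noted above; without that control, some normalization would be needed before the comparison.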

Copyright 2006 Genii Software Ltd.
