Genii Weblog

To err is human, to forgive is... HTML?

Sun 13 Jul 2003, 11:04 PM



by Ben Langhinrichs
I have been working a lot with HTML recently, and not just because we have a new website design coming soon.  I have been working on our new HTML -> rich text -> HTML round-tripping logic in preparation for a new product which I have mentioned before.  I am reminded of how different HTML is from rich text, even though both may look the same on the outside, and much of the difference is not in the specs, which both show strict interpretations of how things should be, but in the implementations, which determine how things really are.

Here is an example.  I typed the following HTML into a file:

[Image: Cells in invalid HTML]

and displayed it in Internet Explorer, even though it is filled with errors.  The first table cell <td> doesn't have its closing </td>.  The final </table> is missing.  The <b> and <i> tags are nested incorrectly.  By all rights, this should be rejected outright, but here is the result from IE:
[Image: Cells rendered]
It looks just about like what you would expect if everything had been typed in correctly.  Now, just for grins, imagine trying this kind of thing with rich text.  I have tried, and it looks like a large red box, and then there's a coffee break while you reboot.  In fact, rich text is no more heavily specified than HTML, which says right in its specs about the <table> tag, for example: Start tag: required, End tag: required.  The difference is not the specs, it is the implementation.  The Notes client is downright picky about what it does with rich text records, which explains a lot (and I mean a whole lot more than is recognized) of the random crashes that the Notes client has.  And crashes aren't the end of it; there are also a lot of oddities, such as the out-of-line table cells when using merged cells, the wandering hide-whens which also plague tables in most versions of Notes (although it is claimed that this is fixed), etc. etc. etc.
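As a rough illustration of the forgiving approach described above, here is a minimal Python sketch that repairs the three kinds of errors mentioned: a missing </td>, a missing </table>, and misnested <b> and <i> tags.  The sample markup is a hypothetical stand-in, since the original HTML is not reproduced in this text, and the names ForgivingParser, normalize and IMPLIED_CLOSE are made up for the example.  It is not how IE or any real browser works; browsers apply far more elaborate recovery rules.

```python
import html
from html.parser import HTMLParser

# Hypothetical sample (an assumption, not the markup from the original post):
# first <td> never closed, <b>/<i> misnested, final </table> missing.
SAMPLE = "<table><tr><td><b><i>one</b></i><td>two</td></tr>"

VOID = {"br", "hr", "img", "input", "meta", "link"}
# A tiny, illustrative subset of the "this start tag implies that end tag" rules.
IMPLIED_CLOSE = {
    "td": {"td", "th"},
    "th": {"td", "th"},
    "tr": {"tr", "td", "th"},
}


class ForgivingParser(HTMLParser):
    """Re-emits HTML with every element properly closed and nested."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # currently open elements, outermost first
        self.out = []    # pieces of the repaired markup

    def close_through(self, index):
        # Emit end tags for every open element from the top down to `index`.
        while len(self.stack) > index:
            self.out.append(f"</{self.stack.pop()}>")

    def handle_starttag(self, tag, attrs):
        closes = IMPLIED_CLOSE.get(tag, set())
        target = None
        # Look for an open cell/row to close, but never beyond the table.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == "table":
                break
            if self.stack[i] in closes:
                target = i
        if target is not None:
            self.close_through(target)
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{html.escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the nearest matching open element along with anything left
        # open inside it; silently drop end tags that match nothing.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                self.close_through(i)
                return

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def normalize(markup):
    parser = ForgivingParser()
    parser.feed(markup)
    parser.close()           # flush any buffered text
    parser.close_through(0)  # close whatever is still open, e.g. <table>
    return "".join(parser.out)


print(normalize(SAMPLE))
# <table><tr><td><b><i>one</i></b></td><td>two</td></tr></table>
```

The design choice worth noting is that the sketch never rejects its input: a stray end tag is dropped, a missing one is supplied, and the output is always well-formed, which is the property a picky consumer such as a rich text renderer would need.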

So, while this is interesting, what does it have to do with the price of tea in China? Well, nothing really, but I don't get paid to buy tea in China.  I get paid to write software.  And this has a lot to do with software.

Too much software is written by programmers, for programmers, even if the intended audience is not programmers.  It is hard to place oneself in the mind of a non-programmer, but I think it is critical to try.  Clearly, the developers of the various web browsers were able to do this, whether through necessity or design, I don't know.  With our Midas Rich Text LSX, I made a huge effort to see things from the perspective of a non-programmer, in this case a power-user-or-just-barely-a-developer, and the whole object model is based on a user perspective.  Why worry about the insides of rich text when you can just point at the first row of every table and say, make it 12pt Bold Magenta?

This is not so easy when working with COEX! Links or the new HTML -> RT -> HTML product.  There is no object model, and there aren't even any scripters.  This software is supposed to work with no human interaction and no script, and just do what is necessary.  At best, a very few parameters can be set in the NOTES.INI, and that is on a server and not likely to change.  Magically, rich text is supposed to turn into the HTML recognized by eWebEditPro or EditLive! or any of the other web authoring tools, most of which don't have a clue about Lotus Notes/Domino, and certainly not about rich text.   Just as magically, the HTML they generate is supposed to turn back into rich text that is recognizable by the user and editable and so on.  Images which are stored as attachments by one tool may be referenced as separate documents by another, if either actually deals with images in Domino at all.  Now, if I controlled the web authoring tool, this would be fairly easy, as I could specify exactly where and how and when I stored the images or tables or attachments or links, but I don't control any of that.  I also don't control whether style sheets are accepted, rejected, allowed or even required, or whether cascading style sheets are used for authors vs readers, or whether links were originally doclinks that got converted to URL links and need to be converted back, or whether the <div>'s are really layers or paragraphs or whatever.  I have none of the control and too little foreknowledge of what the content will look like going in either direction.

But I do have a goal (the vision thing, if you will).  I have a goal that no matter what weird and ill-structured HTML is tossed at this software, it won't create rich text that crashes Notes.  I have a goal that no matter what odd and Notes-centric rich text is added, the HTML will do something rational and not too complex.  And finally, I have a goal that if you create the content in Notes rich text, then edit it again in a web content authoring system, then edit again in Notes, you won't have anything too, too far off what you started with.  Same thing if you start with the web authoring tool.  Is this a reasonable goal?  Is this a possible goal?  Should my name be Ben "Don Quixote" Langhinrichs?  We'll have to wait and see.   I'm sure the first release won't get all the way there, but after developing Midas for seven years, I know I have the patience and stamina to keep working on it.  Now, at least you know where I'm heading, to balance out where I currently am.

Copyright © 2003 Genii Software Ltd.
