Ben Langhinrichs


©2003-2008 Genii Software Ltd. All rights reserved. The rights to all logos, images, etc. are owned by their respective owners.

The views expressed in this weblog are mine alone, but since I am President and Owner, they necessarily also reflect the views of Genii Software, Ltd.
Genii Weblog

Rich Text 101 - Rich Text Itself

Tue 7 Oct 2003, 11:59 PM


Wait!  Don't adjust your picture!  I really did mean to title this article "Rich Text Itself".  Instead of talking about a specific rich text construct, I thought I'd address the idea of rich text itself, and especially where it is used in Lotus Notes/Domino.  I should warn you up front: this may be the most technical and confusing of the Rich Text 101 topics, but for completeness' sake, it needs to be covered.  Don't worry, there won't be a quiz at the end.

Where is rich text used?
Rich text is often thought of as the contents of a rich text field, such as the body of a mail message or forum post or an article such as this.  While true, this misses the full scope of rich text and its importance to Notes/Domino.  Whereas the "note" is the fundamental building block of the Lotus Notes database structure, with every document and design element made up of a note of one type or another, "rich text" is the fundamental building block used inside the "note".  Forms are notes with a special rich text field called $Body, which contains the design of the form.  Similarly, Subforms and Pages have a $Body rich text field containing their design, Image Resources have a $ImageData rich text field containing the image data, File Resources and Style Sheets have a $FileData rich text field containing the file resource data, Frame Sets have a $FrameSet rich text field containing the frame set data, Shared Fields have a $Body rich text field with the field definition, and even Shared Actions have a $Body rich text field with design data.  Other design elements such as Agents, Applets, Navigators, Outlines and Views store their data in other ways rather than in rich text fields, but you can see that rich text is used widely.
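To make the list above concrete, here is a small C lookup table pairing each design element with the rich text item that holds its design. The structure and function names are hypothetical and for illustration only; they are not part of the Notes C API, and the item names are simply the ones listed in this article.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration (not a Notes API structure): which rich text
   item holds the design of each element type, as listed in the article. */
typedef struct {
    const char *design_element;
    const char *rich_text_item;
} DesignItem;

static const DesignItem kDesignItems[] = {
    { "Form",           "$Body" },
    { "Subform",        "$Body" },
    { "Page",           "$Body" },
    { "Image Resource", "$ImageData" },
    { "File Resource",  "$FileData" },
    { "Style Sheet",    "$FileData" },
    { "Frame Set",      "$FrameSet" },
    { "Shared Field",   "$Body" },
    { "Shared Action",  "$Body" },
};

/* Returns the rich text item name, or NULL for elements such as Agents,
   Views or Outlines that store their design some other way. */
const char *rich_text_item_for(const char *design_element) {
    size_t count = sizeof kDesignItems / sizeof kDesignItems[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(kDesignItems[i].design_element, design_element) == 0) {
            return kDesignItems[i].rich_text_item;
        }
    }
    return NULL;
}
```

For example, rich_text_item_for("Frame Set") returns "$FrameSet", while rich_text_item_for("View") returns NULL because Views do not keep their design in a rich text field.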

What is rich text, really?
What we call "rich text" is known internally as "composite data records" or "CD records".  It is merely a stream of record structures, each with a standardized header that marks where the record begins, what type of record it is, and how many bytes it uses.  Some record structures are self-contained, while others have data stuffed after the record in the stream.  For example, a CDTEXT record holds text, but the record itself is only 8 bytes: the header containing the "signature" and length, plus the font information (face, attributes, color and point size).  The actual text is kept in the stream after the CDTEXT record.
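Here is a minimal C sketch of walking such a stream and pulling out the text that trails each text record. It assumes a simplified layout: each record starts with a 2-byte little-endian signature followed by a 2-byte length that covers the header, the fixed fields and any trailing data, and a text record carries 4 bytes of font information after its header. The signature values and function names are made up for this example; the real Notes C API defines its own constants, and the real format has details this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature values for this sketch only; they are not the
   constants defined in the Notes C API headers. */
enum { DEMO_SIG_PARAGRAPH = 0x01, DEMO_SIG_TEXT = 0x02 };

/* Assumed sizes: 4-byte header, then 4 bytes of font info in a text record. */
enum { HEADER_SIZE = 4, TEXT_FIXED_SIZE = 8 };

/* Reads a 16-bit little-endian value. */
static uint16_t read_u16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Copies the text trailing every text record into out (NUL-terminated).
   Returns the number of text bytes copied, or -1 if the stream is malformed
   or out is too small. */
int collect_text(const unsigned char *stream, size_t size,
                 char *out, size_t out_size) {
    size_t pos = 0;
    size_t used = 0;
    if (out_size == 0) {
        return -1;
    }
    while (pos + HEADER_SIZE <= size) {
        uint16_t sig = read_u16le(stream + pos);
        uint16_t len = read_u16le(stream + pos + 2);
        if (len < HEADER_SIZE || pos + len > size) {
            return -1; /* length is impossible or runs past the stream */
        }
        if (sig == DEMO_SIG_TEXT && len >= TEXT_FIXED_SIZE) {
            size_t text_len = (size_t)len - TEXT_FIXED_SIZE;
            if (used + text_len >= out_size) {
                return -1; /* no room for the text plus a terminator */
            }
            memcpy(out + used, stream + pos + TEXT_FIXED_SIZE, text_len);
            used += text_len;
        }
        pos += len; /* the length field says where the next record starts */
    }
    if (pos != size) {
        return -1; /* leftover bytes too short to be a record */
    }
    out[used] = '\0';
    return (int)used;
}
```

Feeding it a 4-byte paragraph record followed by an 11-byte text record (header, 4 bytes of font info, then "Hi!") yields the string "Hi!". Notice that the reader never needs to understand the paragraph record; the length alone tells it where the next record begins.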

Why are structures used instead of a more standard format?
The reason for this storage format is that it is extremely easy and fast to load the bytes into actual records in memory.  Since CD records are a completely proprietary storage mechanism, documented but owned by IBM and under their complete control, there is no need for the generalized data structures inherent in "standard" formats such as HTML or XML.  Of course, as computers have become faster and standards have become available, there is a lot of push to use more standard formats to store data.  One of those formats is MIME/HTML (see my Rich Text 101 article on MIME/HTML).  Another option would be XML.
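To illustrate why this is fast, the C sketch below loads a record with a single memcpy into a fixed-size structure, with no parsing at all. The structure is hypothetical and shaped like a text record (assumed here: a 2-byte signature, a 2-byte length and 4 bytes of font information). It only gives correct values when the host's layout and byte order match the stored bytes, which is exactly the limitation that moving data between platforms raises.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record shaped like an 8-byte text record; the field names
   are illustrative, not the Notes C API declarations. */
typedef struct {
    uint16_t signature;
    uint16_t length;
    uint32_t font_id;
} DemoTextRecord;

/* The fields are naturally aligned, so mainstream compilers add no padding. */
_Static_assert(sizeof(DemoTextRecord) == 8, "record must occupy 8 bytes");

/* The "fast path": when the host layout already matches the stored bytes,
   loading a record is one copy. */
DemoTextRecord load_record_fast(const unsigned char *bytes) {
    DemoTextRecord rec;
    memcpy(&rec, bytes, sizeof rec);
    return rec;
}

/* Returns 1 on a little-endian host, where the fast path is valid for
   little-endian stored data. */
int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host, loading the bytes 02 00 0B 00 78 56 34 12 gives a signature of 2, a length of 11 and a font_id of 0x12345678; on a big-endian host the same copy would produce different numbers.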

How can record structures be used on other operating systems?
It may occur to some to wonder how this works across operating systems.  Compilers on different systems lay out record structures differently: some pack fields byte-aligned, others pad them to word boundaries, and integer sizes and byte order vary as well.  This brings up one of the oddest aspects of CD record storage.  The actual format is the packed, byte-aligned, little-endian format used on Intel machines running Windows and OS/2 (the original OS for Notes).  On those systems, the records can be used directly.  On other operating systems such as AS/400 and AIX, a conversion must happen to move the data into appropriately sized and positioned structures.  This adds a small amount of overhead, but since the conversion happens just once, as the form or field is read into memory, it is not a major issue.  The bytes are shifted, expanded or otherwise forced into the appropriate record structure, which can then be used as is.  This is mostly only an issue for API developers, but as more people try to use the C API from LotusScript, it is an important consideration.  
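A minimal sketch of that conversion step, assuming the stored layout is 8 packed little-endian bytes (2-byte signature, 2-byte length, 4-byte font information). Each field is assembled from individual bytes, so the result is the same whatever the host's byte order or padding rules. The names here are hypothetical; the Notes C API provides its own conversion routines (ODSReadMemory, for example), which real code should use instead.

```c
#include <stdint.h>

/* Host-side record; the compiler may pad it however the platform requires. */
typedef struct {
    uint16_t signature;
    uint16_t length;
    uint32_t font_id;
} HostTextRecord;

/* Reads a 16-bit little-endian value regardless of host byte order. */
static uint16_t get_u16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Reads a 32-bit little-endian value regardless of host byte order. */
static uint32_t get_u32le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Converts one stored (packed, little-endian) record into the host layout,
   field by field, at the cost of a little extra copying. */
HostTextRecord convert_record(const unsigned char *stored) {
    HostTextRecord rec;
    rec.signature = get_u16le(stored);
    rec.length = get_u16le(stored + 2);
    rec.font_id = get_u32le(stored + 4);
    return rec;
}
```

Unlike a raw memcpy, this version decodes the bytes 02 00 0B 00 78 56 34 12 into a signature of 2, a length of 11 and a font_id of 0x12345678 on any host, which mirrors the one-time conversion cost described above.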

What happens when record structures change in future releases of Notes?
This touches on one of the most amazing features of Notes.  From the very beginning, the design of Notes has ensured that all rich text, both as design and as content, is backward and forward compatible.  Rich text created in Notes R1 can be read, edited and rendered to HTML by Domino.  Similarly, a Notes R1 client can read and display rich text created in Notes 6, albeit only with whatever structures were available in R1.  This is unprecedented in the database world, where data has to be normalized and restructured for every table change.  Microsoft does a horrible job of reading Word 97 documents in Word 2000, and there is no way for Word 97 to read Word 2000 documents.  Similar things can be said about almost every database or word processing product or company.

While this is a phenomenal feat, the way it is accomplished is also the key to its greatest weakness.  Every time a major change is made to rich text, another level of redundancy or complexity is added.  Prior to R5, tables were allowed in rich text, but could never be nested.  When nested tables were added in R5, a different set of CD structures and CD "signatures" (the ID that identifies the type of record) had to be added so that the inside tables would be ignored by earlier versions of Notes.  Instead of a fairly simple series of a CDTABLEBEGIN, multiple CDTABLECELLs and a CDTABLEEND, there can now also be CDNESTEDTABLEBEGIN, CDNESTEDTABLECELL and CDNESTEDTABLEEND records.  In addition, when tabbed tables were implemented in R5, a new CDPRETABLEBEGIN record had to be added to contain the additional information, since the CDTABLEBEGIN record did not have enough room.

While these are easily handled by the rich text editor (what you see when you open a rich text field or form in the Notes client), they are not so easily handled by either third-party products or LotusScript classes.  Even in Notes 6.5, the NotesRichTextNavigator class cannot see or handle nested tables, because internally it was implemented to recognize CDTABLECELL as the start of a table cell, but doesn't know how to handle CDNESTEDTABLECELL.  What is worse, if a rich text field is inside a table on the form, any table in that field is made entirely of CDNESTEDTABLECELLs, so none of its tables can be seen by the NotesRichTextNavigator.
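The C sketch below is a deliberately simplified illustration of two points from this discussion; it is not the NotesRichTextNavigator's (or Notes') actual code. It assumes each record starts with a 2-byte signature and a 2-byte total length, and it uses made-up signature values. First, a reader can step over any record it does not recognize by using its length, which is how older releases can ignore nested-table records. Second, a reader that only recognizes the original cell signature silently skips every cell stored as a nested-table cell.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature values for this sketch only; the real CD record
   signatures are defined in the Notes C API headers. */
enum {
    DEMO_TABLE_BEGIN = 10, DEMO_TABLE_CELL = 11, DEMO_TABLE_END = 12,
    DEMO_NESTED_TABLE_BEGIN = 20, DEMO_NESTED_TABLE_CELL = 21,
    DEMO_NESTED_TABLE_END = 22
};

/* Reads a 16-bit little-endian value. */
static uint16_t demo_u16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Counts table cells. With knows_nested == 0 the reader only recognizes the
   original cell record, so nested-cell records are skipped like any other
   unknown record. Returns the count, or -1 if the stream is malformed. */
int count_table_cells(const unsigned char *stream, size_t size,
                      int knows_nested) {
    size_t pos = 0;
    int cells = 0;
    while (pos + 4 <= size) {
        uint16_t sig = demo_u16le(stream + pos);
        uint16_t len = demo_u16le(stream + pos + 2);
        if (len < 4 || pos + len > size) {
            return -1;
        }
        if (sig == DEMO_TABLE_CELL ||
            (knows_nested && sig == DEMO_NESTED_TABLE_CELL)) {
            cells++;
        }
        /* Everything else, recognized or not, is stepped over by length. */
        pos += len;
    }
    return pos == size ? cells : -1;
}
```

Given an outer table with one ordinary cell that contains a nested table with two cells, the "old" reader (knows_nested == 0) reports 1 cell while the nested-aware reader reports 3. Neither one breaks, which is the compatibility win, but the old reader simply never sees the nested cells.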

By the way, I don't mean to suggest that our Midas Rich Text LSX has not also had to adjust.  Midas was created for R4.1, and updated for R4.5 and R4.6 as they came along, but it took eight months of development to handle the changes for R5, even though the Midas methods looked exactly the same in R5 as in R4.6.

The addition of new record structures to handle new features sometimes means that Notes is less efficient about storage than it would be if it were designed from scratch now.  Take a rich text field created in R4.6 and re-save it in R5, and it often grows by about 15% to 20%, just from the additional structures.  The growth in ND6 is less significant, more like 5% to 10%, but that is on top of the growth from R5.  Rich text is starting to look pretty bloated.  Unfortunately, there is little that can be done unless IBM wants to give up the wonderful strength of backwards compatibility.
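If the two growth figures compound, as the paragraph suggests (the ND6 growth comes on top of the R5 growth), a field first saved in R4.6 and re-saved in ND6 would end up roughly 21% to 32% larger. This is simple arithmetic on the rough percentages above, not a measurement:

```latex
\underbrace{1.15 \times 1.05}_{\text{low end}} \approx 1.21,
\qquad
\underbrace{1.20 \times 1.10}_{\text{high end}} = 1.32
```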

A quick quiz
Just kidding.  While this topic is complex and messy, it should help to explain some of the oddities of rich text, such as how it manages to work between releases, and why it keeps growing.  As for the puzzles this doesn't solve, I address many of those in my other Rich Text 101 articles, and I encourage you to read them as well, and to watch for future articles.  Let me know if there are topics you want to see, or anything you want covered in more (or less) depth.  Cheers!

What's been said:

58.1. Ned Batchelder (10/08/2003 05:45 AM)

The main reason structures are used rather than a "more standard" format: none of those formats existed when Notes began.

Best backward compatibility story ever: at Iris, there is a Notes discussion database called "Iris Office Notes" that has been in continuous use for something like 15 years. The first topic is by Ray, and is entitled "Email forwarding works!" or something like that. At one point during R5 development, someone noticed that if you looked at that topic with the latest R5 build, all of the text was running vertically down the left side of the screen, as if it all had to wrap to one character. Rather than say "that's really old data, who cares?", the bug was assigned to a developer to figure out what was going on. In the end, they discovered that the rich text in that topic was invalid, because it had been created with a pre-V1 Notes client, so nothing could be done. But they looked into it, and would have changed R5 if V1 data didn't display properly!

58.2. Ben Langhinrichs (10/08/2003 06:54 AM)

Excellent point about the formats, and great story! It really is amazing to me that IBM/Lotus has stuck to their guns on this issue.

58.3. Andrew (03/16/2005 06:32 PM)

Hello,

I was wondering if there is a way to extract Lotus Notes emails as rich text files that retain the original look and feel of Lotus Notes emails as seen inside Lotus. Is this possible?

Thank you,

Andrew

58.4. Ben Langhinrichs (03/16/2005 07:04 PM)

We at Genii Software have done a great deal of work on that exact topic using our Midas Rich Text tools. Take a look at some of the different formats we support, including Microsoft Internet Explorer Web Archive format (*.mht) with our Export to MIME sample database, or our Export to HTML/XHTML, which preserves more than just the appearance of the original rich text, because separate documents with links to each other can be exported together and the links still work from HTML file to HTML file. None of this will work perfectly, because whatever you do is a translation from one storage and rendering format to another, but we are working hard to do the best job possible. Remember, perfection is not a destination, it is a direction.

58.5. Sandra (11/19/2005 04:04 PM)

I feel very limited by rich text in Lotus Notes. I use Midas to search for an rtchunk and replace it with something else, but I can't, because along with the text the rtchunk contains carriage returns, so the ReplaceText fails. Why would any company want to stick with a technology that is so limited? I can find the index numbers of the beginning and end of an rtchunk but cannot remove that piece. Very limiting.

Sandra

58.6. Ben Langhinrichs (11/19/2005 06:49 PM)

Sandra, I am a bit confused as to whether you are objecting to rich text or to the Midas way of handling it. You might want to take the latter issue up in the Support Forum, as it sounds like it would be quite easy to do what you want, but with a different set of methods/commands. You can certainly replace a chunk with something else, but those are embedded paragraphs, not carriage returns, so you are not replacing text at all. In any case, I'd be happy to help if you provide more details in the Support forum.

58.7. sushant upadhyay (03/30/2006 10:25 PM)

It is really good, but I was searching for a rich text field problem (hide-when). Some of the content of a single rich text field is hidden in some documents and not in others. How can I fix it?

58.8. karl eliason (10/09/2007 09:01 AM)

The POST 9/26/7

In docA I have a button which composes docB. All similarly named fields are created correctly except the RT field.

I have found many entries on this forum about copying RT fields but none about what to do when trying to compose.

It does not seem to matter if the field in docB is Editable or Computed.

Thanks for your attention.

**********************************************************************************************************

FROM Stan Rogers..

You are probably working with unsaved rich text on "docA".

The text, etc., in the rich text editor is not transferred to the back-end document as it's entered (as it is for summary-type fields such as text, date-time, and so forth). Using Formula Language alone, you'd need to save docA before composing docB in order for inheritance to work properly.

In LotusScript, you would be able to update the back-end RT field without saving first.

Thanks.

58.9. Gordon Bentley (04/03/2008 06:50 PM)

Is there a way to remove all hide-whens from a rtitem? I've been appending one rtitem to another as a history field. The first one has a hide-when which ends up in the History.

Thanks

58.10. Ben Langhinrichs (03/04/2008 07:10 PM)

You could probably do it with DXL, but I use our Midas Rich Text LSX. See the Working on Hide-When sample for an example.


Copyright © 1996-2008 Genii Software Ltd. All Rights Reserved. Some images courtesy of BigFoto.com