Ben Langhinrichs

Genii Weblog

How "lossy" is your data conversion?

Wed 28 Sep 2005, 02:49 PM



by Ben Langhinrichs
The term lossy is often used in image compression to describe a format that retains most of the original but loses some detail along the way.  For example, JPEG is a lossy image compression format, which saves space but risks some diminution of fidelity.  The term lossy is also used for various other data compression formats, including ZIP and RAR.

But what about lossy data conversion?  In particular, "rich text" formats such as MS RTF, HTML, XHTML, MIME, Word and Notes rich text may all represent formatted data in ways that look alike but are different under the covers.  What is worse is that these formats almost all have capabilities, both visual and action-oriented, that do not match up.  Specifically, I spend a lot of time on Notes rich text to HTML/XHTML and HTML/XHTML to Notes rich text, especially with CoexEdit.  The obvious goal is to minimize the loss in any conversion, but how can you measure the loss?

The following three graphics might help explain what I mean.  The first is a Word document, which I then copied and pasted into the browser window for FCKEditor using CoexEdit.  I used a Firefox browser, but the result is similar with Internet Explorer.  After pasting into FCKEditor, I saved, which let CoexEdit do its bit of magic, then switched to my Notes client, where the final graphic is shown.

So, is this "good enough"?  I'm not sure.  There are a few slight differences, such as round bullets in MS Word becoming triangular bullets in FCKEditor and then back to round bullets in Notes, but that is probably just the presentation of the default bullet type.  There also seems to be a spacing issue after the Version 2.0, where an extra space has been added.  Are those good enough?  Only time (and our customers, who are always right) will tell.
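One rough way to put a number on this kind of loss is to compare the visible text before and after a round trip. The following is only a minimal sketch, not how CoexEdit works: the `visible_text` and `text_fidelity` helpers and the two sample fragments are hypothetical, and the score covers text only, so a formatting change such as a different bullet shape would not register at all.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the concatenated text found between the tags."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def text_fidelity(original_html, converted_html):
    """Return a similarity score from 0.0 to 1.0 for the text only."""
    a = visible_text(original_html)
    b = visible_text(converted_html)
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical fragments: the converted copy gained an extra space
# after "2.0" and a different bullet attribute.
before = "<ul><li>Version 2.0 released</li></ul>"
after = "<ul type='disc'><li>Version 2.0  released</li></ul>"

score = text_fidelity(before, after)
print(f"text fidelity: {score:.3f}")  # a little under 1.0 for this pair
```

A score of 1.0 would mean the text survived untouched. Anything lower flags a difference worth looking at, though it says nothing about whether a customer would consider the result good enough.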

MS Word (This is the original Word document, which I then copied)



FCKEditor using CoexEdit called from Firefox (I pasted the Word content in here)



Notes view of same document (CoexEdit handled auto-conversion from FCKEditor)

Copyright © 2005 Genii Software Ltd.

What has been said:


365.1. Nathan T. Freeman
(09/29/2005 01:16 AM)

Who used "lossy" when discussing ZIP or RAR files? They are lossless. MP3 is a lossy format that pretty much everyone knows now, though.


365.2. Ben Langhinrichs
(09/29/2005 10:25 PM)

Good point.


365.3. Stan Rogers
(10/03/2005 12:51 PM)

C'mon Nathan -- compressed files will fit on smaller disks and flash drives, which are quite a bit easier to lose than their larger brethren. Thus "lossy" compression. Don't you keep up with the literature?