Genii Weblog

Perils of PDF 4: Missing and obscured data

Mon 9 Sep 2019, 10:21 AM

by Ben Langhinrichs
Last week, I started a series about certain constraints on exporting or archiving Notes data to PDF. There has been a lot of chatter recently about exporting to PDF, a feature that may be supported natively in HCL Notes V11, and that is offered as an archiving solution by some consultants and vendors. 
The PDF format itself is great for certain use cases, but it has some limitations by its very nature and others that stem from expectations. As I said in my first post, PDF is seen as being a little like an image. This is wrong on two counts. The first is that the data is not pixel perfect by any means. The second is that, to the extent that an image shows what is visible, it does a lousy job of revealing what is not visible.
In this post, I show three different instances where data is lost or obscured in different ways. The first part covers missing data. With caption tables, the captions (or titles) are missing, thus losing context. With sections that have not been expanded, the title is there but the contents are missing. And of course, all the attachments and doclinks are non-functional in any case. The second part covers obscured data. With a numbered list in a table (not tweaked in any way to try to get this result), part of the numbers is missing. Please note, these examples are only representative of other missing or obscured data in PDF rendering.
This is the fourth of eight primary issues. Depending on what vendor or driver you use, a few of these may have at least a partial solution, but they are good items to check when validating your approach. A table of contents for all the issues is at the bottom of this post.
4a) Missing data
Portion of a rich text field with a caption table and a section.
PDF rendering. Note that the caption titles are missing, and the section title is all that remains.
Rendered by the Midas LSX to HTML. The first image shows the field as it opens, while in the second I have clicked on the Q3 caption and section title to open them.
After clicking the Q3 caption and section title (both are clickable with our HTML rendering).
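The difference between an export that captures only what is visible and one that preserves collapsed content can be illustrated with a minimal, hypothetical Python sketch. This is not how the Midas LSX or any PDF driver actually works. The function names and the use of the HTML `<details>`/`<summary>` elements are illustrative assumptions. The first function drops the body of a collapsed section, much as a static page image does. The second keeps the body in the output and lets the reader expand it with a click.

```python
from html import escape


def render_visible_only(title: str, body: str, expanded: bool) -> str:
    """Hypothetical 'what is visible' export: a collapsed section
    contributes only its title, so the body is lost from the output."""
    html = f"<p>{escape(title)}</p>"
    if expanded:
        html += f"<div>{escape(body)}</div>"
    return html


def render_collapsible(title: str, body: str, expanded: bool) -> str:
    """Hypothetical collapsible export: the body is always emitted,
    and <details>/<summary> lets the reader open or close it."""
    open_attr = " open" if expanded else ""
    return (
        f"<details{open_attr}><summary>{escape(title)}</summary>"
        f"<div>{escape(body)}</div></details>"
    )


if __name__ == "__main__":
    # A collapsed section with placeholder text, rendered both ways.
    print(render_visible_only("Q3", "Details for Q3", expanded=False))
    # prints "<p>Q3</p>" (the body is gone)
    print(render_collapsible("Q3", "Details for Q3", expanded=False))
    # prints "<details><summary>Q3</summary><div>Details for Q3</div></details>"
```

Because the collapsible version always carries the body, nothing is lost just because a section happened to be closed at export time.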
4b) Obscured data
Portion of a rich text field with a numbered list inside a table (same wide table as before, though I don't show as much).
PDF rendering. Note that the second page looks like it starts a new numbered list, but its first item is really number 11. I have also noted where data on the left is cut off, as shown in the previous post.
Rendered by the Midas LSX to HTML. 
Table of Contents (will be updated as the blog series continues)
Perils of PDF 1: Attachments
Perils of PDF 2: Doclinks
Perils of PDF 3: Wide Tables and data loss
Perils of PDF 4: Missing and obscured data
Perils of PDF 5: Data Confusion
Want to try out our Midas LSX export for yourself? Simply fill out the online evaluation request, and we'll get you started. There's no cost to seeing it for yourself.

Copyright 2019 Genii Software Ltd.

What has been said:

1101.1. Stephan Wissel
(09/09/2019 10:28 AM)

One way to make archival to PDF "revision robust" is to use PDF's native capability to store custom XML properties. Nothing stops you (apart from doubled storage requirements) from storing the DXL representation as metadata, possibly stripping out the attachments.

1101.2. Ben Langhinrichs
(09/09/2019 05:01 PM)

Stephan - That could be done, but that only addresses the issue of getting back to where you started. People who want to archive and see the archive without Notes, whether because they are moving to web only or are a customer or whatever, won't be able to tell what the original data was even if it is stored as DXL. It needs to be visible in a non-Notes context or it is fairly useless for most use cases. - Ben