Ben Langhinrichs

Genii Weblog

Notes to Sharepoint or web - design vs data

Thu 31 May 2012, 09:40 AM

by Ben Langhinrichs
I've been doing custom work with a few customers who have needed to extract data from Notes databases, either to move it to Sharepoint or to preserve a snapshot of Notes data outside of Notes for archival and discovery purposes. I am planning on making this a more formal offering, but the most variable part is the export of views. Sometimes a view is just a collection of documents, and it adds no value once the documents themselves are exported. In other cases, the view has columns with complex formulas that are not worth reproducing externally, so the computed column values themselves need to be preserved. This is especially true in an archived snapshot, as it is harder to reproduce the formulas effectively in HTML than in Sharepoint.

Obviously, both must be possible, but the sad fact is that the people tasked with these projects often have neither the institutional knowledge nor the expertise to know how to choose. Therefore, I must make the best guess I can. Imagine a database with twenty views, ten of which contain all the documents in the database. If I simply export every view, the snapshot could easily be ten times larger than an export of the documents alone.
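To make that arithmetic concrete, here is a back-of-the-envelope sketch in Java (the language of the code in the comments below). It assumes, purely for illustration, that each exported view writes its own full rendering of every document it contains instead of linking to a single copy; the class and method names are hypothetical.

```java
/**
 * Hypothetical size estimate: documents exported once each, versus every
 * view writing its own full copy of each document it contains.
 */
final class SnapshotEstimate {

    // Each document exported exactly once, with no view renderings.
    static long documentsOnlyBytes(long docCount, long avgDocBytes) {
        return docCount * avgDocBytes;
    }

    // Naive approach: every view renders every document it holds in full.
    static long allViewsBytes(long[] docsPerView, long avgDocBytes) {
        long total = 0;
        for (long docs : docsPerView) {
            total += docs * avgDocBytes;
        }
        return total;
    }
}
```

With 1,000 documents of about 20,000 bytes each, ten views holding all of them and ten views holding 100 each, `documentsOnlyBytes` returns 20,000,000 and `allViewsBytes` returns 220,000,000, eleven times as much.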

Not really looking for a solution here, but I'd be happy to hear any thoughts. Also, if you might want to export some whole or partial databases for archival, snapshot or migration purposes, feel free to contact me to discuss your needs and see whether we can help.

Copyright © 2012 Genii Software Ltd.

What has been said:


1040.1. Don McNally
(05/31/2012 03:27 PM)

Sounds like you'd also need a design synopsis for reference. Seems like forms could be an issue as well for default values or computed values (as in "why does that field have that value in it?").


1040.2. Ben Langhinrichs
(05/31/2012 04:04 PM)

The forms are a somewhat similar problem, but not quite as bad. For each document I have to choose which form to use (usually obvious, but more difficult when there are form formulas or XPages), but so far I've never had to save any document more than once, and the views then link to the document. In theory the situation could be more complex, but in the real world a snapshot usually requires one rendering of each document, and a data export simply requires every document with all its fields.

Anything more complex and we have to start working on matching need with technology, which is why this is an "offering" and not a "product". The goal is to minimize the consulting needed, but you can't eliminate it.
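A minimal sketch of the per-document form choice Ben describes, assuming the usual Notes precedence: a view's form formula wins, then the form named in the document's Form item, then the database's default form when the stored form no longer exists. The class, method and parameter names are hypothetical, the precedence is an assumption rather than a documented rule quoted here, and XPages rendering is not modelled.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical per-document form selection for a rendered export. */
final class FormChooser {

    /**
     * @param formulaResult form name returned by a view's form formula, or null if none
     * @param items         the document's items; "Form" holds the stored form name
     * @param formsInDb     names of forms that actually exist in the database
     * @param defaultForm   the database's default form
     */
    static String chooseForm(String formulaResult, Map<String, String> items,
                             Set<String> formsInDb, String defaultForm) {
        // Assumed precedence 1: a form formula overrides the stored form.
        if (formulaResult != null && formsInDb.contains(formulaResult)) {
            return formulaResult;
        }
        // Assumed precedence 2: the form recorded in the document's Form item.
        String stored = items.get("Form");
        if (stored != null && formsInDb.contains(stored)) {
            return stored;
        }
        // Missing or deleted form: fall back to the default form.
        return defaultForm;
    }
}
```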


1040.3. Nathan T. Freeman
(05/31/2012 06:09 PM)

Since views can, for all practical purposes, only display information contained in the documents, why not simply export the documents as DXL and then export the view *designs* as DXL? The view design contains all the column formulas, which reference either 1) metadata functions visible in the document DXL; 2) @formula functions that manipulate field values or literals; 3) literals; or 4) field values, which you already have from the document export.

In fact, given that, you could recreate the index data at any time (short of time-dependent calculations) by simply importing the document contents and the view design into an NSF, and then opening the view.

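    import java.io.IOException;
    import java.io.Writer;

    import lotus.domino.Database;
    import lotus.domino.DxlExporter;
    import lotus.domino.NoteCollection;
    import lotus.domino.NotesException;

    public class NsfArchive {

        // Export every document plus all view and folder designs as a single DXL stream.
        public static void exportNsfArchive(Database db, Writer writer)
                throws NotesException, IOException {
            NoteCollection nc = db.createNoteCollection(false); // start with nothing selected
            DxlExporter exporter = null;
            try {
                nc.selectAllDataNotes(true); // all documents
                nc.setSelectViews(true);     // view designs, including column formulas
                nc.setSelectFolders(true);   // folder designs
                nc.buildCollection();        // populate the collection from the selections above
                exporter = db.getParent().createDxlExporter();
                writer.write(exporter.exportDxl(nc));
                writer.flush();              // the caller owns the writer and closes it
            } finally {
                if (exporter != null) {
                    exporter.recycle();      // release the back-end objects
                }
                nc.recycle();
            }
        }
    }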


1040.4. Ben Langhinrichs
(05/31/2012 06:56 PM)

The problem is, customers don't want DXL. If they did, they certainly wouldn't need my services/products. People want high-fidelity HTML, or CSV with embedded HTML or XHTML. If they just wanted a dump of the database, which is all DXL is, why not save the database itself?

As an example, for discovery purposes, various people will need to browse through the content but will not need to change it. They want a visual impression of the data in context, but without functionality, and they want it bundled together to store on DVD or external drive.

For export to Sharepoint, they want a CSV representation but without system fields or design elements that are non-essential. If they want to turn something into a document library in Sharepoint, they may want every attachment as a separate row with the rich text as HTML in its own row, all associated together but as separate rows in CSV for Sharepoint import. DXL doesn't really help with any of that. Sure, you could do it from DXL, but since you can do it from Notes directly, what's the point?
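A minimal sketch of the attachment-per-row layout described above, assuming a hypothetical four-column CSV (DocKey, RowType, Name, Content) in which a shared document key ties the rich-text row and the attachment rows together. The column names are illustrative only, not a SharePoint-defined import format. System fields are dropped using the Notes convention that internal items such as `$UpdatedBy` and `$File` start with a dollar sign.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical CSV layout for a SharePoint library import; not a SharePoint-defined format.
final class SharepointRows {

    static final String HEADER = "DocKey,RowType,Name,Content";

    // Drop Notes internal items, which conventionally start with '$' ($UpdatedBy, $File, ...).
    static Map<String, String> dropSystemFields(Map<String, String> items) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : items.entrySet()) {
            if (!e.getKey().startsWith("$")) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // RFC 4180-style field: wrap in double quotes and double any embedded quotes.
    static String csv(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // One row for the rich text rendered as HTML, then one row per attachment,
    // all sharing the same document key so they can be re-associated after import.
    static List<String> rowsFor(String docKey, String bodyHtml, List<String> attachmentNames) {
        List<String> rows = new ArrayList<>();
        rows.add(String.join(",", csv(docKey), csv("body"), csv(""), csv(bodyHtml)));
        for (String name : attachmentNames) {
            rows.add(String.join(",", csv(docKey), csv("attachment"), csv(name), csv("")));
        }
        return rows;
    }
}
```

Quoting every field keeps embedded commas, quotes and line breaks in the HTML from splitting a row.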


1040.5. Nathan T. Freeman
(05/31/2012 07:26 PM)

Sorry, I don't know anything about exporting the data to Sharepoint. I was simply referring to the general objective "...to preserve a snapshot of Notes data outside of Notes for archival reasons and for discovery reasons."


1040.6. Ben Langhinrichs
(05/31/2012 07:30 PM)

Nathan - That's fine, but I still don't understand how DXL will help, even for a snapshot or discovery. It assumes you have Notes available to import it back, so why not simply zip up a copy of the database?

There are plenty of good reasons to use DXL, but I don't really see how it helps with these goals; perhaps I was not very clear about the objectives. I appreciate your posting about DXL, as it is a good way of storing a selective portion of a database, for example.


1040.7. Nathan T. Freeman
(05/31/2012 07:33 PM)

You don't see how DXL would help discovery processes for Notes data?


1040.8. Ben Langhinrichs
(05/31/2012 07:51 PM)

Nathan - I can certainly see how somebody could write a discovery process using DXL. It is just that a lot of discovery is done by third-party vendors who don't "do Notes", and they will more easily be able to use a CSV version they can import into Excel or something else, or an HTML collection, simply because they already have tools for that.

I'm not trying to object to the use of DXL, which is very useful for many things, probably even discovery. It simply doesn't often address the needs of the clients I see (perhaps because they wouldn't be seeking me out at that point).


1040.9. Francisco Gallegos
(31/05/2012 20:45)

When you say "... they want a CSV representation but without system fields or design elements that are non-essential ...", I would say: give 'em an MS Excel export or a tabular text export. But when you say "... they may want every attachment as a separate row with the rich text as HTML in its own row, all associated together but as separate rows in CSV for Sharepoint import ...", I must say that it's complicated.


1040.10. Ben Langhinrichs
(05/31/2012 09:10 PM)

Francisco - When I first got that request, I thought it must be a very specific goal, but then a second customer needed it as well, so I guess it is more necessary than I thought. It has to do with how Sharepoint handles document libraries.

Fortunately, it is easy enough for me to do with Midas.


1040.11. Thomas Duff
(06/01/2012 04:28 AM)

Hi, Ben... if at some point you want additional "examples" of how a tool like that might be used, give me a call. As you know, we're migrating apps off of Notes onto SharePoint or other platforms, and we haven't even started to address the non-Notes archival of retired databases. As it sits right now, we either "obsolete" the database (replica to hub, non-admin access set to No Access) or "archive" it (replica to hub and non-admin access set to Reader). While we consider it a "success" to get an application into either of those two states, it doesn't mean we can unplug the last server. At some point, we'll have to address getting the archived databases into a format where the data is accessible in a human-readable form that makes contextual sense.

Oh, and of course, this won't cost anything, right? sigh...


1040.12. Tim Tripcony
(06/01/2012 09:18 AM)

I'm confused. How does a DXL export assume you'll be importing it back into Notes? DXL is just XML. I realize XML is a fringe format compared to, say, CSV, but rumor has it a few languages have the capacity to parse XML... perhaps Sharepoint already has that capacity, or will in a future release. If not, there's this new thing called XSLT that can supposedly take an XML file and convert it to any format you want... maybe that would be an option for producing your CSV.

(removes tongue from cheek)
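Tim's XSLT suggestion can be tried with nothing but the JDK's built-in `javax.xml.transform` API. This is a sketch under loud assumptions: the DXL input is a hand-written, heavily simplified fragment (real DXL exports carry far more elements and attributes), and the item names Subject and Author are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DxlToCsv {

    // XSLT 1.0: one CSV line per DXL <document>, pulling two (hypothetical) named items.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:dxl='http://www.lotus.com/dxl'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:text>Subject,Author&#10;</xsl:text>"
      + "<xsl:for-each select='//dxl:document'>"
      + "<xsl:value-of select=\"dxl:item[@name='Subject']/dxl:text\"/>"
      + "<xsl:text>,</xsl:text>"
      + "<xsl:value-of select=\"dxl:item[@name='Author']/dxl:text\"/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hand-written, heavily simplified DXL; real exports carry far more markup.
    static final String SAMPLE =
        "<database xmlns='http://www.lotus.com/dxl'>"
      + "<document form='Memo'>"
      + "<item name='Subject'><text>Hello</text></item>"
      + "<item name='Author'><text>Ben</text></item>"
      + "</document>"
      + "</database>";

    // Apply the stylesheet to a DXL string and return the text output.
    static String transform(String dxl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(dxl)), new StreamResult(out));
        return out.toString();
    }
}
```

The stylesheet does not quote values, so fields containing commas or quotes would need extra escaping, which is awkward to express in XSLT 1.0.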


1040.13. Ben Langhinrichs
(06/01/2012 01:04 PM)

Tom - I'll contact you separately. It may be less expensive than you think.

Tim - Software designers love to explore the possible, while business owners head out to purchase the available, eh?


1040.14. Dave Armstrong
(06/01/2012 10:38 PM)

We took a different approach for many of our apps.

We created an 'All Documents' view and exported every document to a Word file on a LAN share via LotusScript. We used data fields and/or categories to place them automatically into a folder structure that made sense to the business, and we detached any attachments into that same structure.

We could then simply upload that entire structure into a SharePoint library, if it was "live" content. Or we could just leave it on the LAN share for archival purposes.

On a side note -- although we never actually did it, we discussed adding some key fields from the Notes data as metadata fields within SharePoint. That would enhance searching of the data and let you minimize the folder structure within your document libraries.
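Dave's team built its folder structure from data fields or categories in LotusScript; the Java sketch below illustrates only the path-building step, assuming a category value such as `Sales\East` (Notes uses a backslash to separate subcategories). The class name is hypothetical, and the character list covers only the printable characters Windows rejects in file names.

```java
import java.nio.file.Path;

final class FolderMapper {

    // Replace characters Windows rejects in names; never emit "", "." or "..".
    static String sanitize(String segment) {
        String cleaned = segment.replaceAll("[<>:\"/|?*]", "_").trim();
        if (cleaned.isEmpty() || cleaned.matches("\\.+")) {
            return "_";
        }
        return cleaned;
    }

    // Notes separates subcategories with a backslash, e.g. "Sales\East".
    static Path folderFor(Path root, String category) {
        Path folder = root;
        for (String part : category.split("\\\\")) {
            folder = folder.resolve(sanitize(part));
        }
        return folder;
    }
}
```

Mapping `.` and `..` to `_` keeps a stray category value from resolving outside the export root.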