Ben Langhinrichs

Photograph of Ben Langhinrichs
E-mail address - Ben Langhinrichs







Recent posts

Tue 7 Nov 2017

CoexLinks Migrate 4.10 released



Thu 2 Nov 2017

Time to Migrate



Thu 26 Oct 2017

Modernizing your Data


November, 2017
SMTWTFS
   01 02 03 04
05 06 07 08 09 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30

Search the weblog





























Genii Weblog

Asking questions to better understand your data

Fri 21 Jul 2017, 03:56 PM



by Ben Langhinrichs
Most IBM Notes/Domino customers who have used the product for a number of years have vast stores of data, but when they want to try to glean new insights, they are stymied by how to handle the data mining. Simple fields which map well to views are easy to extract, and are often relatively "clean", meaning that the value is what the value says it should be. But real applications, especially those built for internal use, often reflect a far more complex set of relationships. They may use parent-child hierarchies, doclinks, lookups to other databases. They may also contain information stored in multi-value fields or rich text fields that require manipulation and cleanup. 
 
While there are a number of techniques available from DXL to data scraping, it can quickly become programming intensive to find information and put it together. With this in mind, we have built a fairly easy database using the Midas LSX engine to extract, correlate and prepare data from different sources and build a result which does not always have a one-to-one correspondence with Notes documents. The main virtue of this approach is the ease with which you can ask questions and put together sources. If you decide you have something wrong or need something else, it takes just a minute to remove or add it.
 
I wanted to show how this works with an existing application used over a period of years by fairly sophisticated Notes users. I chose as a source the IBM Business Partner forums, because they are  widely available and familiar. One of the different uses for these forums over several years was to allow partners to file Possible Bug reports, which IBMers could monitor and use to create SPRs and so forth. In this brief video, I pose five questions of this fairly simple application. Imagine how you could use a similar application to delve into your company's data. 
 
 
Note that I don't talk much in the video about data cleaning, but if you look at the image below, note that the column F (first red arrow) is derived automatically by Midas as a boolean from column G (second red arrow). We have some data cleaning built in as options, but are also looking at ways to provide custom data cleaning and normalization for individual items. While it is inevitable that some data cleaning will be done after the data is loaded into data analysis or data visualization software, the cleaner it can be the better, as 80% of all time doing data analytics is preparing and cleaning and normalization the data. We are eager to discuss with customers how we can minimize that costly effort.
 
 

Copyright © 2017 Genii Software Ltd.

What has been said:

No documents found

Have your say:

Name *:
E-mail:
e-mail addresses will not be displayed on this site
Comment *:


<HTML is not allowed>
Linking: Add links as {{http://xxx|title}}, and they will be activated once approved
Blocked? Unable to post a comment? Please read this for a possible explanation...