Ben Langhinrichs

Photograph of Ben Langhinrichs
E-mail address - Ben Langhinrichs







Recent posts

Tue 7 Nov 2017

CoexLinks Migrate 4.10 released



Thu 2 Nov 2017

Time to Migrate



Thu 26 Oct 2017

Modernizing your Data


July, 2017
SMTWTFS
      01
02 03 04 05 06 07 08
09 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31

Search the weblog





























Genii Weblog


Civility in critiquing the ideas of others is no vice. Rudeness in defending your own ideas is no virtue.


Fri 21 Jul 2017, 03:56 PM
Most IBM Notes/Domino customers who have used the product for a number of years have vast stores of data, but when they want to try to glean new insights, they are stymied by how to handle the data mining. Simple fields which map well to views are easy to extract, and are often relatively "clean", meaning that the value is what the value says it should be. But real applications, especially those built for internal use, often reflect a far more complex set of relationships. They may use parent-child hierarchies, doclinks, lookups to other databases. They may also contain information stored in multi-value fields or rich text fields that require manipulation and cleanup. 
 
While there are a number of techniques available from DXL to data scraping, it can quickly become programming intensive to find information and put it together. With this in mind, we have built a fairly easy database using the Midas LSX engine to extract, correlate and prepare data from different sources and build a result which does not always have a one-to-one correspondence with Notes documents. The main virtue of this approach is the ease with which you can ask questions and put together sources. If you decide you have something wrong or need something else, it takes just a minute to remove or add it.
 
I wanted to show how this works with an existing application used over a period of years by fairly sophisticated Notes users. I chose as a source the IBM Business Partner forums, because they are  widely available and familiar. One of the different uses for these forums over several years was to allow partners to file Possible Bug reports, which IBMers could monitor and use to create SPRs and so forth. In this brief video, I pose five questions of this fairly simple application. Imagine how you could use a similar application to delve into your company's data. 
 
 
Note that I don't talk much in the video about data cleaning, but if you look at the image below, note that the column F (first red arrow) is derived automatically by Midas as a boolean from column G (second red arrow). We have some data cleaning built in as options, but are also looking at ways to provide custom data cleaning and normalization for individual items. While it is inevitable that some data cleaning will be done after the data is loaded into data analysis or data visualization software, the cleaner it can be the better, as 80% of all time doing data analytics is preparing and cleaning and normalization the data. We are eager to discuss with customers how we can minimize that costly effort.
 
 

Copyright © 2017 Genii Software Ltd.

Technorati tags: