Genii Weblog

Odd finding with NotesDOMParser run repeatedly

Tue 12 Jun 2007, 01:14 PM



by Ben Langhinrichs
In a follow-up to my earlier post about OpenSesame and large word processing documents, I created an extra-large document (in honor of the OOXML specs, it is 6404 pages long) by taking the ODF specs, copying about 100 pages, and pasting them over and over until I was past 6,000 pages.  The resulting OpenDocBIG.odt is 4,439,906 bytes, and the unzipped content.xml is 42,789,688 bytes.

So I tried loading OpenDocBIG.odt with OpenSesame (again, it has to unzip the package, save the various XML files to disk, then load, parse, and delete them), and it took 12 seconds.  I tried the NotesDOMParser again; it took 105 seconds.  It still seems odd that the production parser in Notes takes nearly nine times as long, but neither choked on the large file, and neither seemed to leak any memory.

But here is where it gets odd.  I ran both again, without restarting Notes, just to see if I would get similar results.  OpenSesame still took 12 seconds, but the NotesDOMParser now took 215 seconds, a huge jump.  I ran them both again, still without restarting Notes or doing anything else, and this time OpenSesame took 11 seconds while NotesDOMParser took 215 seconds again.  There was still no indication that memory was running out or that anything else was going on.  I shut down Notes, restarted the PC, started Notes, and did it all again, with exactly the same result.  See the image below.  What in the world would lead to that kind of slowdown, especially in such a reproducible way?
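For readers who want to reproduce the measurement pattern outside of Notes, here is a minimal sketch of the same unzip-then-DOM-parse timing loop, using Python's standard library as a stand-in (NotesDOMParser and OpenSesame are LotusScript-side tools and are not reproduced here; the package built below is a small hypothetical substitute for OpenDocBIG.odt):

```python
import io
import time
import zipfile
import xml.dom.minidom as minidom

def make_package(num_paras):
    """Build a small zip archive with a content.xml inside, standing in
    for an .odt package (a hypothetical substitute for OpenDocBIG.odt)."""
    body = "".join(f"<text:p>para {i}</text:p>" for i in range(num_paras))
    xml = ("<office:document xmlns:office='urn:x' xmlns:text='urn:y'>"
           + body + "</office:document>")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml)
    return buf.getvalue()

def timed_parse(package):
    """Unzip content.xml and DOM-parse it; return elapsed seconds."""
    start = time.perf_counter()
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        dom = minidom.parseString(zf.read("content.xml"))
    elapsed = time.perf_counter() - start
    dom.unlink()  # release the DOM tree before the next run
    return elapsed

# Parse the same package several times in one process, as in the post,
# to see whether later runs slow down.
package = make_package(5000)
runs = [timed_parse(package) for _ in range(3)]
print(["%.3f" % t for t in runs])
```

The interesting part is comparing the entries of `runs`: with Python's minidom they should stay roughly flat, which is what makes the 105s-to-215s jump in NotesDOMParser so surprising.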

[Inline JPEG image]

Copyright 2007 Genii Software Ltd.

What has been said:


600.1. Alan Bell
(06/13/2007 07:25 AM)

sounds like an allocation of something (maybe memory maybe something else) which is done dynamically as the thing is parsed. The first time it runs it is being allocated fresh empty stuff, second time it runs there is more checking to see if the second-hand resource is available to use. I would suggest some good further tests would be to use NotesDomParser on a very small file (like 1 page) for the first run then the big file for the next run. Also try a 3000 page file for the first run with the 6000 page on the second run. Maybe if there was 3000 pages worth of second hand resources to deallocate and re-use it would add 55 seconds to the run time rather than 110.