Genii Weblog

My favorite sort of QA - performance enhancing ideas

Tue 3 Jan 2006, 11:09 AM

by Ben Langhinrichs
Among the many posts on the Notes Gold forums, the ones I really enjoy are those where a question is asked about coding, and a series of answers homes in on an especially good way to solve the problem.  Today, for example, Patrick Mens asked this question:
I have a database with about 150,000 documents and a textfile with about 1000 lines. The text on each line is something like 123415=90712.

I have created an lotusscript agent which creates a collection of all 160,000 documents. Gets each document one by one and compares a value of 1 field of that document with all lines in the textfile. When there is a match the field is changed. This is repeated for each document

I have run this agent with a test version of the database and it takes about 2 hours and 30 minutes to run through all documents. 

Is there a way to make this agent go faster? I can imagine that doing some changes in the code will make the processing go much faster.
To start with, Patrick asked the question in a reasonable manner, which is always pleasant.  While he did not post the code until a later post, he explained the scenario well.

Then the fun began.  A series of suggestions were made, by myself and others, all of which were reasonable, and a few of which were eye opening.  In roughly chronological order, it was suggested that:
  1. The text file be read into a list or array once, so that the file is not re-processed for every document (both Alan Bell and I suggested this)
  2. GetNthDocument be avoided and GetNextDocument used instead (Vilhjalmur Helgason suggested this)
  3. A view be walked instead of a collection, and ColumnValues(i) be used instead of opening the document (Vilhjalmur Helgason suggested this too)
  4. The document not be saved unless a change was made (my idea after seeing Patrick's code, which called Save every time)
  5. A subcollection be made using a view and GetAllDocumentsByKey, and the StampAll method then be used on it, so that the work is done once for each of the roughly 1000 text file entries rather than once per document (John Buragas came up with this)
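
To make suggestions 1 and 4 concrete, here is a minimal sketch of the logic. It is written in Python rather than LotusScript and does not touch the Notes API: documents are modeled as plain dictionaries, and the function names and the field name "Code" are hypothetical, not taken from Patrick's agent. The point is only that the text file is parsed once into a lookup table, and that a save is counted only when a value actually changes.

```python
def parse_mapping(lines):
    """Turn lines such as '123415=90712' into a dict, once, up front."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        old, new = line.split("=", 1)
        mapping[old.strip()] = new.strip()
    return mapping


def update_documents(docs, mapping, field="Code"):
    """Change the field only where a mapping exists; return the saves needed."""
    saves = 0
    for doc in docs:  # a single forward pass over the documents
        new_value = mapping.get(doc.get(field))  # one lookup, no file re-read
        if new_value is not None and new_value != doc[field]:
            doc[field] = new_value
            saves += 1  # stand-in for a Save call; unchanged documents skip it
    return saves


if __name__ == "__main__":
    mapping = parse_mapping(["123415=90712", "", "555=777"])
    docs = [{"Code": "123415"}, {"Code": "999"}, {"Code": "555"}]
    print(update_documents(docs, mapping), docs)
```

With a lookup table, each document costs one dictionary lookup instead of a scan over all 1000 lines, and documents with no match are never written back.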

Each suggestion seemed sensible, and each successive idea will probably lead to a significant increase in speed. Patrick asked the original question at 9:10am, and Alan Bell had posted a bit of code showing how to accomplish John's suggestion by 10:41am.  I would venture that the StampAll method will be a huge performance winner, but even if it is not, the previous suggestions are all excellent.  This is an excellent example of how a "self help" forum can work at its best.  Cheers to all involved, and I look forward to hearing back from Patrick on what the results were.  I certainly learned something.
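
For the curious, the shape of John's idea can be sketched the same way. This is again Python with documents as dictionaries, not LotusScript: the dictionary built by the hypothetical build_index stands in for a view sorted on the field, and the inner loop stands in for stamping a keyed subcollection in one call. The names and the "Code" field are assumptions for illustration only.

```python
from collections import defaultdict


def build_index(docs, field="Code"):
    """Group documents by field value, standing in for a sorted view."""
    index = defaultdict(list)
    for doc in docs:
        index[doc.get(field)].append(doc)
    return index


def stamp_by_key(docs, mapping, field="Code"):
    """For each mapping entry, fetch the matching group and stamp it once."""
    # The index is built before any change, so each document is stamped
    # at most once even if the mappings chain (a=b and b=c).
    index = build_index(docs, field)
    stamped = 0
    for old, new in mapping.items():  # about 1000 iterations, not 150,000
        group = index.get(old, [])    # stand-in for a keyed lookup in a view
        for doc in group:             # stand-in for one stamp on the group
            doc[field] = new
        stamped += len(group)
    return stamped


if __name__ == "__main__":
    docs = [{"Code": "a"}, {"Code": "b"}, {"Code": "a"}, {"Code": "z"}]
    print(stamp_by_key(docs, {"a": "b", "b": "c"}), docs)
```

The outer loop runs once per line of the text file, so the number of lookups no longer depends on how many documents the database holds.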

Copyright 2006 Genii Software Ltd.

What has been said:

415.1. Andre Guirard
(01/04/2006 06:05 AM)

Another point: if processing documents one at a time (instead of using StampAll), use the Delete statement to remove them from memory when you're finished with each one; otherwise they accumulate and clog up your memory, which can cause slowness.

415.2. Ben Langhinrichs
(01/04/2006 08:51 AM)

Excellent point. I am thinking that I might be able to make a mini-article with the more generic case in mind (e.g., for n documents where n is very large, check value of field f against choices and replace with matching values) and address each of these more completely. It would cover when you should consider each approach, as a cascading set of possibilities. I'll let you know if I do.