
Part 1: Icon and Images database -or- How I store more than 1.8 million icon files in a single Notes database

Tags: Lotus Notes Software DXL LotusScript
You have probably heard that Lotus Notes isn't very capable of storing millions of documents. While that is mostly true, this article will show you how I store more than 1.8 million files in a single database. Perhaps these ideas can be used by your applications too?

As a developer I use different imagery in my solutions from time to time, such as icons, backdrops and web-related images for navigators, mastheads and whatnot. One of the resources I use is the huge icon collection from VirtualLNK, which consists of more than 1.8 million files. The icon collection is shipped as several huge zip files, which in turn consist of several zip files.

Below you see some of the libraries;
A picture named M2

And below you see what one of the libraries, Network_V2.zip, looks like when I just open the zip file with Explorer;
A picture named M3

How do you find what you are looking for, say a 24x24 GIF of some image? VirtualLNK provides a PDF with a dump of the different images, so you can find the name of the icon. Note that you have to search through several PDF files to look for icons! When the desired icon has been found, you have to dive into the first level of the respective zip file. Remember, the zip file contains other zip files, typically separated by file type, so you find GIF.zip, JPEG.zip, PNG.zip and so on. The challenge unfolds when you dive into the sub-levels of the contained zip files. On my super-duper fast laptop with 4 GB RAM, this takes a really long time, and makes it practically impossible to retrieve the desired icon. So what do I do then? Yes, I unpack all the zip files, and then unpack all the sub-zip files again, which eventually reveals all the 1.8 million files. Via standard file access and Windows Explorer, it's now pretty easy and quick to retrieve any icon file of any size and type. I have even indexed parts of the directory structure with the excellent ThumbsPlus tool, providing powerful search features combined with snazzy thumbnail display.
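The two-level unpacking described above can be automated. The following is only a minimal Python sketch, not the tool I actually used; the function name extract_nested and the choice to unpack each inner zip into a folder named after it (GIF.zip becomes a GIF folder) are assumptions made for illustration. It should only be run on archives you trust.

```python
import zipfile
from pathlib import Path


def extract_nested(zip_path, dest):
    """Extract zip_path into dest, then unpack every inner zip file too.

    Each inner archive (e.g. GIF.zip) is unpacked into a sibling folder
    named after it (e.g. GIF/) and the inner archive is then deleted.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)

    # Collect the inner zips first, so the tree is not modified while walking it.
    inner_zips = [
        p for p in dest.rglob("*")
        if p.is_file() and p.suffix.lower() == ".zip"
    ]
    for inner in inner_zips:
        # The recursion handles zips nested at any depth.
        extract_nested(inner, inner.with_suffix(""))
        inner.unlink()
```

Calling extract_nested("Network_V2.zip", "unpacked") would leave a plain directory tree under the unpacked folder, which is what Windows Explorer and ThumbsPlus can then browse.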

A picture named M4

Why change this successful recipe?

The challenge with 1.8 million files is that they have to be stored somewhere. I use an external disk for the moment, which of course also contains other files. When I need to search for files other than imagery, I have to be very careful to keep the imagery sub-directory structure out of reach of my normal file search tools. To illustrate this, say I'm searching for a Word document, and I just search the complete external disk. The file traversal of 1.8 million files will slow down the search for the Word document to an extent where I feel it takes too long. So to alleviate this, I have to be very aware of the whereabouts of the imagery directory structure to avoid overly long search times for other files. I have also used desktop search tools like X1 to index both imagery and other files. Unfortunately I don't think that X1 looks as nice as the ThumbsPlus display when looking at icons.

So, as always, each tool has its pros and cons!

How on earth did I come up with the idea of using Lotus Notes?! The initial reason was that the disk went bananas and all the files seemed to be a problem for NTFS. The chkdsk tool reported that I had approximately 10 million files while I only had 1.7 million, and file access really slowed down. By the way, I have always suspected that Windows file systems have some issues (or challenges ...) with too many files. Notice how fast your freshly installed Windows computer is? Use it for some months, install a lot of software, and bring in a bunch of files, and eventually you will get the "No, now it's time to reinstall Windows to regain the initial speed" feeling. How is that?! Why does a Windows computer always seem to deteriorate over time?! I believe the answer must be related to the file system itself, and some inability to keep indexes, FATs and whatnot optimized all the time. Combine this with the challenge of fragmented files that the file system also has to keep track of, and you have a file system degrading over time as a result. Enough ranting over Windows file systems! Onto Notes!

Lotus Notes can store enormous amounts of data. Each database can contain 64 GB, and you can have as many databases as you have disk space for! However, Notes has always had challenges with too many documents within a single database. This means that even if a single document can be of any size, Notes has more challenges when the number of documents reaches some hundreds of thousands or more. He he, it somehow seems like Lotus Notes may have similar challenges to Windows file systems, don't you think? Behind the scenes, the challenge with too many documents has to do with how the database has been designed and how many views the database has, and then again with how often the designer has set each view to be indexed. A database with many views and perhaps not-so-considered settings for the view indexing may therefore lead to intensive view indexing by the Notes client, or by the Domino server if the database is hosted on a server. So a Notes database storing 1.8 million icons doesn't initially seem like a good idea.

The Design of the Database

Storing 1.8 million files, each in a separate document within a Notes database, doesn't seem like a good idea, so I chose another angle. By analyzing the image libraries from VirtualLNK, I found that a single image existed in a multitude of formats, sizes and states. For example, the Accountant image could exist in the states Regular, Hot and Disabled, in all image formats such as GIF, BMP, TIF, JPG, PNG and ICO (icon files), and in sizes from 16x16 up to 256x256. In fact there were 78 different Accountant images within the huge directory structure!

My design is therefore to create a single Notes document for all 78 Accountant images, and then use application logic to retrieve the different image formats, states and sizes. Technically, the Notes database we are about to dive into consists of two Notes documents per unique image, where the first document represents the image itself and is what the end-user actually sees;

A picture named M5

In the image above you see the actual Notes document for the Accountant image. You see all the states, like Regular, Disabled and Hot (not shown in the image above, but it's there!), and all the available sizes. In my view it's much easier to get an idea of what the image looks like this way.

If you click on an image itself, the application logic will access the other associated Notes document to retrieve which image formats are available for the selected size. Below I have clicked on the 128x128 image and see the following dialog box;

A picture named M6

You can now select which image format you want and what you want to do with it, such as Extract as file or Copy to clipboard.

In other words, I have stored all the 78 image files in another Notes document simply as ordinary attachments. The application logic knows which images belong to which size and so forth.
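To illustrate the kind of lookup the application logic performs, here is a minimal Python sketch. The real database uses LotusScript and @formulas, and this article does not say how the attachments are actually named, so the State_Size.ext naming convention and the function formats_for below are purely hypothetical.

```python
def formats_for(attachment_names, state, size):
    """Return the image formats available for one state and size.

    Assumes (hypothetically) that attachments are named
    '<State>_<Size>.<ext>', e.g. 'Regular_128x128.gif'.
    """
    formats = set()
    for name in attachment_names:
        stem, _, ext = name.rpartition(".")
        if not stem:
            continue  # no extension, so not an image attachment
        if stem.split("_") == [state, size]:
            formats.add(ext.upper())
    return sorted(formats)


attachments = [
    "Regular_128x128.gif",
    "Regular_128x128.png",
    "Hot_128x128.gif",
    "Regular_16x16.ico",
]
print(formats_for(attachments, "Regular", "128x128"))  # ['GIF', 'PNG']
```

The result is what a dialog box like the one above would offer as the selectable image formats for the clicked size.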

By using this approach, I reduce the 1.8 million unique image files to approximately 18,000 unique Notes documents! 18,000 documents within a database is absolutely no problem for either the Notes client or a Domino server, and thus the application really performs!

Finally I need to search the database, and have created an ordinary Notes full-text index, ensuring that I don't index attachments;

A picture named M7

The pros of using a Notes database

By having a single database I have only one NSF file on the file system. This shouldn't wreck the file system index and FAT in any way!

The search power of Notes is available! I can now find any image in a blink and just as quickly retrieve the exact image format, size and state!

The images are now stored within a programmable application, where you easily can extend the logic in any way you want.

The standard Notes security model can be fully utilized, restricting the images to only the users allowed to work with them. This way you can perhaps better comply with the license agreements of the image vendors compared to a file system.

Finally, you can enable the image database for web-usage, again under the same security model as for the Notes client.

The cons of using a Notes database

The single NSF file is huge. The 1.8 million files themselves occupy approximately 19 GB of disk space. The Notes database is approximately 20-21 GB, so it introduces some overhead. This is mainly because the Notes documents displaying the imagery contain embedded images. In other words, the thumbnail view takes up some space! So do the view indexes themselves, but only by a very small amount.

The sheer size of the database may also impose other challenges. If Lotus Notes for some reason has to perform a so-called consistency check on the database, that will take a long time, perhaps hours! A consistency check is performed if Notes detects inconsistencies in the database in any way, and determines that those inconsistencies have to be fixed. This may for example happen if the Notes client crashes while you have the image database open.

Programming the database

Basically the database is an ordinary Notes database, with some agents to process the image files and create the Notes documents displaying the images. Only LotusScript and @formulas have been used to program the database.

Pass 1 - Enumerating the image files

The first job of the database is to actually find the unique image files, and determine all the different variations of each image (remember, the Accountant image had 78 different image files). Since the directory structure of VirtualLNK spreads the variants of a single image out over many different sub-directories, my logic is to assemble the variants as we go. For each unique image, I create the first Notes document and populate the Files field. When the first pass is finished, I have one Notes document for each unique image. At this point neither the embedded imagery in these documents nor the attachment documents have been created.
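The idea of assembling the variants as we go can be sketched in a few lines. This is a Python illustration, not the LotusScript agent from the database, and it assumes one hypothetical directory layout, State/Format/Size/Name.ext; the names Variant and enumerate_images are likewise invented for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import NamedTuple


class Variant(NamedTuple):
    state: str  # e.g. Regular, Hot, Disabled
    fmt: str    # e.g. GIF, PNG, ICO
    size: str   # e.g. 16x16
    path: str   # original file path


def enumerate_images(paths):
    """Group file paths by image name, one entry per unique image.

    Assumes the hypothetical layout '<State>/<Format>/<Size>/<Name>.<ext>'.
    """
    images = defaultdict(list)
    for raw in paths:
        p = PurePosixPath(raw)
        if len(p.parts) < 4:
            continue  # path does not match the assumed layout
        state, fmt, size = p.parts[-4:-1]
        images[p.stem].append(Variant(state, fmt.upper(), size, raw))
    return dict(images)
```

Each key in the returned dictionary corresponds to one "first" Notes document, and its list of variants corresponds to what ends up in the Files field.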

Also note that the VirtualLNK libraries have different directory structures, making the enumeration a bit more complicated. The current version of the database therefore enumerates only a single library at a time, and the user has to manually specify which of the directory-structure types is being enumerated. Below you see the dialog box presented to the end-user when the first pass is to be performed;

A picture named M8

By selecting what the current Base Import Folder looks like, we can ensure that the subsequent enumeration can parse the different image formats, states and sizes. VirtualLNK actually has 7 different directory structures! Not good in my opinion, but it only reveals that VirtualLNK probably has several contracted artists or sub-vendors of imagery, who have different standards for their images.

Future versions of the database may better analyze the directory structure to automatically detect which directory-structure type we have in play.

Pass 2 - Creating the embedded Notes imagery - and/or - attaching the files

Remember that the first pass only enumerates the files. When the user selects to process pass 2, he or she is presented with this dialog box;

A picture named M9

From the dialog box above, you can select whether you want to process selected documents in the current view -or- all documents in the database. This is nice if you just want to work with a subset of the documents.
Secondly, you choose which process passes you want to perform, where the first is Icon imagery (the tables with icons). This step will create the actual embedded imagery in the Notes document;

A picture named M10

The second step will attach the files to the second, associated Notes document, so the application logic actually works afterwards.

The next article will cover the programming details!
