
How I downloaded all Lotusphere 2009 presentations in 30 minutes

Tags: Lotusphere
Read on to see what tool I used when I downloaded all 179 presentations from Lotusphere 2009 online.
Last minute update: Just before I published this post, I reran the technique and discovered that IBM has altered the design of the pdf.nsf database. The views in the database are now hidden, so ordinary web spidering no longer works. I decided to publish anyway so you know about the technique, so read on if you are curious! If you still want to download the presentations, you now need to first collect the URLs from the agenda database and then grab the PDF files using their exact URLs. The downloading itself can be programmed in many ways, for example with Java agents or external tools. I recall that some script-based tools have been available previously.
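For the "exact URLs" route, a minimal Python sketch could look like the following. It is only an illustration, not the tool I used: the function names are made up for this example, and it assumes the server answers with an HTTP Basic authentication challenge. A Domino server can also be set up for form-based session login, in which case this approach would need to be adapted.

```python
import os
import urllib.request
from urllib.parse import urlsplit, unquote


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. 'AD204.pdf'."""
    path = urlsplit(url).path
    return unquote(path.rsplit("/", 1)[-1])


def make_opener(base_url, user, password):
    """Create an opener that answers HTTP Basic auth challenges for base_url.

    Assumption: the site uses HTTP Basic authentication.
    """
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base_url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler)


def download_all(urls, opener, dest_dir):
    """Fetch every URL in urls, save it under dest_dir, and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, filename_from_url(url))
        with opener.open(url) as response, open(target, "wb") as out:
            out.write(response.read())
        saved.append(target)
    return saved
```

To use it, you would build an opener with your own conference user ID and password, collect the PDF URLs from the agenda pages into a list, and pass both to download_all together with a target directory.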

First of all, you need your Lotusphere 2009 online user ID and password to access the online agenda. If you examine the URLs in the agenda, you see that the URL of a presentation looks like this:

https://www.ls09.info/confapps/pdf.nsf/0/F0EC6E9DB3D731B58525752700781D38/$FILE/AD204.pdf

In other words, the PDFs live in the database pdf.nsf. Below you see the URL that appears when I hover over the AD204.pdf link on a page:

A picture named M2
(Click on the image to see a larger one)
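To make the structure of such a URL explicit, here is a small Python sketch that splits it into its parts. The layout it assumes (database path, a view placeholder such as 0, a 32-character hexadecimal document ID, then $FILE and the attachment name) is read off the single example URL above, and the function name is made up for this illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed layout: /<path>/<db>.nsf/<view>/<32 hex chars>/$FILE/<filename>
ATTACHMENT_PATH = re.compile(
    r"^(?P<database>/.+?\.nsf)/(?P<view>[^/]+)/(?P<doc_id>[0-9a-f]{32})"
    r"/\$FILE/(?P<filename>[^/]+)$",
    re.IGNORECASE,
)


def parse_attachment_url(url):
    """Return the parts of an attachment URL as a dict, or None if it does not fit."""
    match = ATTACHMENT_PATH.match(urlsplit(url).path)
    return match.groupdict() if match else None
```

Applied to the AD204.pdf URL, this yields /confapps/pdf.nsf as the database, 0 as the view, the long hexadecimal string as the document ID, and AD204.pdf as the filename.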

It turns out that the PDF database is also available via plain http and not only via https. This is good news for the tool I use, Teleport Pro from Tenmax. Teleport Pro is a general-purpose web spider that costs only $49, which is not expensive for what it does. Below I walk you through how I created a Teleport Pro project that grabbed all 179 presentations in approximately 30 minutes.

A picture named M3
(again, click the image to see a larger one)

In the screenshot above, you see Teleport Pro when it is first opened. Click File -> New Project Wizard to continue. This brings up the following dialog box:

A picture named M4

I select Search a website for files of a certain type and click the Next button.

A picture named M5

I enter the URL http://www.ls09.info/confapps/pdf.nsf as the starting point of my spider session. Note that I use http, not https. You need the much more expensive Teleport Ultra if you really need to spider https! I also tell Teleport Pro to let the spider go 10 levels deep. The default of 3 levels is probably enough, but just to be sure... Click the Next button.

A picture named M6

Now it's time to specify what file types I want to retrieve. Click the Add button in the screen above to see the following drop-down:

A picture named M7

Select User defined... to define your own:

A picture named M8

Note that I specify PDF files by using the pattern *.pdf. Click the OK button.
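If you script the download yourself instead, the same kind of wildcard filter can be sketched in Python as shown here. The function name is made up for this example, and matching without regard to upper or lower case is my own assumption, not a statement about how Teleport Pro treats the pattern.

```python
from fnmatch import fnmatchcase


def matches_file_type(filename, pattern="*.pdf"):
    """Return True if filename matches the wildcard pattern, ignoring case."""
    # Lowercasing both sides makes the comparison case-insensitive on every platform.
    return fnmatchcase(filename.lower(), pattern.lower())
```

With this filter, AD204.pdf and AD204.PDF are both accepted, while a page such as index.html is skipped.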

A picture named M9

Note how the file type has been added. It is now important to enter your Lotusphere 2009 online username and password in the Account and Password fields. Otherwise the spider won't be able to grab the files. Click the Next button.

A picture named M10

And now you are finished! Click the Finish button.

A picture named M11

You now select a name for your Teleport Pro project. Note that Teleport Pro also creates a directory with the same name, and the grabbed PDF files are saved to that directory. Click the Save button.

A picture named M12

Now you see your project in Teleport Pro. You are ready to start grabbing files by pressing the Start button in the toolbar.

You will now spider the site from the starting URL and retrieve all the PDF files in a snap!
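For the curious, the general idea behind such a spider can be sketched in a few lines of Python. This is not how Teleport Pro is implemented, only a simplified illustration of a depth-limited crawl that stays on one host and collects links ending in .pdf. All names are made up for this example, the page-fetching function is supplied by the caller, and the way depth is counted here (link hops from the start page) may differ from how Teleport Pro counts its levels.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_pdfs(start_url, fetch_html, max_depth=3):
    """Breadth-first crawl from start_url and return the PDF URLs found.

    fetch_html is a caller-supplied function that maps a URL to its HTML text.
    Pages up to max_depth link hops away from start_url are fetched and scanned.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pdfs = []
    while queue:
        url, depth = queue.popleft()
        parser = LinkCollector()
        parser.feed(fetch_html(url))
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link = urldefrag(urljoin(url, href)).url
            if link in seen or urlsplit(link).netloc != host:
                continue  # already handled, or outside the starting site
            seen.add(link)
            if urlsplit(link).path.lower().endswith(".pdf"):
                pdfs.append(link)  # record the file, do not parse it as a page
            elif depth + 1 <= max_depth:
                queue.append((link, depth + 1))
    return pdfs
```

The returned list only contains URLs. Actually saving the files, and logging in with your username and password, would have to be added on top.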

As a closing note, this spidering technique can be used to grab files and other information from many other sites as well. Finally, remember to respect copyright and each site's terms of use when you use this technique.
