30 June 2005

Good reading...

My interests these days are around searching, indexing, and processing large data sets...datamining on steroids to some extent. Think terabytes not gigabytes, think hundreds of thousands of files instead of thousands, and that's getting to the scale that interests me. Parallel programming, grid processing, and all those new (old) technologies are a big part of making that sort of processing scalable and affordable, especially for small companies. Since I've been in a research/learn/academic state of mind this summer, here's part of my current reading list:

Hariri, Parashar et al., "Tools and Environments for Parallel and Distributed Computing" (Wiley, 2004)
Chakrabarti, Soumen, "Mining the Web: Discovering Knowledge from Hypertext Data" (Morgan Kaufmann, 2003)
Morse, H. Stephen, "Practical Parallel Computing" (Academic Press, 1994)

The "Tools and Environments" book is a solid overview/review. The Chakrabarti book is fascinating...I can nit a bit about it being dated (AltaVista was the BIG search engine when the book was written), but the presentation of algorithms and the depth of detail are impressive...a real "page turner" in a geeky sort of way. I bought "Practical Parallel Computing" back in 1994 when I was prepping for an interview with Thinking Machines (remember them?). The material is a bit dated, but if you view "Tools and Environments" and "Practical" as a combined skim-through/review, it's worth the time.

I have really enjoyed reading Chakrabarti thus far...not yet sure what to do with all the ideas coming out of the reading, but I've got some interesting thoughts.
