Thursday, October 25, 2012

Web scraping

And yet I just can't get past thinking about Web scraping as something fun and profitable.

If it could just be simplified, a lot.  So I'm thinking again about declarative means of describing the "shape" of a site in terms of where the useful data is - and I'm coming up empty.  Again.
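For what it's worth, the most naive version of a declarative "shape" is easy enough to sketch; it's everything past this point that gets hard. Below is a minimal Python sketch using only the standard library. The `SHAPE` dictionary, the class names, and the sample HTML are all made up for illustration and don't come from any real site; the idea is just that the description is data, and one generic extractor interprets it.

```python
from html.parser import HTMLParser

# Hypothetical declarative "shape": field name -> (tag, CSS class) where the data lives.
SHAPE = {
    "name": ("h2", "church-name"),
    "city": ("span", "city"),
}


class ShapeExtractor(HTMLParser):
    """Collects the text of every element that matches an entry in the shape."""

    def __init__(self, shape):
        super().__init__()
        self.shape = shape
        self.current = None  # field currently being captured, if any
        self.results = {field: [] for field in shape}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.shape.items():
            if tag == want_tag and want_class in classes:
                self.current = field
                self.results[field].append("")

    def handle_endtag(self, tag):
        # Stop capturing when the matched element closes (no nesting support).
        if self.current and tag == self.shape[self.current][0]:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.results[self.current][-1] += data


def extract(shape, html):
    """Apply a shape to an HTML string and return {field: [values...]}."""
    parser = ShapeExtractor(shape)
    parser.feed(html)
    parser.close()
    return {f: [v.strip() for v in vals] for f, vals in parser.results.items()}


# Invented sample markup, standing in for a fetched page.
SAMPLE = """
<div class="entry"><h2 class="church-name">St. Example</h2><span class="city">Berlin</span></div>
<div class="entry"><h2 class="church-name">Sample Chapel</h2><span class="city">Munich</span></div>
"""

print(extract(SHAPE, SAMPLE))
```

This says nothing about pagination, following links, logins, or markup that doesn't carry convenient class names, which is exactly where the declarative approach keeps falling apart for me.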

The only way to get my mind around it is to build some Web scrapers.  Elance is not going to be an interesting place to find challenging scraper specifications, so I'm going to have to look at the ones on ScraperWiki and go from there.

Oh, ho! The Mechanize Cookbook is replete with interesting examples.  I shall start there.

Update: Those seem boring and old.  Instead, I've subscribed to the ScraperWiki mailing list, where people post scraping requests for anyone to pick up.  Here's a cool one already: find all the churches in Germany, with lots of links to start with.  So yeah.
