Ooh. The "Readability" JavaScript tool munges a page to pull out its "content" (that poorly defined part of the HTML that humans actually read) and put it in a separate area for actual, well, reading, minus all the ads and links and sidebars and so on.
That algorithm has been ported to Perl as HTML::ExtractMain. So it's going into WWW::Declarative.
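For a feel of what "find the part humans read" can mean, here is a toy sketch in Python. It is not the Readability algorithm and not what HTML::ExtractMain does internally. It is just one simple heuristic I am assuming for illustration: credit each container with the paragraph text directly inside it, and count link text against it. The names `MainTextScorer` and `extract_main_text` are made up for this example.

```python
from html.parser import HTMLParser

# Elements treated as candidate "content" containers (an assumption of this sketch).
CONTAINERS = {"div", "article", "section", "td", "body"}


class MainTextScorer(HTMLParser):
    """Toy heuristic: score each container by the paragraph text directly inside it."""

    def __init__(self):
        super().__init__()
        self.stack = []    # indices into self.blocks for currently open containers
        self.blocks = []   # one {"id", "score", "text"} dict per container seen
        self.in_p = False  # True while inside a <p>
        self.a_depth = 0   # > 0 while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.in_p = False  # a new container implicitly ends any open <p>
            self.blocks.append({"id": dict(attrs).get("id"), "score": 0, "text": []})
            self.stack.append(len(self.blocks) - 1)
        elif tag == "p":
            self.in_p = True
        elif tag == "a":
            self.a_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTAINERS:
            self.in_p = False
            if self.stack:
                self.stack.pop()
        elif tag == "p":
            self.in_p = False
        elif tag == "a" and self.a_depth:
            self.a_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack or not self.in_p:
            return
        block = self.blocks[self.stack[-1]]
        block["text"].append(text)
        # Link text counts against a block, so navigation bars score low.
        block["score"] += -len(text) if self.a_depth else len(text)


def extract_main_text(html):
    """Return the text of the best-scoring container, or None if nothing scores above zero."""
    parser = MainTextScorer()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return None
    best = max(parser.blocks, key=lambda b: b["score"])
    return " ".join(best["text"]) if best["score"] > 0 else None
```

Feed it a page with a nav `div` full of links and a body `div` full of paragraphs, and the body should win. Real pages are much messier than this, which is presumably why the real thing is worth borrowing rather than rewriting.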