Tuesday, July 27, 2010

Target domain: Web reading

OK, so Web slurping and robots (data acquisition from the Web) were always going to be one of my target domains, and I've got a need right now, so I guess WWW::Declarative is herewith on the table.

WWW::Declarative plus Wx::Declarative should make it possible to build a browser that can be automated in Perl. I think that would be a really handy little tool.

At least initially, WWW::Declarative will wrap LWP (book, intro) and HTTP::Cookies. At some point WWW::Robot might be an interesting thing to look at. Or WWW::Mechanize. (Or, of course, both.) It's always hard to judge, but from the volume of writing, I'd say Mechanize seems to be the leader in the field.
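For reference, here is a minimal sketch of the plain LWP plus HTTP::Cookies usage that a wrapper would sit on top of. It is not WWW::Declarative code, since that doesn't exist yet. The `make_agent` helper, the agent string, and the timeout are made up for illustration. The script only touches the network if you hand it a URL on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical helper: build a user agent that remembers cookies
# between requests. The agent string and timeout are arbitrary.
sub make_agent {
    my $jar = HTTP::Cookies->new;    # in-memory jar, nothing saved to disk
    return LWP::UserAgent->new(
        agent      => 'declarative-sketch/0.1',
        cookie_jar => $jar,
        timeout    => 10,
    );
}

my $ua = make_agent();

# Fetch only when a URL is supplied, so the sketch runs offline too.
if (@ARGV) {
    my $response = $ua->get($ARGV[0]);
    if ($response->is_success) {
        my $body = $response->decoded_content;
        $body = '' unless defined $body;
        print length($body), " characters fetched\n";
    }
    else {
        warn "Fetch failed: ", $response->status_line, "\n";
    }
}
```

Any cookies the server sets land in the jar and go back out on later requests through the same `$ua`, which is most of what a simple robot needs.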

HTML::TreeBuilder will also be part of WWW::Declarative. I suspect that will just build a nodal structure (that seems to make the most sense) that we can then map to whatever. I'd feel more confident if all that had already been implemented; perhaps this is the domain where I'll implement it, yes?
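To make the "nodal structure that we can then map to whatever" idea a bit more concrete, here is a rough sketch using HTML::TreeBuilder directly. The HTML snippet and the choice to map links into plain hashes are assumptions for illustration only, not a design for WWW::Declarative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up sample document, parsed from a string so no network is needed.
my $html = '<html><body><h1>Title</h1>'
         . '<p>See <a href="/a">A</a> and <a href="/b">B</a>.</p>'
         . '</body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down walks the tree and returns every element matching the criteria.
# Each <a> element is mapped to a plain hash, one possible "whatever".
my @links = map { +{ text => $_->as_text, href => $_->attr('href') } }
            $tree->look_down(_tag => 'a');

print "$_->{text} -> $_->{href}\n" for @links;

# HTML::Element trees should be deleted explicitly when finished.
$tree->delete;
```

The same `look_down` call takes other criteria (attribute values, code references), so the mapping step can stay small while the selection gets more specific.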
