Sunday, August 15, 2010

WWW::Mechanize and HTML::TreeBuilder

So I just wrote a little 40-line script to hit my router's HTTP interface and get the list of DHCP clients currently connected. (This is preliminary to checking each one for a running rsync server and backing up those that have one.) To write this script, I used WWW::Mechanize, HTML::TreeBuilder, and maybe an hour and a half of divided attention.

Here are a few thoughts that occur.

First, looking at HTML results, then getting HTML::Element's find functions to find the things you're actually looking for, is a painstaking process that ends up with pretty brittle results. The parsley selector language would probably be a far, far better way to look for things in the HTML tree, and has the benefit that it could probably be declarativized pretty easily.
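To make the brittleness concrete, here is a minimal sketch of the look_down style of searching. The HTML is a made-up stand-in for a router's DHCP page (the table id, column order, and client names are all assumptions, not my router's real markup), and this is not the actual 40-line script.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for the router's DHCP client page; real markup
# differs from router to router.
my $html = <<'HTML';
<html><body>
<table id="dhcp_clients">
  <tr><th>Name</th><th>IP</th><th>MAC</th></tr>
  <tr><td>laptop</td><td>192.168.1.10</td><td>00:11:22:33:44:55</td></tr>
  <tr><td>nas</td><td>192.168.1.20</td><td>66:77:88:99:aa:bb</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Step 1: find the table. This breaks as soon as the id changes.
my $table = $tree->look_down(_tag => 'table', id => 'dhcp_clients')
    or die "no DHCP table found\n";

# Step 2: walk the rows, keeping only rows with exactly three <td> cells
# (the header row uses <th>, so it is skipped).
my @clients;
for my $row ($table->look_down(_tag => 'tr')) {
    my @cells = map { $_->as_trimmed_text } $row->look_down(_tag => 'td');
    next unless @cells == 3;
    # The column order is hard-coded here, which is another brittle spot.
    push @clients, { name => $cells[0], ip => $cells[1], mac => $cells[2] };
}

$tree->delete;    # free the tree explicitly once the text has been copied out

print "$_->{name}\t$_->{ip}\t$_->{mac}\n" for @clients;
```

Every assumption about the page (the id, the tag names, the number and order of columns) is buried in procedural code, which is exactly what makes this kind of script fragile.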

Second, the lack of JavaScript capability makes this a particularly error-prone process; my router's forms all use JavaScript for preprocessing of form input (which is stupid, but prevalent). I'm not sure that providing an entire JavaScript interpreter is a viable option, though. I guess that would depend on the state of Inline::Javascript, assuming there is one. (If there isn't, well, maybe there should be...)

Third, this script naturally broke down into the Mechanize part and the HTML part. That is, the document only gets parsed once it has been obtained - ... this seems obvious, and maybe I'm still too jetlagged to notice that I've lost my point somewhere.

Anyway, it would probably be most instructive to take some actual scripts like this one, translate them into likely-looking declarative structures, and then implement those.
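As a rough idea of what such a declarative structure might look like, here is a sketch in which the page-specific knowledge lives in a plain hash and a small generic routine interprets it. The spec keys (within, rows, cells, fields), the function name extract_records, and the sample HTML are all invented for illustration; this is not Parsley's syntax or any existing module's API.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical declarative description of "the DHCP client table".
# Each criteria list is passed straight through to look_down.
my %dhcp_spec = (
    within => [ _tag => 'table', id => 'dhcp_clients' ],
    rows   => [ _tag => 'tr' ],
    cells  => [ _tag => 'td' ],
    fields => [ qw(name ip mac) ],
);

# Generic interpreter: knows nothing about routers, only about the spec.
sub extract_records {
    my ($html, $spec) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @records;
    if (my $scope = $tree->look_down(@{ $spec->{within} })) {
        for my $row ($scope->look_down(@{ $spec->{rows} })) {
            my @text = map { $_->as_trimmed_text }
                       $row->look_down(@{ $spec->{cells} });
            # Skip rows whose cell count does not match the field list.
            next unless @text == @{ $spec->{fields} };
            my %record;
            @record{ @{ $spec->{fields} } } = @text;
            push @records, \%record;
        }
    }
    $tree->delete;
    return @records;
}

# Made-up sample page, standing in for what Mechanize would fetch.
my $sample = <<'HTML';
<html><body>
<table id="dhcp_clients">
  <tr><th>Name</th><th>IP</th><th>MAC</th></tr>
  <tr><td>laptop</td><td>192.168.1.10</td><td>00:11:22:33:44:55</td></tr>
  <tr><td>nas</td><td>192.168.1.20</td><td>66:77:88:99:aa:bb</td></tr>
</table>
</body></html>
HTML

my @found = extract_records($sample, \%dhcp_spec);
print join(', ', map { "$_->{name}=$_->{ip}" } @found), "\n";
```

The assumptions about the page are no less fragile this way, but they are at least collected in one place where they can be read and changed without touching the traversal code.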
