Monday, November 12, 2012

Trees

I've been taking a lot of my intermediate bloviating offline lately, because I wrote myself a little notes system to organize all the many threads of my life (I'll write about that at some point, but I want to let it mature a little).  So forgive me for not having mentioned that, having finished the minimum viable module for Data::Table::Lazy (an iterator-based in-memory table/matrix module), I have started work on Tree::Walker, which will walk directories and other hierarchical structures and output lazy tables, which in turn will be amenable to processing by an action framework as yet undesigned.

Note that what I've been doing in this thread lately is taking a lot of the components I wanted in Decl, and breaking them out into proper CPAN modules.  Decl will be a little syntactic sugar on top by the time I'm done with that effort - which is exactly right.

So.

Yeah, so I started Tree::Walker.  And rather than just implement the first tree kind of thing that sprang to mind, I decided to examine what others have written about trees.  Which just leads to infinite regress, of course, so here's a short list, in rough order of discovery.

  • Tcl has TreeQL, the Tree Query Language.  I honestly don't find it very convincing.
  • Here's a much nicer approach, TQL, standing for the same thing.  TQL has a from and a select clause, with a match pattern in the from that can bind variables, and it does a kind of unification thing. Once nodes have been identified, the select clause determines what to return.  I really like this approach.
  • ANTLR does tree grammars, but here's an article I found deploring the practice.
  • TDL is a Tree Description Language.
  • Our old friend TXL also returns to the field [earlier].  If you recall, TXL combines BNF to create a tree with a pattern matcher and action specifier to transform the tree into whatever we like.  It's a nice concept that I thought a lot about during Decl 1.0.
  • But right now I was wanting to work with walkers, not transducers.  So yeah, there aren't actually all that many options for a traversal algorithm: here's basically the whole list.  It consists of pre, post, and in.  The "in" option really only makes sense with binary trees.
So that's kind of the stuff I read to get my head back into the tree game.  As a result, I have a little clarity:
  • A walker has to specify prefix, postfix, or after-n-fix sorting, or I suppose provide its own traversal function to specify "what to visit next" for a given location (this can be generalized to include a queue, perhaps)
  • For each node visited, we end up with a type, a key (name, etc.), and arbitrary data
  • For each valid type, we can specify a traversal action, a data return action, and an action action.
  • Matching is built on top of traversal.  To match, we are given a multipart key (or a set of them) that matches a vertical series of nodes, and we track the key on the way down the traversal.  If it matches fully, the match's action fires.  These actions can again be actions or data retrieval actions. The driver has to be able to take an arbitrary single-node match pattern and say yea or nay for a given node.
  • The whole thing has to go into an iterator.

No comments:

Post a Comment