Saturday, December 1, 2012

Decision trees

So I've been doing various little bite-sized programming tasks (at HackerRank and Rosalind, both of which I heartily recommend) and thinking hard about what I'm doing with them, along with getting back into the literate programming game to do this.

One of the things that came up this week was decision trees.  Here's the thing - complex if-then-else structures come up pretty frequently in even the simplest of tasks, like serialization of data structures into domain-specific formats (game boards, for example).  And that kind of code, to me, is a write-only thing - given that I'm doing these things in little bits here and there and otherwise dealing with paying (non-translation) work and family, my rule is that everything I do has to be instantaneously comprehensible in bite-sized chunks.

So I realized that any if-then-else structure deeper than a single level should really by rights be expressed as a decision tree.  A decision tree is a (to me) more declarative, domain-specific way of expressing some varieties of rules.  Naturally, I thus came up with a simple expression language for decision trees and planned my first literate-programming macro - "decision".  This macro reads the decision tree description and outputs target code expressing it.  I am writing my own preprocessor, one that will work on both C and Perl - as well as any other language you choose.  We'll see whether it pays off, without necessarily taking me quite so far as a full Prolog-like unification engine.  (Although come to think of it, it could be extended to express just such rules...  That would be interesting.)

And so I did another absolutely fascinating random walk through programming land.  Here's what I found:

  • SmartDraw is a dandy target application - it allows you to draw diagrams and charts that are data-accessible, which I find absolutely fascinating.  It includes decision trees as one of its chart types.
  • One can, of course, learn decision trees, and there is plenty of literature available about that.  Note on the latter: references Milk, a machine-learning toolkit in Python.
  • Python Enterprise Application Kit does decision trees.  That looks like a dandy place to mine for ideas.  As does IBM's corresponding thing, which of course is not open source, but probably way more ramified.
  • Finally, Methods and Tools magazine (cool!) has a teeny little article about decision support systems from 2004.  That one reminds me of decision tables, which are essentially just an alternative way of expressing decisions that are somewhat less restricted than trees.  But good stuff!  Again, one of the benefits of decision tables is that they're human-comprehensible and, if well-designed, can easily be used to allow very detailed configuration by non-programmers.
  • As though to illustrate that very point, here is a timely and political Greek bond default decision tree that has been used to illustrate a complex situation in a rather tidy way.
  • And here's a nice article on the use of decision tree learning in gauging party affiliation, which is political but slightly less timely, given it's December.
So that's our random walk for the day.

No comments:

Post a Comment