Monday, September 12, 2011

Dada Engine

OK, so back in the '90s, Andrew Bulhak came up with a snazzy little engine that interprets a grammar to generate random text. Well, we've all done that, of course (here's a Tcl translation of one grammar), but his is implemented as an interpreter that takes the grammar as a specification language, and it can use troff to generate some pretty nice output. Like this, famously. And more recently, spam, apparently, which explains a lot. Here's a page that can run it on any grammar you like. (Incidentally, a Google search on "Dada Engine" turns up a paid ad for a code generator.)
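For anyone who hasn't done the "we've all done that" part, here is a minimal Python sketch of the basic trick: pick a random alternative for each rule and expand recursively. This is not the Dada Engine's grammar syntax or its implementation; the toy grammar, the rule names, and the functions `expand` and `generate` are all made up for illustration.

```python
import random

# Hypothetical toy grammar (not Dada Engine syntax): each rule maps to a
# list of alternatives, and each alternative is a sequence of rule names
# or literal words.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["the", "noun"], ["the", "adjective", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "noun": [["grammar"], ["engine"], ["discourse"]],
    "adjective": [["random"], ["postmodern"]],
    "verb": [["generates"], ["subverts"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of words."""
    if symbol not in GRAMMAR:
        return [symbol]  # anything that is not a rule name is a literal word
    alternative = rng.choice(GRAMMAR[symbol])
    words = []
    for part in alternative:
        words.extend(expand(part, rng))
    return words


def generate(rng=None):
    """Generate one random sentence from the toy grammar."""
    if rng is None:
        rng = random.Random()
    return " ".join(expand("sentence", rng))


if __name__ == "__main__":
    print(generate())
```

Passing in a seeded `random.Random` makes the output repeatable, which is handy when you are debugging a grammar rather than admiring its output.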

So the engine itself is kind of boring, although he's put some really nice features into it. What really gets me going is thinking about that grammar specification language. I know this is Not a New Idea, but his sentence or phrase patterns are templates - syntactic units expressing semantic units, getting back to my Langacker days - that bear research. What he's missing is the semantic pole, although the context his grammar carries along is something in that direction.

What would be interesting is something that could mine the Web for such patterns, plus a statistical analysis of their use and interrelatedness to produce some kind of indication of voice/register for the text. That kind of thing. Not to mention a compact set of "typical error message" patterns, etc., for practical text generation in software. The point being that I think the pattern-based approach could work for both text analysis and text generation (not that this is a new idea).
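To make the "both directions" point concrete, here is a rough Python sketch in which the same pattern table is used to generate a message and to recognize one. It assumes a deliberately tiny setup: the two error-message patterns, the slot names, and the functions `render` and `analyze` are hypothetical, and real mined patterns would need something far less brittle than exact-match regular expressions.

```python
import re

# Hypothetical "typical error message" patterns; names and wording are
# purely illustrative. Slots are written as {slot_name}.
PATTERNS = {
    "not_found": "Cannot open {path}: no such file",
    "bad_value": "Invalid value {value} for option {option}",
}


def render(name, **slots):
    """Generation: fill the slots of a named pattern."""
    return PATTERNS[name].format(**slots)


def _to_regex(template):
    """Turn a template into a regex with one named group per slot."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))      # literal text
        else:
            pieces.append(f"(?P<{part}>.+?)")   # slot
    return re.compile("^" + "".join(pieces) + "$")


def analyze(text):
    """Analysis: find the first pattern that matches and recover its slots."""
    for name, template in PATTERNS.items():
        match = _to_regex(template).match(text)
        if match:
            return name, match.groupdict()
    return None


if __name__ == "__main__":
    message = render("bad_value", value="42", option="depth")
    print(message)
    print(analyze(message))
```

The design choice worth noticing is that the pattern table is the only data; generation and analysis are just two readings of it.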

This is back to my notion of the Lexicon, last looked at seriously in 2005. I really have to take a sabbatical from this damned having-to-earn-money thing.
