Wednesday, March 24, 2010

Down the parsing rabbit hole

So I wanted to build parsers, you know, because I'd like to be able to parse SQL statements and the like. So of course I got sucked into Chapter 8 of Higher Order Perl, and once I really started getting into it and realized how much better life would be if I did some things differently in the basic parser, well, two weeks had passed.

Still not done.

I do have a lot of parser tools done, though.

But my basic approach to parsing nodal structure was naive. First, I assumed that a tag would always mean the same thing throughout the application, and that's wrong: a page in a PDF has to mean something different from a page in a Web site, and yet I still want to use the tag "page" for both.

Second, I realized that I couldn't rely on runtime objects to determine parsing structures, and that rankled.

So I'm going to do things differently now, and in a much more flexible manner. I'm removing the dependency on Parse::RecDescent, and I'm making Class::Declarative::Node a primary class instead of using XML::xmlapi (snif).

Each top-level tag will be parsed minimally into a line and a body, and the first word in the line will determine its semantics, as now. But those semantics will already be able to determine the parsing of all lines indented under that tag! In other words, if it wants to use a different parser, it can. If templates should be expressed, they will be. I'm halfway leaning towards everything always being a template, actually.
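To make that concrete, here's a rough sketch of the idea, written in Python rather than Perl and not taken from Class::Declarative itself. Every name in it (split_top_level, parse_as_lines, parse_as_rules, HANDLERS, parse_document) is hypothetical, and the "name := alt | alt" rule syntax is an invented stand-in. The point is only the shape: the top-level pass splits each tag into its line and an unparsed indented body, and the first word picks whoever gets to parse that body.

```python
import textwrap

def split_top_level(text):
    """Split text into (line, body) pairs. Each unindented line starts a new
    tag; the indented lines beneath it are collected, unparsed, as its body."""
    nodes = []
    for raw in text.splitlines():
        if not raw.strip():
            if nodes:
                nodes[-1][1].append("")  # keep blank lines inside a body
            continue
        if raw[0].isspace():
            if not nodes:
                raise ValueError("indented line before any tag: %r" % raw)
            nodes[-1][1].append(raw)
        else:
            nodes.append((raw.strip(), []))
    # Dedent each body so the tag's own parser sees it starting at column 0.
    return [(line, textwrap.dedent("\n".join(body)).strip("\n"))
            for line, body in nodes]

def parse_as_lines(body):
    """Default handler: the body is just a list of lines."""
    return body.splitlines()

def parse_as_rules(body):
    """Hypothetical alternative syntax: 'name := alt | alt' per line."""
    rules = {}
    for line in body.splitlines():
        if line.strip():
            name, rhs = line.split(":=", 1)
            rules[name.strip()] = [alt.strip() for alt in rhs.split("|")]
    return rules

# The first word of a tag's line selects the parser for its body.
HANDLERS = {"page": parse_as_lines, "grammar": parse_as_rules}

def parse_document(text):
    result = []
    for line, body in split_top_level(text):
        tag = line.split()[0]
        handler = HANDLERS.get(tag, parse_as_lines)
        result.append((tag, line, handler(body)))
    return result
```

Feeding it "page index" with two indented lines, followed by "grammar sql" with an indented "stmt := select | insert", yields a plain list of lines for the page and a rule dictionary for the grammar. The top-level pass never had to know that grammars use a different syntax.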

This scheme, though, allows me to vary the semantics of inner tags, so if I want to use a radically different syntax to express parser rules, for instance, I can, without a lot of twisting or tweaking.
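One way to picture "varying the semantics of inner tags" is to look up an inner tag's meaning in the scope of its parent. That also covers the "page" problem from earlier, where a page under a PDF and a page under a Web site are different beasts. This is again a hypothetical Python sketch. SCOPES, interpret, and the "pdf"/"site" parent tags are invented for illustration and are not how Class::Declarative actually wires it up.

```python
# Hypothetical registry: an inner tag's meaning depends on its parent tag,
# so "page" can mean different things under "pdf" and under "site".
SCOPES = {
    "pdf": {"page": lambda args: {"kind": "pdf-page", "number": int(args[0])}},
    "site": {"page": lambda args: {"kind": "web-page", "path": args[0]}},
}

def interpret(parent, line):
    """Interpret one inner line using the semantics of its parent's scope."""
    tag, *args = line.split()
    scope = SCOPES.get(parent, {})
    if tag not in scope:
        raise KeyError("tag %r has no meaning inside %r" % (tag, parent))
    return scope[tag](args)
```

So interpret("pdf", "page 3") gives a numbered PDF page, while interpret("site", "page /about") gives a Web page keyed by path. It's the same tag name with two meanings, and neither parent has to know about the other.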

More on this when it firms up. But there will be recursive-descent parser support built right into Class::Declarative from the get-go. If the strength of Lisp is that it has no syntax, let the strength of Declarative be that it has all syntaxes.
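For anyone who hasn't read the HOP chapter, here's roughly what combinator-style recursive descent looks like. This is a loose Python transliteration of the idea (small parsers glued together by concatenate, alternate and star), not the book's Perl code and not the planned Class::Declarative API. The token kinds and the tiny SELECT grammar are assumptions chosen only to echo the SQL motivation. Each parser takes a token list and a position and returns (value, new position), or None on failure.

```python
import re

KEYWORDS = {"select": "SELECT", "from": "FROM"}

def tokenize(text):
    """Turn text into (kind, value) pairs; unknown characters are an error."""
    tokens = []
    for piece in re.findall(r"\s*(\w+|[,*]|\S)", text):
        if piece == ",":
            tokens.append(("COMMA", piece))
        elif piece == "*":
            tokens.append(("STAR", piece))
        elif re.fullmatch(r"\w+", piece):
            tokens.append((KEYWORDS.get(piece.lower(), "IDENT"), piece))
        else:
            raise SyntaxError("unexpected character %r" % piece)
    return tokens

def token(kind):
    """Match a single token of the given kind."""
    def parse(tokens, i):
        if i < len(tokens) and tokens[i][0] == kind:
            return tokens[i][1], i + 1
        return None
    return parse

def concatenate(*parsers):
    """Match each parser in sequence; fail if any one fails."""
    def parse(tokens, i):
        values = []
        for p in parsers:
            r = p(tokens, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return values, i
    return parse

def alternate(*parsers):
    """Return the first alternative that succeeds."""
    def parse(tokens, i):
        for p in parsers:
            r = p(tokens, i)
            if r is not None:
                return r
        return None
    return parse

def star(p):
    """Zero or more repetitions (p must consume input when it succeeds)."""
    def parse(tokens, i):
        values = []
        while True:
            r = p(tokens, i)
            if r is None:
                return values, i
            v, i = r
            values.append(v)
    return parse

# column := IDENT | STAR ;  columns := column (COMMA column)*
column = alternate(token("IDENT"), token("STAR"))
columns = concatenate(column, star(concatenate(token("COMMA"), column)))
select_stmt = concatenate(token("SELECT"), columns, token("FROM"), token("IDENT"))

def parse_select(text):
    tokens = tokenize(text)
    r = select_stmt(tokens, 0)
    if r is None or r[1] != len(tokens):
        raise SyntaxError("could not parse: %r" % text)
    (_, (first, rest), _, table), _ = r
    return {"columns": [first] + [name for _, name in rest], "table": table}
```

With this, parse_select("SELECT id, name FROM users") comes back as the column list ["id", "name"] and the table "users". The nice part is that the grammar is just ordinary values built from functions, so a tag's semantics could assemble one on the fly for whatever is indented beneath it.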
