Saturday, November 13, 2010

Thoughts on rich text

There are lots of ways to encode formatting and semantic information concisely in text. (The "concisely" is what I'm going to address here.) They include things like Markdown, ReStructured Text, Text::Multi (a more generic Markdown-like framework), Markdent (another interesting parsing framework), and of course markup like HTML/SGML/XML, RTF, and TeX.

The end purpose of all these text formatting languages is to provide a way to type regular text and have it formatted or typeset. I've been using something like Markdown in my pseudocoding so far, but I need to think things through in a principled manner, and essentially, what I've come up with is this.

First, the target. The target is formatted text, but what does that mean? Clearly, it means something with a series of nodes specifying format in a generic manner:
container
paragraph
text
This is an
italic
text
example
text
text.
paragraph
text
It consists of two paragraphs, one of which contains italics.
Obviously, I'd rather type this:
text
This is an {\i example} text.

It consists of two paragraphs.
Now note that in this example I've used an RTF-like curly-bracket-and-backslash style, and I've used a convention that a blank line represents a paragraph break, the latter being more or less universal in Wiki and Markdown settings these days.

So what I want to do for the text node - and this will end up being used throughout Class::Declarative, mind you - is to provide a basic framework for rich text that will allow the user to specify parsers to turn any textual formatting language into nodal structure, then to include a few simple parsers (e.g. Markdown and RTF-like) that can be selected in some way.

No comments:

Post a Comment