Wednesday, December 29, 2010

Win32::Daemon

Since Windows services provide some of the functionality that Unix just uses cron for, the service manager should be part of Schedule::Universal. Win32::Daemon really seems to do everything you want, here.

Mission starting to be accomplished

The whole point of the Class::Declarative effort, to me, is to enable me to write quick scripts in my fields of interest that don't slow me down in whatever I'm doing. In other words, when working with Word documents, the idea is to offload all the knowledge, all the lookup work inherent in dealing with OLE (and there's a lot) so I can actually conceive of the need for a script, start it, test it, and use it, within, say, an hour or so. If I go beyond that time, I might as well not have a scripting language - I need to finish the task, not spend a day crafting a way to automate it.

Last night, Win32::Word::Declarative managed to meet that need to a certain extent. Just the ability to remember that I can simply type
use Win32::Word::Declarative;

document (active)
  do {
    ...
  }
is enough to get me over some humps.

Not only that, the declarative framework gives me the start of a way to store knowledge accumulated. My task last night was to iterate over all the stories in a document - not the first time this has come up, of course. And this task is needlessly complicated, but now, for the first time, I have a way to package it up into a function that can easily be called. Now, I can simply say "foreach my $range in (^story_ranges())" and I'm done. It's still not as easy as it could be, but I no longer have to worry about that wacky double loop.
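
For the record, here is roughly what that double loop looks like, sketched in Python rather than Perl so it can run without Word. The FakeRange class and the sample data are stand-ins I made up for illustration; the two things the loop relies on are that the story collection can be iterated and that each range links to the next one of its type via NextStoryRange, as in Word's object model.

```python
# Sketch only: the "double loop" over Word stories, using stand-in objects.
# With real Word automation the collection would be the document's
# StoryRanges and the link would be each range's NextStoryRange.

class FakeRange:
    """Hypothetical stand-in for a Word Range; only what the loop needs."""
    def __init__(self, text, next_story_range=None):
        self.Text = text
        self.NextStoryRange = next_story_range

def story_ranges(story_collection):
    """Yield every range of every story, following the linked chains."""
    for first in story_collection:       # outer loop: one entry per story type
        rng = first
        while rng is not None:           # inner loop: linked ranges of that type
            yield rng
            rng = rng.NextStoryRange

# A main text story, plus two linked header stories.
stories = [
    FakeRange("main text"),
    FakeRange("header 1", FakeRange("header 2")),
]
texts = [r.Text for r in story_ranges(stories)]
```

Wrapping the two loops in one generator is the whole trick: the caller gets a flat sequence and never sees the chain.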

I'm sure I'm not the first programmer ever to be in the position of realizing that his accumulated codebase made things easier for him - it's not even the first time for me (the XMLAPI sure did make working with strings and structures in C much easier) - but it's immensely satisfying knowing that this work is starting to pay off in real terms.

Sunday, December 26, 2010

Unison: file synchronization, possible interesting target application

Unison. I intend to start using it.

Update: I have started using it, and it rocks! A guide.

Boomerang: a synchronization language

A bidirectional textual language. It's the same thing as my concept of a mapping, which it turns out (at least some) other people call a "lens".

Very rough roadmap

Having just committed the basic database tests, I thought now might be a good time to look up and get my bearings on the immediate future of Class::Declarative.

There are several additions to basic functionality that will go into the next release: these are the "system" tag (configuration), the "application" tag (also configuration), the shell and command line features, and state machines.

A word on the application tag first. Essentially, the definition of an application both defines the basic structure of a given set of programs and establishes a context for actions. So the application will also be where we define actions - and actions will inevitably result in the definition of workflow, because workflow is really all about the definition of actions (and who does them, planning for them, and so on). A full workflow engine won't be in the next release, but I think it's important to realize that it's probably going to be in the core functionality, and probably pretty soon.

An application can (1) be a refinement or a modification of another application, and (2) define any structure at all for the scripts that belong to it. Whatever is defined in the application import structure will end up as macroinserted structure in the running script. This can even go so far as there not being a running script at all - in this case, the file run will be a data file, and the application will have to define how it's handled (most likely as a file tag - still to be written, obviously).

(An aside: it might be useful to go ahead and include filesystem semantics in the next release, but I'm not wedded to that.)

An application will probably be declared something like "application [name]", possibly with some other decoration. Class::Declarative will first try to treat the name as a class [name].pm, then maybe something like App::Declarative::[name].pm, and then whether or not that fails, will look to [name].conf in the current directory, and parent directories up the tree.
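
The last step of that lookup, finding [name].conf in the current directory or any parent, is easy to sketch. This is Python rather than Perl, and find_conf is a name I made up for illustration; it covers only the .conf search, not the attempt to load a class first.

```python
from pathlib import Path

def find_conf(name, start):
    """Return the nearest <name>.conf at or above `start`, or None.

    Hypothetical helper: walks from `start` up to the filesystem root.
    """
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / f"{name}.conf"
        if candidate.is_file():
            return candidate
    return None
```

Calling find_conf("invoice", ".") would return the closest invoice.conf up the tree, so a configuration in a working directory shadows one higher up.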

The application could also be given on the command line, of course. And it occurs to me that an "include" mechanism would be essentially the same thing, so we'll toss that in as well.

All this gives us multiple ways to define configuration all in one blow. The key is that all the structure found in this way goes into the same bucket as macroinserted tags. (It might be useful to keep track of where structure comes from in this way - and eventually even precompile structure while checking for changes to files each time.)

The state machine mechanism is really only destined for this next release because it's already started (kind of). It's cleaner. But definitely, applications, configuration, command-line invocation, a shell, and probably filesystem access all need to be in this next release, getting us closer to being a mature language.

All that is probably enough to keep me out of trouble for a month or two.

Update 11/19/11: Jeez, I didn't end up doing any of that. Except the shell, kind of.

Friday, December 24, 2010

Thursday, December 23, 2010

Norvig: How to Write a Spelling Corrector

Now here's a thought-provoking essay. It would be interesting to analyze the thought processes at a higher level of detail, but it's also useful just for thinking about how to go about doing NLP.

Function descriptor

I had a thought: if I set up a "knowledge base" as an equivalent of a library, so that each function would have some kind of high-level semantic description of what it does, you'd expect to be able to reason about that to assemble a program from a description of what you want done.

It's still a pretty squishy idea, but ... it sounds right. You could even attach some kind of templated version of how to carry out the computation in question, and then write plain Perl or something on that basis.

Milestone of a sort: database access

Just wanted to mark the date that I've got rudimentary database access going and build-time database access to modify built content. Result: I can define a Word document template/script that includes a table defined by a query result.

Kind of neat, and it just makes me think of other things I can include.

Latest idea: generalize the "text" tag as a kind of output. When it fires, it asks its parent how to output its contents, and that propagates up the tree, allowing e.g. a document to define how text gets output, but permitting general output for non-Word areas. Add an "output" tag for alternative output handling, and you've got a pretty neat way to handle text.
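
A minimal sketch of that propagation, in Python rather than Perl. The Node and Text classes and the writer attribute are invented for illustration and are not how Class::Declarative is actually built; the point is only that a text node asks upward until some ancestor knows how to output.

```python
class Node:
    """Hypothetical tree node; `writer` is a callable taking a string."""
    def __init__(self, parent=None, writer=None):
        self.parent = parent
        self.writer = writer

    def find_writer(self):
        """Ask the parent, then its parent, and so on up the tree."""
        node = self.parent
        while node is not None:
            if node.writer is not None:
                return node.writer
            node = node.parent
        return print                     # general output when nobody claims it

class Text(Node):
    def __init__(self, content, parent=None):
        super().__init__(parent)
        self.content = content

    def fire(self):
        self.find_writer()(self.content)

captured = []
document = Node(writer=captured.append)  # the document decides how text goes out
para = Node(parent=document)
Text("Hello", parent=para).fire()
```

An "output" tag would then just be another node with its own writer, sitting between the text and the document.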

Then you give the generalized text tag the ability to do Wiki formatting (for which it would also have to ask its parent how to represent different formatting choices) and you've got a really powerful mechanism. I'm pretty sure it would be most of a templating facility, right there. I need to implement it and see what it can do.

Sunday, December 19, 2010

Invoicing again

Here's a little stream-of-consciousness post for ya.

Now that W32::W::Decl is working to create Word files, I need an overall process in which to embed it. What will that look like?

First, it's going to be driven by my invoking something. This "something" is effectively saying, "We are going to run invoices now." And that will involve hitting my prepared ready-to-invoice database in Access, retrieving a query from it, and taking action based on that.

Since everything in my translation business is customer-dependent (each agency has a different workflow for invoicing and, really, for everything else), my first query is simply going to be "which agencies have outstanding finished jobs that should be invoiced?" Then for each of those agencies, I'm going to switch to the agency context (represented in this case by that agency's working directory) and perform "invoicing" in that context.

OK. So here is already an interesting point, because it's workflow and thus I've thought about it for a decade already without satisfactory resolution. Running a script is equivalent to taking an action - a workflow action. But every action happens in a context.

An action and an event are necessarily separate, although in a sense they're the same kind of thing. An event takes place internally to the running program, whereas an action is a long-term, irreversible action taken in the context of a larger system which includes that running program. An action is workflow.

Thus workflow is going to have to be a part of Class::Declarative, will it or nill it, because running a script is taking action is doing workflow, and in the larger sense we're going to suffer if we don't manage workflow properly. This has to be explicit - well, unless we leave it as a default or something.

So the context. For the time being, I'm going to consider a context to be purely a question of directories. I just can't get my head around much more abstraction than that, although ultimately it's going to end up being necessary, I'm sure.

The context modifies the "system" structure. System structures can be chained, too, allowing us a context hierarchy.
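
Chained structures like this behave much like Python's collections.ChainMap, so here is a sketch in Python rather than Perl. The keys and values are made up for illustration; the idea is only that a directory's context overrides the system settings without replacing them.

```python
from collections import ChainMap

# Hypothetical settings, for illustration only.
system = {"currency": "USD", "template": "invoice-default"}
agency = {"currency": "EUR"}             # stored in the agency's working directory

context = ChainMap(agency, system)       # lookups try agency first, then system
job = context.new_child({"template": "invoice-rush"})   # one level further down
```

Here context["currency"] comes from the agency, context["template"] falls through to the system, and job overrides the template for one job only.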

Oh - before I get too much farther here, I need to mention another sort of orthogonal dimension of contextuality, and that's the application. Again, C::Decl needs to take this into consideration right from the start: an application sets up a system context for a set of scripts, too, but stores this configuration in a central location instead of a location relative to the script being run.

The application can be set in one of two ways: first, I can just say "application " in my script. Second, I can specify it on the command line: perl --application= . This allows me to set up extension mappings under Windows: .proj might be "perl -MWx::Declarative --application=project ".

The application can monkey with all kinds of things, like importing additional modules and determining the controlling domain. Some of this will be macro-inserted into the current script to be sure it all works out correctly, and that's essentially what the "application" does.

But - and this is key - so does the context. It's just that without an application, we don't actually know what the context is supposed to be named (unless we explicitly name it ourselves).

Oh, which brings me to the third way to invoke a script, which is to associate its extension with a script stored elsewhere...

Well. You can definitely tell this is stream of consciousness. You're lucky to get coherent sentences, even.

Recap:
  • Invoke a script in situ
  • Invoke a script as a "second language" by giving it -M and other args.
  • Invoke a script as an "application language" by giving it -M and --application=
  • Invoke a non-script file by assigning it to a script.
I've done the first two successfully, defining ".dpl" on my machine as a declarative Perl script. I have only now really formally acknowledged the last two. I'm not really thinking in terms of Unix yet, am I?

Onwards, to other work. Let this simmer.

Scalable Web crawling

Posted without further comment because I haven't taken the time.

Friday, December 17, 2010

Scheduling

Scheduling is one of those real-life things that comes up all the time, yet there's no convenient way to work with it, especially cross-platform. So, a twofold proposal:

Part the first: Schedule::Universal - a proposed cross-platform Perl API for both cron and the Windows Task Scheduler, along with at (on both platforms) and a way to set up a scheduling process of your own. In other words, one-stop shopping for scheduling.

Part the second: Schedule::Declarative - the declarative front for S::U; put a schedule into your system definition and you're already done scheduling your tasks.
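
To make the "one-stop shopping" idea concrete, here is a rough sketch in Python rather than Perl. The function names are invented and this is not a proposed Schedule::Universal API; it only shows one daily job rendered either as a crontab line or as arguments for the Windows schtasks command, without running anything.

```python
def cron_line(hour, minute, command):
    """One crontab entry that runs `command` daily at hour:minute."""
    return f"{minute} {hour} * * * {command}"

def schtasks_args(name, hour, minute, command):
    """Argument list for creating the same daily job with schtasks."""
    return ["schtasks", "/Create", "/TN", name, "/TR", command,
            "/SC", "DAILY", "/ST", f"{hour:02d}:{minute:02d}"]

def daily(name, hour, minute, command, platform):
    """Hypothetical front end: same request, platform-specific result."""
    if platform == "windows":
        return schtasks_args(name, hour, minute, command)
    return cron_line(hour, minute, command)

unix_job = daily("invoices", 9, 30, "perl invoice.pl", "unix")
windows_job = daily("invoices", 9, 30, "perl invoice.pl", "windows")
```

A declarative front end would then only need to collect (name, time, command) entries and hand them to something like daily.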

Thursday, December 16, 2010

"TDD problems" - statements of interesting small programming assignments

Here's an interesting page: a list of interesting programming problems for people who want to learn test-driven design. I'm not going to flat-out say it would be good to have an automated program that could program from those statements, because that would be crazy. But it's certainly instructive to contemplate what such a nonexistent system might look like, and chew off bits around the edges.

Wednesday, December 15, 2010

Requirements engineering

So I'm translating course material on business process management for the Hochschule Esslingen at the moment, and the topic has turned to something called "requirements management", something I'd never really considered.

But a little googling turned up OpenOME, the Organization Modelling Environment, a project at the University of Toronto that is a part of the larger "agent-oriented approach to software engineering" project.

Standards-based Java programming is generally not terribly to my taste, but I really like the notion of modeling anything and everything about software to be developed. The more of the model is explicit, the more we've replicated a human's understanding of the context of the code, and the closer we are to semantic programming.

So. Deserves further study. I'm intrigued by the repeated mention of a knowledge base. I really want to know what that's about.

Later update: The PowerPoint presentations are really interesting! And they've come up with a kind of declarative DSL to describe models. This is an encouraging field of inquiry.

Monday, December 13, 2010

State machines redux (link dump)

I started digging around looking for some good examples of state machines to test with, and found a bunch of stuff. (And I could swear that I'd already written this post, but ... apparently not.)

State machines are used for simple sentence parsing. Turns out they're not powerful enough to do the job in all cases (which I knew, but coming at a topic from another angle always allows me to be surprised again and again). However, they've been used pretty successfully for extraction of noun phrases and names from newsfeeds, which is kind of interesting.

Here's a kind of fun approach (though Java-based) to FSMs, using the example of a kind of treasure-hunt game. A state machine lends itself well to Zork-like games.

Here's a tutorial from a robotics approach, although not one I find all that convincing. State machines are, however, very commonly used for robotics controllers, for the obvious reasons. There is lots of material about compiling state machines onto microcontrollers. So my original reason for looking into state machines, WWW::Mechanize, makes a lot of sense. A state machine is a natural way of describing the actions of an agent.

Here's a Ruby/Rails state machine plugin, with good examples.

There's a guy in St. Petersburg who has coined the term "Automata-based programming".

An interesting application of a state machine in building a tree from a serialized protocol.

Charming Python has a chapter on state machines, particularly focused on text processing (e.g. generating HTML from Wiki, which is a good example).

And that's pretty much my link dump. Not a lot of coherence.

Update 11/17/11: I did a little more thinking on this topic.

Thursday, December 9, 2010

Embedding declarative code into regular Perl

It's always when trying to code something that I have the best ideas. So far I've been considering embedding Perl into declarative structures as kind of a one-way street, but it would be useful to close the loop. The point where I realized this was in building a bit of code that could benefit from a state machine.

Well, my current thinking had been that the state machine would be declared out in the tree somewhere, and the Perl code would just call it. That's kind of rigid, though. Wouldn't it be nice to get something like this?

my @tables = $^content->find('table');
my @rows = $tables[1]->find('tr');

declare state-machine process (start=off)
  off {
    => on if $^next->as_HTML =~ something;
  }
  on {
    something
    => off;
  }

$^process->iterate(@rows);

Isn't that neat? And put it together with a "perl mode" that effectively treats the entire file as a do { ... }, and you have a true declarative framework that would be useful without necessarily being in control.
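
To pin down what the on/off machine above would do, here is a sketch in Python rather than Perl. Everything in it is an assumption for illustration: the "something" test is replaced by "row contains a header cell", the "on" action just collects the row, and run_machine stands in for iterate.

```python
import re

def run_machine(states, start, items):
    """Feed items one at a time; each state function returns the next state."""
    state = start
    for item in items:
        state = states[state](item)
    return state

collected = []

def off(row):
    # Switch on when the row matches; here, when it contains a header cell.
    return "on" if re.search(r"<th>", row) else "off"

def on(row):
    # Do the work for one row, then switch back off.
    collected.append(row)
    return "off"

rows = [
    "<tr><th>Name</th></tr>",
    "<tr><td>Alice</td></tr>",
    "<tr><td>skip me</td></tr>",
    "<tr><th>Name</th></tr>",
    "<tr><td>Bob</td></tr>",
]
final = run_machine({"off": off, "on": on}, "off", rows)
```

After the run, collected holds the Alice and Bob rows, that is, each row that directly follows a header row.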

Tuesday, December 7, 2010

State machines

Yeah, so I'm going to add explicit state machines support to the language. It looks like this:

state-machine
  start {
    # Figure stuff out.
    => a1
    => b1
  }
  a1 {
    # Other stuff.
    => return
    ...

You can parameterize a state machine:

state-machine (input)
  ...

Put like this, a state machine is itself an action (e.g. a "do" that runs on ->go()). You can also simply declare a state machine and instantiate it on input elsewhere, in which case the instance will be a ....

[we interrupt this post to point out that FSA::Rules really does.]

... function that you can call repeatedly. And thanks to FSA::Rules, all the hard work is done; I can just wrap it. The only question remaining being whether state machines belong in the core or not. Actually, I think not. Which means I have to figure out how to chain semantic domains (but I had to figure that out eventually anyway).

A thought on order of execution

So I've been thinking about the logical way to handle what, of the top-level items in a program, should actually get run. In my thinking, there are three different types of top-level item. First is just "do" and its friends: pure actions. They always run when at the top level. Second is purely declarative stuff. This never runs anywhere, because it doesn't have a go function.

The third class is both fish and fowl. It will run if necessary, but would rather not - these are items that are really more declarative than anything but that still have a default action, such as documents, URLs, GUI definitions, and so on.

The overall regime that makes sense to me is: 1. Run any preliminary code. 2. Skip any ambiguous code. 3. If there is code after the ambiguous items, don't run those items. 4. If the last item in a program is ambiguous, run it.

This gives us the option of setting things up for an ambiguous item that then controls the program (e.g. prints its document or activates its GUI or whatever). But if there is any action below an ambiguous item, it will be considered a declaration, not an action. I think this really captures what makes most sense to me.
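
The rule is small enough to write down as code. This is a Python sketch rather than Perl, and the (name, kind) tuples and kind labels are just my notation for the three classes of top-level item.

```python
def items_to_run(items):
    """items: list of (name, kind), kind in 'action', 'declarative', 'ambiguous'.

    Actions always run, declarations never run, and an ambiguous item
    runs only when it is the last item in the program.
    """
    run = [name for name, kind in items if kind == "action"]
    if items and items[-1][1] == "ambiguous":
        run.append(items[-1][0])
    return run

# Code after the document: the document counts as a declaration.
first = items_to_run([("setup", "action"), ("doc", "ambiguous"), ("cleanup", "action")])

# Document last: it runs after the setup.
second = items_to_run([("setup", "action"), ("styles", "declarative"), ("doc", "ambiguous")])
```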

Actually, there are places this should hold even within other items. For example, I'm putting a state machine in a URL. Should the state machine run as the code for that URL or act as a function definition for use later? Good question....

Monday, December 6, 2010

Next up: WWW::Declarative again

I feel the need to start writing scrapers for real. With WWW::Mechanize, HTML::TreeBuilder, and Data::Match, I've got most of the heavy lifting ready to go. So that's where I'm looking.

Win32::Word::Declarative published

And thus the first publication of a Class::Declarative-based module. And oh, how many things could still be done on it...

Saturday, December 4, 2010

Target domain: Automated game design!!!

This is just crack for me: game design as a domain for automated discovery. With links! Like to the game ontology! Ah! That.

Because it's not about game design, it's about thinking about software - in software. It's about a pattern language for game design. It is, in short, about semantic programming.

Note: LUDOCORE paper.

Markov chains for test input

Hmm.

Friday, December 3, 2010

Framework fatigue

Heh. From Chris Harden's Jeviathon: Framework fatigue: How many frameworks do I need to know?, we have the following interesting closing statement:

A talented developer has an interpreter and compiler in his head and thinks in pseudo-code anyway. Applying that to a language is just a matter of figuring out the syntax...and that is the easy part.

Hear, hear. You might as well write in pseudocode.... Site::Declarative should be the metaframework to end all frameworks.

Parameterized templates

How about this?

define formatting my-snippet "text"
   parameters (bold)
   nodes
      text "$text"
      text (italic) "$text"
      text "$text"

This would define a parameterized snippet that types its input text three times, in bold, with the middle also italicized. You'd invoke it like this:

document
   para "Some initial text"
   para
      my-snippet "Repeated text here"

By default, parameters would be passed through to the defined object.

You could define-and-implement in place with this:

document
   var text "Here is my variable"
   para "Some initial text"
   <= formatting (bold)
      nodes
         text "$text"

Here, we don't need a separate "parameters" tag because we're just going to use the variables in our event context at runtime.

The reason there's a "nodes" tag in this is that I might want to include other tags as well:

define formatting my-snippet "text"
   parameters (bold)
   do {
      # Set some things up
   }
   nodes
      text ...

We'd want a whole range of tags to denote the parts of a full node: parameters, options, label, parser, code, body, and nodes.

Note that instantiation of a named macro happens at build time, while instantiation of an anonymous macro happens at runtime. To run a named macro at runtime, we'd want to do:

express my-snippet "text here"

I should implement this stuff now, then see whether it handles everything I want. Add some control structures and it could be just about as powerful as you could want.

Thursday, December 2, 2010

A couple of decent Perl links

Serious Perl - has some excellent insight on funky module magic and some OK advice on Perl code maintainability.

Perl - OLE - Word - a collection of Word invocation tricks.

Further necessaries for Win32::Word::Declarative

So I did the standalone use-WWD thing and I'm polishing up an initial tutorial for the module prior to releasing 0.01 onto CPAN, and there are two things, at this point, that make the module less than perfectly usable in its current form. (This ignores the fact that it covers about 2.7% of Word's functionality; that's just incremental stuff that can be filled in at my leisure.)

First, it's hard to use it from plain Perl. I have a good plan for this: instead of indented strings, accept an arrayref format based more or less on the output from the indentation parser. This allows us to generate nodal structure really easily without worrying about having to format it with indented text.
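
To show the two formats side by side, here is a toy indentation parser in Python rather than Perl. It is not the Class::Declarative parser, and the [line, children] shape is only my guess at what an arrayref format might look like.

```python
def parse_indented(text):
    """Turn indented lines into nested [line, children] lists."""
    root = []
    stack = [(-1, root)]                 # (indent of owner, its children list)
    for raw in text.splitlines():
        if not raw.strip():
            continue                     # ignore blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        while stack[-1][0] >= indent:    # climb back out to the right parent
            stack.pop()
        node = [raw.strip(), []]
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root

source = "\n".join([
    "document",
    '   para "Some initial text"',
    "   para",
    '      text "More"',
])
tree = parse_indented(source)
```

A caller in plain code could build the same nested lists directly and skip the indented text altogether, which is the point of the plan.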

But the worse thing is that the C::Decl framework still basically supports mere declaration. That is, I can't really specify a mutable data structure that is based on instance data. And that's a severe limitation - which is mildly surprising. A Wx user interface doesn't really need a lot of runtime mutability, but a Word document is an output format. Its natural functionality is to present runtime data, making the lack more glaring.

So really, a high priority for C::Decl has to be mutable structure. Macros. Templates. Whatever the mechanism or mechanisms are called, the use case is this: a script that (1) gathers some information somewhere, then (2) generates a Word document based on that information. We can't really do that right now without dropping into Perl or using Perl to generate the Word-generation script and then running it separately, and that just isn't where I want to be.

Also, a small tweak: right now, C::Decl only gives runtime love to the topmost semantic domain. Instead, I'm pretty sure that it should just scan its children in order and attempt to execute each one. There could be a [norun] option to suppress this explicitly, if necessary. And of course in the case of Wx, execution just won't return after you hit the base frame/dialog - but we don't really care much about that.

But this is necessary in order to gather information in the use case above - and in general, it will be a normal state of affairs to set things up, then instantiate something. The alternative is to allow the macro itself to run code at build time, and that's OK, but in terms of presentation it will often be clearer to do things in two phases that are visually distinct.

Wednesday, December 1, 2010

Open-source data mining software

Sigh. Here. So much to do, so little time.

Tuesday, November 30, 2010

World moving faster than me

As per usual: there are some awesome data extraction tools coming online, making my WWW::Declarative nearly obsolete before it's really even started.

Winning at coin-tossing games

A neat Mathematica presentation on Wolfram's blog.

Sunday, November 28, 2010

A more useful way to include individual declarative classes

It would be nice to be able to say "use Win32::Word::Declarative;" in a conventional Perl program and have it go ahead and set up the Class::Declarative environment. Makes testing setups easier, too.

Hmm...

Getting Real: book on webapp construction

Free book! Probably worth perusing.

Win32::Word::Declarative

So for some time I've needed a way to slap together scripts for Word that don't rely on Word's own scripting. I used to do this stuff in Python, but now I've gotten Win32::Word::Declarative to the point where it can generate an attractive document as, say, an invoice.

This is an important milestone, but it doesn't get me all the way to an actual invoicing system. I need to have a few more components first:
  • Something like Document::Declarative to manage the actual files, repositories, and templates - I go back and forth as to whether that should be inherent in the language or split out into a document-management semantics
  • Mapping to apply templates and create abstract documents and expressed documents
  • Database retrieval to obtain customer information and so on, not to mention to determine what goes into invoicing in the first place
  • The hierarchical configuration system (this would apply to templates)
  • Probably something like a general semantics system; this is kind of intuitive, but I get the feeling that this is how you'll be able to say "I want an invoice and this is what an invoice is".
So. Getting there, but not yet there. A whole lot closer than last week, though. I think there may be enough working in Win32::Word::Declarative that I can put v0.01 on CPAN.

More HNN quant bloviation

Lots of keyword-rich text there. No time.

Saturday, November 27, 2010

Target application: RTLKlub spider

My wife wants to watch Hungarian TV clips, and RTL has some stuff online - but it takes 5.5 minutes to get a 2-minute clip out through the Hungarian pipe. Obviously, I need to cache things, and so: a spider. A task!

I know how to cache movies given Flash players, but some of that stuff is in Silverlight, so I don't know whether I can do that or not. But at least I could get the Flash movies.

Getopt::Lucid and Term::Shell

Perl brass tacks: I'm basing my command-line handling on Getopt::Lucid. And in the end, I think I'm going to just end up writing my own Term::Shell replacement that pretty much works like that. The events in an event context can be seen as commands, so just invoking some "shell" conversation against an arbitrary event context will be pretty cool. But Term::Shell is just a little too hardwired to make that entirely useful. So we'll see. Either way, both modules have been open tabs on my browser for a couple of weeks now and it's time to clean up and leave a little nonvolatile state.

Update after looking more closely at the Term::Shell source - it would be folly not to subclass Term::Shell. There's a lot of really cool stuff in there. I just have to have an automagical way of setting up the commands - which is easy - and I'm good to go. This will be very cool!
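
The "automagical" part can be sketched with Python's cmd module, which plays a similar role to Term::Shell. This is an analogy in Python, not Term::Shell code: make_shell and the events dict are invented for illustration, with each event becoming one shell command.

```python
import cmd

def make_shell(events):
    """Build a cmd.Cmd subclass with one do_<name> method per event."""
    methods = {}
    for name, handler in events.items():
        # Bind the handler as a default argument so each command keeps its own.
        def do(self, arg, _handler=handler):
            _handler(arg)
        methods[f"do_{name}"] = do
    return type("EventShell", (cmd.Cmd,), methods)

log = []
events = {
    "greet": lambda arg: log.append(f"hello {arg}"),
    "count": lambda arg: log.append(len(arg.split())),
}
shell = make_shell(events)()
shell.onecmd("greet world")              # dispatches to the generated do_greet
shell.onecmd("count a b c")
```

Calling shell.cmdloop() instead would give an interactive prompt over the same commands.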

From looking at the code, it looks as though Term::Shell can run under Tk somehow, but Google isn't being very forthcoming about that. It would be nice to be able to use Term::Shell in a Wx context. Very nice, actually.

More GA and machine learning

Link-dumping continues. First, "Using GA to find Starcraft 2 build orders". Second, a useful overview of machine learning techniques I just haven't had the time to finish.

Gamification

An interesting new term for the patterns of game play ported into superficially non-game venues. There's a Wiki for that. Also, an entertaining article on Cracked about intentional addictivity in games. My evil-future scenario: use addictivity plus a Mechanical-Turk interface to analyze email for the NSA. The seed of an interesting Stross story.

Friday, November 26, 2010

Sweet-expressions

A readable syntax for Lisp, with HNN commentary. I just eat this stuff up. A Plisp would probably have to include this.

Target domain: 3D modeling

Cool library of 3D modeling functionality, OpenSceneGraph. Problem: it has a Python binding but no Perl binding. Now, clearly we ought to be able to build Python just as easily as Perl in a declarative structure. Would this be the context for doing that?

Expressive Programming Systems

Steps towards Expressive Programming Systems, 2010 report. Chock-full of interesting ideas that deserve further exploration on some day when I don't have 42,000 words hanging over my head.

Monday, November 22, 2010

Design without designers

An interesting post on what design means in a world where it's being nibbled away by A-vs-B data approaches and genetic algorithms.

More scraping

Ooh. The "Readability" JavaScript tool munges a page to put its "content" - that poorly defined part of the HTML that represents the parts humans actually read - into a separate area for actual, well, reading, minus all the ads and links and sidebars and so on.

That algorithm has been ported into Perl as HTML::ExtractMain. So it's going into WWW::Declarative.

In other news on this front, O'Reilly has a sale on a bunch of relevant books. Sigh.

NLP (again)

I just keep finding cool things.

Python's NLTK. HNN post on "Natural Language Processing for the Working Programmer", a book-in-the-offing based on Haskell. I'm contemplating porting something like this into Perl.

Thursday, November 18, 2010

Page scraping: Enlive and CSS selectors

So here I am again, thinking about page scraping (earlier) - that part of a Web robot that comes between retrieval of a given page and some data representation that is what we're really after.

Unsurprisingly, there are a great number of solutions to this problem, some better, some worse. The one I want to talk about right now is Enlive, which is based on Clojure and looks pretty darned interesting. I want to do that in dperl. There's an Enlive tutorial that's quite well-written.

Now Enlive uses something very CSS-selector-like to extract interesting information from HTML trees. This is the part that struck my fancy. So here is a CSS selector tutorial as well.

Very promising direction.

Natural-language programming

Dammit, Wolfram is at it again, stealing my ideas. "Natural-language programming is closer than you think" - and he's got the key insight (no, not hexapodia this time): natural-language programming is a dialog. You'll still end up with code, but it will be code that is semantically wrapped in a set of concepts you've defined interactively with the machine. Lack of clarity will elicit questions.

Those last two sentences are my own plan. I didn't read all of Wolfram's article because I find it hard to read about people developing things I really want to do first.

Target domain: computer forensics

Computer forensics is basically the analysis of data files, intrusion logs, etc. As such, it's related to the kind of AI I want to do, so it's game - and there are open-source tools to use, too. Moo ha ha.

I'm particularly struck by the nascent definition of standard operating procedures (none actually yet defined). As you know, Bob, SOPs are workflow. Workflow is ... another target domain, actually. So stick that in your pipe and smoke it.

Monday, November 15, 2010

Target application: text generation

Oh, this is just an idea I can't kick - text generation. You can get paid for it. (That's an essay writing company; I Googled after reading this account of a freelance essay writer.) It's the Eschaton, immanentizing before your very eyes.

I keep thinking of the tic-tac-toe analyzer that was the subject of a thin book I read - and presented at an AI class given by Doug Hofstadter - lo! these many years ago. No notes survive that I can find, but the quality of text generated as justifications for the program's strategies was truly amazing.

I want to do that.

Sunday, November 14, 2010

Pattern matching

I still haven't quite got my head around pattern matching, but I think I'm getting closer. The Wikipedia article on pattern matching largely addresses Haskell and Mathematica (both of which provide pattern matching as part of the language). There's a Sub::PatternMatching in Perl, which ... almost does what I'm looking for, and there's Data::Match, which I remember finding earlier this year. And of course XSLT is based on pattern matching as well.

Essentially, a pattern is a structure with holes. These holes may be named, and we can also make assertions about the holes, like hole A and hole B have to have the same content, but that's the basic upshot. When applied to one or more targets, the pattern is an iterator; it can return more than one match.
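A minimal sketch of that idea in plain Perl, under assumptions of my own: holes are written as strings like '?a' (a made-up convention), patterns are nested array refs, and match() and matcher() are hypothetical names, not anything from Sub::PatternMatching or Data::Match. A hole that appears twice must hold the same content both times, and the matcher is an iterator over several targets.

```perl
use strict;
use warnings;

# Scalar equality only; holes bound to nested structures never compare equal.
sub same {
    my ($x, $y) = @_;
    return defined $x && defined $y && !ref $x && !ref $y && $x eq $y;
}

# Match $pat against $data; returns a hash ref of bindings, or nothing.
sub match {
    my ($pat, $data, $bind) = @_;
    $bind //= {};
    if (defined $pat && !ref $pat && $pat =~ /^\?(\w+)$/) {
        my $name = $1;
        if (exists $bind->{$name}) {
            # A repeated hole asserts "same content as before".
            return same($bind->{$name}, $data) ? $bind : undef;
        }
        return +{ %$bind, $name => $data };
    }
    if (ref $pat eq 'ARRAY') {
        return unless ref $data eq 'ARRAY' && @$pat == @$data;
        for my $i (0 .. $#$pat) {
            $bind = match($pat->[$i], $data->[$i], $bind) or return;
        }
        return $bind;
    }
    return same($pat, $data) ? $bind : undef;
}

# "Data mode": the pattern applied to several targets is an iterator.
sub matcher {
    my ($pat, @targets) = @_;
    return sub {
        while (@targets) {
            my $hit = match($pat, shift @targets);
            return $hit if $hit;
        }
        return;
    };
}

my @rows = (
    [ 'link', 'home',  'home'  ],
    [ 'link', 'about', 'intro' ],
    [ 'img',  'logo',  'logo'  ],
    [ 'link', 'faq',   'faq'   ],
);
my $next = matcher([ 'link', '?a', '?a' ], @rows);
my @found;
while (my $m = $next->()) { push @found, $m->{a} }
print join(',', @found), "\n";
```

Only the rows whose second and third fields agree come back, which is the "hole A and hole B have to have the same content" assertion in action.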

Chained together, these matches are AI's "unification", a powerful technique that can find multiple solutions to a given question posed in terms of predicate assertions over a universe of data. Pattern matching is unification, quite literally (although the converse is not true; unification includes things that aren't pattern matching).

Oh, wait. I said that the pattern matcher is an iterator - that's true, it can be used in a "data mode", but we can also associate actions with patterns to move the pattern matcher into an "action mode", and I think in the case of Class::Declarative, this mode is going to be easier to conceptualize. This is the mode used in functional languages such as Haskell (I'm basing this statement on the introduction to Sub::PatternMatching), and now that I think about it, XSLT as well.

When applying a series of patterns to a given data structure in this model, we can think of the series as a kind of "case" selector. Each match runs a bit of code, and the named bindings in the match are passed to the code as its call parameters. We could also disassociate the matches and the code by defining events that would fire, invoking code defined elsewhere (making more general coding easier).
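Here is a rough sketch of that "case" selector in plain Perl. Everything in it is an assumption for illustration: the '?name' hole convention, the flat array-ref patterns, and the names match_flat() and dispatch(). The first rule whose pattern matches runs its code, with the named bindings passed in as call parameters.

```perl
use strict;
use warnings;

# Flat matcher: pattern and target are array refs; '?name' marks a hole.
sub match_flat {
    my ($pat, $data) = @_;
    return unless @$pat == @$data;
    my %bind;
    for my $i (0 .. $#$pat) {
        if ($pat->[$i] =~ /^\?(\w+)$/) {
            my $name = $1;
            return if exists $bind{$name} && $bind{$name} ne $data->[$i];
            $bind{$name} = $data->[$i];
        }
        elsif ($pat->[$i] ne $data->[$i]) {
            return;
        }
    }
    return \%bind;
}

# Each rule pairs a pattern with the code to run when it matches.
my @rules = (
    [ [ 'add', '?x', '?y' ] => sub { my %arg = @_; $arg{x} + $arg{y} } ],
    [ [ 'neg', '?x' ]       => sub { my %arg = @_; -$arg{x} } ],
);

# Try the rules in order, like the branches of a case statement.
sub dispatch {
    my ($data) = @_;
    for my $rule (@rules) {
        my ($pat, $action) = @$rule;
        my $bind = match_flat($pat, $data) or next;
        return $action->(%$bind);    # named bindings become call parameters
    }
    die "no pattern matched\n";
}

print dispatch([ 'add', 2, 3 ]), "\n";
print dispatch([ 'neg', 7 ]), "\n";
```

Replacing the inline subs with event names looked up in a separate table would give the disassociated version, where matches fire events and the code lives elsewhere.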

All that remains is to start thinking of some use cases and coding some likely-looking expressions of patterns. WWW::Mechanize and HTML::TreeBuilder are such obvious candidates here; we're essentially doing what parsley does in this case.

Comparison of Python Web frameworks

"I am so starving" - a series of implementations of the same service in a dozen different Python Web frameworks. Interesting!

Saturday, November 13, 2010

Thoughts on rich text

There are lots of ways to encode formatting and semantic information concisely in text. (The "concisely" is what I'm going to address here.) They include things like Markdown, ReStructured Text, Text::Multi (a more generic Markdown-like framework), Markdent (another interesting parsing framework), and of course markup like HTML/SGML/XML, RTF, and TeX.

The end purpose of all these text formatting languages is to provide a way to type regular text and have it formatted or typeset. I've been using something like Markdown in my pseudocoding so far, but I need to think things through in a principled manner, and essentially, what I've come up with is this.

First, the target. The target is formatted text, but what does that mean? Clearly, it means something with a series of nodes specifying format in a generic manner:
container
  paragraph
    text
      This is an
    italic
      text
        example
    text
      text.
  paragraph
    text
      It consists of two paragraphs, one of which contains italics.
Obviously, I'd rather type this:
text
  This is an {\i example} text.

  It consists of two paragraphs.
Now note that in this example I've used an RTF-like curly-bracket-and-backslash style, and I've used a convention that a blank line represents a paragraph break, the latter being more or less universal in Wiki and Markdown settings these days.

So what I want to do for the text node - and this will end up being used throughout Class::Declarative, mind you - is to provide a basic framework for rich text that will allow the user to specify parsers to turn any textual formatting language into nodal structure, then to include a few simple parsers (e.g. Markdown and RTF-like) that can be selected in some way.
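As a sketch of what such a framework might look like, assuming a registry of named parsers that each turn source text into a nodal structure: the registry, the 'rtf-like' name, and the node layout (type/value/children) below are all invented for illustration, and the parser only knows blank-line paragraph breaks and un-nested {\i ...} groups.

```perl
use strict;
use warnings;

# Parser registry: name => code ref turning source text into a node tree.
my %parsers = (
    'rtf-like' => \&parse_rtf_like,
);

sub parse_rtf_like {
    my ($src) = @_;
    my @paragraphs;
    for my $para (split /\n\s*\n/, $src) {    # blank line = paragraph break
        $para =~ s/\s+/ /g;
        $para =~ s/^ | $//g;
        next unless length $para;
        my @children;
        # Alternate between {\i ...} groups and plain runs (no nesting handled).
        while ($para =~ /\G(?:\{\\i\s+([^{}]*)\}|([^{]+))/gc) {
            push @children, defined $1
                ? { type => 'italic', children => [ { type => 'text', value => $1 } ] }
                : { type => 'text', value => $2 };
        }
        push @paragraphs, { type => 'paragraph', children => \@children };
    }
    return { type => 'container', children => \@paragraphs };
}

# Render the tree as an indented outline, one node per line.
sub outline {
    my ($node, $depth) = @_;
    $depth //= 0;
    my $line = ('  ' x $depth) . $node->{type};
    $line .= qq{ "$node->{value}"} if defined $node->{value};
    return join '', "$line\n", map { outline($_, $depth + 1) } @{ $node->{children} || [] };
}

my $src = <<'END';
This is an {\i example} text.

It consists of two paragraphs.
END

my $tree = $parsers{'rtf-like'}->($src);
print outline($tree);
```

A Markdown-like parser would be one more entry in %parsers producing the same node types, which is the whole point: the target structure stays fixed and the surface syntax is selectable.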

Target application: poker

I swear, this blog is starting to be less about techniques and more about me rediscovering how many fun things there are to program in the world.

This time, as so very often, the trigger was an HNN post about poker bots, leading me to investigate the poker sites still legal in the US and eventually find the software download page for one of them, Full Tilt.

Thing is, the poker sites don't want bots for one simple reason: if people think they're losing because machines are cheating them, the poker site loses users and therefore money. However, to be perfectly honest, a poker bot would be a good way to make a little money every day while learning a whole lot about AI techniques.

I'm sorely tempted. Really and truly, especially given my heritage as the grandson of a professional poker player.

Update: Full Tilt has fallen on hard times lately; as of November 16, 2011, the link goes to a "US-only" page discussing disbursement of remaining funds held by US residents. But hey - in half a year I'll be living in Europe! So poker can wait until then; Europe loves poker.

Kaggle.com: hosting data mining/machine learning challenges

Now this is cool. Kaggle.com hosts challenges for data mining. That is, every few weeks they start a new challenge for statistical machine learning algorithms, with cash and prizes. I believe it's time for me to start learning.

Saturday, November 6, 2010

NLP link dump

So there's an NLP challenge: find "semantically related terms" over a large vocabulary (> 1 million words) given a large corpus, in a reasonable amount of computer time and with little or no human input. And from its links as to how the corpus was prepared, I ran across splitta (a sentence boundary finder) that will be very useful in the Xlat project, a Penn treebank tokenizer sed script, and the Topia term extractor.

All of these will be essential parts of the eventual front end to OpenLogos, for example. I urgently need a way to find significant terms in the source text in order to facilitate early glossary work (i.e. ideally do the glossary work before starting the translation). This is pretty critical.

I might actually tackle the NLP challenge as well, but honestly - all this stuff is mind-meltingly fascinating to me. I need to get serious.

Wednesday, October 27, 2010

The only thing that could improve this would be an automated form that lets you insert different people's names into it

Via HNN, like so many other posts:

I really enjoyed that essay by Paul Graham. Paul Graham is an excellent writer and a very nice fellow. But when he said that thing that made me look bad, I just had to draw the line. For years, I’ve been doing something and telling people I’m doing it and then all of a sudden Paul Graham comes along and tells me it’s a bad idea. I think it’s time to question his assumptions.

Yeah. So why is putting up a form so hard? Answer: it really isn't, except you always have to look things up again. That's the background logistics that a smooth system would relieve you (read: me) of.

Freelancing with Ruby on Rails

An interesting post on tips and techniques.

Link dump: journalism and data sets (mostly)

From lecture #6 in the P2PU course, we have followthemoney.org (that sounds very familiar), the NICAR data library, the Sunlight Foundation, and DocumentCloud's open-source foundations.

Tuesday, October 26, 2010

Startup::Declarative

Ha. Just a thought: a declarative language describing the structure of a startup company. It would hook into workflow, maybe a site description language ... ??? 3. Profit.

String matching algorithms

Here. For once, a link from Metafilter, not HNN.

Saturday, October 9, 2010

Further thoughts on Color::Declarative

The basic object in Color::Declarative is, of course, the color:
color green
or
color (SVG) blue
There is also a dictionary object to create custom dictionaries:
color-dictionary my-palette
If we're not using a custom dictionary, then we use Color::Library for named dictionaries, and the default there is the SVG dictionary as a kind of general catchall.

We can also define a whole new custom color from scratch like this:
color my-green #00ff02
But we're not normally going to do this. Instead, normally we'll define a color by its function:
color button-color (SVG) green
Then we can use it elsewhere. And we can do that for a whole set of colors like this:
color-dictionary my-palette
  button (SVG) green
  titlebar (SVG) blue
Then we can define a button something like this:
button fahrenheit (x=130, y=50, color=button) "Fahrenheit"
We could also imagine a palette defined something like:
color-dictionary
  primary (SVG) { $retrieved_value }
  secondary (SVG) primary.complement
I don't like that poorly considered syntax, but the point is that we should be able to build sophisticated palettes based on functional relationships.
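To make "palettes based on functional relationships" concrete, here is a small sketch in plain Perl. It does not use Color::Library; the two SVG values are hard-coded, color() and complement() are hypothetical helpers, and "complement" is taken to mean inverting each RGB channel, which is only one of several possible definitions.

```perl
use strict;
use warnings;

# Complement of an #rrggbb color by inverting each channel (an assumption;
# rotating the hue by 180 degrees would be another reasonable definition).
sub complement {
    my ($hex) = @_;
    my ($red, $green, $blue) =
        map { hex($_) } ($hex =~ /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i)
        or die "bad color: $hex\n";
    return sprintf '#%02x%02x%02x', 255 - $red, 255 - $green, 255 - $blue;
}

# Two named colors with their SVG values.
my %svg = (green => '#008000', blue => '#0000ff');

# A palette entry is either a literal value or code that derives one.
my %palette = (
    primary   => sub { $svg{green} },
    secondary => sub { complement(color('primary')) },
);

sub color {
    my ($name) = @_;
    my $def = $palette{$name} // die "unknown color: $name\n";
    return ref $def eq 'CODE' ? $def->() : $def;
}

print color('primary'), ' ', color('secondary'), "\n";
```

Because secondary is computed on demand, changing primary changes the whole palette, which is the behaviour the primary.complement line is reaching for.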

Wednesday, October 6, 2010

Text analysis - identification of sources in news articles

So I'm taking this online class about journalism, and one of the exercises is to identify the sources in a news article. By hand, of course, this is easy. Wouldn't it be nice to automate it (even partially)?

Of course, nothing is easy when natural language is concerned. I see two parts to this, clearly. First is taking a page and extracting the news item. Frankly, I don't see any better way to do this than simply to have a bunch of definitions for different news services that could identify the CSS classes used by each of them to mark their payload text. And this is exactly the kind of task that a pattern-matching language would be dandy for.

Which leaves us with the text, and its analysis. Which is hard. I can think of a couple of ways to get some sources out of a given text: "'...' said x" is one obvious pattern. Without language-savvy tools, it would be a series of hacks, but maybe worth the effort. (With language-savvy tools, a lot of this stuff starts to look more amenable to solution, though, doesn't it?)
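For what it's worth, the first hack might look something like this in Perl. It is only a sketch under crude assumptions: a "name" is a run of capitalized words, quotes are straight double quotes, and find_sources() is a made-up function. It tries the pattern in both directions.

```perl
use strict;
use warnings;

# Find candidate sources via two patterns:
#   "quoted words," said Name      and      Name said, "quoted words"
sub find_sources {
    my ($text) = @_;
    my $name = qr/\b[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*/;    # crude: capitalized words
    my %seen;
    while ($text =~ /"[^"]+"\s*,?\s*(?:said|says)\s+($name)/g) {
        $seen{$1}++;
    }
    while ($text =~ /($name)\s+(?:said|says)\s*,?\s*"[^"]+"/g) {
        $seen{$1}++;
    }
    return sort keys %seen;
}

my $article = <<'END';
"The budget is balanced," said Mayor Jane Smith. Critics disagree.
John Doe said, "The numbers do not add up."
END

my @sources = find_sources($article);
print "$_\n" for @sources;
```

It will happily miss "according to", pronouns, and titles in lower case, so the series-of-hacks worry is entirely justified.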

Mapping

A quick thought on mapping. A "document" object can map onto a PDF via a PDF builder, and a PDF object can map back onto a document object by means of some sort of thing that recognizes tables and things with less-than-perfect accuracy. Point being that we've got different tools in the two directions, and that one direction might be lossy.

Forgive me if that was obvious. I told you these were more like notes for myself than a regular blog.

Tuesday, September 21, 2010

Octobot

Here's an interesting building block: a task queue processor. Runs anything in Java or other JVM languages. There's definitely a system-level design language waiting to be defined that would put together this sort of component to make an overall system.

Wednesday, September 15, 2010

Babbage's Debugger

A universal graphical language devised by Charles Babbage to enhance understanding of complex clockwork.

Friday, September 3, 2010

OpenLogos

Over on my translation tools hat, I've been looking at OpenLogos, a very old commercial machine translation program that has recently gone open-source.

Here's the thought that struck me earlier this week: the intermediate parse structure contains something very like a semantic structure. Might it not be possible to "translate" English into a Class::Declarative nodal structure? At least in limited domains?

I'm just going to leave that thought on the table here, and back away slowly.

Wednesday, September 1, 2010

The mother of all pattern sites

I never really noticed c2.com - the birthplace of the freaking Wiki. Sometimes you just have to wonder if I've ever been paying attention at all.

Friday, August 27, 2010

Screen patterns

Web UI design using patterns. Interesting. A pattern language would be another good target, you see. I'm trying to scratch out a plan.

Tuesday, August 24, 2010

Lisp in C

A micro-manual for Lisp in C

Sunday, August 22, 2010

How linkers work

Sometimes it seems like nothing gets posted here but stuff that came from HNN. How linkers work.

GraphLab machine learning framework

GraphLab. A new parallel framework for machine learning.

AppEngine makefile

AppMake is a Makefile framework for AppEngine.

Sunday, August 15, 2010

WWW::Mechanize and HTML::TreeBuilder

So I just wrote a little 40-line script to hit my router's HTTP interface and find out the list of DHCP clients currently connected. (This is preliminary to checking each for an rsync server running, and doing a backup on those that do.) To write this script, I used WWW::Mechanize, HTML::TreeBuilder, and maybe an hour and a half of divided attention.

Here are a couple of thoughts that occur.

First, looking at HTML results, then getting HTML::Element's find functions to find the things you're actually looking for, is a painstaking process that ends up with pretty brittle results. The parsley selector language would probably be a far, far better way to look for things in the HTML tree, and has the benefit that it could probably be declarativized pretty easily.

Second, the lack of Javascript capability makes this a particularly error-prone process; my router's forms all use Javascript for preprocessing of form input (which is stupid, but prevalent). I'm not sure that providing an entire Javascript parser is a viable option, though. I guess that would depend on the state of Inline::Javascript, assuming there is one. (If there isn't, well, maybe there should be...)

Third, this script naturally broke down into the Mechanize part and the HTML part. That is, once the document is obtained is when it gets parsed - ... this seems obvious, and maybe I'm still too jetlagged to notice that I've lost my point somewhere.

Anyway, it would probably be most instructive to take some actual scripts like this one, translate them into likely-looking declarative structures, and then implement those.

Sunday, August 8, 2010

Color::Declarative

Since colors and their names come up with astonishing regularity in graphics (PDF, SWF, Wx, etc.) I've decided I need a common set of semantics. Exactly what that means is still a little underspecified, but it's important.

Color::Library - support for all kinds of existing color libraries/naming schemes

Graphics::ColorNames - general support for color naming

Other things to do with colors, maybe: Color::Mix, tools for Color::Schemes, and ... well, a CPAN search on color would probably find others.

The semantics of color are pretty neat, although a lot of it is symbols without referents for me. It might be interesting to make this class the first that really starts to be a semantic class.


Microsoft Singularity (ahem): self-describing systems

Interesting article at ACM; while I'm not convinced type safety needs to be enforced at the operating system level (!) I'm interested by self-description. That sounds enticingly like semantics, there...

Data scraping

ScraperWiki (HNN post about it, that is). As usual, there are lots of very interesting related links in the thread that I should follow up on, particularly the selector semantics of parsley.

Friday, August 6, 2010

Target domain: Hofstadterian microworlds

Reading through your old tag posts can make you think of things you meant not to forget.

Lest I forget this, though: one of the motivating principles for this pseudocode stuff is that I wanted a reasonable way to express the Lexicon for Magnifi*. So, literally, while I was mentioning them metaphorically back in November, Hofstadterian microworlds really are a target domain. More precisely, I suppose, the Lexicon for expressing them is a target domain. Getting that to mean actual fluid concepts... that will be a big project.

Data journalism at the Grauniad

Interview.

Thoughts for the evening

So I tried "adapting" some of the examples for Flash into a pseudocode form, and as usual, everything I do points out the holes in my plan. Then I go walk the dog, and possible solutions occur. Today's harvest:

The sokoban example from the gazbming is rather oddly written; he defines some functions to create tiles and sprites, then creates 200 wall tiles, a bunch of package tiles, and so on. Then to build a map, he moves the tiles in Actionscript (and the maps are encoded in the Actionscript).

So fine. I can imagine writing a macro that expresses the map encodings as Actionscript, but really those 200 wall tiles are a map. I don't want to create them with Perl (though I could), because normally I'm not running Perl at define time, only building code snippets and running them after start. Maybe I could rethink that, but in this case, it seems more elegant to express the 200 wall tiles as a map from a sequence (from=1, to=200) through a template into a series of macro-inserted structures.
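In plain Perl terms, and only as a sketch, the idea is roughly this. The map_sequence() helper and the tile syntax in the template are both invented for illustration; the point is that 200 nodes come from one sequence and one template.

```perl
use strict;
use warnings;

# Expand a template once per element of the sequence from..to, replacing
# the literal placeholder $n with the current index.
sub map_sequence {
    my (%arg) = @_;
    return map {
        my $n = $_;
        (my $node = $arg{template}) =~ s/\$n\b/$n/g;
        $node;
    } $arg{from} .. $arg{to};
}

my @walls = map_sequence(
    from     => 1,
    to       => 200,
    template => 'tile wall$n (image=wall)',
);

print scalar(@walls), " nodes\n";
print "$walls[0]\n$walls[-1]\n";
```

Since the sequence itself can't be changed by editing the resulting nodes, this is a one-way trip.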

Now, this raises the question as to what, really, a map is. I've been going on the presumption that it's a nodal structure thing - but in this case, conceptually, I'm mapping from a sequence onto a series of nodes. I can imagine the same thing with a set of row data. Now, in the case of a sequence (which is immutable) this is a one-way trip. If I map to data, it's ... maybe not a one-way trip, but I probably have to define explicitly how the data should be updated if the nodal structure is changed, and certainly something like this is probably one-way.

But the point is, semantically, this is a map. Just because it's not my two-way nodal structure doesn't mean I shouldn't still call it a map.

Oy. You want to do something simple, define a new programming paradigm and a language to go with it, and revolutionize the world, and then it's just one thing and another.

Thursday, August 5, 2010

UML

I always figured UML would be interesting to look at in more detail, and once I even bought a book. (UML has been around so long I still had the habit of buying a book to learn about it, back then.) But ultimately, every time I look hard, it seems to be Much Ado about Nothing - or at least a little too much ado about Java class definitions.

But here comes an interesting Wiki page from a couple years back talking about some kind of wacky pseudocode UML description language that could compile into pictures or something - look at it! Significant indentation! Great minds, apparently.

The graphics tools he's talking about there are collectively UmlGraph, which I ran across Googling on the word "declarative". You find the most interesting things trawling on some keywords.

Anyway. I already mentioned UML back in the first days of this blog as one of those systems that is "kind of like, but not" what I want to do. Its semantics, though, are intriguing. UML as a set of semantics for sketching solutions is a good intuition.

Wednesday, August 4, 2010

A thought on code organization

Well, it's a pretty thin thought. Mostly this: These Actionscript examples I'm looking at tonight instead of doing the paying work are a freaking mess. They build stuff, and move stuff, and do all sorts of things in messy imperative ways. Now, obviously I can't write Declarative for Actionscript - but I could write a sort of front-end that would compile to Actionscript.

But if I do that, then, then, well then I've written a front-end that could compile to anything. A sort of Code::Declarative, if you will. And I'm not at all sure I truly understand what that would look like, although to a certain extent it's obviously what this entire venture is fumbling towards.

So maybe I should spend a little thought on just what it means to compile to a text file. I mean, structured text is not such an outré concept.

It's really a macro system again. I wish I could implement just one version of a macro system so I could see how many ways it fails.

haXe

Oooh, language design. The new haXe language compiles to SWF, Javascript, C++, NekoVM, or PHP. Now that there is cool. (And the successor to mtasc, actually.)

Of course, it manages to make everything thus look like Java, but ... there's potential there.

RobotWar. The battlefield of the future.

Ah, the future past.
Welcome to the battlefield of the future! It is the year 2002. Wars still rage, but finally, they have been officially declared hazardous to human health. Now, the only warriors are robots - built in secret and programmed to fight each other to the death!
The RobotWar game was written for the Apple II, it's so old. The idea is that you can program little robot tanks, then run them in tournaments. The programming language is extremely restricted, so honestly, it should be easy to write. Right?

So wouldn't it be cool to write a Flash front-end to display the simulated world, and a Perl back-end to run it, and a Web service to organize the tournaments? Wouldn't that be neat? Sure it would!

Note: an Actionscript example of XMLSocket, which is how Flash can talk to the server. Another.

SWF::Declarative

Yet another interesting target for a declarative approach: Flash by way of Ming by way of CPAN's SWF module by way of a declarative wrapper. There is a truly excellent set of tutorials (using the PHP wrapper, but who cares?) here. Note that there are two entire games there as tutorials. Perl tutorials here. A Flash tutorial blog.

Flash is cool, especially now that the open-source community is getting to the point that it can deal with it. I'm particularly intrigued by the notion of being able to generate SWF on the fly like this, and being able to take a user interface that's been designed in one GUI paradigm like Wx, and translating it into a Flash environment. There are basic design principles that transfer - those are arguably part of the basic semantics of any GUI, and should be treated as such.

Man. So much to look at, so little time.

Later note: mtasc is an Actionscript compiler for library code.

Tuesday, August 3, 2010

Layout in wxHaskell

First: wxHaskell. Wow. And still their wx configuration is less than declarative if you ask me. But they've got one pretty neat library that I don't think I've seen elsewhere: Layout. Since layout is currently one of the things bothering me, I think I'll need to spend some quality time with this documentation.

I'm currently leaning towards a really graphically-oriented layout style, like this:
field field1
button one
button two
button three
button four
layout
  field1 one
         two
  three- four
See what I'm doing? Argh, maybe you can't if Blogger swallows my indentation again, but I'm actually lining the names of fields up to indicate they should be aligned. Dashes can be used to extend the real estate of the names if the alignment should be on both left and right, and vertical bars would do the same thing vertically.

This is kind of like the XPM graphics format, that uses characters to represent pixels of different colors. It does require everything on the dialog/frame to have unique names, which is a pretty innocuous requirement, really.
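Reading such a layout back is not hard. As a sketch (parse_layout() and the cell structure are hypothetical), each name's column span gives its left and right edges, and trailing dashes only widen the span:

```perl
use strict;
use warnings;

# Read an ASCII layout: every run of non-blank characters is a cell whose
# column span determines alignment. Trailing dashes are padding, not name.
sub parse_layout {
    my (@lines) = @_;
    my @cells;
    for my $row (0 .. $#lines) {
        while ($lines[$row] =~ /(\S+)/g) {
            my ($left, $right) = ($-[1], $+[1]);    # span of this match
            (my $name = $1) =~ s/-+$//;
            push @cells, { name => $name, row => $row, left => $left, right => $right };
        }
    }
    return @cells;
}

my @cells = parse_layout(
    'field1 one',
    '       two',
    'three- four',
);

printf "%-7s row %d cols %d-%d\n", @$_{qw(name row left right)} for @cells;
```

Cells that share a left value are left-aligned with each other, and cells that share both left and right (field1 and three here, thanks to the dash) are aligned on both sides.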

Sunday, August 1, 2010

Stanford NLP

Huh. The Stanford NLP Parser, a Java library, is also available online via a Javascript API. Interesting. Very interesting...

NUPOS, a new part-of-speech system. A couple of useful blogs. None of this is available in Perl, sadly.

Wednesday, July 28, 2010

Flash remoting

In Perl. Interesting platform element.

Wx and Flash

If I'm going to write a browser in Perl, I'll want to run Flash. Unfortunately, Windows is the only supported platform because it runs ActiveX, but ... at least it would work: Wx::ActiveX::Flash.

Grist for the mill

So of course you know what I'd like to do is to analyze streams of jobs appearing on the various freelancer sites, and be able to post with a finished product within minutes of the job being posted. I know, I know, but this is the sort of pipe dream that drives genius of my level. So when I run across sources for said job streams, I feel the need to bookmark them. And now, blog them, especially when I need to close my browser and dump links.

Twitter! Specifically, "I need software that".

Scriptlance "Data mining" tag. You can search by content as well. Whether an RSS feed is available I don't yet know, but a topic-specific set of aggregators would be the thing to start with, eh?

Freelancer.com "scraping" search term.

Anyway, it would be cool at least to start on the aggregator and NLP end of this, just to see how far I'd get before running into the really-not-worth-it wall.

Tuesday, July 27, 2010

Target domain: Web interaction

Maybe I shouldn't just be thinking in terms of Web reading, but rather (in the 2.0 spirit) Web interaction. So, on that note, two things:

1. Hookbox is a system (in JS) that treats Websites as channels, and intermediates. Definitely needs analysis.

2. A valuable addition to WWW::Declarative would be the "web-agent". The web-agent is what uses the browser and target sites in order to carry out ... stuff. And that should generalize to an agent in general, and subclass for WWW-specific domain knowledge and actions.

The agent object could provide things like flow charts, action-reaction tables, timing and scheduling, and that sort of thing. You could build a chatbot around a generic agent, for instance. I think it's a sufficiently general concept that it's justified.

Update 7/30/2011 - dammit, the link there is toast now. Probably need Wayback or something to figure it out now. Why do people do that? Thirty seconds on Google gives me an interesting overview that's probably better for my purposes. My knee-jerk interpretation: the guy was hired and Hookbox disappeared into proprietaritude. It looks like it sparked a flurry of interest in late 2010, though.

Target domain: Web reading

OK, so Web slurping and robots (data acquisition from the Web) were always going to be one of my target domains, and I've got a need right now, so I guess WWW::Declarative is herewith on the table.

WWW::Declarative plus Wx::Declarative should make it possible to build a browser that can be automated in Perl. I think that would be a really handy little tool.

At least initially, WWW::Declarative will wrap LWP (book, intro) and HTTP::Cookies. At some point WWW::Robot might be an interesting thing to look at. Or WWW::Mechanize. (Or, of course, both.) It's always hard to judge, but from the volume of writing, I'd say Mechanize seems to be the leader in the field.

HTML::TreeBuilder will also be part of WWW::Declarative. I suspect that will just build a nodal structure (that seems to make the most sense) that we can then map to whatever. I'd feel more confident if all that had already been implemented; perhaps this is the domain where I'll implement it, yes?

Monday, July 26, 2010

Clay: generic compiled language

For efficiency without the loss of higher-level semantics.

Sunday, July 25, 2010

Levels of abstraction as maps

So back in May I had a post on levels of abstraction in code, followed this month by some thoughts on finding mappings - these are actually the same thought.

In my invoice example, I imagine a very high-level abstract specification of a specific invoice, then a medium-level specification of the invoice as a generic document, then perhaps a lower-level specification of that document as a PDF. Followed, I guess, by the actual PDF. The point is that each of those specifications represents the same thing.

When I talk about levels of abstraction in a program, this is the same thing again - a high-level summary (or specification) of the parts of the program, and a lower-level specification of the actual program.

I just need to figure out how to express this understandably, and I'm good to go.

Perl: notes on threads

Threads in Perl have decent support; Wx in Perl just uses the Perl variety without using much of the wxWidgets machinery, as far as I can tell. (Caveat: I don't know squat about threads except for theory; I took the right classes in grad school, but this is another thing I've never felt the need to deal with in real life. Yet.)

First, threads in Wx. Note that this references the modules threads and threads::shared. Finally, the perlthrtut is a recommended read.

My thinking is leaning towards providing threads only in the context of event handlers. Some event handlers could be marked as explicitly threaded, and those would spawn a new thread when fired. Wx methods would then be used for synchronization under Wx; in the general case, we'd use $thread->join.

Target domain: Ruby on Rails

I've tacitly been thinking about Web apps in terms of PHP and perhaps jQuery, but really, to be honest, I'd be a fool not to look at different platforms, now wouldn't I?

So here's a very interesting roadmap for learning Ruby on Rails. Here's what I like about this: it takes the semantic subdomains needed to plan (or understand) a Web app, and lays them out. A similar thing could be done for other approaches and platforms - and that result would be the semantic map for the domain.

That is what semantic programming is about.

Target domain: Project management

Here's another domain I've spent a lot of time thinking about in the past. And here are some open-source links to various Gantt chart options and open-source projects (thanks to a recent post on HNN, as always).

dotProject.net - what it says on the tin.
faces - seems to be a Python-based DSL for project planning or something.
OpenProj - what it says on the tin.

A page by Edward Tufte about alternative graphical languages for project presentation.

Two ways to make Gantt charts in LaTeX.

Saturday, July 24, 2010

Perl: mucking with the symbol table

Suppose you want to be able to build a function on the fly (as a closure) then call it like any other subroutine. Your answer is to modify the symbol table, to wit:
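my $sub1 = sub {
    print "Hi!\n";
};

my $clo2 = sub {
    print "OK?\n";
    # Install $sub1 in the symbol table as sub1(), for this dynamic scope only.
    local *sub1 = $sub1;
    sub1();
};

$clo2->();

# The typeglob was restored when $clo2 returned, so this call dies at run time.
sub1();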
Here, the call to $clo2 will print "OK?", then "Hi!", then the second attempt to call sub1() will fail because it's not defined (we localized the typeglob within $clo2).

This is how I've implemented subroutines in Semantics::Code.

wxdtut - a Wx::Declarative tutorial (first attempt and thoughts)

So here's the idea: I want to write a self-contained HTML-based tutorial that will both publish to the Web and run the demo programs in a standalone format. As an initial stab, I put together forty lines of Declarative (posted here on the Wiki).

It's a nice first try; it does actually find a sloppily-named script in the demo directory, and runs it. There are a few holes it shows in my design so far; no "sub" for common code is the worst. So I'm going to try to solve that first.

Ultimately, I'll want a tree of chapter/section, a built-in editor for trying demos, output capture, and so on. But this is a good start.

Wednesday, July 21, 2010

More thoughts on invoices

In addition to a "brittle" definition of a specific data structure representing an invoice, a semantic programming system should have some general, dare I say semantic knowledge concerning invoices, and business processes in general, documents, and so on. In other words, there should be a semantic web, one view of which might be a specific data structure definition, and the system could interact with e.g. Perl code using that specific data structure definition - but it could also modify it based on what it knows about invoices.

That's pretty darned hand-wavy, but again: it's where I want to go.

AI

I always go crazy for any AI I see. I'm happy to note there's a CPAN module AI::Genetic (searched for it after I saw an HNN post on genetic algorithms in Python). Anyway, this is a good target domain. Duh.

Tuesday, July 20, 2010

3D modeling

New domain of interest. ICE is an interactive composition environment that looks interesting. A blog.

Saturday, July 17, 2010

Coffeescript

Interesting: compiles to Javascript. Slideshow.

Thursday, July 8, 2010

SYMADE

An example of semantic-oriented programming. Interesting.

Finding mappings

Imagine, then, we have an invoice. We moreover have a task that somehow says, "Make a document from this invoice." The way to do that is to find a unit that is indexed as a map from an invoice onto a document. That map satisfies the need to make a document, so we set up the map, let it do its thing, and we now have a document.

Note that the task is thus a high-level description of the essence of the program. Again, this is akin to 4GLs in that we state what to do, and the semantics tell us how to do it. This is truly declarative programming.
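As a rough illustration of "find a unit that is indexed as a map from an invoice onto a document", here is a minimal sketch in plain Perl. It assumes maps are simply closures stored in a hash keyed by source and target type; register_map, make, and the field names are all invented for the example.

```perl
use strict;
use warnings;

# Hypothetical registry: maps are indexed by source type, then target type.
my %maps;

sub register_map {
    my ($from, $to, $code) = @_;
    $maps{$from}{$to} = $code;
    return;
}

# The task only names what it wants; the registry supplies the how.
sub make {
    my ($to, $from, $unit) = @_;
    my $map = $maps{$from}{$to}
        or die "No map from '$from' onto '$to'\n";
    return $map->($unit);
}

# One map from an invoice onto a (very small) document.
register_map(invoice => 'document', sub {
    my ($invoice) = @_;
    return { title => "Invoice #$invoice->{number}" };
});

my $doc = make(document => 'invoice', { number => 3 });
print "$doc->{title}\n";
```

A real system would presumably pick among several candidate maps; this sketch just fails loudly when none is registered.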

Mapping in real life

OK. So today's epiphany is as follows: Given a unit "invoice" which I have already specified (i.e. filled in), I can map that onto a "document" unit that might look like this:
document
  text title "Invoice #3"
  text customer
    Customer name
    Street 9
    D-whatever Germany
  text identifying
    Job # whatever
  table
    header
      text "Description"
      text "Units"
      text "Unit price"
      text "Price"
    row
      text "Translation DE-EN"
      text "4802 wds"
      text "See PO"
      text "438.00 €"
  text (align=right) "Total: 438.00 €"
  text bottom_boilerplate
This maps back and forth to the invoice, which is the abstract view of the same thing. And this can be specified even further with layout information, either in the abstract document or a different, more specific template, and then that structure can be mapped onto a Word document or PDF. (Or both.)
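To make one direction of that map concrete, here is a minimal sketch that turns an invoice held as a Perl hash into a nested structure shaped like the document outline above. The field names, the use of a currency code instead of the € sign, and invoice_to_document itself are assumptions for illustration; the reverse direction is not shown.

```perl
use strict;
use warnings;

# Assumed shape of a filled-in invoice unit; every field name is invented.
my $invoice = {
    number   => 3,
    customer => "Customer name\nStreet 9\nD-whatever Germany",
    job      => 'whatever',
    currency => 'EUR',
    items    => [
        { description => 'Translation DE-EN', units => '4802 wds',
          unit_price  => 'See PO',            price => 438.00 },
    ],
};

# Map the abstract invoice onto a generic document structure.
sub invoice_to_document {
    my ($inv) = @_;
    my $money = sub { sprintf '%.2f %s', $_[0], $inv->{currency} };
    my $total = 0;
    $total += $_->{price} for @{ $inv->{items} };
    return {
        title       => "Invoice #$inv->{number}",
        customer    => $inv->{customer},
        identifying => "Job # $inv->{job}",
        table       => {
            header => [ 'Description', 'Units', 'Unit price', 'Price' ],
            rows   => [
                map { [ @{$_}{qw(description units unit_price)}, $money->($_->{price}) ] }
                    @{ $inv->{items} }
            ],
        },
        total       => 'Total: ' . $money->($total),
    };
}

my $doc = invoice_to_document($invoice);
print "$doc->{title}\n$doc->{total}\n";
```

The result is still abstract; a second map could take it onto Word or PDF.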

The point being that this map is then itself a live object that can be stored and represented, and that is semantic programming. This part is a lot like XSLT, because XSLT is all about mapping and transforming tree structures. But it's unlike XSLT because (1) it doesn't presume that the mapping is a one-way, one-time transformation, and (2) the maps and semantic units are organized in a lexical database somehow. That lexical database is itself the program. In some way.

I hope this cleared all that up.

Wednesday, July 7, 2010

Diggy/DGE Javascript/DHTML-based game platform

Another cool domain - Javascript games. (Also examine as prototype for Fireworks.)

Tuesday, July 6, 2010

Yahoo! design pattern library

Not the first time I've run across this, of course, but... it has semantics written all over it, so into the link heap it goes.

Monday, July 5, 2010

Another stab at "invoice"

I now dislike my earlier text-based notion of macros. The macro system should be native, and more importantly, needs to be at a semantic level. That is, we are describing to Class::Declarative what sort of node it should be building.

With this in mind, here's my current notion:
unit invoice
  has customer => customer
  has data items (description, price, unit, subtotal)
  assert usually count(items) > 0
  calc total = sum(price) from items
  has currency => currency default USD
  has comments

I need to find a more specifically macro-oriented structural definition and express it like this.
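For comparison, the same unit can be approximated today as plain Perl data plus a small builder. This is only a sketch under my own assumptions about what "has", "assert usually", and "calc" should mean; none of it is Class::Declarative syntax, and build_unit is a made-up name.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical encoding of the "unit invoice" notion as plain Perl data.
my %invoice_unit = (
    has      => [qw(customer items currency comments)],
    defaults => { currency => 'USD' },
    # "usually" assertions produce warnings rather than hard failures.
    usually  => { 'count(items) > 0' => sub { @{ $_[0]{items} || [] } > 0 } },
    # Calculated fields are derived from the rest of the unit.
    calc     => { total => sub { sum(0, map { $_->{price} } @{ $_[0]{items} || [] }) } },
);

sub build_unit {
    my ($spec, %given) = @_;
    my %obj = (%{ $spec->{defaults} }, %given);
    $obj{$_} = $spec->{calc}{$_}->(\%obj) for keys %{ $spec->{calc} };
    my @unmet = grep { !$spec->{usually}{$_}->(\%obj) } sort keys %{ $spec->{usually} };
    return (\%obj, @unmet);
}

my ($inv, @unmet) = build_unit(\%invoice_unit, customer => 'ACME');
print "$inv->{currency} $inv->{total}\n";
print "not satisfied: $_\n" for @unmet;
```

With no items supplied, the total is 0 and the "usually" assertion is reported rather than enforced.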

Now Lisp in PHP!

Of all things. I should just go ahead and finish my Perl one.

Another thought on files

I'm proposing a new approach to the formalization of knowledge about programming constructs. Instead of building a new module, I'm proposing the definition of heuristics for the use of the old ones. OK, sometimes your application justifies the creation of a new module (rather often, granted), but sometimes, given a script order, you either don't have the luxury of installing a module, or you just don't want to involve that kind of overhead.

In such a case, you want to be able to write a script that uses existing Perl infrastructure (say) but still manages to deal with sticky cases like UTF-8 BOMs because the actual specific files in use for the case include them.

So the trajectory I went through: write the naive code, discover BOMs, work out a way to deal with them, find one file in the list that didn't have a BOM and write the appropriate conditional code to handle both cases - that trajectory is something that is amenable to automation.

Well. It's not really very amenable to automation right now. That's the point of semantic programming. That's what I want to automate. It didn't require a whole lot of human insight, just some techniques I've learned over the years and some basic logic. I don't think it's AI-complete; it's just one more corner to break off the brick, and it's the corner I have my eye on.

Saturday, July 3, 2010

Couponing automation

Probably stupid, but Jeffrey believed that with couponing, he could live on $1 a day for food and have plenty to eat (via MeFi). He blogged it for a month, spending less than $30 and eating rather well - while donating food to a local food bank. Then he kept doing it.

This intrigues me. A lot of the coupon game involves planning and "deal detection" that might be amenable to automation. So ... maybe it's a good target?

If so: SavingAdvice.com coupon database. Jeffrey's FAQ. An explanation of blinkies. There are lots of fora, of course, where heuristics could be gleaned.

Worth thinking about.

MetaOptimize Q&A site

A Stack Overflow clone for statistical methods, machine learning, etc.

A great domain for semantic programming.

Javascript local data store

Another interesting component in Javascript. Why, yes, I am in link dump mode, thank you.

Understanding systems

A blog post musing on understanding systems and how it makes us better programmers. This, again, is kind of where I want to be going with a semantic software system.

Intrusion detection

This blog post on intrusion detection probably has no place here, except that if semantic programming can't also include semantically motivated machine learning paradigms like intrusion detection, then I'm probably not thinking about the right thing.

Well. I find it interesting, so it gets bookmarked here.

UTF-8 files and Perl

I ran into some problems trying to load some files with UTF-8 text (German) using Perl. Thing is, these files had a three-byte byte order mark (BOM) of ef bb bf [here is another useful link] and Perl freaks out. You have to check the first line for those three bytes; if present, you toss them and keep a flag. Then for the rest of the file, you have to set the UTF-8 flag on each line read.

The code:
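use Encode;

open my $fh, '<', "$d/$file" or die "Cannot open $d/$file: $!";
my $utf8_file = 0;
my $firstline = <$fh>;
if (defined $firstline) {
    # Strip a leading UTF-8 BOM (ef bb bf) and remember that we saw one.
    if ($firstline =~ s/^\xef\xbb\xbf//) {
        $utf8_file = 1;
        Encode::_utf8_on($firstline);
    }
    consume($firstline);    # consume() stands in for your per-line processing
}
while (my $line = <$fh>) {
    Encode::_utf8_on($line) if $utf8_file;
    consume($line);
}
close $fh;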
This is tedious.

I see two approaches to dealing with this. The first is to create Yet Another Module (this includes adding code to Class::Declarative), then always use that module when coding. This is kind of the default, and ultimately it is unsatisfying.
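A minimal sketch of what the core of such a module might look like, using an :encoding(UTF-8) I/O layer instead of flipping the UTF-8 flag by hand. Note the changed assumption: this treats every file as UTF-8 whether or not it carries a BOM, which is not quite what the code above does. read_utf8_lines is a hypothetical name, not an existing CPAN function.

```perl
use strict;
use warnings;

# Hypothetical helper: read a text file as UTF-8 and return its lines
# with any leading byte order mark removed.
sub read_utf8_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    # Once decoded, the three bytes ef bb bf are the single character U+FEFF.
    $lines[0] =~ s/^\x{FEFF}// if @lines;
    return @lines;
}
```

The per-file loop then shrinks to something like consume($_) for read_utf8_lines("$d/$file"), with consume() again standing in for the real processing.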

The other approach is some kind of pattern / macro / template system that would include this knowledge and would somehow generate the appropriate code as needed. That's where semantic programming needs to be headed.

Boy, that's vague.

Thursday, June 3, 2010

Web app ideas

A list of Web app ideas. Can't have too many. One or two of these look interesting.

Open-source Rails apps

Some example Rails apps. Seems as though a generic semantic platform could establish parallels between different frameworks, and kind of ... port ideas or features back and forth. Or something.

Monday, May 10, 2010

HNN finds another interesting thing

Fabrik, an Apple visual programming toolkit from 1988, has lots of very intriguing features - including bidirectional mapping between objects, and a pervasive use of the dataflow paradigm. (!)

Saturday, May 8, 2010

Another thought on levels of abstraction

Maintaining two views of the same structure (i.e. two levels of abstraction) is what I referred to earlier as a dynamic mapping. It's definitely something that has to end up in C::D.

Dataflow programming: Cascading

Cascading is a Java-based dataflow API for Hadoop. Since dataflow is one of the key declarative domains, as I see it, I want to do something more or less like Cascading's feature set.

Saturday, May 1, 2010

Levels of abstraction

Here's a concept that crystallized for me today: a programming system that could present code at different levels of abstraction at the same time. Well - clearly not at the same time, but ... a system that had something like a semantic organization of the code could present it at a higher level for a summary view, then get more specific as needed.

This is again something like literate programming, but closer to the code. It would identify larger structures within the source code as having a meaningful relationship with one another, without obscuring the specific source code.

A Lisp macro system is great at producing levels of abstraction, and in many cases this is exactly what's needed; I don't care where the compiler puts variables, so I don't need that low level. But when approaching a large software system, I need to understand what parts of it correspond to specific features, and only look at the features I'm working with at the moment.

Or something. I need to refine this further. But here's kind of where I want to go with this. I've got a venture going with wxPerl, and so I'm developing it in Wx::Declarative (at least as far as I can). I'd like to be able to have a summary description of the overall application, while still having full access to the lower-level specifications of the different panels and so on. I'm not sure I want to put things into a different file, because it's all so compact that I like the idea of having it all in one place (with "business logic" put into a separate module, but with the presentation all being in one place).

Semantically, it would be nice to be able to integrate this summary structure into the structure of the code. I'm really not sure what that would look like, though.

Sunday, April 25, 2010

WWW::Mechanize

A good target for declarative coverage.

Monday, April 19, 2010

Data::Match

This might be a decent place to start with some kind of declarative pattern-matching ... thing.

Lisp in Perl

Makes you wonder why you couldn't just write a Lisp that would "compile" to Perl. It would be instructive and fun.


Edit 2: a serendipitous link to the same in Python. Freaky.

Sunday, April 18, 2010

What makes Lisp great?

Of all the languages out there, Lisp is the one that seems to have the most proponents who sound believable. Occasionally I think I should learn some of it - then I bounce off what to me are always the stopping points: I have a hard time with the radically different vocabulary, I don't like capital letters, I miss CPAN like childhood innocence, and all this - pathetically - is enough to stop me.

So I decided to check what other people think makes Lisp great, and put that into Class::Declarative if possible. But when you get down to it, Perl already does a lot of what makes Lisp great, it turns out. (The closure epiphany I had in January is what got me started on this path in the first place, after all.)

Paul Graham lists nine new ideas that Lisp embodied: conditionals, first-class functions, recursion, dynamically typed variables, garbage collection, programs as expressions (i.e. functional programming), a symbol type, a code notation that is a tree of symbols, and the whole language available at all times. Of those, the first three or four are now universal, and the first five unambiguously part of Perl. The sixth is mostly covered by Perl and can be simulated with anonymous subroutines in cases where it's not covered.

That leaves the symbol type, a code notation that is accessible to the program, and the whole language available at all times. I'm not terribly interested in the symbol type, because it's a performance issue (testing for equality using the symbol handles instead of checking string contents); I'm really interested in what makes Lisp more expressive than other languages, at least for those versed in it.

Moving on to several sources, "Lisp is a programmable language", meaning that Lisp can easily write Lisp code in order to provide higher-level semantics for a given domain. I think this is getting closer to the crux of the matter. Due to Lisp's minimal syntax, its control structures look just like function calls, and so you can easily extend the language to suit your domain. That, plus Lisp's interactive nature - the way a Lisper effectively enters into a dialog with the language while evolving new semantics - makes Lisp unique, or at least partly unique. (Here is another good presentation of this notion.)
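Perl can approximate a small part of this: a subroutine declared with a leading & prototype accepts a bare block, so a user-defined function can read like a built-in control structure. A minimal sketch, with retry as an invented example rather than anything from CPAN:

```perl
use strict;
use warnings;

# Hypothetical control structure: run a block up to $tries times,
# stopping at the first true result. The (&$) prototype lets callers
# pass a bare block with no "sub" keyword and no comma after it.
sub retry(&$) {
    my ($block, $tries) = @_;
    for my $attempt (1 .. $tries) {
        my $result = $block->($attempt);
        return $result if $result;
    }
    return;
}

my $calls  = 0;
my $answer = retry { $calls++; $_[0] >= 3 ? "ok on attempt $_[0]" : 0 } 5;
print "$answer\n";
```

This is far short of a macro system, since the block is an ordinary closure and the code inside it cannot be inspected or rewritten.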

Python has considerable interactivity, of course, and the introspection that Lispers find so useful. But I've never had much luck with that mode. I find it far more instructive to break things down into unit tests in the CPAN module paradigm, and work out semantics that way - although I can certainly see how an interactive data inspection facility would really speed the process in many cases (actually, it would make some things possible that aren't any other way.)

Perl is supposed to make easy things easy, and hard things possible. An interactive data facility would be a great addition to the language. Well - I suppose the Perl debugger already does this, to a certain extent, but the Perl debugger has always made my brain hurt.

The Lisp macro system and quasiquotation are powerful facilities making it easy to extend the structure of the language; a Lisp macro is a program that writes other programs that perform the ultimate tasks. Class::Declarative is most definitely moving in that direction; my macro system will be equivalent to Lisp's (I think) and nearly as terse as quasiquotation. More on that when I've got something working.

I'm left with the following chief advantages of Lisp: introspection at all levels, and interactivity allowing multiple approaches to new semantics. And I'm forced to admit that at the moment, Class::Declarative is not doing this - but could, if it grows more in the direction of semantic programming.

On paper, I've been exploring some possible approaches to realistic semantic programming; it appears that the key insight is recognition. That is, matching. If I posit a given structure for the world, then allow a matching engine to tell me how the world can be made to match that structure, then I've parsed the world, or recognized some structure in it. That's really what semantics is.
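As a toy version of that idea, here is a minimal sketch of a structural matcher: the pattern is the structure I posit, the data is the world, and a successful match returns what was recognized. The '?name' variable convention and the match function are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical matcher: a pattern is a nested hash; a leaf like '?who'
# is a variable that binds to whatever the data holds at that position.
sub match {
    my ($pattern, $data, $bindings) = @_;
    $bindings ||= {};
    if (ref($pattern) eq 'HASH') {
        return unless ref($data) eq 'HASH';
        for my $key (keys %$pattern) {
            return unless exists $data->{$key};
            return unless match($pattern->{$key}, $data->{$key}, $bindings);
        }
        return $bindings;
    }
    if ($pattern =~ /^\?(\w+)$/) {
        $bindings->{$1} = $data;    # a variable matches anything
        return $bindings;
    }
    # Otherwise the pattern is a literal and must equal the data exactly.
    return $bindings if defined $data && !ref($data) && $data eq $pattern;
    return;
}

my $world = { type => 'invoice', customer => { name => 'ACME' }, total => 438 };
my $found = match({ type => 'invoice', customer => { name => '?who' } }, $world);
print "Recognized an invoice for $found->{who}\n" if $found;
```

Extra keys in the data are ignored, so the pattern only has to describe the part of the world it cares about.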

Here's an example. I want to write an invoice. I create an "invoice" structure. The Lexicon looks up what an invoice is, and fills in some structure as follows:
invoice
  customer {{index customer}}
  data items (description, price, unit, subtotal)
  {{.assert count(items) > 0}}
  total "{{select sum(price) from items}}"
  currency "{{index currency default USD}}"
  {{.section comments}}
  comments
    {{@}}
  {{.end}}
This is still pretty crude, but effectively we can scan the template to see whether all the information is available that we need for a full invoice. Some sort of interactive process will allow me to fill in what's missing, and some other specification not shown here will allow the invoice to be stored. That part's probably some sort of workflow environment. Once defined, the invoice can be expressed using a template to produce PDF or a Word file, and attached to an email. These actions are certainly workflow.

The point is that this is semantic programming - I define an invoice, the environment knows what an invoice is, and we interact to define the case. It's fuzzy in my mind, but I'm getting there.

Note that some of the fields in this example (the index fields) are expectations that prevent the object from being fully specified until they are fulfilled. The same applies to the assertion. However, the select field is simply a calculated field that doesn't require - in fact, doesn't support - an interactive assignment of value.
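The distinction between expectations and calculated fields can be sketched with a simple scan over the template text. This is only an illustration: it handles just the index and select fields, it assumes that an index field with a default counts as fulfilled, and scan_fields and missing are hypothetical names.

```perl
use strict;
use warnings;

my $template = <<'END';
invoice
  customer {{index customer}}
  total "{{select sum(price) from items}}"
  currency "{{index currency default USD}}"
END

# Classify each field: "index" fields are expectations to be fulfilled,
# "select" fields are calculated and never assigned interactively.
sub scan_fields {
    my ($text) = @_;
    my (%expect, @calculated);
    while ($text =~ /\{\{(index|select)\s+(.*?)\}\}/g) {
        my ($kind, $body) = ($1, $2);
        if ($kind eq 'select') { push @calculated, $body; next }
        my ($name, $default) = $body =~ /^(\w+)(?:\s+default\s+(\S+))?$/ or next;
        $expect{$name} = $default;    # undef when the field has no default
    }
    return (\%expect, \@calculated);
}

# An expectation is unfulfilled if no value was given and it has no default.
sub missing {
    my ($expect, %given) = @_;
    my @missing = grep { !defined $given{$_} && !defined $expect->{$_} }
                  sort keys %$expect;
    return @missing;
}

my ($expect, $calc) = scan_fields($template);
print "Still needed: @{[ missing($expect) ]}\n";
print "Calculated: @$calc\n";
```

An interactive front end could loop on missing() until it returns nothing, then evaluate the calculated fields.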

That calculated field gives us something like the capabilities of Excel, by the way. It would be instructive to be able to build a full spreadsheet in a system like this.

Friday, April 16, 2010

Why Perl?

Occasionally I'm beset by the worry that I was wrong to choose Perl as the basis for all this development. Wouldn't some other language be a better choice?

All I can say in my defense is that my brain seems to work better with Perl for some reason. It's not that I've done a great deal of work in Perl - I've written more lines of C and Tcl. It's not that I have no exposure to other languages - I've worked in Python and Scheme. And while I enjoy the joke that Perl is a write-only language as much as the next programmer, still - when I look at my own code written in Perl, I can read it more easily than my own code, let alone that of others, in other languages.

Then there's CPAN. Oh, sure, half the stuff on CPAN is alpha-quality just-off-the-ground stuff, but oddly enough, it's usually enough to get me off the ground on a given project. Then, after I've got something halfway done, I can rewrite the CPAN modules that weren't what I wanted, and get the rest of the way.

At the end of the day, all programming is about semantics. You are taking the inchoate space of possible meaning, and narrowing it down to the meaning that gets you where you want to go. I find it easy to do that in Perl, for whatever reason.