Wednesday, December 29, 2010

Win32::Daemon

Since Windows services provide some of the functionality that Unix just uses cron for, the service manager should be part of Schedule::Universal. Win32::Daemon really seems to do everything you want, here.

Mission starting to be accomplished

The whole point of the Class::Declarative effort, to me, is to enable me to write quick scripts in my fields of interest that don't slow me down in whatever I'm doing. In other words, when working with Word documents, the idea is to offload all the knowledge, all the lookup work inherent in dealing with OLE (and there's a lot) so I can actually conceive of the need for a script, start it, test it, and use it, within, say, an hour or so. If I go beyond that time, I might as well not have a scripting language - I need to finish the task, not spend a day crafting a way to automate it.

Last night, Win32::Word::Declarative managed to meet that need to a certain extent. Just the ability to remember that I can simply type
use Win32::Word::Declarative;

document (active)
  do {
    ...
  }
is enough to get me over some humps.

Not only that, the declarative framework gives me the start of a way to store knowledge accumulated. My task last night was to iterate over all the stories in a document - not the first time this has come up, of course. And this task is needlessly complicated, but now, for the first time, I have a way to package it up into a function that can easily be called. Now, I can simply say "foreach my $range in (^story_ranges())" and I'm done. It's still not as easy as it could be, but I no longer have to worry about that wacky double loop.
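
For reference, here is a minimal sketch of that double loop, written in Python rather than Perl purely to keep it short. Word's object model does expose Document.StoryRanges and Range.NextStoryRange; the function name story_ranges, the duck-typed doc argument, and the assumption that the end of a chain reads as None are mine, and this is not necessarily how Win32::Word::Declarative does it internally.

```python
def story_ranges(doc):
    """Yield every story range in a Word-like document object.

    Outer loop: the first range of each story type in doc.StoryRanges.
    Inner loop: follow NextStoryRange until the chain ends
    (assumed here to show up as None).
    """
    for first in doc.StoryRanges:
        current = first
        while current is not None:
            yield current
            current = current.NextStoryRange
```

Anything that walks the document can then use a single flat loop over story_ranges(doc) and forget that linked headers, footers, and text boxes are chained.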

I'm sure I'm not the first programmer ever to be in the position of realizing that his accumulated codebase made things easier for him - it's not even the first time for me (the XMLAPI sure did make working with strings and structures in C much easier) - but it's immensely satisfying knowing that this work is starting to pay off in real terms.

Sunday, December 26, 2010

Unison: file synchronization, possible interesting target application

Unison. I intend to start using it.

Update: I have started using it, and it rocks! A guide.

Boomerang: a synchronization language

A bidirectional textual language. It's the same thing as my concept of a mapping, which it turns out (at least some) other people call a "lens".

Very rough roadmap

Having just committed the basic database tests, I thought now might be a good time to look up and get my bearings on the immediate future of Class::Declarative.

There are several additions to basic functionality that will go into the next release: these are the "system" tag (configuration), the "application" tag (also configuration), the shell and command line features, and state machines.

A word on the application tag first. Essentially, the definition of an application both defines the basic structure of a given set of programs and establishes a context for actions. So the application will also be where we define actions - and actions will inevitably result in the definition of workflow, because workflow is really all about the definition of actions (and who does them, planning for them, and so on). A full workflow engine won't be in the next release, but I think it's important to realize that it's probably going to be in the core functionality, and probably pretty soon.

An application can (1) be a refinement or a modification of another application, and (2) define any structure at all for the scripts that belong to it. Whatever is defined in the application import structure will end up as macroinserted structure in the running script. This can even go so far as there not being a running script at all - in this case, the file run will be a data file, and the application will have to define how it's handled (most likely as a file tag - still to be written, obviously).

(An aside: it might be useful to go ahead and include filesystem semantics in the next release, but I'm not wedded to that.)

An application will probably be declared something like "application [name]", possibly with some other decoration. Class::Declarative will first try to treat the name as a class [name].pm, then maybe something like App::Declarative::[name].pm, and then whether or not that fails, will look to [name].conf in the current directory, and parent directories up the tree.
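
The last step of that lookup, walking up the directory tree for [name].conf, could be sketched roughly as follows. This is Python rather than Perl, the function name find_app_conf is hypothetical, and it leaves out the earlier attempts to load [name].pm or App::Declarative::[name].pm entirely.

```python
from pathlib import Path

def find_app_conf(name, start="."):
    """Return the nearest NAME.conf at or above `start`, or None if there is none."""
    here = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (here, *here.parents):
        candidate = directory / f"{name}.conf"
        if candidate.is_file():
            return candidate
    return None
```

The nearest file wins in this sketch, which is what lets a per-directory configuration shadow one further up the tree.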

The application could also be given on the command line, of course. And it occurs to me that an "include" mechanism would be essentially the same thing, so we'll toss that in as well.

All this gives us multiple ways to define configuration all in one blow. The key is that all the structure found in this way goes into the same bucket as macroinserted tags. (It might be useful to keep track of where structure comes from in this way - and eventually even precompile structure while checking for changes to files each time.)

The state machine mechanism is really only destined for this next release because it's already started (kind of). It's cleaner. But definitely, applications, configuration, command-line invocation, a shell, and probably filesystem access all need to be in this next release, getting us closer to being a mature language.

All that is probably enough to keep me out of trouble for a month or two.

Update 11/19/11: Jeez, I didn't end up doing any of that. Except the shell, kind of.

Friday, December 24, 2010

Thursday, December 23, 2010

Norvig: How to Write a Spelling Corrector

Now here's a thought-provoking essay. It would be interesting to analyze the thought processes at a higher level of detail, but it's also useful just for thinking about how to go about doing NLP.

Function descriptor

I had a thought: if I set up a "knowledge base" as an equivalent of a library, so that each function would have some kind of high-level semantic description of what it does, you'd expect to be able to reason about that to assemble a program from a description of what you want done.

It's still a pretty squishy idea, but ... it sounds right. You could even attach some kind of templated version of how to carry out the computation in question, and then write plain Perl or something on that basis.

Milestone of a sort: database access

Just wanted to mark the date that I've got rudimentary database access going and build-time database access to modify built content. Result: I can define a Word document template/script that includes a table defined by a query result.

Kind of neat, and it just makes me think of other things I can include.
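
To make the "table defined by a query result" idea concrete, here is a small sketch in Python. It uses an in-memory SQLite database as a stand-in (the jobs table, its columns, and the function name table_from_query are all invented for illustration) and only produces the rows a table tag would need, without touching Word at all.

```python
import sqlite3

def table_from_query(conn, sql):
    """Run a query and return [header_row, *data_rows] as lists of strings."""
    cursor = conn.execute(sql)
    header = [column[0] for column in cursor.description]
    return [header] + [[str(value) for value in row] for row in cursor]

# Stand-in data; the real thing would be whatever database the script is pointed at.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (agency TEXT, words INTEGER)")
conn.executemany("INSERT INTO jobs VALUES (?, ?)", [("Acme", 1200), ("Globex", 450)])

rows = table_from_query(conn, "SELECT agency, words FROM jobs ORDER BY agency")
```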

Latest idea: generalize the "text" tag as a kind of output. When it fires, it asks its parent how to output its contents, and that propagates up the tree, allowing e.g. a document to define how text gets output, but permitting general output for non-Word areas. Add an "output" tag for alternative output handling, and you've got a pretty neat way to handle text.

Then you give the generalized text tag the ability to do Wiki formatting (for which it would also have to ask its parent how to represent different formatting choices) and you've got a really powerful mechanism. I'm pretty sure it would be most of a templating facility, right there. I need to implement it and see what it can do.
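
The ask-your-parent lookup could be sketched like this, in Python and with invented class and method names (Node, Text, find_writer, fire). It only covers plain output, not the Wiki formatting part, and it is a guess at the shape of the mechanism rather than an implementation of it.

```python
class Node:
    """A tree node that may or may not know how to output text."""

    def __init__(self, parent=None, writer=None):
        self.parent = parent
        self.writer = writer  # a callable taking a string, or None

    def find_writer(self):
        # Walk up the tree until some ancestor says how text gets output.
        node = self
        while node is not None:
            if node.writer is not None:
                return node.writer
            node = node.parent
        return print  # nobody claimed it: fall back to general output


class Text(Node):
    def __init__(self, content, parent=None):
        super().__init__(parent)
        self.content = content

    def fire(self):
        self.find_writer()(self.content)
```

A document node would pass a writer that types into Word; a text tag outside any document would fall through to the default.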

Sunday, December 19, 2010

Invoicing again

Here's a little stream-of-consciousness post for ya.

Now that W32::W::Decl is working to create Word files, I need an overall process in which to embed it. What will that look like?

First, it's going to be driven by my invoking something. This "something" is effectively saying, "We are going to run invoices now." And that will involve hitting my prepared ready-to-invoice database in Access, retrieving a query from it, and taking action based on that.

Since everything in my translation business is customer-dependent (each agency has a different workflow for invoicing and, really, for everything else), my first query is simply going to be "which agencies have outstanding finished jobs that should be invoiced?" Then for each of those agencies, I'm going to switch to the agency context (represented in this case by that agency's working directory) and perform "invoicing" in that context.

OK. So here is already an interesting point, because it's workflow and thus I've thought about it for a decade already without satisfactory resolution. Running a script is equivalent to taking an action - a workflow action. But every action happens in a context.

An action and an event are necessarily separate, although in a sense they're the same kind of thing. An event takes place internally to the running program, whereas an action is a long-term, irreversible action taken in the context of a larger system which includes that running program. An action is workflow.

Thus workflow is going to have to be a part of Class::Declarative, will it or nill it, because running a script is taking action is doing workflow, and in the larger sense we're going to suffer if we don't manage workflow properly. This has to be explicit - well, unless we leave it as a default or something.

So the context. For the time being, I'm going to consider a context to be purely a question of directories. I just can't get my head around much more abstraction than that, although ultimately it's going to end up being necessary, I'm sure.

The context modifies the "system" structure. System structures can be chained, too, allowing us a context hierarchy.
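
A chained context behaves a lot like Python's collections.ChainMap, so here is a tiny sketch in those terms. The keys and values are made up for illustration; the point is only that the nearest context wins and everything else falls through to the system structure.

```python
from collections import ChainMap

# Outermost defaults: the "system" structure.
system = {"editor": "notepad", "invoice_template": "default.doc"}
# A directory-level context (say, one agency's working directory) overrides some of it.
agency = {"invoice_template": "agency.doc", "currency": "EUR"}
# A deeper directory can override again.
job = {"currency": "USD"}

# Lookups try job first, then agency, then system.
context = ChainMap(job, agency, system)
```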

Oh - before I get too much farther here, I need to mention another sort of orthogonal dimension of contextuality, and that's the application. Again, C::Decl needs to take this into consideration right from the start: an application sets up a system context for a set of scripts, too, but stores this configuration in a central location instead of a location relative to the script being run.

The application can be set in one of two ways: first, I can just say "application " in my script. Second, I can specify it on the command line: perl --application= . This allows me to set up extension mappings under Windows: .proj might be "perl -MWx::Declarative --application=project ".

The application can monkey with all kinds of things, like importing additional modules and determining the controlling domain. Some of this will be macro-inserted into the current script to be sure it all works out correctly, and that's essentially what the "application" does.

But - and this is key - so does the context. It's just that without an application, we don't actually know what the context is supposed to be named (unless we explicitly name it ourselves).

Oh, which brings me to the third way to invoke a script, which is to associate its extension with a script stored elsewhere...

Well. You can definitely tell this is stream of consciousness. You're lucky to get coherent sentences, even.

Recap:
  • Invoke a script in situ
  • Invoke a script as a "second language" by giving it -M and other args.
  • Invoke a script as an "application language" by giving it -M and --application=
  • Invoke a non-script file by assigning it to a script.
I've done the first two successfully, defining ".dpl" on my machine as a declarative Perl script. I have only now really formally acknowledged the last two. I'm not really thinking in terms of Unix yet, am I?

Onwards, to other work. Let this simmer.

Scalable Web crawling

Posted without further comment because I haven't taken the time.

Friday, December 17, 2010

Scheduling

Scheduling is one of those real-life things that comes up all the time, and there's no convenient way to work with it, especially cross-platform. So, a twofold proposal:

Part the first: Schedule::Universal - a proposed cross-platform Perl API for both cron and the Windows Task Scheduler, along with at (on both platforms) and a way to set up a scheduling process of your own. In other words, one-stop shopping for scheduling.

Part the second: Schedule::Declarative - the declarative front for S::U; put a schedule into your system definition and you're already done scheduling your tasks.
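
To give a feel for what "one-stop shopping" would mean, here is a sketch (in Python, not the proposed Perl API) of one call that describes a daily job for either backend. The function name daily_job and its parameters are hypothetical. It only builds a crontab line or a schtasks argument list and never installs anything; the schtasks switches used (/Create, /TN, /TR, /SC DAILY, /ST) are the standard ones.

```python
def daily_job(name, command, hour, minute, platform):
    """Describe a once-a-day job for cron ("unix") or the Task Scheduler ("windows")."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    if platform == "windows":
        # Argument list suitable for passing to a process launcher.
        return ["schtasks", "/Create", "/TN", name, "/TR", command,
                "/SC", "DAILY", "/ST", f"{hour:02d}:{minute:02d}"]
    # crontab fields: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {command}"
```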

Thursday, December 16, 2010

"TDD problems" - statements of interesting small programming assignments

Here's an interesting page: a list of interesting programming problems for people who want to learn test-driven design. I'm not going to flat-out say it would be good to have an automated program that could program from those statements, because that would be crazy. But it's certainly instructive to contemplate what such a nonexistent system might look like, and chew off bits around the edges.

Wednesday, December 15, 2010

Requirements engineering

So I'm translating course material on business process management for the Hochschule Esslingen at the moment, and the topic has turned to something called "requirements management", something I'd never really considered.

But a little googling turned up OpenOME, the Organization Modelling Environment, a project at the University of Toronto that is a part of the larger "agent-oriented approach to software engineering" project.

Standards-based Java programming is generally not terribly to my taste, but I really like the notion of modeling anything and everything about software to be developed. The more of the model is explicit, the more we've replicated a human's understanding of the context of the code, and the closer we are to semantic programming.

So. Deserves further study. I'm intrigued by the repeated mention of a knowledge base. I really want to know what that's about.

Later update: The PowerPoint presentations are really interesting! And they've come up with a kind of declarative DSL to describe models. This is an encouraging field of inquiry.

Monday, December 13, 2010

State machines redux (link dump)

I started digging around looking for some good examples of state machines to test with, and found a bunch of stuff. (And I could swear that I'd already written this post, but ... apparently not.)

State machines are used for simple sentence parsing. Turns out they're not powerful enough to do the job in all cases (which I knew, but coming at a topic from another angle always allows me to be surprised again and again). However, they've been used pretty successfully for extraction of noun phrases and names from newsfeeds, which is kind of interesting.

Here's a kind of fun approach (though Java-based) to FSMs, using the example of a kind of treasure-hunt game. A state machine lends itself well to Zork-like games.

Here's a tutorial from a robotics approach, although not one I find all that convincing. State machines are, however, very commonly used for robotics controllers, for the obvious reasons. There is lots of material about compiling state machines onto microcontrollers. So my original reason for looking into state machines, WWW::Mechanize, makes a lot of sense. A state machine is a natural way of describing the actions of an agent.

Here's a Ruby/Rails state machine plugin, with good examples.

There's a guy in St. Petersburg who has coined the term "Automata-based programming".

An interesting application of a state machine in building a tree from a serialized protocol.

Charming Python has a chapter on state machines, particularly focused on text processing (e.g. generating HTML from Wiki, which is a good example).

And that's pretty much my link dump. Not a lot of coherence.

Update 11/17/11: I did a little more thinking on this topic.

Thursday, December 9, 2010

Embedding declarative code into regular Perl

It's always when trying to code something that I have the best ideas. So far I've been considering embedding Perl into declarative structures as kind of a one-way street, but it would be useful to close the loop. The point where I realized this was in building a bit of code that could benefit from a state machine.

Well, my current thinking had been that the state machine would be declared out in the tree somewhere, and the Perl code would just call it. That's kind of rigid, though. Wouldn't it be nice to get something like this?

my @tables = $^content->find('table');
my @rows = $tables[1]->find('tr');

declare state-machine process (start=off)
  off {
    => on if $^next->as_HTML =~ something;
  }
  on {
    something
    => off;
  }

$^process->iterate(@rows);

Isn't that neat? And put it together with a "perl mode" that effectively treats the entire file as a do { ... }, and you have a true declarative framework that would be useful without necessarily being in control.
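
For comparison, here is roughly what that off/on machine amounts to when written out by hand, as a Python sketch. The reading that each row drives exactly one transition is my assumption about what iterate would do, and the marker pattern, the sample rows, and the name run_machine are invented for illustration.

```python
import re

def run_machine(states, start, items):
    """Feed each item to the current state's handler; the handler returns the next state."""
    state = start
    for item in items:
        state = states[state](item)
    return state

captured = []

def off(row):
    # Stay off until a row matches the marker pattern.
    return "on" if re.search(r"<th>Total</th>", row) else "off"

def on(row):
    # Do something with the row that follows the marker, then switch back off.
    captured.append(row)
    return "off"

rows = [
    "<tr><td>a</td></tr>",
    "<tr><th>Total</th></tr>",
    "<tr><td>42</td></tr>",
    "<tr><td>b</td></tr>",
]
final = run_machine({"off": off, "on": on}, "off", rows)
```

The declarative version says the same thing with far less scaffolding, which is the whole attraction.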

Tuesday, December 7, 2010

State machines

Yeah, so I'm going to add explicit state machine support to the language. It looks like this:

state-machine
  start {
    # Figure stuff out.
    => a1
    => b1
  }
  a1 {
    # Other stuff.
    => return
    ...

You can parameterize a state machine:

state-machine (input)
  ...

Put like this, a state machine is itself an action (e.g. a "do" that runs on ->go()). You can also simply declare a state machine and instantiate it on input elsewhere, in which case the instance will be a ....

[we interrupt this post to point out that FSA::Rules really does.]

... function that you can call repeatedly. And thanks to FSA::Rules, all the hard work is done; I can just wrap it. The only question remaining being whether state machines belong in the core or not. Actually, I think not. Which means I have to figure out how to chain semantic domains (but I had to figure that out eventually anyway).

A thought on order of execution

So I've been thinking about the logical way to handle what, of the top-level items in a program, should actually get run. In my thinking, there are three different types of top-level item. First is just "do" and its friends: pure actions. They always run when at the top level. Second is purely declarative stuff. This never runs anywhere, because it doesn't have a go function.

The third class is both fish and fowl. It will run if necessary, but would rather not - these are items that are really more declarative than anything but that still have a default action, such as documents, URLs, GUI definitions, and so on.

The overall regime that makes sense to me is: 1. Run any preliminary code. 2. Skip any ambiguous code. 3. If there is code after the ambiguous items, don't run those items. 4. If the last item in a program is ambiguous, run it.

This gives us the option of setting things up for an ambiguous item that then controls the program (e.g. prints its document or activates its GUI or whatever). But if there is any action below an ambiguous item, it will be considered a declaration, not an action. I think this really captures what makes most sense to me.
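
Boiled down, the rule is small enough to state as a function. This is a Python sketch with made-up kind labels ("action", "declaration", "ambiguous") and a hypothetical name, items_to_run; it returns the positions of the top-level items that would run.

```python
def items_to_run(kinds):
    """Return the indexes of the top-level items that should run.

    Pure actions always run, pure declarations never do, and an
    ambiguous item runs only when it is the last item in the program.
    """
    last = len(kinds) - 1
    return [
        index
        for index, kind in enumerate(kinds)
        if kind == "action" or (kind == "ambiguous" and index == last)
    ]
```

So a program of action, ambiguous, action runs only the two actions, while declaration, action, ambiguous runs the action and then hands control to the final ambiguous item.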

Actually, there are places this should hold even within other items. For example, I'm putting a state machine in a URL. Should the state machine run as the code for that URL or act as a function definition for use later? Good question....

Monday, December 6, 2010

Next up: WWW::Declarative again

I feel the need to start writing scrapers for real. With WWW::Mechanize, HTML::TreeBuilder, and Data::Match, I've got most of the heavy lifting ready to go. So that's where I'm looking.

Win32::Word::Declarative published

And thus the first publication of a Class::Declarative-based module. And oh, how many things could still be done on it...

Saturday, December 4, 2010

Target domain: Automated game design!!!

This is just crack for me: game design as a domain for automated discovery. With links! Like to the game ontology! Ah! That.

Because it's not about game design, it's about thinking about software - in software. It's about a pattern language for game design. It is, in short, about semantic programming.

Note: LUDOCORE paper.

Markov chains for test input

Hmm.

Friday, December 3, 2010

Framework fatigue

Heh. From Chris Harden's Jeviathon: Framework fatigue: How many frameworks do I need to know?, we have the following interesting closing statement:

A talented developer has an interpreter and compiler in his head and thinks in pseudo-code anyway. Applying that to a language is just a matter of figuring out the syntax...and that is the easy part.

Hear, hear. You might as well write in pseudocode.... Site::Declarative should be the metaframework to end all frameworks.

Parameterized templates

How about this?

define formatting my-snippet "text"
   parameters (bold)
   nodes
      text "$text"
      text (italic) "$text"
      text "$text"

This would define a parameterized snippet that types its input text three times, in bold, with the middle also italicized. You'd invoke it like this:

document
   para "Some initial text"
   para
      my-snippet "Repeated text here"

By default, parameters would be passed through to the defined object.
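
As a sanity check on what the expansion should produce, here is the same snippet written as an ordinary function, in Python. The node representation (a dict with tag, format, and content) is invented for this sketch; the actual framework would build its own node objects.

```python
def my_snippet(text, formatting=("bold",)):
    """Expand to three text nodes; all inherit the formatting, the middle adds italic."""
    base = list(formatting)
    return [
        {"tag": "text", "format": base, "content": text},
        {"tag": "text", "format": base + ["italic"], "content": text},
        {"tag": "text", "format": base, "content": text},
    ]

nodes = my_snippet("Repeated text here")
```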

You could define-and-implement in place with this:

document
   var text "Here is my variable"
   para "Some initial text"
   <= formatting (bold)
      nodes
         text "$text"

Here, we don't need a separate "parameters" tag because we're just going to use the variables in our event context at runtime.

The reason there's a "nodes" tag in this is that I might want to include other tags as well:

define formatting my-snippet "text"
   parameters (bold)
   do {
      # Set some things up
   }
   nodes
      text ...

We'd want a whole range of tags to denote the parts of a full node: parameters, options, label, parser, code, body, and nodes.

Note that instantiation of a named macro happens at build time, while instantiation of an anonymous macro happens at runtime. To run a named macro at runtime, we'd want to do:

express my-snippet "text here"

I should implement this stuff now, then see whether it handles everything I want. Add some control structures and it could be just about as powerful as you could want.

Thursday, December 2, 2010

A couple of decent Perl links

Serious Perl - has some excellent insight on funky module magic and some OK advice on Perl code maintainability.

Perl - OLE - Word - a collection of Word invocation tricks.

Further necessaries for Win32::Word::Declarative

So I did the standalone use-WWD thing and I'm polishing up an initial tutorial for the module prior to releasing 0.01 onto CPAN, and there are two things, at this point, that make the module less than perfectly usable in its current form. (This ignores the fact that it covers about 2.7% of Word's functionality; that's just incremental stuff that can be filled in at my leisure.)

First, it's hard to use it from plain Perl. I have a good plan for this: instead of indented strings, accept an arrayref format based more or less on the output from the indentation parser. This allows us to generate nodal structure really easily without worrying about having to format it with indented text.
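
To show what "the output from the indentation parser" might look like as plain nested data, here is a minimal sketch in Python. The [line, children] pair format and the name parse_indented are assumptions for illustration, not the real parser's output format; the equivalent Perl structure would be nested arrayrefs.

```python
def parse_indented(source):
    """Turn space-indented text into nested [line, children] pairs."""
    root = []
    stack = [(-1, root)]  # (indent level, list that collects children at that level)
    for raw in source.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        # Pop back to the nearest line that is indented less than this one.
        while stack[-1][0] >= indent:
            stack.pop()
        node = [raw.strip(), []]
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root
```

A caller in plain code could then build the same nested structure directly and skip the text formatting step altogether.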

But the worse thing is that the C::Decl framework still basically supports mere declaration. That is, I can't really specify a mutable data structure that is based on instance data. And that's a severe limitation - which is mildly surprising. A Wx user interface doesn't really need a lot of runtime mutability, but a Word document is an output format. Its natural functionality is to present runtime data, making the lack more glaring.

So really, a high priority for C::Decl has to be mutable structure. Macros. Templates. Whatever the mechanism or mechanisms are called, the use case is this: a script that (1) gathers some information somewhere, then (2) generates a Word document based on that information. We can't really do that right now without dropping into Perl or using Perl to generate the Word-generation script and then running it separately, and that just isn't where I want to be.

Also, a small tweak: right now, C::Decl only gives runtime love to the topmost semantic domain. Instead, I'm pretty sure that it should just scan its children in order and attempt to execute each one. There could be a [norun] option to suppress this explicitly, if necessary. And of course in the case of Wx, execution just won't return after you hit the base frame/dialog - but we don't really care much about that.

But this is necessary in order to gather information in the use case above - and in general, it will be a normal state of affairs to set things up, then instantiate something. The alternative is to allow the macro itself to run code at build time, and that's OK, but in terms of presentation it will often be clearer to do things in two phases that are visually distinct.

Wednesday, December 1, 2010

Open-source data mining software

Sigh. Here. So much to do, so little time.