Friday, June 17, 2011

Workflow as core semantics

You know what the unit of workflow is? It's the task. And you know what the natural grouping of tasks is? The checklist.

Build those two things into the language, and I think that's the only really basic support you need for workflow. I suspect (I haven't taken the time to think this through) that all other workflow structures can be derived from those two. For example, a sequence (as opposed to the parallel nature of a pure checklist) can be expressed as a checklist in which the completion of each task's predecessor is a prerequisite for its start.
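To make the sequence claim concrete, here is a minimal sketch in Python. It is not Decl syntax, and the names `run_checklist` and `as_sequence` are made up for illustration. A checklist is an unordered collection of tasks, and a sequence is just a checklist in which each task names its predecessor as a prerequisite.

```python
def run_checklist(tasks):
    """Run an unordered checklist.

    tasks: dict mapping task name -> (list of prerequisite names, action).
    Returns the order in which the tasks actually ran.
    """
    done, order = set(), []
    pending = dict(tasks)
    while pending:
        # A task is ready once all of its prerequisite tasks are complete.
        ready = [name for name, (prereqs, _) in pending.items()
                 if set(prereqs) <= done]
        if not ready:
            raise RuntimeError("no task is ready; the checklist is stuck")
        for name in ready:
            _, action = pending.pop(name)
            action()
            done.add(name)
            order.append(name)
    return order


def as_sequence(steps):
    """Turn ordered (name, action) pairs into a checklist in which
    each task requires the completion of its predecessor."""
    tasks, prev = {}, None
    for name, action in steps:
        tasks[name] = ([prev] if prev is not None else [], action)
        prev = name
    return tasks
```

The ordering comes entirely from the prerequisites: even if the resulting dict is shuffled before being handed to `run_checklist`, the tasks still run in their original sequence.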

So what's a task? It's a macro action consisting of:
  • A set of actions to carry out (this really is a sequence)
    • Plain code
    • Subtasks in a subchecklist
  • A set of prerequisites, or pre-existing conditions
    • Completion of other tasks
    • Resource requirements
    • Assertions about input data
  • A set of post-facto assertions, or expectations
    • Expected outcomes of the task

In addition, we might want to declare some of the data (files, etc.) the task is going to work on, and so forth, but that's kind of inherent in the declarative style.
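As a rough sketch of that structure (again in Python rather than Decl, with a hypothetical `Task` class and a plain dict standing in for whatever data the task works on), the three parts might look like this. Subtasks in a subchecklist are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Context = Dict[str, object]  # stand-in for the data a task works on


@dataclass
class Task:
    name: str
    # Actions really are a sequence: they run in the order listed.
    actions: List[Callable[[Context], None]] = field(default_factory=list)
    # Prerequisites: conditions that must hold before the task may start.
    prerequisites: List[Callable[[Context], bool]] = field(default_factory=list)
    # Expectations: post-facto assertions that define completeness.
    expectations: List[Callable[[Context], bool]] = field(default_factory=list)

    def ready(self, ctx: Context) -> bool:
        return all(p(ctx) for p in self.prerequisites)

    def run(self, ctx: Context) -> None:
        for action in self.actions:
            action(ctx)

    def complete(self, ctx: Context) -> bool:
        # Note: a task with no expectations counts as complete at once.
        return all(e(ctx) for e in self.expectations)


clean = Task(
    name="clean",
    actions=[lambda ctx: ctx.update(text=ctx["text"].strip())],
    prerequisites=[lambda ctx: "text" in ctx],
    expectations=[lambda ctx: ctx["text"] == ctx["text"].strip()],
)
```

Here the `clean` task is ready as soon as the context holds some text, and it is complete only once that text has no surrounding whitespace.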

A checklist can persist beyond the technical process running the workflow, and that's really the essential component that makes workflow workflow - but even without the persistence, the checklist is a useful design component. The order in which tasks execute in a checklist is undetermined; the checklist is only complete when all its tasks are complete. The post-facto assertions are used to determine completeness - always.

If an assertion fails, this is an exception. There may be exception handlers, etc. - but if not, the entire checklist hangs (persistently) until the exceptions are dealt with at the human level.
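A small Python sketch of that behavior, under my own assumptions: `run_once` is a hypothetical helper, tasks are (action, expectation) pairs of zero-argument callables, and "hanging" simply means the task stays on the list of outstanding items. Tasks are attempted in arbitrary order, and completeness is decided by the post-facto assertion alone.

```python
import random


def run_once(checklist, handlers=None):
    """Attempt every outstanding task once, in arbitrary order.

    checklist: dict name -> (action, expectation), both zero-argument callables.
    handlers:  optional dict name -> callable run when that task's
               expectation fails.
    Returns the tasks that are still outstanding; empty means complete.
    """
    handlers = handlers or {}
    names = list(checklist)
    random.shuffle(names)  # execution order is undetermined
    outstanding = {}
    for name in names:
        action, expectation = checklist[name]
        action()
        if expectation():
            continue  # the assertion holds, so the task is complete
        handler = handlers.get(name)
        if handler is not None:
            handler()  # give the exception handler a chance to repair things
        if not expectation():
            # Nobody could fix it: the task waits for a human.
            outstanding[name] = checklist[name]
    return outstanding
```

Calling `run_once` again on the returned dict, after someone has fixed the underlying problem, retries only the tasks that were left hanging.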

An example is the 1694 LUZ project I've been spending time with lately. Here, the issue is the translation of a few thousand documents of various formats in a complex directory structure. After translation, each file must be cleaned, and there are a multitude of ways in which this cleaning step can fail. As things stand, I have no good exception mechanism; the result is a laborious process of making sure I haven't lost my place when fixing individual files.

A persistent checklist would already be able to handle that situation, and as I say, the non-persistent checklist (a sort of "parallel loop") would handle similar things inside a single technical process.
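For the file-cleaning case, a persistent checklist could be as simple as the following Python sketch. The JSON file layout, the status strings, and the function names are all my own assumptions for illustration. The point is only that the state is written out after every item, so a crash or a failed file never loses my place.

```python
import json
import os


def load_checklist(path, items):
    """Load persisted state, or start fresh with every item outstanding."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {item: "outstanding" for item in items}


def save_checklist(path, state):
    # Write to a temporary file and rename it, so that a crash in the
    # middle of a write cannot corrupt the stored checklist.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


def work_through(path, items, clean):
    """Try each item that is not done yet; persist after every item."""
    state = load_checklist(path, items)
    for item, status in list(state.items()):
        if status == "done":
            continue
        try:
            clean(item)
            state[item] = "done"
        except Exception as exc:
            # Record the failure; the item stays on the list for a later run.
            state[item] = f"failed: {exc}"
        save_checklist(path, state)
    return state
```

Running `work_through` a second time against the same file skips everything already marked done and retries only the failed or untouched items.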

Task dependency within a checklist is an additional organizational layer on top of this, and really has little to do with the underlying checklist-and-task structure. Similarly, other types of control flow can be modeled with items that can change dependencies, by introducing dependencies on local variable values, and so on. Conditionals can be modeled using post-facto assertions that bypass the entire execution of a branch (i.e. assertions that already hold, so the branch counts as complete before it starts). Loops can be modeled by adding tasks to a checklist dynamically while it's still running. For performance, the checklist should really be a queue (minus the presumption of order) - tasks are simply removed once complete.
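Here is a hedged Python sketch of the conditional and loop ideas, using a hypothetical `run_queue` function and tasks represented as plain dicts. A `deque` serves as the queue; its first-in-first-out order is incidental, not something the checklist promises.

```python
from collections import deque


def run_queue(initial_tasks):
    """Run a checklist held as a queue.

    Each task is a dict with 'name', 'done' (its post-facto assertion)
    and 'action'. The action receives an `add` callback so it can put
    further tasks on the checklist while it is still running.
    Returns the names of the tasks whose action actually executed.
    """
    queue = deque(initial_tasks)
    executed = []
    while queue:
        task = queue.popleft()  # a task leaves the queue when it is handled
        if task["done"]():
            # Conditional: the expectation already holds, so the whole
            # branch is bypassed without running anything.
            continue
        task["action"](queue.append)
        executed.append(task["name"])
        if not task["done"]():
            raise RuntimeError(f"{task['name']}: expectation not met")
    return executed


state = {"n": 3, "cached": True}


def count_down(add):
    state["n"] -= 1
    if state["n"] > 0:
        add(make_step())  # loop: enqueue another iteration dynamically


def make_step():
    before = state["n"]
    return {"name": f"step{before}",
            "done": lambda: state["n"] < before,
            "action": count_down}
```

The `count_down` task keeps adding a fresh copy of itself until the counter reaches zero, which is all a loop needs. A task whose `done` check is already true when it comes up is removed without ever running.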

Add logging to a checklist and you've got a good history mechanism. Again, persistence makes a true log of this.

A checklist should include the concept of multiple actor roles (= task queues); the system is one such role, and even the system should have a list of its outstanding tasks in a given checklist. It's a simple extension to expose that list of outstanding tasks as an index over a given class of active (persistent) checklists.
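The index over active checklists could be sketched like this in Python. The function name, the dict layout, and the role names are illustrative assumptions, not anything taken from wftk or Decl.

```python
from collections import defaultdict


def outstanding_by_role(checklists):
    """Build a per-role task queue across a set of active checklists.

    checklists: dict checklist_id -> list of task dicts, each with
                'name', 'role' and a boolean 'done'.
    Returns a dict role -> list of (checklist_id, task name) pairs.
    """
    index = defaultdict(list)
    for checklist_id, tasks in checklists.items():
        for task in tasks:
            if not task["done"]:
                index[task["role"]].append((checklist_id, task["name"]))
    return dict(index)
```

Each role, including the system itself, then sees one queue of outstanding work no matter how many checklists that work is spread across.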

I'm pretty sure that basically covers the entire set of workflow functionality. The wftk had some other mechanisms that are good (notification, delegation, etc.) but they're essentially extraneous to the core workflow engine. That core - checklists and tasks - needs to be inherent in the Decl core semantics. It's just too useful not to include it.
