Tuesday, May 14, 2013

Record-based data retrieval

Back in the prehistoric days of 2001 or so, when I was working on the original wftk, I kept running into the fact that really, if I wanted a quick tool for defining business processes as workflow, I needed some kind of tool to define data records that could extract information from diverse sources and put it all together into a single record, then update the various data sources correctly when that record was changed.

The initial version of that tool was the repository manager and it consumed my coding thoughts until about 2004, when I qualified for EIC and stopped pretending I programmed for a living and switched to technical translation full-time.  It was a complicated time for me.

Anyway, every time I start to do business-related programming, I want to revive the wftk. And every time I want to revive the wftk, I still want an easy and declarative way to describe data sources, load data into a more-or-less-structured record that I can work with in memory, and write things back out to appropriate data storage in an arbitrarily complex manner.

Some of that is simply impossible to do declaratively, of course.  Sometimes you really just have to build a class to handle things.  But sometimes it should be possible to do things without (much) coding.

Somewhat to my surprise, CPAN doesn't really have a lot of things that help me here.  I think this is just a weird way to think about data or something (even though the storage of arbitrary records into a key-based list is what has the NoSQL crowd all riled up these days, and I was doing it in 2001).  Ideally, my records would have the following features:

  • Key-value retrieval from an arbitrary composite data source
  • Key-value storage to the same source as needed
  • Hierarchy: a value could be a list of records or text
  • Version control: a value or the whole record can retain a history or a version number
  • Document management: a value can be a document
  • Named values with a path-based composite key retrieval/update mechanism
  • Composite data sources: some parts of a record could be stored in different places; for example, I might retrieve a list of documents from a directory, then keep arbitrary data about some of them in an SQL table.  Retrieving all the information from the table and the directory is a single retrieval call.
  • Layers of composition: sometimes I don't want to retrieve from expensive data sources

That's pretty much it.  The result is that I should be able to set up a declarative, record-based data structure (a repository) that lets me give a list (a source) and a key, and get back a record.
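
To make the directory-plus-SQL-table example concrete, here is a rough sketch of what "give a list and a key, get back a record" might look like.  It's in Python rather than Perl purely for illustration, and every name in it (DirectorySource, TableSource, Repository, the "expensive" flag, the docs table) is invented for the sketch; none of it is an existing module or the original repository manager's API.

```python
import os
import sqlite3
import tempfile


class DirectorySource:
    """Read-only layer: one record per file in a directory, keyed by file name."""

    def __init__(self, path):
        self.path = path

    def load(self, key):
        full = os.path.join(self.path, key)
        if not os.path.isfile(full):
            return {}
        return {"name": key, "size": os.path.getsize(full)}

    def store(self, key, record):
        pass  # nothing to write back; the file itself is the data


class TableSource:
    """Read-write layer: extra fields about a key, kept in one SQL table."""

    def __init__(self, conn, table, key_column, columns):
        self.conn, self.table = conn, table
        self.key_column, self.columns = key_column, columns

    def load(self, key):
        sql = "SELECT {} FROM {} WHERE {} = ?".format(
            ", ".join(self.columns), self.table, self.key_column)
        row = self.conn.execute(sql, (key,)).fetchone()
        return dict(zip(self.columns, row)) if row else {}

    def store(self, key, record):
        # Replace the row for this key with the record's current values.
        values = [record.get(c) for c in self.columns]
        self.conn.execute(
            "DELETE FROM {} WHERE {} = ?".format(self.table, self.key_column),
            (key,))
        sql = "INSERT INTO {} ({}, {}) VALUES (?{})".format(
            self.table, self.key_column, ", ".join(self.columns),
            ", ?" * len(self.columns))
        self.conn.execute(sql, [key] + values)


class Repository:
    """Maps a list name to an ordered stack of layers (the declarative part)."""

    def __init__(self, lists):
        self.lists = lists

    def get(self, list_name, key, cheap_only=False):
        record = {}
        for layer in self.lists[list_name]:
            if cheap_only and layer["expensive"]:
                continue  # skip layers flagged as expensive
            record.update(layer["source"].load(key))
        return record

    def put(self, list_name, key, record):
        for layer in self.lists[list_name]:
            layer["source"].store(key, record)


def demo():
    # Throwaway directory and in-memory table standing in for real storage.
    folder = tempfile.mkdtemp()
    with open(os.path.join(folder, "report.txt"), "w") as fh:
        fh.write("hello")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, owner TEXT, status TEXT)")
    conn.execute("INSERT INTO docs VALUES ('report.txt', 'alice', 'draft')")
    repo = Repository({
        "documents": [
            {"source": DirectorySource(folder), "expensive": False},
            {"source": TableSource(conn, "docs", "name", ["owner", "status"]),
             "expensive": True},
        ],
    })
    record = repo.get("documents", "report.txt")   # one call, both sources
    record["status"] = "final"
    repo.put("documents", "report.txt", record)    # each layer stores its part
    return repo, record
```

The record that comes back is just a dictionary merged from both layers, and cheap_only=True is one possible way to express "don't touch the expensive sources this time."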

This is a broad enough requirement that I think I should split it out of the wftk into its own data system.  I just don't know what to call it.  Data::Repo isn't bad, actually.  Data::Storage is taken by a 2004 skeletal data storage structure that seems to have been abandoned.  Data::Record is taken.  (There are fascinating things to be discovered in CPAN probes like this: Data::CapabilityBased, GitStore, VCI, Treemap, TAP::DOM and TAP::Parser for working with Perl test output, and (ding ding ding) Data::DPath, which is precisely what I want for composite value retrieval from my in-memory record structures.)

I think Data::Repo.  So mote it be.  The record itself is just a hash (or optionally could have a class assigned as Data::Repo::Record) - the idea is that you'll just remember where you got it and will store it back to the same place should that be necessary.  If you extract a D::R::Record explicitly, it will remember where it came from, but seriously, I'd rather keep the data anonymous in terms of type and work up data manipulation things that are right in the language instead of OO in nature.
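
As a hedged illustration of that split, here is a small Python sketch (again not Perl, and not Data::Repo's actual interface, which doesn't exist yet).  The get_path helper and the Record class with its origin attribute are hypothetical names; the path walker is a deliberately simple stand-in and does not imitate Data::DPath's syntax.

```python
def get_path(record, path):
    """Walk a slash-separated path through nested dicts and lists."""
    value = record
    for step in path.strip("/").split("/"):
        if isinstance(value, list):
            value = value[int(step)]   # list steps are numeric indexes
        else:
            value = value[step]        # dict steps are key names
    return value


class Record(dict):
    """A dict that also remembers which list and key it was loaded from."""

    def __init__(self, data, list_name, key):
        super().__init__(data)
        self.origin = (list_name, key)


# A plain, anonymous record: values can be text or lists of sub-records.
plain = {
    "title": "Q2 report",
    "attachments": [{"name": "a.pdf"}, {"name": "b.pdf"}],
}

# The same data, wrapped so it knows where to be stored back.
tracked = Record(plain, "documents", "q2")
```

Both forms work with ordinary language tools: get_path(plain, "attachments/1/name") and get_path(tracked, "attachments/1/name") walk the same structure, and only the tracked version carries its origin along.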
