Wednesday, September 12, 2012

Data handling

I'm lumping a bunch of stuff together under the general rubric of "data handling", which is really kind of poorly defined.  But no matter how poorly defined it is, people keep writing about it, and a not insignificant portion of many practical machine learning books is devoted to it.

Anyway, it basically involves all the moving around of files and databases that you wave your hands at, and that ends up being most of the fiddly work of any practical project.  I'd like to set aside a little time to think about how to do it right (kind of a best-practices semantic domain, as usual).  And I'm getting the occasional link about it.

My fileset module has to do with data handling (as a way to define files that should be subjected to an action).  The Data::Table module is a handy in-memory way to carve off blobs of relational data and manipulate them.  Excel is a good place to stash tables like this in a file.  And so on.
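
To make the fileset idea a bit more concrete, here's a minimal sketch in Python.  It is not my actual fileset module; the names `FileSet`, `files`, `apply` and `count_lines` are all made up for illustration, and I'm assuming simple glob-style include/exclude patterns as the way to define the set.

```python
import fnmatch
import os
import tempfile


class FileSet:
    """Hypothetical sketch: a set of files under a root, picked by glob patterns."""

    def __init__(self, root, include, exclude=()):
        self.root = root
        self.include = list(include)
        self.exclude = list(exclude)

    def files(self):
        """Return sorted relative paths matching any include and no exclude pattern."""
        found = []
        for dirpath, _dirs, names in os.walk(self.root):
            for name in names:
                rel = os.path.relpath(os.path.join(dirpath, name), self.root)
                rel = rel.replace(os.sep, "/")  # normalize separators for matching
                included = any(fnmatch.fnmatch(rel, p) for p in self.include)
                excluded = any(fnmatch.fnmatch(rel, p) for p in self.exclude)
                if included and not excluded:
                    found.append(rel)
        return sorted(found)

    def apply(self, action):
        """Run action(full_path) on each file; return {relative_path: result}."""
        return {rel: action(os.path.join(self.root, rel)) for rel in self.files()}


def count_lines(path):
    """Example action: count the lines in one file."""
    with open(path) as fh:
        return sum(1 for _ in fh)


def demo():
    """Build a throwaway directory and apply an action to a fileset within it."""
    with tempfile.TemporaryDirectory() as root:
        for name, text in [("a.csv", "1\n2\n"), ("b.csv", "3\n"), ("notes.txt", "x\n")]:
            with open(os.path.join(root, name), "w") as fh:
                fh.write(text)
        csvs = FileSet(root, include=["*.csv"], exclude=["b.*"])
        return csvs.apply(count_lines)
```

The point of splitting `files()` from `apply()` is that the definition of the set stays separate from whatever action you subject it to, so the same set can be reused for copying, summarizing, loading into a table, and so on.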

A lot of workflow involves "data handling" - grouping things into documents and that kind of thing.  Taking items from this document and summarizing them into that one.  The "bizop" semantic-level language/view I've been musing on is largely a matter of data handling.
