Sunday, March 10, 2013

Accounting and data management

So I've been looking at accounting in more detail lately (it's coming up on tax season, and then there's data modeling, and business plans) and I've been having a few epiphanies.

I've always thought of accounting as being primarily a database application.  Which it is, naturally, but I've come to realize that the purpose of accounting isn't actually data management - the purpose of accounting is to predict the future.  Well - and satisfy the tax authorities and make sure your customers don't forget to pay you, but one of the main reasons you do accounting is so you can plan.

In looking for data models for accounting, I first looked at the mother of all data model sites for a basic model.  Data models are fluid - they don't usually get treated that way, but at the semantic level, they should be.  In other words, there are a lot of different ways to model accounting data.  There are standards, of course - some mandated by governments so that corporate reports and statements take a standard form, and some that are just good ideas - but they only make sense if you accept that the purpose of accounting is to tell a story about a company and explain how that history lets you make such-and-so a prediction about next year.

Case in point: the chart of accounts.  The chart of accounts is the list of all the separate accounts that a given accounting system tracks for a company.  By convention, they're numbered, and the first digits of the numbers have meaning. ('1' being assets, '2' liabilities, and so on.) The reason for this numbering system may not be obvious to the programmer - but account numbers must often be written down on paper forms, and if they have internal structure, it's easier to see what's what on those documents.  In other words, the numbering system is an interface to traditional document management, and it's justified on those grounds.
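To make the convention concrete, here's a minimal sketch in Python.  The category digits ('1' for assets, '2' for liabilities) follow the common convention mentioned above; the specific account numbers and names are made up for the example.

```python
# The leading digit of an account number encodes its category - the same
# lookup a human does at a glance on a paper form.
CATEGORY_BY_LEADING_DIGIT = {
    "1": "Assets",
    "2": "Liabilities",
    "3": "Equity",
    "4": "Revenue",
    "5": "Expenses",
}

# An illustrative (made-up) chart of accounts.
chart_of_accounts = {
    "1010": "Petty Cash",
    "1020": "Checking Account",
    "2010": "Accounts Payable",
    "4010": "Sales",
}

def category(account_number: str) -> str:
    """Derive the account category from the leading digit."""
    return CATEGORY_BY_LEADING_DIGIT[account_number[0]]

print(category("1020"))  # Assets
print(category("2010"))  # Liabilities
```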

So here's what I learned about planning a chart of accounts.  If you start with a very simple arrangement and then, as the company grows, ramify into separate accounts for separate purposes, then - as the page I linked to above notes (and I highly recommend that entire site, actually - very information-rich!) - you lose the ability to compare year to year.  And the point of accounting is to compare year to year!  (And also to make sure you get paid for your invoices, pay your own incoming invoices, etc.)

Which brings me to data management.  There's a concept of master data management (MDM - which to me means the Manufacturing Data Management system I worked on at Eli Lilly in the 90's, but that's another story entirely) which can be seen as version control for data that warrants it.  Master data tends to be complex and slow-changing; it benefits from versioning, and it's largely global (although for performance reasons it can be mirrored here and there).  The processes of master data management can be seen as relatively independent of the transactional processes that simply use the master data for other purposes.

Now clearly, master data management and data model management are essentially the same thing: they involve definition of the semantics of a given company.  They can evolve over time, but if they do, you need to keep track of how they've done so.  For example, our list of customers naturally changes over time; a properly versioned customer master can tell us when a customer was a valid customer, when they stopped being a customer, and so on, and the customer master at a given point in time can be seen as a snapshot of that process.  The same can be said of the data model; as our data management needs grow, we start to make distinctions that were unimportant before - perhaps we have different processes for retail customers on our website and larger contracted customers, and so having a single record that addresses both sets of needs may be too complex.  This is an area of data management that seems to be really poorly considered and addressed, but maybe I'm just too naive at this point.
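The "snapshot at a point in time" idea can be sketched as a versioned customer master, where each row is one version of a customer record valid over a date interval.  This is just my illustration of the idea - the record shape and field names are hypothetical, not any particular MDM product's model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# One version of a customer record, valid over the half-open
# interval [valid_from, valid_to).  valid_to of None means "still current".
@dataclass
class CustomerVersion:
    customer_id: int
    name: str
    status: str               # e.g. "active" or "closed"
    valid_from: date
    valid_to: Optional[date]

# Illustrative history: Acme was a customer, then stopped being one.
history = [
    CustomerVersion(1, "Acme Corp", "active", date(2010, 1, 1), date(2012, 6, 1)),
    CustomerVersion(1, "Acme Corp", "closed", date(2012, 6, 1), None),
]

def snapshot(history, as_of: date):
    """The customer master as it looked on a given date."""
    return [v for v in history
            if v.valid_from <= as_of
            and (v.valid_to is None or as_of < v.valid_to)]

print(snapshot(history, date(2011, 3, 15))[0].status)  # active
print(snapshot(history, date(2013, 1, 1))[0].status)   # closed
```

The same versioning trick applies to the data model itself: a schema change gets a validity interval, and old data can be interpreted against the schema that was current when it was written.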

Anyway, back to accounting.  In terms of the chart of accounts, you can easily see that accounts fall into a hierarchy that can ramify to an arbitrary extent - but as they ramify, if you want to preserve comparability, you need to "back-categorize" existing entries to fall into one of the new subcategories.  This is arguably what that post from last week (or last month, time flies) is doing with machine learning: taking the postings from Chase as a sort of general ledger, it categorizes them into subcategories using a machine-learning algorithm I haven't examined in any detail.  The same kind of thing could be done if we just split a general asset ledger into a petty cash and bank account setup, for example.
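That petty-cash-versus-bank split can be sketched directly.  The rule here (small cash amounts go to petty cash) is purely illustrative - in practice the back-categorization rule, or the ML model standing in for it, is the hard part.

```python
# Old entries were all posted to a single "Assets" account.  After splitting
# the account, we back-categorize them so year-to-year comparisons line up.
entries = [
    {"date": "2012-03-01", "account": "Assets", "amount": 35.00},
    {"date": "2012-04-10", "account": "Assets", "amount": 1200.00},
]

def back_categorize(entry):
    """Assign an old general-asset entry to one of the new subaccounts.
    The threshold rule is a made-up stand-in for a real categorizer."""
    if entry["account"] == "Assets":
        entry["account"] = "Petty Cash" if entry["amount"] < 100 else "Bank Account"
    return entry

entries = [back_categorize(e) for e in entries]
print([e["account"] for e in entries])  # ['Petty Cash', 'Bank Account']
```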

If we look at the overall process of accounting, the "accounting cycle", we see that there are actually two phases involved.  The first phase is really not even a phase; it's an ongoing thing.  As each transaction happens (an invoice is received, money changes hands, etc.), it's identified and its significance to the accounting system is determined.  That is, if we receive money, we determine which account should be credited, why we got it (which invoice the customer is paying), and so on.
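In double-entry terms, identifying a transaction means deciding which accounts to debit and credit and recording why.  A minimal sketch, assuming a customer pays an invoice (the account names and invoice id are illustrative):

```python
ledger = []

def record_payment(amount, invoice_id):
    """Post a customer payment: debit the bank account, credit accounts
    receivable, and note which invoice the payment settles."""
    memo = f"payment of invoice {invoice_id}"
    ledger.append({"account": "Bank Account",
                   "debit": amount, "credit": 0.0, "memo": memo})
    ledger.append({"account": "Accounts Receivable",
                   "debit": 0.0, "credit": amount, "memo": memo})

record_payment(500.00, "INV-042")

# Double entry keeps the books balanced: total debits equal total credits.
assert sum(e["debit"] for e in ledger) == sum(e["credit"] for e in ledger)
```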

Then, periodically, we close the books - we reconcile all the outstanding weirdnesses, fix things up with corrections if necessary, and issue statements that can be given to shareholders, governments, and management planners to explain what happened to the company and what can be expected to happen next period or next year.
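Part of that closing step is mechanical: roll the period's postings up into per-account balances and check that the trial balance nets to zero before issuing statements.  A toy version (the postings are illustrative; debits positive, credits negative):

```python
from collections import defaultdict

# The period's postings as (account, signed amount) pairs.
postings = [
    ("Bank Account", 500.00),          # debit
    ("Accounts Receivable", -500.00),  # credit
]

# Roll up into per-account balances.
balances = defaultdict(float)
for account, amount in postings:
    balances[account] += amount

# The trial balance: if debits and credits don't net to zero,
# something needs a correcting entry before the books can close.
assert abs(sum(balances.values())) < 1e-9
print(dict(balances))
```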

That's what I learned about accounting in my browsing today.  There are a couple of side points on data management I'd like to address as well.

First is business rules.  As noted in the data model tutorial here, business rules are generally implemented as constraints on the database that prevent certain nonsensical things from happening - an order for a non-existent product, for example.
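That exact rule - no order for a non-existent product - is what a foreign key constraint enforces.  A sketch using Python's built-in sqlite3 (note SQLite requires foreign keys to be switched on per connection; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
                    id INTEGER PRIMARY KEY,
                    product_id INTEGER NOT NULL REFERENCES product(id))""")

conn.execute("INSERT INTO product (id, name) VALUES (1, 'Widget')")
conn.execute("INSERT INTO orders (product_id) VALUES (1)")  # fine

# The business rule in action: an order for a non-existent product
# is rejected by the database itself.
try:
    conn.execute("INSERT INTO orders (product_id) VALUES (99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```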

Second, a canonical data model ([here] and [here]), a popular concept lately due to service-oriented architecture, can be seen as a lingua franca among specific data models.  Rather than defining a transformation between every pair of systems, we define transformations from each specific model to the canonical model (and back) to permit communication between the specific systems.
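A tiny sketch of the idea, with two hypothetical systems that each name their customer fields differently; each system only knows how to map to and from the shared canonical shape:

```python
# System A's record shape -> canonical shape.
def a_to_canonical(rec):
    return {"name": rec["cust_name"], "email": rec["cust_email"]}

# Canonical shape -> system B's record shape.
def canonical_to_b(rec):
    return {"fullName": rec["name"], "contactEmail": rec["email"]}

# A talks to B through the canonical model; neither knows the other's schema.
a_record = {"cust_name": "Acme Corp", "cust_email": "ap@acme.example"}
b_record = canonical_to_b(a_to_canonical(a_record))
print(b_record)  # {'fullName': 'Acme Corp', 'contactEmail': 'ap@acme.example'}
```

With n systems this needs n pairs of transformations instead of the n*(n-1) direct mappings you'd otherwise write.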

Third, a link to Microsoft's modeling tool, such as it is, and the observation that my own notion of modular data modeling really seems underrepresented out there still, and maybe there's a need for it.

Actually, for basic accounting concepts the GnuCash manual is pretty fantastic.
