Sunday, March 24, 2013

Email bounce parsing and mapping of modules

The definitive CPAN module for extracting information from email bounces is Mail::DeliveryStatus::BounceParser, and it's still relatively actively maintained - it just had an update in January that added a couple of cases.  There are just a few problems.  First, there are bugs listed that are years old and (to me) look relatively important - a bounce that reports multiple bad addresses doesn't get all of those addresses parsed out, that kind of thing.
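To make the multiple-address case concrete, here is a minimal sketch of what I'd expect from any bounce parser. It is written in Python with the standard `email` package purely for illustration; it is not the Perl module's API, and the sample message, addresses and the `failed_recipients` name are all made up. It assumes a well-formed RFC 3464 report, which real-world bounces often are not.

```python
import email

# Hypothetical sample bounce: an RFC 3464 report listing two failed recipients.
RAW_BOUNCE = """\
From: MAILER-DAEMON@example.com
To: sender@example.com
Subject: Delivery Status Notification (Failure)
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"

--BOUND
Content-Type: text/plain

Delivery failed for two recipients.

--BOUND
Content-Type: message/delivery-status

Reporting-MTA: dns; mx.example.com

Final-Recipient: rfc822; alice@example.org
Action: failed
Status: 5.1.1

Final-Recipient: rfc822; bob@example.org
Action: failed
Status: 5.2.2

--BOUND--
"""


def failed_recipients(raw):
    """Return (address, status) pairs for every recipient block in a DSN."""
    msg = email.message_from_string(raw)
    results = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        if not part.is_multipart():
            continue  # status part could not be split into header blocks
        # The stdlib parser exposes each header block of the status part
        # as its own Message object.
        for block in part.get_payload():
            recipient = block.get("Final-Recipient")
            if recipient is None:
                continue  # the per-message block (Reporting-MTA and friends)
            # The field looks like "rfc822; user@example.org".
            address = recipient.split(";", 1)[-1].strip()
            results.append((address, block.get("Status", "").strip()))
    return results


for address, status in failed_recipients(RAW_BOUNCE):
    print(address, status)
```

The point is only that each per-recipient block yields its own entry; the hard part of a real parser is everything that doesn't follow the standard.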

Second, its report object subclasses Mail::Header.  So sure, that's not horrible, but still - I'd much rather have something that can look at an Email::Abstract object or just the headers of one (Email::Simple::Header) and extract an object that doesn't hook into an alternative mail ecology (the Mail::* tools rather than the Email::* ones).

Third, some of the cases might be dubious.  I'd rather have something that organizes its filtering cases in some kind of table - this complaint is wishy-washier than the other two, but it does contribute to my unease about using this module casually.
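By "tabular" I mean something roughly like the sketch below, again in Python just to show the shape. The rule names, patterns and reason labels are invented for the example and are not taken from Mail::DeliveryStatus::BounceParser; the idea is only that every case lives in one table that can be read, reviewed and reordered, instead of being scattered through the code.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    name: str     # human-readable label for the case (hypothetical)
    pattern: str  # regular expression matched against the bounce text
    reason: str   # normalized reason reported to the caller (hypothetical)


# The whole "knowledge base" is this table; first match wins.
RULES = [
    Rule("unknown user", r"user unknown|no such user", "user_unknown"),
    Rule("mailbox full", r"mailbox (is )?full|over quota", "over_quota"),
    Rule("bad domain", r"host not found|domain .* does not exist", "domain_error"),
]


def classify(text: str) -> Optional[str]:
    """Return the reason of the first rule whose pattern matches, else None."""
    for rule in RULES:
        if re.search(rule.pattern, text, re.IGNORECASE):
            return rule.reason
    return None


print(classify("550 5.1.1 User unknown"))
print(classify("The recipient's mailbox is full"))
```

A dubious case is then a single row that can be questioned or dropped on its own, which is exactly what I can't easily do with the module as it stands.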

But the amount of information in this module is outstandingly valuable - it represents years of tinkering with bounce messages and trying to deal with the weird ones.  So I don't want to just start over, either, and if I port it into a different framework I'd like to be able to keep up with any further updates.

This is a common set of problems in reusable software development.  The source code/module level is not really a fantastic level of granularity for knowledge preservation - it's just better than anything else we've got yet in common use.

Another case I ran across was a translation tool called Anaphraseus - it's written as an open-source replacement for the TRADOS Word tools (essentially a work-alike) but it works only in OpenOffice.  I use Word, so if I want to use Anaphraseus I'd need to port it - but I'd like to be able to stay in sync with the official release, because its developers do things that I wouldn't think of.

In general, I think of these situations as calling for a tool I'd call a "cross-parser": it takes a text in one high-level language and translates it into another, and maybe back.  In other words, it allows a continuous mapping of the same knowledge into two different expressions.

I need to research this general area of problems.  I'm sure it's old hat to somebody.

Specifically, though, this week, I'm proposing Email::Simple::BounceParser, which would mirror Mail::DeliveryStatus::BounceParser, perhaps through some kind of database representation in the middle.  I have no idea how that would work exactly.

In general, it would be nice to be able to define some kind of abstract module for a specific set of "knowledge" that would be crystallized as specific module instances.  This is essentially what parser generators already do (Parse::RecDescent does exactly this, turning a grammar into a working parser); it would be nice to formalize this technique with a more visible system of expressing it.  Kind of a code template thing, I guess.
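A toy version of the "code template thing", under heavy assumptions: the knowledge is a neutral table of rules, and each target gets its own renderer. Here one renderer builds a live Python matcher and the other emits Perl source as text. Everything in it (the rule format, `render_perl`, `build_python`, the reason labels) is hypothetical, and it only works for patterns that mean the same thing in both regex dialects and contain no `/`.

```python
import re

# The neutral middle representation: plain data, no code.
RULES = [
    {"reason": "user_unknown", "pattern": "user unknown|no such user"},
    {"reason": "over_quota", "pattern": "mailbox full|over quota"},
]


def render_perl(rules):
    """Crystallize the rule table as the source of a Perl classify() sub."""
    lines = ["sub classify {", "    my ($text) = @_;"]
    for rule in rules:
        lines.append(
            "    return '%s' if $text =~ /%s/i;" % (rule["reason"], rule["pattern"])
        )
    lines += ["    return;", "}"]
    return "\n".join(lines) + "\n"


def build_python(rules):
    """Crystallize the same rule table as a Python classify() function."""
    compiled = [(re.compile(r["pattern"], re.IGNORECASE), r["reason"]) for r in rules]

    def classify(text):
        for pattern, reason in compiled:
            if pattern.search(text):
                return reason
        return None

    return classify


print(render_perl(RULES))
print(build_python(RULES)("550 No such user here"))
```

Updating the table updates both expressions at once. That is the easy direction; extracting such a table from an existing module, and keeping up with its releases, is the part I don't know how to do yet.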
