Sunday, November 17, 2013

Windows COM and OLE Automation

Remember how I was just going to get Win32::IE::Mechanize up and running again?  As if! Turns out newer versions of IE depend far more heavily on custom COM interfaces than did older IE versions, and Perl's Win32::OLE just doesn't do that.

So if I want to automate IE, I'm going to have to add that support myself.

Unfortunately, the code for Win32::OLE is atrociously documented. Well, let's put it this way: the documentation principle used was RTFC. If you can't understand both Windows COM in C++ and perlguts manipulation of stashes in the same code, you clearly don't need to be doing COM in Perl, apparently.  It's horrible.

So down the rabbit hole I go. I've been reading a lot about COM and OLE Automation. Let me boil it down to the basic points for you.
  • Windows uses COM [msdn] as its standard for components talking to each other, in-process or across processes. It's actually pretty freaking neat, but Microsoft never met a set of documentation they thought wasn't obscure enough, so most of what Microsoft has written about how to use COM is horrible. And of course they also have no interest in enabling the use of their technology without purchase of their programming tools, so what you do find will largely assume you're using their MFC classes for C++. That, or Visual Basic. Or now, .NET.
  • COM calls communicate using interfaces. You create an object (or latch onto one running), and that object might be in a different process or even on a different machine. Windows handles all that for you; you just communicate with the interface, which might be a local stub. In OLE Automation, all data is encapsulated into VARIANTs, which are essentially little typed data blobs that work pretty much like Perl scalars.
  • Interfaces can inherit from one another - with the restriction that only single inheritance is allowed, and new methods are just appended to the list of inherited ones.
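  • IUnknown is the basic interface. Every other interface inherits from it, and all it provides is QueryInterface, AddRef, and Release.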
  • IDispatch is the interface for OLE Automation [wp]. It's the only interface supported by Perl's Win32::OLE. Therein lies our problem. IE is no longer built to be driven through IDispatch. Why? I'll tell you why: because Visual Basic now knows how to use non-IDispatch interfaces. Simple as that. Microsoft restricted their development to IDispatch only to give VB a chance to catch up. That catching up is .NET.
  • A typelib (type library) is used to define the interfaces; in the absence of a typelib you just need documentation. If you don't have documentation, you can't use the interface; I believe this is by design, as Microsoft's business model relies on the ability to provide secrecy. IDispatch provides some reflection tools, but they're weak. And of course they only work if IDispatch is implemented. Win32::OLE::TypeLib provides some interface to that (see?) but it is undocumented. Urggh.
  • An interface inherited from IDispatch is called dispatched; an interface inherited from IUnknown but not IDispatch is called custom. Win32::OLE explicitly does not cover custom interfaces; since the advent of .NET, though, Microsoft coding has increasingly wandered over there, because custom interfaces provide a reasonable namespace convention and also provide an easy upgrade path.
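To make the inheritance rule in that list concrete, here's a toy model in Python. It isn't COM and it isn't anything Win32::OLE does; it only shows that a derived interface appends its methods after its parent's, so inherited methods keep their slot numbers. The IUnknown and IDispatch method names are the real ones; `IExampleCustom` and `IExampleDispatched` are made up for the sketch.

```python
# Each entry: interface name -> (single parent or None, methods it adds).
INTERFACES = {
    "IUnknown": (None, ["QueryInterface", "AddRef", "Release"]),
    "IDispatch": ("IUnknown", ["GetTypeInfoCount", "GetTypeInfo",
                               "GetIDsOfNames", "Invoke"]),
    # Hypothetical interfaces, invented for this sketch.
    "IExampleCustom": ("IUnknown", ["DoSomething"]),
    "IExampleDispatched": ("IDispatch", ["DoSomethingElse"]),
}


def vtable(name, interfaces=INTERFACES):
    """Return the full method list for an interface, parents first."""
    parent, own = interfaces[name]
    inherited = vtable(parent, interfaces) if parent else []
    return inherited + own


def ancestors(name, interfaces=INTERFACES):
    """Walk the single-inheritance chain upward from an interface."""
    chain = []
    parent = interfaces[name][0]
    while parent:
        chain.append(parent)
        parent = interfaces[parent][0]
    return chain


def kind(name, interfaces=INTERFACES):
    """Classify an interface using the dispatched/custom wording above.

    Anything that is not IDispatch and does not descend from it is
    treated as custom here.
    """
    if name == "IDispatch" or "IDispatch" in ancestors(name, interfaces):
        return "dispatched"
    return "custom"


if __name__ == "__main__":
    for iface in ("IExampleCustom", "IExampleDispatched"):
        print(iface, kind(iface), vtable(iface))
```

The point for Win32::OLE is the last function: it can only talk to the "dispatched" kind, because it always goes through the four IDispatch slots rather than calling the appended methods directly.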
So if I want to rewrite Win32::IE::Mechanize to work with IE versions greater than about 7, I need to write Win32::COM, essentially from scratch.

Which is a fantastic opportunity! OK, it's not what I wanted to do in order to automate IE this year, but still - it's actually not that horrible. COM isn't as opaque as everybody makes it out to be, and I've been working towards this general thing anyway for years.

The basic guts of COM can actually be supported with very little code: I've run across a fantastic OLE tutorial written by Bartosz Milewski, who wrote and markets a neat code sharing product that works either with peer-to-peer or email connections. That's rad! Anyway, in the distant past, Milewski himself was at Microsoft, heading a product team, and had some good suggestions about how to organize COM - which Microsoft ignored. So now he has a tutorial that addresses OLE as it actually makes sense. And yeah, that's going into Perl now. Another take on how the whole thing works is here, by Chris Oakley. And there's a neat little library for C/C++ that takes a lot of the sting out of things, DispHelper.

(The rest of Milewski's tutorial is pretty salient, too, in terms of working with Windows, and honestly it should probably all end up in Perl. With a tutorial. And everything else on the Reliable Software site is equally fascinating. It's all good.)

Anyway, here's a little snippet of code [from here] showing how interfaces are explicitly referred to in VB.NET:
'Create a html document class
Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
'Get all elements present in the document
Dim allElements As IHTMLElementCollection = htmlDocument.all

See that? IHTMLDocument2 isn't a type of object - it's an explicit specification of the interface to be used for the HTMLDocumentClass created. Same with the IHTMLElementCollection. Win32::OLE doesn't allow us to specify these interfaces and can't call them if we did.
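The "one object, many interfaces" idea can be sketched without touching Windows at all. This is a toy model in Python, not real COM: `ComObjectModel`, `InterfacePointer`, and `NoInterface` are names I made up, and the `get_title` and `ping` methods are stand-ins. What it models is the QueryInterface step: you ask the object for a named interface, and you can only call what that interface exposes.

```python
class NoInterface(Exception):
    """Stands in for the E_NOINTERFACE result of a real QueryInterface."""


class InterfacePointer:
    """A handle on exactly one interface of an object."""

    def __init__(self, name, methods):
        self.name = name
        self._methods = methods

    def call(self, method, *args):
        # Only methods declared on this interface are reachable through it.
        if method not in self._methods:
            raise AttributeError(f"{self.name} has no method {method}")
        return self._methods[method](*args)


class ComObjectModel:
    """Toy model: one object exposing several named interfaces."""

    def __init__(self, interfaces):
        # Maps an interface name to a dict of method name -> callable.
        self._interfaces = interfaces

    def query_interface(self, name):
        try:
            methods = self._interfaces[name]
        except KeyError:
            raise NoInterface(name) from None
        return InterfacePointer(name, methods)


# Illustrative object; the method bodies are placeholders.
document = ComObjectModel({
    "IHTMLDocument2": {"get_title": lambda: "Example page"},
    "IExampleOther": {"ping": lambda: "pong"},  # hypothetical interface
})

doc2 = document.query_interface("IHTMLDocument2")
print(doc2.call("get_title"))
```

In the VB.NET snippet the declared type of the variable does the job of `query_interface` here. That selection step is the part Win32::OLE has no way to express.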

Ideally, a new implementation of COM under Perl should permit not only consumption of COM interfaces, but provision as well. To a certain extent this is already solved in the event handling in Win32::OLE (which presents IDispatch to the outside world), but it's a weak solution. We can do better, if I can figure out how the whole process thing works. Then we can even register Perl to provide automation objects we can call from Word, for example. Wouldn't that be cool? Yes. Yes, it would be cool.

To make that happen the way I want to make it happen, I want to implement something like a declarative typelib provider. There's such a tool as a typelib viewer [here's the overview at AutoHotKey of the Microsoft OLEVIEW.exe tool]; not only does Microsoft provide that, but ActiveState Perl does as well. I'd much rather have one available on CPAN. That tool will be a command-line thing that outputs a report, and the report will be usable by the COM tools, both to provide an interface and to call it - and to document it.  I've been moving in that direction anyway for Office. This is the only real way to manage this stuff. This part is essentially database work.
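Here's roughly what I mean by a report that tools can consume, sketched in Python because it's quick to try out. The report format is entirely made up (one `interface Name : Parent` line followed by indented method lines), `IExampleCounter` is a hypothetical interface, and `parse_report` is just an illustration of turning such a report back into data, not a design for the eventual CPAN tool.

```python
import re

# Hypothetical report format: a header line, then indented method lines.
SAMPLE_REPORT = """
interface IExampleCounter : IUnknown
  HRESULT Increment(long amount)
  HRESULT GetValue(long* value)
"""

HEADER = re.compile(r"^interface\s+(\w+)\s*:\s*(\w+)$")
METHOD = re.compile(r"^\s+(\w+)\s+(\w+)\((.*)\)$")


def parse_report(text):
    """Parse the made-up report format into a dict keyed by interface name."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        header = HEADER.match(line)
        if header:
            name, parent = header.groups()
            current = {"parent": parent, "methods": []}
            interfaces[name] = current
            continue
        method = METHOD.match(line)
        if method and current is not None:
            returns, name, raw_params = method.groups()
            # Split "type name" pairs on the last space so "long*" stays whole.
            params = [tuple(p.strip().rsplit(" ", 1))
                      for p in raw_params.split(",") if p.strip()]
            current["methods"].append(
                {"name": name, "returns": returns, "params": params})
            continue
        raise ValueError(f"unrecognized line: {line!r}")
    return interfaces


if __name__ == "__main__":
    for name, info in parse_report(SAMPLE_REPORT).items():
        print(name, "extends", info["parent"])
        for m in info["methods"]:
            print("  ", m["name"], m["params"])
```

Once the interface description is plain data like this, calling, providing, and documenting an interface are all just different consumers of the same table, which is why I keep calling it database work.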

Once that interface definition is there, we could also conceivably use it not only to do Perl things with it, but to generate code in other languages as well, for example to generate interface boilerplate in C++ a la this tool on Sourceforge or to build an XS framework for the same thing in a Perl module.
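As a rough idea of what that generation step could look like, here's a Python sketch that turns an interface description into a C++ abstract struct. The description layout, the `cpp_interface` function, and the `IExampleCounter` interface are all assumptions made up for illustration, and the sketch assumes every method returns HRESULT. Real boilerplate would also need GUIDs and the right headers.

```python
def cpp_interface(desc):
    """Render a C++ pure-virtual declaration from an interface description.

    desc is a dict with "name", "parent", and "methods"; each method has a
    "name" and a list of (type, name) "params". Every method is assumed to
    return HRESULT, which is a simplification for this sketch.
    """
    lines = [f"struct {desc['name']} : public {desc['parent']} {{"]
    for method in desc["methods"]:
        args = ", ".join(f"{ptype} {pname}" for ptype, pname in method["params"])
        lines.append(
            f"    virtual HRESULT STDMETHODCALLTYPE {method['name']}({args}) = 0;")
    lines.append("};")
    return "\n".join(lines)


# Hypothetical interface description, invented for the example.
EXAMPLE = {
    "name": "IExampleCounter",
    "parent": "IUnknown",
    "methods": [
        {"name": "Increment", "params": [("long", "amount")]},
        {"name": "GetValue", "params": [("long*", "value")]},
    ],
}

if __name__ == "__main__":
    print(cpp_interface(EXAMPLE))
```

The same description could just as well feed a generator for XS glue; only the template changes.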

So we've got several things to provide, here:
  • Win32::COM module that wraps up just the essentials of COM.
  • Adapt or steal the other stuff that Win32::OLE provides, like typelibs, and for God's sake, let's document their use as well. Maybe we can just use them directly, I don't know. Whatever can be used instead of replaced, should obviously not be replaced.
  • Better discovery and reflection in general of COM interfaces.
  • Finally, a set of declarative tools for working with COM interfaces in Perl and possibly in other languages.
After the very first step, we can probably already get back to IEMech. Sheesh. Every time you think you're done pushing down the task stack, it turns out there's more cruft down there. I may have mentioned that I used to own a 140-year-old house. Perl is like that.

Hey, incidentally, Win32::OLE would be a great target for code understanding.
