Saturday, November 30, 2013

Nintendo 3DS SD card filesystem

Here's a page describing the SD card filesystem used by the Nintendo 3DS. (I found this because my son's SD card was corrupted, to much wailing and gnashing of teeth - unfortunately, Nintendo was kind enough to encrypt everything on the SD card with AES, so I can't really do anything to restore his Fire Emblem characters.)

This, again, is an interesting instance of "how to recognize information in data". Just filed for later reference.

PeerProof

PeerProof is (going to be) a collaborative proof system - iterative theorem proving. Even has an acronym, ITP.

Tuesday, November 26, 2013

Coursera courses in functional programming and reactive programming

I might do these...

Functional programming in Scala, reactive programming in Scala.

Why aren't there standards for invoices and things?

Answer: there are, kind of - EDI.

Nimrod: C + macros = cool language

Nimrod. [hnn]

Oh - here is a slideshow that makes it a lot more interesting-looking.

Boilerplate-free REST API for arbitrary databases, in Python: sandman

Now this is software engineering, folks. SQLAlchemy for database introspection, flask for the API provision, and voila, a database editor in ten lines.
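
To give a feel for the introspection trick, here is a minimal sketch using only the standard library's sqlite3 module. It does not use sandman, SQLAlchemy, or Flask, and the `build_routes` helper and the `artist` table are made up for illustration; the point is only that the table list can be read from the database and turned into endpoints without writing per-table code.

```python
import sqlite3

def build_routes(conn):
    """Map '/<table>' to a function returning that table's rows as dicts."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    routes = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]

        def list_rows(table=table, columns=columns):
            rows = conn.execute(f'SELECT * FROM "{table}"').fetchall()
            return [dict(zip(columns, row)) for row in rows]

        routes[f"/{table}"] = list_rows
    return routes

# Hypothetical example database, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO artist (name) VALUES ('Miles Davis')")
routes = build_routes(conn)
print(routes["/artist"]())
```

A real tool would hang these handlers on an HTTP framework and add write operations; this only shows the "discover the schema, then generate the API" step.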

Promises

A learning-by-doing post about Promises.

Messaging as a programming model

This is worth reading.

TextBlob, pattern

TextBlob is a handy-dandy text manipulation module in Python, based on NLTK and pattern. Pattern is a CLiPS module that does pattern matching in texts and it looks really tantalizing.

Analytics applied to MarioKart

This is pretty fun: Kartlytics. It's a fun article from Joyent presenting their product Manta, which is apparently an object store for Big-Data-as-a-Service (BDaaS, I like that).

Cool stuff all the way down.

AppScale

An open-source Google Apps alternative. Groovy!

Modal logic

Here's a neat little tutorial/app/whatever about modal logic, with some kind of graph-editing toy built in.

The directed graph editor itself is in a block here.

More on logic: Logic in Action (open-source courseware).

CloudFlare

Another piece on the architecture pile.

CloudFlare, by the way, runs Lua on nginx as their basic architecture - and compiles to Lua for some features. Just all kinds of cool.

Mail handling in Python

"Envelopes is a wrapper for Python’s email and smtplib modules. It aims to make working with outgoing e-mail in Python simple and fun."

Learn Datalog today

http://www.learndatalogtoday.org/ - does what it says on the tin.

Unsupervised joke generation using big data

Ha!

Stanford's ML course

One of these days....

Also, Tadas Vilkeliskis's ML notes.

Mapping the ArXiv

And speaking of analysis of large scientific/mathematical structure, here's a map of the ArXiv using some kind of similarity metric that I have no idea about.

The Stacks project

The Stacks project is a collaborative, densely linked website elaborating the theory of algebraic stacks (whatever those are). It contains theorems and lemmas and stuff, in LaTeX and online, and even has some kind of query API.  This is a fascinating look at one approach to complex math.

I originally saw it due to an analysis of its complexity.

507 mechanical movements

OOoooh, this is so cool. It's a nineteenth-century book cataloging mechanical movements, with pictures; the Website is replacing them (augmenting them) with animations.

This is evocative of a descriptive grammar for machinery, and that really pushes all my buttons. I'd like to spend some time thinking it through sometime.

And while we're on the topic of machinery, have an animation of epicyclic gearing.

Saturday, November 23, 2013

In data science, why Python?

A retrospective. The short answer is basically that MATLAB changed their licensing to exclude the Fraunhofer Institutes (possibly unintentionally), and Fraunhofer people responded by rolling their own.

Interesting point made: if it takes minutes to load up your dataset, it's nice to be able to work with the heap in a dynamic manner, adding new functions to work with the data structures already in memory. Python has ways of making that easy, and of course MATLAB had that rolled in from the start.  Any REPL language can do that.

This is essentially a question of in-memory indexing of a database.

Sunday, November 17, 2013

Windows COM and OLE Automation

Remember how I was just going to get Win32::IE::Mechanize up and running again?  As if! Turns out newer versions of IE depend far more heavily on custom COM interfaces than did older IE versions, and Perl's Win32::OLE just doesn't do that.

So if I want to automate IE, I'm going to have to do that.

Unfortunately, the code for Win32::OLE is atrociously documented. Well, let's put it this way: the documentation principle used was RTFC. If you can't understand both Windows COM in C++ and perlguts manipulation of stashes in the same code, you clearly don't need to be doing COM in Perl, apparently.  It's horrible.

So down the rabbit hole I go. I've been reading a lot about COM and OLE Automation. Let me boil it down to the basic points for you.
  • Windows uses COM [msdn] as its interprocess communication standard. It's actually pretty freaking neat, but Microsoft never met a set of documentation they thought wasn't obscure enough, so most of what Microsoft has written about how to use COM is horrible. And of course they also have no interest in enabling the use of their technology without purchase of their programming tools, so what you do find will largely assume you're using their MFC foundation classes for C++. That, or Visual Basic. Or now, .NET.
  • COM calls communicate using interfaces. You create an object (or latch onto one running), and that object might be in a different process or even on a different machine. Windows handles all that for you; you just communicate with the interface, which might be a local stub. All data is encapsulated into VARIANTs, which are essentially little typed data blobs that work pretty much like Perl scalars.
  • Interfaces can inherit from one another - with the restriction that only single inheritance is allowed, and new methods are just appended to the list of inherited ones.
  • IUnknown is the basic interface.
  • IDispatch is the interface for OLE Automation [wp]. It's the only interface supported by Perl's Win32::OLE. Therein lies our problem. IE is no longer built to be driven through IDispatch. Why? I'll tell you why: because Visual Basic now knows how to use non-IDispatch interfaces. Simple as that. Microsoft restricted their development to IDispatch only to give VB a chance to catch up. That catching up is .NET.
  • A typelib defines the interfaces; in the absence of a typelib you just need documentation. If you don't have documentation, you can't use the interface; I believe this is by design, as Microsoft's business model relies on the ability to provide secrecy. IDispatch provides some reflection tools, but they're weak. And of course they only work if IDispatch is implemented. Win32::OLE::TypeLib provides some interface to that (see?) but it is undocumented. Urggh.
  • An interface inherited from IDispatch is called dispatched; an interface inherited from IUnknown but not IDispatch is called custom. Win32::OLE explicitly does not cover custom interfaces; since the advent of .NET, though, Microsoft coding has increasingly wandered over there, because custom interfaces provide a reasonable namespace convention and also provide an easy upgrade path.
So if I want to rewrite Win32::IE::Mechanize to work with IE versions greater than about 7, I need to write Win32::COM, essentially from scratch.
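
The inheritance rule in the list above (single inheritance, new methods appended to the inherited ones) can be modeled in a few lines. This is a toy sketch in Python, not real COM and not anything Win32::OLE does; the `Interface` class, `classify`, and `ICustomThing` are hypothetical names for illustration. The IUnknown and IDispatch method names are the real ones, in their real order.

```python
class Interface:
    """An ordered method table; a derived interface appends to its parent's slots."""

    def __init__(self, name, methods, base=None):
        self.name = name
        self.base = base
        inherited = base.slots if base else []
        self.slots = inherited + list(methods)

    def slot_of(self, method):
        """Position of a method in the table (what a caller would index by)."""
        return self.slots.index(method)

    def extends(self, other):
        """True if this interface is `other` or inherits from it."""
        iface = self
        while iface is not None:
            if iface is other:
                return True
            iface = iface.base
        return False

IUnknown = Interface("IUnknown", ["QueryInterface", "AddRef", "Release"])
IDispatch = Interface(
    "IDispatch",
    ["GetTypeInfoCount", "GetTypeInfo", "GetIDsOfNames", "Invoke"],
    base=IUnknown,
)
# Hypothetical custom interface, for illustration only.
ICustomThing = Interface("ICustomThing", ["DoThing"], base=IUnknown)

def classify(iface):
    """'dispatched' if derived from IDispatch, 'custom' if only from IUnknown."""
    if iface.extends(IDispatch):
        return "dispatched"
    if iface.extends(IUnknown):
        return "custom"
    return "not a COM interface"

print(IDispatch.slots)
print(classify(ICustomThing))
```

Because the inherited slots always come first, code that only knows IUnknown can still call the first three methods of any interface, which is why appending (rather than reordering) matters.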

Which is a fantastic opportunity! OK, it's not what I wanted to do in order to automate IE this year, but still - it's actually not that horrible. COM isn't as opaque as everybody makes it out to be, and I've been working towards this general thing anyway for years.

The basic guts of COM can actually be supported with very little code: I've run across a fantastic OLE tutorial written by Bartosz Milewski, who wrote and markets a neat code sharing product that works either with peer-to-peer or email connections. That's rad! Anyway, in the distant past, Milewski himself was at Microsoft, heading a product team, and had some good suggestions about how to organize COM - which Microsoft ignored. So now he has a tutorial that addresses OLE as it actually makes sense. And yeah, that's going into Perl now. Another take on how the whole thing works is here, by Chris Oakley. And there's a neat little library for C/C++ that takes a lot of the sting out of things, DispHelper.

(The rest of Milewski's tutorial is pretty salient, too, in terms of working with Windows, and honestly it should probably all end up in Perl. With a tutorial. And everything else on the Reliable Software site is equally fascinating. It's all good.)

Anyway, here's a little snippet of code [from here] showing how interfaces are explicitly referred to in VB:
'Create a html document class
Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
htmlDocument.write(htmlToParse)
 
'Get all elements present in the document
Dim allElements As IHTMLElementCollection = htmlDocument.all

See that? IHTMLDocument2 isn't a type of object - it's an explicit specification of the interface to be used for the HTMLDocumentClass created. Same with the IHTMLElementCollection. Win32::OLE doesn't allow us to specify these interfaces and can't call them if we did.

Ideally, a new implementation of COM under Perl should permit not only consumption of COM interfaces, but provision as well. To a certain extent this is already solved in the event handling in Win32::OLE (which presents IDispatch to the outside world), but it's a weak solution. We can do better, if I can figure out how the whole process thing works. Then we can even register Perl to provide automation objects we can call from Word, for example. Wouldn't that be cool? Yes. Yes, it would be cool.

To make that happen the way I want to make it happen, I want to implement something like a declarative typelib provider. There's such a tool as a typelib viewer [here's the overview at AutoHotKey of the Microsoft OLEVIEW.exe tool]; not only does Microsoft provide that, but ActiveState Perl does as well. I'd much rather have one available on CPAN. That tool will be a command-line thing that outputs a report, and the report will be usable by the COM tools, both to provide an interface and to call it - and to document it.  I've been moving in that direction anyway for Office. This is the only real way to manage this stuff. This part is essentially database work.

Once that interface definition is there, we could also conceivably use it not only to do Perl things with it, but to generate code in other languages as well, for example to generate interface boilerplate in C++ a la this tool on Sourceforge or to build an XS framework for the same thing in a Perl module.
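
As a rough sketch of what "declarative" could mean here: describe the interface once as plain data, then derive both documentation and C++ boilerplate from it. Everything below is an assumption for illustration (the `ICounter` interface, the dictionary layout, and the `to_cpp` and `to_doc` helpers); it is not the output format of any existing typelib tool.

```python
# Hypothetical interface description; names and layout are illustrative only.
ICOUNTER = {
    "name": "ICounter",
    "base": "IUnknown",
    "methods": [
        {"name": "Increment", "params": [("LONG", "amount")],
         "doc": "Add amount to the counter."},
        {"name": "GetValue", "params": [("LONG*", "result")],
         "doc": "Read the current value."},
    ],
}

def to_cpp(iface):
    """Emit a C++ abstract struct in the usual COM style."""
    lines = [f"struct {iface['name']} : public {iface['base']} {{"]
    for m in iface["methods"]:
        params = ", ".join(f"{t} {n}" for t, n in m["params"])
        lines.append(
            f"    virtual HRESULT STDMETHODCALLTYPE {m['name']}({params}) = 0;")
    lines.append("};")
    return "\n".join(lines)

def to_doc(iface):
    """Emit a plain-text summary of the same description."""
    lines = [f"{iface['name']} (extends {iface['base']})"]
    for m in iface["methods"]:
        lines.append(f"  {m['name']}: {m['doc']}")
    return "\n".join(lines)

print(to_cpp(ICOUNTER))
print(to_doc(ICOUNTER))
```

The same description could feed an XS generator or a Perl-side caller; the design choice is simply that the data, not the generated code, is the source of truth.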

So we've got several things to provide, here:
  • Win32::COM module that wraps up just the essentials of COM.
  • Adapt or steal the other stuff that Win32::OLE provides, like typelibs, and for God's sake, let's document their use as well. Maybe we can just use them directly, I don't know. Whatever can be used instead of replaced, should obviously not be replaced.
  • Better discovery and reflection in general of COM interfaces.
  • Finally, a set of declarative tools for working with COM interfaces in Perl and possibly in other languages.
After the very first step, we can probably already get back to IEMech. Sheesh. Every time you think you're done pushing down the task stack, it turns out there's more cruft down there. I may have mentioned that I used to own a 140-year-old house. Perl is like that.

Hey, incidentally, Win32::OLE would be a great target for code understanding.

Saturday, November 16, 2013

The future of programming

An interesting way to present background notes for a slideshow, and also an interesting historical look at the 70's in computer science.

Mandelbrot set in SQL

Betcha didn't see that coming.

Intro to probabilistic programming and Bayes

What it says on the tin.

Summary of computer science

Nice summary of topics in computer science, with links to textbooks and a few topic lists.

Save the cat beat sheet

The concept of a beat sheet is kind of neat. It's a story structure, a kind of template for your plot (or movie script).

Huxley for graphical regression testing

This is a good idea (out of Facebook) - take snapshots while testing and compare them at the pixel level for regression.
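
The core comparison is simple enough to sketch. This is not Huxley's code or API; it assumes screenshots have already been decoded into rows of RGB tuples, and `diff_pixels` is a hypothetical helper name.

```python
def diff_pixels(baseline, candidate):
    """Return the (row, col) positions where two same-sized images differ."""
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("screenshots have different dimensions")
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(baseline, candidate))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]

# Two tiny 2x2 "screenshots"; one pixel changed in the new run.
baseline = [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
candidate = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 0)]]
changed = diff_pixels(baseline, candidate)
print(changed or "no visual regression")
```

An empty result means the run matches the stored snapshot; anything else is a candidate regression for a human to look at.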

Anatomy of a buggy random-number generator

Post-mortems are always neat.

Tools for treason

Here's a good point: if security tools aren't strong enough for crime, they aren't strong enough.

Prairie dog language

So ... turns out prairie dogs might be rather loquacious.

How to build Skype

Zero to Skype in nine months.

GPG tutorial

GPG tutorial for those of us who still can't quite figure out that whole GPG thing.

One-class support vector machines

An introduction.

Monads made difficult

I still really haven't got a clue what a monad is. This didn't help.

Code kata project list

Here's a neat list of 125 project ideas for programming practice.

Mailparser.io

Email data extraction as a service.

Political rhetoric generator

I love this stuff.

Handbook of Applied Cryptography

Another book!

IndieWeb for ... social network publishing business pattern ...

I dunno. Seems interesting though.

Shit for making websites

What it says on the tin.

Game Programming Patterns

Another book.

Hancock

Hancock is a language developed at AT&T in the late 90's to do searching of incoming data on a bulk basis, essentially to implement metadata surveillance. Wired (2007) and HNN commentary.

Top free-to-play monetization tricks

All about coercive monetization. tl;dr: premium currencies, hide a money game inside your skill game, reward removal, progress gates, soft and hard boosts, and ante games.

Reservoir sampling

A cute little post about interviewing for work in data science, especially talking about reservoir sampling.
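
For reference, the classic form of reservoir sampling (often called Algorithm R) fits in a dozen lines. This is a generic sketch, not the code from the linked post; the function name and the optional `rng` parameter are my own choices.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) lands in the reservoir with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1000), 5, random.Random(0))
print(sample)
```

The appeal for interviews is that it needs one pass and only k items of memory, so it works on streams too large (or too endless) to count first.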

July bookmarks: let's start with security

Along about July I thought I'd try to get a handle on the many bookmarks I was accumulating without blogging them, so I ended up with a bunch of categorized bookmarks that still never got posted.  Oy. And I should warn you, as I'm getting caught up with the summer's bookmarks I've been accumulating lots and lots more on various topics up here in November. Fortunately they group a little better, so they'll probably fit into bulk posts better.

To get July started off well, let's post a big bunch of security-related stuff that caught my eye in the first half of the year:

Sunday, November 10, 2013

Marelle: sysadmin in Prolog

Marelle allows you to declare packages and dependencies, then derive (and execute) the steps needed to install a given package. Very interesting.
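
The "derive the steps" part is essentially a depth-first walk over the declared dependencies. Marelle does this in Prolog; the sketch below is a plain Python analogue, not Marelle's implementation, and the package names are hypothetical.

```python
def install_order(target, depends_on):
    """Return packages in an order that installs every dependency first."""
    order, done, in_progress = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"dependency cycle at {pkg!r}")
        in_progress.add(pkg)
        for dep in depends_on.get(pkg, []):
            visit(dep)
        in_progress.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(target)
    return order

# Hypothetical declarations: package -> packages it needs.
deps = {
    "webapp": ["python", "nginx"],
    "python": ["openssl"],
    "nginx": ["openssl"],
}
steps = install_order("webapp", deps)
print(steps)
```

Executing the plan is then just running an install action for each name in order, skipping the ones already present.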

Frog: frozen blog

Frog is another static blogging framework, this time based on Racket.

Pyret

A new language!  Pyret is a dynamic language with optional typing, first-class testing apparently, and other neat features that are intended to make it a good teaching language.

Sunday, November 3, 2013

Automating IE with jQuery

This is something I ran across in my shotgun research for IE automation. Neat approach!

Automating IE

Here's the thing. I'm a Windows user - the demands of my main industry require it, and Windows was what there was, when I was getting started in the whole computer thing. And as much as we hates IE, precious, and as much as we likes our Chrome, the fact remains that Windows automation using COM/OLE is a pretty slick setup for integrating GUI stuff.

And yeah, the state of the art for browser automation is Selenium, but for the life of me I can't get Selenium to work with Chrome on my machine, and the documentation is horribly unhelpful in that regard.

The tool of choice has always been Win32::IE::Mechanize, but it has apparently been removed from CPAN - it hadn't been actively developed since 2005, so I suppose that's probably reasonable. Win32::IEAutomation exists but apparently ceased to work with IE7.

I would consider using WWW::Mechanize, obviously, except that for what I want to do, I need the browser visual, and that's not an option with the Mech (which also doesn't support JavaScript, so ....)

So yeah. At the moment, the options are poor. I'm wondering what normal people do. (Probably Selenium, honestly.) I may just get the last active IE::Mech, which is still on CPAN but not indexed; I've asked whether I could adopt it, which might be a mistake.  We'll see.

Update 2013-11-17: Turns out the primary maintainer of Win32::IE::Mechanize is now, ahem, me. Now all I have to do is make it work again, which is sadly no trivial task.

Saturday, November 2, 2013

Code reviews

Occasionally, large, historically relevant pieces of software get released to open source (games, mostly), and code reviews are then done. Prince of Persia is one. Doom 3 is another. Both reviews by Fabien Sanglard. Good stuff. This is kind of where I want to go with the concept of an exegesis.

Reactor.js

Reactive programming in JavaScript.

Gmail: consistent rendering of UI in email

This is a weird one; Google has defined a data schema for defining UI in an email, to be rendered in the mail client. Useful for workflow, I suppose, but it seems a bit, I dunno, hyperfocused.

Macropy

Macropy does real live syntactic macros in Python using some manner of magickal tomfoolery.

Physics in Scheme

So Leonard Susskind has written a course teaching the "theoretical minimum" of classical mechanics, and HNN, as usual, delivers all interesting related topics, including Gerald Sussman's MIT "Structure and Interpretation of Classical Mechanics", or: "Physics in Scheme", essentially, where the models used are expressed in Scheme and are entirely computable, directly. That's cool.

Excision

Datomic is a kind of neat database that I haven't really looked into. The upshot is that it tracks changes in data over time; when a fact is added, it remembers when it was added, so you can restore your state of knowledge at a given point in time.

There are requirements, though (e.g. privacy laws) that require information to be removed outright - that you forget you ever knew it in the first place - and this is excision.
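
A toy model makes the distinction concrete. This is not Datomic's API or data model; `FactLog` and its methods are hypothetical, and time is just a counter. Normal updates only ever append, so any past state can be rebuilt, while excision is the one operation that rewrites history.

```python
class FactLog:
    """Append-only log of (time, entity, attribute, value) facts."""

    def __init__(self):
        self._facts = []
        self._clock = 0

    def add(self, entity, attribute, value):
        """Record a fact and return the time at which it was added."""
        self._clock += 1
        self._facts.append((self._clock, entity, attribute, value))
        return self._clock

    def as_of(self, t):
        """State of knowledge at time t: latest value per (entity, attribute)."""
        state = {}
        for ts, entity, attribute, value in self._facts:
            if ts <= t:
                state[(entity, attribute)] = value
        return state

    def excise(self, entity):
        """Forget every fact about an entity, including its history."""
        self._facts = [f for f in self._facts if f[1] != entity]

log = FactLog()
t1 = log.add("alice", "email", "alice@old.example")
t2 = log.add("alice", "email", "alice@new.example")
t3 = log.add("bob", "email", "bob@example.com")
before = log.as_of(t1)   # what we knew at t1
log.excise("alice")
after = log.as_of(t1)    # after excision, even the past forgets alice
print(before, after)
```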

Big-O cheat sheet

Algorithm overview in cheat-sheet format. This is the kind of thing that would be useful in spaced rehearsal, actually...

More efficient porn streaming

A very tasty post-mortem about reverse-engineering RTMP to replace Adobe's Flash Media Server. I love this stuff!

Django REST framework

I ... don't know exactly why I bookmarked it, but here's the Django REST framework.

Music

A couple of bookmarks on music: first is a little monkish discussion of music stuff in Perl (and the lack thereof), and the other is just CSound, probably the most common basic tool out there.

Naked WordPress

A bare-bones WordPress theme for people who don't do WordPress themes. Boilerplate!

Or here's how to do boilerplating in Sublime Text. With a link to the WordPress boilerplate project.

Sleek

Analytics in Ruby. I haven't really turned my attention to analytics, which is essentially "reporting on events".

Dynamo works too hard

Damien Katz writes about scaling.

Software construction isn't construction - it's writing

This guy gets it - programming is language. Conceptually we are describing things when we write software. Not building things.  (Not that, deep down, those are that different.)

Mocky

Mocky lets you build test responses for REST APIs. Or something. My brain is a little fried today.

Varnish in five acts

A nicely written article about implementing Varnish for scaling.

noBackend

I think this is a framework for frontend apps in the browser. The idea being that you can essentially serve it up as a static page and do all the backend work from arbitrary REST APIs.

More frameworks!

More frameworks compared! Which raises the question of when I'll write my metaframework.

jQuery plugin repository: unheap

Unheap. Nice.

MadLibs signup forms

Another best-practice Web design post. Signup forms work better if presented as blanks in letters. (Comparison study.)

Bayes Rule right in the language!

You've got logic in my programming language design! You've got programming language design in my logic!  Bayes reasoning in Haskell.

Graph database work in Clojure

This is interesting...

How Craig Kerstiens writes SQL

I guess I was on a modeling tear in May (I can hardly even remember May at this point; it's been a busy year).  Anyway, this is a nice little guide about how one PostgreSQL expert thinks about carefully constructed SQL.

GitHub's Linguist

Linguist is what GitHub uses to figure out the (programming) language in a given file.

DIP in the wild

So here's a really well-written article about software engineering - which paradoxically, given that I earned my bread and butter with software development for over a decade and have both a BS and an MS in CompSci, I really know very little about.

But there is a lot of good thought out there (in the Java enterprise world, mostly) about how to engineer this kind of software and model at scale. It's important stuff; out on the enterprise edge people are pushing the boundaries of the number of details a human organization can actually keep track of, and the way they do it is quite instructive in terms of how we think about software.

And this author just plain writes well.

Filed under data modeling because I don't even have a thread for software engineering, it's so far from my normal stomping grounds.

OTOY's JavaScript HD codec

I sure don't understand video very well, but browsers are getting to the point where they can essentially do anything at all.

Survey of C header files across operating systems

Build harnesses for C programs need to check for the presence of header files on any specific operating system - or do they? Coverage is actually pretty standard these days, according to Zack Weinberg, who has a survey. This is useful!

May bookmarks!

Let's blog all of May's bookmarks today!  (April's bookmarks netted me an interview, after all.)

Riot.js

Riot.js is a JS framework that minifies down to about a single kilobyte. And has a nice, clean structure as well.

Friday, November 1, 2013

badBIOS

Ars Technica has the story of Dragos Ruiu, whose Mac seems to have contracted a virus that jumps airgaps by sound. Through the microphone.  Seriously.