Tuesday, December 31, 2013

Scheme. In Perl?

You know, TinyScheme is essentially in the public domain, and would really be easy to embed into Perl. In fact, you could just use Perl's own scalars as a basic Scheme type and get most of the R5RS spec for free that way...

I think I want to think harder about that. There are a lot of things I want to do in Lisp-y environments without giving up my CPAN, and TinyScheme is really, really lightweight. It's quite conceivable that you could also use the Perl module structure (and CPAN) to organize your Scheme libraries. Hmm...

But even more fascinatingly, a Scheme embedded in XS could also be used to get some introspective (and on-the-fly!) access to other XS-defined items. Say, OLE.XS, which still has me stymied in my effort to get IEMech up and running in the modern age.

So - TinyScheme is probably the lightest-weight choice, but there's also Chibi Scheme, which looks like a more full-featured alternative, plus of course there's already Inline::Guile for serious scheming.

(I'm going to go with TinyScheme, though, and make something self-contained, a la SQLite, that already has everything you need to start with Scheme.)

Oh, here's a link on what hygienic macros are. Makes sense.

Anyway, the motivation for all that is that Sussman physics-in-Scheme course. That thing is incredibly densely written! Fortunately I'm married to a theoretical physicist who, so far, appears impressed at my staying power and is more than happy to talk about anything I find unclear. At length.

Speech synthesis

So I just found out about something called Vocaloid, which is a closed-source Yamaha product for text-to-singing. There's nothing remotely like it in the open source world, but I suspect you could cobble it together (mostly) from parts already in existence by feeding the melody and timing into an existing text-to-speech engine.

Possible engines might be:
  • MARY (another project from the DFKI)
  • eSpeak - this is more or less the Linux default
  • flite, which is Festival lite.
  • And Festvox, which is the full Edinburgh/CMU Festival system.
I can only imagine that Vocaloid is a unit-selection synthesizer with a large database; the output is pretty natural-sounding, in contrast to the state of the art in truly synthetic speech. It would be a lot of fun to play with this stuff, especially in the context of music.

Friday, December 27, 2013


More LogicBlox/Datalog stuff:
I'm lagging in my investigations.


Here's a complete markup editor for formatting posts for StackOverflow and other markup-based stuff. It's bewilderingly good.


windres is the Unix-y toolchain utility for Windows resources.

Code search engines

An HNN thread listing lots of code search engines.

A useful catalog of easing animations

Easing and bounces in JavaScript - neat!

Wordpress deployment

Genesis: a deployment tool for Wordpress.


Harp looks like a nice static site generator/server for modern languages and approaches.

Game Developer magazine

Free open archives.

Text/NLP stuff

Like clockwork, I collect links to interesting NLP stuff.

  • GATE: a General Architecture for Text Processing
  • textteaser extracts summaries with machine learning
  • Dezi is a Lucy/Lucene-alike in Perl

Sound generators

A couple of sounds for ... stuff.

  • A whole site with a bunch of tunable sound generators. These are fascinating. "Noise machines."
  • Dial-up modem sound effects, for the sake of nostalgia.

Post-mortem (literally) of the Knight Capital bug

It was a process error. Worth reading!


Again. I think. Maybe this isn't - it's really more of a statistics thing.


A beautiful icon font.

Audio analysis with image processing algorithms

This is very cool.

Machine learning to find memory leaks

This is pretty cool.

Negative captcha

A Ruby framework for building forms that are more bot-proof. This is nice from the standpoint of technique.


Website templates based on Bootstrap.


I always love anything about d3.js, so October saw three links:

Thursday, December 26, 2013


I've launched into a project at last, a terminological database toolset that's been on the drawing board for a very long time indeed (with what I hope will prove to be an accompanying business model to harness it all), and one thing that I ran across in my initial data scheme for termbases is the "context" field. Logically, that context is an ontological specification - a kind of "where am I?" in the larger scheme of the vocabulary of the full language - and it's used to draw distinctions about the terminology used in specific applications.

Well, so I delved into the available literature about ontology tools. Of which there are many.  And I hadn't really looked in many years; they've proliferated, especially in the context of the semantic web and bioinformatics, so here's a partial linkdump of some of the information that looks most promising.

  • A decent overview.
  • KIF = Knowledge Interchange Format [here] [in SUMO]. This is a declarative language with LISP syntax used to express first-order logic predicates about concepts.
  • SUMO = Suggested Upper Merged Ontology. Sort of the basic list of concepts that underlie everything else.
  • Tips on ontology development. And pitfalls.
  • A few basic tutorials about the semantic web. It's based on a graph database model (for semantic networks).
  • RDF is used to encode chunks of graph data in the semantic web (it can also be embedded in HTML, of course).
  • Ontological data about RDF documents is encoded in RDFS and OWL.
  • OntoSelect is apparently a cataloging service for ontologies found/discovered on the Semantic Web - here's a mention, but the service itself seems to be down.
  • Biology is another area where ontologies are used extensively; here, for example, is the Experimental Factor Ontology. Note that it is downloadable in OWL format. Experimental ontologies are generally available as free, open-source data, while anything with any hint of commercial usefulness is blisteringly expensive (pharmacovigilance, for example - the adverse effects ontology used for drug side effect reporting).
  • The Gene Expression Atlas is also ontology-based. This is a real-world application of something that used to be considered hard AI, and I find that pretty fascinating in and of itself.
  • Aaand a protein ontology that I've linked partly because proteins are inherently cool and partly because the legend is so pretty.
  • Bioinformatics ontologies aren't always published in OWL; OBO is a competing standard. The Obofoundry catalogs a bunch of ontologies.
  • Ontobee.org is an ontology viewer for ontologies published on the Web. Here's the display for an adverse event ontology.
There are reams and reams of information about ontologies these days. Those are the more interesting things I ran across while determining that I don't need to go into that kind of depth to do what I need to do.

Sunday, December 22, 2013


Saw a post on the Perl jobs list yesterday for an ETL expert. Apparently this is the name Big Data people give to batch data transformations: Extract, Transform, Load.

What I just call "indexing" in the greater sense.


Sequences from neural nets

Generating Sequences With Recurrent Neural Networks on the arXiv.

Viral spread model

A Scalable Heuristic for Viral Marketing Under the Tipping Model on the arXiv.


Some kind of open-source machine learning project.

Sentiment analysis

... at Stanford, using deep learning.


Schema is a Clojure DSL for defining "data shapes" - lightweight types, effectively. Looks pretty snazzy. Types are useful; heavy use of them, however, gets in the way of reuse.

Bayesian updating

... of probability distributions.

Course in Machine Learning

Another one.


Given any process, there is a method for identifying and prioritizing risks. This is the kind of systems thinking that goes into the procl folder in my notes.

GPG audit

GPG is, of course, an important piece of software in the security world. It's kinda crufty and old. It probably needs an audit. Tptacek on HNN says that, more than an audit, it needs some decent code documentation - hence my idea of an exegesis, a deliteralization of sorts.

Anyway, multilevel code understanding and presentation.

Angular tutorial

Tutorial here.


Free game engine. Makes my laptop sound like a jet engine warming up for takeoff - but then essentially anything does that, so it's not really an antifeature.


An attempt (abortive, apparently) to use Bayesian techniques to detect stupidity. Obviously, this can't detect stupidity at the semantic level, but it might be able to pick up on syntactic markers of stupidity. It's an interesting exercise.


Sleep is a Perl-like scripting language for Java, which is pretty cool.


UnQLite is kind of the same thing as SQLite but for NoSQL-type document-based architectures.


A translator between English descriptions of C type declarations, and the type declarations. Pretty fascinating!


A framework for anonymous speech online.

The Architecture of Open-Source Programs

Online book. I think I linked it already - well, it deserves multiple linking.

Gingko: tree-shaped organization of text

An interesting perspective on document organization: hierarchical, two-dimensional organization of text into successive layers of detail. I kinda like it.



Maude is a rewriting logic system for doing pretty much the same things as other logic programming tools.

Brilliant vs. insane code

Here's an odd little ditty musing about a line of Python that Stavros Korokithakis (perhaps HNN's StavrosK?) ran across:
def GetContourPoints(self, array):
    """Parses an array of xyz points and returns a array of point dictionaries."""

    return zip(*[iter(array)]*3)
Hmm. Like it says on the label, it takes a flat array of xyz coordinates and returns the points as triples, in order. But as Stavros notes, it's not at all obvious how it does that. You have to reason your way through it.

It's clockwork, and quite clever - and not the way people think (well, except insofar as people do build clockwork, and Python like this, to get things done). In terms of code understanding, this code is not self-documenting in any way. To determine programmer intent, we have to simulate what it does and see why it does that.

It's kind of like a syntactic artifact of a semantic reasoning process, one that we can recover (hopefully!) with careful reasoning. But the original reasoning is gone.
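
Here's my attempt at recovering that reasoning, as a sketch in Python 3 (where zip is lazy, so I wrap it in list to see the result). The function names and the sample data are mine, not from the original code.

```python
def triples(array):
    """The original idiom: group a flat sequence into consecutive 3-tuples."""
    return list(zip(*[iter(array)] * 3))


def triples_spelled_out(array):
    """The same thing, with each step named."""
    it = iter(array)           # ONE iterator over the data
    same_three = [it, it, it]  # three references to that same iterator
    # zip pulls one item from each argument in turn. Since all three
    # arguments are the same iterator, each output tuple consumes three
    # consecutive items. Leftover items (length not divisible by 3) are dropped.
    return list(zip(*same_three))


flat = [1, 2, 3, 4, 5, 6, 7]
print(triples(flat))              # [(1, 2, 3), (4, 5, 6)]
print(triples_spelled_out(flat))  # [(1, 2, 3), (4, 5, 6)]
```

The whole trick is that `[iter(array)] * 3` copies the reference, not the iterator. Write `[iter(array), iter(array), iter(array)]` instead and you get three independent iterators, and zip hands back (1, 1, 1), (2, 2, 2) and so on.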


Random user generator

Randomuser.me gives you a random user profile - "Lorem ipsum for people". (See how this ties back to the contact management thing?)

So here's a semantic pole for ya: people. They just keep coming up all over the place. And they often share a lot of things with one another. So why don't we have a range of tools like this one, for people, for companies, etc.?

Sort of a semantic toolbox kinda thing.

Contact management

A recurring problem for everybody who deals with people. Which is ... everybody. [musings] [hnn]

Seems to me that part of the problem is that not every application of contact management requires a full-on heavy-artillery solution. So - as with many, many other domains - there is a kind of sliding scale of complexity that could be modeled using a set of mapped semantic domains.

I really think this concept is going to pay off once I have it clear in my head.


By the way, that was the first post from stuff I bookmarked in September. I seem to have a 3-month lag that is more or less constant, at this point.

Funny vs. LOL

Identifying semantic ... poles? Foci? by textual analysis. Starting with the distinction between funny and LOL in posted graphics. [slides] [hnn]

I'd like to do that with the stuff I'm linking from here. See how close my tag cloud is to the detailed semantic knobbiness.

UI patterns

So I've come back quasi-full circle, finding myself thinking of Wx-enabled smart widgets based on the wftk (seriously, it's like it's 2002 again), and last night after the laptop was off, I scribbled down the following note:
Basic mail UI against repo. Define mail-like queues as a thing.
The first sentence is a UI pattern (basic mail reader, with groups, then messages, then a document view for the mail itself), and the second is the data pattern it presents to the user (the concept of a message queue, possibly with threads, certainly with some kind of topic grouping).

This kind of pattern-based "architectural" programming could and should be carried down to the lowest possible levels of programming. That's a semantic mode of conceptualizing software.

Anyway, so I looked up "UI patterns", thinking that would get me that Yahoo effort (YUI, already noted elsewhere on this blog) - but instead it turned up UI-patterns.com, a short-lived effort by Danish web developer Anders Toxboe, an attempt to develop a UI pattern gallery/database/article focus site that seems to have gone on for 2010 and 2011 and stopped. Plenty of spam in the comments sections, but otherwise a ghost town.  Too bad, because it's pretty much what I'd like to start from in this attempt to come up with a language of UI pattern design.

Some good grist for the mill, anyway.

Filed under UI design, patterns, and architectural patterns, because I suspect the UI design has to be based on some conceptualization of the data that would be reflected in the system architecture.

Saturday, November 30, 2013

Nintendo 3DS SD card filesystem

Here's a page describing the SD card filesystem used by the Nintendo 3DS. (I found this because my son's SD card was corrupted, to much wailing and gnashing of teeth - unfortunately, Nintendo is kind enough to have encrypted everything on the SD card with AES, so I can't really do anything to restore his Fire Emblem characters.)

This, again, is an interesting instance of "how to recognize information in data". Just filed for later reference.


PeerProof is (going to be) a collaborative proof system - iterative theorem proving. Even has an acronym, ITP.

Tuesday, November 26, 2013

Coursera courses in functional programming and reactive programming

I might do these...

Functional programming in Scala, reactive programming in Scala.

Why aren't there standards for invoices and things?

Answer: there are, kind of - EDI.

Nimrod: C + macros = cool language

Nimrod. [hnn]

Oh - here is a slideshow that makes it a lot more interesting-looking.

Boilerplate-free REST API for arbitrary databases, in Python: sandman

Now this is software engineering, folks. SQLAlchemy for database introspection, Flask for the API provision, and voila, a database editor in ten lines.


A learning-by-doing post about Promises.

Messaging as a programming model

This is worth reading.

TextBlob, pattern

TextBlob is a handy-dandy text manipulation module in Python, based on NLTK and pattern. Pattern is a CLiPS module that does pattern matching in texts and it looks really tantalizing.

Analytics applied to MarioKart

This is pretty fun: Kartlytics. It's a fun Joyent article presenting their product Manta, which is an object store apparently for Big-Data-as-a-Service (BDaaS, I like that).

Cool stuff all the way down.


An open-source Google Apps alternative. Groovy!

Modal logic

Here's a neat little tutorial/app/whatever about modal logic, with some kind of graph-editing toy built in.

The directed graph editor itself is in a block here.

More on logic: Logic in Action (open-source courseware).


Another piece on the architecture pile.

CloudFlare, by the way, runs Lua on nginx as their basic architecture - and compiles to Lua for some features. Just all kinds of cool.

Mail handling in Python

"Envelopes is a wrapper for Python’s email and smtplib modules. It aims to make working with outgoing e-mail in Python simple and fun."

Learn Datalog today

http://www.learndatalogtoday.org/ - does what it says on the tin.

Unsupervised joke generation using big data


Stanford's ML course

One of these days....

Also, Tadas Vilkeliskis's ML notes.

Mapping the ArXiv

And speaking of analysis of large scientific/mathematical structures, here's a map of the ArXiv using some kind of similarity metric that I have no idea about.

The Stacks project

The Stacks project is a collaborative, densely linked website elaborating the theory of algebraic stacks (whatever those are). It contains theorems and lemmas and stuff, in LaTeX and online, and even has some kind of query API.  This is a fascinating look at one approach to complex math.

I originally saw it due to an analysis of its complexity.

507 mechanical movements

OOoooh, this is so cool. It's a nineteenth-century book cataloging mechanical movements, with pictures; the Website is replacing them (augmenting them) with animations.

This is evocative of a descriptive grammar for machinery, and that really pushes all my buttons. I'd like to spend some time thinking it through sometime.

And while we're on the topic of machinery, have an animation of epicyclic gearing.

Saturday, November 23, 2013

In data science, why Python?

A retrospective. The short answer is basically that MATLAB changed their licensing to exclude the Fraunhofer Institutes (possibly unintentionally), and Fraunhofer people responded by rolling their own.

Interesting point made: if it takes minutes to load up your dataset, it's nice to be able to work with the heap in a dynamic manner, adding new functions to work with the data structures already in memory. Python has ways of making that easy, and of course MATLAB had that rolled in from the start.  Any REPL language can do that.

This is essentially a question of in-memory indexing of a database.

Sunday, November 17, 2013

Windows COM and OLE Automation

Remember how I was just going to get Win32::IE::Mechanize up and running again?  As if! Turns out newer versions of IE depend far more heavily on custom COM interfaces than did older IE versions, and Perl's Win32::OLE just doesn't do that.

So if I want to automate IE, I'm going to have to do that.

Unfortunately, the code for Win32::OLE is atrociously documented. Well, let's put it this way: the documentation principle used was RTFC. If you can't understand both Windows COM in C++ and perlguts manipulation of stashes in the same code, you clearly don't need to be doing COM in Perl, apparently.  It's horrible.

So down the rabbit hole I go. I've been reading a lot about COM and OLE Automation. Let me boil it down to the basic points for you.
  • Windows uses COM [msdn] as its interprocess communication standard. It's actually pretty freaking neat, but Microsoft never met a set of documentation they thought wasn't obscure enough, so most of what Microsoft has written about how to use COM is horrible. And of course they also have no interest in enabling the use of their technology without purchase of their programming tools, so what you do find will largely assume you're using their MFC foundation classes for C++. That, or Visual Basic. Or now, .NET.
  • COM calls communicate using interfaces. You create an object (or latch onto one running), and that object might be in a different process or even on a different machine. Windows handles all that for you; you just communicate with the interface, which might be a local stub. All data is encapsulated into VARIANTs, which are essentially little typed data blobs that work pretty much like Perl scalars.
  • Interfaces can inherit from one another - with the restriction that only single inheritance is allowed, and new methods are just appended to the list of inherited ones.
  • IUnknown is the basic interface.
  • IDispatch is the interface for OLE Automation [wp]. It's the only interface supported by Perl's Win32::OLE. Therein lies our problem. IE is no longer built to be driven through IDispatch. Why? I'll tell you why: because Visual Basic now knows how to use non-IDispatch interfaces. Simple as that. Microsoft restricted their development to IDispatch only to give VB a chance to catch up. That catching up is .NET.
  • typelib is used to define the interfaces; in the absence of a typelib you just need documentation. If you don't have documentation, you can't use the interface; I believe this is by design, as Microsoft's business model relies on the ability to provide secrecy. IDispatch provides some reflection tools, but they're weak. And of course they only work if IDispatch is implemented. Win32::OLE::TypeLib provides some interface to that (see?) but it is undocumented. Urggh.
  • An interface inherited from IDispatch is called dispatched; an interface inherited from IUnknown but not IDispatch is called custom. Win32::OLE explicitly does not cover custom interfaces; since the advent of .NET, though, Microsoft coding has increasingly wandered over there, because custom interfaces provide a reasonable namespace convention and also provide an easy upgrade path.
So if I want to rewrite Win32::IE::Mechanize to work with IE versions greater than about 7, I need to write Win32::COM, essentially from scratch.

Which is a fantastic opportunity! OK, it's not what I wanted to do in order to automate IE this year, but still - it's actually not that horrible. COM isn't as opaque as everybody makes it out to be, and I've been working towards this general thing anyway for years.
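
To convince myself the interface layering in that list really is that simple, here's a toy model in Python. It's only a sketch: real COM vtables are arrays of C function pointers, not tuples of names. The IUnknown and IDispatch method names are the real ones; `IEXAMPLE_CUSTOM`, `IEXAMPLE_DISPATCHED` and their `get_title`/`put_title` methods are made up for illustration.

```python
IUNKNOWN = ("QueryInterface", "AddRef", "Release")


def derive(base, *new_methods):
    """Single inheritance: the child's method table is the parent's table
    with the new methods appended, so inherited slots keep their positions."""
    return base + tuple(new_methods)


IDISPATCH = derive(IUNKNOWN, "GetTypeInfoCount", "GetTypeInfo",
                   "GetIDsOfNames", "Invoke")
IEXAMPLE_CUSTOM = derive(IUNKNOWN, "get_title", "put_title")       # hypothetical
IEXAMPLE_DISPATCHED = derive(IDISPATCH, "get_title", "put_title")  # hypothetical


def kind(vtable):
    """'dispatched' if the table starts with IDispatch's slots,
    'custom' if it only starts with IUnknown's."""
    if vtable[:len(IDISPATCH)] == IDISPATCH:
        return "dispatched"
    if vtable[:len(IUNKNOWN)] == IUNKNOWN:
        return "custom"
    raise ValueError("not a COM interface: no IUnknown slots")


print(kind(IEXAMPLE_DISPATCHED))            # dispatched
print(kind(IEXAMPLE_CUSTOM))                # custom
print(IEXAMPLE_CUSTOM.index("get_title"))   # 3: first slot after IUnknown's three
```

A caller that only speaks IDispatch can reach `get_title` on the dispatched interface by name, via GetIDsOfNames and Invoke. On the custom one there's nothing to ask: you have to know in advance that the method lives in slot 3. That's the gap Win32::OLE can't cross.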

The basic guts of COM can actually be supported with very little code: I've run across a fantastic OLE tutorial written by Bartosz Milewski, who wrote and markets a neat code sharing product that works either with peer-to-peer or email connections. That's rad! Anyway, in the distant past, Milewski himself was at Microsoft, heading a product team, and had some good suggestions about how to organize COM - which Microsoft ignored. So now he has a tutorial that addresses OLE as it actually makes sense. And yeah, that's going into Perl now. Another take on how the whole thing works is here, by Chris Oakley. And there's a neat little library for C/C++ that takes a lot of the sting out of things, DispHelper.

(The rest of Milewski's tutorial is pretty salient, too, in terms of working with Windows, and honestly it should probably all end up in Perl. With a tutorial. And everything else on the Reliable Software site is equally fascinating. It's all good.)

Anyway, here's a little snippet of code [from here] showing how interfaces are explicitly referred to in VB:
'Create a html document class
Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
'Get all elements present in the document
Dim allElements As IHTMLElementCollection = htmlDocument.all

See that? IHTMLDocument2 isn't a type of object - it's an explicit specification of the interface to be used for the HTMLDocumentClass created. Same with the IHTMLElementCollection. Win32::OLE doesn't allow us to specify these interfaces and couldn't call them if we did.

Ideally, a new implementation of COM under Perl should permit not only consumption of COM interfaces, but provision as well. To a certain extent this is already solved in the event handling in Win32::OLE (which presents IDispatch to the outside world), but it's a weak solution. We can do better, if I can figure out how the whole process thing works. Then we can even register Perl to provide automation objects we can call from Word, for example. Wouldn't that be cool? Yes. Yes, it would be cool.

To make that happen the way I want to make it happen, I want to implement something like a declarative typelib provider. There's such a tool as a typelib viewer [here's the overview at AutoHotKey of the Microsoft OLEVIEW.exe tool]; not only does Microsoft provide that, but ActiveState Perl does as well. I'd much rather have one available on CPAN. That tool will be a command-line thing that outputs a report, and the report will be usable by the COM tools, both to provide an interface and to call it - and to document it.  I've been moving in that direction anyway for Office. This is the only real way to manage this stuff. This part is essentially database work.

Once that interface definition is there, we could also conceivably use it not only to do Perl things with it, but to generate code in other languages as well, for example to generate interface boilerplate in C++ a la this tool on Sourceforge or to build an XS framework for the same thing in a Perl module.

So we've got several things to provide, here:
  • Win32::COM module that wraps up just the essentials of COM.
  • Adapt or steal the other stuff that Win32::OLE provides, like typelibs, and for God's sake, let's document their use as well. Maybe we can just use them directly, I don't know. Whatever can be used instead of replaced, should obviously not be replaced.
  • Better discovery and reflection in general of COM interfaces.
  • Finally, a set of declarative tools for working with COM interfaces in Perl and possibly in other languages.
After the very first step, we can probably already get back to IEMech. Sheesh. Every time you think you're done pushing down the task stack, it turns out there's more cruft down there. I may have mentioned that I used to own a 140-year-old house. Perl is like that.

Hey, incidentally, Win32::OLE would be a great target for code understanding.

Saturday, November 16, 2013

The future of programming

An interesting way to present background notes for a slideshow, and also an interesting historical look at the 70's in computer science.

Mandelbrot set in SQL

Betcha didn't see that coming.

Intro to probabilistic programming and Bayes

What it says on the tin.

Summary of computer science

Nice summary of topics in computer science, with links to textbooks and a few topic lists.

Save the cat beat sheet

The concept of a beat sheet is kind of neat. It's a story structure, a kind of template for your plot (or movie script).

Huxley for graphical regression testing

This is a good idea (out of Facebook) - take snapshots while testing and compare them at the pixel level for regression.
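
The comparison step itself is tiny. Here's a minimal sketch in Python - not Huxley's code, just the idea - assuming a screenshot has already been decoded into a list of rows of (r, g, b) tuples. The function name and the two-by-two "screenshots" are made up for illustration.

```python
def changed_pixels(before, after):
    """Return (x, y) coordinates where two same-sized screenshots differ.
    Each screenshot is a list of rows; each row is a list of (r, g, b) tuples."""
    if len(before) != len(after) or any(
            len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("screenshots have different dimensions")
    return [(x, y)
            for y, (row_a, row_b) in enumerate(zip(before, after))
            for x, (pa, pb) in enumerate(zip(row_a, row_b))
            if pa != pb]


WHITE, RED = (255, 255, 255), (255, 0, 0)
baseline = [[WHITE, WHITE],
            [WHITE, WHITE]]
current = [[WHITE, RED],
           [WHITE, WHITE]]

print(changed_pixels(baseline, baseline))  # []
print(changed_pixels(baseline, current))   # [(1, 0)]
```

An empty list means the test passes; anything else is a regression (or an intentional change, in which case the baseline gets re-recorded).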

Anatomy of a buggy random-number generator

Post-mortems are always neat.

Tools for treason

Here's a good point: if security tools aren't strong enough for crime, they aren't strong enough.

Prairie dog language

So ... turns out prairie dogs might be rather loquacious.

How to build Skype

Zero to Skype in nine months.

GPG tutorial

GPG tutorial for those of us who still can't quite figure out that whole GPG thing.

One-class support vector machines

An introduction.

Monads made difficult

I still really haven't got a clue what a monad is. This didn't help.

Code kata project list

Here's a neat list of 125 project ideas for programming practice.


Email data extraction as a service.

Political rhetoric generator

I love this stuff.

Handbook of Applied Cryptography

Another book!

IndieWeb for ... social network publishing business pattern ...

I dunno. Seems interesting though.

Shit for making websites

What it says on the tin.

Game Programming Patterns

Another book.


Hancock is a language developed at AT&T in the late 90's to do searching of incoming data on a bulk basis, essentially to implement metadata surveillance. Wired (2007) and HNN commentary.

Top free-to-play monetization tricks

All about coercive monetization. tl;dr: premium currencies, hide a money game inside your skill game, reward removal, progress gates, soft and hard boosts, and ante games.

Reservoir sampling

A cute little post about interviewing for work in data science, especially talking about reservoir sampling.
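
For my own reference, here's the textbook version (usually called Algorithm R) as a Python sketch. It picks k items uniformly at random from a stream of unknown length in one pass; the function and parameter names are mine, not from the post.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown
    length, using O(k) memory. Returns everything if the stream has fewer
    than k items."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number i+1 gets kept with probability k / (i + 1):
            # randint is inclusive at both ends, so j is uniform over i+1 values.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(iter(range(1_000_000)), 5)
print(sample)  # five numbers from the stream; different on every run
```

The point for the interview question is that you never need to know the length of the stream, and you never hold more than k items.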

July bookmarks: let's start with security

Along about July I thought I'd try to get a handle on the many bookmarks I was accumulating without blogging them, so I ended up with a bunch of categorized bookmarks that still never got posted.  Oy. And I should warn you, as I'm getting caught up with the summer's bookmarks I've been accumulating lots and lots more on various topics up here in November. Fortunately they group a little better, so they'll probably fit into bulk posts better.

To get July started off well, let's post a big bunch of security-related stuff that caught my eye in the first half of the year:

Sunday, November 10, 2013

Marelle: sysadmin in Prolog

Marelle allows you to declare packages and dependencies, then derive (and execute) the steps needed to install a given package. Very interesting.
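
Marelle does this in Prolog, where the derivation falls out of the declarations. Just to pin down the idea, here's a rough Python sketch of "declare dependencies, derive the install steps"; the function name and the package graph are hypothetical, and this says nothing about how Marelle is actually implemented.

```python
def install_plan(target, depends, installed=()):
    """Return the packages to install for `target`, dependencies first.
    `depends` maps a package name to the list of packages it needs."""
    plan, done, in_progress = [], set(installed), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"dependency cycle through {pkg!r}")
        in_progress.add(pkg)
        for dep in depends.get(pkg, ()):
            visit(dep)
        in_progress.discard(pkg)
        done.add(pkg)
        plan.append(pkg)

    visit(target)
    return plan


DEPENDS = {  # hypothetical package graph
    "webapp": ["python", "postgres"],
    "python": ["build-tools"],
    "postgres": ["build-tools"],
}

print(install_plan("webapp", DEPENDS))
# ['build-tools', 'python', 'postgres', 'webapp']
print(install_plan("webapp", DEPENDS, installed=["python"]))
# ['build-tools', 'postgres', 'webapp']
```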

Frog: frozen blog

Frog is another static blogging framework, this time based on Racket.


A new language!  Pyret is a dynamic language with optional typing, first-class testing apparently, and other neat features that are intended to make it a good teaching language.

Sunday, November 3, 2013

Automating IE with jQuery

This is something I ran across in my shotgun research for IE automation. Neat approach!

Automating IE

Here's the thing. I'm a Windows user - the demands of my main industry require it, and Windows was what there was, when I was getting started in the whole computer thing. And as much as we hates IE, precious, and as much as we likes our Chrome, the fact remains that Windows automation using COM/OLE is a pretty slick setup for integrating GUI stuff.

And yeah, the state of the art for browser automation is Selenium, but for the life of me I can't get Selenium to work with Chrome on my machine, and the documentation is horribly unhelpful in that regard.

The tool of choice has always been Win32::IE::Mechanize, but it has apparently been removed from CPAN - it hadn't been actively developed since 2005, so I suppose that's probably reasonable. Win32::IEAutomation exists but apparently ceased to work with IE7.

I would consider using WWW::Mechanize, obviously, except that for what I want to do, I need the browser visual, and that's not an option with the Mech (which also doesn't support JavaScript, so ....)

So yeah. At the moment, the options are poor. I'm wondering what normal people do. (Probably Selenium, honestly.) I may just get the last active IE::Mech, which is still on CPAN but not indexed; I've asked whether I could adopt it, which might be a mistake.  We'll see.

Update 2013-11-17: Turns out the primary maintainer of Win32::IE::Mechanize is now, ahem, me. Now all I have to do is make it work again, which is sadly no trivial task.

Saturday, November 2, 2013

Code reviews

Occasionally, large, historically relevant pieces of software get released to open source (games, mostly), and code reviews are then done. Prince of Persia is one. Doom 3 is another. Both reviews by Fabien Sanglard. Good stuff. This is kind of where I want to go with the concept of an exegesis.


Reactive programming in JavaScript.

Gmail: consistent rendering of UI in email

This is a weird one; Google has defined a data schema for defining UI in an email, to be rendered in the mail client. Useful for workflow, I suppose, but it seems a bit, I dunno, hyperfocused.


Macropy does real live syntactic macros in Python using some manner of magickal tomfoolery.

Physics in Scheme

So Leonard Susskind has written a course teaching the "theoretical minimum" of classical mechanics, and HNN, as usual, delivers all interesting related topics, including Gerald Sussman's MIT "Structure and Interpretation of Classical Mechanics", or: "Physics in Scheme", essentially, where the models used are expressed in Scheme and are entirely computable, directly. That's cool.


Datomic is a kind of neat database that I haven't really looked into. The upshot is that it tracks changes in data over time; when a fact is added, it remembers when it was added, so you can restore your state of knowledge at a given point in time.

There are requirements, though (e.g. privacy laws), that call for information to be removed outright - so that you forget you ever knew it in the first place - and Datomic calls this excision.
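
To make the two ideas concrete, here's a toy Python sketch: an append-only log of facts stamped with a transaction number, an "as of" query, and excision. It's loosely modeled on the entity/attribute/value/transaction shape and is in no way Datomic's actual API; the class, method names, and sample data are all made up.

```python
class FactLog:
    """Append-only log of (entity, attribute, value, tx) facts."""

    def __init__(self):
        self._facts = []
        self._tx = 0

    def add(self, entity, attribute, value):
        """Record a fact and return the transaction number it was added in."""
        self._tx += 1
        self._facts.append((entity, attribute, value, self._tx))
        return self._tx

    def as_of(self, tx):
        """State of knowledge as of transaction `tx`: the latest value seen
        for each (entity, attribute) up to and including that transaction."""
        state = {}
        for e, a, v, t in self._facts:
            if t <= tx:
                state[(e, a)] = v
        return state

    def excise(self, entity):
        """Forget every fact about `entity`, history included."""
        self._facts = [f for f in self._facts if f[0] != entity]


log = FactLog()
t1 = log.add("alice", "email", "alice@old.example")
t2 = log.add("alice", "email", "alice@new.example")

print(log.as_of(t1))  # {('alice', 'email'): 'alice@old.example'}
print(log.as_of(t2))  # {('alice', 'email'): 'alice@new.example'}

log.excise("alice")
print(log.as_of(t2))  # {}
```

Ordinary updates never destroy anything, which is why the old address is still there at t1. Excision is the one operation that rewrites history.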

Big-O cheat sheet

Algorithm overview in cheat-sheet format. This is the kind of thing that would be useful in spaced rehearsal, actually...

More efficient porn streaming

A very tasty post-mortem about reverse-engineering RTMP to replace Adobe's Flash Media Server. I love this stuff!

Django REST framework

I ... don't know exactly why I bookmarked it, but here's the Django REST framework.


A couple of bookmarks on music: first is a little monkish discussion of music stuff in Perl (and the lack thereof), and the other is just CSound, probably the most common basic tool out there.

Naked WordPress

A bare-bones WordPress theme for people who don't do WordPress themes. Boilerplate!

Or here's how to do boilerplating in Sublime Text. With a link to the WordPress boilerplate project.


Analytics in Ruby. I haven't really turned my attention to analytics, which is essentially "reporting on events".

Dynamo works too hard

Damien Katz writes about scaling.

Software construction isn't construction - it's writing

This guy gets it - programming is language. Conceptually we are describing things when we write software. Not building things.  (Not that, deep down, those are that different.)


Mocky lets you build test responses for REST APIs. Or something. My brain is a little fried today.

Varnish in five acts

A nicely written article about implementing Varnish for scaling.


I think this is a framework for frontend apps in the browser. The idea being that you can essentially serve it up as a static page and do all the backend work from arbitrary REST APIs.

More frameworks!

More frameworks compared! Begging the question of when I'll write my metaframework.

jQuery plugin repository: unheap

Unheap. Nice.

MadLibs signup forms

Another best-practice Web design post. Signup forms work better if presented as blanks in letters. (Comparison study.)

Bayes Rule right in the language!

You've got logic in my programming language design! You've got programming language design in my logic!  Bayes reasoning in Haskell.

Graph database work in Clojure

This is interesting...

How Craig Kerstiens writes SQL

I guess I was on a modeling tear in May (I can hardly even remember May at this point; it's been a busy year).  Anyway, this is a nice little guide about how one PostgreSQL expert thinks about carefully constructed SQL.

GitHub's Linguist

Linguist is what GitHub uses to figure out the (programming) language in a given file.

DIP in the wild

So here's a really well-written article about software engineering - which paradoxically, given that I earned my bread and butter with software development for over a decade and have both a BS and an MS in CompSci, I really know very little about.

But there is a lot of good thought out there (in the Java enterprise world, mostly) about how to engineer this kind of software and model at scale. It's important stuff; out on the enterprise edge people are pushing the boundaries of the number of details a human organization can actually keep track of, and the way they do it is quite instructive in terms of how we think about software.

And this author just plain writes well.

Filed under data modeling because I don't even have a thread for software engineering, it's so far from my normal stomping grounds.

OTOY's JavaScript HD codec

I sure don't understand video very well, but browsers are getting to the point where they can essentially do anything at all.

Survey of C header files across operating systems

Build harnesses for C programs need to check for the presence of header files on any specific operating system - or do they? Coverage is actually pretty standard these days, according to Zack Weinberg, who has a survey. This is useful!

May bookmarks!

Let's blog all of May's bookmarks today!  (April's bookmarks netted me an interview, after all.)


Riot.js is a JS framework that minifies down to about a single kilobyte. And has a nice, clean structure as well.

Friday, November 1, 2013


Ars Technica has the story of Dragos Ruiu, whose Mac seems to have contracted a virus that jumps airgaps by sound. Through the microphone.  Seriously.

Tuesday, October 22, 2013

Command lines in the browser

A lot of the tools I build for my own use are command-line-based (the various perl Shell modules are pretty useful and very quick to put together), so lately, since REST APIs seem to be a really nice way to build new functionality, I'm thinking, how can I do a command line in the browser?

Seems I'm not the first. Blue Sky on Mars posted on this topic in 2011 and even started a Google Group (which unfortunately didn't last too long).  These were more focused on shell replacements, though, which is more than I want or need.

But hey - Josh.js is perfect. And the examples are easy to understand. Add this to a Mojolicious backend and I think I've got a pretty snazzy way to organize functionality.

Sunday, October 20, 2013


Multiple Perls on one system? Perlbrew! I'd never really heard of it until today. Also: Polish URL for the win in the Perl ecosystem!

Sunday, October 6, 2013


While sleuthing around with the Trados thing day before yesterday, since the key for Studio is "LDSRClient" (which brings back no Google hits), I naturally tried to figure out what language-related thing "LDSR" might be, and found the Linked Data Semantic Repository (now renamed as "FactForge" - the "fast track to the center of the Data Web"), which appears to be a collection of various, well, linked data from the Internet, in the form of semantic chunks of some kind.

Anyway, it's interesting stuff.

It's been used to (kind of) respond to something called the Modigliani test, which is essentially, "Given the fact that there is an artist named Modigliani, tell me where all his paintings are located using public data". That's actually impossible right now, but the guys responsible for FactForge tried their hand at it using their repository, and found six of the paintings in public data - which is really impressive! - but as they say, it took a trained expert an hour to assemble the query.

So: the Semantic Web, or the Data Web - interesting stuff.

Saturday, October 5, 2013

Semantics and the Windows Registry

In the paying work, as you know, I am a technical translator these days, and that means I work with CAT tools (computer-aided translation - keeps track of stuff you've already done so you don't have to do it twice). The main CAT tool on the market is TRADOS - a name that dates it, as it sounded modern (like DOS!) when it originated in Germany in the 80's. At any rate, TRADOS has some odd ideas about rent-seeking versus more modern openness, because back then, you had to code things down to the metal without any help from anybody and by golly you felt you deserved some money for that. The upshot is that the TRADOS Freelancer version restricts a freelancer to working with five installed languages.

My problem is that I relatively frequently work with seven: English, German, French, Spanish, Italian, Hungarian, and Portuguese.

The TRADOS solution for changing the languages you currently have installed is easy: uninstall the entire tool, then reinstall from scratch; during the reinstallation process the language wizard asks you for your five languages.

Since the TRADOS install process takes twenty minutes, this means considerable effort when switching the cards you have in your hand (in Yu-Gi-Oh terms).

I've been working with a lot of Portuguese and Italian lately for some reason, so this has really been a bother this week - and finally I decided to hit Google and figure it out. Sure enough, those languages are stored in a Registry key (the translation industry runs on Windows). In the older pre-2009 version of the software, the key is relatively understandable, while in post-2009 versions there is an opaque lengthy binary value that I haven't figured out yet, but the key is this:

You can change your language selections by diddling with the Registry.

Well. In Perl, of course, we use Win32::TieRegistry to work with the Registry; it allows you to treat Registry keys as hashes and do the obvious things to modify them. But if I want to distribute a convenient tool to make this available to the unwashed masses - and I do! - then I should probably write that tool in C/C++ with a bog-standard resource-file GUI, because distribution of Perl is not trivial enough to make it worthwhile for a simple utility.

But the model of the Registry, and therefore the model of actions I take against it, should be shared between those two approaches. At a semantic level, we are talking about the same thing. But without a semantic language, I can't really realistically do this.  And so in a certain sense, what we're talking about here is defining a model of the Registry, and then a model of the actions I want to take against it (in some abstract form, like a template or something).  The model of the Registry maps onto either Win32::TieRegistry and Perl, or it maps onto some library code and template-y stuff and C/C++, and the actions then can be expressed in either language in some as-yet-unexamined way.

That's what I'm talking about when I talk about semantic programming.

Now let's take that as a given, OK? Because I want to address an even more abstract concept along these lines.  At a higher level, the Registry falls into a "value store" bucket that could include e.g. XML in an initialization file or other kinds of configuration files. At that higher level, the choice of a specific form of storage is an architectural decision; in other words, the architecture of a program is in a sense independent of the actual program - in that we could take a given implementation of an algorithm and "translate" it from reading an XML config file into reading the Windows Registry, and everyone would agree that is the same program (in a sense). The problem is that this kind of port is essentially a fork; changes made to the syntactic-level program on one branch are very difficult to translate back into the other branch.

In a semantic programming paradigm, they wouldn't be. You'd be working at the level above the architecture, and simply "compile" the specification into specific code in different languages on different architectures with different choices made in terms of protocols, storage locations, Web server back end, database, JavaScript front end, and so on.

And that's also what I'm talking about when I talk about semantic programming.
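
Here's a rough sketch of the "value store" idea, in Python rather than Perl or C just to keep it short. The action (`set_languages`) is written once against an abstract model, and the storage backend is a separate choice. Everything named here is hypothetical: the `languages` key, the `<entry>` XML layout, and the `DictStore` stand-in for a Registry-like backend are assumptions, not how TRADOS actually stores anything.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class ValueStore(ABC):
    """Abstract 'value store': the program only ever sees get/set."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value): ...


class DictStore(ValueStore):
    """Stand-in for a Registry-like backend (a real one would wrap the Registry)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class XmlStore(ValueStore):
    """Backend that keeps values as <entry key="..."> elements in an XML tree."""

    def __init__(self, xml_text="<config/>"):
        self.root = ET.fromstring(xml_text)

    def _find(self, key):
        for entry in self.root.iter("entry"):
            if entry.get("key") == key:
                return entry
        return None

    def get(self, key):
        entry = self._find(key)
        return None if entry is None else entry.text

    def set(self, key, value):
        entry = self._find(key)
        if entry is None:
            entry = ET.SubElement(self.root, "entry", {"key": key})
        entry.text = value


def set_languages(store, languages, limit=5):
    # The "action" is written once, against the model, not a backend.
    if len(languages) > limit:
        raise ValueError(f"at most {limit} languages allowed")
    store.set("languages", ",".join(languages))


for store in (DictStore(), XmlStore()):
    set_languages(store, ["en", "de", "fr", "pt", "it"])
    print(type(store).__name__, store.get("languages"))
```

This only covers the easy half, of course: one language, two backends. The harder half is generating the same action in a different language altogether.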

Anyway, sometime soon I hope to build these little Registry tools. In general, it would be nice to think about the general set of all Registry value utilities and how they could be addressed by a semantic domain in this manner.

Thursday, October 3, 2013


Holy freaking Toledo, I am in love.

Sunday, September 29, 2013

Diagramming again

I think I may have posted draw.io before, by jgraph as a free demonstrator for their non-free JavaScript diagramming drop-in. The HNN thread, as usual, includes a few alternatives, none of which seem really to measure up. One commenter is using jgraph's code as part of his very cool InsightMaker tool, which uses the diagrammer to build a system, then runs simulations on that system to provide numerical graphs. (Now that is cool.)

Saturday, September 28, 2013

Database reconciliation

So a thing that comes up a lot in working with data is the need for reconciliation, where you have two datasources and you need to match them up and see whether everything in one is in the other, and vice versa. (This is overall a part of the whole data quality issue.) So here I am, today, seeing what's out there - the answer is, unless you're buying SAP, very little (there is a Perl module, kind of, for doing table comparison) - and lo! the process of data reconciliation was patented in 1999 by Qwest; later, JPMorgan Chase inherited it and later sold it off.

Can you believe that?  Can you honestly believe that somebody patented the very idea of finding records in one list and matching them with records in another list?  This is the kind of nonsense up with which we really should not put.
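
For the record, the core operation is about this complicated. A minimal sketch in Python, assuming both sources fit in memory as lists of dicts sharing a unique key field (the `reconcile` function and the sample records are invented for illustration):

```python
def reconcile(left, right, key):
    """Match two lists of dict records on `key`; report gaps and differences."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    only_left = sorted(left_by_key.keys() - right_by_key.keys())
    only_right = sorted(right_by_key.keys() - left_by_key.keys())
    # Present in both sources, but the records disagree.
    differing = sorted(
        k for k in left_by_key.keys() & right_by_key.keys()
        if left_by_key[k] != right_by_key[k]
    )
    return only_left, only_right, differing


ledger = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}, {"id": 3, "amount": 75}]
bank = [{"id": 2, "amount": 250}, {"id": 3, "amount": 80}, {"id": 4, "amount": 10}]
print(reconcile(ledger, bank, "id"))  # ([1], [4], [3])
```

Real reconciliation gets messy when keys don't match exactly (fuzzy names, reformatted dates), which this sketch ignores entirely.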

Some NLTK posts

Finally, the last two things I bookmarked in April (!), two posts from the same guy about using NLTK to do neat things: Build your own summary tool, and an Efficient way to extract sentence topics. Note the comment section of that latter one, with a commenter weighing in with his own Prolog sentence parser. I really, really need to spend some time in this arena.

PHP commandments

Some wisdom about PHP.


I'll just leave this here. Lots and lots of data.

Scaling Django

A slideshow.


A language built on the Erlang VM focusing on metaprogramming, with the following highlights, some of which I like:

  • Everything is an expression.
  • Specific attention paid to metaprogramming and DSLs.
  • Polymorphism via "protocols", whatever that is in this context.
  • First-class documentation of subs, using Markdown. This is neat.
  • Pattern matching.

Problems with Markdown

Here are some problems with Markdown when you get down to serious formatting with it - but the basic idea I still love.

Selections in D3

D3 is cool, and apparently has selections. (You can tell I've spent a lot of time with it...)

Tutorial site for Foundation

Foundation is yet another HTML5/CSS framework. Here is a tutorial site with clever gamification and pretty design.

Shark: ML in C++

Shark is a machine learning library for C++.

Basic quantitative finance reading list

Good stuff.

Adding types to PHP

Hack is kinda cool, a way to gradually add static types and type checking to PHP without disturbing programmer workflow. It appears to run on the HipHop VM and is gaining some traction. That's an interesting evolution.

Friday, September 27, 2013

Maps! Maps!

... map icons, anyway. Kinda neat!

Using D3 to produce SVG

More graphics stuff.

AnyYolk: example HTML5 game

Here's an HTML5 game example written on Backbone and Parse. Interesting architecture.


Astronomical calculations in Python. Very slick.

A comparison of process managers

Hey, finally some halfway modern technology for dealing with Linux sysadmin issues - here's a comparative look at some of the newer process managers available.  Very nice!


Churnalism is a tool for tracking journalistic plagiarism by scanning a database of existing text for phrases found in a given article. It's a pretty straightforward application of the open-source SuperFastMatch [git] text comparison tool, which I should probably investigate in greater detail.
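
The basic idea of phrase matching can be sketched in a few lines of Python using word n-grams ("shingles"). To be clear, this is a naive illustration and not how SuperFastMatch works internally; the function names and the sample sentences are made up.

```python
def shingles(text, n=5):
    """Set of all runs of n consecutive lowercase words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(article, source, n=5):
    """Fraction of the article's n-word phrases that also occur in the source."""
    article_phrases = shingles(article, n)
    if not article_phrases:
        return 0.0
    return len(article_phrases & shingles(source, n)) / len(article_phrases)


press_release = "our new widget is the fastest widget ever built by human hands"
article = "the company said its new widget is the fastest widget ever built"
print(overlap(article, press_release, n=4))
```

A high score means a large share of the article's phrases appear verbatim in the source, which is the signal a tool like this is looking for.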

Excellent happy synthetic eyesight

So back in April (yeah, I'm still chewing through April's bookmarks, can you believe that?) I bookmarked Qbix, a new social platform attempt that ... doesn't seem to have done a lot since April. And I was going to mark it as a kind of interesting point, but got distracted by a comment spam on their blog.

Excellent happy synthetic eyesight with regard to details and can anticipate problems just before they happen.

That's kind of poetic, and Googling it turns up a lot of similar variations. This kind of thing always draws me in, because there's a template at work here that could easily be reverse-engineered, allowing us to classify the link spam and identify specific actors.

I just love that kind of plan. I really ought to do something with it.
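
As a first stab at what that reverse-engineering might look like, here's a sketch in Python that aligns two variants of a spam comment word by word and turns the differences into slots. It uses `difflib.SequenceMatcher` from the standard library; the second variant below is invented for the example, and the `{a|b}` slot notation is just an assumption about what a spam template might look like.

```python
from difflib import SequenceMatcher


def infer_template(variant_a, variant_b):
    """Guess a spam template from two variants: shared words stay, differences become slots."""
    a, b = variant_a.split(), variant_b.split()
    template = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op == "equal":
            template.extend(a[a0:a1])
        else:
            # Words that differ (or appear on one side only) become a slot.
            template.append("{" + " ".join(a[a0:a1]) + "|" + " ".join(b[b0:b1]) + "}")
    return " ".join(template)


seen = "Excellent happy synthetic eyesight with regard to details"
invented = "Superb happy artificial eyesight with regard to details"
print(infer_template(seen, invented))
```

With more than two variants you'd fold each new one into the template, and the set of slot fillers becomes a fingerprint for the spammer's tool.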

Signs you're a good programmer

... and how to cultivate the art.

Code comprehensibility

Here's a paper asking: what makes code hard to understand?

Good question...

Machine learning for link spam

Double-whammy for tickling my fancy: a blow-by-blow account of applying machine learning to detection of link spam.

Usability checklist

This checklist for Website usability is a goldmine of best-practice information!

Code organization for AngularJS

Oh, a code project template post!

Here's the thing. All code organization schemes provide a mapping between filenames and semantics that reveal the semantic structure of a given project. That's pretty interesting; the code organization reflects the programmer's mental model of the project.

Status page hosting

Here's a service that provides status-page hosting for your site.

NLP hacking in Python with Scripted

Here's a neat post.

H2O math for Hadoop

The H2O project provides a math runtime for Hadoop, extending it for big data, statistics, machine learning, all that jazz.

Liquid Helium

Liquid Helium provides a linguistic analysis API that does rule-based decision about various textual markers (formal register, etc.) - interesting.


Datalog is a declarative language for data, a subset of Prolog, apparently. Zef Hemel has a neat little taste-test. The company he just joined, LogicBlox, has developed a new, high-performance commercial implementation, but there are various interesting-looking open-source alternatives.


TokuDB apparently scales MySQL/MariaDB instances by improving indexing?

Conception IDE

Conception is a general IDE for assembling snippets and macros into code (that's probably an inadequate description). It looks pretty slick.

Optimization with the Excel solver

This is a fascinating little article about how to set up and solve optimization problems in Excel.

Structure of a good open-source project

Yeah, yeah. I can't resist structure descriptions.

2014-04-19: Darn. This is clearly not the right link. I wonder what I did intend to link to?

pip and virtualenv in Python

Here's something I always get confused about. Good post.

Datasets released by Google

Google has actually released a lot of interesting ML datasets. Here's a short list.

Getting started with HFT

Quantstart's starter post. HFT is no longer the cash cow it was five years ago, but it's still a fascinating intersection of statistics and big data with real-world things (for certain values of "real world").

Webspam template

Some spammer mistakenly posted the template instead of the Webspam to a comment section - here's the gist!

Thursday, September 26, 2013


Drop-in drawing board widget. This is getting pretty cool lately.

OCR in Perl

Neat little DIY article that hits my sweet spots.

Probabilistic programming languages

Apparently I was on a real language-design tear in April, too - here's a post on probabilistic programming languages, proposing semantic primitives for, well, probabilistic programming. Where do DSLs stop and plain old programming languages start? ... Good question.

I have to say, the BUGS language [that was to the old WinBUGS: here's OpenBUGS, the current project] looks pretty darned interesting - you're really using this to set up a model in a declarative manner, then invoking an engine that writes the "query results" into the original file, looks like.  I really like the cut of that jib.

Then there's Church. Wow. I think this might have Hofstadterian implications, honestly.

Sapir-Whorf on the Lua forums

Sapir-Whorf as applied to computer languages... Hmm...

Open-source quant platform

Now here's something you don't see every day.

Voice-operated queries to ... things

This is really keen - and it's pure Python, making it groovier.

Wednesday, September 25, 2013


A Python IDE written in Python.

Hoodie local-only Web app framework

Hoodie is a Webapp framework for local use only. Neat! Claims to be very fast.

Decompiling, reverse engineering tools

More on code analysis!  Apparently April was the month for it. Note that this is an HNN post, not an article; the article pointed to is not actually all that interesting but the discussion is.


Open-source code analysis and profiling tool.

PHP refactoring browser

And while we're on the topic of code understanding (which I think is kind of prerequisite to code refactoring... or related to it, anyway), here's a magical set of PHP refactoring tools written in PHP that ... you know, help move models around at the syntactic level.

Partially-powered languages

Here's an interesting polemic about, well, everything that's not Haskell and using a declarative data model for its data structures, essentially, but especially where that concerns the Java/Ant/XSLT ecology. There are some interesting comments with corrections, but I get the author's concerns.

Icon-like expression evaluation

OK, so here's a little break from the usual - programming languages differ in part not only due to their syntax, but also in the vernacular they provide for the expression of programming ideas, right? To that end, there are still some semantic atoms out there that aren't in general use (yet). Here's a paper about an evaluation system used in a research language in the 70's that permitted an interesting type of backtracking during evaluation.

Essentially, it builds on the concept of generators (like the ones offered by Python, which can deliver any number of values before they "fail", having run out of values - this is explicitly called succeeding and failing in Icon, while a Python generator simply returns, which the caller sees as a StopIteration, and that's pretty reasonable).

If you chain generator and expression calls together with &, then Icon will try to retrieve a value from the first thing in the chain, then go on to evaluate the rest of the chain. Only if each link in the chain succeeds does the overall expression succeed; a failure at any step causes the evaluator to backtrack to the previous link in the chain. And you can assign "temporary variables" within the chain, whose values revert to the earlier value as you backtrack up through the chain.
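
Here's a small Python sketch of that chained evaluation, built on ordinary generators. It's an approximation of the idea rather than Icon's actual semantics: `conjunction` and `first` are names I made up, and an empty generator stands in for "failure".

```python
def conjunction(*stages):
    """Chain generator stages; a stage that yields nothing 'fails' and we backtrack."""
    def solve(index, bound):
        if index == len(stages):
            yield bound            # every link succeeded
            return
        for value in stages[index](*bound):
            # Try the rest of the chain; if it yields nothing, the loop
            # resumes the previous generator, i.e. we backtrack.
            yield from solve(index + 1, bound + (value,))
    return solve(0, ())


def first(chain):
    return next(chain, None)       # None stands in for Icon's "failure"


# x from 1..9, then y from x..9, then succeed only if x * y == 24.
chain = conjunction(
    lambda: range(1, 10),
    lambda x: range(x, 10),
    lambda x, y: [True] if x * y == 24 else [],
)
print(first(chain))  # (3, 8, True)
```

Because the bound values travel in an immutable tuple, the "temporary variables" revert on their own when the evaluator backs up: each earlier link still holds the tuple it started with.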

This is really a pretty cool notion, but I have to start asking, first: what other "semantic primitives" are permitted by programming languages, and how can they be categorized in terms of ease of comprehension? How far can you go, designing a language, before people just don't get it?

Second: it would be cool to categorize this kind of semantic primitive and see how they move between languages.  If a given algorithm is expressed using such a primitive, how easy is it to "recast" the concepts into other idioms? This kind of thing is also related to the notion - often seen in Python discussions - of "idiomatic" programming, that is, programming that makes use of the community-condoned semantic primitives to achieve elegance and evidence of community membership, of "getting it".

There's a sliding scale of complexity here. Programming languages are, when you get down to it, just another human medium of expression - they're just specialized for the expression of algorithms and procedures. Are they as good as they can be? How easily can software "understand" the same things humans do?

Moose for software analysis

Aside from Moose for Perl object-oriented programming, there is also a Moose for the analysis of software. There's a book as well. Moose appears to be about the model-based facilitation of software engineering, especially in the research arena. It's Swiss, meaning that there is this Franco-German assumption of underlying ontologies I find nearly incomprehensible, but they appear to be doing a lot of things I want to understand as well.

So I should come back to it. Sometime when I can grok what meta-meta-modelling is supposed to be about.

A practical intro to data science

Here's a good, link-rich article for ya.

Data sharing

Caitlin Rivers deplores the current state of the art in data sharing, and offers some tips. I wonder how much could be done with some kind of semantic "data presentation understanding tool".


We want to retire the plain old generic-text diff and replace it with a programming-language aware semantic diff tool.

Sounds good!
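
The simplest possible version of the idea, sketched in Python with the standard `ast` module: compare syntax trees instead of text, so whitespace, comments, and redundant parentheses stop counting as changes. This is only a yes/no check, not the tool linked above, and `same_meaning` is a made-up name.

```python
import ast


def same_meaning(source_a, source_b):
    """True if two Python sources parse to the same syntax tree."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


old = "def area(w, h):\n    return w * h\n"
reformatted = "def area(w,h):\n    # width times height\n    return (w*h)\n"
changed = "def area(w, h):\n    return w + h\n"

print(same_meaning(old, reformatted))  # True: only layout and comments differ
print(same_meaning(old, changed))      # False: the operator changed
```

A real semantic diff would go on to report which nodes changed, but even this much shows why the tree is a better unit of comparison than the line.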

Tuesday, September 24, 2013


Quandl is a search engine for datasets. Cool!

Media queries are a hack

(An aside - yeah, the last few posts are things other people posted in April. April is when I started throwing up my hands and storing links instead of blogging them, so there's a bit of backlog that will hopefully be working its way out into the world over the next couple of months.)

So here is a fascinating little post about responsive design and how it's focusing too much on medium instead of design situations. I like the way this guy thinks. Anyway, worth a read.


Quantopian appears to be a development platform/incubator kind of thing for amateur quants. Interesting stuff there that you could really spend some time grokking.


PSPP is an open source alternative to SPSS, IBM's statistical analysis package.

Thomas Friedman op-ed generator

I know, I know, it's just template filling - but I'm a sucker for these things. I love'em. [inspiration]

Raven Software open-sources code for Star Wars games

This is always cool stuff: a couple of games got open-sourced after Disney's acquisition of Lucas.

I'm posting this under "open source target", but my understanding of what that means seems to have drifted a little. Originally I considered open source targets to be interesting things that could be done for programming using declarative styles and semantic programming. Now I find myself also including things that could be used as existing code for the purpose of exegesis and code understanding.

This is kinda both.

Note 2013-10-10: I just now noticed I didn't link to the post in question, but it doesn't matter. Raven apparently undecided to release, and all trace of their code is gone from SourceForge. That irks me, but there doesn't seem to be anything I can do about it.

Hosting options

There have been a few new hosting options lately - it's really getting very cheap to host a server. Case in point: Digital Ocean, which provides $5 root-access IP addresses. Not much storage, granted (20 gig), but the servers in question are blazingly fast, SSD drives and multiple cores for a little more money. Outgoing bandwidth is measured in terabytes, and incoming is not metered at all.

You can set up new servers with an API call.

And that's just one such hosting company. I'm going to start tracking the ones I find under this "hosting" tag.  Right now I'm paying $60 a month for a dedicated server that's, what, seven years old and feeling it? That's just money wasted these days.

Another cheap hosting alternative I've seen lately is Uberspace.de, remarkable for being in Germany, which could be quite useful.


Wit purports to be a (voice) NLP API for arbitrary apps. I'm skeptical, but it's still a neat idea.

Email message threading

Jamie Zawinski explains his email threading algorithm here, the one used in Netscape back in the day. I love reading his work.
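
A much-simplified sketch of the core move, in Python: use each message's References header to find its parent and build reply trees. This is not his full algorithm, which also copes with missing messages, broken headers, and subject-based grouping. The dict layout (`id`, `references`) is an assumption for the example, and the sketch doesn't guard against reference loops.

```python
from collections import defaultdict


def thread(messages):
    """Group messages into reply trees using the last ID in `references` as parent."""
    known = {m["id"] for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        refs = m.get("references", [])
        parent = refs[-1] if refs else None
        if parent in known:
            children[parent].append(m["id"])
        else:
            roots.append(m["id"])   # no (known) parent: starts a thread

    def build(message_id):
        return {message_id: [build(child) for child in children[message_id]]}

    return [build(root) for root in roots]


inbox = [
    {"id": "a"},
    {"id": "b", "references": ["a"]},
    {"id": "c", "references": ["a", "b"]},
    {"id": "d", "references": ["zzz"]},   # parent never arrived
]
print(thread(inbox))
```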

Sunday, September 22, 2013

Schema.org scraper

Does what it says on the tin, apparently. Interesting!


http://opencv.org/ is the open-source computer vision library I keep hearing about. There's currently a Kickstarter up for using it to interpret hand drawings of a mobile UI and generate the UI skeleton, which begs the question - couldn't I use it for sketches and concept maps?

I don't see why not!

General SEO tricks for any Website template

Here's a short list of some SEO best practices that seems pretty good.

Calculating rolling cohort retention - with SQL

This kind of trick is great stuff. I don't even know how to categorize it. Well, "data science", of course, but this general kind of algorithmic sleight-of-hand is always attractive.
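
To give a flavor of the genre, here's a self-contained sketch that runs a rolling-retention query against an in-memory SQLite database from Python. It is not the query from the linked post; the `activity` table, its columns, the chosen day offsets, and the sample rows are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (user_id INTEGER, day INTEGER)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 7), (2, 0), (2, 2), (3, 1), (3, 2), (4, 1)],
)

# Rolling retention: a user counts as retained at offset N if they were
# still active N or more days after their first day.
QUERY = """
WITH span AS (
    SELECT user_id, MIN(day) AS first_day, MAX(day) AS last_day
    FROM activity
    GROUP BY user_id
),
offsets(n) AS (VALUES (0), (1), (2), (7))
SELECT span.first_day AS cohort,
       offsets.n      AS days_later,
       SUM(span.last_day - span.first_day >= offsets.n) AS retained,
       COUNT(*)       AS cohort_size
FROM span CROSS JOIN offsets
GROUP BY span.first_day, offsets.n
ORDER BY cohort, days_later
"""

for row in conn.execute(QUERY):
    print(row)
```

The sleight-of-hand is that rolling retention only needs each user's first and last day, so the whole thing collapses to one aggregate plus a cross join against the offsets you care about.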

Dictionary of Algorithms and Data Structures

Semantic gold mine! A long-running personal project at NIST cataloging data structures and the algorithms that use them.


A lightweight library for manipulating SVG in JS.

Data table editor for jQuery

And another nice drop-in component: a data table viewer/editor built on jQuery.

Outline editor Concord

Oh, this is nice - a drop-in outline editor component in open-source JavaScript.

Saturday, September 21, 2013

Skill trees for Webdev work

A new skill tree (cheat sheet) site for Webdev work (bentobox.io [github]) hit HNN the other day, and the hivemind came up with a couple of interesting alternatives: The Odin Project with a self-contained curriculum, and the very cute Dungeons & Developers.

Note that a skill tree is essentially a semantic map of the domain of interest. I'm just sayin.

How to build a MOBI

The SICP book has been essentially open-sourced and there are spinoffs for different formats. The Kindle version is generated using this Github project, so it would be nice to go in and figure out how the content is handled.  (It appears to reside in HTML files, oddly.)

Wednesday, September 11, 2013

Summer hiatus

Due to health issues and travel (and it's always fun when those coincide) I have not really done any programming or thinking about programming for about two or three months now.  So I'm coming back to a lot of my old ongoing efforts with a fresh eye, and today I had a strange epiphany:

I'm thinking of the platform for a given piece of software as ephemeral now.

For instance, one of the things I'm working on is a parser of English in order to automate some of the language-quality work I do professionally. I'd like to implement that on my usual machine, but for performance reasons it would be convenient to offload it onto the Parallella platform since I expect it will really benefit from it.

So I can't really write it in Perl because of platform conflict. OK, I know Perl will probably run fine on the managing processors - but the point here is not whether Perl will or won't work, the point is that I really want to develop the algorithms and then "compile" them to Perl or C or whatever, as needs require.

This is what Java purports to address, by the way.  But I'm seeing a lot of new languages that "compile" to various high-level languages, notably JavaScript and C, and maybe this is a new modality.

Maybe what semantic programming is about, I tell myself yet again, is working out the semantic content of an algorithm, expressing it at that level, then having it run in whatever platform is required - and if that means "compiling" to a given language, then in a sense it's really coding in that language. The semantic structure is expressed in C or in Perl, but at some level it's also expressed as a bunch of semantic units that could also be used to express an explanation of the code in English, or even to derive a domain-specific language for intermediate work, a set of macros or something like that.

In other words, what I'm internalizing is that in a semantic programming paradigm the computer should be doing more of the work of coding, at a level that reflects a knowledge of the underlying purpose of each part of the code. That naturally ties back into code understanding to reverse-engineer this kind of semantic structure given existing syntactic expressions, but it's output that should logically come first.
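
A toy version of the output side, sketched in Python: one small tree of semantic units, rendered as either Perl or C. The node classes and the `emit` function are invented for illustration, and a real system would need types, control flow, and a great deal more.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Assign:
    target: Var
    expr: object


def emit(node, lang):
    """Render one small 'semantic' tree as either Perl or C source text."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        # Same concept, different surface syntax per target language.
        return "$" + node.name if lang == "perl" else node.name
    if isinstance(node, Add):
        return f"({emit(node.left, lang)} + {emit(node.right, lang)})"
    if isinstance(node, Assign):
        return f"{emit(node.target, lang)} = {emit(node.expr, lang)};"
    raise TypeError(f"unknown node: {node!r}")


program = Assign(Var("total"), Add(Var("count"), Num(1)))
print(emit(program, "perl"))  # $total = ($count + 1);
print(emit(program, "c"))     # total = (count + 1);
```

The same tree could just as well feed a third emitter that produces an English explanation of the statement.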

Monday, August 12, 2013

Programming by voice

This is more manipulation of emacs by voice, but still: here. Writing Python by voice with emacs...

Sunday, August 11, 2013

Gameboy emulation

For Pokemon play, one uses an emulator on the PC or other computer (or, you know, you buy an actual Gameboy, but I'm assuming you're more interesting than that).  Mostly that's VBA or VBA-M, although there are others. On Win64, VBA-M is not working for me with Pokemon Emerald, so I'm using a 32-bit VBA, but the source tree for VBA-M is on SourceForge here.

Clearly, part of this is the emulator itself and part is the UI and associated tools, so we've got to tease those threads apart. But I'm most interested in how the ROM itself works (i.e. its file structure and how all that stuff is defined). It appears to be programmed on this virtual machine in a bytecode; how does that work, and how can we pull it apart to build a new ROM or modified one? I'm pretty sure the emulator code itself is going to tell us that, but the documentation is horrible, all read-the-code-Luke with a few cryptic comments for things the authors found tricky or unexpected, I'm assuming bug fixes mostly.

Anyway, the reason Pokemon is suddenly featuring on this heretofore more general blog is that my son has a truly fascinating idea for a programming project involving emulated Pokemon. More here later if it proves feasible to do what he wants to do. But in the meantime we gotta understand and clean up this codebase, so it's code understanding to the forefront!


Clever: randomly generated bottle labels for reading in the bathroom.

Monday, August 5, 2013

The Machine Zone

Bad user-experience patterns: the Machine Zone. This is the late-night high-channel surf, the one-more-pic-on-Facebook that you zone into for hours instead of sleeping. Great for impressions, and if your metric is stickiness, you might think you're giving your users what they want.

Are you?

Filed under UI design with a little trepidation, because "user experience" is really more a matter of an app's purpose than of its implementation.

Express regexes with verbal expressions

Neat JS library for expression of simple regex use cases in non-incomprehensible form. This is an interesting start, but I don't see any way to extend it to more complicated use cases (identified match outputs, replacements, alternates with any nested structure, and so on).

Although for a lot of things, honestly, regexes should be replaced by explicit grammars with named components and a match specification.

Anyway, this is a nice start and deserves contemplation.
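
To make the "named components" idea a little more concrete, here is a minimal sketch in Python rather than JS. It uses only the standard re module; the date pattern and the parse_date helper are made up for illustration and have nothing to do with the library linked above.

```python
import re

# Hypothetical example: a date pattern written with named components.
# re.VERBOSE lets the pattern carry whitespace and comments, so each
# piece of the match is labelled where it is defined.
DATE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return the named components of the first date in text, or None."""
    match = DATE.search(text)
    return match.groupdict() if match else None


print(parse_date("Posted on 2013-08-05 at noon"))
```

It's still a regex underneath, but the match comes back as a dictionary keyed by component name instead of a list of numbered groups, which is at least a step toward a match specification.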

Yegge on Perl

Steve Yegge always makes you think, even when he's doing something so preposterous as challenging the One True Language.

not that he's wrong


Yeah, I have to admit he's right. Perl is a convenient place to start with a lot of problems because as glue for CPAN it really gets you close to a quick solution for nearly anything on Earth. But ultimately I think we're going to find that any one programming language is simply not going to be sufficient for every need. That, after all, is the entire point of this very blog. So even though I have a kneejerk negative reaction to any criticism of Perl, my language of choice these past ten years or so, he's still right. Perl is an antique.

I just happen to like antiques.

And honestly, seeing the Perl expression of any idea as merely the cave-wall projection of the actual program, I have to say that starting out in Perl isn't such a bad way to start approaching a problem. It's just that I find myself drifting off the rails to language design fairly quickly in the course of any given larger project.

Monday, July 29, 2013


A recent post hit HNN waxing lyrical about the glory days of programming and including APL. The real problem with APL today is that there isn't a whole lot of open-source support for it, but it turns out it's still used (or at least some of its direct successors are still used) extensively for financial modeling at e.g. Morgan Stanley and that kind of company.  It's still a valuable skill, actually, in that industry. And since it's valuable for the masters of the universe, it's mostly supported as $100K-per-CPU site licenses, and not so much as open source.

There are exceptions:
  • J, of course, which I'd seen before. You think Perl is line noise? J looks worse to my eye.
  • OpenAPL, which only runs under Linux but is interesting nonetheless.
  • Kona is an open-source re-implementation of K, Arthur Whitney's APL-family language (the name invites reading it as a successor to J, though it is a separate design). It's still line noise, but the Wiki is nice.
  • NARS2000 is an experimental successor to APL written and maintained by one of APL's authors, and runs only under Windows (or Wine).
APL is neat. For quick, concise definition of linear algebra-type problems, it still can't be beat. As with many DSLs, I think it's a mistake to try to build all the filesystem and module stuff into it directly (except insofar as it can be used to memory-map files for extremely rapid access to large data), and I certainly think it should be embedded into some kind of semantically oriented declaration and documentation system, but the fact remains that it is neat.


Exegesis is the practice (originally in religious study) of annotating a text with its historical or other background - currently made popular by RapGenius, a tool for composing exegeses for rap lyrics and, incidentally, other texts.

It's interesting because it treats the text as an artifact that exists in a colossal web of associations and meaning, which can at least in part be explained, commented on, and clarified.  In other words, exegesis is a way of reifying the semantic web behind a syntactic object, making it explicit.

Comments in code are a form of exegesis; textual proximity stands in for explicit linking, though, and the semantic background is in natural language and thus inaccessible to the computer (not the compiler, you understand, which only cares about the syntactic object - but any automated tools we might want to use, and any deep documentation system).

So a rich code environment (like my footnote-and-section scheme) could possibly also include some kind of explicit codification of the semantics of the code. I'm still trying to work out just what it means, but there you go.

I'd like some kind of exegesis tool for my efforts to understand propaganda anyway, so maybe it would be appropriate to build one that could be used for general purposes.

Wednesday, July 24, 2013

180 sites in 180 days

This is a fun challenge by Jennifer Dewalt - she's on day 114 as I write this.

LLVM is better than assembly

Oooh - this is a nice post on writing LLVM intermediate representation instead of assembly. It's like portable assembly, really, in a representation that optimizers can work with directly and that the LLVM backends can then lower to native assembly for whatever metal you're on.

This kind of thing makes me want to take a sabbatical.

Friday, July 19, 2013

Cello - higher-level C

Cello is a different approach to high-level C programming.

Malware stored in EXIF headers of JPEG files

This is an unexpected way of providing the payload for a malware attack...

Unix command line for data science

Here's a useful post.

Regex crossword and constraint programming

So there's this neat little regex learning site at Regex Crossword - it's fun. I worked straight through it and enjoyed the whole thing. (HNN quote that made me laugh: "I tried to solve these with regular expressions. Now I have two crosswords.")

This led to the realization that constraint programming is actually a fun type of puzzle - and that automating it is something that was once considered AI, but is now no-true-Scotsmanned out of the domain. One example of a solution to this kind of problem using Haskell's Regex.Genex package is here. Cool stuff!
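
As a rough sketch of what automating this looks like (in Python, not the Haskell approach linked above): treat every row and column clue as a constraint, enumerate the strings that satisfy each row, and keep the combinations whose columns also match. The solve function and the small set of clues are illustrative assumptions, and brute force like this only works for tiny grids.

```python
import re
from itertools import product
from string import ascii_uppercase


def solve(row_clues, col_clues, alphabet=ascii_uppercase):
    """Return every grid (as a list of row strings) satisfying all clues."""
    width = len(col_clues)
    # Every possible row of the right width over the alphabet.
    words = ["".join(p) for p in product(alphabet, repeat=width)]
    # Constraint 1: each row must fully match its own clue.
    row_options = [
        [w for w in words if re.fullmatch(clue, w)] for clue in row_clues
    ]
    solutions = []
    for grid in product(*row_options):
        # Constraint 2: reading down, each column must fully match its clue.
        columns = ("".join(col) for col in zip(*grid))
        if all(re.fullmatch(c, col) for c, col in zip(col_clues, columns)):
            solutions.append(list(grid))
    return solutions


# Illustrative 2x2 puzzle: two row clues, two column clues.
print(solve(["HE|LL|O+", "[PLEASE]+"], ["[^SPEAK]+", "EP|IP|EF"]))
```

A real constraint solver would prune as it goes instead of enumerating whole rows first, but the shape of the problem is the same.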

Monday, July 8, 2013


Some sort of framework for building PHP-based RESTful APIs, apparently. I like this for its semantic flavor.


I'm just fascinated by interactive documents and how they compare to Excel and that sort of interactive dataflow management tool; Tangle.js has been around for a while but it's still incredibly cool. [hnn]

Saturday, June 29, 2013

Genetic cars in HTML5

This is cool. I'm always a sucker for these things.

Lorem ipsum

Google Translate does funny things with lorem ipsum.

Wednesday, June 19, 2013


MetaC seems to be a preprocessor that uses a much nicer syntax to set up boilerplate, instrument functions, etc., while still working in standard C.


Lobster is a game programming language.

Open street maps data

OpenStreetMap.org provides much the same service as Google Maps, but as an open, community-built project with openly licensed map data. Very, very nice! Listed under data science because I don't have a better place for public databases.

Don't check - tell, in Ruby

Here's a neat idiom: instead of checking a return value for nil, hand the method a block (a do construct in Ruby) that acts on the result if there is one and is simply never run if there isn't. Very slick and elegant.
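
A minimal sketch of the same idea transliterated into Python, where a plain callable stands in for the Ruby block. This is my reading of the idiom, not the code from the linked post; USERS and with_user are hypothetical names invented for the example.

```python
# Hypothetical lookup table standing in for whatever might return nothing.
USERS = {"alice": "alice@example.com"}


def with_user(name, action):
    """Run action(email) only if name is known; otherwise do nothing."""
    email = USERS.get(name)
    if email is not None:
        return action(email)
    return None


sent = []
with_user("alice", sent.append)  # known user: the action runs
with_user("bob", sent.append)    # unknown user: the action is skipped
print(sent)
```

The None check still exists, but it lives in exactly one place instead of at every call site.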

Particularly elegant algorithms

Interesting StackExchange topic.

Business process simulator

I'm not 100% sure I'd call this a business process simulator - it's really a queuing theory simulator, maybe. But whatever it is, it's damn cool.

Saturday, June 15, 2013

Responsive design for Webapps

Very nice overview.

Web workers

Browser-local JavaScript threads.

Do people actually use UML?

Survey says ... not really.

Offbrand: useful data structures for C

This is kinda neat. It's a ... library-making system, effectively, that provides generic reference-counted data structures for C that can be instantiated on the fly.

Good sources for security knowledge

So there's an online course coming up on Web security, and HNN weighs in with useful resources for people interested in this stuff.

Using metadata to find Paul Revere

Analysis of membership data to find influential colonial terrorists.

Reverse-engineering the NSA backdoor in Lotus Notes

This is cool.


Vagrant assembles development environments to spec using virtual machines (VirtualBox by default, with VMware among the other supported providers).

Friday, June 14, 2013

Logic programming

It's either overrated or underrated, but it's definitely of interest in Clojure (core.logic) and Scheme (The Reasoned Schemer), and I want to learn more once I get translation automated to the point that I can earn money while working less.

Saturday, June 8, 2013

A possible way to combine the command line with Wx

Something that's bothered me for a while with Perl/Wx programming is that I'd really rather have a terminal that's easy to manage for command input. Since the DOS box is right there, it's irritated me that I can't just use it.

Maybe I can: Perlmonks from 2010. Then combine that with AnyEvent::Run for starting other programs...