Monday, December 31, 2012

Voice recognition

Simon, an open-source voice recognition library/system.

Open source contribution

14 ways.

Also, a way to organize and compare Git repositories, apparently.  Interesting, if I understand correctly.

Clojure by koan

Nice TDD tutorial on Clojure.

Business process diagramming/definition languages

Just some leads:

  • BPMN is an open notation format.
  • Some flowchart examples.
  • Gliffy does BPMN, apparently, as a Web service - probably easier to use it than rolling my own.

New Year resolutions + gamification


Web scraping with node.js

Nice post on Web scraping, using node.js - but the techniques are pretty universal and really worth a read for any platform.

Exit traps for bash

Better bash scripts using exit traps for cleanup and error handling.


NinjaBlocks is a neat-o series of home automation blocks on the cheap.


MessagePack is a data serialization library that packs to a compact binary format.
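To show what "compact binary format" means in practice, here is a hand-rolled Python sketch. It encodes only a tiny subset of MessagePack: nil, booleans, small non-negative integers, short strings, and small arrays and maps. The function name `pack_small` is made up for illustration. For real work you would use an existing MessagePack library, not this.

```python
def pack_small(obj):
    """Encode a small subset of MessagePack by hand (illustration only)."""
    # Check booleans before ints, because True/False are ints in Python.
    if obj is None:
        return b"\xc0"                       # nil
    if obj is False:
        return b"\xc2"                       # false
    if obj is True:
        return b"\xc3"                       # true
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                  # positive fixint: the byte is the value
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) > 31:
            raise ValueError("this sketch only supports fixstr (<= 31 bytes)")
        return bytes([0xA0 | len(data)]) + data      # fixstr: 101xxxxx + bytes
    if isinstance(obj, (list, tuple)):
        if len(obj) > 15:
            raise ValueError("this sketch only supports fixarray (<= 15 items)")
        return bytes([0x90 | len(obj)]) + b"".join(pack_small(x) for x in obj)
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("this sketch only supports fixmap (<= 15 pairs)")
        out = bytes([0x80 | len(obj)])               # fixmap: 1000xxxx
        for key, value in obj.items():
            out += pack_small(key) + pack_small(value)
        return out
    raise TypeError(f"unsupported in this sketch: {type(obj).__name__}")
```

`pack_small({"a": 1})` produces four bytes (`81 a1 61 01`). The JSON text `{"a": 1}` takes eight.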


Here's what I'm slowly realizing about how people use patterns (this iteration of this particular insight is due to an article on business patterns I ran across): "The pattern gets you part of the way to the final objective by starting you off from a tried-and-trusted place. It looks familiar, but it is not a solution in itself—it is an abstraction of the solution that you have to embellish to make it the solution that suits your needs."

Here's the insight, yet again: by using the pattern only to document the solution instead of building that description into the code structure itself, we're losing information at both ends.  First, the pattern's structuring of the code is lost once the code is written, and second, changes to the code are not reflected in the pattern, so a learning opportunity for the pattern base is lost.

The pattern is the semantic structure of the solution.  The solution can almost be seen as the syntactic expression of the pattern in this view.

Food for thought.

Website design best practices

A couple of links with checklist items for site design:

Thursday, December 27, 2012

Data science architecture

Now here's a useful post: an overview of Big Data architecture.

Website copy design

A couple of articles came up recently:


Sikuli is a cute idea: a scripting language (built on Python) for graphical environments that uses optical processing to find bits of screen to automate.

Free data science books

A list of free online data science books, posted by a data scientist.  This is nice.

Cascading for the impatient

I know I mentioned Cascading once before, but there's a handy tutorial blog series now.  I should work through it, especially since there's a focus on TF-IDF, which I find myself needing lately.
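Since TF-IDF came up, here is a minimal Python sketch of one common weighting. Term frequency is the count divided by document length. Inverse document frequency is the log of total documents over documents containing the term. Other variants smooth or log-scale these pieces, and this sketch is not taken from the Cascading tutorial.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each tokenized document by tf * idf."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))          # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

A term that appears in every document, like `"a"` in `[["a", "b"], ["a", "c"]]`, gets a score of zero. The idea is that a term that shows up everywhere tells you nothing about any one document.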


Grunt seems like an interesting JavaScript-based development workflow/build automation system.  This is the kind of task automation I'm spending a lot of time thinking about these days.

Error handling in Node.

Error handling in node.js is just as weird as the rest of node.js.


I've been poking around looking at techniques for database design, because every actual application I go to design these days essentially starts with a database.  (More on this later.)  (Well, and previously, of course.)

But while looking at these things, I ran across the concept of OLAP (online analytical processing), which is essentially based on the concept of keeping a matrix of figures of high dimensionality available for fast ad-hoc querying.  The query language usually used is MDX.  (MultiDimensional eXpressions.)

I don't have much specifically to say about this, just wanted to note that OLAP exists but is apparently entirely dominated by big players.  Which is interesting.  Is it because only the big players can market successfully to the corporate buyer most interested in OLAP, or because OLAP just isn't that interesting to the open-source world?  (I find the latter improbable, so maybe there's a market for low-price OLAP.)
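The core OLAP operation, rolling a multidimensional set of figures up to the dimensions you care about, can be sketched in a few lines of Python. This is an in-memory toy, not how an OLAP server works. Real servers precompute aggregates and are queried through MDX. The fact table and names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, sales).
FACTS = [
    ("north", "widgets", "Q1", 100),
    ("north", "gadgets", "Q1", 40),
    ("south", "widgets", "Q1", 70),
    ("south", "widgets", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *coords, value in facts:
        totals[tuple(coords[i] for i in positions)] += value
    return dict(totals)
```

`roll_up(FACTS, ["region"])` collapses product and quarter away. `roll_up(FACTS, [])` gives the grand total.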

Saturday, December 22, 2012


Mandrill is a neat tool by the folks at MailChimp: it handles incoming and outgoing mail and uses Webhooks for everything.


A boilerplate extractor in Ruby to simplify scraper writing.

Selecting frameworks

Here's an interesting article on a tool to help you select a cross-platform JS framework.

This is an interesting notion.  Alternative frameworks are close to being alternative boilerplate expressions of the same underlying concepts, but different ones can have different strengths.  There are tradeoffs to selecting one over another.

For a maximalist version of this sort of thing, look at the 30+ implementations of the same app at TodoMVC.  That just cries out for a detailed analysis!

Thursday, December 20, 2012

Ember table by Addepar

Very good data table browser!  Exactly what I need for ... everything!


Graphing UI library jsPlumb - very nice.  Too cartoony for my taste, but I'm sure it's all configurable.  The point is all the heavy lifting's already done.

HTML5 Bones

Bare-bones HTML5.

Moonbase: animation builder

This looks cute!

Parse's new data browser

This is cool - I mean, it's basically Access (ha!) but still very cool.

I've been thinking again about UI programming.  I need some kind of higher-level UI specification language that can describe the way the UI acts in a platform-independent way.  Then translators into the different UI platforms.

Probably not so impressive, since half the UI platforms already have some kind of similar notion.  Still. From a semantic standpoint, this seems like where I need to be going.

Wednesday, December 19, 2012


Hey, look: a free huge ontology at!

12 online tools for big data

Big Data, as a buzzword, is starting to leak.  I wonder what will come next?  Good list of tools/startups, though.

Hilarious pitfalls of C++

This is a pretty funny article MST3K-ing the C++ FAQ.

Language concept mindmaps

Now here's an utterly fascinating comparison of the concepts used in CoffeeScript, Ruby, and C++ - I truly wish there were more things like this! Comparative programming linguistics.

Tuesday, December 18, 2012

Why Scala?

It appears to fix problems.

Maybe Perl 6 will be kinda cool

You know I'm not ready for Perl 6 yet, but still...  In Perl 6, you can define the operators needed to make this line work:

die "resistance is futile" if !($resistance ~~ 4.7kΩ ± 5%);

Simple HTML5/jQuery game

A useful exercise.

Put chubby models on a diet with concerns

A Rails post that - seriously - sets up a VX resonance in my brain.  I should probably learn me some Rails.

Berkeley: Big Data with Twitter

A blog post at Berkeley apparently recapping a class just completed.

Blekko crawl data on Common Crawl

Web crawl data.  Also, Common Crawl is cool.

Backbone boilerplate

How to start writing backbone server-side JS.

Blaze: compiling data science in Python

Blaze is the new - compiling - generation of NumPy.  It's fast.  I wonder why people don't just write macros that write C anyway?  (I'm thinking code quality, but seriously - that would be interesting to investigate.)


Interesting constructed language and an unbelievably Gibsonesque story about it.  (Anything featuring the countries formerly known as the Soviet Union ends up feeling Gibsonesque to me, I guess.)

How to learn data science

Version 1.38 - nice list of resources.  Including linear algebra at OCW.

Stanford Core NLP via Perl

Well this is neat!


Open-source data mining suite of some kind. With a service based on it.

Fun with graphics and music

Exhibit one, the Aphex face. Exhibit two, some music I like better, with an oscilloscope.

Combinators presented well

This may well be the most salient article I've ever seen explaining why combinators are a useful tool.  Raganwald is a pretty hoopy frood.
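Here is a small illustration of why combinators are handy. It is a sketch, not taken from Raganwald's article. It defines two higher-order functions in Python. `compose` glues functions together. `tap` is a Kestrel-style (K combinator) helper that runs a side effect and passes the value through unchanged. Both names are illustrative.

```python
from functools import reduce


def compose(*fns):
    """compose(f, g, h)(x) == f(g(h(x))); with no functions, the identity."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns, lambda x: x)


def tap(side_effect):
    """Kestrel flavour: call side_effect(value), then return value unchanged."""
    def wrapped(value):
        side_effect(value)
        return value
    return wrapped


inc = lambda x: x + 1
double = lambda x: x * 2
log = []
pipeline = compose(inc, tap(log.append), double)   # double, log it, then inc
```

`pipeline(5)` returns 11 and leaves `[10]` in `log`. The logging step slots into the pipeline without changing what flows through it.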

Sunday, December 16, 2012

VX technology

Just discovered a whole subreddit about VX tech - man, that brings back memories!  I did a summer internship at Ball State in 1983 and they had an obsolete VX5 in the basement.  I worked through the manuals in my time off - copious - and at one point managed to bring in Radio Moscow on the secondary fibrillation coils (the Danffy eigenvectors had to be accurate within four sigs, though - you had to use a special-order triangular-calibrated sliderule to get anything better than three, back then, but I recently saw an article about a guy who'd successfully simulated that tri-cal stick on a Beowulf cluster).

I had to quit messing with it when the resonance arrays heterodyned with the Dean's fillings, though, and I had to work without pay the rest of the summer to pay for the restorative dental work.

There's a Wiki, too.

Smoke testing CPAN on Windows

As a Perl programmer on Windows, I've often had moments of supreme frustration. I can't use SSH or SFTP from Perl, and I can't use some of the mail handling modules. When I got started back in the early years of the century, I couldn't even use CPAN; I had to compile my own Perl and accept the fact that tests just mostly had no chance of working. Over the past year or two, however, especially as I've come to rely on CPANtesters for my own modules, I've started to realize that one reason for this is the lack of dense testing of modules on Windows. (Yeah, Paul Evans had something to do with that realization, too.)

But then Gábor Szabó posted on Google+ that he hoped to make it to #20 on the Windows smoke tester leaderboard. (As of today, he's at #19 - congrats!)

Wait. There's a leaderboard?

[more at]

Holy schemoley, SQL::Translator exists!

Now I don't have to write it!  Manual here!  Holy Toledo, what a fantastic thing this is!


I'm getting serious - finally - about learning design, precipitated largely by this lovely article on "How to make your site look half-decent in half an hour."  Answer, mostly: use Bootstrap and then mess around with it for half an hour.  (No seriously!  That's great advice!  Plus she's got more specific tips than that, so go read it already.)

Friday, December 14, 2012


Deprecated, but interesting: a Web not-quite-framework in Clojure.

Math in programming

Here is a thoughtful article on the proper role of mathematics in hacking.

Math, done right (I am slowly beginning to understand), is semantics - the semantics of models.  The rest is just detail, and there's a great deal of attention paid to being sure your manipulations of models are correct, but essentially, "doing math" (or doing applied math, anyway) is just the creation of models.

As such, all of our current programming languages suck.  No, seriously!  They do!  This is where a declarative model language would shine - this is actually pretty close to what Haskell is, though, so probably I'm just reflecting my own bias.

I really need to learn me some Haskell.  And math.  I wish I knew where to start.

"Either" in C#

Here's a neat little article about implementing a variant "either" structure in C#.  (In Haskell, Either is a sum type: a value is either a Left holding one type or a Right holding another, conventionally an error or a result, and a tag records which one you have.)
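For comparison, here is a rough Python analogue of `Either`. It is a sketch, not the C# code from the article. Two frozen dataclasses hold the two cases: `Left` for the error case and `Right` for the success case. An `either` function mirrors Haskell's `either` by applying one of two functions depending on which side holds the value. `parse_int` is a hypothetical example function.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

L = TypeVar("L")
R = TypeVar("R")


@dataclass(frozen=True)
class Left(Generic[L]):
    value: L          # conventionally the error case


@dataclass(frozen=True)
class Right(Generic[R]):
    value: R          # conventionally the success case


Either = Union[Left[L], Right[R]]


def either(on_left, on_right, e):
    """Like Haskell's `either`: dispatch on which side holds the value."""
    if isinstance(e, Left):
        return on_left(e.value)
    return on_right(e.value)


def parse_int(text: str) -> "Either[str, int]":
    try:
        return Right(int(text))
    except ValueError:
        return Left(f"not a number: {text!r}")
```

The caller is forced through `either` (or an explicit `isinstance` check) to handle both cases. That is the point of the structure.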

Thursday, December 13, 2012

Fast game-writing competition

Oh, look - another programming competition, this one a series of games! And a specific post listing favored components.


Punch is yet another static content builder, and looks pretty neat.

This post is tagged "Web frameworks, build systems, boilerplate".  It's interesting to see those tags converging.  Very interesting.


The other little insight I had today is that boilerplate is actually the syntactic pole of a mixed unit, in the old Langacker terminology.  Different bits of boilerplate would start to look a lot like language, but boilerplate proper is a whole chunk of "language" that has no direct counterpart in natural languages.  And yet a boilerplate outline (an architecture, perhaps) is definitely a semantic unit deserving of all the semantic love that any word might get.

Chew on that a bit.

New policy here

In the past, I've tended to open interesting tabs from HNN and leave them there until the sheer weight of Chrome bogged down my system to the point that the exhaust from the fan started charring my desk surface.  By the time I roll around to blogging them (which is the point of leaving them open), I usually have fifty to chug through and I'm left with very little of interest to say.  That's not necessarily a bad thing, because linking itself is a perfectly valid goal, but still, it's another pile of tasks that bogs me down to the point where my exhaust chars my vicinity.

That bogging down means that instead of blogging being a joyous way to express myself, it starts to look like, well, work.  Unpaid work, my least favorite kind.

So: new policy.  If it's interesting, blog it now.  If it's not interesting enough to blog before I have three tabs open, then close it.

There is an infinite fire hose of interesting things on the Internet.  If I miss some, nobody cares, not even me.  And this is a way to push the balance back away from consumption and towards production.

Semantic databases

Hmm. Z came up with a "related article" for the last post - not too related, I guess, but still - the inclusion of the word "semantics" still has a high probability of hitting something I find interesting.

The new keyword is "semantic databases", and it's set off a veritable breakdown cascade between my curiosity and the Internet.  In the interest of getting something done tonight, here's the trajectory (although in no particular order):

OK, that's enough for now.  I'm almost caught up with work.

The semantics of code

Some thoughts I've been turning over in my head the past few days involve the semantics of code.  Not the semantics addressed by the coded solution - the semantics of the code itself, which clearly do map onto the semantics addressed by the coded solution.

Here's the thing.  Each section of code is made of meaningful parts in a hierarchical structure. The parts are things like "variables", "loops", and other low-level constructs. By recognizing these (which, yes, are pretty close to the syntactic objects they denote) and grouping them by purpose, a human reader can intuit the intent of the programmer.  At a low level, the intent of the programmer is something like "get data out of this file" or "sort this list".  At a higher level, we work with APIs (which themselves have an internal semantic structure) to form semantic units that are closer to human actions, like "put this record in the database" or "show this box on the screen".

Once the intent of the programmer is understood (whether correctly or not), we can ask questions about that intent.  Does the code actually meet the intent?  If not (the code could be wrong), we have a coding error that should potentially be fixed.  This kind of check is a higher-level version of what static analysis does (static analysis pattern-matches the code against common errors and warns the programmer that certain sections look fishy).
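Here is a minimal Python sketch of that kind of pattern matching. It uses the standard `ast` module to flag comparisons against `None` written with `==` or `!=`. That is a classic "looks fishy" pattern, since `is None` is the idiom. The checker name is hypothetical, and it only inspects the right-hand side of each comparison.

```python
import ast


def find_eq_none(source):
    """Return sorted line numbers where code compares to None with == or !=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        # A Compare node holds parallel lists of operators and right operands.
        for op, right in zip(node.ops, node.comparators):
            if (isinstance(op, (ast.Eq, ast.NotEq))
                    and isinstance(right, ast.Constant)
                    and right.value is None):
                hits.append(node.lineno)
    return sorted(hits)
```

A semantic checker would work at a much higher level, but the shape is the same: recognize a structure, then compare it to what was probably intended.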

Now.  At the highest level of the code, our semantic structures should look a whole lot like those expressed in the requirements and specifications documents.  These are human-readable documents that (hopefully) express the purpose of the code in a way that the programmer has implemented or is supposed to implement at some point in the future.

Here, too, errors can occur, and a semantic code checker that could understand something of English might be able to check whether the semantic structures look similar.  (And of course, something eventually will do just that.)

Moreover, tests could be derived from the semantic structures at the specifications/requirements level that could empirically check whether the code matches.

So that's the code-semantics aspect of my recent thinking.

Open Sauce: unlimited free testing for open-source projects

Interesting.  Interesting.

You know where I'm going with this blog, right?

I want to do textual analysis on both the blog and everything it links, then search for similar things so I can just let the blog run itself, more or less.

Instead of letting HackerNews find things for me, I can do it myself and maybe start feeding HackerNews instead of the other way around.

roots toolbox for web products

roots is a static site compiler written in node.js.

Hidden communities on Reddit

Tracing the non-explicit connections at Reddit, StackExchange, and SomethingAwful.

Wednesday, December 12, 2012

Better analytics

HNN had a pretty interesting thread up about Analytics.js, an open-source library that aggregates the calls to all your analytics frameworks behind one interface.  There are also two open-source JS analytics tools mentioned: and OWA Open Web Analytics, both of which look pretty nice.

Here's the thing: set up your events and analysis carefully and you can use your analytics as a workflow feed.  Metrics and alarms can trigger any kind of action.  That integrates directly into your business processes, and you're halfway cyborged.  That's the goal.

Landing page checklist

Checklist is basically another word for boilerplate (just at a higher level).


Language of the week!  Magpie, a pattern-based language written on a VM (JVM and a higher-powered C++ machine that has fewer features implemented).

Development appears to be ongoing; might be worth looking at.

GYP - build builder

GYP (Generate Your Projects) is a build descriptor system.  Interesting.

Build monitoring

TravisLight is a build monitoring tool for online (continuous integration) builds.

Interesting database designer

Build your database using Excel-like tables. Online.


Whoa.  Open-source SAP replacement - and we all know what a profitable ecosystem SAP is!  This is open-source!

So ... gotta look at this in much more detail.

Update: And here is its slightly more scammy-seeming cousin VIENNA Advantage.  Apparently this one is built by a single company with what sure looks like a remote-control office (I am not free of this guilt myself, of course) who has reportedly opened their source.  If it's open access source, color me unimpressed, natch - and there are some textual weirdnesses on their cluttered front page as well that make me kind of wonder how serious they are.

But they exist.  And as such, they're data.

Update #2: dear Lord, their user interface is in Silverlight.

Design placeholders

HNN thread (lorem pixel)


A full headless browser in JavaScript.  Good for testing and scraping, apparently - worth investigating.

10 reasons for message queues

This is a neat architectural article.

  • Decoupling
  • Redundancy and scaling
  • Elasticity and spike tolerance
  • Resiliency
  • Delivery guarantees
  • Sort order guarantees
  • Buffering
  • Transparency in data flow (at the queue boundaries anyway)
  • Asynchronous communication
Very cool indeed.
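Several items on that list (decoupling, buffering, ordering, asynchronous communication) can be seen in miniature with Python's standard `queue` module. This is only an in-process stand-in for a real message broker. It shows nothing about redundancy or resiliency across machines, and the function name is hypothetical.

```python
import queue
import threading


def run_pipeline(items):
    """Producer and consumer share only a queue; neither calls the other."""
    q = queue.Queue(maxsize=2)   # small buffer: producer blocks if consumer lags
    results = []
    sentinel = object()          # marks the end of the stream

    def producer():
        for item in items:
            q.put(item)
        q.put(sentinel)

    def consumer():
        while True:
            item = q.get()
            if item is sentinel:
                break
            results.append(item.upper())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With one FIFO queue and a single consumer, order is preserved. Adding more consumers would trade that ordering guarantee for throughput.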

I think I may have linked this once before, but the affairs of Zeus are kinda neat.

Amazon random shopper bot


Tuesday, December 11, 2012


CodeMirror is a JavaScript-based online code editor that looks pretty darned snazzy.  I'd like to combine it with other representations of the code - eventually I think this is where I'll be going for code editing.


There's a lot of really cool stuff on CPAN.

Smaller samples -> greater variation

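De Moivre's law (the standard error of a mean is σ/√n, so it shrinks only with the square root of the sample size) - and how people really don't get it. This is really a good read.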
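The formula is σ/√n. Quadruple the sample and the spread of the average only halves. Here is a small Python sketch that computes the formula and checks it against a simulation with a fixed random seed. The helper names are hypothetical.

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard deviation of the mean of n samples with std dev sigma."""
    return sigma / math.sqrt(n)


def spread_of_means(n, trials=2000, seed=1):
    """Empirical std dev of the mean of n standard-normal samples."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0, 1) for _ in range(n))
             for _ in range(trials)]
    return statistics.pstdev(means)
```

`spread_of_means(4)` comes out near 0.5 and `spread_of_means(100)` near 0.1. Small groups (small schools, small counties) therefore dominate both the top and the bottom of rankings.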


Did you know Nginx can be scripted?  I sure didn't.  With Lua!

SVGO SVG optimizer

This is kinda neat.

Sunday, December 9, 2012


A data wrangling tool from Stanford.  Data wrangling itself is actually a topic pretty near and dear to my heart.

Hacking Pokemon from inside and language security

A very cool playthrough of Pokemon Yellow that manipulates the item list to make it play a song [HNN thread] - related to "language security", the practice of limiting the security exposure created by overly Turing-powerful interfaces in unexpected places.

In other news, there is a whole community of people who spend their free time writing bots to play Pokemon for them.  That's pretty cool.

Data model XML

The Data Mining Group has published an XML standard, PMML (Predictive Model Markup Language), for the interchange of data models.

You know, I really need some kind of better semantic model for XML.  For myself, actually, but at the language level as well.  There's got to be a good way to think about XML as it relates to data structures or APIs or something, but somehow when I look at an XML specification I feel dry.

Building stuff with math

Great presentation of the math behind simulations, along with the github source of the presentation - doubly fantastic!

Partly based on Mentions Nature of Code, an online (and now e- and even paper!) book about Processing.js.


When scraping, removal of boilerplate is job #1.  Boilerpipe is a library to do that (one of several, of course).  It provides a sort of "de-boilerplating" step.  (And this is probably a really good way of looking at things.)

On the same topic, a really fantastic overview of web scraping here.

How to use git again

More and more I think a workflow tool for daily use is just about essential.  Some curated git tips and workflows.

The business case for APIs

Good business-level justification for APIs.

Programming language competition

Here's a fun competition series - every month, there will be a goal, and competitors will develop a programming language specifically to meet that goal.

Auto-threading compilers

Here we go: Microsoft has developed a C#-like language whose compiler can decide which parts are amenable to parallel cores.  That's kinda neat!

Update 2012-12-14: A follow-up post by the same guy explaining how it works: by declaring some data structures immutable.  He suggests we read the paper. I didn't fully realize that Microsoft Research publishes papers...  Kind of a no-brainer now that I've considered the question at all.

Friday, December 7, 2012

Coding quality

A couple of interesting posts about coding itself that I ran across:
  • Coding Horror: All Abstractions are Failed Abstractions
    The example being LINQ abstracting away the SQL, and sometimes making bad decisions in doing so.  I think in this case, my response is that an abstraction is useful as far as it goes, but a better programming system would allow you to capture all the useful levels of abstraction when and if needed.
  • Joel on Software: Making Wrong Code Look Wrong
    The original purpose of Hungarian notation.  I am blown away, having learned Hungarian notation in the second wave where it had become useless.
    But the point here is (to me) packing semantics into your syntactic code.  Wouldn't it be better to address the semantics from the start?
  • Programming like a Pirate
    A nice point made about overdoing the extraction technique - the point is for your code to explain what it does, not to have fifteen levels of abstraction everywhere.
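Spolsky's article uses name prefixes such as `us` to mark unsafe strings. One way to "address the semantics from the start" is to push that distinction into types. Here is a Python sketch using `typing.NewType`. `SafeHtml`, `escape`, and `render_comment` are hypothetical names. The type is erased at runtime, so the protection comes from running a type checker such as mypy, which would reject `render_comment("raw text")`.

```python
import html
from typing import NewType

# Text that has already been HTML-escaped; the type carries the "safe" prefix.
SafeHtml = NewType("SafeHtml", str)


def escape(user_input: str) -> SafeHtml:
    """The only sanctioned way to turn raw input into SafeHtml."""
    return SafeHtml(html.escape(user_input))


def render_comment(body: SafeHtml) -> str:
    """Accepts only escaped text, so wrong code now fails type checking."""
    return f"<p>{body}</p>"
```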

Web API design

A white paper on how to build a Web API properly.

SQL injection for fun and profit

An absolutely fantastic article about SQL injection.  Don't treat SQL like a string! You will come to regret it.
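In code, "not treating SQL like a string" mostly means using placeholders. Here is a minimal sketch with Python's built-in `sqlite3` and an in-memory database. The `users` table and `find_user` helper are hypothetical. The `?` placeholder sends the value as data, so it is never spliced into the SQL text.

```python
import sqlite3


def find_user(conn, name):
    # Parameterized query: `name` is bound as a value, never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
```

The classic payload `' OR '1'='1` simply matches no user here. Pasted into a string-built query, it would have matched every row.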

Command and Conquer in JS

This is apparently a complete implementation of the classic RTS game Command & Conquer in HTML5/JavaScript.  Which is really pretty amazing.

Data exploration

Data exploration in Unix.

But a few days later, the O'Reilly "Try R" class at CodeSchool was posted on HNN and prompted a rich outpouring of data science linking (and book recommendations, too).  MIT OpenCourseWare statistics class, and an online book on statistics and R, stand out.  Also the recommendation not to learn R without a simultaneous good grounding in statistics itself, because R is first and foremost a statistics platform.


frothkit is a Web application framework for Objective-C.  And GNUstep is an Objective-C platform for the rest of us.

Monday, December 3, 2012

Inversion of control

Now here is an absolutely fascinating Perl module, Bread::Board, which implements "inversion of control".  It's a way to organize a large software system that permits you to define services, then allow them to be invoked and constructed in the right way without your needing to worry about the detail.

It's a declarative framework, is what it is.  Study the hell out of it.
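The core idea of inversion of control fits in a few lines. Below is a hypothetical Python container. It is not Bread::Board's API, just the same concept. Each service is registered with a factory, and the factory's parameter names say which other services it depends on. `resolve` builds dependencies on demand and caches each instance. There is no cycle detection, so a circular dependency would recurse until Python gives up.

```python
import inspect


class Container:
    """Tiny dependency-injection container (illustrative, not Bread::Board)."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def resolve(self, name):
        if name not in self._instances:
            factory = self._factories[name]
            # Each factory parameter names another service to inject.
            deps = {param: self.resolve(param)
                    for param in inspect.signature(factory).parameters}
            self._instances[name] = factory(**deps)
        return self._instances[name]


c = Container()
c.register("dsn", lambda: "sqlite:///:memory:")
c.register("db", lambda dsn: {"connected_to": dsn})
c.register("app", lambda db: f"app using {db['connected_to']}")
```

The declarative part is that `app` never says how to build `db`. It only names what it needs, and the container works out the construction order.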

Don't repeat yourself

DRY.  I'd seen it bandied about.  It's a good principle - but essentially it's one that you can only implement in LISP or another macro-enabled language.

Python productivity

One of Python's key features is its simple expression of some very cogent mental structures.  Here's a nice short article on that:

  • Dictionary and set comprehensions
  • Counter objects
  • JSON pretty printing
  • Quick and dirty Web service
(Note that I've construed the article a little differently from its author's intent.)
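Here is a quick Python sketch of the first three items, using made-up sample data.

```python
import json
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()

lengths = {w: len(w) for w in words}      # dictionary comprehension
initials = {w[0] for w in words}          # set comprehension
counts = Counter(words)                   # multiset / frequency table

print(counts.most_common(1))
pretty = json.dumps({"b": [1, 2], "a": None}, indent=2, sort_keys=True)
print(pretty)                             # JSON pretty printing
```

For the "quick and dirty Web service" item, Python 3's standard library can serve the current directory with `python -m http.server`.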

The secret of success: suck less

A really nice point by one of the authors of Bugzilla: suck less over time.  This should be incorporated into any software maintenance process.

Understanding Fourier transforms

An intuitive explanation of Fourier transforms, with comments contributing two different animations.  That's the kind of thing I like to see humanity doing!
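The definition behind those animations is short enough to write out directly. Here is a naive discrete Fourier transform in Python. It is O(n²), which is fine for illustration, but real code would use an FFT library. Each output bin k correlates the signal with a complex sinusoid at frequency k.

```python
import cmath
import math


def dft(samples):
    """X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n) for k in 0..n-1."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


n = 8
signal = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]  # 2 cycles per window
magnitudes = [abs(c) for c in dft(signal)]
```

A cosine completing two cycles in an 8-sample window shows up as magnitude 4 in bins 2 and 6. Bin 6 is the mirror-image negative frequency. Every other bin comes out at roughly 0.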

Rx - nascent schema for JSON/YAML


Rich text editing

I'm kind of kicking the notion of rich text source code around in my head, although I'm honestly not sure it's a good idea.

The cheapest, easiest way for me to start doing this would simply be to kick some RTF out of Perl into a notes file and call Word on it.  If I go that route, Perl has some RTF support (RTF::Tokenizer) that will probably do what I need it to do.

Alternatives would be to use a different rich text editor.  Scintilla isn't it - but Wx does include a rich text control.  That's one route.  The other would be to run as a server and pop up a CKEditor window.  I have to admit I really like this notion.  (See also Maplat's DocsTextEditor; Maplat in general looks pretty cool and not entirely unlike what I'd like to do in this arena, in terms of the document management aspects.)

The obvious advantage to using a little server to serve up an app is that it's getting to be a lot easier to write UI as HTML5, and of course you get the remote option then.


Wow!  Sirea is a Haskell framework for Reactive Demand Programming - tl;dr of that being "better Excel".  Kind of where I've wanted to go for years with dataflow programming, long before I got entangled with Decl.

But the Sirea project has immense amounts of ramifying thought behind it, to the extent that it must be made part of the Decl canon.

Saturday, December 1, 2012

Decision trees

So I've been doing various little bite-sized programming tasks (at HackerRank and Rosalind, both of which I heartily recommend) and thinking hard about what I'm doing with them, along with getting back into the literate programming game to do this.

One of the things that came up this week was decision trees.  Here's the thing - complex if-then-else structures come up pretty frequently in even the simplest of tasks, like serialization of data structures into domain-specific formats (game boards, for example).  And that kind of code, to me, is a write-only thing - given that I'm doing these things in little bits here and there and otherwise dealing with paying (non-translation) work and family, my rule is that everything I do has to be instantaneously comprehensible in bite-sized chunks.

So I realized that any if-then-else structure deeper than a single level should really by rights be expressed as a decision tree.  A decision tree is a (to me) more declarative, domain-specific way of expressing some varieties of rules.  Naturally, I thus came up with a simple expression language for decision trees and planned my first literate-programming macro - "decision".  This macro reads the decision tree description and outputs target code expressing it.  I am writing my own preprocessor, one that will work on both C and Perl - as well as any other language you choose.  We'll see whether it pays off, without necessarily taking me quite so far as a full Prolog-like unification engine.  (Although come to think of it, it could be extended to express just such rules...  That would be interesting.)
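Here is a rough sketch of the idea in Python. It does not use the actual "decision" macro syntax, which isn't shown here. The tree is plain data: a (question, {answer: subtree}) pair, with leaves as output values. It can be interpreted directly, or a small generator can turn it into nested if/elif target code. The `TREE` contents (a game-board cell serializer) and all function names are hypothetical.

```python
# Hypothetical tree: serialize one game-board cell to a character.
# Internal nodes are (fact_name, {answer: subtree}); leaves are output strings.
TREE = ("cell", {
    "empty": ".",
    "wall": "#",
    "piece": ("color", {"white": "w", "black": "b"}),
})


def decide(tree, facts):
    """Interpret the tree directly by walking from the root to a leaf."""
    while isinstance(tree, tuple):
        question, branches = tree
        tree = branches[facts[question]]
    return tree


def to_code(tree, indent="    "):
    """Generate nested if/elif Python source lines for the tree."""
    if not isinstance(tree, tuple):
        return [f"return {tree!r}"]
    question, branches = tree
    lines = []
    for i, (answer, subtree) in enumerate(branches.items()):
        keyword = "if" if i == 0 else "elif"
        lines.append(f"{keyword} facts[{question!r}] == {answer!r}:")
        lines += [indent + line for line in to_code(subtree, indent)]
    lines.append(f"raise KeyError({question!r})")   # no branch matched
    return lines


source = "def classify(facts):\n" + "\n".join("    " + line for line in to_code(TREE))
namespace = {}
exec(source, namespace)
classify = namespace["classify"]
```

The same data could feed a generator for C or Perl instead. Only `to_code` would change.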

And so I did another absolutely fascinating random walk through programming land.  Here's what I found:

  • SmartDraw is a dandy target application - it allows you to draw diagrams and charts that are data-accessible, which I find absolutely fascinating.  It includes decision trees as one of its chart types.
  • One can, of course, learn decision trees from data, and there is plenty of literature available about that.  Note on the latter: it references Milk, a machine-learning toolkit in Python.
  • Python Enterprise Application Kit does decision trees.  That looks like a dandy place to mine for ideas.  As does IBM's corresponding thing, which of course is not open source, but probably way more ramified.
  • Finally, Methods and Tools magazine (cool!) has a teeny little article about decision support systems from 2004.  That one reminds me of decision tables, which are essentially just an alternative way of expressing decisions that are somewhat less restricted than trees.  But good stuff!  Again, one of the benefits of decision tables is that they're human-comprehensible and, if well-designed, can easily be used to allow very detailed configuration by non-programmers.
  • As though to illustrate that very point, here is a timely and political Greek bond default decision tree that has been used to illustrate a complex situation in a rather tidy way.
  • And here's a nice article on the use of decision tree learning in gauging party affiliation, which is political but slightly less timely, given it's December.
So that's our random walk for the day.
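Here is the decision-table alternative, sketched in Python under the same caveats: the rules and names are hypothetical. Each row pairs a set of conditions with an action, `None` marks a "don't care" entry, and the first matching row wins. "Don't care" cells are one reason tables can be more compact than trees, which have to spell out every branch.

```python
# Each row: (conditions, action). None in a condition means "don't care".
RULES = [
    ({"member": True, "total_over_100": True}, "free shipping"),
    ({"member": True, "total_over_100": False}, "flat rate"),
    ({"member": False, "total_over_100": None}, "standard rate"),
]


def lookup(rules, facts):
    """Return the action of the first row whose conditions all match."""
    for conditions, action in rules:
        if all(expected is None or facts[key] == expected
               for key, expected in conditions.items()):
            return action
    raise LookupError("no rule matched")
```

Because the table is data, a non-programmer could edit `RULES` without touching `lookup`.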


I always love Processing stuff.

Thursday, November 29, 2012


I just got an email for a survey about open-source frameworks.  (Examples: Spring Framework, Ruby on Rails, Django, CakePHP, Zend, JUCE, etc.)

I guess, to be honest, I hadn't really conceptualized frameworks as an independent category.  Interesting.

Wednesday, November 28, 2012

Kickstarter for an app builder

Build apps in the cloud, apparently.

Ebook publishing process

Good example of a build framework.

Function grapher play-by-play

Neat blog post series about writing a function grapher, including the parser.

JavaScript goodies

I love this stuff.

Tuesday, November 27, 2012

Doom is open source

Open-source target: Doom.


So I looked up LINQ, and it turns out it's just a better interface for query manipulation.  Kind of like Data::Table::Lazy.  But it's got drivers for everything in the world.  So it's worth reading about.

Catch the cat

Here's a cute little Flash game that is both surprisingly solvable and surprisingly challenging.  Very bare-bones - you have a grid of circles and click them to keep a cat from escaping the grid.

It would be a good AI target, actually.  I think this may be a case where it would be interesting to explore simple semantic structures of some kind (grouping areas into some kind of cage element or something). I'm not sure.  But it would be interesting to explore.

Sunday, November 25, 2012

Learning algorithms in Haskell

Hmm, this makes sense for Haskell, I think - really, anything mathematical makes sense for Haskell.  Note to self: really.  Learn you some Haskell.

Also: this.  I think the time has come.

Friday, November 23, 2012

Journal on Data Semantics

So this is a thing.

Random walk through coding country

Man, there is a lot of stuff out on the wild Internet these days.  Here's a brief trajectory of interesting things.

Parsing C

So I'm taking another stab at writing a quasi-literate-programming tool, which, as I am writing things in C with it, requires a credible C parser to find declarations of stuff.

And while Perl has lots of C parsing tools of varying quality, including a sample with Parse::Eyapp (which is quite fascinating in its own right), none of them are easily adapted - with the exception of the Inline tools.  Inline::C::ParseRegExp, for example, does exactly what I want it to - find declarations of stuff.

Python, though, has pycparser.  (And of course, Perl has Inline::Python...)

And then, as always, there is Marpa.  I still have a big fat to-do on my list that says "Learn Marpa".  There's a new set of tutorials on Kegler's blog.  I need to work through those.

Update: I realized I was wrong.  I don't actually need a parser - just a tokenizer for C.  This is because all I need to do is cross-reference all identifiers, and the job is done.
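A minimal sketch of that tokenizer-only idea, in Python rather than the Perl tooling above: a regex tokenizer that skips comments and literals and records the lines on which each non-keyword identifier appears.  It ignores the preprocessor entirely (so `#include` targets and macro names are cross-referenced like any other identifier), and `TOKEN_RE` and `cross_reference` are names made up for this sketch.

```python
import re
from collections import defaultdict

# The 32 C89 keywords; these are not cross-referenced.
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}

TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)     # block and line comments
  | (?P<string>"(?:\\.|[^"\\\n])*")     # string literals
  | (?P<char>'(?:\\.|[^'\\\n])*')       # character literals
  | (?P<ident>[A-Za-z_]\w*)             # identifiers and keywords
  | (?P<number>\d[\w.]*)                # numbers (loosely)
  | (?P<newline>\n)
  | (?P<other>\S)                       # any other single character
""", re.VERBOSE | re.DOTALL)


def cross_reference(source):
    """Map each non-keyword identifier to the sorted line numbers where it occurs."""
    xref = defaultdict(set)
    line = 1
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup == "ident" and m.group() not in C_KEYWORDS:
            xref[m.group()].add(line)
        line += m.group().count("\n")  # comments and literals may span lines
    return {name: sorted(lines) for name, lines in sorted(xref.items())}
```

Because comments and literals are consumed as single tokens, words inside them never show up as identifiers.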

Citation indexing

I can't remember whether I noted this when I saw it first, but: citation networks again.  Oh.  Citation analysis is a tag, so I guess I did.

Comparison of Python Web frameworks by code complexity

I keep meaning to do something meaningful about code complexity analysis, too.

Extremist programming

Here's a good point: programming in extremist languages (everything is X) is a good way to expand your understanding of programming.

Automated proofs and Coq

I ... need more sleep before really understanding what Coq can do, but apparently it's a theorem proving assistant, by which I understand it to be a system that can check your proofs for consistency if you stick to a machine-readable format for describing them.  I guess?

Sunday, November 18, 2012

Code editors again

So somehow I ended up with a bunch of tabs open with code development environment articles.
  • IDEs are important to Java because Java has lousy code arrangement - lots of tiny files with framework-induced nomenclature.  So the IDE is a navigational tool.  That squares with my memories of working with Visual Studio back in the day - vast amounts of boilerplate and Studio was really necessary to find the good stuff.
  • Textadept is a programming editor written mostly in Lua, that also uses Scintilla as the editor component.  This makes it kinda like Padre (no doubt why it has no Perl tools).
  • Zen Coding is ... typing acceleration for HTML and CSS.
I have some pretty decent thoughts about these links, but it's late and the thoughts are rather inchoate.  Short version: every set of code is a text that is written in a formal language in order to express some carefully defined syntactic structures that can be translated into code or actions.  But - and this is again not a new insight - those syntactic structures are a reflection of the deeper semantic structures in the programmer's mind as she comprehends the problem to be solved.

The actual program may or may not solve that problem (hence the need for good testing), but its intent is to do so.  In reading code, we attempt to discover that intent and reconstruct the deep semantics.  What I'd like to do in an IDE and/or editor is to maintain something approximating those semantics in a structure during editing.  As that toolset improves, you could communicate with the editor on a higher level, interacting with the semantics and letting the toolset manipulate the specific code.

Because you don't actually care about the code any more than you care about assembler.  (Unless you care about assembler, but that's a different point.)  You want to solve your problem.

By looking at competitive programming problems and problem statements, I hope I'll be able to have small enough and abstracted enough snippets of semantics that it will be realistic to think about how the semantic comprehension of the problem statement is translated into program structure.

House Republicans gain my respect for twenty hours, blow it

So the House Republicans released an excellent position paper entitled "Three Myths about Copyright Law and Where to Start to Fix It."  Those myths are as follows:
  • The purpose of copyright is to compensate the creator of the content
    (It is actually "to promote the progress of science and the useful arts")
  • Copyright is free market capitalism at work
    (It actually provides a guaranteed, government-instituted, government-subsidized content monopoly)
  • The current copyright regime leads to the greatest innovation and productivity
    This can't be refuted in a sound bite, so read the paper.  It's quite effectively written.
Oh, except you can't read the paper in its original place provided by the United States government, because less than 24 hours after its release, the MPAA and RIAA went ballistic and demanded it be retracted.  And those freedom-loving, Hollywood-support-free Republicans!  They just refused!  Oh, no, wait, they did exactly what their corporate masters told them to do - and issued an apology for having mistakenly been too open about how America could benefit, but won't.

If you actually want to know what actual research by the Republican Party came up with about copyright, you'll need to check out one of the mirrors. [here] or [here]  Because today's American government is not there to provide you with research - it's there to protect money.  From you.

Entire Windows API in JavaScript

A Github gist - I don't actually know how you'd use it to access the Win32 API, but it's valuable as data alone.

Programming competitions

OK, so there is such a thing as programming competitions, which I knew.  [Stanford course]  TopCoder is, in fact, structured as a set of programming competitions that lead to a complete product, which is kind of a neat idea, although exploitative of third-worlders if you ask me.  (I'm still going to poke around it for a while - I need to learn Java anyway.)

Anyway, so here's the thing.  All these take fairly clear statements of a problem, add domain knowledge that is fairly restricted, and output code.

If you actually read this blog - in which case, who are you? - I don't even need to finish.  You're already ahead of me.

Here's an online judge, at Peking U.

Penguin Puzzle

This is cute. Also, it's all JavaScript, which is utterly fascinating.  Makes my laptop sound like it's readying for takeoff.

Saturday, November 17, 2012

Open source job boards

I'm not at a point right now where I can do freelance work in open-source very effectively, but here are two places to look when the time comes:
I should check for German-language ones, too.

Also: Topcoder.  I'd vaguely heard of it.

Learn visualization!

A recommendation to people new to scientific programming: learn visualization tools, especially fast ones.  The post has some nice specific recommendations, including R.

This is getting to be kind of a common thread lately in what I'm reading.  Ooh, Z has come up with something nice here: MetaSee for metagenomic visualization (open source).

Rewriting Reddit

Thoughts by Aaron Swartz on the occasion of the Reddit rewrite from Lisp to Python using his own web.py framework.  The short version is that Python does have a lot of frameworks [see], but that they all suck, essentially.  He has some interesting things to say about Django, for example.

AI Sandbox

The dudes at Guerrilla Games (no, I never heard of them, either) have released AI Sandbox, a neat platform for writing in-game AI, with a contest to write a Capture the Flag captain.

That is the very definition of cool.

I may find the time - somehow - to wedge that into my day this month.

One of the prizes is a free ticket to the Vienna Game/AI conference in 2013, a September event I really wish I'd known about in September!  If we're still here in Budapest next September, I may well go on over and check it out.

Dalton Caldwell on Twitter

Dalton Caldwell always has interesting things to say about the social Internet, and today is no different - he notes that Twitter has appointed MySpace hack Peter Chernin to its Board of Directors and is essentially recapitulating the successful trajectory of MySpace into the social Internet powerhouse that it is today.  Exciting!

They are pivoting from being a service for microblogging - discussion - to being a service for passive media consumption, because that's where the money appears to be.  Facebook is, of course, famously doing the same thing.

So whither the social Internet?  Do network effects inevitably trend towards passive media consumption?  Is that just what humanity really is?

Related articles for this news are all over the map.  BusinessInsider inexplicably thinks it's a good move.  Gigaom gives a little insight into why: Chernin's function is to bring street cred on the only street Twitter really needs, Madison Avenue.  The only problem is that Madison Avenue doesn't understand the Internet at all - Madison Avenue is pretty sure the only thing wrong with the Internet is that it's not more like TV, and the sooner we all understand that, the better off the world will be.

But the plot really thickens when we discover that Chernin was president of News Corp, where he is credited with the ratings success of FOX News.  FOX News: the ratings powerhouse that singlehandedly turned the news media into entertainment, the same news media that Thomas Jefferson correctly identified as crucial to the working of a functional democracy, and whose freedom is enshrined in the Constitution itself.  FOX News, which preserved the form of news without the messy, expensive content.

Is that really where the Internet needs to go?

Update: Ouch.


Another in-depth article from Steve Wittens, about animation, GL, and JavaScript.

The world moves pretty fast

How to stay relevant in it (Web programmer version). tl;dr: polyglot, change, jack-of-all-trades, keep learning, do it all (or at least understand it all).

Update: here's another similar article.


A reinvention of the Unix command line, TermKit wraps commands in a JSON HMI layer, parses commands while you're typing so that the tokens work right without all the quotes and escapes, and does a lot of other neat stuff that we should really be thinking about this century.

I don't particularly love his aesthetic choices, but the overall idea is fantastic.  Unfortunately, he seems to have run aground on development.

I probably ought to look at Haskell

Haskell seems like it's maybe not 100% ready for prime time as a production language through the whole stack [previously], but the Haskell community is looking at some really, really neat stuff.  (Quite aside from factorization diagrams.)

Case in point, and discovered while writing about factorization diagrams: embedded domain-specific languages.  (Like their Workflow module.)

Factorization diagrams

Remember those factorization diagrams in Haskell a couple weeks ago?  These guys animated them, which makes them much, much cooler.

Oh, Zemanta, very nice!  The followup article by the guy who wrote the Haskell article, listing all the zany things the Internet did with them!  Zemanta seems to be pretty liberal about "related articles", and I really don't like how they format related articles if you let them do it for you - but once in a while they find something extremely relevant.

Message-oriented programming

Here's an interesting little insight about program architecture - by breaking things down into little pieces that pass messages back and forth, maybe written in different languages to take advantage of different library availability or even running on different machines, we can concentrate better on doing things right.

Like the Unix philosophy, like the database modeling approach, the more we concentrate on the problem instead of the shape of the code itself, the better off we are.
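A toy, single-process sketch of the shape of that architecture: each stage is a thread that knows nothing except its inbox and outbox queues, so in principle any stage could be swapped for a separate process, a different language, or another machine behind a real message queue.  The `stage` and `run_pipeline` names and the sentinel-based shutdown are my own choices, not part of any message-queue library.

```python
import queue
import threading

STOP = object()  # sentinel message: "no more input, shut down"


def stage(func, inbox, outbox):
    """One component: read messages, transform them, pass them on."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)  # let the next stage shut down too
            return
        outbox.put(func(msg))


def run_pipeline(funcs, messages):
    """Wire the stages together with queues and push messages through them."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for worker in workers:
        worker.start()
    for msg in messages:
        queues[0].put(msg)
    queues[0].put(STOP)

    results = []
    while True:
        msg = queues[-1].get()
        if msg is STOP:
            break
        results.append(msg)
    for worker in workers:
        worker.join()
    return results
```

Each stage only ever sees messages, which is the point: `run_pipeline([str.strip, str.upper], ["  hello ", "world  "])` doesn't care how either function is implemented.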

And then for completely unrelated reasons, I stumbled over message queue systems like 0MQ again.

Wednesday, November 14, 2012


I'm trying out Zemanta, which sits in your blogging editor and analyzes your text on the fly to suggest possible relevant links and images.

It remains to be seen how useful it'll be, but since it does text analysis, I can't help but be interested.  (And they've got an API!)

So far, the only negative is that I have to use the mouse to pull my sidebar down for tags for my posts.  That's not too horrible.


Defining new languages seems to be a popular hobby these days.  "Formal" is an ML dialect.

Patching machine code

Dear Lord is this geeky.  Fantastic article.

Common machine learning mistakes

Common mistakes applying machine learning to financial modeling - but they're common mistakes made in applying machine learning to anything.  Worth reading.

Unicode for dummies

A fantastic explanatory article about Unicode.

Music player in Python

This is neat!

Python for humans

A nice, if overly Pythonic, presentation about how to structure things better for the coder in complex areas like HTTP retrieval.

Here's the thing. Python's standard HTTP retrieval is a horrible mess because it provides options for every eventuality - and in Python, we all know that There's Only One Right Way.

This is ludicrous.  I see where they're coming from, but it holds them back.  There are always a multitude of Right Ways for something as inherently complex as HTTP retrieval.  Sometimes you need those nasty callbacks.  And sometimes - arguably, most of the time - you really don't.  Insisting that all use cases always have to be identical is ridiculous.  We don't do that in the real world or in natural language, so why force it further than it should go in a programming language?

This is the sort of thing that should be documented in some kind of literate macro style, perhaps with wrapper objects.
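Here's a minimal sketch of the layering I mean, using only the standard library's `urllib.request`: one obvious call for the common case, with the full callback machinery still reachable by passing in your own opener.  `fetch` and `DEFAULT_HEADERS` are hypothetical names, and the defaults are assumptions for illustration.

```python
import urllib.request

# Hypothetical defaults for the common case; override per call if needed.
DEFAULT_HEADERS = {"User-Agent": "fetch-sketch/0.1"}


def fetch(url, *, timeout=10, headers=None, opener=None, encoding="utf-8"):
    """Fetch a URL and return its body as text.

    The common case is one call.  For the hard cases (cookies, auth,
    custom redirect handling), build an opener with
    urllib.request.build_opener(...) and pass it in.
    """
    request = urllib.request.Request(url, headers={**DEFAULT_HEADERS, **(headers or {})})
    open_url = opener.open if opener is not None else urllib.request.urlopen
    with open_url(request, timeout=timeout) as response:
        return response.read().decode(encoding)
```

The simple path and the complicated path coexist: nobody who just wants a page has to know openers exist, and nobody who needs handlers has lost them.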


European payment gateway Paymill.

ccv computer vision library

This is just amazingly cool.  ccv is an open-source computer vision library, and the link is a blog post demonstrating tracking of an erratically moving object from frame to frame of a video.  Neat stuff!

Software development and Romney's loss

A couple of good articles about the Romney campaign's snazzy new GOTV tool, Orca, which went down in a blaze of glory, much like the iconic picture of his blimp, but for entirely predictable reasons: insufficient testing, apparently in part because of a misplaced urge to centralization and secrecy, and way insufficient user training, along with some operational foulups (sending out incorrect passwords to users, etc.)  It was a fiasco in every sense of the term.

It was downright corporate.


A new, presumably rethought, database.  Decentralized, and NoSQL.

Monday, November 12, 2012

I give you ...


I've been taking a lot of my intermediate bloviating offline lately, because I wrote myself a little notes system to organize all the many threads of my life (I'll write about that at some point, but I want to let it mature a little).  So forgive me for not having mentioned that, having finished the minimum viable module for Data::Table::Lazy (an iterator-based in-memory table/matrix module), I have started work on Tree::Walker, which will walk directories and other hierarchical structures and output lazy tables, which in turn will be amenable to processing by an action framework as yet undesigned.

Note that what I've been doing in this thread lately is taking a lot of the components I wanted in Decl, and breaking them out into proper CPAN modules.  Decl will be a little syntactic sugar on top by the time I'm done with that effort - which is exactly right.


Yeah, so I started Tree::Walker.  And rather than just implement the first tree kind of thing that sprang to mind, I decided to examine what others have written about trees.  Which just leads to infinite regress, of course, so here's a short list, in rough order of discovery.

  • Tcl has TreeQL, the Tree Query Language.  I honestly don't find it very convincing.
  • Here's a much nicer approach, TQL, standing for the same thing.  TQL has a from and a select clause, with a match pattern in the from that can bind variables, and it does a kind of unification thing. Once nodes have been identified, the select clause determines what to return.  I really like this approach.
  • ANTLR does tree grammars, but here's an article I found deploring the practice.
  • TDL is a Tree Description Language.
  • Our old friend TXL also returns to the field [earlier].  If you recall, TXL combines a BNF grammar, which parses the input into a tree, with a pattern matcher and action specifier that transform the tree into whatever we like.  It's a nice concept that I thought a lot about during Decl 1.0.
  • But right now I wanted to work with walkers, not transducers.  So yeah, there aren't actually all that many options for a traversal algorithm: here's basically the whole list.  It consists of pre-order, post-order, and in-order.  The in-order option really only makes sense with binary trees.
So that's kind of the stuff I read to get my head back into the tree game.  As a result, I have a little clarity:
  • A walker has to specify prefix, postfix, or after-n-fix sorting, or I suppose provide its own traversal function to specify "what to visit next" for a given location (this can be generalized to include a queue, perhaps)
  • For each node visited, we end up with a type, a key (name, etc.), and arbitrary data
  • For each valid type, we can specify a traversal action, a data return action, and an action action.
  • Matching is built on top of traversal.  To match, we are given a multipart key (or a set of them) that matches a vertical series of nodes, and we track the key on the way down the traversal.  If it matches fully, the match's action fires.  These actions can again be actions or data retrieval actions. The driver has to be able to take an arbitrary single-node match pattern and say yea or nay for a given node.
  • The whole thing has to go into an iterator.
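A rough Python sketch of those points (not Tree::Walker itself, which is Perl): a generator-based walker with a pluggable children function, nodes carrying a type, key, and data, and matching of a multipart key against the chain of ancestors.  `Node`, `walk`, `part_matches`, and `match` are hypothetical names; after-n-fix ordering and the per-type action table are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str                       # e.g. "dir" or "file"
    key: str                        # name used for matching
    data: object = None             # arbitrary payload
    children: list = field(default_factory=list)


def walk(node, order="pre", children=lambda n: n.children, chain=()):
    """Lazily yield (chain, node), where chain is the tuple of nodes from the root.

    `children` is the pluggable "what to visit next" function, so the same
    walker could list a directory, a parse tree, or anything else hierarchical.
    """
    chain = chain + (node,)
    if order == "pre":
        yield chain, node
    for child in children(node):
        yield from walk(child, order, children, chain)
    if order == "post":
        yield chain, node


def part_matches(part, node):
    """Single-node test: '*' matches any node, anything else must equal the key."""
    return part == "*" or part == node.key


def match(root, pattern, order="pre"):
    """Yield nodes whose chain of ancestors ends with the parts of `pattern`."""
    size = len(pattern)
    for chain, node in walk(root, order):
        tail = chain[-size:]
        if len(tail) == size and all(part_matches(p, n) for p, n in zip(pattern, tail)):
            yield node
```

Because everything is a generator, matching inherits laziness from traversal, and swapping `part_matches` is where a smarter single-node driver would plug in.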

Friday, November 9, 2012


HackerNews has an API.

Rosalind bioinformatics sequence

Rosalind [hnn] is a really neat sequence of bioinformatics problems and a grader (it generates a data file and checks your output - you program it however you like, but you've got five minutes from getting the data to submit your response).

It's got a forum system, badges, the whole nine yards, and a branding thing if you want to use it in your class.

I'm working through the sequence - but it's a good target application, too.  But yeah, bioinformatics is kind of neat!

Intensive Gaston Unit

The Internet scares me sometimes.

Thursday, November 8, 2012

Forking data

Heroku has a neat new feature: data forks of PostgreSQL databases.  The advantage: you think less in terms of database servers and more in terms of the data.

That's a good insight.

Nice mathematically based graphics

I give you Ogre's gallery.

Gray code and Kohonen maps

A Gray code is a binary encoding of numbers where neighboring numbers differ by only one bit.  I actually independently invented this in the summer of ... must have been 1995, I guess, reasoning backwards from life events.  I implemented it in Visual Basic as an experimental memory based on Kohonen maps, then lost the code.  Every now and then I try to reconstruct it, but get lost in the Gray code concept - and today I learned it has a name!
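For the record, here's the standard binary-reflected Gray code and its inverse, sketched in Python.  There are other Gray codes; this is just the most common one.

```python
def to_gray(n):
    """Binary-reflected Gray code: consecutive integers differ in exactly one bit."""
    return n ^ (n >> 1)


def from_gray(g):
    """Invert to_gray by XOR-ing together every right shift of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

For 0 through 7 that gives 0, 1, 3, 2, 6, 7, 5, 4, or 000, 001, 011, 010, 110, 111, 101, 100 in binary: each step flips exactly one bit.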

Ah, the Internet.  Life is so much better with it.

I really need to reconstruct that research thread.  It was a good one.  The idea was to use a Kohonen map as the index for a semantic space, with semantic units encoded as vectors of keys into the map.  Thus each key self-organizes into a self-describing semantic unit that can be expanded into its components in an organic way in working memory.  Or something.  I really need to reconstruct that.

Wednesday, November 7, 2012

The Goal is to be Like a Bad Hacker Movie

This post by James Hague is what took me to his site this week.  The idea is quick turnaround, like in hacker movies.  And he's right.  Instant-on programming would be a fantastic thing to have at your fingertips, and sometimes our current tools nearly allow us to achieve it.

But data analysis with quick plots, yeah, that too.

James Hague on Visual Programming

Interesting post.  His blog is full of good stuff, though.  Anyway, the idea: graphical presentation of program structure.

A tcpdump primer

This is neat - tcpdump is a kind of scary tool but essential for networking work.


BonsaiJS is a very neat SVG renderer that runs in the browser in JavaScript.

That, my friends, makes Toonbots a whole new ballgame.  I need to revisit the Toon-o-Matic.  Seriously.


Relational programming and constraint logic programming for Clojure.  Good links to follow, should I ever get the time to look at this stuff again.

Data science in HFT

Oooh, financial AI, always an interesting topic!  And a great article, too.

Rx: reactive extensions to .NET and JS

So Rx is a reactive extension library that brings reactive programming to the .NET world, built on top of LINQ, which I'd also like to look at - and Microsoft just open sourced it.

It seems to me that reactive extensions are a natural outgrowth of lazy lists (plus an event subscription model for new additions to the list) and I could profitably bring them to Perl based on Data::Table::Lazy.  Just a thought.
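To make the thought concrete, here's a tiny push-based stream, sketched in Python rather than Perl: a list you can't index, only subscribe to, where `map` and `filter` return new streams just as they would on a lazy list.  This illustrates the idea only; it is not Rx's actual API (no completion or error signals, no unsubscribe), and `EventStream` is a made-up name.

```python
class EventStream:
    """A push-based list: values added later are delivered to subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)
        return callback

    def push(self, value):
        for callback in list(self._subscribers):
            callback(value)

    def map(self, func):
        """New stream carrying func(value) for every value pushed here."""
        out = EventStream()
        self.subscribe(lambda v: out.push(func(v)))
        return out

    def filter(self, pred):
        """New stream carrying only the values for which pred is true."""
        out = EventStream()
        self.subscribe(lambda v: out.push(v) if pred(v) else None)
        return out
```

Usage: `numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * 10).subscribe(print)` prints 0, 20, 40 as 0 through 4 are pushed into `numbers`.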

I feel as though I'm recapitulating a lot of the pieces of Decl, only in standalone form.  This feels like a good thing - in the end, Decl will be just syntactic sugar on a lot of powerful tools, which is where it should have been in the first place (I just wasn't finding the tools I wanted).

Data science in the Obama campaign

Time has an interesting article on the Obama campaign's use of data analysis to target their ad dollars (including his appearance on Reddit, of course, which cost him nothing but time).  What I find fascinating is this general trend towards science - data analysis in the campaign, poll analysis by Nate Silver - that is proving itself in the public mind in a much more direct way than the old model of personal genius doing the incomprehensible.

Strange days indeed.

Monday, November 5, 2012

Target application: Bonitasoft

It says "open source" on the site's title - but I see no indication that it actually is.

Anyway, my throat still clenches when I see this stuff.  I still want to finish the wftk, twelve years later.

Interesting big data/textual analysis blog

Another Word For It.  The author is interested in topic maps, a standard way of describing the subjects in a large document collection and the relationships among them.

My problem: I have never developed the habit or a toolset for consuming news.  If something bubbles up on HNN, that's historically been enough for me (before HNN, it was just whether I encountered something in other online communities; HNN is a lot more efficient in finding things I like, though).

I need to do something about that.

Update 2012-11-06: And then I find CPAN module "TM" (topic maps) today - freaky!

Sunday, November 4, 2012

Datev Mittelstand Compact Pro

Target application.  If Datev can sell this to Germans, there's a really big market for small-business accounting software.

Saturday, November 3, 2012


Target application - QuestionsThree [blog] [hnn] is a nifty gadget that gives you a local phone number that people can call to buzz you into the gate, then texts you that they've done it.  Very neat!

Slick little idea; now the builder is wondering 2. ??? 3. Profit, as one does.

Tower Defense generator

There's not even a Website for this project - I followed links through the wild vastness of the Net to find it.  But - cool!  Generate Tower Defense games using a vocabulary and analytic framework!  I love that kind of stuff.

How to build a Web app from scratch

Boilerplate! [github]

Base components: CoffeeScript, jQuery, Underscore, Backbone, Handlebars (template engine), Less, WordPress API for content entry.

Nice overview!

A Django REST framework

APIs and Web frameworks: a Django REST framework.

Generative graphics in JS

Amusing little JS generative landscape; lacks happy clouds.

Django drip

Django drip, a small but real-life open-source Django application for managing email drip campaigns, is a good one to read through and understand.

Hacking language learning

Another interesting way to hack the language learning process.  HNN seems interested in these lately...

UI gallery


(A note: the "boilerplate" tag isn't just for boilerplate, but also the bits you plug into the boilerplate, I guess.)

Mining massive time series

Some very interesting aspects of time series similarity.

Infinite Gangnam Style

  1. Analyze Gangnam Style to split it into beats
  2. For each beat, see the other most-similar beats that follow it
  3. Make a big transition network
  4. Animate frames from the video to match your beats
  5. Put it all on the Web as Infinite Gangnam Style
  6. ???
  7. Profit!
Very, very cool.
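A toy sketch of steps 2 and 3 plus the playback walk, assuming each beat has already been reduced to a feature vector (step 1, the audio analysis, is the hard part and isn't shown).  `build_jumps`, `play`, the Euclidean distance threshold, and the 25% jump probability are my own illustrative choices, not how Infinite Gangnam Style is actually implemented.

```python
import math
import random


def build_jumps(features, threshold):
    """For each beat i, list the beats it may jump to: j + 1 for every other
    beat j whose features lie within `threshold` of beat i's."""
    n = len(features)
    return {
        i: [j + 1 for j in range(n)
            if j != i and j + 1 < n and math.dist(features[i], features[j]) <= threshold]
        for i in range(n)
    }


def play(jumps, steps, jump_prob=0.25, seed=None):
    """Random walk through the beats: usually play the next beat, sometimes jump."""
    rng = random.Random(seed)
    beat, order = 0, []
    for _ in range(steps):
        order.append(beat)
        following = beat + 1 if beat + 1 in jumps else None
        options = jumps[beat]
        if options and (following is None or rng.random() < jump_prob):
            beat = rng.choice(options)
        elif following is not None:
            beat = following
        else:
            beat = 0  # dead end with no similar beats: start over
    return order
```

Jumping to the beat *after* a similar beat is what keeps the seams inaudible: the listener hears beat i, then something that plausibly follows beat i.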


Grepsr appears to be a consulting service slash platform for recurring web scraping.  Pricing appears high.

Study hacks

Another learning-support article, by the guy who worked through an MIT CS BS in one year, outlining how he did it.  The key insight I see is the Feynman technique: for a given idea or method you don't understand, write down a lecture on it (an article) from scratch.  When you get to a point where you're stuck, you've identified what you don't understand.  Study it in more detail.

The idea of the "learning support" thread is really not too germane to this blog (except insofar as it's a natural target for a programming system that purports to be semantically motivated).  It's motivated by two main things, though: first, supporting human performance is what programming is for, and second, understanding human learning is part of understanding semantics.

Version your APIs!

Do what this post says.


Wow.  Fabric is a Python module/utility that allows you to turn Unix commands on the local or remote system into building blocks.  This is just so cool.

Tools for coders and developers

Another one of those list articles.  I like them.

Errors vs. Bugs

An extremely thought-provoking thesis, contrasting errors vs. bugs and how the distinction is useful when learning to play piano or, actually, anything at all.

"World countries" database

Here's a neat little idea, a Web service that generates MySQL code to create and load a list of countries according to fresh data.

This could be generalized to another data-dictionary kind of gallery.  Think about that!
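As a rough sketch of the generalized version, here's a Python function that turns any list of records into a `CREATE TABLE` plus `INSERT` script.  Column types are guessed from the first row, table and column names are assumed to be trusted identifiers, and `to_sql` is a hypothetical name; this is plain SQL that MySQL and SQLite should both accept for simple cases like this, not what the countries service actually emits.

```python
SQL_TYPES = {int: "INTEGER", float: "REAL"}  # anything else becomes TEXT


def sql_literal(value):
    """Render a Python value as a SQL literal, doubling single quotes in strings."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_sql(table, rows):
    """Emit CREATE TABLE plus one INSERT per row for a list of same-keyed dicts."""
    columns = list(rows[0])
    col_defs = ", ".join(f"{c} {SQL_TYPES.get(type(rows[0][c]), 'TEXT')}" for c in columns)
    statements = [f"CREATE TABLE {table} ({col_defs});"]
    for row in rows:
        values = ", ".join(sql_literal(row[c]) for c in columns)
        statements.append(f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});")
    return "\n".join(statements)
```

The gallery part would then just be a collection of fresh data sources, each feeding the same generator.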

Coursera CS curriculum

This is cool.  This kind of curriculum should probably be a thing.


Model this: BOMs for kits.  Neat idea!

Spaghetti? Or just Big?

Here's a thoughtful article about why spaghetti code is harmful (it's too hard to understand all at once) that also points out that spaghetti is only one way code can become humanly impossible.  Big Code is the underlying problem: code that has grown past the threshold of complexity that can be maintained.

It makes you want to get back to work on AI.

On that note, here's a link dump on static code analysis:

What's good about ugly Java

Finally for today's link dump, a thought-provoking article from a guy at Twitter about why Java works so well for enterprise-scale applications: among other things, the many different options for a given object let you tune a massive application to suit your needs.  The result is harder code to understand, but that's because the subject itself is hard to understand.

There are other points in this article, which I need to reread to really get, but I think that this tunability is not a final answer to the question of code beauty.  Lately, I've been thinking in terms of annotations on code, and this is another place where an annotation would be appropriate.

As I'm rapidly prototyping a system, I'd simply instantiate some sort of default object; metadata for this object, however, would indicate that it could be tuned, perhaps by including additional callbacks, perhaps simply by selecting a particular set of tuning values - whatever.  As I needed to further tune my system after learning how it works, I'd annotate that call.  The annotation could be any arbitrary setup code, but here's the key: it would be invisible - a footnote.  Then I could have a reasonable overview version of the code that I could understand on an intuitive basis, without losing the detail that is required for the grungy version, and this, crucially, is why serious books have footnotes.  Seriously!  A straight down-the-list presentation of everything at once isn't conducive to a good understanding of complex ideas in any field, so why should programming be any different?

Footnotes.  That's what a serious coding system needs.
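One way to fake this in an ordinary language today, as a minimal Python sketch: the main code asks for a default object by name, and the tuning lives in a separately registered "footnote" that an editor could fold away.  `footnote`, `make`, `FOOTNOTES`, and `Cache` are all hypothetical, and a real system would want the editor, not a global dictionary, to do the hiding.

```python
FOOTNOTES = {}  # call-site name -> tuning function


def footnote(name):
    """Register tuning code for the call site called `name`."""
    def register(func):
        FOOTNOTES[name] = func
        return func
    return register


def make(name, factory, **defaults):
    """Build the default object, then apply its footnote if one exists."""
    obj = factory(**defaults)
    tune = FOOTNOTES.get(name)
    return tune(obj) if tune else obj


class Cache:
    def __init__(self, size=128):
        self.size = size
        self.on_evict = None


# Main text: reads like the simple prototype.
def build_app():
    return {"cache": make("session-cache", Cache)}


# Footnote: the grungy tuning, added later once we know how the system behaves.
@footnote("session-cache")
def tune_session_cache(cache):
    cache.size = 10_000
    cache.on_evict = lambda key: None
    return cache
```

Delete the footnote and `build_app` still works with the defaults, which is exactly the property a footnote should have.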

Functional Reactive Programming

This is really a neat idea that I've mentioned before, but reactive programming is the idea of setting up a network of values (Elm calls these signals - values that change over time) that affect one another dynamically.  This is an inherently declarative style of programming (as the second link there explicitly states).
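A naive Python sketch of the signal idea, just to pin it down: a `Signal` holds a current value, and `lift` derives a new signal from others that recomputes whenever an input changes.  This is not how Elm implements signals (there's no glitch-free update ordering or time model here), and the names are chosen for illustration.

```python
class Signal:
    """A value that changes over time; derived signals update automatically."""

    def __init__(self, value):
        self.value = value
        self._dependents = []

    def set(self, value):
        self.value = value
        for update in self._dependents:
            update()


def lift(func, *inputs):
    """Create a signal whose value is func applied to the inputs' current values."""
    out = Signal(func(*(s.value for s in inputs)))

    def update():
        out.set(func(*(s.value for s in inputs)))

    for s in inputs:
        s._dependents.append(update)
    return out
```

Declaring `area = lift(lambda w, h: w * h, width, height)` is the whole program; there's no callback anywhere that says "when width changes, recompute area."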

On that same site, there's an excellent exposition of how to write a Pong game in Elm using this programming style, and a more recent article on how reactive programming avoids the proliferation of callbacks that characterizes API-style code and more recent non-blocking frameworks like node.js.

Elm itself is a language that compiles to JavaScript for the browser and (I think) can be used on the server as well.  It seems to have some way to integrate with Haskell on the server, but I honestly haven't spent much time investigating it - and I should. I should investigate all the things.

Doxing a Russian hacker

This is a neat procedural.

Deep learning

Interview with a member of the winning team of the Kaggle Merck competition - they did it with neural networks, which are back in the news lately.


This is pretty neat - a little Web service that allows you to compose other Web services.

API generator

...whatever that is.  Deserves investigation.

Command line tools for developers

More Unix command-line tools - some I never heard of.  I need to do a database for these.

PDF site

I think Planet PDF has essentially everything you want to know about the PDF format.  Here's a sample: PDF stamping.

Wednesday, October 31, 2012

Using Dropbox as a database

Cool idea: use JSON files on Dropbox for database storage.  Built into Opa (a server-and-client JavaScript language that I've noted before).


Circular is a Backbone.js application that uses PHP, Bootstrap, and MongoDB for storage.  It allows you to store up tweets in advance and have them sent to Twitter on a schedule. And it's open source.  Good boilerplate extraction example.

Sunday, October 28, 2012


I should learn Ruby.  Starting here.


A pretty hilarious take on the state of Haskell package management.  The highlight: Haskell people don't know when they're shooting themselves in the foot because they can't remember what it's like to have a foot without a bullet wound.

Software approaches to memory training

Language learning is a subject near to my heart, of course, so here's an interesting little piece in Salon about language software.  There's plenty of room in the market.

Also, HVPT word pair training.

Saturday, October 27, 2012


An open-source geometry library.

Condensing fact from the vapor of nuance: security

So yeah, it's possible, from your own VM running on a shared processor in the cloud, to time your own memory accesses closely enough to detect the differences in instructions run by another VM's private-key decryption, and glean enough (noisy) information from that to reconstruct the private keys used to do the decoding.

That just blows my mind.

Friday, October 26, 2012

Here's a to-do manager built in PHP on a NoSQL backend

This is something small enough to work as a good first step in understanding PHP architecture: a to-do list manager that stores JSON in a NoSQL database.  Cool!


Learned about AsciiDoc [cheat sheet] today, a lightweight markup language in the same family as Markdown.  It's in Python, but that doesn't necessarily mean it's bad.  (ha)  I'm considering using it for my notes application.

Also, there's talk about standardizing Markdown.  Interesting.

Thursday, October 25, 2012

How to use SSH for fun and profit

I've never been good at SSH.  Here's a great blog post about using it.

Also: ssh-copy-id.

Druid: open-source real-time analytics store

Druid has just been open sourced.  It appears to be a competitor for Hadoop.


Prose is an online editor for your Github documents.


A training workbook service.  Pretty cool, actually.  This is both a target application and a monetization strategy.

Substance Document API

This is kind of interesting.  A company named "Substance" has published a document API for Web applications built around collaborative editing, with annotations and a change history.

That's pretty neat.

Evolving regexps

This is neat.

History of the United States in 141 maps

DDD-based page that really rocks.

Bot links

I'm collecting too much stuff in my browser tabs, so here's a little link dump (some of it already linked elsewhere on this blog):

  • Scrappy (blog post by the author; it's been rebuilt on Moose)
  • WWW::Wikevent::Bot - a useful example of a bot
  • JavaScript in Perl!  Seems a little stale (2010), but it's better than anything I've built yet.
  • Example code for Mechanize - good for reimplementation, you see.
  • TreeBuilder.  Everything coming into Bot::Page will be parsed.
  • RDF::Scutter.  I still don't know what this does, except maybe it's gathering semantics?
  • The Spidering Hacks book (2003) from O'Reilly.  Mine it for examples.

Web scraping

And yet I just can't get past thinking about Web scraping as something fun and profitable.

If only it could be simplified.  A lot.  So I'm thinking again about declarative means of describing the "shape" of a site in terms of where the useful data is - and I'm coming up empty.  Again.
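For what it's worth, here's the crudest possible version of such a declarative description. It's a sketch, not an answer. Each field is described by the tag and CSS class that hold it, and one generic extractor does all the walking. The spec format is my own hypothetical one, and the only library assumed is Python's standard `html.parser` module.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# A "shape" spec: field name -> (tag, CSS class) of the element holding that field.
Spec = Dict[str, Tuple[str, str]]


class SpecExtractor(HTMLParser):
    """Collects the text of every element that matches a field in the spec.

    Deliberately naive: a matching element must not contain a nested
    element with the same tag name.
    """

    def __init__(self, spec: Spec) -> None:
        super().__init__()
        self.spec = spec
        self.results: Dict[str, List[str]] = {name: [] for name in spec}
        self._field: Optional[str] = None  # field currently being captured
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name, (want_tag, want_class) in self.spec.items():
            if tag == want_tag and want_class in classes:
                self._field, self._buffer = name, []

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._field is not None and tag == self.spec[self._field][0]:
            self.results[self._field].append("".join(self._buffer).strip())
            self._field = None


def scrape(html: str, spec: Spec) -> Dict[str, List[str]]:
    parser = SpecExtractor(spec)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    spec = {"name": ("h2", "church"), "town": ("span", "town")}
    page = ('<li><h2 class="church">Church One</h2><span class="town">Town A</span></li>'
            '<li><h2 class="church">Church Two</h2><span class="town">Town B</span></li>')
    print(scrape(page, spec))
```

It ignores nesting, pagination, and link-following. That's where the real "shape" problem lives.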

The only way to get my mind around it is to build some Web scrapers.  Elance is not going to be an interesting place to find challenging scraper specifications, so I'm going to have to look at the ones on ScraperWiki and go from there.

Oh, ho! The Mechanize Cookbook is replete with interesting examples.  I shall start there.

Update: Those seem boring and old.  Instead, I've subscribed to the ScraperWiki mailing list, where people post scraping requests to the community.  Here's a cool one already: find all the churches in Germany, with lots of links to start with.  So yeah.

Wednesday, October 24, 2012

The state of job boards

The last time I looked at Elance was 2008, and there was a reasonable number of jobs posted by people who seemed to know what they were doing.  Now I'm looking there, and it's all people who have a fantastic idea that they want implemented for five bucks.  (Or so it seems.)

And none of it seems well-specified.

Maybe that particular strategy is dead.

Tuesday, October 23, 2012

Mathematics/computer algebra

So SFTP under Perl requires Net::SSH::Perl, which in turn relies on Math::Pari, which interfaces to PARI [wiki], a computer algebra system that implements a lot of number theory algorithms.  PARI is a library that is normally accessed through GP, a scripting language written specifically for it (together they're known as PARI/GP).

Here's the thing.  Math::Pari doesn't install on Windows (nothing I tried tonight installed on Windows), and, reading into it, the implementer of Math::Pari seems like a real ... ahem, doesn't seem to play well with others.  Math::Pari will only install if you have built PARI on your own machine; it needs the PARI build directory to build the Perl module.  Period.  Unfortunately, it doesn't react at all well to the current version of PARI.  Like, "Perl dies" levels of poor reactions.

I'd like to wrap PARI a little better.  Maybe Inline::GP or something, I don't know, but there is most definitely room for improvement.

But that aside, I ran across Sage again.  Sage is essentially an open-source mathematics Swiss Army chainsaw: Mathematica reimplemented on open-source components, bundling PARI, SymPy, and a boatload of other open-source tools of that nature.

There's a whole comparison list here.

There, my friend, is a domain crying out.

Monday, October 22, 2012

The Prime Pages

Model this.  It's a database of information about prime numbers.  Way Web 2.0 before that was even a concept (founded 1990-freaking-4, before even I got into online databases...)

Sunday, October 21, 2012


... is JavaScript in reverse.  I don't know what it means, but it all looks neat.

Saturday, October 20, 2012

Carbon: compiles to C

This is pretty neat: a really thin OO C extension (libco2) that has a higher-level C-like language defined on top of it (Carbon).  Carbon supports direct object orientation, exceptions, and some other stuff, and compiles to C.

It compiles to C.  Like Decl could compile to Perl.  (Or to C, I guess.)

Friday, October 19, 2012

Visual programming doesn't work because it fails to scale

Good post on visual programming, the point being that it can't deal with any complexity at all without becoming incomprehensible.

Although you kind of think that maybe it could still work if it's just organized well enough.  There should probably be a limit of 7±2 items on any given view.

Online JSON editor

This is kinda neat:


A macro-enabled JavaScript variant.  Now that's what I call cool!

Free texts on machine learning

Someday I'll have time to get back to ML.

Free email detector API

Interesting little service: given an email address, detect whether it's a one-time use address.  Available via API.
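The simplest local version of this is a domain blocklist. This is a minimal sketch with a tiny placeholder list. The linked service presumably maintains a far larger one and may use other signals too. `mailinator.com` is a well-known disposable-address provider; `throwaway.example` is made up.

```python
from typing import AbstractSet

# Placeholder blocklist: mailinator.com is a well-known disposable-mail provider,
# throwaway.example is invented. A real checker would load a maintained list.
DISPOSABLE_DOMAINS = frozenset({"mailinator.com", "throwaway.example"})


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    local, at, domain = address.strip().rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return domain.lower().rstrip(".")


def is_disposable(address: str, blocklist: AbstractSet[str] = DISPOSABLE_DOMAINS) -> bool:
    """True if the address's domain, or any parent domain, is blocklisted."""
    labels = domain_of(address).split(".")
    # For a.b.mailinator.com, try a.b.mailinator.com, b.mailinator.com, mailinator.com, com.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


if __name__ == "__main__":
    for addr in ["someone@Mailinator.com", "x@mx.mailinator.com", "x@gmail.com"]:
        print(addr, is_disposable(addr))
```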

More jQuery plugins than you can shake a stick at

If you use jQuery, then something here is surely going to be useful.

Thursday, October 18, 2012

PostgreSQL and Redis

PostgreSQL can talk to anything.  Like Redis.

Learning things

OK, so flashcard learning is a discipline that has some serious history.  Here's a really refined flashcard algorithm that pumps words into your head.
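I don't know exactly what the linked algorithm does, but the classic baseline for this kind of scheduling is SuperMemo's SM-2. It works like this: grade each recall from 0 to 5, grow the review interval by a per-card easiness factor, and reset on failure. Below is a minimal sketch of the SM-2 update rule as it's commonly described. It is SM-2, not necessarily the refined algorithm linked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0    # n: consecutive successful reviews
    easiness: float = 2.5   # EF: SM-2's starting easiness factor
    interval_days: int = 0  # I: days until the next review


def review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review; quality is the self-graded recall from 0 to 5."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:  # recalled: lengthen the interval
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval_days * state.easiness)
        repetitions = state.repetitions + 1
    else:  # forgotten: start the repetition sequence over
        repetitions, interval = 0, 1
    # The easiness factor rises for easy recalls, falls for hard ones, floor 1.3.
    miss = 5 - quality
    easiness = max(1.3, state.easiness + 0.1 - miss * (0.08 + miss * 0.02))
    return CardState(repetitions, easiness, interval)


if __name__ == "__main__":
    card = CardState()
    for grade in [5, 5, 5, 2]:
        card = review(card, grade)
        print(grade, card)
```

Three perfect recalls in a row give intervals of 1, 6, and 16 days. A failed recall drops the card back to a 1-day interval.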

On that note, another language learning program (a game, for Bulgarian).

Signal processing primer

What it says on the tin.

Wednesday, October 17, 2012

Rap Genius

This is slick, very slick: Rap Genius (link goes to transcript of second Romney/Obama debate) allows you to post lyrics to a song (or any other fixed text) and highlight individual words or phrases with popup notes.  (Its original name was Rap Exegesis, which I prefer.)  Neat idea!

They want to expand it to law, a great application.
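The data model behind this kind of site is easy to sketch: a fixed text plus notes attached to character ranges within it. This is not Rap Genius's implementation, which I haven't seen. The `Annotation` type and the bracket-and-footnote rendering are illustrative assumptions; a Web version would show popups instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated character
    end: int    # offset one past the last annotated character
    note: str


def annotate(text: str, phrase: str, note: str) -> Annotation:
    """Attach a note to the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found")
    return Annotation(start, start + len(phrase), note)


def render(text: str, notes: List[Annotation]) -> str:
    """Mark each annotated span as [span]{n} and append numbered notes.

    Assumes the annotations do not overlap.
    """
    ordered = sorted(notes, key=lambda a: a.start)
    out, pos = [], 0
    for i, a in enumerate(ordered, 1):
        out.append(text[pos:a.start])
        out.append(f"[{text[a.start:a.end]}]{{{i}}}")
        pos = a.end
    out.append(text[pos:])
    footnotes = [f"{{{i}}} {a.note}" for i, a in enumerate(ordered, 1)]
    return "".join(out) + ("\n" + "\n".join(footnotes) if footnotes else "")


if __name__ == "__main__":
    line = "the quick brown fox"
    print(render(line, [annotate(line, "fox", "the subject"),
                        annotate(line, "quick", "an adjective")]))
```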

Okapi workflow manager

Perusing my own blog over at Xlat-perl, I was reminded of Okapi's workflow manager.  It strikes me that this is where I want to go with the "low-level business process language" thing of files and directories and FTP servers I envision.  Tagged as "workflow" for lack of a better idea, although it's not really workflow without managing tasks for humans.


Neat little programming-snippet bounty site.  Looks like some fun stuff there!  If it catches on, it'll be interesting to see how the data trends.  Tagged as a "job source" - but it isn't really that.  The bounties are more like prizes than pay.

PHP comment-style annotations not good

Nice post on PHP annotations in comments and why they're probably a mistake, while arguing that metadata in code is not itself a mistake.  And it even offers an alternative that doesn't suck!

Personally I think treating PHP as the compilation target for a macro language is probably the better approach, but maybe that's a claim requiring extraordinary proof that I can't yet offer.


Some sort of data science toolset?  Check it later.  Nice blog post!

Data structures

An open textbook in data structures.

... is a neat gallery/forking site for database patterns - it seems a little thin (you don't seem to be able to download patterns in a machine-readable form, for example), but the site itself is on GitHub and the idea is groovy.

Make that machine-readable, in a schema description format that can be composed and used to output SQL, and man, would I love it.
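Roughly what I mean, as a minimal sketch: a pattern is a named list of columns, and a table is composed from several patterns plus its own columns. The result is emitted as a `CREATE TABLE` statement. The format, pattern names, and helper functions are all hypothetical; nothing here comes from the linked site.

```python
from typing import Dict, List, Sequence, Tuple

Column = Tuple[str, str]  # (column name, SQL type plus constraints)

# Hypothetical library of reusable schema patterns.
PATTERNS: Dict[str, List[Column]] = {
    "surrogate_key": [("id", "INTEGER PRIMARY KEY")],
    "named": [("name", "VARCHAR(200) NOT NULL")],
    "timestamps": [("created_at", "TIMESTAMP NOT NULL"), ("updated_at", "TIMESTAMP")],
}


def compose(pattern_names: Sequence[str], extra: Sequence[Column] = ()) -> List[Column]:
    """Concatenate the named patterns plus table-specific columns, rejecting clashes."""
    candidates = [col for name in pattern_names for col in PATTERNS[name]] + list(extra)
    columns: List[Column] = []
    seen = set()
    for col_name, decl in candidates:
        if col_name in seen:
            raise ValueError(f"column {col_name!r} defined twice")
        seen.add(col_name)
        columns.append((col_name, decl))
    return columns


def to_sql(table: str, columns: Sequence[Column]) -> str:
    """Render composed columns as a CREATE TABLE statement."""
    body = ",\n".join(f"  {name} {decl}" for name, decl in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


if __name__ == "__main__":
    print(to_sql("church", compose(["surrogate_key", "named", "timestamps"],
                                   extra=[("town", "VARCHAR(100)")])))
```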

Deployment to a VM

So testing deployment should be pretty easy if I have a VM to test against.  (I'm looking at VMs again due to OpenLogos, natch).  The idea is basically to fire up a blank VM of the OS in question, then run scripts against it to build ... whatever.  The "whatever" then being a deliverable VM, but built using a deliverable, repeatable script.  And that script could then itself be an expression of the higher-level deployment description language I envision (and that already exists in other deployment tools).

Ah.  VirtualBox has a command-line interface.
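Here's a minimal sketch of the blank-VM-then-scripts loop, driving `VBoxManage` (VirtualBox's command-line tool) from Python. It assumes a pre-installed, pristine base VM to clone. The VM names and memory size are placeholders. By default it only prints the commands, as a dry run. Double-check the flags against the `VBoxManage` manual for your VirtualBox version before passing `run=True`.

```python
import subprocess
from typing import List


def lifecycle_commands(base_vm: str, test_vm: str, memory_mb: int = 1024) -> List[List[str]]:
    """VBoxManage calls that clone a pristine base VM, boot it headless, and power it off."""
    return [
        # Full clone of the pre-installed "blank" OS image, registered with VirtualBox.
        ["VBoxManage", "clonevm", base_vm, "--name", test_vm, "--register"],
        ["VBoxManage", "modifyvm", test_vm, "--memory", str(memory_mb)],
        # Boot without opening a GUI window.
        ["VBoxManage", "startvm", test_vm, "--type", "headless"],
        # (The deployment scripts would run against the booted guest at this point.)
        ["VBoxManage", "controlvm", test_vm, "poweroff"],
    ]


def run_all(commands: List[List[str]], run: bool = False) -> None:
    """Print each command (dry run) or, with run=True, execute it and stop on failure."""
    for cmd in commands:
        if run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
        else:
            print(" ".join(cmd))


if __name__ == "__main__":
    run_all(lifecycle_commands("blank-os-base", "deploy-test", memory_mb=2048))
```

In a real run, the deployment scripts go between `startvm` and `poweroff`, after the guest has finished booting. The clone can then be exported as the deliverable, or unregistered and deleted.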

Downloadable Oracle development VMs

Wow - this would have been nice back when I was developing an Oracle adapter for my workflow toolkit: Oracle now offers downloadable development VMs for their enterprise software, explicitly marked as unsupported and not suitable for production use, but otherwise free of charge for development.

Very cool!

Data analysis languages

A very nice comparison of MATLAB/Octave, R, Julia, and a little Python at the end.

Monday, October 15, 2012

Syte: Personal site boilerplate (kinda)

Syte is a GitHub project that allows you to customize a few things and have a personal Django-based, mega-socially-interactive Website up on Heroku at breakneck speed.  You just fork it, run it, and customize it with its own tools.


Sunday, October 14, 2012

Pattern: Python web-mining system

Drool.  Of course, nothing restricts Decl to Perl...

Visualization in Perl (and not in Perl)

There was a post to the Quantified Onion group this week about visualization in Perl.  Here are some useful links that arose from that discussion.

Another landing page tester

Here's a code-free landing page builder that gives you the ability to let customers select a pricing model preference during their interest registration.

MinGW package manager yypkg

The open-source takeover of Windows continues.