Monday, December 31, 2012

Voice recognition

Simon, an open-source voice recognition library/system.

Open source contribution

14 ways.

Also, a way to organize and compare Git repositories, apparently.  Interesting, if I understand correctly.

Clojure by koan

Nice TDD tutorial on Clojure.

Business process diagramming/definition languages

Just some leads:

  • BPMN is an open notation format.
  • Some flowchart examples.
  • Gliffy does BPMN, apparently, as a Web service - probably easier to use it than rolling my own.

New Year resolutions + gamification

Nice!

Web scraping with node.js

Nice post on Web scraping, using node.js - but the techniques are pretty universal and really worth a read for any platform.

Exit traps for bash

Better bash scripts using exit traps for cleanup and error handling.

NinjaBlocks

NinjaBlocks is a neat-o series of home automation blocks on the cheap.

MessagePack

MessagePack is a data serialization library that packs to a compact binary format.

Patterns

Here's what I'm slowly realizing about how people use patterns (this iteration of this particular insight is due to an article on business patterns I ran across): "The pattern gets you part of the way to the final objective by starting you off from a tried-and-trusted place. It looks familiar, but it is not a solution in itself—it is an abstraction of the solution that you have to embellish to make it the solution that suits your needs."

Here's the insight, yet again: by using the pattern only to document the solution instead of building that description into the code structure itself, we're losing information at both ends.  First, the pattern's structuring of the code is lost once the code is written, and second, changes to the code are not reflected in the pattern, so a learning opportunity for the pattern base is lost.

The pattern is the semantic structure of the solution.  The solution can almost be seen as the syntactic expression of the pattern in this view.

Food for thought.

Website design best practices

A couple of links for check list items for site design:

Thursday, December 27, 2012

Data science architecture

Now here's a useful post: an overview of Big Data architecture.

Website copy design

A couple of articles came up recently:

Sikuli

Sikuli is a cute idea: a scripting language (built on Python) for graphical environments that uses optical processing to find bits of screen to automate.

Free data science books

A list of free online data science books, posted by a data scientist.  This is nice.

Cascading for the impatient

I know I mentioned Cascading once before, but there's a handy tutorial blog series now.  I should work through it, especially since there's a focus on TF-IDF, which I find myself needing lately.
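For reference, TF-IDF itself is small enough to sketch in plain Python.  This is only an illustration of the weighting, not the Cascading tutorial's code; the function name `tf_idf` and the toy documents are made up, and it assumes the common variant tf = count / document length and idf = log(N / document frequency).

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Weight = (count / doc length) * log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus, purely illustrative.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

A term that shows up in every document ("the") gets a weight of zero, while a term unique to one document ("mat") outranks one shared by two ("cat").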

Grunt

Grunt seems like an interesting JavaScript-based development workflow/build automation system.  This is the kind of task automation I'm spending a lot of time thinking about these days.

Error handling in Node

Error handling in node.js is just as weird as the rest of node.js.

OLAP and MDX

I've been poking around looking at techniques for database design, because every actual application I go to design these days essentially starts with a database.  (More on this later.)  (Well, and previously, of course.)

But while looking at these things, I ran across the concept of OLAP (online analytical processing), which is essentially based on the concept of keeping a matrix of figures of high dimensionality available for fast ad-hoc querying.  The query language usually used is MDX.  (MultiDimensional eXpressions.)
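To make the "matrix of figures" idea concrete, here is a toy roll-up in Python.  It is not MDX and not how a real OLAP engine stores anything; the fact rows, the dimension names, and the `roll_up` helper are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (region, product, quarter)
# followed by one measure (units sold).
FACTS = [
    ("east", "widgets", "Q1", 100),
    ("east", "gadgets", "Q1", 50),
    ("west", "widgets", "Q1", 70),
    ("east", "widgets", "Q2", 120),
    ("west", "gadgets", "Q2", 30),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(facts, keep):
    """Sum the measure over every dimension not named in keep."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)

by_region = roll_up(FACTS, ["region"])
by_product_quarter = roll_up(FACTS, ["product", "quarter"])
grand_total = roll_up(FACTS, [])
```

Each call is one ad-hoc "slice" of the cube; a real OLAP system precomputes and indexes these aggregates so that such queries stay fast at scale.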

I don't have much specifically to say about this, just wanted to note that OLAP exists but is apparently entirely dominated by big players.  Which is interesting.  Is it because only the big players can market successfully to the corporate buyer most interested in OLAP, or because OLAP just isn't that interesting to the open-source world?  (I find the latter improbable, so maybe there's a market for low-price OLAP.)

Saturday, December 22, 2012

Mandrill

This is a neat tool by the folks at MailChimp - handles incoming and outgoing mail and uses Webhooks for everything.  Mandrill.

Spidey

A boilerplate extractor in Ruby to simplify scraper writing.

Selecting frameworks

Here's an interesting article on a tool to help you select a cross-platform JS framework.

This is an interesting notion.  Alternative frameworks are close to being alternative boilerplate expressions of the same underlying concepts, but different ones can have different strengths.  There are tradeoffs to selecting one over another.

For a maximalist version of this sort of thing, look at the 30+ implementations of the same app at TodoMVC.  That just cries out for a detailed analysis!

Thursday, December 20, 2012

Ember table by Addepar

Very good data table browser!  Exactly what I need for ... everything!

jsPlumb

Graphing UI library jsPlumb - very nice.  Too cartoony for my taste, but I'm sure it's all configurable.  The point is all the heavy lifting's already done.

HTML5 Bones

Bare-bones HTML5.

Moonbase: animation builder

This looks cute!

Parse's new data browser

This is cool - I mean, it's basically Access (ha!) but still very cool.

I've been thinking again about UI programming.  I need some kind of higher-level UI specification language that can describe the way the UI acts in a platform-independent way.  Then translators into the different UI platforms.

Probably not so impressive, since half the UI platforms already have some kind of similar notion.  Still. From a semantic standpoint, this seems like where I need to be going.

Wednesday, December 19, 2012

Ontology

Hey, look: a free huge ontology at ontologyportal.org!

12 online tools for big data

Big Data, as a buzzword, is starting to leak.  I wonder what will come next?  Good list of tools/startups, though.

Hilarious pitfalls of C++

This is a pretty funny article MST3K-ing the C++ FAQ.

Language concept mindmaps

Now here's an utterly fascinating comparison of the concepts used in CoffeeScript, Ruby, and C++ - I truly wish there were more things like this! Comparative programming linguistics.

Tuesday, December 18, 2012

Why Scala?

It appears to fix problems.

Maybe Perl 6 will be kinda cool

You know I'm not ready for Perl 6 yet, but still.... This can be defined in Perl 6:

die "resistance is futile" if !($resistance ~~ 4.7kΩ ± 5%);

Simple HTML5/jQuery game

A useful exercise.

Put chubby models on a diet with concerns

A Rails post that - seriously - sets up a VX resonance in my brain.  I should probably learn me some Rails.

Berkeley: Big Data with Twitter

A blog post at Berkeley apparently recapping a class just completed.

Blekko crawl data on Common Crawl

Web crawl data.  Also, Common Crawl is cool.

Backbone boilerplate

How to start writing backbone server-side JS.

Blaze: compiling data science in Python

Blaze is the new - compiling - generation of NumPy.  It's fast.  I wonder why people don't just write macros that write C anyway?  (I'm thinking code quality, but seriously - that would be interesting to investigate.)

Ithkuil

Interesting constructed language and an unbelievably Gibsonesque story about it.  (Anything featuring the countries formerly known as the Soviet Union ends up feeling Gibsonesque to me, I guess.)

How to learn data science

Version 1.38 - nice list of resources.  Including linear algebra at OCW.

Stanford Core NLP via Perl

Well this is neat!

RapidMiner

Open-source data mining suite of some kind. With a service based on it.

Fun with graphics and music

Exhibit one, the Aphex face. Exhibit two, some music I like better, with an oscilloscope.

Combinators presented well

This may well be the most salient article I've ever seen explaining why combinators are a useful tool.  Raganwald is a pretty hoopy frood.

Sunday, December 16, 2012

VX technology

Just discovered a whole subreddit about VX tech - man, that brings back memories!  I did a summer internship at Ball State in 1983 and they had an obsolete VX5 in the basement.  I worked through the manuals in my time off - copious - and at one point managed to bring in Radio Moscow on the secondary fibrillation coils (the Danffy eigenvectors had to be accurate within four sigs, though - you had to use a special-order triangular-calibrated sliderule to get anything better than three, back then, but I recently saw an article about a guy who'd successfully simulated that tri-cal stick on a Beowulf cluster).

I had to quit messing with it when the resonance arrays heterodyned with the Dean's fillings, though, and I had to work without pay the rest of the summer to pay for the restorative dental work.

There's a Wiki, too.

Smoke testing CPAN on Windows

As a Perl programmer on Windows, I've often had moments of supreme frustration. I can't use SSH or SFTP from Perl, and I can't use some of the mail handling modules. When I got started back in the early years of the century, I couldn't even use CPAN; I had to compile my own Perl and accept the fact that tests just mostly had no chance of working. Over the past year or two, however, especially as I've come to rely on CPANtesters for my own modules, I've started to realize that one reason for this is the lack of dense testing of modules on Windows. (Yeah, Paul Evans had something to do with that realization, too.)

But then Gábor Szabó posted on Google+ that he hoped to make it to #20 on the Windows smoke tester leaderboard. (As of today, he's at #19 - congrats!)

Wait. There's a leaderboard?

[more at blogs.perl.org]

Holy schemoley, SQL::Translator exists!

Now I don't have to write it!  Manual here!  Holy Toledo, what a fantastic thing this is!

Design

I'm getting serious - finally - about learning design, precipitated largely by this lovely article on "How to make your site look half-decent in half an hour."  Answer, mostly: use Bootstrap and then mess around with it for half an hour.  (No seriously!  That's great advice!  Plus she's got more specific tips than that, so go read it already.)

Friday, December 14, 2012

Noir

Deprecated, but interesting: a Web not-quite-framework in Clojure.

Math in programming

Here is a thoughtful article on the proper role of mathematics in hacking.

Math, done right (I am slowly beginning to understand), is semantics - the semantics of models.  The rest is just detail, and there's a great deal of attention paid to being sure your manipulations of models are correct, but essentially, "doing math" (or doing applied math, anyway) is just the creation of models.

As such, all of our current programming languages suck.  No, seriously!  They do!  This is where a declarative model language would shine - this is actually pretty close to what Haskell is, though, so probably I'm just reflecting my own bias.

I really need to learn me some Haskell.  And math.  I wish I knew where to start.

"Either" in C#

Here's a neat little article about implementing a variant "either" structure in C#.  (Either is a Haskell type that holds a value of one of two types, tagged Left or Right so you always know which one you have.)
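The same idea can be sketched in Python.  This is not the article's C# code; `Left`, `Right`, `either`, and `parse_int` are names chosen here for illustration, following the Haskell convention that Left carries the failure and Right the success.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

L = TypeVar("L")
R = TypeVar("R")
T = TypeVar("T")

@dataclass(frozen=True)
class Left(Generic[L]):
    value: L

@dataclass(frozen=True)
class Right(Generic[R]):
    value: R

# An Either is exactly one of the two tagged wrappers.
Either = Union[Left[L], Right[R]]

def parse_int(text: str) -> "Either[str, int]":
    """Return Right(number) on success, Left(error message) on failure."""
    try:
        return Right(int(text))
    except ValueError:
        return Left(f"not an integer: {text!r}")

def either(on_left: Callable[[L], T], on_right: Callable[[R], T], e) -> T:
    """Apply whichever handler matches the tag."""
    if isinstance(e, Left):
        return on_left(e.value)
    return on_right(e.value)

doubled = either(len, lambda n: n * 2, parse_int("21"))
```

The caller can't get at the value without going through the tag, which is the whole point: the failure case has to be handled somewhere.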

Thursday, December 13, 2012

Fast game-writing competition

Oh, look - another programming competition, this one a series of games! And a specific post listing favored components.

Punch

Punch is yet another static content builder, and looks pretty neat.

This post is tagged "Web frameworks, build systems, boilerplate".  It's interesting to see those tags converging.  Very interesting.

Boilerplate

The other little insight I had today is that boilerplate is actually the syntactic pole of a mixed unit, in the old Langacker terminology.  Different bits of boilerplate would start to look a lot like language, but boilerplate proper is a whole chunk of "language" that has no direct counterpart in natural languages.  And yet a boilerplate outline (an architecture, perhaps) is definitely a semantic unit deserving of all the semantic love that any word might get.

Chew on that a bit.

New policy here

In the past, I've tended to open interesting tabs from HNN and leave them there until the sheer weight of Chrome bogged down my system to the point that the exhaust from the fan started charring my desk surface.  By the time I roll around to blogging them (which is the point of leaving them open), I usually have fifty to chug through and I'm left with very little of interest to say - not that this is necessarily a bad thing, because linking itself is a perfectly valid goal, but still, it's another pile of tasks that bogs me down to the point where my exhaust chars my vicinity.

That bogging down means that instead of blogging being a joyous way to express myself, it starts to look like, well, work.  Unpaid work, my least favorite kind.

So: new policy.  If it's interesting, blog it now.  If it's not interesting enough to blog before I have three tabs open, then close it.

There is an infinite fire hose of interesting things on the Internet.  If I miss some, nobody cares, not even me.  And this is a way to push the balance back away from consumption and towards production.

Semantic databases

Hmm. Z came up with a "related article" for the last post - not too related, I guess, but still - the inclusion of the word "semantics" still has a high probability of hitting something I find interesting.

The new keyword is "semantic databases", and it's set off a veritable breakdown cascade between my curiosity and the Internet.  In the interest of getting something done tonight, here's the trajectory (although in no particular order):

OK, that's enough for now.  I'm almost caught up with work.

The semantics of code

Some thoughts I've been turning over in my head the past few days involve the semantics of code.  Not the semantics addressed by the coded solution - the semantics of the code itself, which clearly do map onto the semantics addressed by the coded solution.

Here's the thing.  Each section of code is made of meaningful parts in a hierarchical structure. The parts are things like "variables", "loops", and this kind of low-level thing. By recognizing these (which, yes, are pretty close to the syntactic objects they denote) and grouping them by purpose, a human programmer can intuit the intent of the programmer.  At a low level, the intent of the programmer is something like "get data out of this file" or "sort this list".  At a higher level, we work with APIs (which themselves have an internal semantic structure) to form semantic units that are closer to human actions, like "put this record in the database" or "show this box on the screen".

Once the intent of the programmer is understood (whether correctly or not), we can ask questions about that intent.  Does the code actually meet the intent?  If it doesn't, the code is wrong, and we have a coding error that should potentially be fixed.  This kind of check is a higher-level version of what static analysis does (static analysis matches patterns in the code against common errors, and warns the programmer that certain sections of the code look fishy).
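A tiny example of that low-level pattern matching, using Python's standard `ast` module.  The check itself (flagging comparisons to None made with == or != instead of `is`) and the function name are just an illustration picked here, not something from any particular static analysis tool.

```python
import ast

def find_none_comparisons(source):
    """Return line numbers where something is compared to None with == or !=.

    Coarse on purpose: a chained comparison is flagged if it contains any
    equality operator and any None operand.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        operands = [node.left] + node.comparators
        uses_equality = any(isinstance(op, (ast.Eq, ast.NotEq)) for op in node.ops)
        has_none = any(
            isinstance(x, ast.Constant) and x.value is None for x in operands
        )
        if uses_equality and has_none:
            hits.append(node.lineno)
    return sorted(hits)

SAMPLE = """\
def pick(x):
    if x == None:
        return 0
    if x is None:
        return 1
    return x
"""
fishy_lines = find_none_comparisons(SAMPLE)
```

The checker knows nothing about what `pick` is for; it only recognizes a shape in the syntax tree that usually signals a mistake, which is exactly the gap between this and checking intent.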

Now.  At the highest level of the code, our semantic structures should look a whole lot like those expressed in the requirements and specifications documents.  These are human-readable documents that (hopefully) express the purpose of the code in a way that the programmer has implemented or is supposed to implement at some point in the future.

Here, too, errors can occur, and a semantic code checker that could understand something of English might be able to check whether the semantic structures look similar.  (And of course eventually will do just that.)

Moreover, tests could be derived from the semantic structures at the specifications/requirements level that could empirically check whether the code matches.

So that's the code-semantics aspect of my recent thinking.

Open Sauce: unlimited free testing for open-source projects

Interesting.  Interesting.

You know where I'm going with this blog, right?

I want to do textual analysis on both the blog and everything it links, then search for similar things so I can just let the blog run itself, more or less.

Instead of letting HackerNews find things for me, I can do it myself and maybe start feeding HackerNews instead of the other way around.

roots toolbox for web products

roots is a static site compiler written in node.js.

Hidden communities on Reddit

Tracing the non-explicit connections at Reddit, StackExchange, and SomethingAwful.

Wednesday, December 12, 2012

Better analytics

HNN had a pretty interesting thread up due to Analytics.js, an open-source aggregator for all your analytics framework calls.  There are also two open-source JS analytics tools mentioned: Piwik.org and OWA (Open Web Analytics), both of which look pretty nice.

Here's the thing: set up your events and analysis carefully and you can use your analytics as a workflow feed.  Metrics and alarms can trigger any kind of action.  That integrates directly into your business processes, and you're halfway cyborged.  That's the goal.

Landing page checklist

Checklist is basically another word for boilerplate (just at a higher level).

Magpie

Language of the week!  Magpie, a pattern-based language written on a VM (JVM and a higher-powered C++ machine that has fewer features implemented).

Development appears to be ongoing; might be worth looking at.

GYP - build builder

GYP (Generate Your Projects) is a build descriptor system.  Interesting.

Build monitoring

TravisLight is a build monitoring tool for online (continuous integration) builds.

Interesting database designer

Build your database using Excel-like tables. Online.

OpenERP

Whoa.  Open-source SAP replacement - and we all know what a profitable ecosystem SAP is!  This is open-source!

So ... gotta look at this in much more detail.

Update: And here is its slightly more scammy-seeming cousin VIENNA Advantage.  Apparently this one is built by a single company with what sure looks like a remote-control office (I am not free of this guilt myself, of course) who has reportedly opened their source.  If it's open access source, color me unimpressed, natch - and there are some textual weirdnesses on their cluttered front page as well that make me kind of wonder how serious they are.

But they exist.  And as such, they're data.

Update #2: dear Lord, their user interface is in Silverlight.

Design placeholders

HNN thread (lorem pixel)

PhantomJS

A full headless browser in JavaScript.  Good for testing and scraping, apparently - worth investigating.

10 reasons for message queues

This is a neat architectural article.

  • Decoupling
  • Redundancy and scaling
  • Elasticity and spike tolerance
  • Resiliency
  • Delivery guarantees
  • Sort order guarantees
  • Buffering
  • Transparency in data flow (at the queue boundaries anyway)
  • Asynchronous communication
Very cool indeed.
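A few of the points in the list above (decoupling, buffering, asynchronous communication, ordering) show up even in a toy in-process queue.  This sketch uses Python's standard `queue` and `threading` modules as a stand-in for a real message broker; the producer/consumer functions and the None sentinel are conventions chosen here.

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)      # blocks only if the buffer is full
    q.put(None)          # sentinel: no more work

def consumer(q, results):
    while True:
        item = q.get()   # blocks until something arrives
        if item is None:
            break
        results.append(item * item)

q = queue.Queue(maxsize=100)   # the buffer between the two sides
results = []
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()
producer(q, range(5))
worker.join()
```

Neither side calls the other directly; each only knows about the queue.  A real message queue adds the durability, redundancy, and delivery guarantees that an in-memory queue can't offer.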

Visualizing.org

I think maybe I linked visualizing.org once before, but the affairs of Zeus are kinda neat.

Amazon random shopper bot

Cute.

Tuesday, December 11, 2012

CodeMirror

CodeMirror is a JavaScript-based online code editor that looks pretty darned snazzy.  I'd like to combine it with other representations of the code - eventually I think this is where I'll be going for code editing.

Proclet

There's a lot of really cool stuff on CPAN.

Smaller samples -> greater variation

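De Moivre's law (the spread of a sample mean shrinks with the square root of the sample size, so small samples vary more) - and how people really don't get it. This is really a good read.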
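The effect is easy to see in a quick simulation.  A rough sketch, assuming draws from a standard normal distribution; the function name, trial count, and seed are arbitrary choices made here.

```python
import random
import statistics

def spread_of_sample_means(sample_size, trials=500, seed=0):
    """Standard deviation of the mean across many samples of a given size."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(10)    # theory says about 1/sqrt(10)
large = spread_of_sample_means(400)   # theory says about 1/sqrt(400)
```

Means of 10-item samples scatter several times more widely than means of 400-item samples, which is why the smallest groups tend to show up at both the top and the bottom of any ranking.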

Nginx+Lua?

Did you know Nginx can be scripted?  I sure didn't.  With Lua!

SVGO SVG optimizer

This is kinda neat.

Sunday, December 9, 2012

DataWrangler

A data wrangling tool from Stanford.  Data wrangling itself is actually a topic pretty near and dear to my heart.

Hacking Pokemon from inside and language security

A very cool playthrough of Pokemon Yellow that manipulates the item list to make it play a song [HNN thread] - related to "language security", which is limiting the security exposure due to overly Turing-powerful interfaces in unexpected places.

In other news, there is a whole community of people who spend their free time writing bots to play Pokemon for them.  That's pretty cool.

Data model XML

The Data Mining Group has published an XML standard "PMML" for the interchange of data models.

You know, I really need some kind of better semantic model for XML.  For myself, actually, but at the language level as well.  There's got to be a good way to think about XML as it relates to data structures or APIs or something, but somehow when I look at an XML specification I feel dry.

Building stuff with math

Great presentation of the math behind simulations, along with the github source of the presentation - doubly fantastic!

Partly based on Nature of Code, an online (and now e- and even paper!) book about Processing.js.

boilerpipe

When scraping, removal of boilerplate is job #1.  Boilerpipe is a library to do that (one of several, of course).  It provides a sort of "de-boilerplating" step.  (And this is probably a really good way of looking at things.)

On the same topic, a really fantastic overview of web scraping here.

How to use git again

More and more I think a workflow tool for daily use is just about essential.  Some curated git tips and workflows.

The business case for APIs

Good business-level justification for APIs.

Programming language competition

Here's a fun competition series - every month, there will be a goal, and competitors will develop a programming language specifically to meet that goal.

Auto-threading compilers

Here we go: Microsoft has developed a C#-like language whose compiler can decide which parts are amenable to parallel cores.  That's kinda neat!

Update 2012-12-14: A follow-up post by the same guy explaining how it works: by declaring some data structures immutable.  He suggests we read the paper. I didn't fully realize that Microsoft Research publishes papers...  Kind of a no-brainer now that I've considered the question at all.

Saturday, December 8, 2012

Friday, December 7, 2012

Coding quality

A couple of interesting posts about coding itself that I ran across:
  • Coding Horror: All Abstractions are Failed Abstractions
    The example being LINQ abstracting away the SQL, and sometimes making bad decisions in doing so.  I think in this case, my response is that an abstraction is useful as far as it goes, but a better programming system would allow you to capture all the useful levels of abstraction when and if needed.
  • Joel on Software: Making Wrong Code Look Wrong
    The original purpose of Hungarian notation.  I am blown away, having learned Hungarian notation in the second wave where it had become useless.
    But the point here is (to me) packing semantics into your syntactic code.  Wouldn't it be better to address the semantics from the start?
  • Programming like a Pirate
    A nice point made about overdoing the extraction technique - the point is for your code to explain what it does, not to have fifteen levels of abstraction everywhere.

Web API design

A white paper on how to build a Web API properly.

SQL injection for fun and profit

An absolutely fantastic article about SQL injection.  Don't treat SQL like a string! You will come to regret it.
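Here's the difference in miniature, using Python's built-in `sqlite3` module.  The table, the data, and the two lookup functions are made up for the demonstration; this is not code from the linked article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def lookup_unsafe(name):
    # DON'T: the input becomes part of the SQL text itself.
    return conn.execute(
        f"SELECT secret FROM users WHERE name = '{name}'"
    ).fetchall()

def lookup_safe(name):
    # DO: the value is bound to a placeholder, separate from the statement.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()

attack = "nobody' OR '1'='1"
leaked = lookup_unsafe(attack)   # the WHERE clause is now always true
blocked = lookup_safe(attack)    # treated as a (nonexistent) literal name
```

With string formatting, the attacker's quote characters rewrite the query and every row comes back; with the placeholder, the same input is just an odd-looking name that matches nothing.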

Command and Conquer in JS

This is apparently a complete implementation of an existing RTS game Command & Conquer in HTML5/JavaScript.  Which is really pretty amazing.

Data exploration

Data exploration in Unix.

But a few days later, the O'Reilly "Try R" class at CodeSchool was posted on HNN and prompted a rich outpouring of data science linking (and book recommendations, too).  MIT OpenCourseWare statistics class, and an online book on statistics and R, stand out.  Also the recommendation not to learn R without a simultaneous good grounding in statistics itself, because R is first and foremost a statistics platform.

frothkit

frothkit is a Web application framework for Objective-C.  And GNUstep is an Objective-C platform for the rest of us.

Monday, December 3, 2012

Inversion of control

Now here is an absolutely fascinating Perl module, Bread::Board, which implements "inversion of control".  It's a way to organize a large software system that permits you to define services, then allow them to be invoked and constructed in the right way without your needing to worry about the detail.

It's a declarative framework, is what it is.  Study the hell out of it.
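The general shape of the idea can be sketched in a few lines of Python.  To be clear, this is not Bread::Board's API; the `Container` class, its `register`/`resolve` methods, and the example services are all hypothetical, and each service is built once and then reused.

```python
class Container:
    """Maps service names to factories and builds each service on demand."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory):
        # The factory receives the container so it can ask for dependencies.
        self._factories[name] = factory

    def resolve(self, name):
        if name not in self._instances:
            self._instances[name] = self._factories[name](self)
        return self._instances[name]

class Logger:
    def __init__(self):
        self.lines = []

    def log(self, msg):
        self.lines.append(msg)

class Database:
    def __init__(self, dsn, logger):
        self.dsn = dsn
        self.logger = logger
        logger.log(f"connected to {dsn}")

c = Container()
c.register("dsn", lambda c: "sqlite::memory:")
c.register("logger", lambda c: Logger())
c.register("database", lambda c: Database(c.resolve("dsn"), c.resolve("logger")))

db = c.resolve("database")   # dependencies get built in the right order
```

`Database` never decides where its logger or connection string come from; the wiring is declared in one place, and that's the "inversion".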

Don't repeat yourself

DRY.  I'd seen it bandied about.  It's a good principle - but essentially it's one that you can only implement in LISP or another macro-enabled language.

Python productivity

One of Python's key features is its simple expression of some very cogent mental structures.  Here's a nice short article on that:

  • Dictionary and set comprehensions
  • Counter objects
  • JSON pretty printing
  • Quick and dirty Web service
(Note that I've construed the article a little differently from its author's intent.)
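The first three items in the list above fit in a dozen lines.  A quick sketch with made-up data, not the article's own examples:

```python
import json
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()

# Dictionary comprehension: word -> length.
lengths = {w: len(w) for w in words}

# Set comprehension: the distinct first letters.
initials = {w[0] for w in words}

# Counter: tally occurrences and ask for the most common.
counts = Counter(words)
top_word, top_count = counts.most_common(1)[0]

# JSON pretty printing.
pretty = json.dumps({"top": top_word, "count": top_count}, indent=2, sort_keys=True)
```

As for the quick and dirty Web service, in Python 3 running `python -m http.server` from a directory serves its files over HTTP, which may or may not be the trick the article had in mind.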

The secret of success: suck less

A really nice point by one of the authors of Bugzilla: suck less over time.  This should be incorporated into any software maintenance process.

Understanding Fourier transforms

An intuitive explanation of Fourier transforms, with comments contributing two different animations.  That's the kind of thing I like to see humanity doing!
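For the hands-on version, here is a naive discrete Fourier transform in plain Python.  It's the textbook O(n^2) definition rather than an FFT, and the test signal (a sine wave with 5 cycles across 64 samples) is chosen here just to have a known answer.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: one complex coefficient per frequency bin."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

n = 64
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
magnitudes = [abs(c) for c in dft(signal)]

# Look only at the first half; for real input the second half mirrors it.
peak = max(range(n // 2), key=lambda k: magnitudes[k])
```

Each coefficient measures how strongly the signal correlates with a rotation at that frequency, so a pure 5-cycle sine lights up bin 5 and essentially nothing else.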

Rx - nascent schema for JSON/YAML

Interesting.

Rich text editing

I'm kind of kicking the notion of rich text source code around in my head, although I'm honestly not sure it's a good idea.

The cheapest, easiest way for me to start doing this would simply be to kick some RTF out of Perl into a notes file and call Word on it.  If I go that route, Perl has some RTF support (RTF::Tokenizer) that will probably do what I need it to do.

Alternatives would be to use a different rich text editor.  Scintilla isn't it - but Wx does include a rich text control.  That's one route.  The other would be to run as a server and pop up a CKEditor window.  I have to admit I really like this notion.  (See also Maplat's DocsTextEditor; Maplat in general looks pretty cool and not entirely unlike what I'd like to do in this arena, in terms of the document management aspects.)

The obvious advantage to using a little server to serve up an app is that it's getting to be a lot easier to write UI as HTML5, and of course you get the remote option then.

Sirea

Wow!  Sirea is a Haskell framework for Reactive Demand Programming - tl;dr of that being "better Excel".  Kind of where I've wanted to go for years with dataflow programming, long before I got entangled with Decl.

But the Sirea project has immense amounts of ramifying thought behind it, to the extent that it must be made part of the Decl canon.

Saturday, December 1, 2012

Decision trees

So I've been doing various little bite-sized programming tasks (at HackerRank and Rosalind, both of which I heartily recommend) and thinking hard about what I'm doing with them, along with getting back into the literate programming game to do this.

One of the things that came up this week was decision trees.  Here's the thing - complex if-then-else structures come up pretty frequently in even the simplest of tasks, like serialization of data structures into domain-specific formats (game boards, for example).  And that kind of code, to me, is a write-only thing - given that I'm doing these things in little bits here and there and otherwise dealing with paying (non-translation) work and family, my rule is that everything I do has to be instantaneously comprehensible in bite-sized chunks.

So I realized that any if-then-else structure deeper than a single level should really by rights be expressed as a decision tree.  A decision tree is a (to me) more declarative, domain-specific way of expressing some varieties of rules.  Naturally, I thus came up with a simple expression language for decision trees and planned my first literate-programming macro - "decision".  This macro reads the decision tree description and outputs target code expressing it.  I am writing my own preprocessor, one that will work on both C and Perl - as well as any other language you choose.  We'll see whether it pays off, without necessarily taking me quite so far as a full Prolog-like unification engine.  (Although come to think of it, it could be extended to express just such rules...  That would be interesting.)
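The macro idea, a tree description in and target code out, can be sketched roughly like this in Python.  The tree format (nested tuples of condition strings and result expressions) and the names `emit` and `compile_tree` are invented for this illustration and are not the expression language described above.

```python
# Hypothetical format: a leaf is a string holding the result expression;
# an inner node is (condition, subtree_if_true, subtree_if_false).
TREE = (
    "piece is None",
    "'.'",
    ("piece.isupper()", "'white'", "'black'"),
)

def emit(tree, indent=1):
    """Return the lines of nested if/else code that express the tree."""
    pad = "    " * indent
    if isinstance(tree, str):
        return [f"{pad}return {tree}"]
    condition, if_true, if_false = tree
    return (
        [f"{pad}if {condition}:"]
        + emit(if_true, indent + 1)
        + [f"{pad}else:"]
        + emit(if_false, indent + 1)
    )

def compile_tree(name, args, tree):
    """Generate a function from the tree; only use on trusted descriptions."""
    source = "\n".join([f"def {name}({args}):"] + emit(tree))
    namespace = {}
    exec(source, namespace)
    return namespace[name], source

describe, source = compile_tree("describe", "piece", TREE)
```

The tree stays readable as data, and the generated `source` can be printed or written out for whatever target language the emitter is taught to produce.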

And so I did another absolutely fascinating random walk through programming land.  Here's what I found:

  • SmartDraw is a dandy target application - it allows you to draw diagrams and charts that are data-accessible, which I find absolutely fascinating.  It includes decision trees as one of its chart types.
  • One can, of course, learn decision trees, and there is plenty of literature available about that.  Note on the latter: references Milk, a machine-learning toolkit in Python.
  • Python Enterprise Application Kit does decision trees.  That looks like a dandy place to mine for ideas.  As does IBM's corresponding thing, which of course is not open source, but probably way more ramified.
  • Finally, Methods and Tools magazine (cool!) has a teeny little article about decision support systems from 2004.  That one reminds me of decision tables, which are essentially just an alternative way of expressing decisions that are somewhat less restricted than trees.  But good stuff!  Again, one of the benefits of decision tables is that they're human-comprehensible and, if well-designed, can easily be used to allow very detailed configuration by non-programmers.
  • As though to illustrate that very point, here is a timely and political Greek bond default decision tree that has been used to illustrate a complex situation in a rather tidy way.
  • And here's a nice article on the use of decision tree learning in gauging party affiliation, which is political but slightly less timely, given it's December.
So that's our random walk for the day.
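The decision tables mentioned in the list above are just as easy to sketch: rules as rows, first match wins.  The shipping rules, the column layout, and the `decide` function here are hypothetical, with None standing for "don't care".

```python
# Each row: (is_member, minimum_order_total, outcome).
# None in a condition column means the rule ignores that input.
RULES = [
    (True, 50, "free shipping"),
    (True, None, "discounted shipping"),
    (None, 100, "free shipping"),
    (None, None, "standard shipping"),
]

def decide(is_member, total):
    """Return the outcome of the first rule whose conditions all hold."""
    for member_rule, min_total, outcome in RULES:
        if member_rule is not None and member_rule != is_member:
            continue
        if min_total is not None and total < min_total:
            continue
        return outcome
    raise LookupError("no rule matched")
```

Because the rules are plain data, a non-programmer could edit the table (or a spreadsheet that feeds it) without touching `decide` at all.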

Processing

I always love Processing stuff.