Semantic programming: May 2013

Friday, May 31, 2013

Tutorial on AngularJS with Backlift persistence

Nice little tutorial. Backlift offers an API for data persistence (a straightforward service that makes sense to me!) and has provided a little tutorial on building a webapp in Angular that uses it.

A comparison of CSS preprocessors

Nice. Sass vs. Less vs. Stylus.

JS framework popularity

An analysis of GitHub Archive to gauge popularity trends in JavaScript webapp frameworks. I love this stuff.

Wednesday, May 29, 2013

Django HNN clone tutorial

Here's a cool thing - an "advanced Django tutorial" implementing an HNN-like site in Django.

MiniZinc is a language specifically for constraint programming problems. This article showcases it a little by applying it to a recent xkcd [oops, not so recent: 287].

Listed under AI unless I get a bee in my bonnet about constraint programming, which seems unlikely.
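For a feel of the kind of problem involved, here is a minimal sketch of the xkcd 287 puzzle (pick appetizers that total exactly $15.05). It is plain brute-force search in Python, not MiniZinc, and the menu prices are as I recall them from the comic, so treat them as an assumption.

```python
# Brute-force search standing in for a constraint solver (this is not MiniZinc).
# Prices are in cents to avoid float rounding; they are recalled from the comic.
MENU = {
    "mixed fruit": 215,
    "french fries": 275,
    "side salad": 335,
    "hot wings": 355,
    "mozzarella sticks": 420,
    "sampler plate": 580,
}
TARGET = 1505  # $15.05


def orders(items, target):
    """Yield every {item: count} combination whose prices sum exactly to target."""
    if not items:
        if target == 0:
            yield {}
        return
    (name, price), rest = items[0], items[1:]
    for count in range(target // price + 1):
        for partial in orders(rest, target - count * price):
            order = dict(partial)
            if count:
                order[name] = count
            yield order


solutions = list(orders(list(MENU.items()), TARGET))
for order in solutions:
    print(order)
```

A real constraint language lets you state only the menu and the target and leaves the search to the solver, which is the whole appeal.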

You can't handle security

In fact, nobody can. Here's an entertaining article giving three examples that will convince you of this.

There's a Matasano Challenge that will step you through the basics of security. I think I may work through this sometime soon.

Regex power

Here is a fantastic article about regular expressions, not as they are defined mathematically, but as implemented in real languages - with backreferences, regex matching becomes NP-complete... Which blows my mind. I had no idea!
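A small sketch of what backreferences buy you, using Python's standard `re` module. Neither pattern comes from the article; they are just well-known illustrations of regexes recognising things no textbook regular expression can.

```python
import re

# A "square" is some string written twice in a row. That language is not
# regular, but a single backreference handles it.
SQUARE = re.compile(r"(.+)\1")


def is_square(s):
    """True if s is some non-empty string repeated exactly twice."""
    return SQUARE.fullmatch(s) is not None


# The familiar unary primality trick: a run of n ones matches this pattern
# exactly when n is 0, 1, or composite (a block of 2+ ones repeated 2+ times).
NOT_PRIME = re.compile(r"1?|(11+?)\1+")


def is_prime(n):
    """True if n is prime, decided purely by regex backtracking."""
    return NOT_PRIME.fullmatch("1" * n) is None


print(is_square("abcabc"), is_square("abcab"))
print([n for n in range(30) if is_prime(n)])
```

The primality check is hopelessly slow for large n, which is rather the point: the engine is doing a search, not running a finite automaton.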

They're still unreadable, of course. Which means it might be worthwhile (1) to work through (or write) a really good tutorial set and (2) write some kind of ... regex preprocessing parser thing. Which probably already exists, of course.

Anyway, that article led me to the existence of Thue, a string-rewriting language that looks ... weird, to say the least. Note that Thue is definitively described on a Wiki devoted to esoteric programming languages.
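To make "string-rewriting language" concrete, here is a rough sketch of the idea in Python. It is a simplification, not a Thue interpreter: real Thue picks among applicable rules nondeterministically and has its own rule syntax and I/O, while this version always applies the first rule that matches, at its leftmost occurrence.

```python
def rewrite(rules, text, max_steps=1000):
    """Apply the first applicable (lhs, rhs) rule until no rule matches."""
    for _ in range(max_steps):
        for lhs, rhs in rules:
            if lhs in text:
                # Replace only the leftmost occurrence, one step at a time.
                text = text.replace(lhs, rhs, 1)
                break
        else:
            # No rule matched anywhere: the string is in normal form.
            return text
    raise RuntimeError("no normal form reached within max_steps")


# One rule is enough to sort a string of a's and b's: swap any out-of-order pair.
print(rewrite([("ba", "ab")], "babab"))

# Unary addition: deleting the plus sign concatenates the two runs of ones.
print(rewrite([("+", "")], "111+11"))
```

The program is nothing but the rule list, which is what makes the whole thing feel so weird.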

Tuesday, May 28, 2013

BitTorrent Sync

This is cool.

Up and Down the Ladder of Abstraction

Fantastic article on visualization as a tool for thinking, and simultaneously kind of a call to action for programming tool makers.

Open-source game clones

A whole long list of open-source game clones of all description.

Webapp design stuff

Ember is a JS Webapp framework. Here is the tutorial list.

Then there are Pure CSS and Topcoat that have shown up on HNN this week.

Sunday, May 26, 2013

ML tutorials

A couple of useful Stanford tutorials in machine learning: Andrew Ng's open courseware ML class, and a second tutorial building on that for the new stuff in deep learning.

Lucy - C-language Lucene for embedding in dynamic languages

A tutorial. Apache has also spun out its NLP tools into a separate project OpenNLP. And Lucene can query based on Levenshtein distance, which is pretty cool.
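For reference, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal sketch of the textbook dynamic-programming version in Python; it says nothing about how Lucene implements fuzzy queries internally, and the `fuzzy_lookup` helper is a hypothetical name of my own.

```python
def levenshtein(a, b):
    """Edit distance between a and b, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def fuzzy_lookup(term, vocabulary, max_distance=2):
    """Hypothetical helper: vocabulary entries within max_distance of term."""
    return [word for word in vocabulary if levenshtein(term, word) <= max_distance]


print(levenshtein("kitten", "sitting"))
print(fuzzy_lookup("lucene", ["lucerne", "lucy", "license", "lucene"]))
```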

Saturday, May 25, 2013

Bounce rate for blogs

Here's an interesting tip for blogs and other sites where "conversion rate" doesn't really apply: check after a short time to see if the user is still there (using a JS snippet on the page with a timer).

ydiff: structural diff for code

Interesting.

DataNitro: Python from Excel

Neat little article using DataNitro to link SQL databases to Excel. Neat.

Yet Another Blog Framework

Hexo. Also Wintersmith. All in all, frameworks are thick on the ground these days.

Flat file module in Perl

This is kind of neat. I love wheel reinvention - you usually learn something.

List of probability distributions

Statistics is cool.

UnQLite

This is neat - the NoSQL equivalent of SQLite, except for the license (SQLite is public-domain; UnQLite is not). I'm particularly fascinated by the fact that it embeds a small interpreter for data manipulation.

Friday, May 24, 2013

Bret Victor and data visualization

Wow. I'm not normally one to take the time to watch a video, but this one was worth it. Victor is essentially building drawings at the semantic level - his tool even expresses the semantics of the steps in words! Fantastic vision.

I can't even watch it straight through. The idea cascade is deafening.

Tuesday, May 21, 2013

Git is really a lot more opaque than it needs to be

Here's a fantastic tutorial about Git branching that has finally made a lot of things make sense to me. But seriously, the management commands provided, while they make perfect sense in terms of the actual things git is doing, make no sense whatsoever in terms of the semantics of managing branches and versions.

I conclude that git's actual UI is horrible. Moreover, there's no real reason for there to be multiple DVCSs anyway, since they all essentially do the same thing - implement the same abstract architecture. Again, the specifics of the UI obscure the actual things being done and why they're done.

A version history as represented in git (or any other DVCS) should be a part of the documentation in graphical form, like any other part of the documentation. That report should then be available as a contextual orientation for performing actions on the official history of the project. Specific branches and commits should be tracked explicitly and graphically - obviously while preserving the underlying command structure for those that think that way, but seriously. Git is crazy.

Monday, May 20, 2013

Sieve watches pages and takes action on changes

Neat component.

Sunday, May 19, 2013

RRDTool

A useful utility for working with and graphing time series.

Building .DS_Store files for the Mac-impaired

Neat.

Outgoing spam detection

Interesting perspective.

Cross-site request forgery

CSRF exploration tool. Interesting. Also provides mitigation techniques.

Dojo tutorial

Dojo for jQuery developers.

Docker container engine

Docker is a lightweight alternative to VMs.

Flaky: Functional number classes

Flaky is an open-source (Go) project implementing numbers in quasi-symbolic form to avoid accumulating error. This is a slick idea.
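I haven't dug into how Flaky represents its numbers, so the following is only an illustration of the underlying problem and of one exact alternative, using Python's standard `fractions` module as a stand-in. It is not Flaky's approach or API.

```python
from fractions import Fraction

# Ten dimes should make a dollar, but binary floats cannot store 0.1 exactly,
# so the rounding error accumulates with every addition.
float_total = sum([0.1] * 10)

# Keeping the value as an exact ratio of integers avoids the drift entirely.
exact_total = sum([Fraction(1, 10)] * 10)

print(float_total == 1.0, float_total)
print(exact_total == 1, exact_total)
```

Exact rationals only cover part of the ground (no square roots, for instance), which is presumably where a more symbolic representation earns its keep.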

StackOverflow hive mind

Ooh. Here's a site that lists the top-linked sites on StackOverflow for any given (programming-related) keyword!

Full circle in literate programming (for me)

I think I'm back to proposing literate programming in Word again. I've been considering approaches to LP that I would actually use - and waaaay back in prehistory I actually wrote the beginnings of a Word-based litprog tool.

Here's the thing. To somebody who's used it a lot - and God help me, I've used Word a lot - Word is a pretty nifty tool for editing human-readable text. The point is to provide tools that get out of the way of presentation of logic, both to the human and to the computer, right? So let's consider a document, in this case, to be any structured text that permits sections, headers, internal references, bookmarks, and footnotes. You could use LaTeX if you wanted, or some kind of Markdown, but for me, editing words comes easiest and most naturally in Word.

Now we map our document structure onto code structure. Again, I want something that supports macros (expressing macros in whatever language is convenient) and templates that are expressed as code. I also want widget tools that permit the encoding of some logic as diagrams with graphical editors of some kind.

I want tables embedded in the text to be accessible to macros. Still easy. And I want bullet lists and numbered lists to be similarly accessible.

Sections might be standalone invocations of templates. I'm not sure yet.

References go into a bibliography; the bibliography is literally links to additional libraries, organized on principles I haven't yet thought through. But as I've said elsewhere, or maybe here (forgive me, the paying work has been glorious lately) - I want to have something amenable to peer review. Model open coding on scientific publication.

So that's this week's goal.

The next secret weapon in programming languages

... according to Michael O. Church: strong static typing. Except, he is at pains to add, not the way Java does it, requiring strong typing for everything.

To which I reply: yeah, I can see that. It's static typing fundamentalism that riles me personally - but anything that allows even moderate automated checking of programming logic has to be a win. So certainly static typing has lots of places where it's useful, as long as it doesn't get in the way of quick expressiveness and clear oversight of the logic.

Saturday, May 18, 2013

A philosophy of programming languages

Presented here [hnn] for your delectation. I think I'd need the prerequisites even to approach this approach. Eventually, once I get the fledglings shoved out of the nest... Maybe I'll have some time this summer.

Midsummer update, July 29: HA! AS IF!

Figure design from Aaron Diaz

Not really semantic programming, but ... Aaron Diaz (of Dresden Codak authorship fame) talks about figure design for cartooning, and for me, cartooning is inherently a visual semantics and programming thing. Your mileage definitely varies.

Shazam's music fingerprinting algorithm

This is cool. Shazam has that groovy service that allows you to ask your phone what music you're hearing in the wild; it does this by taking a sliding FFT of the audio and matching salient points against a database of bazillions of songs it knows. It's patented, but the authors have published a paper about how it works.
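Here is a toy sketch of the "sliding FFT plus salient points" idea in dependency-free Python. It is heavily simplified and is my own reading of the general approach, not Shazam's code: it uses a naive DFT, keeps only the single loudest bin per window, and hashes pairs of nearby peaks. The window size, hop, and `fan_out` values are arbitrary assumptions.

```python
import cmath
import math


def spectrum(frame):
    """Magnitudes of a naive DFT for bins 0..len(frame)//2 (slow but simple)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def peaks(samples, window=64, hop=32):
    """Slide a window over the signal and keep the loudest bin of each frame."""
    result = []
    for start in range(0, len(samples) - window + 1, hop):
        mags = spectrum(samples[start:start + window])
        result.append(max(range(len(mags)), key=mags.__getitem__))
    return result


def fingerprint(samples, fan_out=3):
    """Hash each peak with the next few peaks as (bin_a, bin_b, frame_gap)."""
    p = peaks(samples)
    return {
        (p[i], p[j], j - i)
        for i in range(len(p))
        for j in range(i + 1, min(i + 1 + fan_out, len(p)))
    }


# A synthetic "song": a pure tone sitting exactly on DFT bin 8 of a 64-sample window.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
print(peaks(tone))
print(sorted(fingerprint(tone)))
```

Matching then amounts to looking up a clip's hashes in a big table built from known songs and checking that the hits line up in time; that part is left out here.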

Well, in 2009 there was a blog post about this, and in 2010 Roy van Rijn implemented it in Java and wrote a very well-thought-out account of the process. Very good article. Shazam's lawyers contacted him about it, "encouraging" him to take it down on the theory that even though the algorithm was patented in the US only, van Rijn might be contributing to violation elsewhere - even though he'd literally only written code to the specifications already published in the patent for Christ's sake this stuff makes me so mad I can spit but let's go on, shall we?

Anyway, there was an HNN link this week, and somebody else has published full working code on GitHub, so that's why I'm noting it now. Mostly I'm just fascinated by audio processing but don't have discretionary time right now that's not already spoken for.

Friday, May 17, 2013

GAWK as an AI language?

Here's a fascinating little post from the 90's about how programming students doing AI do a better job with GAWK, which permits (forces!) them to take a high-level view of the process and let Unix utilities do the detail work.

Neat!

Hacking Java bytecode

Here are parts one and two of a cute little series about Java bytecode. They're fun!

Wednesday, May 15, 2013

Error stream processing

Errordite allows you to triage your online error stream and take specific action. Always neat.

Random Turing machine simulator

This is really cool (the link goes to a configuration I like a lot, but you can randomly generate your own as well).
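If you want to play with the idea offline, here is a minimal sketch in Python: generate a random transition table, run it on a blank tape, and print one row per step. It is not the linked simulator; the one-dimensional circular tape, the state and symbol counts, and the seed are all my own assumptions.

```python
import random


def random_machine(states=3, symbols=2, seed=None):
    """Random table: (state, symbol) -> (symbol to write, head move, next state)."""
    rng = random.Random(seed)
    return {
        (q, s): (rng.randrange(symbols), rng.choice((-1, 1)), rng.randrange(states))
        for q in range(states)
        for s in range(symbols)
    }


def run(machine, steps=40, width=41):
    """Run on a blank circular tape and return a snapshot of the tape per step."""
    tape = [0] * width
    head, state = width // 2, 0
    history = []
    for _ in range(steps):
        write, move, next_state = machine[(state, tape[head])]
        tape[head] = write
        head = (head + move) % width  # wrap around instead of growing the tape
        state = next_state
        history.append(tuple(tape))
    return history


# Print the run as rows of text; the default two symbols map to "." and "#".
for row in run(random_machine(seed=287)):
    print("".join(".#"[cell] for cell in row))
```

Change the seed to get a different machine. Most of them are dull, which makes the interesting ones all the more satisfying.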

Here are some more I like: this, this, this, this, this, this, this!, this, this. This. This, this, this.

There's a thin range between excessive order and sheer chaos that makes the best patterns.

Terra + Lua = something pretty impressive

Terra is a new low-level compiled language that is designed to be embedded in Lua. Lua functions as its metalanguage, permitting all kinds of cool metaprogramming tricks while still compiling down to something that runs essentially as fast as C.

This Terra/Lua phenomenon is getting pretty fascinating.

Problems with MongoDB in the wrong situation

Here's a pretty fascinating set of experiences with MongoDB at Scrapinghub.

Tuesday, May 14, 2013

Record-based data retrieval

Back in the prehistoric days of 2001 or so, when I was working on the original wftk, I kept running into the fact that really, if I wanted a quick tool for defining business processes as workflow, I needed some kind of tool to define data records that could extract information from diverse sources and put it all together into a single record, then update the various data sources correctly when that record was changed.

The initial version of that tool was the repository manager and it consumed my coding thoughts until about 2004, when I qualified for EIC and stopped pretending I programmed for a living and switched to technical translation full-time. It was a complicated time for me.

Anyway, every time I start to do business-related programming, I want to revive the wftk. And every time I want to revive the wftk, I still want an easy and declarative way to describe data sources, load data into a more-or-less-structured record that I can work with in memory, and write things back out to appropriate data storage in an arbitrarily complex manner.

Some of that is simply impossible to do declaratively, of course. Sometimes you really just have to build a class to handle things. But sometimes it should be possible to do things without (much) coding.

Somewhat to my surprise, CPAN doesn't really have a lot of things that help me here. I think this is just a weird way to think about data or something (even though the storage of arbitrary records into a key-based list is what has the NoSQL crowd all riled up these days, and I was doing it in 2001). Ideally, my records had the following features:

Key-value retrieval from an arbitrary composite data source
Key-value storage to the same source as needed
Hierarchy: a value could be a list of records or text
Version control: a value or the whole record can retain a history or a version number
Document management: a value can be a document
Named values with a path-based composite key retrieval/update mechanism
Composite data sources: some parts of a record could be stored in different places; for example, I might retrieve a list of documents from a directory, then keep arbitrary data about some of them in an SQL table. Retrieving all the information from the table and the directory is a single retrieval call.
Layers of composition: sometimes I don't want to retrieve from expensive data sources

That's pretty much it. The result is that I should be able to set up a declarative, record-based data structure (a repository) that lets me give a list (a source) and a key, and get back a record.
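A rough sketch of the shape I have in mind, in Python rather than Perl for brevity. Every name here (`DictSource`, `Repository`, `dig`) is hypothetical, the "sources" are plain dicts standing in for a directory and an SQL table, and versioning and layering are left out entirely.

```python
import copy


class DictSource:
    """Stand-in for a real backend (a directory, an SQL table, ...)."""

    def __init__(self, rows, fields):
        self.rows = rows      # key -> dict of stored values
        self.fields = fields  # the fields this source is responsible for

    def fetch(self, key):
        return copy.deepcopy(self.rows.get(key, {}))

    def store(self, key, record):
        self.rows[key] = {f: record[f] for f in self.fields if f in record}


class Repository:
    """Maps a list name to the sources that together make up one record."""

    def __init__(self, **lists):
        self.lists = lists

    def get(self, list_name, key):
        record = {}
        for source in self.lists[list_name]:
            record.update(source.fetch(key))  # one call, many sources
        return record

    def put(self, list_name, key, record):
        for source in self.lists[list_name]:
            source.store(key, record)  # each source keeps only its own fields


def dig(record, path):
    """Path-based retrieval: "notes/0/text" walks nested dicts and lists."""
    value = record
    for part in path.split("/"):
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value


files = DictSource({"report.doc": {"size": 1024}}, fields=["size"])
notes = DictSource({"report.doc": {"notes": [{"text": "needs review"}]}}, fields=["notes"])
repo = Repository(documents=[files, notes])

record = repo.get("documents", "report.doc")
print(dig(record, "notes/0/text"))
record["notes"].append({"text": "approved"})
repo.put("documents", "report.doc", record)
```

The record that comes back is just a plain dict, which fits the preference for keeping the data anonymous and letting the repository remember where things live.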

This is a broad enough requirement that I think I should split it out of the wftk into its own data system. I just don't know what to call it. Data::Repo isn't bad, actually. Data::Storage is taken by a 2004 skeletal data storage structure that seems to have been abandoned. Data::Record is taken. (There are fascinating things to be discovered on CPAN probes like this. Data::CapabilityBased, GitStore, VCI, Treemap, TAP::DOM and TAP::Parser for working with Perl test output, and (ding ding ding) Data::DPath, which is precisely what I want for composite value retrieval from my in-memory record structures.)

I think Data::Repo. So mote it be. The record itself is just a hash (or optionally could have a class assigned as Data::Repo::Record) - the idea is that you'll just remember where you got it and will store it back to the same place should that be necessary. If you extract a D::R::Record explicitly, it will remember where it came from, but seriously, I'd rather keep the data anonymous in terms of type and work up data manipulation things that are right in the language instead of OO in nature.

Friday, May 10, 2013

The Economics of Spam

Interesting.

Sunday, May 5, 2013

Simple online editor

Nice. Integrates with Bootstrap.

Geometee

Cool. Parameterized drawings and knobs to twiddle the parameters.

Understanding scam victims

Nice article about looking at security flaws from the victim's point of view.

Cmx.io

A cartoon renderer for the generation of XKCD-style comics.

Security problems with server-generated JS responses

I inherently like code generation, but here's an article on security flaws in Ruby-generated JS (RJS) architectures.

Reverse engineering RDS-TMC

A fun article about reverse-engineering the encryption on RDS-TMC traffic updates in Europe.

Responding to excessive volume

A neat little discussion at StackOverflow (superuser) about how to take an action when the microphone volume goes too high, with some really fascinating responses, including one cobbling it together with Pure Data, an open source visual programming language I'd never heard of. (It's a dataflow thing.)

Saturday, May 4, 2013

Lucene

Lucene is a search engine indexing library. It has a Perl wrapper. Sooner or later I'm going to have to investigate it.