Tuesday, December 31, 2013

Scheme. In Perl?

You know, TinyScheme is under a permissive BSD-style license (close enough to public domain for most purposes), and it would be really easy to embed into Perl. In fact, you could probably just use Perl's own scalars as a basic Scheme type and get most of the R5RS spec for free that way...

I think I want to think harder about that. There are a lot of things I want to do in Lisp-y environments without giving up my CPAN, and TinyScheme is really, really lightweight. It's quite conceivable that you could also use the Perl module structure (and CPAN) to organize your Scheme libraries. Hmm...

But even more fascinatingly, a Scheme embedded in XS could also be used to get some introspective (and on-the-fly!) access to other XS-defined items. Say, OLE.XS, which still has me stymied in my effort to get IEMech up and running in the modern age.

So: TinyScheme is probably the lightest-weight choice, but there's also Chibi Scheme, which looks like a more full-featured alternative, plus of course there's already Inline::Guile for serious scheming.

(I'm going to go with TinyScheme, though, and make something self-contained, a la SQLite, that already has everything you need to start with Scheme.)

Oh, here's a link on what hygienic macros are. Makes sense.

Anyway, the motivation for all that is that Sussman physics-in-Scheme course (Structure and Interpretation of Classical Mechanics). That thing is incredibly densely written! Fortunately I'm married to a theoretical physicist who, so far, appears impressed at my staying power and is more than happy to talk about anything I find unclear. At length.

Speech synthesis

So I just found out about something called Vocaloid, which is a Yamaha product, closed source, for text-to-singing. There's nothing remotely like it in the open source world, but I suspect you could cobble it together (mostly) from parts already in existence by feeding the melody and timing into an existing text-to-speech engine.
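To make the "melody and timing" idea a bit more concrete, here's a minimal sketch of the music-side arithmetic only, assuming the engine can take one pitch and one duration target per syllable (SSML's `<prosody>` element or MBROLA-style phoneme/duration/pitch input are the kind of hook I have in mind; I haven't checked which engine exposes what). `NoteEvent` and `prosody_targets` are hypothetical names, and the actual hand-off to an engine isn't shown.

```python
from dataclasses import dataclass

A4_MIDI = 69     # MIDI note number of concert A
A4_HZ = 440.0    # concert A, in Hz

@dataclass
class NoteEvent:
    syllable: str
    midi_note: int   # pitch as a MIDI note number (60 = middle C)
    beats: float     # length in beats

def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)

def prosody_targets(events, bpm):
    """Turn note events into (syllable, pitch in Hz, duration in seconds).

    Assumption: the TTS engine downstream accepts one pitch/duration target
    per syllable; feeding these to a real engine is not shown here."""
    seconds_per_beat = 60.0 / bpm
    return [(e.syllable, round(midi_to_hz(e.midi_note), 2), e.beats * seconds_per_beat)
            for e in events]

melody = [NoteEvent("twin", 60, 1), NoteEvent("kle", 60, 1),
          NoteEvent("twin", 67, 1), NoteEvent("kle", 67, 1)]
for target in prosody_targets(melody, bpm=120):
    print(target)
```

The hard part, of course, is making the engine actually sing those targets without sounding like a theremin.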

Possible engines might be:
  • MARY (another project from the DFKI)
  • eSpeak - this is more or less the Linux default
  • flite, which is Festival lite.
  • And Festvox, CMU's voice-building tools for the full Edinburgh Festival system.
I can only imagine that Vocaloid is a unit-selection (concatenative) synthesizer with a large database of recorded samples; the output sounds pretty natural, in contrast to the state of the art in fully synthetic speech. It would be a lot of fun to play with this stuff, especially in the context of music.

Friday, December 27, 2013


More LogicBlox/Datalog stuff:
I'm lagging in my investigations.


Here's a complete Markdown editor for formatting posts for StackOverflow and other Markdown-based sites. It's bewilderingly good.


windres is the GNU binutils resource compiler: the Unix-y toolchain's utility for compiling Windows resource (.rc) files.

Code search engines

An HNN thread listing lots of code search engines.

A useful catalog of easing animations

Easing and bounces in JavaScript - neat!
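Easing functions are just maps from normalized time t in [0, 1] to normalized progress. Here's a minimal Python sketch of three of the classic Robert Penner curves (quadratic in, quadratic out, and the bounce), plus a helper to tween between two values; the function names are my own, not any particular library's.

```python
def ease_in_quad(t: float) -> float:
    """Start slow and accelerate: progress grows with t squared."""
    return t * t

def ease_out_quad(t: float) -> float:
    """Start fast and decelerate: the mirror image of ease_in_quad."""
    return 1 - (1 - t) * (1 - t)

def ease_out_bounce(t: float) -> float:
    """Penner's bounce: four parabolic arcs, each lower than the last, settling at 1."""
    n1, d1 = 7.5625, 2.75
    if t < 1 / d1:
        return n1 * t * t
    if t < 2 / d1:
        t -= 1.5 / d1
        return n1 * t * t + 0.75
    if t < 2.5 / d1:
        t -= 2.25 / d1
        return n1 * t * t + 0.9375
    t -= 2.625 / d1
    return n1 * t * t + 0.984375

def tween(start, end, t, easing):
    """Interpolate from start to end at time t in [0, 1] along an easing curve."""
    return start + (end - start) * easing(t)

# Sample a 100px drop with a bounce at five evenly spaced moments.
print([round(tween(0, 100, i / 4, ease_out_bounce), 1) for i in range(5)])
```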

WordPress deployment

Genesis: a deployment tool for WordPress.


Harp looks like a nice static site generator/server built around modern preprocessor languages (Markdown, LESS, CoffeeScript and the like).

Game Developer magazine

Free open archives.

Text/NLP stuff

Like clockwork, I collect links to interesting NLP stuff.

  • GATE: a General Architecture for Text Engineering
  • textteaser extracts summaries with machine learning
  • Dezi is a Solr-alike search platform in Perl, built on Apache Lucy
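For flavor, here's the simplest extractive-summary technique I know of: score each sentence by how frequent its non-stopword words are across the whole text, then keep the top few in their original order. This is the classic word-frequency heuristic, not TextTeaser's actual algorithm, and the stopword list is a toy.

```python
import re
from collections import Counter

# Toy stopword list, for illustration only; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "for", "on"}

def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, n=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average rather than total, so long sentences don't win just by length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

text = ("Ontologies organize concepts. Ontologies relate concepts to other concepts. "
        "Lunch was fine.")
print(summarize(text, n=2))
```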

Sound generators

A couple of sounds for ... stuff.

  • A whole site with a bunch of tunable sound generators. These are fascinating. "Noise machines."
  • Dial-up modem sound effects, for the sake of nostalgia.

Post-mortem (literally) of the Knight Capital bug

It was a deployment process error. Worth reading!


Again. I think. Maybe this isn't - it's really more of a statistics thing.


A beautiful icon font.

Audio analysis with image processing algorithms

This is very cool.

Machine learning to find memory leaks

This is pretty cool.

Negative captcha

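A Ruby library for building forms that are harder for bots to fill in: the form carries fields that human users never see, and any submission that fills them in is treated as a bot. This is nice from the standpoint of technique.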
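Here's a minimal sketch of the technique in Python, not the Ruby library's API: real field names are obscured with an HMAC of the name and a timestamp, a tempting honeypot field is hidden from humans with CSS, and submissions that fill the honeypot, are too old, or don't carry the expected obscured names are dropped. `SECRET`, `HONEYPOT`, and the helper names are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"change-me"   # hypothetical server-side secret
HONEYPOT = "email"      # tempting name; the form hides this field with CSS
MAX_AGE = 60 * 60       # reject forms rendered more than an hour ago

def obscured_name(real_name: str, timestamp: int) -> str:
    """Hide a real field's name behind an HMAC so bots can't guess its purpose."""
    msg = f"{real_name}:{timestamp}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]

def form_fields(real_names, timestamp):
    """Map each real field name to the obscured name used in the HTML form."""
    return {name: obscured_name(name, timestamp) for name in real_names}

def accept(form: dict, real_names, now=None):
    """Return the decoded submission, or None if it looks like a bot."""
    now = int(time.time()) if now is None else now
    try:
        ts = int(form["timestamp"])
    except (KeyError, ValueError):
        return None
    if not 0 <= now - ts <= MAX_AGE:
        return None                 # stale or forged timestamp
    if form.get(HONEYPOT):
        return None                 # humans never see this field, so it stays empty
    names = form_fields(real_names, ts)
    if any(obscured not in form for obscured in names.values()):
        return None                 # field names don't match this timestamp
    return {real: form[obscured] for real, obscured in names.items()}

ts = int(time.time())
fields = form_fields(["name", "comment"], ts)
bot = {"timestamp": str(ts), fields["name"]: "x", fields["comment"]: "buy now",
       HONEYPOT: "bot@example.com"}
print(accept(bot, ["name", "comment"]))   # None: the honeypot was filled in
```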


Website templates based on Bootstrap.


I always love anything about d3.js, so October saw three links:

Thursday, December 26, 2013


I've launched into a project at last: a terminological database toolset that's been on the drawing board for a very long time indeed (with what I hope will prove to be an accompanying business model to harness it all). One thing I ran across in my initial data schema for termbases is the "context" field. Logically, that context is an ontological specification, a kind of "where am I?" in the larger scheme of the vocabulary of the full language, and it's used to distinguish the terminology used in specific applications.
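Here's a minimal sketch of how such a context field might work, assuming a context is a path from general to specific domains and that lookup prefers the most specific context enclosing where you currently are. `TermEntry`, `lookup`, and the sample entries are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TermEntry:
    term: str
    context: tuple   # path from general to specific, e.g. ("computing", "networking")
    definition: str

def lookup(entries, term: str, where: tuple) -> Optional[TermEntry]:
    """Return the entry for `term` whose context is the most specific prefix of `where`."""
    best = None
    for entry in entries:
        if entry.term == term and where[:len(entry.context)] == entry.context:
            if best is None or len(entry.context) > len(best.context):
                best = entry
    return best

termbase = [
    TermEntry("port", (), "a harbor"),   # the general-language sense
    TermEntry("port", ("computing",), "a hardware connector"),
    TermEntry("port", ("computing", "networking"), "a numbered TCP/UDP endpoint"),
]
print(lookup(termbase, "port", ("computing", "networking", "firewalls")).definition)
```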

Well, so I delved into the available literature about ontology tools, of which there are many. I hadn't really looked in many years, and they've proliferated, especially in the context of the semantic web and bioinformatics, so here's a partial linkdump of the information that looks most promising.

  • A decent overview.
  • KIF = Knowledge Interchange Format [here] [in SUMO]. This is a declarative language with Lisp-like syntax used to express first-order logic statements about concepts.
  • SUMO = Suggested Upper Merged Ontology. Sort of the basic list of concepts that underlie everything else.
  • Tips on ontology development. And pitfalls.
  • A few basic tutorials about the semantic web. It's based on a graph database model (for semantic networks).
  • RDF is used to encode chunks of graph data in the semantic web (it can also be embedded in HTML, of course).
  • Ontological data about RDF documents is encoded in RDFS and OWL.
  • OntoSelect is apparently a cataloging service for ontologies found/discovered on the Semantic Web - here's a mention, but the service itself seems to be down.
  • Biology is another area where ontologies are used extensively; here, for example, is the Experimental Factor Ontology. Note that it is downloadable in OWL format. Experimental ontologies are generally available as free, open-source data, while anything with any hint of commercial usefulness is blisteringly expensive (pharmacovigilance, for example - the adverse effects ontology used for drug side effect reporting).
  • The Gene Expression Atlas is also ontology-based. This is a real-world application of something that used to be considered hard AI, and I find that pretty fascinating in and of itself.
  • Aaand a protein ontology that I've linked partly because proteins are inherently cool and partly because the legend is so pretty.
  • Bioinformatics ontologies aren't always published in OWL; OBO is a competing standard. The OBO Foundry catalogs a bunch of ontologies.
  • Ontobee.org is an ontology viewer for ontologies published on the Web. Here's the display for an adverse event ontology.
There are reams and reams of information about ontologies these days. Those are the more interesting things I ran across while determining that I don't need to go into that kind of depth to do what I need to do.
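To make the graph model concrete: an RDF graph is a set of (subject, predicate, object) triples, and RDFS layers inference rules on top. Here's a minimal sketch, assuming plain prefixed-name strings instead of real IRIs, that applies two RDFS entailment rules until nothing new appears: rdfs:subClassOf is transitive, and instances inherit their class's superclasses (rules rdfs11 and rdfs9 in the RDF Semantics spec). The `ex:` names are made up.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def match(triples, s=None, p=None, o=None):
    """Yield the triples matching a pattern; None is a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t

def rdfs_closure(triples):
    """Add inferred triples until a fixed point is reached."""
    closed = set(triples)
    while True:
        new = set()
        for a, _, b in match(closed, p=SUBCLASS):
            # Transitivity: a subClassOf b, b subClassOf c  =>  a subClassOf c
            for _, _, c in match(closed, s=b, p=SUBCLASS):
                new.add((a, SUBCLASS, c))
            # Inheritance: x type a, a subClassOf b  =>  x type b
            for x, _, _ in match(closed, p=RDF_TYPE, o=a):
                new.add((x, RDF_TYPE, b))
        if new <= closed:
            return closed
        closed |= new

graph = {
    ("ex:Kinase", SUBCLASS, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS, "ex:Protein"),
    ("ex:PKA", RDF_TYPE, "ex:Kinase"),
}
inferred = rdfs_closure(graph)
print(sorted(o for _, _, o in match(inferred, s="ex:PKA", p=RDF_TYPE)))
# ['ex:Enzyme', 'ex:Kinase', 'ex:Protein']
```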

Sunday, December 22, 2013


Saw a post on the Perl jobs list yesterday for an ETL expert. Apparently this is the (originally data-warehousing) name that Big Data people use for batch data transformations: Extract, Transform, Load.

What I just call "indexing" in the greater sense.
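The three stages are easy to see in miniature. Here's a sketch using only the standard library, assuming a CSV source and a SQLite target; the data and the fixed EUR-to-USD rate are made up for illustration.

```python
import csv
import io
import sqlite3

RAW = """name,salary,currency
Ann,50000,USD
Bob,42000,EUR
"""

EUR_TO_USD = 1.37   # hypothetical fixed rate, for illustration only

def extract(text):
    """Extract: parse the raw CSV into dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize every salary to USD."""
    for row in rows:
        amount = float(row["salary"])
        if row["currency"] == "EUR":
            amount *= EUR_TO_USD
        yield row["name"], round(amount, 2)

def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS staff (name TEXT, salary_usd REAL)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
print(conn.execute("SELECT * FROM staff ORDER BY name").fetchall())
# [('Ann', 50000.0), ('Bob', 57540.0)]
```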


Sequences from neural nets

Generating Sequences With Recurrent Neural Networks on the arXiv.

Viral spread model

A Scalable Heuristic for Viral Marketing Under the Tipping Model on the arXiv.
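The tipping model itself is simple to simulate. In one common formulation (assumed here), each node has a threshold, and an inactive node activates once at least that many of its neighbors are active; activation never reverses, so the spread runs to a fixed point. This only simulates spread from a given seed set; the paper's contribution, as I understand it, is a heuristic for choosing a small seed set, which isn't reproduced here. The graph and thresholds are invented.

```python
def tipping_spread(graph, thresholds, seeds):
    """Run the tipping model to a fixed point and return the final active set.

    An inactive node becomes active once at least thresholds[node] of its
    neighbors are active; active nodes stay active."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbors in graph.items():
            if node not in active and sum(n in active for n in neighbors) >= thresholds[node]:
                active.add(node)
                changed = True
    return active

# A small undirected graph as adjacency lists (made-up example).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
thresholds = {"a": 2, "b": 2, "c": 2, "d": 2, "e": 1}
print(sorted(tipping_spread(graph, thresholds, {"a", "b"})))   # everyone tips
print(sorted(tipping_spread(graph, thresholds, {"a"})))        # one seed isn't enough
```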


Some kind of open-source machine learning project.

Sentiment analysis

... at Stanford, using deep learning.


Schema is a Clojure DSL for defining "data shapes" - lightweight types, effectively. Looks pretty snazzy. Types are useful; heavy use of them, however, gets in the way of reuse.
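The "data shapes" idea translates loosely to Python. Here's a minimal sketch, not Schema's API: a shape is a type, a one-element list for homogeneous lists, or a dict of required keys, and `check` returns a list of errors instead of throwing. Extra keys are allowed, which is one way to keep shapes light enough not to get in the way of reuse.

```python
def check(shape, value, path="value"):
    """Validate `value` against `shape`; return a list of error strings (empty if valid).

    A shape is a Python type, a one-element list [shape] for a homogeneous
    list, or a dict mapping required keys to shapes (extra keys are allowed)."""
    if isinstance(shape, type):
        return [] if isinstance(value, shape) else [f"{path}: expected {shape.__name__}"]
    if isinstance(shape, list):
        if not isinstance(value, list):
            return [f"{path}: expected list"]
        return [err for i, item in enumerate(value)
                for err in check(shape[0], item, f"{path}[{i}]")]
    if isinstance(shape, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected dict"]
        errors = []
        for key, sub in shape.items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(check(sub, value[key], f"{path}.{key}"))
        return errors
    raise TypeError(f"unsupported shape: {shape!r}")

Person = {"name": str, "age": int, "emails": [str]}
print(check(Person, {"name": "Ann", "age": "forty", "emails": ["a@x.org", 7]}))
# ['value.age: expected int', 'value.emails[1]: expected str']
```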

Bayesian updating

... of probability distributions.
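Bayesian updating in its simplest discrete form: multiply the prior by the likelihood of each observation, renormalize, and let the posterior serve as the prior for the next observation. A sketch with three hypothetical coin biases, using exact fractions so the arithmetic can be checked by hand.

```python
from fractions import Fraction

def update(prior, likelihood, observation):
    """One step of Bayes' rule over discrete hypotheses:
    posterior(h) is proportional to prior(h) * P(observation | h)."""
    unnormalized = {h: p * likelihood(observation, h) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

def coin(observation, bias):
    """P(observation | the coin lands heads with probability `bias`)."""
    return bias if observation == "H" else 1 - bias

biases = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]   # hypothetical hypotheses
belief = {b: Fraction(1, 3) for b in biases}                 # uniform prior
for flip in "HHT":
    belief = update(belief, coin, flip)   # each posterior is the next step's prior
print({str(b): str(p) for b, p in belief.items()})
# {'1/4': '3/20', '1/2': '2/5', '3/4': '9/20'}
```

After H, H, T the posterior is proportional to b^2 (1 - b) for each bias b, which works out to 3/20, 2/5, and 9/20.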

Course in Machine Learning

Another one.


Given any process, there is a method for identifying and prioritizing risks. This is the kind of systems thinking that goes into the procl folder in my notes.

GPG audit

GPG is, of course, an important piece of software in the security world. It's kinda crufty and old, and it probably needs an audit. Tptacek on HNN says that, more than an audit, it needs some decent code documentation, hence my idea of an exegesis, a deliteralization of sorts.

Anyway, multilevel code understanding and presentation.

Angular tutorial

Tutorial here.


Free game engine. Makes my laptop sound like a jet engine warming up for takeoff - but then essentially anything does that, so it's not really an antifeature.


An attempt (abortive, apparently) to use Bayesian techniques to detect stupidity. Obviously, this can't detect stupidity at the semantic level, but it might be able to pick up on syntactic markers of stupidity. It's an interesting exercise.
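For reference, here's the textbook version of the Bayesian technique: a multinomial naive Bayes classifier with add-one smoothing. This is a generic sketch, not what the linked attempt did. The labels and training sentences are made up, and a real attempt would need far more data (and features beyond bare lowercase words to catch syntactic markers).

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}          # label -> Counter of word occurrences
        self.doc_counts = Counter()    # label -> number of training documents
        self.vocab = set()

    def train(self, text, label):
        words = tokens(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Sum log probabilities so long texts don't underflow to zero.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for w in tokens(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

nb = NaiveBayes()
nb.train("u r so dumb lol", "low-effort")
nb.train("lol whatever u say", "low-effort")
nb.train("the evidence suggests a different conclusion", "considered")
nb.train("consider the evidence before you conclude", "considered")
print(nb.classify("lol u wish"))   # low-effort
```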


Sleep is a Perl-like scripting language for Java, which is pretty cool.


UnQLite is an embedded, serverless database engine in the spirit of SQLite, but for NoSQL-style key/value and document storage.


A translator between C type declarations and English descriptions of them, in both directions. Pretty fascinating!
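The core of such a translator, in the C-to-English direction, is the "right-left" reading of declarators: postfix `[]` and `()` bind tighter than a prefix `*`, and parentheses regroup. Here's a minimal recursive-descent sketch that handles only base types, pointers, sized or unsized arrays, and empty-parameter functions; the output wording imitates the classic cdecl style only approximately, and `explain` is a hypothetical name.

```python
import re

BASE_TYPES = {"void", "char", "short", "int", "long", "float", "double", "signed", "unsigned"}

def explain(decl: str) -> str:
    """Explain a simple C declaration in English.

    Handles base types, '*', '[N]' arrays, parentheses, and '()' functions;
    no qualifiers, structs, or parameter lists."""
    toks = re.findall(r"[A-Za-z_]\w*|\d+|[*()\[\]]", decl)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def declarator():
        """Return (name, phrases), with phrases ordered from the name outward."""
        stars = 0
        while peek() == "*":
            take("*")
            stars += 1
        if peek() == "(":
            take("(")
            name, phrases = declarator()
            take(")")
        else:
            name, phrases = take(), []
        # Postfix [] and () bind tighter than prefix *, so they are read first.
        while peek() in ("[", "("):
            if take() == "[":
                size = take() if peek() != "]" else ""
                take("]")
                phrases.append(f"array {size} of" if size else "array of")
            else:
                take(")")
                phrases.append("function returning")
        phrases.extend(["pointer to"] * stars)
        return name, phrases

    base = []
    while peek() in BASE_TYPES:
        base.append(take())
    name, phrases = declarator()
    if pos != len(toks):
        raise SyntaxError(f"unexpected {peek()!r}")
    return " ".join(["declare", name, "as", *phrases, *base])

for d in ["char **argv", "int *a[3]", "int (*fp)()", "int *(*foo)[10]"]:
    print(explain(d))
```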


A framework for anonymous speech online.

The Architecture of Open Source Applications

Online book. I think I linked it already - well, it deserves multiple linking.

Gingko: tree-shaped organization of text

An interesting perspective on document organization: hierarchical, two-dimensional organization of text into successive layers of detail. I kinda like it.
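The "successive layers of detail" layout is basically a breadth-first traversal: each depth of the tree becomes a column, read left to right from overview to detail, while a depth-first walk gives the ordinary linear document. A sketch with a hypothetical `Card` type, not Gingko's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Card:
    text: str
    children: list = field(default_factory=list)   # finer-grained detail cards

def columns(cards):
    """Lay a card tree out as columns: column 0 holds the top-level cards,
    column 1 all their children, and so on, preserving document order."""
    cols, level = [], list(cards)
    while level:
        cols.append([card.text for card in level])
        level = [child for card in level for child in card.children]
    return cols

def linear(cards):
    """Read the tree depth-first: each card followed by its own detail."""
    return [text for card in cards for text in [card.text, *linear(card.children)]]

doc = [
    Card("Thesis", [Card("Argument A", [Card("Evidence A1")]), Card("Argument B")]),
    Card("Conclusion"),
]
for i, col in enumerate(columns(doc)):
    print(i, col)
```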



Maude is a language and system based on rewriting logic, covering much of the same ground as other logic programming tools.
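The core idea of rewriting, in miniature: terms are trees, rules replace matching subterms, and you keep rewriting until nothing applies (the normal form). Here's a Python sketch of Peano addition as two rules; Maude's own syntax and matching machinery are far richer, and the tuple encoding here is an arbitrary choice.

```python
# Terms are nested tuples: ("0",) is zero, ("s", t) is the successor of t,
# and ("+", a, b) is a + b. This encoding is an arbitrary choice for the sketch.
ZERO = ("0",)

def s(t):
    return ("s", t)

def step(term):
    """Apply one rewrite rule somewhere in `term`; return None if no rule applies.

    Rules (Peano addition):   0 + N -> N        s(M) + N -> s(M + N)"""
    if term[0] == "+":
        left, right = term[1], term[2]
        if left == ZERO:
            return right
        if left[0] == "s":
            return s(("+", left[1], right))
    # No rule fires at the root, so try the subterms, leftmost first.
    for i, sub in enumerate(term[1:], start=1):
        reduced = step(sub)
        if reduced is not None:
            return term[:i] + (reduced,) + term[i + 1:]
    return None

def normalize(term):
    """Rewrite until no rule applies; these two rules always terminate."""
    while (nxt := step(term)) is not None:
        term = nxt
    return term

def num(n):
    """Build the Peano numeral s(s(...0...))."""
    t = ZERO
    for _ in range(n):
        t = s(t)
    return t

print(normalize(("+", num(2), num(1))) == num(3))   # True
```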

Brilliant vs. insane code

Here's an odd little ditty musing about a line of Python Stavros Korokithakis (perhaps HNN's StavrosK?) ran across:
```python
def GetContourPoints(self, array):
    """Parses an array of xyz points and returns a array of point dictionaries."""

    return zip(*[iter(array)]*3)
```
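Hmm. Roughly like it says on the label, it takes a flat sequence of coordinates (x, y, z, x, y, z, ...) and returns them grouped into (x, y, z) triples, in order (tuples, though, not the dictionaries the docstring promises). But as Stavros notes, it's not at all obvious how it does that. You have to reason your way through it.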

It's clockwork, and quite clever, but it isn't the way people think (except insofar as people build clockwork, and write Python like this, in order to do things like this). In terms of code understanding, this code is not self-documenting in any way. To determine programmer intent, we have to simulate what it does and then work out why it does that.

It's kind of like a syntactic artifact of a semantic reasoning process, one that we can recover (hopefully!) with careful reasoning. But the original reasoning is gone.
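Simulating it step by step makes the trick visible: `[iter(array)]*3` is a list holding the same iterator three times, so each tuple `zip` builds pulls three consecutive items from that one iterator. A sketch with made-up coordinates, next to a spelled-out equivalent (`group_triples` is my own name).

```python
coords = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # flat x, y, z, x, y, z, ...

# [iter(coords)] * 3 repeats a reference to ONE iterator; it does not
# create three independent iterators.
it = iter(coords)
refs = [it] * 3
assert all(ref is it for ref in refs)

# zip takes one item from each argument in turn, so every tuple it builds
# consumes three consecutive items from that single shared iterator.
clever = list(zip(*[iter(coords)] * 3))

def group_triples(flat):
    """The same grouping, spelled out: drop any incomplete trailing group, as zip does."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat) - len(flat) % 3, 3)]

print(clever)   # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
assert clever == group_triples(coords)
```

Note that in Python 3 `zip` is lazy, so the original function returns an iterator of tuples rather than a list. Python 3.12's `itertools.batched(coords, 3)` does the same grouping readably, except that it keeps a short final batch where `zip` silently drops it.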


Random user generator

Randomuser.me gives you a random user profile - "Lorem ipsum for people". (See how this ties back to the contact management thing?)

So here's a semantic pole for ya: people. They just keep coming up all over the place. And they often share a lot of things with one another. So why don't we have a range of tools like this one, for people, for companies, etc.?

Sort of a semantic toolbox kinda thing.

Contact management

A recurring problem for everybody who deals with people. Which is ... everybody. [musings] [hnn]

Seems to me that part of the problem is that not every application of contact management requires a full-on heavy-artillery solution. So - as with many, many other domains - there is a kind of sliding scale of complexity that could be modeled using a set of mapped semantic domains.

I really think this concept is going to pay off once I have it clear in my head.


By the way, that was the first post from stuff I bookmarked in September. I seem to have a 3-month lag that is more or less constant, at this point.

Funny vs. LOL

Identifying semantic ... poles? Foci? by textual analysis. Starting with the distinction between funny and LOL in posted graphics. [slides] [hnn]

I'd like to do that with the stuff I'm linking from here. See how close my tag cloud is to the detailed semantic knobbiness.

UI patterns

So I've come back quasi-full circle, finding myself thinking of Wx-enabled smart widgets based on the wftk (seriously, it's like it's 2002 again), and last night after the laptop was off, I scribbled down the following note:
Basic mail UI against repo. Define mail-like queues as a thing.
The first sentence is a UI pattern (basic mail reader, with groups, then messages, then a document view for the mail itself), and the second is the data pattern it presents to the user (the concept of a message queue, possibly with threads, certainly with some kind of topic grouping).
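A rough sketch of what "mail-like queue" might mean as a data pattern, with one method per pane of the basic mail UI: topics, then threads, then a single-message document view. Every name here is hypothetical; nothing is taken from the wftk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    id: str
    topic: str                       # the group a mail reader lists first
    subject: str
    body: str
    reply_to: Optional[str] = None   # parent message id, for threading

class MailLikeQueue:
    """Hypothetical data pattern behind a basic mail UI:
    topics, then threads, then a single-message document view."""

    def __init__(self):
        self.messages = {}           # id -> Message, in arrival order

    def post(self, msg: Message):
        self.messages[msg.id] = msg

    def topics(self):
        """Pane 1: the groups."""
        return sorted({m.topic for m in self.messages.values()})

    def threads(self, topic):
        """Pane 2: top-level messages in a topic, each with its direct replies' ids."""
        replies = defaultdict(list)
        for m in self.messages.values():
            if m.reply_to is not None:
                replies[m.reply_to].append(m.id)
        return [(m.subject, replies[m.id]) for m in self.messages.values()
                if m.topic == topic and m.reply_to is None]

    def view(self, msg_id):
        """Pane 3: the document view of one message."""
        m = self.messages[msg_id]
        return f"{m.subject}\n\n{m.body}"

q = MailLikeQueue()
q.post(Message("1", "termbase", "Context field?", "How deep should it go?"))
q.post(Message("2", "termbase", "Re: Context field?", "Not very.", reply_to="1"))
q.post(Message("3", "ui", "Mail pattern", "Groups, messages, view."))
print(q.topics(), q.threads("termbase"))
```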

This kind of pattern-based "architectural" programming could and should be carried down to the lowest possible levels of programming. That's a semantic mode of conceptualizing software.

Anyway, so I looked up "UI patterns", thinking that would get me that Yahoo effort (YUI, already noted elsewhere on this blog), but instead it turned up UI-patterns.com, a short-lived effort by Danish web developer Anders Toxboe to build a UI pattern gallery/database/article site. It seems to have run through 2010 and 2011 and then stopped. Plenty of spam in the comments sections, but otherwise a ghost town. Too bad, because it's pretty much what I'd like to start from in this attempt to come up with a language of UI pattern design.

Some good grist for the mill, anyway.

Filed under UI design, patterns, and architectural patterns, because I suspect the UI design has to be based on some conceptualization of the data that would be reflected in the system architecture.