Wednesday, February 29, 2012

Task: graphical scripting environment

I've been kicking this around a while, and I'm defining an explicit task. This would be a Wx GUI, a very simple one.
  • The basic notion: a project, really just a group of files or directory structures. The basic window would allow you to drag files onto a fresh group, and save/load groups to a file list. Opened in a directory with such a file list, the tool would load it automatically.
  • "Steps". Not sure where those are defined, but they're individual scripts to run against files or subsets of the files. Might want to toss in file filters to define subsets.
  • Step lists. Macro actions consisting of multiple steps. Later with a more sophisticated language (loops and tests, essentially - probably just generic Perl of some kind, munged into a DSL).
That gives us the basic GUI functionality of Okapi, but in a highly configurable shell. This kind of thing could be wrapped around anything.
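
A minimal sketch of that model, leaving the GUI out entirely. Everything here is an assumption made for illustration: the names `Project`, `Step`, and `run_steps`, the use of glob patterns as the file filter, and steps being plain Python callables rather than external scripts.

```python
import fnmatch
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One script to run against a subset of the project's files (hypothetical)."""
    name: str
    action: Callable[[str], str]  # takes a file path, returns some result
    pattern: str = "*"            # file filter that defines the subset

    def run(self, files: List[str]) -> List[str]:
        # Apply the action only to files matching the filter.
        return [self.action(f) for f in files
                if fnmatch.fnmatchcase(f, self.pattern)]


@dataclass
class Project:
    """A named group of files; the GUI would fill this by drag and drop."""
    files: List[str] = field(default_factory=list)

    def run_steps(self, steps: List[Step]) -> Dict[str, List[str]]:
        # A step list is simply several steps applied in order.
        return {step.name: step.run(self.files) for step in steps}


project = Project(files=["intro.html", "notes.txt", "index.html"])
steps = [
    Step("shout-html", lambda path: path.upper(), pattern="*.html"),
    Step("name-length", lambda path: str(len(path))),
]
results = project.run_steps(steps)
```

Loops and tests in a step list would need a real language on top of this, which is where the DSL would come in.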

Visualizing log entries

Here's an utterly fascinating code article: parsing the SSH logs for cracking attempts and displaying them on an animated world map based on IP geolocation. Wow!
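
The log-parsing half of that can be sketched briefly. This is not the article's code: it assumes the usual OpenSSH "Failed password for ... from IP" message, the sample lines and addresses are made up (documentation address ranges), and the geolocation and map steps are left out.

```python
import re
from collections import Counter

# Matches the common sshd failure message; other message variants are ignored.
FAILED = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\d{1,3}(?:\.\d{1,3}){3})"
)


def count_failed_logins(lines):
    """Count failed password attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Made-up sample lines in the usual syslog layout.
sample_log = [
    "Feb 29 10:15:01 host sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2",
    "Feb 29 10:15:03 host sshd[1234]: Failed password for root from 203.0.113.5 port 4243 ssh2",
    "Feb 29 10:16:10 host sshd[1240]: Accepted password for michael from 192.0.2.10 port 5000 ssh2",
    "Feb 29 10:17:44 host sshd[1251]: Failed password for root from 198.51.100.7 port 22 ssh2",
]
attempts = count_failed_logins(sample_log)
```

Each counted address would then go through an IP geolocation lookup to get the coordinates for the map.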

Towards a better API framework

From the server side of APIs, here's an interesting article on the past (Rails-style MVC frameworks) and the future (HATEOAS, essentially).

This deserves some thought. Maybe a plain function isn't the best way to organize code, and maybe OOP isn't quite there yet, either. Maybe something like an API is a valid level to understand code at.

I think I need sleep.

The Julia language

Julia is a fast math language. The benchmarks are really impressive, and the language itself seems quite readable.

Pretty LISP

Pretty-lisp.org is an alternative display/edit environment for LISP code. I don't care for it, but I like the general idea of a graphically oriented code editor implemented in JS on a Web page. It makes me think that Decl might usefully implement a local server for the purpose of editing Decl code, and the notion of putting different options in different places on a page has always been very attractive.

Update 2012-11-14: Darn.  I should have taken a screen shot.  The site is dead.

Google Event Tracking

Huh. Google Analytics also supports JavaScript event tracking. Fascinating.

Web text scraping

Good article on figuring out where the heck the content is on a given Web page. References this blog post elsewhere.

Graphics and their placement

A fascinating article talking about David Ogilvy and a study he commissioned about ad copy back in the misty days pre-Net, and how his results apply to content on the Web.

REST and HATEOAS

And speaking of API clients, I'm seeing a bunch of stuff lately about RESTful APIs.

Kill Math

I can't remember whether I've blogged this before, but this kind of visualization stuff always turns me on. The Kill Math project is devoted to finding ways to model math interactively and intuitively, to solve problems without symbolic algebra. It's pretty neat stuff.

Task: Survey API clients on CPAN

I don't know what pathology this is, but I have the tendency, when approaching a technical task, of wanting to build the tools to build the tools to make the task easier [ob.xkcd 974]. A little while back I had the cute little idea of organizing my official tasks on blogs using the Blogger API. Well. CPAN has a lot of examples of API client access, and so I'd like to do a real survey of what's there, and compare and contrast.

If I can manage not to spend much time on tools to organize a survey, or research in general, I might manage to do something.

Poetry generation in Ruby

Slick little example Ruby program that takes today's headlines from a list of RSS feeds, then finds (quasi) rhyming couplets. The couplet metric is crude, but the results are still fun. It would be slick to do this with better couplet metrics using NLTK or something.
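
For the record, a crude couplet metric can be as simple as the sketch below. It is in Python rather than Ruby, it is not the original program's metric, and the headlines are invented; two lines "rhyme" here if their last words differ but share the same final three letters.

```python
import re
from itertools import combinations


def last_word(headline):
    """Return the final word of a headline, lowercased, or '' if there is none."""
    words = re.findall(r"[a-z']+", headline.lower())
    return words[-1] if words else ""


def crude_rhyme(a, b, tail=3):
    """True if the last words differ but end in the same `tail` letters."""
    wa, wb = last_word(a), last_word(b)
    return bool(wa) and bool(wb) and wa != wb and wa[-tail:] == wb[-tail:]


def find_couplets(headlines, tail=3):
    """Return every pair of headlines that passes the crude rhyme test."""
    return [(a, b) for a, b in combinations(headlines, 2)
            if crude_rhyme(a, b, tail)]


# Invented headlines standing in for the RSS feed titles.
headlines = [
    "Markets rally after surprise election",
    "Scientists question data collection",
    "Local team wins the final",
]
couplets = find_couplets(headlines)
```

A pronunciation dictionary would replace the letter comparison with a comparison of final phonemes, which is the obvious next step toward better metrics.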

Tuesday, February 28, 2012

Berkeley DB - NoSQL before it was cool

The title essentially summarizes the article. But it's a valid insight.

Bayes classification of HNN posts

I love this meta stuff. I also love the idea of something to read the news for me. Ha.

NYT on automatic book writing

The NYT has the story of a man who has self-published 200,000 books, essentially (obviously) all auto-generated from Web searches. That sounds pretty cool.

HTML in JavaScript

Now here's a really neat way to "compress" HTML going out on the wire, by rendering it in JavaScript. I like this notion a lot.

Stylometry

I remain unconvinced that stylometry can do more than identify "statistical identity trends", but it's still interesting work. [boing boing]

Getting experience with big data

A list of suggestions of how to get initial experience with large datasets.

Monday, February 27, 2012

The Great Web Framework Shootout

[github] More or less a Rosetta Stone for about twenty different Web frameworks in various languages.

Sunday, February 26, 2012

Infunl API

Just as an example of how an API can be set up (and because I find logs interesting and an API about logs doubly so), here's Infunl's API. Infunl tracks user visits in the browser (apparently), kind of like Google access stats.

Tuesday, February 21, 2012

APIs in general

Just had a thought today that might be worth keeping. I've been thinking about Web-based APIs and how to use them in Decl, and it occurred to me that really the way I do databases is a decent way to think of APIs. In other words, once I've defined (and named) an API, I should simply be able to refer to it and have a session automagically created. But if I want to create multiple sessions, then I do that in a separate instantiation step.

Once I've defined an API in a module somewhere, I should be able to write scripts against it quickly and easily. I hope this makes sense when I read it later.

Update: Here's a nice example of how a Perl RESTful API class can look: ElasticSearch.
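
To pin the thought down, here is a rough sketch in Python rather than Decl. The class names, the registry, and the example URL are all hypothetical, and `get` only builds and records a URL instead of making a request.

```python
class Session:
    """One connection to an API; records calls instead of doing real HTTP."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.calls = []

    def get(self, path):
        url = f"{self.base_url}/{path.lstrip('/')}"
        self.calls.append(url)  # placeholder for an actual request
        return url


class Api:
    """A named API definition with a default session created on first use."""

    _registry = {}

    def __init__(self, name, base_url):
        self.name = name
        self.base_url = base_url
        self._default = None
        Api._registry[name] = self

    @classmethod
    def named(cls, name):
        # Refer to an API by name, the way a named database would be used.
        return cls._registry[name]

    def session(self):
        # Explicit instantiation step for anyone who wants several sessions.
        return Session(self.base_url)

    def get(self, path):
        # Using the API directly creates the default session automatically.
        if self._default is None:
            self._default = self.session()
        return self._default.get(path)


Api("blog", "https://api.example.invalid/v1")
first_url = Api.named("blog").get("/posts")
extra_session = Api.named("blog").session()
```

A quick script only ever touches `Api.named(...)`; the separate `session()` call is there for the multiple-session case.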

Workflow and the Mechanical Turk

I'm reasonably sure I've made this decision before, but I'm making it again: workflow will be in core Decl.

I've been thinking of a Blogger-based task manager (finding tasks defined in Blogger posts). To consider those native tasks, we again need the concept of a map. The result would be something like:
  • Define an API (Blogger) and a filter on it (getting all tasks defined in the blog headers).
  • Map those tasks onto a workflow structure by defining how changes to one view of the data change the other.
  • Work with the workflow tools built into the language.
What those tools will end up being is somewhat vague, of course, but essentially workflow will let us define when the system checks for changes to tasks, and what it will do when there are changes found. Any data source can then be mapped onto workflow tasks and treated as human interaction.

Case in point: here's an article comparing oDesk and Mechanical Turk. Imagine if we mapped both oDesk and MTurk onto a common API, then represented the whole thing as a parallel workflow with job queues farmed out to oDesk and MTurk. Fast, scalable tasks would go to the Turk, while longer-term relationships with trained workers could be modeled on oDesk. Quality control for MTurk might be built into the API, or it might be explicitly represented in a workflow module - that's obviously an open question, but hopefully it makes you think about the possibilities inherent in giving your programs human intelligence, quasi built into the language.

Saturday, February 18, 2012

User interface design

A slideshow on designing better user interfaces.

Design patterns in JavaScript

Interesting (online) book on JavaScript design patterns.

Target's data mining

Interesting article about Target's data mining and how they can tell a woman is pregnant - and how they avoid creeping her out with that knowledge, instead subtly modifying just her coupon mailers to include more baby-oriented coupons.

Interesting. Creepy - but interesting.

Monday, February 13, 2012

Concatenative programming

"Why concatenative programming matters", a good little article introducing me to what I always called "stack-based languages". Now they're concatenative. Still a great article.

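The core idea fits in a toy evaluator, sketched here in Python. The word set and the `evaluate` function are invented for illustration and are not taken from the article. Every word is a function from stack to stack, so concatenating two programs composes them.

```python
def _binary(fn):
    """Build a word that pops two values and pushes fn(a, b)."""
    def word(stack):
        b = stack.pop()
        a = stack.pop()
        stack.append(fn(a, b))
    return word


def _dup(stack):
    stack.append(stack[-1])


def _swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]


# A tiny, made-up vocabulary; anything else is parsed as an integer literal.
WORDS = {
    "+": _binary(lambda a, b: a + b),
    "-": _binary(lambda a, b: a - b),
    "*": _binary(lambda a, b: a * b),
    "dup": _dup,
    "swap": _swap,
}


def evaluate(program, stack=None):
    """Run a whitespace-separated program against a copy of the given stack."""
    stack = list(stack) if stack else []
    for token in program.split():
        if token in WORDS:
            WORDS[token](stack)
        else:
            stack.append(int(token))
    return stack


whole = evaluate("2 3 + dup *")
# Running the two halves one after the other gives the same stack.
halves = evaluate("dup *", evaluate("2 3 +"))
```

Splitting a program at any token boundary and running the pieces in sequence gives the same result as running the whole thing, which is the "concatenative" part.
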
GUI Architectures

This is an utterly fascinating article I simply haven't had the time to read yet, on GUI architectures. I like this kind of post-hoc analysis/unification/whatever kind of article.

Open-source Django apps

I'm a little unsure exactly what a Django app comprises, given that Django is a CMS, but ... here's a list of some.

Sunday, February 12, 2012

Raganwald: why literate programming matters

Must be literate programming week.

Some good points here - points that were good twenty years ago but still don't go far enough to really support me to the extent I want to be supported.

Update 2012-06-02: A more careful read reveals that this is a really good article about the semantic issues lurking behind the human activity of programming.  I'm glad some anonymous person searched on "Raganwald twitter" today.  (Still don't know what this has to do with Twitter ... well, of course, now it does.)

Why Patrick McKenzie doesn't self-host WordPress

A typically fascinating article by everyone's favorite tech entrepreneur about how hosting WP at $200/month saves him serious money. Extremely valuable for its insight into how his hosting company does scaling. tl;dr: Load balancer, then Varnish caching proxy, then nginx for static content, then finally Apache for dynamic content.

Evolution of pictures again

Remember that evolution of images last year or whenever? Here's a guy who got really beautifully obsessed with investigating how that might be used as an image compression algorithm. Verdict: not such a hot image compression algorithm, but quite interesting nonetheless.

Uses of git

Here's an interesting little post on the many things you can shoehorn git into doing for you. This should be generalized; that hoary old software component database idea again.

Backbone patterns

Must be patterns week.

MapReduce patterns, algorithms, use cases

What it says on the tin.

Simpless: CSS compiler

Another Less-to-CSS compiler. These are good to study.

Ruby on Rails tutorial

Another thing I want to learn.

Book: Machine Learning for Email

Another good one from O'Reilly. The ebook is only 15 bucks.

Book: Machine Learning for Hackers

Coming out this month. I'll probably buy it.

Game development tip: capabilities instead of inheritance

Nice little insight: direct inheritance isn't always the right development pattern to use. It's illustrated in terms of game development, but it's a nice thing to keep in mind anyway.
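
A small sketch of what that can look like, with made-up class names and not taken from the article: instead of a deep class tree, each entity holds a dictionary of capability objects, and the game loop asks what an entity can do.

```python
class Movable:
    """Capability: the entity advances by `speed` on each update."""

    def __init__(self, speed):
        self.speed = speed

    def update(self, entity):
        entity.x += self.speed


class Damageable:
    """Capability: the entity has hit points that never drop below zero."""

    def __init__(self, hp):
        self.hp = hp

    def hit(self, amount):
        self.hp = max(0, self.hp - amount)


class Entity:
    """A game object defined by the capabilities it carries, not its class."""

    def __init__(self, name, **capabilities):
        self.name = name
        self.x = 0
        self.capabilities = capabilities

    def can(self, capability):
        return capability in self.capabilities

    def get(self, capability):
        return self.capabilities[capability]


def step(entities):
    """One turn of the loop: move whatever is able to move."""
    for entity in entities:
        if entity.can("movable"):
            entity.get("movable").update(entity)


player = Entity("player", movable=Movable(2), damageable=Damageable(10))
rock = Entity("rock", damageable=Damageable(50))
step([player, rock])
player.get("damageable").hit(15)
```

Adding a new kind of object then means picking a different mix of capabilities rather than finding a place for it in a hierarchy.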

Weave: Web-based Analysis and Visualization Environment

I haven't looked too closely at this, but it looks neat.

Nice user feedback pattern

Connex.io has a tasty little feedback pattern: just ask somewhere prominent, "Are you happy with the way this works? (yes/no)". If the user clicks no, drop down a text area to ask why.

This is really a nice idea!

Suffering-oriented programming

Clever little article:
  • First make it possible
  • Then make it beautiful
  • Then make it fast
  • Rinse and repeat
Words to live by, friend. Words to live by.

Saturday, February 11, 2012

Target application: Link mill

Remember the guy who did the nice sorting algorithm visualizations, Aldo Cortesi? He has a great suggestion for an application/service: a link mill manager.

1. Generalized feed consumer: RSS, subreddits, Google+, Facebook, Twitter, etc. All of this goes into not a reader, but a link extractor; for each link, the system keeps track of who recommended it and when.

2. For each link queued, it's presented (along with who recommended it, I suppose, and some kind of extract) to the user, who can do an initial pass of "interesting"/"not interesting".

3. Anything judged interesting is downloaded, through Instapaper or as PDF or what have you, and placed in an easily downloaded reading queue. As you read, you can rate the link, which will propagate back to a rating of the source.

Voila, instant feed management. I'd add that you probably want some way to manage outgoing sharing, via blog post or what have you; to that end, you probably want some way to mark the really interesting things.
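
The three steps map onto a small data model, sketched below. All names are hypothetical, feed fetching and downloading are left out, and a source's score is assumed to be the plain average of the ratings given to links it recommended.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Link:
    url: str
    recommended_by: List[Tuple[str, str]] = field(default_factory=list)  # (source, when)
    interesting: Optional[bool] = None  # set by the first triage pass
    rating: Optional[int] = None        # set after reading


class LinkMill:
    def __init__(self):
        self.links = {}
        self.source_ratings = defaultdict(list)

    def ingest(self, source, url, when):
        # Step 1: record who recommended each extracted link, and when.
        link = self.links.setdefault(url, Link(url))
        link.recommended_by.append((source, when))

    def triage(self, url, interesting):
        # Step 2: the quick "interesting" / "not interesting" pass.
        self.links[url].interesting = interesting

    def reading_queue(self):
        # Step 3: interesting links that have not been read and rated yet.
        return [link.url for link in self.links.values()
                if link.interesting and link.rating is None]

    def rate(self, url, rating):
        # A rating propagates back to every source that recommended the link.
        link = self.links[url]
        link.rating = rating
        for source, _when in link.recommended_by:
            self.source_ratings[source].append(rating)

    def source_score(self, source):
        ratings = self.source_ratings.get(source, [])
        return sum(ratings) / len(ratings) if ratings else None


mill = LinkMill()
mill.ingest("rss:example-feed", "https://example.invalid/a", "2012-02-11")
mill.ingest("twitter:someone", "https://example.invalid/a", "2012-02-11")
mill.ingest("rss:example-feed", "https://example.invalid/b", "2012-02-11")
mill.triage("https://example.invalid/a", True)
mill.triage("https://example.invalid/b", False)
queue_before = mill.reading_queue()
mill.rate("https://example.invalid/a", 4)
```

Marking the really interesting links for outgoing sharing would be one more field on `Link`.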

Update 2/26/2012: That was quick: Linkrdr.

Friday, February 10, 2012

Open-source target: Hacker's Diet Online

So I've been using the exercises from the Hacker's Diet, an adaptation of the old 5BX standby we all know and love, and poking around Walker's site, I see that he's put together an online service you can use to track your Hacker's Diet data (scroll down to the bottom for code references). And the fascinating thing about that site is not just that it's open source, but that it is the first actual instance of literate programming using Nuweb I've ever seen in the wild. [Note to self: poke around that site's links some more.]

It deserves praise for that alone.

Let's make an MP3 encoder!

Pretty fascinating article. If I ever need to do sound, this is probably a good place to start.

Burrito-Bot

The Burrito-Bot (github link) is a Python module that plays a Flash game. I tried one of those once, and it was great fun.

Another attempt to express the goal

I'd like to be able to express software in terms of its semantic content - then change it very quickly in response to changes in requirements. Ideally, I'm really after semantic-level (instead of neural-level) machine learning.

In other words, I'm still trying to do exactly what I wanted to do twenty years ago, except now I have better tools.

Tuesday, February 7, 2012

Open-source target: Hackful

Hackful is the European equivalent to HNN (just started). They're developing the code from scratch in Ruby on Rails. I think contributing to that would be a nice goal, and there's certainly a need given the fact that they've just started.

Sunday, February 5, 2012

Speech recognition

Mulling over possible ways to use speech recognition in my translation business, I discovered that Windows 7 actually has a speech recognition engine built in. It does a pretty darned good job, too. Unfortunately, it appears that my translation skills are more or less built on serialized output through my fingers: I tend to be able to formulate the translation only in phrases, and I stumble when trying to put them into words vocally. I hope that's just going to be a matter of training, because if I learn stenography only to find out I can't actually translate as fast as I type, I'm going to be sorely disappointed.

Anyway, speech recognition is still interesting in terms of output bandwidth from my brain, so here are a couple of links:
  • Microsoft documentation library for SAPI. I know this type of link tends to rot pretty quickly thanks to Microsoft's ongoing efforts to erase their own history, but it'll be good for a year or two anyway.
  • A 2007 overview article on Microsoft speech recognition.
  • CMU Sphinx is a popular open-source speech recognition engine that deserves examination. The vocabulary hints that the segment being translated ought to make available for typing should also really help the quality of speech recognition, so eventually I can envision some pretty darned good input techniques.
Anyway, yet another field that needs investigation.

A whole spam site?

Google spam, I think, at thetechnologyreview.com - it looks like a blog, but the English ... isn't. It hits well on keywords, though. So is it written by bots, or by people who don't speak English?

Who knows?

"GET UPDATES ON LASTEST TECHNOLOGIES, ANDRIOS, IPHONE, LAPTOP & MUCH MORE…"

Update 2014-11-28: Ah, a shame. It's evaporated.

Saturday, February 4, 2012

Modern language wish list

Here's a cool post.
  • String manipulation (regexp, splitting, joining, replacing)
  • Polymorphism
  • Basic containers (list, array, hash map, set)
  • I/O, including easy binary (and, I'd add, UTF-8)
  • Web requests
  • URL manipulation
  • Garbage collection
  • Namespaces
  • Homoiconicity
  • Extensible syntax
  • Math (actual numbers, matrices, vectors, equations)
  • Units on numeric values
  • Time (time, dates, calendars)
  • Error on numbers (+1 for tracing the source)
  • Unification/pattern matching
  • Before/after functions (aspects)
  • Parser
  • CSV library (I'd say more serialization built in than that, but it's going in the right direction)
  • Good error messages
  • Immutable values
  • Explicit model of time (for concurrency)
  • Graphics
  • Sound
  • Common file formats
  • Game input
  • GUI
  • First-class functions
  • Easy way to store data in flat files (which I think is kind of redundant with CSV above)
Decl is kind of going in this direction.

Friday, February 3, 2012

Open-source target: Bookmarkly

A quick node.js bookmarking site example, open-sourced by Dan Grossman. [HNN]

JavaScript fractal viewer

YAJSFW. This one's very smooth.

Mental models of software

While hacking (just a very little) on the Fly stenography tutorial in Python, getting it to run on Windows, I was musing a little bit about just how I do that. How do I get from a general description of a task ("Make this run on Windows.") to actually making the changes to the code needed to get things running?

Clearly, the initial thing is to build a mental model of the software, then - guided by that model - look at the specific parts of the code that might be problematic. But what does that mental model look like?

Embarrassingly, I haven't dedicated any time at all to researching the literature on mental models of programming, despite having written a blog called Semantic Programming for two years. A good place to start might be with this paper (CMU and Microsoft Research) and its citations.

A brief sketch of the mental model I've built of Fly might be something like this:
  • The "main" file sets up the UI, then starts the thread; the thread starts the Plover listener and then goes into a standard Pygame loop.
  • During each turn around the loop, Fly checks up on all the parts of the UI, checks and handles keyboard and mouse events, and redraws the UI appropriately.
... You know, writing it all down would be impossible. (Which is why nobody ever does.) It is literally easier to just poke around in the code for a day, which is essentially the problem with programming. How do you get from code to a simulated mental model? That's probably more important.