Friday, December 30, 2011

Dedicated hosting at a reasonable price

I might move.

Top skills for 2012

Just FYI.

Interesting cryptography/number theory library

Interesting post on HNN recently (as always): a bloke whose uncle spent a lifetime implementing some pretty amazing algorithms just open-sourced them.

Perl and Twilio

Who knew?

Perl documentation in the news

OK, so there have been some efforts this year to get Perl looking as up-to-date as it actually is; the problem is that the language has been around so long that typical search results may be a decade old, even though the community has decisively moved on. Since competing languages haven't been around nearly as long, the result is that modern tasks look easier to do in, say, Ruby, because that's what shows up in relevant results.

My Lego inventory project should be a good fit for this: it's a typical Web-scraping database program, and it will make a good article. I just have to find out how to cross-post to, say, perl.com.

Waffles command-line ML toolset

Waffles is a comprehensive set of command-line tools for doing machine learning and data mining.

I really, really need to sit down and do a survey/implementation tool thing.

Sunday, December 25, 2011

Scaling a blog

Another scaling article - worth reading, tomorrow.

Task: Lego inventory tracker

This one's actually underway, a Christmas present for my son. I'll write it up separately somewhere - perhaps on the Vivtek site itself! (A blast from the past - I haven't written anything new there since probably 2009; Blogger has been so much more convenient.)

Task: News scraper/tracker

I want to scrape the Reuters news feeds (later, others) into a database for various analytical purposes [eg]. That's going to consist of a daemon on my fileserver that checks the feeds on a periodic basis and loads things into a database. Then we'll do other analysis on that database. I'm most interested in linking stories and identifying trends.

Yeah, OK, I know this isn't groundbreaking research. It's new for me, though. And it will be a good microcosm of scraping tasks for declaratization as well as a valuable component for all kinds of things. So ... it's a task.
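For the record, here's a minimal sketch of that daemon in Python, using only the standard library. It assumes RSS 2.0 feeds and a single SQLite table keyed on each story's link. The schema, the function names and the 15-minute default interval are all made up for illustration, and the actual Reuters feed URLs are deliberately left out.

```python
import sqlite3
import time
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per story, keyed on its link so that
# re-polling a feed does not create duplicate rows.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stories (
    link    TEXT PRIMARY KEY,
    title   TEXT,
    pubdate TEXT,
    feed    TEXT
)
"""


def open_db(path):
    """Open (or create) the SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def parse_rss(xml_text):
    """Yield (title, link, pubDate) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        yield (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        )


def _count(conn):
    return conn.execute("SELECT COUNT(*) FROM stories").fetchone()[0]


def store_items(conn, feed_name, items):
    """Insert stories not seen before; return the number of new rows."""
    before = _count(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO stories (link, title, pubdate, feed) VALUES (?, ?, ?, ?)",
        [(link, title, pubdate, feed_name) for title, link, pubdate in items],
    )
    conn.commit()
    return _count(conn) - before


def poll_forever(conn, feeds, interval_seconds=900):
    """Daemon loop: fetch every feed in `feeds` ({name: url}), then sleep."""
    while True:
        for name, url in feeds.items():
            try:
                with urllib.request.urlopen(url, timeout=30) as resp:
                    added = store_items(conn, name, parse_rss(resp.read()))
                print(f"{name}: {added} new stories")
            except Exception as exc:  # keep the daemon alive on network or parse errors
                print(f"{name}: fetch failed ({exc})")
        time.sleep(interval_seconds)
```

Running it would look something like `poll_forever(open_db("news.db"), {"reuters": FEED_URL})`, with `FEED_URL` filled in by hand. Keying on the link is the simplest way to dedupe. Story linking and trend analysis would then run as separate queries against the table.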

Trading platforms

A couple of trading platforms that were advertised on a blog I follow. Probably stupid even to think about securities trading this year. Or next.
  • Trade Architect from TD Ameritrade.
  • OptionsHouse allows you to do simulated trading to start off and has flat trading fees for real trades. Interesting.

John Carmack on static code analysis

Short version: DO IT.

2011 Visualization Roundup

What it says on the tin.

Markov chains in Chutes and Ladders

... Cute.
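The usual trick is to treat each square as a state in an absorbing Markov chain. Here's a minimal sketch that assumes a hypothetical board, passed in as a dict of chutes and ladders. It is not the real board layout, and not necessarily the linked article's method. The expected number of spins t satisfies t = 1 + Q t, where Q holds the transition probabilities between non-final squares. The code below iterates that equation rather than inverting I - Q.

```python
def expected_turns(board_size, jumps, sides=6, tol=1e-12):
    """Expected number of spins to get from square 0 to square `board_size`.

    `jumps` maps the square a chute or ladder starts on to the square it
    sends you to. Overshooting the last square leaves you where you are
    (a house-rule assumption). Iterates t = 1 + Q t until it settles.
    """
    e = [0.0] * (board_size + 1)  # e[board_size] stays 0: the game is over
    while True:
        delta = 0.0
        for s in range(board_size - 1, -1, -1):
            total = 0.0
            for roll in range(1, sides + 1):
                nxt = s + roll
                if nxt > board_size:
                    nxt = s  # overshoot: stay put, no chute or ladder applies
                else:
                    nxt = jumps.get(nxt, nxt)
                total += e[nxt]
            new = 1.0 + total / sides
            delta = max(delta, abs(new - e[s]))
            e[s] = new
        if delta < tol:
            return e[0]
```

For example, take a two-square board with a coin instead of a spinner (`sides=2`). A ladder from square 1 to square 2 makes every game last exactly one spin, so `expected_turns(2, {1: 2}, sides=2)` gives 1.0.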

How startups succeed

Good article. tl;dr:
  • Fire lots of bullets, not cannonballs (MVPs again)
  • Fanatic devotion to performance goals even when times are hard
  • Productive paranoia: cash in the bank, reduce risk whenever possible, anticipate killer strikes
  • Don't bet on luck. Bet on being good.
  • Seize opportunity when it arises.

Context.IO: mail replacement API?

Looks neat.

Questions for startups

Here's a funny little look at VC questions by a Croatian startup. Toss it into the slushpile.

Open source target: IndexTank

IndexTank was bought by LinkedIn and is now open source. It's apparently also used by Reddit. I need to learn it.

20 sites pushing the limits of JS

Another cool JS roundup.

DD-WRT

... is the open-source firmware for my new router. I want to do per-MAC bandwidth tracking, and here are some leads.

Probably my best bet is to assume a control panel on the local PC (which is the situation I've got) manipulating a remote "sensor head" on the router. The router doesn't have a huge amount of resources, after all.

Twilio set to explode

I need to be on board. Patrick usually knows what he's talking about.

Data mining without prejudice

Interesting article from MIT about a paywalled Science paper on a new data-mining technique developed there. The upshot, apparently, is that it does curve fitting with no preconceived notions about the variables being fit. Or something. I need to read it after some sleep.

Open Government

Another post about open gov: "Dear Internet, it's no longer OK not to know how Congress works". The title is clever, but the post is largely about disrupting the system with better political software, which I like.

25 Time-saving generators

Another link-list Webdev post.

HTML too complex?

And by HTML, I think the industry now means HTML+CSS+Canvas, as a Flash replacement. Interesting point here about "I'm too lazy to be a HTML dev" - which just means the level of abstraction is wrong.

And that's interesting.

Saturday, December 17, 2011

Task: Write a Blogger to-do list manager

I do my thinking aloud here and on other blogs, and one of the perennial problems I have with that is that Blogger has no particular way of dealing with task lists. (Boy, that sounds stupid, doesn't it?)

Seriously - a blog is a fantastic way of entering tagged text that could be scanned for tasks, progress notifications, and even completion of tasks in a structured way. Remember how I said Blogger has an API? Well, how about the following scheme, then?

1. Introduce a task by prefixing its title with "Task:", like "Task: Write a Blogger to-do list manager". Then make up a tag for it, e.g. "todo list manager". The tag can now serve as a miniblog for the task, you see, for free.

2. Progress reports are just posted using the tag. Optionally, if you put a percentage in there, you could use it as a completion estimate.

3. Completion is also flagged in the title, with the word "complete".

4. The current to-do list can now be generated automagically with a simple script you run whenever the blog is updated (or periodically, or whatever). I'd personally write it as a Perl script against the Blogger API run on my local machine.

5. If you publish a post titled "To-do list" or something along those lines, the to-do list can be updated into it (say, at the end, or wherever a given comment appears). The current to-do list can link back to old to-do lists of historical interest, and you can just post another one whenever you feel it's appropriate.

6. The updater can even make sure that the current to-do list post is the one linked from a sidebar highlight. You could even put your to-do list on the sidebar (perhaps in an abbreviated form).
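Step 4 is where the logic lives. Here's a minimal sketch of the list-building part, written in Python rather than the Perl script suggested above, though it carries over directly. It assumes the posts have already been fetched through the Blogger API into dicts with `title`, `labels`, `body` and `published` keys. That shape, and every name below, is hypothetical. Fetching, and writing the result back into a "To-do list" post, are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

PERCENT = re.compile(r"(\d{1,3})\s*%")
COMPLETE = re.compile(r"\bcomplete\b", re.IGNORECASE)


@dataclass
class Task:
    title: str
    label: str
    percent: Optional[int] = None
    done: bool = False


def build_todo(posts):
    """Return open tasks from posts given as dicts with
    'title', 'labels', 'body' and a sortable 'published' value."""
    tasks = {}
    for post in sorted(posts, key=lambda p: p["published"]):
        title, labels = post["title"], post.get("labels", [])
        # Rule 1: "Task: ..." in the title opens a task; its first label names it.
        if title.lower().startswith("task:") and labels:
            tasks[labels[0]] = Task(title[len("task:"):].strip(), labels[0])
            continue
        for label in labels:
            task = tasks.get(label)
            if task is None:
                continue
            # Rule 2: a percentage anywhere in a progress post updates the estimate.
            match = PERCENT.search(title + " " + post.get("body", ""))
            if match:
                task.percent = min(int(match.group(1)), 100)
            # Rule 3: "complete" in the title closes the task, unless it is
            # a partial report such as "50% complete".
            if COMPLETE.search(title) and (match is None or task.percent == 100):
                task.done = True
    return [t for t in tasks.values() if not t.done]


def render(tasks):
    """Plain-text to-do list, one line per open task."""
    return "\n".join(
        f"- {t.title}" + (f" ({t.percent}%)" if t.percent is not None else "")
        for t in tasks
    )
```

Treating "50% complete" as a progress report rather than a completion is a design choice here; the scheme as written doesn't settle it.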

So. I should do this as soon as my vacation starts. And on that note, I'm going to get back to work to hasten that very day.

Startup escape path

Swombat again. Toss it kind of into the "procedural" pile.

Big data predictions for 2012

Interesting article, for which I have time measured in microseconds. Even posting a link to it here takes more time than I have.

Google on bug prediction and Microsoft on empirical programming

Neat post on HNN about bug prediction at the Goog; answered at HNN with a link to Microsoft's publications on empirical programming, many of which are mouthwatering. Gotta look at this when I have a minute.

Blogger has an API

Blogger has an API. Lots of things have APIs, actually.

Research tools in Python

Not sure how much of this is directly useful, but I don't have time right now to figure it out. Good for statistics, perhaps.

Coroutines

An absolutely thought-provoking presentation on co-routines in Python.
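For anyone who hasn't seen the style: a Python generator can act as a coroutine that receives values through `send()`. Below is a minimal sketch of a two-stage pipeline built that way. The `coroutine` priming decorator and the `grep`/`collect` names are illustrative and not taken from the presentation.

```python
import functools


def coroutine(func):
    """Advance a new generator to its first `yield` so it can accept send()."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start


@coroutine
def grep(pattern, target):
    """Receive lines via send(); forward only those containing `pattern`."""
    while True:
        line = yield
        if pattern in line:
            target.send(line)


@coroutine
def collect(results):
    """Sink: append everything received to `results`."""
    while True:
        results.append((yield))


found = []
pipeline = grep("python", collect(found))
for line in ["perl is fine", "python coroutines", "more python"]:
    pipeline.send(line)
print(found)  # ['python coroutines', 'more python']
```

Data is pushed into the pipeline, which is the opposite of an ordinary generator chain where the consumer pulls.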

Things not to forget

OK, so as soon as I've finished my current project-that-will-not-die, there are a few things I've been meaning to pay more attention to. Here is something like a list, roughly in order of age.
  • Paraphrasing tools. This is something I came up with a couple of years ago that would be a lot easier now that I've spent some time thinking harder about NLP.
  • HVPT word pair trainer.
  • Depatenting, still, I guess.
  • Despammed rebirth, possibly based on CRM114.
  • Practical PHP exercises as kata.
  • Run back through the big translation project management tasks from last spring in light of Windows automation.
  • Code structure examination of OpenLogos, finally.
  • In general, continue automation of my translation workflow.
  • The Heritage Health Prize. Even doing halfway decently on it would be good advertising.
That ought to keep me out of trouble for a while. Now I can close some windows.

Target application: Todoist.com

Nice to-do/project manager application - but so very many of its features are premium! (Which is smart, sure. It's just that I've wanted to do a task manager [again] for a long time. And this one is ripe for analysis.)

Here's a top-ten list of Web to-do apps.

Wednesday, December 14, 2011

Infunl query language for clickpaths

This is really, really neat. (examples here) I want to see more of this kind of thing.

Tuesday, December 13, 2011

Sketching UI

Interesting article on designing UIs on, you know, paper.

Running R on the GPU

I guess? (Can you tell I'm in a hurry this week?)

Tokenizing the Common Crawl corpus

Interesting scalable approach.

ATS: programming language du jour

ATS: a statically typed systems programming language.

PDFMiner

Python library for reading PDFs.

Evolutionary database design

Heck, I haven't even had time to read this. Looks promising, though.

Programming in Syn

OK, now here is a macro language to end all macro languages: Syn. The point of Syn is to provide a language that just operates on parse trees, and thus compiles to ... anything. That's exactly where Decl's code generation is aimed. Fascinating read!

Sunday, December 11, 2011

Evolved to Win

Ebook about using genetic algorithms to evolve game-playing algorithms or strategies. Interesting stuff!

Friday, December 9, 2011

Target application: WildChords

Neat iPad app that listens through your iPad's microphone to analyze your guitar playing, then runs a game where you have to play particular chords to lead animals out of the zoo. Like Guitar Hero, except it actually teaches you to play the guitar!

The potential market is stupidly large. I mean, really stupidly large: do you know how many people want to learn an instrument? It applies the superstimulus of gamification to help you reach a goal you actually want. So I predict they're going to make money by the boatload.

I'd like to reverse engineer their signal processing (which they say is patent pending, to which I say boo!) and provide it for open-source games. That would be neat.
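As a starting point (and emphatically not their patent-pending method), the most basic building block is single-note pitch detection. You find the strongest frequency in a window of samples and map it to a note name. Here's a minimal Python sketch that assumes mono samples as floats and runs a naive DFT over roughly the guitar range. The 60 to 1200 Hz bounds are an assumption. Real chord recognition has to pick out several simultaneous notes, which this does not attempt.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_frequency(samples, sample_rate, f_min=60.0, f_max=1200.0):
    """Frequency (Hz) of the strongest DFT bin between f_min and f_max.

    Naive DFT over just the bins in that range; frequency resolution is
    sample_rate / len(samples), so longer windows give finer pitch.
    """
    n = len(samples)
    resolution = sample_rate / n
    best_bin, best_mag = None, -1.0
    for k in range(max(1, int(f_min / resolution)), int(f_max / resolution) + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * resolution


def note_name(freq):
    """Nearest equal-tempered note name, taking A4 = 440 Hz (MIDI note 69)."""
    midi = round(69 + 12 * math.log2(freq / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


rate = 8000
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(2000)]
f = dominant_frequency(tone, rate)
print(f, note_name(f))  # 440.0 A4
```

In practice an FFT library would replace the naive DFT loop, but the idea is the same.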

Infographic: What tools developers actually use

Neat survey of 500 developers on what tools they use in different categories, presented in infographic form.

Wednesday, December 7, 2011

Tuesday, December 6, 2011

XML in PostgreSQL

Dang, PostgreSQL can do some really groovy stuff, like running XPath queries, right in SQL, against XML stored in text columns. You can even index on them!

Reactive programming

Here's a term I hadn't seen before: reactive programming. Reactive programming is a declarative style in which relationships between values are defined, and changes to one value then propagate to the values that depend on it. A data flow graph is created, in other words. I've been stumbling towards this in Decl, of course, but here's Elm, a type-safe functional reactive programming language that compiles to JavaScript.

Apparently, there's nothing that can't be done in JavaScript these days.

I personally find this code nearly unreadable (I'm sure I'd improve with some practice), but the notion of declarative specification of JavaScript I see in the examples is utterly enthralling.
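To make the idea concrete without Elm, here's a minimal push-based sketch in Python, where a derived cell recomputes whenever one of its inputs changes. The `Cell` class is hypothetical and only illustrates the data flow graph idea. It is not how Elm works internally.

```python
class Cell:
    """A value in a data flow graph; derived cells recompute when inputs change."""

    def __init__(self, value=None, formula=None, inputs=()):
        self._formula = formula
        self._inputs = list(inputs)
        self._dependents = []
        for cell in self._inputs:
            cell._dependents.append(self)  # register the edge input -> self
        self._value = value if formula is None else self._compute()

    def _compute(self):
        return self._formula(*(cell.value for cell in self._inputs))

    @property
    def value(self):
        return self._value

    def set(self, value):
        """Change a source cell and push the change through its dependents."""
        if self._formula is not None:
            raise ValueError("derived cells are computed, not set")
        self._value = value
        self._propagate()

    def _propagate(self):
        for dependent in self._dependents:
            dependent._value = dependent._compute()
            dependent._propagate()


width = Cell(3)
height = Cell(4)
area = Cell(formula=lambda w, h: w * h, inputs=[width, height])
label = Cell(formula=lambda a: f"area={a}", inputs=[area])

width.set(5)
print(label.value)  # area=20
```

This naive version recomputes a node once per changed path. In a diamond-shaped graph, a node can therefore be recomputed, and briefly hold an inconsistent value, more than once. A more careful implementation would typically schedule updates in topological order.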

Lucene

Lucene is a (Java) indexer for full text. It's the basis for a lot of built-in search engines today, and it's probably something I need to learn first. Here's a good place to start, and here's another.
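The core data structure behind a full-text indexer like Lucene is the inverted index: a map from each term to the documents that contain it. Here's a toy Python sketch of that idea. It is not Lucene's API, analyzers or scoring, all of which are far more involved. Plain TF-IDF ranking is used here for simplicity.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"\w+")


class InvertedIndex:
    """Toy full-text index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}

    def add(self, doc_id, text):
        """Tokenize (lowercased word characters) and record term counts."""
        self.docs[doc_id] = text
        for term, tf in Counter(TOKEN.findall(text.lower())).items():
            self.postings[term][doc_id] = tf

    def search(self, query):
        """Doc ids containing every query term, highest TF-IDF score first."""
        terms = TOKEN.findall(query.lower())
        if not terms:
            return []
        candidates = set.intersection(*(set(self.postings.get(t, {})) for t in terms))
        n_docs = len(self.docs)

        def score(doc_id):
            return sum(self.postings[t][doc_id] * math.log(n_docs / len(self.postings[t]))
                       for t in terms)

        # Ties are broken by doc id so results are deterministic.
        return sorted(candidates, key=lambda d: (-score(d), d))
```

A real index also has to handle updates and deletes, which this ignores. Re-adding a document under an existing id would leave stale postings behind.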

Open source target: Civic Commons

Another effort (still just getting underway, really) that makes sense; Tim O'Reilly posted about it recently and I guess O'Reilly is supporting it: Civic Commons: "Sharing Technology for the Public Good". The only actual open-source release I can find is EAS, the "Enterprise Addressing System", which apparently provides a database for civic organizations to use in keeping addresses up to date? Anyway, its issue list is actually kind of long, and it's in Python. Perhaps it makes sense to analyze this as well.

Monday, December 5, 2011

System administration

There's a Python module for system administration. That's neat!

Analyzing the Enron corpus

Fun stuff: evil vs. football.

N-grams

So I wrote some code (finally!) into NLP::Tokenizer to pull out n-grams, and ran across this article about using bilingual n-grams in translation. This article blew my mind, for a simple reason: it assumes that n-gram alignment between languages even makes sense. Fine, I guess if you're restricted to English and French, like the article (actually a set of slides, not an article - whatever), then you might be OK. But German? Hungarian? These guys aren't translators.

So anyway, I ran the n-gram extractor on a rather large German corpus extracted from some HTML files, and ... honestly, I couldn't see much of a way to use the results. I'm really thinking that something more like a Markov network, followed by identification of ... frames or whatever you want to call them, would be more useful.

Not sure yet, but later this month I want to spend some time finding out.
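The extraction step itself is simple. Here's a Python sketch of contiguous n-gram counting that assumes the text has already been tokenized. NLP::Tokenizer is Perl and its interface isn't shown here, so the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token tuples, in order (empty if len(tokens) < n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(tokens, n, k=10):
    """The k most frequent n-grams with their counts."""
    return Counter(ngrams(tokens, n)).most_common(k)


tokens = "der Hund und der Hund und die Katze".split()
print(top_ngrams(tokens, 2, k=2))  # [(('der', 'Hund'), 2), (('Hund', 'und'), 2)]
```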

CRAN: knncat

There's a lot of good stuff on CRAN. I really, really need to understand that.

Some neat Perl things

Ran across a few promising-looking modules in CPAN this week, all grist for the (currently hiatized) WWW::Declarative mill:

HTML::Seamstress is a module that actually uses plain HTML, with class and id attributes, as the template language for HTML output. That's ... pretty clever!

Web::Scraper is another Web scraper module, but it looks rather promising.

HTML::Entities is a fantastic module for dealing with quoted HTML entities. It came in quite handy in textual analysis of some HTML I had to do this week. Very nice indeed!

VP trees

A quick indexing structure: VP trees. Read this again.
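Here's a minimal Python sketch of a vantage-point tree, assuming one common formulation. You pick a random vantage point, split the remaining points at the median distance from it, and prune the nearest-neighbour search using the triangle inequality. The distance function must be a true metric for the pruning to be valid. The names and structure are illustrative, not taken from the linked article.

```python
import math
import random


class VPNode:
    """One vantage point, the median radius, and the two subtrees."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point itself
        self.radius = radius    # median distance from point to the rest
        self.inside = inside    # subtree of points with distance < radius
        self.outside = outside  # subtree of points with distance >= radius


def build(points, dist, rng=random):
    """Build a VP tree from a list of points under the metric `dist`."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def nearest(root, query, dist):
    """Return (point, distance) of the nearest stored point to `query`."""
    best = [None, math.inf]

    def visit(node):
        if node is None:
            return
        d = dist(query, node.point)
        if d < best[1]:
            best[0], best[1] = node.point, d
        # Descend into the side the query falls on first; visit the other side
        # only if the ball of radius best[1] around the query crosses the boundary.
        if d < node.radius:
            visit(node.inside)
            if d + best[1] >= node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - best[1] <= node.radius:
                visit(node.inside)

    visit(root)
    return best[0], best[1]
```

Usage is `nearest(build(points, math.dist), query, math.dist)` for 2-D tuples. Any other metric, such as edit distance on strings, works the same way.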

Website spam fighting

Another article:
  • Rename default pages
  • Set up honeypot fields (hidden fields on the form)
  • Follow up on spammed companies
  • Have human moderators and an educated forum community
I gotta get back into this.
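The honeypot item is the easiest to try. Here's a minimal sketch that assumes a server-side handler receiving the submitted form as a dict. The field name `website` and the CSS-hidden wrapper are illustrative choices, not taken from the article.

```python
# Hypothetical field name: any plausible-looking name that humans won't fill in.
HONEYPOT_FIELD = "website"

# Markup for the trap: an ordinary text input hidden from humans with CSS.
HONEYPOT_HTML = (
    '<div style="display:none" aria-hidden="true">'
    f'<label>Leave this empty <input type="text" name="{HONEYPOT_FIELD}" '
    'tabindex="-1" autocomplete="off"></label>'
    "</div>"
)


def is_honeypot_tripped(form):
    """True if the hidden field came back non-empty (likely an automated submission)."""
    return bool(str(form.get(HONEYPOT_FIELD, "")).strip())


def handle_comment(form):
    """Hypothetical handler: drop tripped submissions, accept the rest."""
    if is_honeypot_tripped(form):
        return "discarded"
    return "accepted"
```

Dropping tripped submissions silently gives the bot no signal that it was caught.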

Webapps with R and Wt

Interesting.

Word representations

So Joseph Turian (the Metaoptimize guy) has a neat little study here on statistically measuring the semantics, or semantic space, of a list of words. Something else to grok.

Metaoptimize QA

I may already have posted this, but it's a StackOverflow-style Q&A site for machine learning and NLP. If I don't die from translation overwork in the next week, I'll be spending some quality time there later this month.