Semantic programming

Michael

I might move.

Michael

Just FYI.

Michael

Interesting post on HNN recently (as always): a bloke whose uncle spent a lifetime implementing some pretty amazing algorithms just open-sourced them.

Michael

Who knew?

Michael

OK, so there have been some efforts this year to get Perl looking as up-to-date as it actually is; the problem is that the language has been around so long that typical search results may be a decade old, even though the community has decisively moved on. Since competing languages haven't even been around that long, their search results are all recent, and the result is that it looks like it's easier to do modern tasks in, say, Ruby - because that's what you see in relevant results.

My Lego inventory project should do well for this; it's a typical Web scraping database program that will make a good article. I just have to find out how to cross-post to, say, perl.com.

Michael

Waffles is a comprehensive set of command-line tools for doing machine learning and data mining.

I really, really need to sit down and do a survey/implementation tool thing.

Michael

Another scaling article - worth reading, tomorrow.

Michael

This one's actually underway, a Christmas present for my son. I'll write it up separately somewhere - perhaps on the Vivtek site itself! (A blast from the past - I haven't written anything new there since probably 2009; Blogger has been so much more convenient.)

Michael

I want to scrape the Reuters news feeds (later, others) into a database for various analytical purposes. That's going to consist of a daemon on my fileserver that checks the feeds on a periodic basis and loads things into a database. Then we'll do other analysis on that database. I'm most interested in linking stories and identifying trends.

Yeah, OK, I know this isn't groundbreaking research. It's new for me, though. And it will be a good microcosm of scraping tasks for declaratization as well as a valuable component for all kinds of things. So ... it's a task.
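For what it's worth, the load step of such a daemon can be sketched in a few lines. This is only a rough Python sketch under some loud assumptions: the feed is plain RSS 2.0, SQLite stands in for whatever database ends up being used, and the table name, column names, and sample feed are all made up for illustration (nothing here is Reuters-specific).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for a fetched RSS 2.0 document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>Story one</title><link>http://example.com/1</link>
        <pubDate>Mon, 05 Dec 2011 10:00:00 GMT</pubDate></item>
  <item><title>Story two</title><link>http://example.com/2</link>
        <pubDate>Mon, 05 Dec 2011 11:00:00 GMT</pubDate></item>
</channel></rss>"""


def parse_items(feed_xml):
    """Return (link, title, published) tuples for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("link"), item.findtext("title"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]


def store_items(conn, items):
    """Insert stories not seen before; return how many rows were new."""
    # Assumed schema: the link is the unique key for a story.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stories "
        "(link TEXT PRIMARY KEY, title TEXT, published TEXT)"
    )
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO stories (link, title, published) VALUES (?, ?, ?)",
        items,
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
new_rows = store_items(conn, parse_items(SAMPLE_FEED))
```

The daemon part would just be a loop that fetches each feed, calls these two functions, and sleeps for the polling interval; because the link is the primary key, re-reading the same feed adds nothing.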

Michael

A couple of trading platforms that were advertised on a blog I follow. Probably stupid even to think about securities trading this year. Or next.

Trade Architect from TD Ameritrade.
OptionsHouse allows you to do simulated trading to start off and has flat trading fees for real trades. Interesting.

Michael

Short version: DO IT.

Michael

What it says on the tin.

Michael

... Cute.

Michael

Good article. tl;dr:

Fire lots of bullets, not cannonballs (MVPs again)
Fanatic devotion to performance goals even when times are hard
Productive paranoia: cash in the bank, reduce risk whenever possible, anticipate killer strikes
Don't bet on luck. Bet on being good.
Seize opportunity when it arises.

Michael

Looks neat.

Michael

Here's a funny little look at VC questions by a Croatian startup. Toss it into the slushpile.

Michael

IndexTank was bought by LinkedIn and is now open source. It's apparently also used by Reddit. I need to learn it.

Michael

Another cool JS roundup.

Michael

... is the open-source firmware for my new router. I want to do per-MAC bandwidth tracking, and here are some leads.

From the dd-wrt forum.
Refers to Google code here.
Possibly upgraded here.

Probably my best bet is to assume a control panel on the local PC (which is the situation I've got) manipulating a remote "sensor head" on the router. The router doesn't have a huge amount of resources, after all.

Michael

I need to be on board. Patrick usually knows what he's talking about.

Michael

Interesting article from MIT about a paywalled article in Science about a new technique for data mining developed at MIT, the upshot being apparently that it's doing curve fitting with no preconceived notions of the variables being fit. Or something. I need to read it after some sleep.

Michael

Another post about open gov - "Dear Internet, it's no longer OK not to know how Congress works", which is a clever title, but the post is instead largely about disrupting the system with better political software, which I like.

Michael

Another link-list Webdev post.

Michael

And by HTML, I think the industry now means HTML+CSS+Canvas, as a Flash replacement. Interesting point here about "I'm too lazy to be a HTML dev" - which just means the level of abstraction is wrong.

And that's interesting.

Michael

I do my thinking aloud here and on other blogs, and one of the perennial problems I have with that is that Blogger has no particular way of dealing with task lists. (Boy, that sounds stupid, doesn't it?)

Seriously - a blog is a fantastic way of entering tagged text that could be scanned for tasks, progress notifications, and even completion of tasks in a structured way. Remember how I said Blogger has an API? Well, how about the following scheme, then?

1. Introduce a task by prefixing it in the title, like "Task: Write a Blogger to-do list manager". Then introduce the tag just by making up a tag for it, e.g. "todo list manager". The tag can now be a miniblog for the task, you see - for free.

2. Progress reports are just posted using the tag. Optionally, if you put a percentage in there, you could use it as a completion estimate.

3. Completion is also flagged in the title, with the word "complete".

4. The current to-do list can now be generated automagically with a simple script you run whenever the blog is updated (or periodically, or whatever). I'd personally write it as a Perl script against the Blogger API run on my local machine.

5. If you post any post named "To-do list" or something along those lines, the to-do list can be updated into it (say, at the end, or wherever a given comment appears). The current to-do list can link back to old to-do lists of historical interest, and you can just post another one whenever you feel it's appropriate.

6. The updater can even make sure that the current to-do list post is the one linked from a sidebar highlight. You could even put your to-do list on the sidebar (perhaps in an abbreviated form).
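The core of steps 1 through 4 can be sketched without touching the Blogger API at all. The Python below is only an illustration under stated assumptions: posts are assumed to have already been fetched into plain dicts with hypothetical `title`, `body`, and `labels` keys, in chronological order, and the first label on a "Task:" post is assumed to be the task's tag.

```python
import re


def build_todo(posts):
    """Scan posts (oldest first) and return {tag: task state}."""
    tasks = {}
    for post in posts:
        title = post["title"]
        labels = post.get("labels", [])
        # Step 1: a "Task:" title introduces a task; its first label is the tag.
        if title.lower().startswith("task:") and labels:
            tasks[labels[0]] = {
                "name": title[len("task:"):].strip(),
                "percent": None,
                "done": False,
            }
            continue
        for tag in labels:
            task = tasks.get(tag)
            if task is None:
                continue
            # Step 3: the word "complete" in the title closes the task.
            if re.search(r"\bcomplete\b", title, re.IGNORECASE):
                task["done"] = True
            # Step 2: an optional percentage is taken as a completion estimate.
            match = re.search(r"(\d{1,3})\s*%", title + " " + post.get("body", ""))
            if match:
                task["percent"] = int(match.group(1))
    return tasks


def render_todo(tasks):
    """Step 4: list the open tasks as plain text, one per line."""
    lines = []
    for task in tasks.values():
        if task["done"]:
            continue
        pct = "" if task["percent"] is None else " (%d%%)" % task["percent"]
        lines.append("[ ] " + task["name"] + pct)
    return "\n".join(lines)


example_posts = [
    {"title": "Task: Write a Blogger to-do list manager", "body": "",
     "labels": ["todo list manager"]},
    {"title": "First parser draft", "body": "About 40% there.",
     "labels": ["todo list manager"]},
    {"title": "Task: Scrape news feeds", "body": "", "labels": ["news scraper"]},
    {"title": "News scraper complete", "body": "", "labels": ["news scraper"]},
]
todo_text = render_todo(build_todo(example_posts))
```

Steps 5 and 6 would then be a matter of writing `todo_text` back into the designated post and sidebar through whatever API client is used.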

So. I should do this as soon as my vacation starts. And on that note, I'm going to get back to work to hasten that very day.

Michael

Swombat again. Toss it kind of into the "procedural" pile.

Michael

Interesting article, for which I have time measured in microseconds. It's more time than I have just to post a link to it here.

Michael

Neat post on HNN about bug prediction at the Goog; answered at HNN with a link to Microsoft's publications on empirical programming, many of which are mouthwatering. Gotta look at this when I have a minute.

Michael

Blogger has an API. Lots of things have APIs, actually.

Michael

Not sure how much of this is directly useful, but don't have time right now to figure it out. Good for statistics, perhaps.

Michael

An absolutely thought-provoking presentation on co-routines in Python.

Michael

OK, so as soon as I've finished my current project-that-will-not-die, there are a few things I've been meaning to pay more attention to. Here is something like a list, roughly in order of age.

Paraphrasing tools. This is something I came up with a couple of years ago that would be a lot easier now that I've spent some time thinking harder about NLP.
HVPT word pair trainer.
Depatenting, still, I guess.
Despammed rebirth, possibly based on CRM114.
Practical PHP exercises as kata.
Run back through the big translation project management tasks from last spring in light of Windows automation.
Code structure examination of OpenLogos, finally.
In general, continue automation of my translation workflow.
The Heritage Health Prize. Even doing halfway decently on it would be good advertising.

That ought to keep me out of trouble for a while. Now I can close some windows.

Michael

Nice to-do/project manager application - but so very many of its features are premium! (Which is smart, sure. It's just that I've wanted to do a task manager [again] for a long time. And this one is ripe for analysis.)

Here's a top-ten list of Web to-do apps.

Michael

This is really, really neat. (examples here) I want to see more of this kind of thing.

Michael

Interesting article on designing UIs on, you know, paper.

Michael

I guess? (Can you tell I'm in a hurry this week?)

Michael

Interesting scalable approach.

Michael

ATS. Statically typed sysadmin language?

Michael

Python library for reading PDFs.

Michael

Heck, I haven't even had time to read this. Looks promising, though.

Michael

OK, now here is a macro language to end all macro languages - Syn. The point of Syn is to provide a language that just operates on parse trees, and thus compiles to ... anything. Exactly where the code generation of Decl is aimed. Fascinating read!

Michael

Ebook about GA evolution of gameplaying algorithms or strategies. Interesting stuff!

Michael

Neat iPad app that listens to the mic on your iPad to analyze your guitar playing, then runs a game where you have to play particular chords to lead animals out of the zoo. Like Guitar Hero, except it actually teaches you to play the guitar!

The potential market is stupidly large. I mean, really stupidly large - do you know how many people want to learn an instrument? It's applying the superstimulus of gamification to allow you to reach a goal you desire. So I predict they're going to make money by the boatload.

I'd like to reverse engineer their signal processing (which they say is patent pending, to which I say boo!) and provide it for open-source games. That would be neat.

Michael

Neat survey of 500 developers on what tools they use in different categories, presented in infographic form.

Michael

Neat.

Michael

Dang, PostgreSQL can do some really groovy stuff, like evaluating XPath expressions right in the database query against XML stored in text columns. You can even index on them!

Michael

Here's a term I hadn't seen before: reactive programming. Reactive programming is a declarative style in which relationships between values are defined, then changes to one value propagate to the other. A data flow graph is created, in other words. I've been stumbling towards this in Decl, of course, but here's Elm, a type-safe functional reactive programming language that compiles to JavaScript.

Apparently, there's nothing that can't be done in JavaScript these days.

I personally find this code nearly unreadable (I'm sure I'd improve with some practice), but the notion of declarative specification of JavaScript I see in the examples is utterly enthralling.
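To pin down the data-flow idea itself (not Elm, and not Decl), here is a deliberately naive Python sketch. The `Cell` class and its method names are invented for illustration; a real reactive system would also order updates so that a node fed by two changed inputs is recomputed only once.

```python
class Cell:
    """A value that is either set directly or derived from other cells."""

    def __init__(self, value=None, formula=None, inputs=()):
        self.formula = formula
        self.inputs = list(inputs)
        self.dependents = []
        # Register this cell with its inputs: these are the data flow graph edges.
        for cell in self.inputs:
            cell.dependents.append(self)
        self.value = value if formula is None else self._compute()

    def _compute(self):
        return self.formula(*(cell.value for cell in self.inputs))

    def set(self, value):
        """Change a source value and push the change through the graph."""
        self.value = value
        self._propagate()

    def _propagate(self):
        # Naive depth-first propagation; assumes the graph has no cycles.
        for dependent in self.dependents:
            dependent.value = dependent._compute()
            dependent._propagate()


a = Cell(2)
b = Cell(3)
total = Cell(formula=lambda x, y: x + y, inputs=(a, b))
doubled = Cell(formula=lambda t: t * 2, inputs=(total,))
a.set(10)  # total and doubled are recomputed without being asked
```

The relationships are declared once; after that, nothing ever calls `total` or `doubled` explicitly, which is the whole point of the style.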

Michael

Lucene is a (Java) indexer for full text. It's the basis for a lot of built-in search engines today, and it's probably something I need to learn first. Here's a good place to start, and here's another.

Michael

Another effort (still just getting underway, really) that makes sense; Tim O'Reilly posted about it recently and I guess O'Reilly is supporting it: Civic Commons: "Sharing Technology for the Public Good". The only actual open-source release I can find is EAS, the "Enterprise Addressing System", which apparently provides a database for civic organizations to use in keeping addresses up to date? Anyway, its issue list is actually kind of long, and it's in Python. Perhaps it makes sense to analyze this as well.

Michael

There's a Python module for system administration. That's neat!

Michael

Fun stuff: evil vs. football.

Michael

So I wrote some code (finally!) into NLP::Tokenizer to pull out n-grams, and ran across this article about using bilingual n-grams in translation. This article blew my mind, for a simple reason: it assumes that n-gram alignment between languages even makes sense. Fine, I guess if you're restricted to English and French, like the article (actually a set of slides, not an article - whatever), then you might be OK. But German? Hungarian? These guys aren't translators.

So anyway, I ran the n-gram extractor on a rather large German corpus extracted from some HTML files, and ... honestly, I couldn't see much of a way to use the results. I'm thinking really that something more like a Markov network and subsequent identification of ... frames or whatever you want to call them would be more useful.

Not sure yet, but later this month I want to spend some time finding out.
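For reference, n-gram extraction itself is tiny. The sketch below is Python rather than the actual NLP::Tokenizer code, and it assumes whitespace tokenization and lowercasing, which is far cruder than a real tokenizer; the function names are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, n):
    """Count n-grams in a text, splitting on whitespace (an assumption)."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


bigram_counts = count_ngrams("der Hund und der Hund", 2)
```

Ranking the resulting counter with `most_common()` gives exactly the kind of frequency list that was hard to do anything with on the German corpus.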

Michael

There's a lot of good stuff on CRAN. I really, really need to understand that.

Michael

Ran across a few promising-looking modules in CPAN this week, all grist for the (currently hiatized) WWW::Declarative mill:

HTML::Seamstress is a module that actually uses HTML and classids as the template language for HTML output. That's ... pretty clever!

Web::Scraper is another Web scraper module, but it looks rather promising.

HTML::Entities is a fantastic module for dealing with quoted HTML entities. It came in quite handy in textual analysis of some HTML I had to do this week. Very nice indeed!
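For comparison, the same entity handling outside Perl: this is not HTML::Entities, just the analogous calls in Python's standard `html` module, shown as a small sketch with made-up sample strings.

```python
import html

# Decode named and numeric entities back into characters.
decoded = html.unescape("K&ouml;ln &amp; D&uuml;sseldorf &#8364;5")

# Encode the characters that are special in HTML.
encoded = html.escape("a < b & c")
```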

Michael

A quick indexing structure: VP trees. Read this again.

Michael

Another article:

Rename default pages
Set up honeypot fields (hidden fields on the form)
Follow up on spammed companies
Have human moderators and an educated forum community

I gotta get back into this.
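The honeypot item in the list above is simple enough to sketch. This Python snippet is an illustration only: the form is assumed to arrive as a plain dict of submitted values, and the field name `website` is a made-up choice for the hidden field that human visitors never see or fill in.

```python
def looks_like_bot(form, honeypot_field="website"):
    """Flag a submission if the hidden honeypot field was filled in.

    Humans never see the field, so any non-blank value suggests a script
    that fills in every input it finds.
    """
    return bool(form.get(honeypot_field, "").strip())


human_post = {"name": "Ann", "comment": "Nice article", "website": ""}
bot_post = {"name": "x", "comment": "cheap pills", "website": "http://spam.example"}
flags = [looks_like_bot(human_post), looks_like_bot(bot_post)]
```

It only catches naive bots, which is why the list pairs it with human moderators.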

Michael

Interesting.

Michael

So Joseph Turian (the Metaoptimize guy) has a neat little study here about statistical measurement of the semantics or a semantic space of a list of words. Something else to grok.

Michael

I may already have posted this, but it's a StackOverflow thing for machine learning and NLP. If I don't die in the next week from translation overwork, then later this month I'll be spending some quality time here.

Friday, December 30, 2011

Sunday, December 25, 2011

Saturday, December 17, 2011

Wednesday, December 14, 2011

Tuesday, December 13, 2011

Sunday, December 11, 2011

Friday, December 9, 2011

Wednesday, December 7, 2011

Tuesday, December 6, 2011

Monday, December 5, 2011
