Friday, December 30, 2011
OK, so there have been some efforts this year to get Perl looking as up-to-date as it actually is; the problem is that the language has been around so long that typical search results may be a decade old, even though the community has decisively moved on. Since competing languages haven't even been around that long, their search results are all recent, and the result is that it looks like it's easier to do modern tasks in, say, Ruby - because that's what you see in relevant results.
My Lego inventory project should do well for this; it's a typical Web scraping database program that will make a good article. I just have to find out how to cross-post to, say, perl.com.
Sunday, December 25, 2011
This one's actually underway, a Christmas present for my son. I'll write it up separately somewhere - perhaps on the Vivtek site itself! (A blast from the past - I haven't written anything new there since probably 2009; Blogger has been so much more convenient.)
I want to scrape the Reuters news feeds (later, others) into a database for various analytical purposes [eg]. That's going to consist of a daemon on my fileserver that checks the feeds on a periodic basis and loads new stories into a database. Then we'll do other analysis on that database. I'm most interested in linking stories and identifying trends.
Yeah, OK, I know this isn't groundbreaking research. It's new for me, though. And it will be a good microcosm of scraping tasks for declaratization as well as a valuable component for all kinds of things. So ... it's a task.
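Here is a minimal sketch of the load-into-a-database step, in Python rather than Perl just to keep it short. Everything named here is an assumption for illustration: the `stories` table layout, the `open_db` and `store_feed` helpers, and the sample feed text. A real daemon would fetch the feed over HTTP on a timer and pass the response body to `store_feed`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample standing in for a fetched feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Story one</title><link>http://example.com/1</link>
<pubDate>Fri, 30 Dec 2011 10:00:00 GMT</pubDate></item>
<item><title>Story two</title><link>http://example.com/2</link>
<pubDate>Fri, 30 Dec 2011 11:00:00 GMT</pubDate></item>
</channel></rss>"""


def open_db(path=":memory:"):
    """Open (or create) the story database; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS stories ("
        "link TEXT PRIMARY KEY, title TEXT, published TEXT)"
    )
    return db


def count_stories(db):
    return db.execute("SELECT COUNT(*) FROM stories").fetchone()[0]


def store_feed(db, rss_text):
    """Insert every item not already stored; return how many were new."""
    before = count_stories(db)
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # no link means no key to deduplicate on
        title = (item.findtext("title") or "").strip()
        published = (item.findtext("pubDate") or "").strip()
        # The link is the primary key, so stories seen before are skipped.
        db.execute(
            "INSERT OR IGNORE INTO stories VALUES (?, ?, ?)",
            (link, title, published),
        )
    db.commit()
    return count_stories(db) - before


if __name__ == "__main__":
    db = open_db()
    print(store_feed(db, SAMPLE_RSS), "new stories")
    print(store_feed(db, SAMPLE_RSS), "new stories on the second pass")
```

Keying on the link is the simplest way to make repeated polling harmless; linking stories and spotting trends would then be queries over that table.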
A couple of trading platforms that were advertised on a blog I follow. Probably stupid even to think about securities trading this year. Or next.
Good article. tl;dr:
- Fire lots of bullets, not cannonballs (MVPs again)
- Fanatic devotion to performance goals even when times are hard
- Productive paranoia: cash in the bank, reduce risk whenever possible, anticipate killer strikes
- Don't bet on luck. Bet on being good.
- Seize opportunity when it arises.
... is the open-source firmware for my new router. I want to do per-MAC bandwidth tracking, and here are some leads.
Interesting article from MIT about a paywalled article in Science about a new technique for data mining developed at MIT, the upshot being apparently that it's doing curve fitting with no preconceived notions of the variables being fit. Or something. I need to read it after some sleep.
Another post about open gov - "Dear Internet, it's no longer OK not to know how Congress works", which is a clever title, but the post is instead largely about disrupting the system with better political software, which I like.
Saturday, December 17, 2011
I do my thinking aloud here and on other blogs, and one of the perennial problems I have with that is that Blogger has no particular way of dealing with task lists. (Boy, that sounds stupid, doesn't it?)
Seriously - a blog is a fantastic way of entering tagged text that could be scanned for tasks, progress notifications, and even completion of tasks in a structured way. Remember how I said Blogger has an API? Well, how about the following scheme, then?
1. Introduce a task by prefixing it in the title, like "Task: Write a Blogger to-do list manager". Then introduce the tag just by making up a tag for it, e.g. "todo list manager". The tag can now be a miniblog for the task, you see - for free.
2. Progress reports are just posted using the tag. Optionally, if you put a percentage in there, you could use it as a completion estimate.
3. Completion is also flagged in the title, with the word "complete".
4. The current to-do list can now be generated automagically with a simple script you run whenever the blog is updated (or periodically, or whatever). I'd personally write it as a Perl script against the Blogger API run on my local machine.
5. If you post any post named "To-do list" or something along those lines, the to-do list can be updated into it (say, at the end, or wherever a given comment appears). The current to-do list can link back to old to-do lists of historical interest, and you can just post another one whenever you feel it's appropriate.
6. The updater can even make sure that the current to-do list post is the one linked from a sidebar highlight. You could even put your to-do list on the sidebar (perhaps in an abbreviated form).
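The scheme above can be sketched without touching the Blogger API at all, by treating posts as plain records. This is a Python sketch, not the Perl script I'd actually write, and it rests on assumptions: posts arrive in chronological order as dicts with `title`, `body`, and `labels` keys (hypothetical field names), and the first label on a "Task:" post is the one that names the task.

```python
import re

TASK_PREFIX = re.compile(r"^\s*Task:\s*(.+)$", re.IGNORECASE)
PERCENT = re.compile(r"\b(\d{1,3})\s*%")


def build_todo(posts):
    """Turn a chronological list of post dicts into a list of task records."""
    tasks = {}  # task label -> record, in order of introduction
    for post in posts:
        intro = TASK_PREFIX.match(post["title"])
        if intro and post["labels"]:
            label = post["labels"][0]  # assumption: first label names the task
            tasks[label] = {
                "name": intro.group(1).strip(),
                "label": label,
                "percent": 0,
                "done": False,
            }
            continue
        for label in post["labels"]:
            task = tasks.get(label)
            if task is None:
                continue  # an ordinary label, not a task
            if "complete" in post["title"].lower():
                task["done"] = True
                task["percent"] = 100
            else:
                # An optional percentage in a progress report is the estimate.
                found = PERCENT.search(post["title"] + " " + post["body"])
                if found:
                    task["percent"] = min(100, int(found.group(1)))
    return list(tasks.values())


def render(tasks):
    """Format the task records as a plain-text to-do list."""
    lines = []
    for task in tasks:
        mark = "x" if task["done"] else " "
        lines.append("[%s] %s (%d%%)" % (mark, task["name"], task["percent"]))
    return "\n".join(lines)


if __name__ == "__main__":
    posts = [
        {"title": "Task: Write a Blogger to-do list manager",
         "body": "", "labels": ["todo list manager"]},
        {"title": "First pass at the parser",
         "body": "About 40% done.", "labels": ["todo list manager"]},
    ]
    print(render(build_todo(posts)))
```

The rendered text is what would get written into the "To-do list" post or the sidebar; fetching posts and writing the result back are the parts that would need the real API.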
So. I should do this as soon as my vacation starts. And on that note, I'm going to get back to work to hasten that very day.
Neat post on HNN about bug prediction at the Goog; answered at HNN with a link to Microsoft's publications on empirical programming, many of which are mouthwatering. Gotta look at this when I have a minute.
OK, so as soon as I've finished my current project-that-will-not-die, there are a few things I've been meaning to pay more attention to. Here is something like a list, roughly in order of age.
- Paraphrasing tools. This is something I came up with a couple of years ago that would be a lot easier now that I've spent some time thinking harder about NLP.
- HVPT word pair trainer.
- Depatenting, still, I guess.
- Despammed rebirth, possibly based on CRM114.
- Practical PHP exercises as kata.
- Run back through the big translation project management tasks from last spring in light of Windows automation.
- Code structure examination of OpenLogos, finally.
- In general, continue automation of my translation workflow.
- The Heritage Health Prize. Even doing halfway decently on it would be good advertising.
That ought to keep me out of trouble for a while. Now I can close some windows.
Nice to-do/project manager application - but so very many of its features are premium! (Which is smart, sure. It's just that I've wanted to do a task manager [again] for a long time. And this one is ripe for analysis.)
Here's a top-ten list of Web to-do apps.
Wednesday, December 14, 2011
Tuesday, December 13, 2011
OK, now here is a macro language to end all macro languages - Syn. The point of Syn is to provide a language that just operates on parse trees, and thus compiles to ... anything. Exactly where the code generation of Decl is aimed. Fascinating read!
Sunday, December 11, 2011
Friday, December 9, 2011
Neat iPad app that listens to the mic on your iPad to analyze your guitar playing, then runs a game where you have to play particular chords to lead animals out of the zoo. Like Guitar Hero, except it actually teaches you to play the guitar!
The market cap is stupidly large. I mean, really stupidly large - do you know how many people want to learn an instrument? It's applying the superstimulus of gamification to allow you to reach a goal you desire. So I predict they're going to make money by the boatload.
I'd like to reverse engineer their signal processing (which they say is patent pending, to which I say boo!) and provide it for open-source games. That would be neat.
Tuesday, December 6, 2011
Another effort (still just getting underway, really) that makes sense; Tim O'Reilly posted about it recently and I guess O'Reilly is supporting it: Civic Commons: "Sharing Technology for the Public Good". The only actual open-source release I can find is EAS, the "Enterprise Addressing System", which apparently provides a database for civic organizations to use in keeping addresses up to date? Anyway, its issue list is actually kind of long, and it's in Python. Perhaps it makes sense to analyze this as well.
Monday, December 5, 2011
So I wrote some code (finally!) into NLP::Tokenizer to pull out n-grams, and ran across this article about using bilingual n-grams in translation. This article blew my mind, for a simple reason: it assumes that n-gram alignment between languages even makes sense. Fine, I guess if you're restricted to English and French, like the article (actually a set of slides, not an article - whatever), then you might be OK. But German? Hungarian? These guys aren't translators.
So anyway, I ran the n-gram extractor on a rather large German corpus extracted from some HTML files, and ... honestly, I couldn't see much of a way to use the results. I'm thinking really that something more like a Markov network and subsequent identification of ... frames or whatever you want to call them would be more useful.
Not sure yet, but later this month I want to spend some time finding out.
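For reference, n-gram extraction itself is only a few lines. This is a Python sketch of the general technique, not the NLP::Tokenizer code; the crude `\w+` tokenizer and the function names are assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (handles umlauts and ß)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count how often each n-gram occurs in the text."""
    return Counter(ngrams(tokenize(text), n))


if __name__ == "__main__":
    sample = "Der Hund und die Katze und der Hund"
    for gram, count in ngram_counts(sample, 2).most_common(3):
        print(" ".join(gram), count)
```

Bigram counts like these are also the raw material for a Markov-style model: normalizing the counts for each first word gives transition probabilities to the next word.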
Ran across a few promising-looking modules in CPAN this week, all grist for the (currently hiatized) WWW::Declarative mill:
HTML::Seamstress is a module that actually uses HTML and classids as the template language for HTML output. That's ... pretty clever!
Web::Scraper is another Web scraper module, but it looks rather promising.
HTML::Entities is a fantastic module for dealing with quoted HTML entities. It came in quite handy in textual analysis of some HTML I had to do this week. Very nice indeed!
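To show what entity handling looks like in practice, here is the same idea using Python's standard `html` module, which plays roughly the role that HTML::Entities' `decode_entities` and `encode_entities` play in Perl. The sample strings and the two wrapper names are made up for illustration.

```python
import html


def decode(text):
    """Turn named and numeric entities back into plain characters."""
    return html.unescape(text)


def encode(text):
    """Escape the characters that are unsafe in HTML text."""
    return html.escape(text)


if __name__ == "__main__":
    print(decode("Gr&uuml;&szlig;e aus K&ouml;ln &amp; &#x42;onn"))
    print(encode("a < b & c"))
```

Decoding before tokenizing matters for textual analysis: otherwise a word like "Grüße" arrives as several fragments separated by ampersands and semicolons.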