Monday, May 28, 2012

Google stats on Blogger are essentially useless

The new Blogger UI presents basic stats integrated right into the blogging interface, so I see them.  And they are utterly useless.  Yesterday's post about shell command apps has garnered 48 hits.  From where?  Why?  No idea.  I do see 43 hits (to somewhere on the blog) from a porn site in Russia - which is, yes, a way of spamming my stats reports, thank you very much.

I can only see the top 10 search phrases that brought people here.  That's, yes, useless - on a blog with a topic as sprawling as the intersection between programming and semantics, of all things, the phrases indicating people's interests must number in the hundreds or thousands.  But I can't see them, so I don't actually know what people are interested in reading about.

It's enough to make me write my own analytics platform.  Just for Blogger, and just to track what I actually want to track.

Update 2012-12-12: and I'm not the only one - see my post today on Better analytics!  I need to set up a feed for search terms.
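A minimal sketch of the search-term tally I have in mind, in Python only for the sake of a compact example.  It assumes I can get at raw referrer URLs somehow (Blogger doesn't hand them over) and that the search engine passes the phrase in a "q" parameter; the referrer list is invented for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_phrase(referrer):
    """Return the search phrase carried in a referrer URL, or None."""
    query = parse_qs(urlparse(referrer).query)
    # Assumption: the search engine passes the phrase in a "q" parameter.
    phrases = query.get("q")
    return phrases[0].strip().lower() if phrases else None

def count_phrases(referrers):
    """Tally every search phrase, not just the top ten."""
    counts = Counter()
    for referrer in referrers:
        phrase = search_phrase(referrer)
        if phrase:
            counts[phrase] += 1
    return counts

# Hypothetical referrer log, invented for illustration.
referrers = [
    "http://www.google.com/search?q=perl+markdown+parser",
    "http://www.google.com/search?q=Perl+Markdown+parser",
    "http://www.bing.com/search?q=static+site+generator+perl",
    "http://example.com/some/page",
]

for phrase, hits in count_phrases(referrers).most_common():
    print(hits, phrase)
```

Lowercasing the phrase merges trivially different spellings; anything smarter (stemming, grouping by topic) is left open.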

OECD country ranking explorer

Here's a neat little JavaScript app from the OECD that ranks countries.  What I find most attractive here is that the state is saved to the URL, so if you bookmark the page or send it to somebody, you'll be sending the exact view you're looking at.  Brilliant!  (And exactly what I want to do with the book series.)
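The state-in-the-URL trick can be sketched in a few lines.  This is not how the OECD app does it (I haven't looked); it's a rough Python illustration that assumes the state is a flat set of string settings stored in the URL fragment, and the setting names are made up.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def state_to_url(base_url, state):
    """Encode a flat dict of string settings into the URL fragment."""
    # Sorting keeps the URL stable for the same state.
    return base_url + "#" + urlencode(sorted(state.items()))

def url_to_state(url):
    """Recover the settings from a URL built by state_to_url."""
    fragment = urlparse(url).fragment
    return {key: values[0] for key, values in parse_qs(fragment).items()}

# Hypothetical reader state; the names are invented for illustration.
state = {"chapter": "3", "section": "2", "show_answers": "no"}
url = state_to_url("http://www.example.com/book", state)
print(url)
print(url_to_state(url) == state)
```

Putting the state in the fragment rather than the query string means the server never needs to see it, which suits a static site.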

Markdent

Well, naturally I want to write my books in Markdown instead of HTML, so I've been prowling around CPAN for proper Markdown processors...  They're all direct-to-HTML converters, except one: Markdent, by the ubiquitous Dave Rolsky.  Markdent is an event-based, configurable Markdown dialect parser.  So that's what I'm going with - because I want to be able to pull out links and the like as I go along my tree, and later I want to be able to output HTML5 article/section/aside tags, especially "aside" for sidebars, and I want to be able to write a markdown block for those.

Actually, I'm probably going to standardize on Markdent in Decl as well, once I get back to it.

Of course, that introduces a Moose dependency, but you know what?  Maybe it's time.

Sunday, May 27, 2012

Quick aside on shell-command apps

Heckle is the third shell-command application I've written in the past few months and I'm starting to see enough repetition that I'm feeling the need to factor out some boilerplate.  I'm just saying.

What I really want is Jekyll...

... only in Perl.  Which, you know, doesn't actually exist.  There's Hyde in Python.  Maybe Heckle or Hekyll?  Hekyll is actually taken.  Being, you know, obvious.  "Magpie" would be nice, but ... is even taken on CPAN.

I think I'm going with Heckle.  App::Heckle.  I'll worry about a plugin structure later.  Use TT for templates.  Stupid static generation.  Multiple blog feeds (any directory can be a blog).  Tags.  Book structure - i.e. hierarchical chapter/section structure.  Math.  External resources.  Assume the use of a modern VCS.

Thinking about book compilation

I defy you to find anything about compiling books with Google.  I get a lot of interesting links to books about compiling, though.

So this time, I'm trying (for once) not to reinvent the wheel, and use something already written to compile my books, and I keep coming back to Wiki compilers, of which there are several in the Perl universe.  The top contender seems to be Ikiwiki - it's actively developed and supported, has a great plugin structure, and already runs on top of a VCS out of the box, which is exactly what I want.

The only thing that worries me is this: how much is a tutorial series like a Wiki?  I suspect I'm overthinking this, so I'm just going to install Ikiwiki and try writing some tutorials and formatting them.  Someday I'm going to have to learn just to start.

Update: Oops.  Ikiwiki appears to be non-Windows-compatible due to a sad predilection for colons in its filenames (and maybe other reasons, I don't know yet).  I'd still like to see if I can get it running, but if not, then maybe I'll be reinventing that wheel after all.
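For what it's worth, checking a set of file names for this problem is easy.  A small Python sketch, with invented file names (I'm not claiming these are the names Ikiwiki actually generates):

```python
import re

# Characters Windows does not allow in file names.
WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')

def windows_unsafe(names):
    """Return the file names that contain a character Windows rejects."""
    return [name for name in names if WINDOWS_FORBIDDEN.search(name)]

# Hypothetical file names, invented for illustration.
names = ["index.mdwn", "change:42.mdwn", "tutorial-1.mdwn"]
print(windows_unsafe(names))
```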

Saturday, May 26, 2012

Moment of geekness

Back in 2010, Qwantz's latest dinosaur comic featured an anagram.  The moment of geekness for me today was discovering the solver site for that anagram.  (The only real problem with this post is I don't know how to classify it.)

Target application: Woopra

Woopra is the analytics platform that Google tracking isn't.  Its feature list is droolworthy.  I'd like to do something like it as one component in an overall business process workflow system or something.

A quick note on setting up site hosting at Github

This was actually easier than I'd thought.
  1. Create a new repository (in this case, mltut).
  2. Create a page branch from the admin screen.
  3. Check out the page branch:
    - Make a directory, in this case, mltut-site in my projects directory.
    - Set the origin to, in this case, git@github.com:Vivtek/mltut-pages.git
    - Fetch from origin
    - Check out branch gh-pages
  4. Register the domain (in this case, mltut.com).
  5. Set the IP to 207.97.227.245
  6. Create a file in mltut-site named CNAME containing, in this case, www.mltut.com
  7. Push that file to the branch.
  8. Wait a few minutes for DNS records to get pushed.
And then you have a new site: www.mltut.com.  There isn't much there yet.  The idea is to put the tutorial material into the mltut project, and publish it with as-yet-unspecified tools into mltut-site, which will then get pushed out to the site itself, as though by magic!

Friday, May 25, 2012

List of Linux monitoring tools

Some of these are pretty basic, but some I'd never heard of.  List here.

Exception::Base

Exception::Base is another Perl exception mechanism - one that might have a decent place in Decl.

Getting started with node.js

Nice rundown of open resources.

SIP

SIP is a part of the whole VoIP thing, apparently.  Here's the first open-source SIP client.

n0tice - open journalism toolkit

The developer's page for n0tice.  I'll just leave it here...

Input validation

Nice post on input validation in Perl.  This is a concept - like error handling in general - that I need to think more deeply about.

Python for data analysis

O'Reilly has a new book out on data analysis.  I'm sorely tempted.

Another robot competition

Scribd has a fun little competition up for robot programming.  Nice JavaScript tutorial space!

Geometry as interactive design space

Bruce Cohen, Speaker to Managers, has a fascinating post up on interactive geometry that's worth a read or two.

Monday, May 21, 2012

Upcoming NLP conference in Austria

I might go.

Rory Sutherland talks about economics and marketing

Here's an interesting talk by Rory Sutherland which mercifully has a transcript (seriously, I'm just not video-oriented).  A pretty bad transcript, mind you, but intelligible.

Upshot: people aren't reasonable, and there are patterns in this irrationality that we can both understand and exploit in our marketing.

Sunday, May 20, 2012

Multitasking in Perl

Gabor Szabo's posts on G+ have been pretty interesting lately.  Here's a nice article about Perl multitasking in a Web environment.

Minimal Perl embedding

Nice minimal example of embedding Perl in C.

"Why I am creating a programming language"

Always nice to see people articulate one's own thoughts.  New programming languages represent new ways to see the world.  As such, you can't have too many.  I like it!

Citation cartels

Another interesting tidbit from the realm of citation analysis: the emergence of a citation cartel, in which the editors of Cell Transplantation cited a huge number of papers from their own journal in order to increase its ranking.

Adversarial design

Adversarial design is a way to present debates.  An example.

Flotr2

Another nice HTML5 graphing library.

Google's knowledge graph

Google does semantics.

Data-intensive text processing with Map/Reduce

A book that I should read.

Event detection in a photo stream

This is cool; they've got a multi-source photostream with an API, and they've implemented some kind of online aggregation to detect "events" that are getting lots of pictures taken of them.

Email::Simple::Markdown

Email::Simple::Markdown, a slick way of generating multipart MIME mail without the cruft of Email::MIME.

LDAP

I tangled briefly with LDAP once before.  How relevant is LDAP these days?  I always thought it was a pretty slick alternative to the RDBMS.  Anyway, here's a "gentle introduction".

Thursday, May 17, 2012

Blog spam

Searched on "book hosting" - got this.  Fascinating.

Machine learning in Perl

Well, my ability to keep up with the Caltech ML course seems to have been roughly equivalent to my ability to keep up with any other course - worked great for two weeks and then ground to an ignominious halt.  However, I have to say that I truly love the format of the class.  The homework is really only loosely based on the lecture, so to do the programming required to answer the homework questions, you have to think about it all.  It's a great way to learn.

Which brings me to my next project idea: write a book roughly along the order of presentation of the (practical parts of the) Caltech lecture, leaving references to most of the math and providing code samples.  Yes, I mean write my homework problems at my leisure over the next few months, and then some explanatory text around them, and call it all a book.

But this would be a little more refined than just that.  I'd like to keep the notion of making you work for it, if you're so inclined, so first, each chapter would break down into a presentation, then a set of questions for you to answer (the homework, in other words), then code samples answering the questions and a walk-through of why they do so.

And then it would be just a short step to an interactive course - courselet? - that would allow you to force yourself to answer the homework before going on.  Seriously, if these Ivy Leaguers can do it, so can I.  The key is that the Caltech homework is multiple-choice, but designed in such a way that you have to do extensive experimentation (read: write a bunch of code and run it to get some measurements) to answer the questions.  Some questions on the Coursera quizzes are organized in the same way, but I thought Caltech took this to a rather nice extreme - which I really liked, because it allowed me to work in Perl as God intended, rather than having to submit to their language choice.  Why?  Because I was running the code on my own machine, not on their server.  The key is basing your questions on the measurements the student reports, not on running the student's code yourself.

The answer evaluation part could even be in JavaScript - no server-side needed at all.  The idea is to "unlock" further portions of the book, like a game almost.

Just a thought.
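To make the unlocking idea a bit more concrete, here's a rough sketch.  It's in Python rather than the browser-side code it would eventually be, and everything in it is hypothetical: the chapter names, the answer key, and the choice to store hashed answers so the page doesn't carry them in plain text.

```python
import hashlib

def answer_digest(chapter, choice):
    """Hash a chapter/choice pair so the key isn't stored in plain text."""
    return hashlib.sha256(f"{chapter}:{choice}".encode("utf-8")).hexdigest()

# Hypothetical chapter order and answer key, invented for illustration.
CHAPTER_ORDER = ["chapter-1", "chapter-2", "chapter-3"]
ANSWER_KEY = {
    "chapter-1": answer_digest("chapter-1", "c"),
    "chapter-2": answer_digest("chapter-2", "a"),
}

def unlocked_chapters(submitted):
    """Unlock chapters in order, stopping at the first wrong or missing answer."""
    unlocked = []
    for chapter in CHAPTER_ORDER:
        unlocked.append(chapter)
        expected = ANSWER_KEY.get(chapter)
        given = answer_digest(chapter, submitted.get(chapter, ""))
        if expected is None or given != expected:
            break
    return unlocked

print(unlocked_chapters({}))
print(unlocked_chapters({"chapter-1": "c"}))
print(unlocked_chapters({"chapter-1": "c", "chapter-2": "a"}))
```

With only a handful of choices per question the hash is trivial to brute-force, so this is strictly an honor-system lock: it's there to make you do the homework, not to keep anybody out.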

Sunday, May 13, 2012

Light Table, a new IDE concept

With Kickstarter funding, I might add.  Worth thinking about.

Botnet operator AMA on Reddit

So here's a German guy operating a small botnet in his spare time, writing his own code (and making it polymorphic by shuffling the source code using Perl, then recompiling, which I find pretty fascinating).  Interesting stuff here.

HelPico

Neat micro-sized email-based helpdesk software application - HelPico.

Saturday, May 12, 2012

Metaprogramming on StackExchange

Very interesting thread on StackExchange. [hnn]

LiquidFeedback

About the time I started getting back into programming and started this blog, Reddit kinda tried to start a Pirate Party.  Some of us were kicking around software-based ways to make collective decisions - it didn't really amount to anything.  But I do remember one of the Germans being pretty interested.

I'm pretty sure that interest ended up influencing the new online collaborative decision-making software LiquidFeedback.  It is, of course, open source.  Here's the further-development page.  Seems to be Lua!

JS best practices again

A pretty good article on JS best practices and why the community should pay attention to them.

IPtables: rate limiting by IP

Another starting point in my quest to grok iptables.

Modern Python idioms

Y'know, Python 3 has some sweet syntactic concepts going for it.

Dice roll simulator

Here's a neat little JavaScript dice-rolling simulator (numbers of pairs of dice arranged in a convenient resizable popup).

Thursday, May 10, 2012

Social network analysis

Oh, look - an online textbook on SNA.

More open-source books on computing

There are a couple of interesting titles here, including Data Mining Algorithms in R.  Definitely worth a look.

Citation network analysis

Oh, now here we've struck paydirt!  This is an article describing analysis of the citation network of papers making a particular medical claim (a particular substance being associated with a syndrome), and how the citations lent certain papers authority even though they ultimately could be shown to have left out experimental data critical of the claim while citing data supporting it.

This is so very much the kind of project I'd like to be doing.  More on it later.  I've filed it under "data journalism", even though it's not, really - but it is the kind of analysis that could support journalism.  I've also started a new tag "social network analysis", because this article alone has convinced me that it's a real thing and not just a buzzword.

mag.js - Interesting-looking JS magazine

I need to spend a little time reading this one. The initial issue has a couple of interesting articles, especially a nice little list of good links on learning JavaScript.

Recursive drawing

This is just jawdropping.  I need to spend some time thinking and playing with it.  Be sure to watch the video.

Speaking of moving

I haven't been posting much - and worse than that, work on Decl has ground to a halt since about October.  Things have been hectic here, with lots of travel, and I've been trying to improve myself by taking the many online classes now available (Stanford ML, Stanford/Coursera NLP, Caltech ML).  Unfortunately, a class schedule is not proving very compatible with my workload, and the classes do seem to be sapping my will to create.

My hope is that once all the travel is done (along with the upcoming intercontinental move to Hungary) I'll be able to get back to real programming again.  So that's my meta post for the month.

Prezi - cloud slideshow app

Prezi (a Hungarian startup) has been seeing some impressive statistics on use of its app - over 10 million users and growing.  (This is a plug for Hungary, soon to be my country of residence for a while.)