Semantic programming: August 2012

Wednesday, August 29, 2012

Pipes

Remember Yahoo Pipes? Yeah, neither did I - but the idea is damn good, and IFTTT, for one, is doing it with profit.

Building programs using Web APIs as building blocks. Don't forget this.

Steve Yegge on practicing programming

As usual, thoughtful blogging from Steve Yegge (not that it's new).

Gamification and the death of Angry Birds

Interesting article for gamification industry obsessives. This should probably go under a "business plan" rubric at some point.

Numba

A NumPy-aware Python compiler. That's ... mind-blowing, actually.

How to Build a Web Startup

Steve Blank has another distillation of the startup process. I love this stuff! It's like popcorn.

Monday, August 27, 2012

The NY Times just published a fascinating article on a guy who sold reviews for ebooks on Amazon, B&N, and other venues. It didn't last long (goes against Amazon's ToS) but at its peak he was drawing $28,000 a month.

I love reading about business hacks like that.

Here's an exercise for the reader (and for me, should I ever get the time) - design the business processes for that business, including any interface code for the various roles (reviewer, author, etc.) and the interface code to Amazon for submission and tracking of reviews. (That latter should be done with Javascript in the reviewer's own browser.)

It should be brain dead easy to design and iterate a business of this nature. Less than a day, at most, between idea and execution. That's my goal.

Update: just realized I saw the same story in an NLP context last week: textual analysis to determine whether a review is fake or not, with the usual counterintuitive linguistic markers.

Word

Getting back into some Word and Excel automation for a programming task, and it has once again become painfully obvious that the state of automation of Office software is incredibly antiquated.

So: a real Perl OO wrapper for Word (and for Excel, but of course I know Excel in far less detail) would be a valuable thing, and could easily be factored out of my earlier Decl work with Word. (And of course used by it, once I figure out that bug.)

This could easily be generalized to an OLE wrapper class as well, one written in a more declarative way but compiled.

This sort of thing would be incredibly useful in my daily life, and it's clear now that coming at it from the Decl end was a diversion. I got excited about Decl and wanted to do all my programming there, but I should have been working directly in Perl.

Honestly - just good documentation of the Word object model that didn't depend on Word's own help system would be a step forward. (It's just that it's huge.)

Decisions:

Call the class Win32::Word.
I'm chickening out and not using Moose. I still feel uncomfortable about the overhead, which is probably just me being old.
I will, however, use the same declarative approach for OLE wrapping that Moose does, with the same syntactic sugar function trick.
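A minimal sketch of what that declarative style might look like, written in Python rather than Perl so it runs without Word installed. Everything here is hypothetical: ole_property, WordDocument, and FakeComDocument are illustration names, and the fake object merely stands in for a real COM document. Name and Saved mirror properties of Word's Document object.

```python
class ole_property:
    """Descriptor that forwards attribute access to a wrapped COM-style object."""

    def __init__(self, com_name, readonly=False):
        self.com_name = com_name
        self.readonly = readonly

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance._com, self.com_name)

    def __set__(self, instance, value):
        if self.readonly:
            raise AttributeError(f"{self.com_name} is read-only")
        setattr(instance._com, self.com_name, value)


class WordDocument:
    """Hypothetical thin wrapper: each line declares one forwarded property."""
    name = ole_property("Name", readonly=True)
    saved = ole_property("Saved")

    def __init__(self, com_object):
        self._com = com_object


class FakeComDocument:
    """Stand-in for a real Word Document COM object (assumption for the demo)."""

    def __init__(self):
        self.Name = "report.docx"
        self.Saved = True


doc = WordDocument(FakeComDocument())
doc.saved = False   # forwarded to the underlying object's Saved property
```

The point of the sugar is that the wrapper class reads like a table of the object model, which is most of what a thin API needs to be.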

Here's a neat idea I thought of, incidentally - search Github (and other sources of code) for instances of function/method/variable names (come to think of it, the constants would be a good thing to search on!) unique to the Word API. That should net me a nice library of use cases, both for unit testing and for a nice cookbook.
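A rough sketch of the constant-hunting part, assuming the code has already been downloaded as text. The regex leans on the convention that Word's enumeration constants start with "wd"; it is a heuristic and will pick up false positives. count_word_constants and the sample snippet are made up for illustration.

```python
import re
from collections import Counter

# Word's enumeration constants conventionally look like wdStory or
# wdFindContinue: the prefix "wd" followed by a capitalized name.
WD_CONSTANT = re.compile(r"\bwd[A-Z][A-Za-z0-9]*\b")


def count_word_constants(source_text):
    """Return a Counter of wd-prefixed identifiers found in source_text."""
    return Counter(WD_CONSTANT.findall(source_text))


# Invented Perl-flavoured sample standing in for code found in the wild.
sample = """
$selection->HomeKey(wdStory);
$find->Execute({ Wrap => wdFindContinue, Replace => wdReplaceAll });
$selection->EndKey(wdStory);
"""
counts = count_word_constants(sample)
print(counts.most_common())
```

Files that score high on a count like this are the candidates worth reading for cookbook material.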

Let's face it - Word automation sucks incredibly, it really does. But it wouldn't have to.

Win32::OLE::Word should be the thin API, but I think something like Win32::OLE::Word::Sugar could include some significant simplifications and extensions to deal with common use cases (like searching for things in multiple stories).

Friday, August 24, 2012

Graphical generation

Just a note for my later amusement: I've been thinking of the automatic generation of graphical forms - specifically, 19th-century masonry. I want a city generator that does for me what London or Richmond, Indiana give me - shivers down my spine that I don't even understand.

Asteroids in LISP?!?

Yes.

Machine learning books

Some books and things on machine learning.

Foundations of Machine Learning (MIT Press 2012): Mohri, Rostamizadeh, and Talwalkar - expensive and new.
Elements of Statistical Learning: Data Mining, Inference, and Prediction (2009) - free online: Hastie, Tibshirani, Friedman.
Mohri's class at NYU.
Pattern Recognition and Machine Learning (2007): Bishop.

Tips for cleaning up crufty codebases

Refactoring and cleanup of old code is perilously close to code understanding.

Programming language social mores

The ever-entertaining Zed Shaw writes about the realization that what are often called programming language "idioms" are actually social mores used as community membership signifiers.

AMP Camp

AMP Camp was a Berkeley thing about big data that I didn't have time for. Maybe soon.

Thursday, August 16, 2012

Data mining astronomical literature

NASA has been working hard since the 1990s to ensure open access to the astronomical literature. Here is one man's data mining response.

Anic

Another neat experimental language, Anic.

Backing up data on paper

I have utterly fallen in love with PaperBack.

Take a set of files. Turn them into a graphic. Print them and stick them in a drawer. Later, scan them and restore the files. Dear Lord, is that beautiful or what?
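To make the round trip concrete, here is a toy version of the idea. It is not PaperBack's actual format, and it ignores everything a real tool needs to survive a printer and a scanner; it only shows bytes becoming a printable grid and coming back. The function names are invented.

```python
def bytes_to_grid(data, width=16):
    """Render bytes as rows of '#' (1) and '.' (0), most significant bit first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad the final row with zeros so every row has the same width.
    if len(bits) % width:
        bits += "0" * (width - len(bits) % width)
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return [row.replace("1", "#").replace("0", ".") for row in rows]


def grid_to_bytes(rows, length):
    """Invert bytes_to_grid; length is the original byte count (drops padding)."""
    bits = "".join(rows).replace("#", "1").replace(".", "0")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))


payload = b"backup me"
page = bytes_to_grid(payload)
for row in page:
    print(row)
restored = grid_to_bytes(page, len(payload))
```

Note that the decoder has to be told the original length, which a real format would store on the page itself.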

Adapting price points for SaaS

Patrick McKenzie again, this time with an article about adapting price points in software as a service. He's all about the autostartup.

Steve Yegge on "conservative/liberal"

Yegge weighs in on mental habits in programming, making the analogy to political conservatism/liberalism by noting that risk aversion is the key insight. (And not hexapodia, as previously believed.)

Grok

Here's a fascinating little article about a Google project called Grok - led by Steve Yegge, who's been doing some equally interesting bloviation about programmer mentality lately (of which more in a separate post).

Grok is still close to the vest, but it's presumably about code analysis as expressed in an improved and unified build system. It has subsumed the no-longer-public Google Code Search (durnit). In short, it's making me salivate.

DataNitro?

A snippet about options pricing in Excel with Python and DataNitro.

Syntactic

Syntactic is an open source, unsupervised lexical categorization project.

HNN muses about failure

So some guys did some video games on the AppStore, and it turned out they were doing it wrong when it came to the actual part where you make enough money to survive. HNN talks about it.

Wednesday, August 15, 2012

Musing article about ORM vs. ... not ORM

This is a good article about aspects of the philosophy of ORMs.

ORM or not, you still need separation between the model and the persistence layer. Keep all the SQL together or you'll end up with a maintenance nightmare.
ORM is a quick out-of-the-box solution for lightweight systems.
ORM isn't too great for complex data models or database-specific functionality (PostgreSQL) or if you need performance.
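The first point is easy to sketch without any ORM at all. Below is a minimal example using Python's built-in sqlite3 module; Bookmark and BookmarkStore are names invented for illustration, not anything from the article. The model is a plain object, and every SQL statement lives in one class.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Plain model object: knows nothing about the database."""
    url: str
    title: str


class BookmarkStore:
    """Persistence layer: every SQL statement for bookmarks lives here."""

    def __init__(self, connection):
        self.conn = connection
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS bookmarks (url TEXT PRIMARY KEY, title TEXT)"
        )

    def save(self, bookmark):
        self.conn.execute(
            "INSERT OR REPLACE INTO bookmarks (url, title) VALUES (?, ?)",
            (bookmark.url, bookmark.title),
        )
        self.conn.commit()

    def find(self, url):
        row = self.conn.execute(
            "SELECT url, title FROM bookmarks WHERE url = ?", (url,)
        ).fetchone()
        return Bookmark(*row) if row else None


store = BookmarkStore(sqlite3.connect(":memory:"))
store.save(Bookmark("http://example.com/orm", "ORM or not"))
found = store.find("http://example.com/orm")
```

Swapping in an ORM later, or a different database, then means rewriting one class rather than chasing queries through the whole program.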

Essential C

For those learning C, you could do worse than this convenient page at Stanford.

Twilio hacking

I'm not even "hacking" yet - just trying to find the time to get off the ground. Here are a couple of bookmarks along the way.

Client browser soft phone
Scheduled reminder app
A little Perl script hitting Twilio
A second number service for testing (since I'm not currently in the States)

Claws mail

Another possibility for a mail client front end. I can't tell how automatable it is, but at least it provides a nice list of features to strive for should I want to come up with a reasonable Perl-based mail client.

Book: mining massive datasets

Here's a useful-looking book on scalable data mining.

Processing schema.org markup with Perl

Nice outline of quick extraction of semantic markup in Web pages using Perl and a microparser.
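For a feel of how little code this takes, here is a rough equivalent in Python using only the standard library's HTMLParser, in place of the Perl approach in the linked article. It assumes simple, non-nested microdata where each itemprop element directly contains its text; ItemPropParser and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class ItemPropParser(HTMLParser):
    """Collect (itemprop, text) pairs from simple, non-nested microdata markup."""

    def __init__(self):
        super().__init__()
        self.current = None   # itemprop name we are currently inside, if any
        self.found = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        if prop:
            self.current = prop

    def handle_data(self, data):
        # Take the first non-blank text after an itemprop tag as its value.
        if self.current and data.strip():
            self.found.append((self.current, data.strip()))
            self.current = None


# Made-up sample page fragment using schema.org microdata attributes.
html = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">Example Book</span>
  by <span itemprop="author">A. Writer</span>
</div>
"""
parser = ItemPropParser()
parser.feed(html)
print(parser.found)
```

Properties whose value lives in an attribute (a link's href, a meta tag's content) would need extra handling that this sketch leaves out.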

Thursday, August 9, 2012

Transparency in scientific research

Open access and transparency in science - a juggernaut.

Combinatory parsing library ... in C

Here's an interesting library for building fast parsers in C.

Scalable Machine Learning

Here's a Berkeley class on scalable machine learning.

Titan graph database

Titan is an Apache-licensed distributed and scalable graph database.

Niklaus Wirth's current text on compiler design

Fresh from Zürich, Niklaus Wirth's latest.

Command lines

Here's an interesting point arguing that command lines are based on verbs and thus conceptually superior to GUIs, which are noun-based. Hmm. Or as he also puts it, CLIs are a linguistic interface, which I agree with.

The problem for me is that CLIs are far too restricted. Decl is about something like adding pronouns and antecedents.

Another introduction to machine learning

Thick on the ground lately.

Zyngapocalypse: towards gamification 2.0

Some interesting thoughts about the future of online gaming, from Techcrunch, a source I don't normally associate with in-depth examination.

git's object model and why it sucks

I have to say, even though I've been using git for a little while now, I'm only scratching the surface of understanding it. This article is useful.

HNN muses about personal computer hacking story

Lots of good advice here.

Pixar open-sources a component

This is cool! OpenSubdiv calculates surfaces for animation.

Detecting billboards in photos

Neat little post-mortem on some scripting by a billboard company involving image processing.

Edge prediction in a social graph (Kaggle post-mortem)

Neat! I love it when people explain how they solved a problem.

Using SAT and SMT to defeat hashing algorithms

Using operations research algorithms for fun and profit. I need some time to read this one.

NLP for the working programmer

Ebook site nlpwp.org covers natural language processing for the working programmer - in Haskell.

UX/UI design on HNN

HNN has a great ask-me post on UX/UI design. I know it comes up often, but one of these days I really need to pay more attention to it.

Berkeley does climate analysis and publishes code

A team at Berkeley recently did some (Koch-funded - richly ironic) climate research to reevaluate anthropogenicity of climate change. The refreshing thing is they published the code they used to do so. This is a fantastic new trend and I hope it continues to gain momentum.

Code management at Intuit

Intuit has been around since 1983, and offers a series of financial management products that largely share a single codebase. A codebase of 10 million lines of C. Dr. Dobbs interviewed one of their managers about how they do that, but here's the gist:

A single massive codebase and build system that manages multiple build variants (localized versions, versions on different platforms, actual different price-point product complexity variants, etc.).
Continuous integration and testing.
Automated QA.
Rapid iteration in small teams of five or six, which I find somewhat surprising.
