Thursday, March 31, 2011

Target application: transloadit.com

Neat concept site: upload a file to a target and have robots do predetermined tasks. Cool!

Tuesday, March 29, 2011

Boilerplate

That's the word I was looking for, for overall structure of a given project or deliverable or system or whatever. Boilerplate.

Monday, March 28, 2011

Coding patterns?

I'm wrestling again with what, exactly, I think I mean by "semantic programming". Clearly (sometimes) it's to develop some kind of structure that represents the purpose of program parts, their semantics.

In short, design patterns, except maybe at a very small level.

It's astoundingly hard to define this.

Perl Design Patterns is something along the lines of an attempt to codify this stuff. Kind of. And people talk about more and less Pythonic code - the very idea of programming language "accents" is something along these lines, as the semantic structures behind certain language features can vary broadly.

I want code that understands other code. Just not sure how to get there from here.

Usually, it seems that the notion of "design patterns" is intimately associated in people's minds with object orientation and Gang of Four thinking - which is pretty natural given that the GoF originated the term. But it's clear to me that even the humblest for loop or if embodies a design pattern of its own. That's the level I want to target.

It's what you perceive when you look at some code and say, ok, here's a function, here's a loop, here's a variable. It's all the low-level concepts you use to understand code.
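A rough sketch of what "code that understands other code" could look like at this level, using Python's standard `ast` module to label the constructs a reader would point at. The `CONCEPTS` table and the `describe` function are hypothetical names made up for illustration; this is not anything Decl does today.

```python
import ast
from collections import Counter

# Hypothetical mapping from syntax-tree node types to the informal
# names a reader would use for them.
CONCEPTS = {
    ast.FunctionDef: "function",
    ast.For: "loop",
    ast.While: "loop",
    ast.If: "conditional",
    ast.Assign: "variable assignment",
}

def describe(source):
    """Count the low-level constructs found in a piece of Python source."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        label = CONCEPTS.get(type(node))
        if label:
            counts[label] += 1
    return counts

SAMPLE = """
def total(items):
    result = 0
    for item in items:
        if item > 0:
            result = result + item
    return result
"""

summary = describe(SAMPLE)
for label, count in sorted(summary.items()):
    print(label, count)
```

This only names the nouns, which is exactly the "pattern library" level; saying what the loop is for would need something more.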

Update: back to the Google well I went, and I found it! The key was "loop patterns" - beware; 1997 = broken links! That led me to "Pattern Languages of Programs" = PLoP = ChiliPLoP, which appears to be managed by the Hillside Group. Here's a good place to start. A "pattern language" is not what I think it should be. Honestly, what people call a "pattern language" is more what I'd call a "pattern library", and it essentially describes a bunch of nouns that people use to make sense of things. It's in the right direction, but ... not far enough in the right direction. Yet.

Also: workflow patterns. Really, at one level, Decl is going to be about machine-readable patterns.

Thursday, March 24, 2011

File structure specification

Here is the specification for the IFO format used on DVDs. Decl's "file" tag should also be able to parse binary files in some way. This is a typical file format, so I'm tossing it out as an example.
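As a sketch of the general idea (a declarative field list driving a binary parse), here is a minimal Python example using the standard `struct` module. The header layout, the field names and the `parse_header` function are all invented for illustration; this is not the real IFO structure.

```python
import struct

# Hypothetical header layout, NOT the real IFO structure: each entry is
# (field name, struct format code). ">" below selects big-endian.
HEADER_SPEC = [
    ("magic", "4s"),       # 4-byte identifier
    ("version", "H"),      # unsigned 16-bit integer
    ("entry_count", "H"),  # unsigned 16-bit integer
    ("data_offset", "I"),  # unsigned 32-bit integer
]

def parse_header(data, spec=HEADER_SPEC):
    """Unpack the leading bytes of `data` according to `spec`."""
    fmt = ">" + "".join(code for _, code in spec)
    size = struct.calcsize(fmt)
    values = struct.unpack(fmt, data[:size])
    return dict(zip((name for name, _ in spec), values))

# Build a small sample in memory so the sketch runs without a real file.
sample = struct.pack(">4sHHI", b"DEMO", 1, 3, 12) + b"payload..."
header = parse_header(sample)
print(header)
```

The point is that the spec is plain data, so a "file" tag could carry the same list and generate the parser.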

Update: General directory structure is important, too. An example of How DVDs Work is at doom9.net.

Machine learning

So Waffles is a C++ library/command-line toolkit for common machine learning tasks, and looks mondo cool. And just in time, too, because machine learning is in the news!
  • The Heritage Health Prize is a $3 million competition in machine learning to predict unnecessary hospitalization. They have a Facebook account for news posts. Starts April 4.
  • It'll be run on Kaggle.com (previously) and apparently there are two warmups: one is author identification from Arabic handwriting samples (!) and the other is also pretty interesting: try to develop solutions that don't overfit if given test cases roughly equal in number to the number of variables.
  • Finally, found through spacehack.org: DASH, a collaborative program for data mining pertaining to aircraft systems health. Verr' cool!
Again: in a metaprogramming mode, Decl could generate code in any language to satisfy requirements in a given domain.

Saturday, March 19, 2011

Pratt parsers

An implementation of a Pratt parser in Java. I'm too tired to understand anything but Bahnhof.

Algorithms

More link dump.

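The BK tree is a method of arranging strings in a tree by similarity (edit distance, typically), so that near matches can be found without comparing against every string - probably not what TRADOS does, but something like it. Useful in future.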
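A minimal sketch of a BK tree in Python, assuming Levenshtein edit distance as the similarity measure. The class and function names are my own, and this is not taken from the linked article.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


class BKTree:
    def __init__(self, distance=edit_distance):
        self.distance = distance
        self.root = None  # each node is (word, {distance: child node})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_distance):
        """Return sorted (distance, word) pairs within max_distance."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_distance:
                results.append((d, candidate))
            # Triangle inequality: only children whose edge label is
            # within max_distance of d can hold matches.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart"]:
    tree.add(w)
matches = tree.search("bool", 1)
print(matches)
```

Each child hangs off its parent under the distance between the two words, which is what lets the search skip whole subtrees.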

Modular Data Processing library for Python. There's no reason a Decl program couldn't write its own tools in Python, then either embed them using Inline or call them through the shell.

Parsing Techniques: A Practical Guide. Ebook of the old edition is free.

Speaking of NLP, the HBGary emails might provide grist for the mill.

Overview of text extraction algorithms, or a pointer to same; it's actually here.

PHP frameworks

A good article in Spanish.

Node.js

The gift that keeps on giving.

Design

More link dumping.

Link dumping: scaling

It's time to reboot my browser, so it's link dump time.

A PDF presentation about scaling and PHP best practices from the NY PHP User Group.

And PHPFog, a scalable PHP hosting service with a convenient why-to.

Also: a blog, High Scalability.

Target app: Mywordpuzzle.com

A Hacker News word finder.

You can think of lots of add-ons, right? So can I.

Thursday, March 10, 2011

Scrumblr

Another excellent node.js application! Deserves disassembly.

Wednesday, March 9, 2011

TXL: Tree Transformation Language

I essentially think of Decl macros or maps or whatever as falling into two categories: first is the text expression of a template, but the other works on tree structures.

Googling on "tree transformation" is a useful exercise, and five minutes' perusal finds me two good links: TXL (the tree transformation language) is precisely what I want to be able to do (with a rule-based approach), and then there's an interesting paper ("A Language for Bidirectional Tree Transformations") by the Unison guys at UPenn about reversible lenses that act between tree structures, which is of course my map concept. (I should probably read everything they're doing.)

So those need to be read carefully. I'm sure there's more.
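To pin down what I mean by a rule-based approach to trees, here's a toy sketch in Python. It is not TXL syntax or semantics; the nested-tuple tree representation, the `rewrite` function and both rules are assumptions made up for the example.

```python
def rewrite(tree, rules):
    """Apply rules bottom-up until no rule matches at this node."""
    if not isinstance(tree, tuple):
        return tree
    # Rewrite children first, keeping the node label in position 0.
    tree = (tree[0],) + tuple(rewrite(child, rules) for child in tree[1:])
    for rule in rules:
        replacement = rule(tree)
        if replacement is not None:
            return rewrite(replacement, rules)
    return tree

def drop_add_zero(node):
    # ("add", ("num", 0), x) -> x   and   ("add", x, ("num", 0)) -> x
    if node[0] == "add":
        _, left, right = node
        if left == ("num", 0):
            return right
        if right == ("num", 0):
            return left
    return None

def fold_constants(node):
    # ("mul", ("num", a), ("num", b)) -> ("num", a * b)
    if node[0] == "mul" and node[1][0] == "num" and node[2][0] == "num":
        return ("num", node[1][1] * node[2][1])
    return None

expr = ("add",
        ("mul", ("num", 2), ("num", 3)),
        ("add", ("num", 0), ("var", "x")))
simplified = rewrite(expr, [drop_add_zero, fold_constants])
print(simplified)
```

Each rule is just a function that returns a replacement subtree or None; a lens would additionally need a way to go back the other direction.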

Anatomy of a crushing

An excellently written article on lessons learned in scalability.

Link dump pure and simple

And now some harder-to-classify stuff:
  • Metamarkets: why generic machine learning fails.
  • MRPostfixBounce: an approach to email bounce handling. Should implement this for Despammed. Should do a lot of things for Despammed... Sigh.
  • Some lectures on organizing systems. My sleep dep level is high enough I can't process them right now.
  • pycparser: Yeah, a full C parser in pure Python. Neat!

Best practices for code organization (Web app version)

By the author of Phonify. This is a good approach.

Class::Declarative --> Decl

This week I decided that the overall semprog framework doesn't really belong in the Class:: module hierarchy, so I renamed it to "Decl". Saves typing, too. I also rebuilt the tag declaration code - it's still not as powerful as I want, but at least it's starting to address the namespace issue.

So there's that.

Target app: wireframe prototype tools

A survey. #15 is just HTML, interestingly.

An excellent example app in node.js

Now this is the kind of article I like to see! I'm going to suck all that into a template for WWW::Publisher.

Jade (templating in node.js)

Jade is a pretty neat templating system based on JavaScript. I can steal a lot of it for Decl.

Target domain: Accounting

You have to admit it would be useful. An interesting theoretical approach here.

Javascript loaders compared

In a convenient Google spreadsheet.

UX design

Hmm. One post (designing with behavioral economics) and the blog.

Using Mechanical Turk

Some interesting movement out on the MTurk front: HNN post, Crowdforge (a framework for using MTurk effectively), and Soylent (a Word add-in for harnessing MTurk to improve documents).

Update 2013-1-1: Crowdforge, I now see, is released under a non-commercial license. But it appears to have some kind of notion of workflow definition. So it's interesting, but not sufficient.

Some NLP things

First, a Language Log post on legal automation. This made me sit up and take notice (as usual) and I found a long list of resources at Stanford, and an MIT OpenCourseWare graduate class on NLP that I will be working through. There's a Stanford class as well.

Link dump time: Scrappy

Scrappy is a Perl web scraping module that is starting to look pretty damn nice.

Sunday, March 6, 2011

Neat: noise addition to element backgrounds

A jQuery plugin to add noise to backgrounds of elements. Neat effect.

Wednesday, March 2, 2011

CSS and icon link dump

Cleaning up my browser tab bar again. Designyourway.com is pretty cheesy, but they link to a lot of really great stuff. Resources feed. 27 CSS frameworks, including css-boilerplate, which seems moribund but still interesting. And 35 of the best minimalist icon sets.

OpenCYC

CYC is open source! Probably happened years ago, but this is great news to my 25-year-old self.

Variant mapping of source code

I had an idea a little while back that keeps coming up in my thoughts, and that's variant mapping.

For example, Anaphraseus is a translation tool similar to TRADOS's Word-native libraries, but it's written for OpenOffice. I want to use it in Word, though, so I thought it would be nice (see) to write a cross-parser that would create a converted Word-native Basic variant. As updates arise for the OO.o version, you'd keep running the converter, and if you made changes on the Word side, you'd back-convert to put your changes into the master. Obviously, to a certain extent this would be inherently lossy (some changes wouldn't make it back), but if done carefully, it should work.

Then I ran across the ongoing drama about forking reddit open-source and realized this is the same kind of thing. Reddit's license for forks requires detailed documentation of all differences between the original and the fork, which an automated system would be able to do.

Just a thought. Even if it only managed to auto-convert some portion of changes to one end or the other and could flag the rest, it would be a useful tool.
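The "document all differences" half of this is the easy part, and can be sketched with Python's standard `difflib`. This is only a line-level diff between two text variants, not the cross-dialect converter; the function names and the sample Basic snippets are hypothetical.

```python
import difflib

def document_differences(original, fork,
                         original_name="original", fork_name="fork"):
    """Return a unified diff listing every line that differs."""
    return list(difflib.unified_diff(
        original.splitlines(),
        fork.splitlines(),
        fromfile=original_name,
        tofile=fork_name,
        lineterm="",
    ))

def changed_lines(diff):
    """Keep only added or removed lines, skipping the two file headers."""
    return [line for line in diff[2:] if line.startswith(("+", "-"))]

# Hypothetical master and variant sources.
original = 'Sub Main\n  MsgBox "hello"\nEnd Sub'
fork = 'Sub Main\n  MsgBox "hello, Word"\nEnd Sub'

diff = document_differences(original, fork)
changed = changed_lines(diff)
print("\n".join(diff))
```

Anything the converter can't map automatically could be flagged from the same list of changed lines.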

Squirrelmail

So I realized the best possible way to understand how a PHP application is put together is to examine an actual open-source PHP application. Squirrelmail is a possible target - it's rather large and thus harder to get a handle on, but it's old and stable and thus probably a decent model to emulate.

One thing I noticed immediately about PHP that I really don't like - side effects. It seems to be OK PHP practice to have a require or require_once that sets up variables for later use. Good God. There's no way to know where they came from; it's horrific.

Not that I don't do the same with my local loop variables, I suppose. Oy.

Anyway, so it would be nice to be able to set up an analysis framework for Squirrelmail (or what have you in terms of a Web app) that could impose some structure on the whole thing to allow redesign.