Wednesday, September 28, 2011
Final flurry of Stanford-related links
Here is a fantastic compendium of "nature-inspired" AI algorithms, each described with pseudocode and Ruby. It's this kind of thorough survey work that really makes my world go round.
Machine learning
So as I mentioned earlier, I'm going to be working through Stanford's online machine learning course this fall. I expect there to be a lot of things I'll want to work into Decl and its approach.
The class itself is here; I've also got some supporting links I'll want to keep track of:
- BYU's machine learning and data mining course.
- Book: The Elements of Machine Learning. More math, I think.
- Octave documentation.
- Reddit on Machine Learning and on this class.
Ideally, by the time I'm finished with this course, I'd like to have a set of reusable tools, literate-programmed using Decl. That might be too ambitious, but we'll see.
Sending email: best practices
When sending email from a host, here are the things you want to do to avoid getting onto anybody's block list. Good luck on that.
Target application: Promoter
This is a great example of a simple, clear, well-implemented application that represents the kind of thing I want to be aiming for. It combines a database with Web monitoring. Perfect.
haXe
haXe is the follow-on to MTASC, the open-source ActionScript compiler; instead of compiling ActionScript, they decided just to define their own language that could then compile to SWF, JavaScript, and some other targets.
I find that a tad irritating, because I'd like an open-source tool that could compile ActionScript 3.0 into SWF for me, but the effort itself is pretty cool and worth looking at.
Diagramming again
Along with Octave, I've been thinking that presentation of systems might benefit from a diagramming language. Not that this is a new thing for me; it recurs with tiresome periodicity, actually.
But this is its first recurrence since I have a pseudocode parser at the ready.
So I've been looking at different diagramming systems for inspiration. I'm not even primarily interested in a diagram editor - just a display based on a structural language. Like graphviz, really. I could see diagrams being editable later, and I could certainly see them being clickable links in a system description or the like, but right now my primary focus is typesetting of readable documents, I think.
This post has been pretty thin in terms of doing anything except establishing a point of interest, hasn't it?
TeX
I've been meaning to get back to TeX lately, for two reasons. First, my wife is doing things with physics that need to be typeset, and second, TeX has some pretty neat diagramming tools, like xy-pic.
The only problem with xy-pic (and with TeX in general) is that it's syntactically unreadable. I am so terribly uninterested in decoding @{0->}<>33x\dot or whatever, and so again I say "Decl macro system".
So I'm mulling over kind of a TeX/wiki mashup, I guess. More on that later, I suppose.
Octave
I've enrolled in the Stanford online machine learning course, and even before it's started it's making me think. It looks as though Octave will figure strongly in the course.
Octave has a pretty groovy set of capabilities, but I'm just not the unadorned command line kind of guy. I'm sure I'll be mostly running Octave on files. And here's the thing - what I really want to do with that is to generate the Octave code from Decl, because I'd like to be able to generate presentations from the same code, keep data in databases, and so forth. (Actually, I believe Octave knows how to talk to databases, but still.)
So I'm looking at Inline::Octave and we'll see how that works out. This is also a natural place to finish my macro system, which seems to have run aground again. Literate Octave programming.
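Here's a minimal sketch of the "generate the Octave code" half, assuming Python as the host and a least-squares fit as the example job. It doesn't use Inline::Octave; `octave_matrix` and `normal_equation_script` are hypothetical helpers that only emit Octave source text, which could be written to a `.m` file and run with `octave`.

```python
def octave_matrix(rows):
    """Format rows of numbers as an Octave matrix literal, e.g. [1 2; 3 4]."""
    return "[" + "; ".join(" ".join(repr(v) for v in row) for row in rows) + "]"


def normal_equation_script(features, targets):
    """Emit Octave source that fits linear regression via the normal equation."""
    x_rows = [[1, *row] for row in features]  # prepend an intercept column of ones
    y_rows = [[t] for t in targets]           # targets as a column vector
    return "\n".join([
        f"X = {octave_matrix(x_rows)};",
        f"y = {octave_matrix(y_rows)};",
        "theta = pinv(X' * X) * X' * y;",  # least-squares parameters
        "disp(theta);",
    ])


if __name__ == "__main__":
    # Save this as fit.m and run `octave fit.m` (assumes Octave is installed).
    print(normal_equation_script([[1.0], [2.0], [3.0]], [2.0, 4.1, 5.9]))
```

The same description that produces the script could just as easily produce a presentation of it, which is the whole point of driving it from Decl.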
Tuesday, September 27, 2011
DevOps choices at AppNexus
I don't even know what AppNexus is, but they need to scale, and their devops infrastructure is outlined in a handy-dandy blog post here.
Open source targets: BuddyPress and CUNY Academic Commons
BuddyPress is a WordPress plugin that implements a social network. It's used as the platform for the CUNY Academic Commons, an open-source platform for, well, academics at CUNY. It would be nice to help out open-source projects where possible, so this would be one place to start.
Draw a Stick Man
Very neat HTML Canvas design. [hnn] You could really imagine all kinds of storytelling on a platform like this.
Ticket Servers: Distributed Unique Primary Keys on the Cheap
A nice trick they use over at Flickr to get unique keys using a simple MySQL instance.
This prompts me again to muse about an infrastructure description language.
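As I understand the Flickr trick, a one-row table with an auto-increment key and a unique "stub" column gets a `REPLACE INTO` on every request, and the last insert ID becomes the new ticket. Here's a sketch of that idea with Python's built-in `sqlite3` standing in for MySQL, so `lastrowid` plays the part of `LAST_INSERT_ID()`; the `TicketServer` class and `tickets` table are hypothetical names.

```python
import sqlite3


class TicketServer:
    """Hands out unique, increasing integer IDs from a one-row table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # AUTOINCREMENT stops SQLite from reusing the ID of the row REPLACE deletes.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tickets ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " stub TEXT NOT NULL UNIQUE)"
        )

    def next_id(self):
        """Replace the single stub row and return the ID it was given."""
        with self.conn:  # commits the REPLACE before returning
            cursor = self.conn.execute("REPLACE INTO tickets (stub) VALUES ('a')")
            return cursor.lastrowid


if __name__ == "__main__":
    server = TicketServer()
    print([server.next_id() for _ in range(3)])
```

The table never grows past one row, since each `REPLACE` deletes the old stub row and inserts a fresh one with the next ID.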
Real time face substitution
Very cool! This tool finds the face in a video feed in real time and substitutes it - kind of - with another face assembled from stock photos or whatever. It's ... uncanny valley squared.
But it's interesting, because it uses something called openFrameworks (a C++ library for creative work) as a platform, then combines it with FaceTracker (a C++ library for ... face tracking) and an image cloning library that looks pretty rad, too.
All of that is pretty neat stuff, so I wanted to bookmark it.
Saturday, September 24, 2011
mrjob
Yelp has a parallel job framework called "mrjob" - no, not Mr. Job, but map-reduce job - that currently only supports Python but that could be used to manage map-reduce jobs in any language.
Might be cool to try that with Perl.
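For reference, the model mrjob manages is small enough to fake in a single process: a mapper turns each input record into key/value pairs, a shuffle groups the values by key, and a reducer folds each group. This sketch doesn't use mrjob at all, so it says nothing about mrjob's own API; `run_job`, `mapper`, and `reducer` are hypothetical names, and word counting is just the traditional example.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Sum the counts collected for one word."""
    yield word, sum(counts)


def run_job(lines, mapper, reducer):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(line) for line in lines):
        groups[key].append(value)  # the "shuffle" step
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    print(run_job(["the cat sat", "The dog sat"], mapper, reducer))
```

Since the mapper and reducer are just functions from records to key/value pairs, the pattern itself should port to Perl easily; presumably the real value of a framework like mrjob is the plumbing that actually distributes the job.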
NLP
At some point I'm just going to have to start, but:
- NLTK, and a list of suggested NLTK projects for further thought
- OpenNLP is an umbrella project for all kinds of NLP open-source projects
- ClearTK is a Java-based NLP library
- LingPipe ditto
- GATE ditto
- Xerox has a finite-state tool
OK, so here's the idea, and it's always the same idea. In a given NLP-domain problem, I'd model the data and the toolchain in Decl. Thus given a problem, you'd state the problem in Decl, and refine your solution progressively, always keeping the Decl semantic structure for the problem intact at each step. Here, it's almost a note-taking or documentation tool; the actual program would be written in Python and/or Java and invoked by Decl. It could also be embedded, of course, via Inline - but the point is that Decl needn't be seen as an exclusively Perl-based tool. It's also a litprog tool that can use macros to build anything else.
Ah, well. That's probably not all too clear. I'm tired.
What prompted this flurry of NLP searching was this Yelp blog post about a data set they're releasing to researchers.
Wednesday, September 21, 2011
Book: Mining of Massive Datasets
A Stanford book/course on the topic named. I should really just work through the thing.
Rhetorical analysis
I'm not even sure how the analysis of rhetoric fits into semantic programming, except that (1) it's NLP kind of, (2) it's research and therefore database-oriented kind of, and (3) I keep coming back to it.
The trigger is an article on CNN [HNN discussion] by Bill Bennett of the Claremont Institute tearing down the concept of spending public money on education (god forbid the teachers' unions should get tax money). There are a few little nasty tricks he throws in. I think it would be possible to analyze this kind of rhetorical treatment, maybe. Eventually. I'm not sure how to start, but it fascinates me.
Anyway, the article just pissed me off, so I thought I'd bookmark the stuff with this post.
Slick Perl trick: "enchantment" of coderefs
I should do this throughout Decl, actually: equip coderefs with debugging facilities. [monks] [another post]
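I haven't reproduced the Perl technique from those posts; this is only a rough Python analogue of the idea, assuming "debugging facilities" means a readable label plus a log of calls. `Enchanted` is a hypothetical name.

```python
import functools


class Enchanted:
    """Callable wrapper that gives a plain function some debugging facilities."""

    def __init__(self, func, label=None):
        functools.update_wrapper(self, func)  # copy __name__, __doc__, etc.
        self.func = func
        self.label = label or func.__name__
        self.calls = []  # (args, kwargs, result) for every call

    def __call__(self, *args, **kwargs):
        result = self.func(*args, **kwargs)
        self.calls.append((args, kwargs, result))
        return result

    def __repr__(self):
        return f"<Enchanted {self.label}: {len(self.calls)} call(s)>"


if __name__ == "__main__":
    add = Enchanted(lambda a, b: a + b, label="add")
    add(2, 3)
    print(add, add.calls)
```

Because the label defaults to the wrapped function's name, it also works as a bare decorator (`@Enchanted` above a `def`).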
Software maintenance target: CiviCRM
CiviCRM is a "constituent relationship manager" for politics. I like open-source politics, which is what got me back into programming in the first place, a couple of years back. So the software maintenance domain could usefully examine those projects, bug reports, and ... work on meta-software-maintenance tools.
Hey, I just want to immanentize the Eschaton, that's all.
Target domain: software maintenance
OK, so this is probably kind of a fluff post, but I want to be able to model the software maintenance process to the point that I can jump in and analyze a given open-source project, then contribute to it. That's it. Thanks for listening.
Monday, September 19, 2011
State
Persistence is a pretty central issue in programming in general. Let's assume I'm writing a family of scripts that will be used to interact with a particular set of issues, say (oh) invoicing. One of the things I've been mulling over is that if there is information missing for a given job, then invoicing can't proceed until that information is filled in. The action of filling in is thus a blocking action.
So far, so good.
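A minimal sketch of that blocking rule, assuming an invoice needs a customer, hours, and a rate (those field names, `Job`, `Blocked`, and `invoice` are all hypothetical): the action checks for missing information first and refuses to proceed, naming exactly what still needs to be filled in.

```python
from dataclasses import dataclass, field

# Hypothetical: the fields an invoice cannot be produced without.
REQUIRED_FIELDS = ("customer", "hours", "rate")


class Blocked(Exception):
    """Raised when an action must wait for information to be filled in."""

    def __init__(self, job_name, fields):
        super().__init__(f"{job_name}: waiting for {', '.join(fields)}")
        self.fields = fields


@dataclass
class Job:
    name: str
    data: dict = field(default_factory=dict)

    def missing(self):
        """List the required fields that are still absent or empty."""
        return [name for name in REQUIRED_FIELDS if self.data.get(name) in (None, "")]


def invoice(job):
    """Return the invoice total, or raise Blocked naming what is missing."""
    gaps = job.missing()
    if gaps:
        raise Blocked(job.name, gaps)
    return job.data["hours"] * job.data["rate"]
```

The `Blocked` exception carries the list of gaps, which is exactly what a later script run (or a human editing saved state) would need to know to unblock the job.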
But once I've come up with some actions, I have two options: toss it all, or save it to state. The benefit to saving to state is that if the state is human-readable, I can always write addenda to the state (say, filling in the missing values) that the script can then read in next time. And of course the state can be shared between processes and so forth.
Is that worth enshrining in the language? I honestly don't know. State is similar in this aspect to configurations, and naturally the database is also a form of state (or that is, can be used to save it). With state comes the concept of sessions, which can get arbitrarily complex.
How much goes into a language like this? I want to put everything in that's general enough that it recurs in different problem domains, and it seems that things like configuration, state, command line handling, and so forth meet that criterion.
So a state tag is going to declare persistent values of some kind. This will be the second collection of tags that require an extensible structure (the first being the database tags), because state could be:
- A node written to a file
- Something else in a file
- Some database structure
- Some combination of other persistent structures
- Anything else
So we need a driver system. Especially the "some combination of other persistent structures" deserves some careful thought, too.
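Here's one way the driver idea might look, sketched in Python. Every back end implements the same `load`/`save` pair; a JSON file covers the human-readable "node written to a file" case that can be amended by hand between runs; and a composite driver covers "some combination" by routing each key to the driver that owns it. All class names are hypothetical, and routing by key is only one possible policy for combining drivers.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class StateDriver(ABC):
    """What every persistence back end must provide (hypothetical interface)."""

    @abstractmethod
    def load(self):
        """Return the stored state as a dict."""

    @abstractmethod
    def save(self, state):
        """Persist the given dict."""


class JsonFileDriver(StateDriver):
    """Human-readable state in a JSON file that a person can amend between runs."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def save(self, state):
        self.path.write_text(json.dumps(state, indent=2, sort_keys=True))


class MemoryDriver(StateDriver):
    """Non-persistent state, useful for tests."""

    def __init__(self):
        self.state = {}

    def load(self):
        return dict(self.state)

    def save(self, state):
        self.state = dict(state)


class CompositeDriver(StateDriver):
    """Splits one state dict across several drivers, routing each key to its owner."""

    def __init__(self, routes, default):
        self.routes = routes    # key name -> driver that owns that key
        self.default = default  # driver for every key not listed in routes

    def _owner(self, key):
        return self.routes.get(key, self.default)

    def _drivers(self):
        unique = []
        for driver in [self.default, *self.routes.values()]:
            if all(driver is not seen for seen in unique):
                unique.append(driver)
        return unique

    def load(self):
        state = {}
        for driver in self._drivers():
            # Ignore keys a driver holds but no longer owns.
            state.update({k: v for k, v in driver.load().items() if self._owner(k) is driver})
        return state

    def save(self, state):
        for driver in self._drivers():
            driver.save({k: v for k, v in state.items() if self._owner(k) is driver})
```

Note that the composite is itself a `StateDriver`, so combinations can nest; that's probably where most of the "careful thought" would need to go.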
Friday, September 16, 2011
Vagrant
This was on my to-do list, and now I don't have to mess with it. Vagrant sets up VirtualBox environments to spec (I think).
There's also a new online variant of the same idea: StackRocket.
Thursday, September 15, 2011
Monday, September 12, 2011
Dada Engine
OK, so back in the '90s, Andrew Bulhak came up with a snazzy little engine that interprets a grammar to generate random text. Well, we've all done that, of course (here's a Tcl translation of one grammar), but his is implemented as an interpreter that takes the grammar as a specification language, and it can use troff to generate some pretty nice output. Like this, famously. And more recently, spam, apparently, which explains a lot. Here's a page that can run it on any grammar you like. (Incidentally, a Google search on "Dada Engine" turns up a paid ad for a code generator.)
So the engine itself is kind of boring, although he's put some really nice features into it. What really gets me going is thinking about that grammar specification language. I know this is Not a New Idea, but his sentence or phrase patterns are templates - syntactic units expressing semantic units, getting back to my Langacker days - that bear research. What he's missing is the semantic pole, although the context his grammar carries along is something in that direction.
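To pin down what I mean by patterns as templates, here's a toy version of the idea in Python: each rule name maps to alternative templates, and `<name>` inside a template refers to another rule, expanded recursively. This is not the Dada Engine's actual syntax or feature set (no carried context, no troff output); the grammar and the function names are made up for illustration.

```python
import random
import re

# Toy grammar: rule name -> alternative templates; <name> refers to another rule.
GRAMMAR = {
    "sentence": [
        "The <noun> of <concept> is <adjective>.",
        "<concept> is, in a sense, <adjective>.",
    ],
    "noun": ["discourse", "paradigm", "narrative"],
    "concept": ["late capitalism", "the Other", "the <noun>"],
    "adjective": ["dead", "meaningless", "a simulacrum"],
}

REFERENCE = re.compile(r"<(\w+)>")


def expand(rule, grammar, rng, depth=0, max_depth=20):
    """Choose one template for rule and recursively expand the rules it mentions."""
    if depth > max_depth:
        raise RecursionError(f"expansion too deep at <{rule}>")
    template = rng.choice(grammar[rule])
    return REFERENCE.sub(
        lambda match: expand(match.group(1), grammar, rng, depth + 1, max_depth),
        template,
    )


def generate(grammar, start="sentence", seed=None):
    """Expand the start rule and capitalize the first letter of the result."""
    text = expand(start, grammar, random.Random(seed))
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    for seed in range(3):
        print(generate(GRAMMAR, seed=seed))
```

What's missing here is exactly the semantic pole: the templates know nothing about what they mean, only how to be filled in.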
What would be interesting would be something that could mine the Web for such patterns. And a statistical analysis of their use and interrelatedness to produce some kind of indication of voice/register for the text. That kind of thing. Not to mention a compact set of "typical error message" patterns, etc. for practical text generation in software. The point being that I think the pattern-based approach could work for both text analysis and text generation (not that this is a new idea).
This is back to my notion of the Lexicon, last looked at seriously in 2005. I really have to take a sabbatical from this damned having-to-earn-money thing.
Sunday, September 11, 2011
Hassle maps
Here's a PDF article about "hassle maps". This is essentially a way to think about user experiences in a typical task in the day, describing a narrative and identifying hassles in it - pain points - so that a solution can be evolved to remove them. It's worth a read.
Text generation porn
Oh man, the NYT has a thinly-veiled PR piece today about Narrative Science, a company that spins out stories automatically based on whatever data feed you've got. It makes me drool because I want to do that. Obligatory HNN reaction. (With mention of a competitor.)
Wednesday, September 7, 2011
Ifttt.com: Neat automation service
Ifttt.com lets you build little web robots that update every 15 minutes and can take API actions. Pretty neat!
Video talk: You're stealing it wrong
I am really not normally one for video (it's inherently tl;dr to me), but this presentation by tech historian Jason Scott at this year's Defcon was quite entertaining. And at the very end, there is a list of URLs of sites in which Scott has had influence, to wit:
- Textfiles.com - a vast archive of text files from yesteryear. There's some great stuff in there!
- Archiveteam.org - a group into archiving stuff before it evaporates. They archived Geocities.
- And some other stuff: two documentaries, Cow.net, and Welcometointernet.org.
Scott seems like a guy after my own tastes. We shall watch his career with great interest.
Tuesday, September 6, 2011
Nephtali: functional PHP Web framework
Interesting. I took some time to read through things and get my head around it - I think its abstraction is a little too low-level for my taste, but it's a fascinating project.
Free Rails 3.0 tutorial
Here, on Github. I like this trend towards githubbing of documentation like this.
NLP parsing tutorial
A short tutorial on the use of some tools for NLP. [pdf] I really want to start developing a toolbox for NLP but need some direct motivation to do so. (Although I suspect the motivation would crop up as soon as I had a toolbox I could trust.)
ClueWeb09 - NLP database
The ClueWeb09 database is a set of a billion Web pages harvested in 10 languages in 2009 for research in natural language processing. You can self-host if you buy a 2 TB hard drive for $600, or you can use their APIs to do your research.
The license essentially requires you not to republish content and to delete anything anybody asks them to delete from the repository, both of which make eminent sense.
Self-education in Web design
Good, if short, article on where to start with learning Web design. It's always nice to find this kind of summary article.
Patterns For Large-Scale JavaScript Application Architecture
Well-thought-out article about architectural choices in large JS applications. Easily extended to any app, of course - all-around good writing.