Monday, May 30, 2011
HTML overview
Just because I'm finally doing HTML::Declarative, here is a useful link to an HTML overview that I'm going to use to build a database of valid attributes and tags.
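Just to get my head around what that database might look like, here's a toy Perl sketch - the handful of tags and attributes below are purely illustrative, nothing like the real spec, which is the whole point of generating it from the overview:

use strict;
use warnings;

# Toy slice of a tag/attribute database; the real thing would be
# generated from the HTML overview rather than typed in by hand.
my %html = (
    a   => { attrs => [qw(href target rel title)] },
    img => { attrs => [qw(src alt width height)], empty => 1 },
    p   => { attrs => [qw(class id style)] },
);

# Check a tag/attribute pair before emitting it.
sub attr_ok {
    my ($tag, $attr) = @_;
    return 0 unless exists $html{$tag};
    return scalar grep { $_ eq $attr } @{ $html{$tag}{attrs} };
}

print attr_ok('img', 'src') ? "ok\n" : "invalid\n";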
Unix tools
Man, this is a domain ripe for semantic clarification if ever there were one. It'd be good for me just to write it, actually, whether it ever got used or not.
Error handling
Oh, man, my Achilles heel is error handling. So it was with great interest that I read this article about how (not) to do it. It kind of ends up being a long-form advert for Erlang, but that's OK; immutable variables are a good way of dealing with exceptions.
What I'd really like is to think harder about addressing error/exception handling at the semantic level. Somewhere in there, errors are workflow. They might be really simple workflow, but at least potentially they're a matter of making a bookmark to a record that tripped us up, and doing something about it later.
The notion of saving state is also interesting, as is the concept of transactions as a method of keeping changes in bundles.
At some point, I'd really like to get serious about a uniform way of talking about errors.
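To make the "bookmark the record and deal with it later" idea concrete, here's a doodle in plain Perl - the records and the failure mode are invented, but the shape is what I'm after:

use strict;
use warnings;

# Errors as workflow: a record that trips us up isn't fatal, it just
# gets bookmarked for a later pass.
my @records = ( { id => 1, value => 10 }, { id => 2 }, { id => 3, value => 7 } );
my @retry_queue;

sub process {
    my ($record) = @_;
    die "record $record->{id} has no value\n" unless defined $record->{value};
    print "processed record $record->{id}\n";
}

for my $record (@records) {
    eval { process($record); 1 } or do {
        # The bookmark is the workflow item.
        push @retry_queue, { record => $record, error => $@ };
    };
}

# Later - maybe much later, maybe by a human - the bookmarks become tasks.
printf "%d record(s) queued for later: %s",
       scalar @retry_queue, join('', map { $_->{error} } @retry_queue);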
Huh. Looks like I was already thinking about this in 2009. Well - to be fair, it was already bothering me in 1989.
Project management with Task Warrior
Task Warrior is a command-line tool for task management. It's really quite extensive. A project management domain could interface with it, say.
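For instance, the domain could simply shell out to the task command. A sketch, with a made-up project and tasks, assuming only the basic add/list usage:

use strict;
use warnings;

# Hypothetical project-management domain pushing its tasks into Task Warrior.
my %project = (
    name  => 'decl',                          # made-up project name
    tasks => [ 'write the parser', 'document the tag set' ],
);

for my $desc (@{ $project{tasks} }) {
    # "task add project:<name> <description>" is basic Task Warrior usage.
    system('task', 'add', "project:$project{name}", $desc) == 0
        or warn "task add failed for '$desc'\n";
}

# Read them back (parsing the listing is a problem for another day).
system('task', "project:$project{name}", 'list');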
Domain: crowdsourcing strategies
Here's an interesting paper on the efficacy of a number of different crowdsourcing strategies (using Mechanical Turk). Which leads to the concept of "programming for crowds". If I can program a company, then I can surely program a crowdsourcing application, right?
Think about it.
Saturday, May 28, 2011
Pastebin harvesting
Here's a fascinating idea. Pastebin has a public section. Concerned about abuse, one Internet citizen scraped it to see what was there - and there were lots of ... things of dubious ethics there, let's say.
Why not scrape it daily and analyze what's there? This seems kind of interesting to me, and I'm not even entirely sure why.
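A first stab might be no more than this - note that the archive URL and the eight-character ID pattern are guesses on my part, not anything Pastebin documents:

use strict;
use warnings;
use LWP::UserAgent;

# Daily harvest sketch: fetch the public archive page and pull out paste IDs.
my $ua  = LWP::UserAgent->new(agent => 'pastebin-harvester/0.1');
my $res = $ua->get('http://pastebin.com/archive');     # assumed URL
die "fetch failed: ", $res->status_line unless $res->is_success;

my $html = $res->decoded_content;
my %seen;
while ($html =~ m{href="/(\w{8})"}g) {                  # assumed ID format
    $seen{$1} = 1;
}
print scalar keys %seen, " pastes found today\n";
# Next: fetch each paste, stash it, and run the analysis over the archive.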
NLP Christmas in May
I found out that not only is the journal Computational Linguistics now free online - a lot of other journals in computational linguistics are free online too (archive here). All I need to do now is build a semantic bot to analyze them...
Rosetta Code
So there's this site Rosetta Code that shows snippets of how to do Task X in lots of different languages. I find that pretty fascinating from the semantic-programming standpoint; in a certain sense, each set of solutions encodes the same thing, the same meaning.
The link goes to "open a window"; Perl has five ways of doing it, depending on the GUI framework you're using, and I'd like to explore that parallelism in Decl.
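For reference, the Tk flavor is about as small as these things get (this is from memory, not copied verbatim from the page):

use strict;
use warnings;
use Tk;

# Open a window with Perl/Tk - one of the five.
my $mw = MainWindow->new(-title => 'Hello from Tk');
$mw->Label(-text => 'A window, declaratively enough')->pack;
MainLoop;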
Wednesday, May 18, 2011
COLM: COmputer Language Manipulator
Tuesday, May 17, 2011
API Marketplace
O API Marketplace, where art thou? A very thought-provoking article about APIs and clearinghouses for them, along with a maybe somewhat dubious business model that he wishes somebody else had already built.
PHP best practice for "require"
A good benchmarking article on different ways to include PHP files.
The concept of "compiling to PHP" really starts to make more sense given these different alternatives. Compiling to anything, really, is what I want to do.
DIY node.js server on EC2
Here is a really nice blow-by-blow set of instructions for setting up a good node.js server on EC2. Recall that I also wanted to do more or less the same thing for VirtualBox setups. Clearly, what's needed is a semantic description of operating environments, maybe a Deployment::Decl or something. Anyway, bear it in mind; node.js needs more investigation anyway.
Monday, May 16, 2011
MISC: a map-based, vaguely LISPish language
This is cool. MISC runs on Javascript and is a map-based, lazy-evaluated language with a lot of the features of LISP. Worth the read!
Sunday, May 15, 2011
Telescopic text
So look here: telescopictext.com. This is a simple story: "I made tea." Click on highlighted words for more detail. A lot more detail. And yet the overall narrative remains that the author made tea. [More detail at telescopictext.org, including a toolset!]
This is more or less what I'm saying about semantic programming. Looking at the high-level specification for a particular action, we see "Make tea". When we consider that action in more detail, we resolve further specifications that were previously invisible. Like varying f-stops that the eye cannot perceive, the mind elides these vast gulfs of detail in order to make sense of the world.
This process is nearly imperceptible to us. As programmers, we're familiar with it - it's the reason for Hofstadter's Law, if nothing else - but programming languages don't take it into consideration (except insofar as some are higher-level than others, and of course LISP can be made to do some of this, with its fancy macro system to hide cruft at will).
Maybe "telescopic code"?
Miscellaneous links
Down to the last three open in my browser, but categorization fatigue has overwhelmed me.
- CloudMade has a neat Javascript API for online maps.
- Stanford has Protovis, a visualization library for statistical things.
- ArsTechnica reports on Kermit, a Georgia Tech project meant to make home networks easier to manage (throttling bandwidth, etc.). My router is hopelessly inadequate for these tasks, but on the plus side it was cheap. I'm thinking of loading Unix on a router and doing things right. Real Soon Now.
Data modeling
So here's a thoughtful post about why relational databases aren't always the right thing to do. TL;DR - Oracle's standard order management model has 126 tables, and surely that's sometimes overkill.
This is exactly what I mean with the notion of semantic programming. Semantic structures have sliding levels of detail, and the actual number of tables or location of things in RDBMS or NoSQL or whatever should have no effect whatsoever on the actual semantics of the problem domain.
So: food for thought. It's a good article.
A new approach to comments
Target domain: project management
OK, OK, this is another clogged space. Still - I'd like to look closely at the semantics. Unfuddle.com is yet another target app. They make money, so ... I could, too, presumably. Why not? The key is to be able to iterate faster than other humans, and making your specific software a fungible expression of a fixed semantic domain is definitely the way to go.
Patterns/boilerplate/idioms linkdump
- Some LISP idioms for various tasks. It's the meta-level thinking I'm interested in here.
- How to write your own native Node.js extension: some boilerplate stuff.
- HTML5 canvas cheat sheet. Not really boilerplate, but I've been cleaning up my browser all day and I'm suffering from categorization fatigue.
- A neat-o classification of HTTP APIs. This deserves more serious thought.
Design linkdump
You can always tell when my browser gets so full I can't see the icons in the tab headers.
- 39 CSS3 box shadow tricks
- Umm... I thought I had more than that open for design. Apparently not.
Target application: wireframe / chart / whatever
This space is beyond full, of course. Here are a couple more:
- LucidChart (charts)
- Pidoco (wireframe prototypes)
Natural language linkdump
Some interesting things coming up in natural language this month.
- Text-processing.com has some online NLTK demos and offers NLP as an API! Cool stuff!
- Article content extraction (once the exclusive domain of Readability and its ports) is now getting a little more coverage: the Goose library (Java, sadly).
- A paper on semantic analysis. I need more sleep even to read the title, apparently.
- Tag extraction from text (!!!): the Apresta tagger library.
Algorithmic trading: field studies
Ha. Still a topic dear to my heart, although I think I'm too late to the table.
Webapp theme boilerplate
I'm really getting into the notion of boilerplate. Here's an article presenting a couple of Webapp themes.
Angry Birds on Chrome + evolution = WIN!!
So Angry Birds is now available in a Javascript port on Chrome. I have yet to really study it beyond, you know, playing it for a couple of days to see what all the fuss is about (and yeah, it's a pretty addictive game!). So I had this cool idea, as one does.
Back when tower defense games were all the rage, I spent a little time developing some tools to play tower defense for me. It was more challenging than it sounds - but only because TD games are all in Flash, and Flash has no machine-accessible output or state beyond its actual screen output. Screen capture is slow, so actually responding to screen output is essentially impossible. (Not to mention the shocking dearth of easy-to-use OCR libraries, which still doesn't seem to have been resolved - and it's 2011!)
Anyway, the description of a level is presumably in a nice little string. That string could be evolved with a GA - making new levels that people could play in a Web2.0 fashion, providing grist for the evolution! But that would require a lot of people.
So why not evolve playing strategies as well? This would consist of a list of pullback coordinates and delays. The ease with which a given population could evolve a playing strategy would allow the calculation of a "playability metric" - and that in turn would be an evolutionary metric for the new levels.
So by harnessing two levels of evolution, you could (maybe) generate an arbitrary number of entertaining Angry Birds levels.
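Here's roughly what I mean for the strategy half, as a Perl sketch - the genome is just the (angle, power, delay) list described above, and the fitness function is a stand-in, since the real score would have to come back from the game itself:

use strict;
use warnings;
use constant SHOTS => 5;

# A strategy genome: SHOTS shots, each a pullback angle, power, and delay.
sub random_strategy {
    [ map { { angle => rand(90), power => rand(1), delay => rand(5) } } 1 .. SHOTS ];
}

sub fitness { return rand(); }   # placeholder: ask the game for the score

sub mutate {
    my ($s) = @_;
    my $shot = $s->[ int rand @$s ];
    my $gene = (qw(angle power delay))[ int rand 3 ];
    $shot->{$gene} += (rand() - 0.5) * ($gene eq 'power' ? 0.2 : 2);
    return $s;
}

# One naive generational loop: score, keep the best half, refill with
# mutated copies.  How quickly fitness climbs here is itself the
# "playability metric" that feeds back into evolving the levels.
my @pop = map { random_strategy() } 1 .. 20;
for my $gen (1 .. 50) {
    @pop = map  { $_->[1] }
           sort { $b->[0] <=> $a->[0] }
           map  { [ fitness($_), $_ ] } @pop;
    splice @pop, 10;
    my @children = map { mutate([ map { +{ %$_ } } @$_ ]) } @pop;
    push @pop, @children;
}
print "evolved ", scalar @pop, " strategies\n";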
Cool, huh?
Image processing
After many, many years, my sister and I have convinced my mother that all those decades of old photographs need to be scanned and curated. My sister being a CPA, she has a nice document-feeding scanner. So I ran a few test runs, and it turns out that at 600 dpi in full color, any dust at all inside the scanner leaves vertical stripes on the scanned image that are ... not really monochromatic, but sort of a transparent overlay of a monochromatic stripe.
Naturally, I figure there must be software out there to help me remove those stripes. My best strategy so far is to scan each picture twice, once upside down, so that the stripes will be in different places - then do something to recognize the stripes and eliminate them using the corresponding places on the other image.
So far this is a Hard Problem. Here is a link dump of some of the things I've run across while researching it:
- The CImg library - and here I thought ImageMagick was all there was!
- The hdrprep script - a Perl script to manipulate imagesets prior to stitching them together to average out their light levels (HDR = High Dynamic Range, a very neat technique)
- ALE, which is a tool that works magic on images of such refined scope that I can't even truly understand the explanatory blurb - except that I know what registration is, and it's Good Stuff. This only runs on Linux, as a command-line utility, so it's going to take some actual effort to use it because I'm lazy and my Linux box is downstairs and thus requires ssh to hit.
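Here's the core of the two-scan idea as I currently picture it, assuming the two grayscale pixel matrices have already been loaded and aligned (which is of course the genuinely hard part, glossed over here), and with an arbitrary detection threshold:

use strict;
use warnings;
use List::Util qw(sum);

# $img and $other are 2-D arrays of brightness values for the two scans,
# with the upside-down rescan already rotated back into alignment, so any
# stripe lands in different columns in each.
sub column_means {
    my ($img) = @_;
    my ($h, $w) = (scalar @$img, scalar @{ $img->[0] });
    return [ map { my $x = $_; sum(map { $img->[$_][$x] } 0 .. $h - 1) / $h } 0 .. $w - 1 ];
}

sub fix_stripes {
    my ($img, $other, $threshold) = @_;
    my $means = column_means($img);
    for my $x (0 .. $#$means) {
        # A stripe column stands out from its immediate neighborhood.
        my @near  = grep { $_ >= 0 && $_ <= $#$means } $x - 2 .. $x + 2;
        my $local = sum(map { $means->[$_] } @near) / @near;
        next if abs($means->[$x] - $local) < $threshold;
        # Patch the whole column from the other scan.
        $img->[$_][$x] = $other->[$_][$x] for 0 .. $#$img;
    }
}

# Usage, once $scan1 and $scan2 actually hold pixel data:
# fix_stripes($scan1, $scan2, 8);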
Friday, May 13, 2011
Semantic representations
Here's a little article musing about HTML escaping: the tl;dr is that you need to know your application before you can really do a perfect job. That's not so far from the problem of i18n and l10n of resources, and it reminded me of an idea I had: a generic library for the higher-level manipulation and preparation of program output.
It's still in a sketchy state, but - the generation of natural language being much easier than its parsing - it might be a good place to start with more flexible output design.
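As a first doodle of what I mean: output preparation keyed by a declared context, instead of ad-hoc escaping at every print statement. The context names are mine; HTML::Entities and URI::Escape do the real work:

use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use URI::Escape    qw(uri_escape);

# Each output context knows how to prepare raw text for itself.
my %prepare = (
    'html-text' => sub { encode_entities($_[0]) },
    'html-attr' => sub { encode_entities($_[0], '<>&"') },
    'url'       => sub { uri_escape($_[0]) },
    'plain'     => sub { $_[0] },
);

sub emit {
    my ($context, $text) = @_;
    my $prep = $prepare{$context} or die "unknown output context '$context'";
    return $prep->($text);
}

print emit('html-text', 'Fish & chips <on sale>'), "\n";
print emit('url',       'a name with spaces & ampersands'), "\n";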
MediaWiki parsing
... is apparently a hard nut to crack. I find this a little surprising. Here's an interesting article about a rigorous parsing approach (Dropbox backup) and the HNN thread.
Here's an example parse.
Some excellent design patterns for signups and logins
Smashing Magazine has a truly wonderful article musing about best practices for signups and logins. Read it.