Saturday, June 25, 2011

Graph databases

An overview. A really, really detailed overview.

Resource list

Quoted without comment.

MapReduce summary, with example

Neat!

Things CS people should do

Before graduating. Or after. Or automate them.

Echoprint

An open-source music recognizer. Odd name, though.

Intelligent content

So my cousin Erin has written a book, and it's about content, and I had to read the introduction (and may actually (gasp) purchase the book itself, yes, I know, it's a frightening concept) and then I read the comments on the introduction, and Erin's response to one of them, where she talks about intelligent content, which I take to be some sort of adaptive system to generate or modify content based on user input.

Well, Gentle Reader, if indeed you exist and you've read anything at all I've written, you know that topic is guaranteed to make me sit up and drool. So I have a research goal, I guess: find these people of whom she speaks, and make them give up their secrets.

I want to take a short moment here and say just how profoundly strange it is for someone whose birth I remember to have exceeded my own accomplishments in the field I (tangentially) ended up in. Although in Erin's case, it's OK. She was always just about my favorite cousin.

Update: Joe Gollner; a slideshow. I find it ... corporate. But intriguing. And a post about Ann Rockley.

Friday, June 24, 2011

Best practices

You know what's damned hard about software? It's that best practices change over time. There has to be a way to track best-practice answers to specific questions, and come up with a warning of some sort when your old design assumptions for a given thing turn bad.

Case in point: Instapaper had a server confiscated by the FBI by mistake (probably) and posted about it in public. The community notified Marco that SHA-1 hashes of passwords are no longer considered secure; bcrypt or scrypt is the best practice today.
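To make the SHA-1 versus scrypt point concrete, here is a minimal sketch of salted password hashing with scrypt, using only the Python standard library. The cost parameters and the function names (hash_password, verify_password) are my own illustrative assumptions, not anything Instapaper actually runs, and hashlib.scrypt needs a Python built against a reasonably recent OpenSSL.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Cost parameters are illustrative assumptions, not a vetted recommendation.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) to store; each password gets a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse")
```

The point for the best-practices problem: the choice of algorithm and its cost parameters sit in one named place, which is at least where a "this assumption has gone bad" warning could attach.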

So ... I'm having trouble really envisioning how exactly this would work, but ... the design of a given software system has dozens of answers to specific questions of this nature, where an algorithm or a library is selected to meet a need. As time goes on, it should be possible to know when there is an incipient risk, and ideally the programming system should just reprogram the application to use the updated solution.

How do you get there from here? I dunno.

Sunday, June 19, 2011

Cheap and Nasty

Interesting article about two protocol patterns, Cheap and Nasty. tl;dr: don't try to use the same tool on every problem.

Smooth CoffeeScript

Book. Looks good.

More thoughts on tasks

So here I am, doing some sysadmin stuff for Techspex, and thinking about how really, a task is the semantic unit of action. That is to say, when I think of things I have to do to get something done (e.g. install Wordpress - this requires upgrade of MySQL, and that requires a dump to be done, etc.) each of those verbs denotes a task.

The definition of those tasks at the human level might include snippets of shell code to execute the commands required, it might refer to documentation pages, and so on. All those things involve the semantic environment that a human requires to make sense of the actions being done and to be sure that they're reasonably correct.

That's really the essence of a semantic approach. How can I get from a high-level description of a set of tasks to be performed to the specific code required to perform them? That's what programming is, of course. That's where I need to go.

Another consideration: there are certain short lists and items of data that describe a given sysadmin environment - host names, IP addresses, directories, what have you. If these are assigned string variable names, you haven't gained anything; you still have to remember those naming strings. Instead, you need some kind of semantic note-taking structure that can store information of that nature in such a way that it can be retrieved in a purpose-oriented manner.

And that ties back into Code Bubbles, really: the point of that IDE is to arrange a working set of information being used to address a given ... task. See? See how this all makes sense?

Going back even further into my past, I need to resurrect my notion of the semantic database or Lexicon. That's where items of this nature would be grouped. A given context might be "sysadmin work for Techspex". That would be a subcontext of "sysadmin work", and that supercontext would provide useful things to know about any sysadmin environment, such as the hostname, etc. (This could be a checklist of things to discover about a new environment, say.)
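A rough sketch of the context idea, in Python for want of a Lexicon: a context holds facts, falls back to its supercontext for anything it doesn't know, and can report which checklist items are still undiscovered. Everything here (the Context class, the fact names, the host name) is hypothetical illustration, not an existing design.

```python
class Context:
    """A named bag of facts that falls back to its supercontext."""

    def __init__(self, name, parent=None, **facts):
        self.name = name
        self.parent = parent
        self.facts = dict(facts)

    def lookup(self, key):
        """Find a fact here or in the nearest supercontext that has it."""
        ctx = self
        while ctx is not None:
            if key in ctx.facts:
                return ctx.facts[key]
            ctx = ctx.parent
        raise KeyError(key)

    def knows(self, key):
        try:
            self.lookup(key)
            return True
        except KeyError:
            return False

    def still_to_discover(self):
        """Items on the inherited discovery checklist with no value yet."""
        return [item for item in self.lookup("to_discover") if not self.knows(item)]


# The supercontext carries general knowledge plus the checklist of things to find out.
sysadmin = Context(
    "sysadmin work",
    to_discover=["hostname", "ip_address", "web_root"],
    hostname_howto="run `hostname` on the machine",
)

# The subcontext records what is known about one environment (values are made up).
techspex = Context(
    "sysadmin work for Techspex",
    parent=sysadmin,
    hostname="web1.example.test",
    web_root="/var/www",
)
```

Here techspex.still_to_discover() reports only the IP address as missing, while the how-to for hostnames is inherited from the supercontext. This is still lookup by string key, so it does not solve the naming problem above; it only shows the sub/supercontext shape.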

But the point there is that the information in that context would be indexed with things like what a hostname is, how it can be determined, how to choose one for a new machine, I don't know - all the things that represent what a system administrator knows. A semantic domain indeed - far more semantically oriented than the Decl domains I keep proposing right and left. Eventually those Decl domains will grow into this concept, but that's still a ways off.

But system administration is a domain where it may make sense to explore it. If only I had more of it to do. (Except that's a great way to lose sleep and hair.)

Saturday, June 18, 2011

Redefining JS Array as security flaw

This is weird and cool - an article about how to avoid cross-site scripting security issues when returning a JSON object. The security flaw is the non-obvious fact that in Javascript, even the Array constructor is a first-class object, and thus can be redefined.

Freaky! I like Javascript - very neat language.

Friday, June 17, 2011

Task afterthought

It occurs to me that (at least as I think of coding) the task is a natural grouping for chunks of code in a particular, say, subroutine. So the task is a semantic element at a very basic level indeed.

Workflow as core semantics

You know what the unit of workflow is? It's the task. And you know what the natural grouping of tasks is? The checklist.

Build those two things into the language, and I think that's the only really basic support you need for workflow. I suspect (I haven't taken the time to think this through) that all other workflow structures can be derived from those two. For example, a sequence (as opposed to the parallel nature of a pure checklist) can be expressed as a checklist in which the completion of each task's predecessor is a prerequisite for its start.

So what's a task? It's a macro action consisting of:
  • A set of actions to carry out (this really is a sequence)
    • Plain code
    • Subtasks in a subchecklist
  • A set of prerequisites, or pre-existing conditions
    • Completion of other tasks
    • Resource requirements
    • Assertions about input data
  • A set of post-facto assertions, or expectations
    • Expected outcomes of the task
In addition, we might want to declare some of the data (files, etc.) the task is going to work on, and so forth, but that's kind of inherent in the declarative style.

A checklist can persist beyond the technical process running the workflow, and that's really the essential component that makes workflow workflow - but even without the persistence, the checklist is a useful design component. The order in which tasks execute in a checklist is undetermined; the checklist is only complete when all its tasks are complete. The post-facto assertions are used to determine completeness - always.

If an assertion fails, this is an exception. There may be exception handlers, etc. - but if not, the entire checklist hangs (persistently) until the exceptions are dealt with at the human level.
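Here is a minimal, non-persistent sketch of the task-and-checklist structure described above, written in Python rather than Decl. The names (Task, Checklist, TaskFailed) and the Wordpress-flavored demo are my own assumptions; resource requirements, exception handlers, and persistence are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class TaskFailed(Exception):
    """A post-facto assertion did not hold after the task's action ran."""


@dataclass
class Task:
    name: str
    action: Callable[[dict], None]
    requires: List[str] = field(default_factory=list)  # tasks that must complete first
    pre: List[Callable[[dict], bool]] = field(default_factory=list)   # assertions about input
    post: List[Callable[[dict], bool]] = field(default_factory=list)  # expected outcomes


class Checklist:
    def __init__(self, tasks):
        self.pending = {t.name: t for t in tasks}
        self.done = []  # completion log, in the order tasks actually finished

    def run(self, data):
        """Run whatever is runnable; return the names of tasks still pending."""
        progressed = True
        while self.pending and progressed:
            progressed = False
            for name, task in list(self.pending.items()):
                if not all(r in self.done for r in task.requires):
                    continue  # a prerequisite task is not complete yet
                if not all(check(data) for check in task.pre):
                    continue  # input assertions do not hold yet
                task.action(data)
                if not all(check(data) for check in task.post):
                    raise TaskFailed(name)  # task stays pending: the checklist hangs
                del self.pending[name]
                self.done.append(name)
                progressed = True
        return list(self.pending)


def step(name, requires=()):
    """Demo helper: a task that records its name and asserts that it did."""
    return Task(
        name,
        action=lambda d: d["steps"].append(name),
        requires=list(requires),
        post=[lambda d: name in d["steps"]],
    )


# A sequence expressed as a checklist: listed out of order, ordered by prerequisites.
checklist = Checklist([
    step("install wordpress", requires=["upgrade mysql"]),
    step("upgrade mysql", requires=["dump database"]),
    step("dump database"),
])
remaining = checklist.run({"steps": []})
```

The tasks are deliberately listed backwards; the checklist itself imposes no order, and the sequence falls out of the prerequisites alone. A failed post-facto assertion raises and leaves the task on the pending list, which is the in-process version of the checklist hanging until a human deals with it.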

An example is the 1694 LUZ project I've been spending time with lately. Here, the issue is the translation of a few thousand documents of various formats in a complex directory structure. After translation, each file must be cleaned, and there are a multitude of ways in which this cleaning step can fail. As things stand, I have no good exception mechanism; the result is a laborious process of making sure I haven't lost my place when fixing individual files.

A persistent checklist would already be able to handle that situation, and as I say, the non-persistent checklist (a sort of "parallel loop") would handle similar things inside a single technical process.

Task dependency within a checklist is an additional organizational layer on top of this, and really has little to do with the underlying checklist-and-task structure. Similarly, other types of control flow can be modeled with items that can change dependencies, introducing dependencies on local variable values, and so on. Conditionals can be modeled using post-facto assertions that bypass the entire execution of a branch (i.e. that something is complete before it starts). Loops can be modeled by adding tasks to a checklist dynamically while it's still running. For performance, the checklist should really be a queue (minus the presumption of order) - tasks are simply removed once complete.

Add logging to a checklist and you've got a good history mechanism. Again, persistence makes a true log of this.

A checklist should include the concept of multiple actor roles (=task queues); the system is one, but even the system should have a list of outstanding tasks in a given checklist. It's a simple extension to add that list of outstanding tasks in an index over a given class of active (persistent) checklists.

I'm pretty sure that basically covers the entire set of workflow functionality. The wftk had some other mechanisms that are good (notification, delegation, etc.) but they're essentially extraneous to the core workflow engine. That core - checklists and tasks - needs to be inherent in the Decl core semantics. It's just too useful not to include it.

Thursday, June 16, 2011

Advice from an Old Programmer

Zed Shaw - always a way with words.

Something I don't think I blogged at the time

CodeBubbles. I don't even have time to pontificate on it - tl;dr is that it's kind of a mental snapshot of coding for a given issue, and a whole new approach to the IDE. I love it.

Wednesday, June 15, 2011

Kratko.js Refactoring Javascript

Refactoring should probably be part of the coding domain. Or something. Here's a brief article about a Javascript refactoring library. Or something.

My point: refactoring is the kind of reasoning about software that a semantic programming system should (somehow) support. This point is still somewhat vague in my mind, as is doubtless obvious.

ORM is a dangerous anti-pattern

Oh, oh, oh, this is an article that speaks to my heart! [HNN] Takeaways:
  • Not every object should be in a relational database
  • SQL (and RDBMSes in general) answers questions; thus SQL doesn't necessarily map to an object definition. This is so damned important; I need to rework some of my database stuff just on this insight alone.
  • The practice of deriving all SQL from an object model is pernicious: "They'll get you up and running quickly, but you'll be running in the wrong direction."
  • Grouping SQL into one place is a good idea, but in the sense that you are defining an API consisting of answers to questions you can ask your database. I can't say how clarifying that is!
Interestingly, Googling ORM turned up not only object-relational mapping, which the article is about, but also object role mapping, an interesting approach indeed that I want to think about in more detail.
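The "API of answers to questions" takeaway can be sketched like this, assuming a made-up orders table in an in-memory SQLite database. The class and method names are hypothetical; the only point is that all the SQL lives in one place and each method answers one question instead of mirroring an object model.

```python
import sqlite3


class OrderQuestions:
    """All SQL for the (hypothetical) orders table lives here, one question per method."""

    def __init__(self, conn):
        self.conn = conn

    def total_spent_by(self, customer):
        """How much has this customer spent, in cents?"""
        row = self.conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]

    def top_customers(self, limit):
        """Who has spent the most? Returns (customer, total) pairs, largest first."""
        return self.conn.execute(
            "SELECT customer, SUM(amount) AS total FROM orders "
            "GROUP BY customer ORDER BY total DESC LIMIT ?",
            (limit,),
        ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 1200), ("bob", 500), ("alice", 300)],
)
questions = OrderQuestions(conn)
```

Neither answer corresponds to any single "Order" object: one is a scalar and the other is a ranked list of pairs, which is rather the article's point.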

Starbucks coffee language

A very interesting article about the language of coffee at Starbucks and how it enhances customer experience and loyalty. And here's the blog [post] that led me to it - probably that blog has more interesting things in store.

AXR: the Web done right

The AXR project is an interesting rethinking of Web presentation languages, with content in XML and style in "HSS", a hierarchical stylesheet language based on CSS that ends up basically doing the same things as less-css. Interesting reading, though still pretty young. Hmm. This might just be a takeoff on less-css anyway, as it even reuses the phrase "done right". Still - hierarchical style organization seems to be a Good Idea.

Talk on mining the Wikileaks stash

I don't have the time to sit through it at the moment, but it looks interesting. [here]

An interesting dialog post on workflow vs. state machines

Read it again. (Specifically involves a Ruby library, Ruote.)

Target domain: machine learning

OK, OK, I think I've already highlighted this as a target domain, but ... there have been a lot of new textbooks [here and more or less here] and other information [here on decision trees, whole blog is interesting] posted recently and frankly it would be nice to work through one or more of them and Do Things Right.

So: target domain, machine learning.

Code snippet database

This is a Wiki with minimal code snippets for basic tasks in various languages. I like this. I don't like its being a Wiki all that much, though - wouldn't a real database be interesting? That deserves some thought.

(Update 2013-04-18: it's still maintained and weeded, but seems otherwise moribund; nothing but minor changes in the last month.)

Monday, June 13, 2011

State machines

An interesting little article about state machines and useful patterns that build on them.

Sunday, June 12, 2011

NLP in Prolog in IBM's Watson

Here. Drooling.

Python idioms

Here's a nice rundown of some Pythonic idioms. The interesting thing about idioms between languages is that they syntactically encode the same (or similar, or sometimes congruent) mental/semantic structures about what the code is supposed to do.

That deserves thought.

Saturday, June 11, 2011

Another parser toolkit

This one in Javascript: Language.js.

Target application: mail (and archive)

Same thing applies to mail. Unison really doesn't do well with mbox-formatted mail, for the obvious reason: Unison works with files. I need a way to categorize mail that syncs between my different machines. And along the way, I need a way to search mail that presents an SQL API. And a way better means of accessing mail from Perl.

On top of that, a client - eventually. But at least I should be able to define some mboxes and work from there. Syncing between mbox sets should be easy.

Mail needs to be categorizable with keywords (not just single folders) and honestly, the keywords should be structured as well, so I don't always need to see every job number in the world when categorizing things.

Archival into longer-term storage would be as keyword-specific mboxes. But short-term indexing needs to happen in, say, SQLite.
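To get a feel for the short-term index, here is a small sketch (in Python rather than Perl, purely for illustration) of messages indexed in SQLite with structured keywords, where a keyword like "jobs/1694" sits under "jobs". The schema, the slash-separated keyword convention, and the function names are all assumptions of mine, not a design for Mail::Declarative.

```python
import sqlite3
from email import message_from_string

SCHEMA = """
CREATE TABLE message (id INTEGER PRIMARY KEY, msgid TEXT UNIQUE, subject TEXT, sender TEXT);
CREATE TABLE tag (message_id INTEGER REFERENCES message(id), keyword TEXT);
"""


def index_message(conn, raw, keywords):
    """Parse one raw message and store its headers plus any number of keywords."""
    msg = message_from_string(raw)
    cur = conn.execute(
        "INSERT INTO message (msgid, subject, sender) VALUES (?, ?, ?)",
        (msg["Message-ID"], msg["Subject"], msg["From"]),
    )
    conn.executemany(
        "INSERT INTO tag (message_id, keyword) VALUES (?, ?)",
        [(cur.lastrowid, k) for k in keywords],
    )
    return cur.lastrowid


def subjects_under(conn, prefix):
    """Subjects of messages tagged with `prefix` or any keyword below it.

    Assumes keywords contain no LIKE wildcards (% or _).
    """
    rows = conn.execute(
        "SELECT DISTINCT m.subject FROM message m "
        "JOIN tag t ON t.message_id = m.id "
        "WHERE t.keyword = ? OR t.keyword LIKE ? ORDER BY m.subject",
        (prefix, prefix + "/%"),
    )
    return [r[0] for r in rows]


def raw_mail(n, subject):
    """Build a tiny made-up message for the demo."""
    return f"Message-ID: <{n}@example.test>\nFrom: someone@example.test\nSubject: {subject}\n\nbody\n"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
index_message(conn, raw_mail(1, "Invoice"), ["jobs/1694", "billing"])
index_message(conn, raw_mail(2, "Schedule"), ["jobs/1701"])
index_message(conn, raw_mail(3, "Photos"), ["family"])
```

Asking for "jobs" then returns both job-related subjects without my having to look at every job number, while "jobs/1694" narrows to one; a message can carry several keywords at once, unlike a single folder.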

The client doesn't need to be very impressive; really, a very simple set of functionality should just expose Perl modules dealing with mail. From my research, Mail::Box/Mailtools is kind of the usual solution, but is probably overweight. Reviews mention MIME::Lite and Net::SMTP, but obviously I need to sit down and think about it a bit.

This would entail a Mail::Declarative module. Sorely needed.

Target application: photo archive

I know photo archives have been done to death, but I need one, and I might as well do it in Decl, right? I've got directory and file support now, so defining an archive with special properties should be a cinch.

Here's my problem: I take a lot of pictures. Well, that's not the problem - the real problem is that I do it while traveling, so the pictures end up on my laptop or my desktop. And some of them my wife wants on her laptop for the screen saver, etc. Then there's the fact I'd like to share some out to Flickr, print them at Meijers by way of Snapfish, and so on.

As you know, I love Unison for file synchronization, but it's not quite fine-grained enough - I can't keep a subset of an archive somewhere, because Unison doesn't really have the concept of different machines, just "local" and "remote" for each pair of machines at a time. (Which raises the question of file synchronization between specific sets of machines - but I'm not going there yet.) (Yet.)

So what I really want is a synchronized index, and a central storage location by means of which files can be retrieved in bulk based on queries.

That sounds pretty refined. I think it might require an API. In other words, it's perfect for a semantic programming experiment. So: target application.

Wednesday, June 8, 2011

Normalize.css

Another CSS framework.

Tuesday, June 7, 2011

Monday, June 6, 2011

io: a simple, compact, Smalltalk-like language

You have to think that modeling the semantic structures of oddball programming languages is one thing I should be doing.... I think it's still somewhat beyond my grasp.

Sunday, June 5, 2011

Pandoc: universal rich-text converter

Written in Haskell, no less. [about]

Pointers for REST API design

Nice little article here.

Music: VexFlow and the author's blog

VexFlow [library, blog] is a cool-looking library for typesetting music and tablature, now also with music-theory functions, making it even cooler.

Algorithms for massive data sets

Princeton class. Nice overview.

Saturday, June 4, 2011

CSS grammar

A helpful recursive descent parser definition for vanilla CSS. (Not Less-CSS.) So I could either just use the CSS module or roll my own. Honestly: rolling my own is attractive, but ... so is finishing things. I wouldn't think of writing my own HTML parser, after all. (I'm crazy, but not that crazy.)

Of course, I'm also not proposing switching back and forth between text and Decl in HTML definitions, as I am with CSS. So ....

Iswim

Iswim (I see what you mean) was an experimental programming language in 1965 [pdf, scribd] - these historical languages are always so educational.

HNN post on freelance jobbing

Here. Extract those links as grist for the mill.

Image Analysis blog

Fascinating blog.

SetupBot

The SetupBot is a set of automagical scripts that install Wordpress for you. A good starting point for considering deployment in general.

PHP examples and tutorials

Just a little browsing, and voilà!

CSS by example? Elements of design?

I'm now looking more closely at CSS (after saying I could get away with ignoring it). Building on the basic insights of LessCSS and OOCSS, I'd like to go through a bunch of CSS examples and ... do whatever comes naturally.

For example, here is a list of CSS examples. Or I suppose I could have looked at StackOverflow to start with. Another list of tutorials. Really, there's a lot written about Web design. Go figure.

Thursday, June 2, 2011

Semantic markup

Google, Microsoft, and Yahoo support schema.org: a central repository/specification for semantic markup. Yet another thing to master! (This is a good thing.)

PHPOpen.net

PHPOpen.net is a directory for open-source PHP Web apps. I've selected PHP Agenda as a candidate for understanding how PHP works. (Because it's small.) There's no way I can even pretend to be able to hit a June 15 launch for Depatenting, but by God I'm making progress.

Essentially, there are five languages involved in a modern LAMP/WAMP application: HTML for presentation, CSS for style, SQL for data storage and retrieval, Javascript for local actions, and PHP for server actions. So far I have decent representation (although not yet complete) for two of those languages. Honestly, I could get away with ignoring CSS for now. But that still leaves two languages I haven't really mastered.

Until I do, I'm not ready to move.

Wednesday, June 1, 2011

More neat Javascript tricks

JQuery WormHole. An intuitive way to drag things between containers. This is the kind of thing that could so easily be a building block in an advanced UI description language.

Speaking of which, I'm making loads of progress on HTML::Declarative in the context of WWW::Publisher. Nearly to the point where I can see where that advanced UI description language might come from.

Unicode

A fantastic, fantastic tchrist comment on StackOverflow about UTF-8 and Unicode handling in Perl. Honestly, this should be read repeatedly.

Before I get too much further with Decl, I really need to sit down and think hard about text encodings. I mean, I treat text as a separate datatype already, essentially - it should be fully Unicode-aware. Which is not easy.

Article: Don't Design like a Programmer

Some excellent UI design tips that highlight the differences in thought between the UI designer and the programmer. tl;dr: think of UI in terms of the tasks your user is performing, not in terms of the underlying data structures you've built to support those tasks.

Formly: Javascript form pretty-izer

This is cool. More to the point for me, it would be convenient to mine it for good form design macros.