Thursday, January 29, 2015

Structured proofs

I know, I know, I'm never going to be a mathematician, but here is an article about a talk by Leslie Lamport on how 21st-century proofs should be written. And here is Lamport's own paper on the topic. I'm totally with him on the structure - I think a hierarchical proof structure is a great idea, and a more careful, explicit structure is also much more understandable. But when he goes on to define a formal language, TLA+, for expressing proofs, I have to say that the sheer ugliness of the format is an instant dealbreaker for me. No way. That is not the elegant way to express proofs.

Further thought is necessary. But I'll bet I can come up with a prettier language.
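
To make "hierarchical" concrete, here is a small sketch of a proof written in numbered levels, loosely modeled on the step-labeling style of structured proofs. The theorem and the wording of each step are my own toy example, not taken from Lamport's paper, and the LaTeX layout is just one assumed way to typeset it.

```latex
\textbf{Theorem.} If $m$ and $n$ are even integers, then $m + n$ is even.

\begin{description}
  \item[$\langle 1\rangle 1.$] There exist integers $a$ and $b$ with
    $m = 2a$ and $n = 2b$.
    \begin{description}
      \item[$\langle 2\rangle 1.$] $m = 2a$ for some integer $a$.
        \textsc{Proof:} definition of even, applied to $m$.
      \item[$\langle 2\rangle 2.$] $n = 2b$ for some integer $b$.
        \textsc{Proof:} definition of even, applied to $n$.
      \item[$\langle 2\rangle 3.$] Q.E.D.
        \textsc{Proof:} by $\langle 2\rangle 1$ and $\langle 2\rangle 2$.
    \end{description}
  \item[$\langle 1\rangle 2.$] $m + n = 2(a + b)$.
    \textsc{Proof:} by $\langle 1\rangle 1$, $m + n = 2a + 2b = 2(a + b)$.
  \item[$\langle 1\rangle 3.$] Q.E.D.
    \textsc{Proof:} by $\langle 1\rangle 2$, and since $a + b$ is an integer,
    $m + n$ is even by definition.
\end{description}
```

Every step names exactly what it depends on, and a reader can stop at level 1 or drill down to level 2 as needed.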

How Patrick McKenzie uses Twilio

Short and sweet - a boon to the international traveler.

Tuesday, January 27, 2015

Beautiful JS chessboard library

This is how everything should be handled.

Thursday, January 22, 2015

Top Python mistakes when dealing with Big Data

Interesting article.
  1. Reinventing the wheel. For example, writing one-off code to read a CSV instead of using a convenient purpose-built library that could offer deeper functionality. (pandas, to be specific, in this case - interesting stuff, actually!)
  2. Failing to tune for performance. Slow code cuts down on the number of testing cycles you can run per day.
  3. Failing to understand time and timezones. Ain't that the truth.
  4. Manual integration of different technologies in a solution (copying results files back and forth by hand, etc.).
  5. Not keeping track of data types and schemata.
  6. Failing to include data provenance tracking. Oooh, I like this notion.
  7. No testing, especially no regression testing.
All good points.
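
Points 3, 5, and 6 are easy to nod along to and harder to picture, so here is a minimal sketch of what they might look like together, using only the standard library. The sample data, the column names, the `SCHEMA` mapping, and the `load` helper are all made up for illustration; none of this comes from the article.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone

# Hypothetical sample data. The first row is in UTC+1, the second in UTC.
RAW = (
    "ts,sensor,reading\n"
    "2015-01-22T09:30:00+01:00,a1,3.5\n"
    "2015-01-22T08:45:00+00:00,b2,4.25\n"
)

# Point 5: an explicit schema, column name -> converter.
# Point 3: timestamps are parsed with their offset and normalized to UTC.
SCHEMA = {
    "ts": lambda s: datetime.fromisoformat(s).astimezone(timezone.utc),
    "sensor": str,
    "reading": float,
}


def load(text, source_name):
    """Parse CSV text into typed rows plus a provenance record."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({col: convert(record[col]) for col, convert in SCHEMA.items()})
    # Point 6: remember where the data came from and exactly what it was.
    provenance = {
        "source": source_name,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "rows": len(rows),
    }
    return rows, provenance


rows, provenance = load(RAW, "sample.csv")
```

Note that the first row's wall-clock time (09:30) looks later than the second's (08:45), but once both are normalized to UTC the first one is actually earlier. That is exactly the kind of thing that bites you when timezones are ignored.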

Identifying programmers

Another application of NLP techniques to source code. Interesting.

Sunday, January 18, 2015

Friday, January 16, 2015

Torch

Torch is a Lua-powered numerical environment that I'd never heard of before, but that looks incredibly neat. I only mention this because Facebook released some open-source deep-learning modules for Torch today.

Rosetta Code analysis

Neat paper analyzing programming languages based on their Rosetta Code submissions. Comparative programming linguistics, I guess.

Cquence for really simple JS animation

Cquence looks useful.

Tuesday, January 13, 2015

What happened to Old School NLP?

Answer: statistical methods work OK and they're way easier. 'Struth!

Parsley

Parsley is a data extractor for HTML.

Color

I don't know why this kind of database always interests me.

Sunday, January 11, 2015

Flow-Based Programming

Flow-based programming is not a particularly new idea, but it's getting some more attention lately.

Exegesis, literate programming, and Decl

The basic outline of Decl's new syntax parser is complete and passing tests, and as always when a milestone is reached and I look around the corner at what's next, I'm a little overwhelmed. It makes me philosophical.

My initial Marpa NLP article was actually a very simple exegetical analysis of a prototype script I wrote myself. As usual with a first stab, I ended up writing a lot of special-case code and syntax to handle things, which represents the kind of technical debt that nearly always strangles its host. Fortunately, I iterated quickly this time, taking the insights from that prototype and putting them into a new Decl-based plan.

Slowly, I'm feeling my way towards a Decl-based system for literate-style transformational exegesis, one that I hope will eventually encompass everything this blog has been about.

The advantage of basing everything on Decl is that the parsing is already done. I can now parse a very rich, informationally dense data structure that is designed right from the start to group things in more or less the same way natural language does. It's easily extended and configured, easily indexed - in short, it's a way of taking notes about program structure and using them to evolve a program.

So that's where I'm going. Very slowly.

Sunday, January 4, 2015

Some thoughts on decisions and other things in workflow

So the general areas of functionality of the new wftk (pretty much the same as the old wftk) are: data, actions, events, decisions, agents, workflow, notifications, and schedules. That doesn't match up all that well with the rough functionality outline taken from the chapter headings of the old book, but these are kind of the things that make up business processes.

The data organizer is a nice piece of functionality that breaks off conveniently, defining and naming data in terms of stores, indexes, and documents. My declarative language Decl is taking shape as a document-based language (its syntax and semantics are governed by the metadata of the document in question). So it's becoming clear to me that a business action is also a document-based kind of thing: it can be expressed in Decl, use various resources, consume input documents, and create a result document (as enactment) along with a series of attached output documents.

Events are essentially input queues. From a technical standpoint they're not terribly interesting, but events drive the machinery of a process. An event source pushes flow.
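
Just to pin down what "events are input queues" might mean in practice, here is a rough sketch. The `EventQueue` class, its method names, and the `order_received` event are all hypothetical; this is not wftk code, only one assumed shape for the idea.

```python
from collections import deque


class EventQueue:
    """A FIFO input queue whose events drive registered handlers."""

    def __init__(self):
        self._pending = deque()
        self._handlers = {}

    def on(self, kind, handler):
        """Register a handler for one kind of event."""
        self._handlers.setdefault(kind, []).append(handler)

    def push(self, kind, payload):
        """Called by an event source; nothing runs until drain()."""
        self._pending.append((kind, payload))

    def drain(self):
        """Dispatch queued events in arrival order; return handler results."""
        results = []
        while self._pending:
            kind, payload = self._pending.popleft()
            for handler in self._handlers.get(kind, []):
                results.append(handler(payload))
        return results


queue = EventQueue()
queue.on("order_received", lambda doc: f"start approval for {doc['id']}")
queue.push("order_received", {"id": "A-17"})
queue.push("unknown_event", {})  # no handler registered, so it is skipped
log = queue.drain()
```

The source only pushes; the process machinery decides when to drain the queue and what each event sets in motion.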

Which brings us to decisions.

A decision in a workflow context can be as simple as a logical test of variables, or as complex as an entire subprocess that involves not only calculation but even the execution of experiments that cost money and the input of human decision-makers. Decisions are arguably one of the most important aspects of a business. They need to be given careful thought.

There are a number of different ways to present decision structures, including plain logical combinations, tables, and flowcharts, and they can also be learned from data as decision trees (categorization is a generalization of decision - effectively a non-binary decision).
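
As a quick sketch of the table form, here is a decision expressed as an ordered list of rules where the first match wins. The variable names (`amount`, `trusted`), the thresholds, and the outcomes are invented for illustration, and `decide` is a hypothetical helper, not part of wftk.

```python
# Each rule pairs a logical test over the case's variables with an outcome.
# Rules are checked top to bottom; the first one that matches decides.
RULES = [
    (lambda case: case["amount"] > 10_000, "escalate"),
    (lambda case: case["amount"] > 1_000 and not case["trusted"], "review"),
    (lambda case: True, "approve"),  # default row
]


def decide(case):
    """Return the outcome of the first rule whose condition holds."""
    for condition, outcome in RULES:
        if condition(case):
            return outcome
    raise ValueError("no rule matched")
```

Because the outcome here is one of three values rather than yes/no, this is already the non-binary, categorization flavor of decision. The same table could be drawn as a flowchart or, given enough past cases, learned as a decision tree.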

I don't think they're going to rise to the level of being a separate module like the data organizer, but I think it's still valid to put decisions into a separate chapter of discourse, as it were.

Saturday, January 3, 2015

Generative ebook covers

A fun article on generative graphics.

PyScribe

PyScribe is a kind of neat debugging logger for Python. I've always been partial to printf-style debugging myself, so I like the look of this.

Friday, January 2, 2015

How vulnerable are Quora posts to writing style analysis?

Verdict: pretty vulnerable. The tools used are of interest.

OCCRP

The Organized Crime and Corruption Reporting Project (OCCRP) is a data-journalism kind of thing - mostly journalism, but with some data aspects I find interesting. They seem to have collaborated on an investigative tool, the Investigative Dashboard, which does seem pretty data-oriented.