Semantic programming: 2015

Friday, December 18, 2015

Meta post

So I've been busy. Actually, the database says I've been insanely busy with paying work this year, by far my record year since I've been doing translation. As the year winds down and we move into January, I hope to get back to coding in earnest.

I've been looking at Stockfighter as a testbed for some of these Decl notions - in the sense that doing programming forces me to evaluate directions in the tools. For example, right off the bat I want to write a Perl API wrapper for convenience, and I'm doing that in a (quasi) literate manner using Decl syntax. The syntax stuff is pretty solid already; nodal access is still kind of primitive but it's just this kind of task that helps improve things like that. Most of what I'm doing can roll back into the Decl core as a whole, because it's all stuff like mapping and text generation.

I'm looking at TXL again to get a mental handle on tree transformations, and thinking of working through its tutorial examples, then translating them into a Decl context. TXL works from general languages to generate its internal syntax trees, and of course Decl basically starts from the syntax tree already, but the problem space that TXL was designed to work with is one of the things I think Decl should address.

Editing has also been on my mind again; I'm thinking of adapting Wx::StyledText to work with Decl in a native manner. It would be quite useful for debugging to be able to pop up a Decl text in a window any time I liked, especially once mapping starts to work.

I think I have a good handle on what needs to go into the Decl core, and a decent presentation framework for it as a book/tutorial/cookbook series. That seems pretty tasty. Although now I think maps are going to have to go in there, and they weren't yet (even though I was actually using them in the book already - as a part of literate programming, so maybe this is just added detail in that existing chapter).

I'm very close to being able to do a filesystem domain, and once I've got a handle on that, project management will be nearly trivial. And that means I can get back into NLP for real, because I'll have the framework in place to use it in the day-to-day in meaningful ways. That should pay off pretty well.

So that's kind of the rough situation here at the end of 2015. I've done some pretty solid work this year, even with the sad necessity to earn all that money. And I really think 2016 is where things are going to start to heterodyne.

Update August 2016: So instead we decided to leave Budapest and return to Puerto Rico by way of Bloomington. The result: no heterodynage. Maybe now that things are settling down and we have a new house in Puerto Rico. Of course, the coffee project is now starting to heat up, and that's a radically different kind of work.

Codebase as living organism

Here's an interesting article with a fascinating new metaphor, really two insights: (1) the machine is not the codebase and (2) the codebase can be seen as a living thing. Worth a read.

Friday, November 13, 2015

The next C

So people are actually to the point now where the replacement of C as the ubiquitous system programming language is at least within the limits of plausibility. Here's a Quora post from one of the originators of D identifying three candidates: D, Rust, and Go. Each has strengths, each has weaknesses. I suspect we'll see some convergent evolution.

I like this proliferation of languages of late. Shows the diversity of thought.

Joke generation with Wolfram

Interesting. Not all that fascinating, but any kind of computational handling of language is pretty cool.

Parinfer and typing/understanding Lisp

Now this is the kind of thing I love to see: Parinfer [hnn]. It uses indentation to show structure while inferring paren closure to preserve syntactic validity. You can switch back and forth between inferring parens or inferring indentation. It makes sense. This kind of thing is really exciting to see.

Sunday, November 8, 2015

ChucK

ChucK is a ... well, it's a language just for building music machines, apparently. Also, HNN link for lots of related things, like SuperCollider. I really want to take some time to do this stuff. Someday.

Git book

This might be a good thing to study.

OOSMOS

OOSMOS is an object-oriented threadless concurrency ... thing for C/C++ that can also run in firmware on the Arduino, which is pretty fascinating.

Saturday, November 7, 2015

Geometrical figures

Just a little note for myself here, but GeoGebra is probably one of the best ways to manipulate geometrical things right now. We did some diagrams in Inkscape for a LaTeX paper, but it really wasn't ideal.

That said, though, PGF and TikZ seem to be the best-practice approach in the LaTeX field (not a graphics editor, but a description language).

Wednesday, November 4, 2015

Deep linguistic learning for email responses

This has been showing up in the usual places (even on my Facebook feed, today) and is good for some toothgrinding as I fervently wish I had more time to work like this.

Tuesday, November 3, 2015

Parsing a command line

You would think this would be easy to find. I've got a string - perhaps taken from a file - that represents a command. I want to parse it into an ARGV-like structure, with proper string quoting and that kind of thing.

Here's a way to do it in C#: CommandLineParser.
Or you can get Windows to do it for you, from Perl.
Oh, right. Text::ParseWords. I knew I already knew the answer to this.

That said, I should probably roll my own based on the same parsing tools in Decl::Syntax.
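For comparison, here's roughly what the same job looks like in Python, where the standard library's shlex module plays the part Text::ParseWords plays in Perl. This is just a sketch: parse_command is a name I made up for illustration, and shlex follows POSIX shell quoting rules, which are not the same as the Windows command-line rules.

```python
import shlex


def parse_command(line: str) -> list[str]:
    """Split a command string into an ARGV-like list, honoring quotes.

    parse_command is a hypothetical helper; shlex.split does the real work
    and uses POSIX shell quoting conventions.
    """
    return shlex.split(line)


argv = parse_command('convert "my file.txt" --title \'Hello world\' -n 3')
print(argv)  # ['convert', 'my file.txt', '--title', 'Hello world', '-n', '3']
```

A home-grown parser would presumably return the same kind of flat list, so anything written against this shape should carry over.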

Friday, October 30, 2015

Deep neural network classifies selfies

This is just a cool, well-written article.

rr

So this is cool. The debugger "rr" approaches things from a different perspective from most debuggers. Instead of instrumenting a live, running program, it makes a thorough recording of a full run, which you can then run back and forth through with a debugger. Sounds resource-intensive, but also pretty fantastic - like the author, I hardly ever use a debugger, but even my few times using one have been characterized by shooting past the point where things actually started to go wrong.

FizzBuzz as the illumination of thought about requirements

It's interesting how FizzBuzz has kind of become the go-to program for thinking about software development as a whole. Here's an interesting article from the standpoint of maintainability and understanding the requirements behind the requirements.

Monday, October 26, 2015

Declarative description of financial contracts

This is something moving rather close to a semantic approach. Very interesting.

devd ubersimple command-line HTTP server

Quick testing of web apps.

openCypher

SQL for graph databases, only ... it's still basically vapor. Looks promising, though.

Update 2016-02-18: still unchanged. I'm not holding my breath on this one.

Fuzzing Raft

I'm starting to really enjoy posts about fuzz testing.

Cracking PDF DRM with Clojure

Fun little post.

Monday, October 19, 2015

Machine learning courses again

I may live to regret this (like I did in 2012), but I'm taking another stab at the MOOC machine learning courses. Well, the Stanford one at Coursera, anyway, but maybe I'll take another run at the Caltech one as well, after I'm done.

Some links, to keep them in a convenient place. Some are useful for actually doing the classes, some for understanding the math, and some ideas for what to do after the course, in no particular order.

Main Coursera page for the course
A decent presentation of the math behind least squares
Partial derivative in gradient descent
Coursera Wiki
Deep Learning Tutorial at Stanford [hnn]
Caltech's Learning from data course

So. Machine learning.

Update 2016-01-09: Unsurprisingly, I didn't have time for the Coursera run. Did break my word count record for 2015, though.

Model-View-Whatever

A pretty fascinating look at the philosophies (dare I say the semantics?) behind MVC, MVVM, MVP, Model-View-Whatever architectures. Thought-provoking.

Thursday, October 15, 2015

Visualization of pathfinding algorithms

Cool!

Refactoring Clojure

A thoughtful article about refactoring Clojure. Refactoring, of course, being exactly what semantic programming is about, ultimately.

Gantt charts and project management

Oh, look: an open-source Gantt chart editor.

Sunday, October 11, 2015

Scaling

Quora often has very interesting posts.

Static blog weavers

Ran across this list of static blog builders provided by a static-page hosting company. One in particular, blogc, stood out as it's a command-line ANSI C tool. That's kinda neat.

As usual, this kind of thing just screams "semantic domain" to me and begs analysis from that standpoint. Soon, compadres. Soon.

Sunday, October 4, 2015

PowerPoint cognitive style

Edward Tufte really doesn't like PowerPoint.

Timely dataflow

Still haven't quite figured out exactly what timely dataflow is (something like JIT-evaluated lazy streams that take advantage of parallel processing resources), but here's a paper to read.

Here's a Rust implementation of a similar system, with differential dataflow built on top of it.

FSM-enabled key-value store

I'm not even sure exactly what keyvi is, except something very wonderful. If I understand correctly, it finds strings in a fuzzy manner. I should really look at it more closely.

Rethinking cron

I guess I'm not the only one that doesn't like cron much (except, you know, the part about how it always works). Alternatives for many purposes appear to be in the category of job queuing systems, so that should probably be examined at some point under the "system architecture" rubric.

Also, here's a Python job scheduler inspired by this article. ("Job scheduling for humans.")

007, a simple macro language

This is very cool. 007 is a macro-enabled language that is written on top of Perl 6. I really kind of need to look at Perl 6 sooner or later.

A curated list of NLP resources

Rat cheer. Excellent list.

Also on the topic of NLP: Semantics, grammars, and representations for deep learning, a paper I'd like to read when I have the mental capacity to really read it. (Having the Android tablet has really helped with that.)

Also, a primer on deep learning tools in NLP.

Saturday, October 3, 2015

The best regexp trick ever

It's true. It is! It's just this. Say I want to find all instances of X that aren't in quotes. Then I just say /"X"|(X)/ and ignore overall matches, just taking group 1 matches. Pretty brilliant, actually!
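A small sketch of the trick in Python, to make it concrete. One assumption of mine: the first alternative is written as "[^"]*" so that it swallows a whole quoted string rather than just the quoted word itself, which generalizes the pattern above slightly. The function name find_unquoted is made up for illustration.

```python
import re


def find_unquoted(word: str, text: str) -> list[int]:
    """Return the start offsets of `word` wherever it sits outside double quotes.

    The left alternative consumes entire quoted strings (and captures nothing);
    the right alternative captures the word in group 1. We keep only matches
    where group 1 actually participated.
    """
    pattern = re.compile(r'"[^"]*"|(' + re.escape(word) + r')')
    return [m.start(1) for m in pattern.finditer(text) if m.group(1) is not None]


text = 'foo "foo bar" foo "baz" foo'
print(find_unquoted("foo", text))  # [0, 14, 24]
```

The point is that the regex engine does the exclusion for you: anything you don't want gets eaten by an uncaptured alternative before the captured one has a chance at it.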

Declarative Arduino

I've been doing a little initial dabbling in Arduino tutorials in preparation for an upcoming project that involves (gasp) actual hardware, and as usual my thoughts continue to want to make it all a new domain. What I'd really like to see is a single specification that defines the circuit on the Arduino, the code running in it, and any UI code on the host that talks to it (whether through the serial port or by radio).

Those pieces would then draw the schematic, write the Arduino C++ code, and write the host-side C or Perl or whatever is running over there. Later, you could even automate the assembly of the circuit on the Arduino side (wouldn't that be a cool project? Combine it with the circuitry evolution thing I saw a couple of weeks ago....)

Thursday, October 1, 2015

pydoit

So "doit" is a build tool written in Python, designed to be more general in scope than make. The documentation speaks of workflow. It looks like an attractive tool.

And speaking of attraction, it seems to me that there are natural "semantic attractors" when it comes to the many tools written for the same domain. What you'd really want to do is to boil things down to a semantic core and then map out how each tool expresses things in its own unique way. Factor out the commonality, as it were.

So the build tool domain would be an interesting domain to do that in, especially given its patent usefulness in building tools.

Gear Generator

Oh, sweet: a gear generator. Does exactly what it says on the tin.

Organizers

Specifically, Transpose. Here we have a really attractive online tool for sticking structured notes in the cloud. "Structured", here, means typed fields that go into something in the back office that can be searched by type. Notes are also typed in that they are arranged into templated lists; each template has a set of typed fields. If my template includes a date and a location, for instance, I can look for all notes within a mile of a given set of coordinates, plotted on a Maps window.

That is pretty freaking excellent.

Transpose the business is a freemium model that incorporates crowd sourced templates in its "public library". That allows you to leverage some level of information about how other people do things. That's pretty freaking excellent, too.

The free version permits up to 10 templates.

Your mission, if you choose to accept it: write a sketch of the system and the business in Decl. Work on the semantics behind the scenes to make it a reality.

Friday, September 25, 2015

Autoassociative memory

Back in, oh, the 90's sometime, I had the entire summer off and we were in Budapest. During that time I implemented a Kohonen network in Visual Basic (and invented Gray codes while I was doing it, because I wanted neighboring numbers to differ by only one bit - only much later did I discover they had a name before I was born), and successfully saw it retrieve geometrical shapes given a noisy input.
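For the record, the property I wanted (neighboring numbers differing in exactly one bit) is what the standard reflected binary Gray code gives you. Here's a minimal sketch in Python; this is the textbook construction, not a reconstruction of the lost Visual Basic code, and the function names are mine.

```python
def to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


for i in range(8):
    # Consecutive rows differ in exactly one bit position.
    print(i, format(to_gray(i), "03b"))
```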

My ultimate idea was to implement some kind of "cognitive stub" that would be a (long) vector that somehow encoded a "semantic flavor" of a given semantic structure. Put those into an autoassociative memory and you've got something that kind of feels like human memory.

I still think it's a good idea, but there have been a lot of higher-priority things, like getting the kids through school and making sure I can retire before I'm 98. Also, sometime between then and now I seem to have entirely lost that code, which sucks, actually, because I really hate data loss.

But you know, I just ran across a reference to autoassociative memories (a password reminder, odd application but there you go). That author uses a discrete Hopfield network library of his own devising, but I wanted a better overview of the field, like this here.
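To remind myself what "autoassociative" means in practice, here's a tiny discrete Hopfield network in plain Python. It's a sketch under simple assumptions: textbook Hebbian (outer-product) weights, states of +1/-1, and in-order asynchronous updates. It is not the library from the linked post, and train and recall are names I picked for illustration.

```python
def train(patterns):
    """Build a symmetric weight matrix (zero diagonal) from +1/-1 patterns."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new = s[i] if field == 0 else (1 if field > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


# Two stored patterns, chosen to be orthogonal so they don't interfere.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([p1, p2])

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first bit flipped
print(recall(weights, noisy) == p1)  # True
```

The memory is addressed by content: hand it something close to a stored pattern and it settles on the stored pattern itself, which is the behavior I'd want from a "cognitive stub" lookup.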

Maybe I'll dabble with these things soon. Like this winter.

Wednesday, September 23, 2015

LISP macros on JavaScript ASTs

Oh, another project that's all kinds of cool: eslisp is an S-syntax for JavaScript parse trees, with a macro expander that treats any JavaScript function as a perfectly valid macro. That is just brilliant!

Spidermonkey exposes the AST as an API, incidentally. This project makes use of that.

Tuesday, September 22, 2015

PlantUML

Here's a UML diagrammer driven by DSLs. Ran across it linked from the description of another project by its author that used it to document his class diagram. Slick!

Monday, September 21, 2015

Compressing word lists into useful DAWGs

Cool data structure article.

NYT extracts recipe data from natural language

Very, very cool. They used CRF.

Selenium

Remember how I said my desktop machine is now running Linux? Yeah, I don't have to worry about getting Selenium to work under Windows any more if I want to automate browser tasks. And I do, yes I do, because the browser is one basic way business tasks are presented.

Selenium: http://www.seleniumhq.org/download/
Selenium client in Perl: https://metacpan.org/release/Test-WWW-Selenium

Sunday, September 20, 2015

Adventures in Wubuntu architecture

So I bought a new desktop machine, a Mini-ITX setup in a very small case that fits in carry-on luggage. But because it's a desktop, and new, it's got a much faster processor and more memory than my three-year-old laptop, and I put in a large HDD and a good-sized SSD for quick booting.

Given that I had all that power at my disposal now, I decided to install Ubuntu as the main OS, with VirtualBox running a VM with Windows 7 for my work tools (the translation industry is entirely Windows-based). With the guest additions and seamless mode, you are literally running both operating systems at once, and it rocks.

I call this unholy chimera "Wubuntu".

Anyway, there are obviously a few little weirdnesses. I use Linux for all browsing, email, and coding, and Windows for editing documents under Office and running all the different tools of the translation trade. But honestly, the Windows Explorer is a good way to get around a project directory. I'm used to it. Wouldn't it be nice to have a right-click that could open a Linux bash shell in the directory? And open a Linux text editor on a given file?

Well, running commands on the guest OS from the host is already supported under VirtualBox, but doing things the other way around is obviously ... weird, under most circumstances, so that's not really supported.

My current notion for a solution is to run lighttpd on the Linux side, and hit it with tinyget from Windows. The latter is a component in the IIS 6 Resource Kit, so not terribly easy to find, but it does work great. From Windows, the Linux localhost is 10.0.2.2 (it's the gateway from Windows' point of view), and I've confirmed that works perfectly.

So put some useful commands on lighttpd, build command files on the Windows side to install on the context menu, and it should all work just fine!

tinywm: Linux window manager in 50 lines of C

A great example to get off the ground in Linux/X programming!

Relational programming

An article on implementing simple relational programming (a declarative style) in Ruby. Features unification, among other things. More later, when I've actually read it.

Forth on RISC architectures

Here's an article about writing a Forth compiler for RISC architectures - which turns out to be harder than you'd think, given the simplicity of Forth.

Random walk in Python

An interesting article about the random walk hypothesis (quantitative finance) and an implementation in Python that passes the NIST tests for random algorithms. Very cool.

Saturday, September 19, 2015

SCID

Source Code In Database. This was actually a passing reference at the very end of a humorous essay listing ways to write Java code that ensure your continued employment (i.e. writing easily misunderstood code). The essay is copied to a place not its original home, because frankly its original home is oddly formatted and broken up into multiple pages, and the copy was all one readable text extent.

But following that link brings you to this oddly-formatted list of excellent things you could do if you were dealing with a system that actually understood your code. Refactoring on the fly, as it were, because you've stored at least part of the code at the conceptual level instead of at the pure syntax level.

Very thought-provoking indeed! Some of them are sheer brilliance, obviously written by a guy who has done a lot of maintenance programming.

Stan, a probabilistic programming language

Wow. Stan is a "probabilistic programming language", apparently meaning that it is an engine for executing statistical models directly. My level of statistical naïveté being what it is, I can't actually even understand the manual very well, but it looks fascinating. Writing a set of tutorials for this would be a meaningful endeavor. As always, though, the HNN discussion is valuable, especially with regard to alternatives and books.

Wednesday, September 16, 2015

Donald Knuth's list of "readable programs"

Literate programming by the master.

Flowchart.js

Well, this is all kinds of attractive. A declarative language that is converted into flowcharts. You just can't beat that.

Monday, September 14, 2015

Evolving analog circuitry, and some not-very-related thoughts about compilation

So this thing here is all kinds of cool: given a problem, run a bunch of circuits through SPICE until a solution is found. That is just ... wow.

Weirdly, it kind of dovetails with some notions I was sorting through on my walk today. Modern GUI programs are heavy and use plugins because they tend towards a solution-for-all-problems approach. If I want to reuse the concept of "email", it's easier to write plugins for Thunderbird than start a new email client from scratch - or even a new quasi-email client. Like a workflow client - which is why Thunderbird, and other email clients, tend to spread towards task management, calendars, etc.

But this is because our software technology is primitive. Really, we should be thinking in terms of compiling special-purpose tools and then recompiling when our needs change, which was of course the original approach of Unix back when the world was young and resources scarce.

Resource consumption is serious business. This may be another motivation for semantic-level programming.

Well...

I know I said I was only going to write if I had something to say, but I just bought a new desktop machine to replace/augment my aging laptop. And since this new machine has ample capacity, I put Linux on it and I'm running Windows in a virtual machine (a real break for me - I've always run Windows on the desktop and Linux on the server, so this Wubuntu mashup is heady stuff!)

Anyway, so I'm using Firefox on the new machine, Chrome on the laptop. I like Chrome synching because I can cue things up to peruse later on my Android tablet, but synching between Chrome and Firefox? That would require third-party stuff I don't want to get into.

So now I've got two sets of queued links. I know, I know. More first-world problems. But my solution is just going to be to resume posting link posts, at least sometimes for things I find on Firefox that I also want to peruse on my tablet.

Sunday, September 13, 2015

Spitbol

Spitbol is the fast version of SNOBOL, and it's being maintained by Dave Shields. Single-handedly. This came up on HNN a couple of weeks ago, and now Vice seems to have picked up on the story for a personal angle [hnn].

I've considered using Spitbol for some kind of NLP stuff, but after all the work I've already put into some pretty sophisticated tokenization I'm not sure it's worth the effort for my main language-handling tasks. We'll see. It's certainly tempting.

Friday, August 28, 2015

Why the semantic level?

For some reason I seem to have gotten off onto a philosophical tangent while walking the dog this evening, with the resulting epiphany: programming is a process of comprehension. And then documentation of that comprehension.

Back in the day, I spent several years working on two large systems (and here and there some other stuff, but the bulk of my programming work was on two systems). The first was a pharmaceutical document management system, and the second was a searchable online machine tool database. The first was for a corporate gig, obviously - pharmaceuticals are not something done in the garage by a startup - and so it was relatively well-managed and the technical debt was relatively well controlled.

The second was a startup, I was the technical lead and entire staff, and over the course of some dozen years I managed to run up a significant technical debt. Due to that debt, working on the system became an increasingly painful chore (every attempt to address one issue simply reminded me of the dozen related issues that were not going to be fixed because the customer had a very limited budget).

The process of leaving that situation and moving to technical translation took many years, and something in me fought it every step of the way. It was the last paid programming I ever did. Since that time, I've been a hanger-on in the startup programming community, but nothing ever gets off the ground, essentially because I have a fear of technical debt.

But why do we have technical debt? I'll tell you: because we commit to specific platforms and solutions during the programming process, and it is very difficult to undo those early decisions later. By programming at the syntactic level (some of which is of course unavoidable) we lock ourselves into low-level structure we can't easily back out of.

Addressing things at the semantic level - were it possible with existing tools - would avoid at least some of that technical debt. If we have semantic structure - if we are defining not software but the concepts behind the software - then the programming itself starts to look more like a compilation process. And just as we can recompile most code onto a new platform (maybe after fiddling with some flags and libraries), we could back out of syntactic-level, stack-level decisions by "recompiling" a set of concepts on a new platform.

Indeed, in a sense a new set of requirements would be a sort of recompilation.

It's a vague ideal, but this is essentially what I see as the promise of semantic-level programming. I earnestly hope I'll be able to make some forward progress on this over the next year.

Saturday, August 22, 2015

Literate programming: contemplative versus exploratory programming styles

A couple of days ago, HN was asked, "Why did literate programming not catch on?" Predictably, answers ranged from "Because it's useless for real requirements" to "What do you mean? We use it all the time by policy." But a rough synthesis of the overall sense of the meeting, as it were, led me to consider that literate programming requires contemplation. And sometimes you just don't have the time, or sufficient knowledge, to contemplate.

In the typical startup environment, code is written quickly to address specific needs, and as the business pivots and refines, it mutates quickly. So many of the responses addressed that: literate styles don't react well to code churn, and you end up with a literate explanation of code that no longer matches the code (which is of course always the objection to documentation of any kind).

Reading actual code to determine its purpose is effectively reverse engineering. Sure, well-written code should be readable and so it feels odd to call that reverse engineering - but so much real-world code is unreadable that I think it's a good default attitude.

Ultimately, the exegetical stance of integrating literate programming with reverse engineering should support a pretty good overall software development style: quick prototypes to sound out a new task, followed by contemplation of lessons learned and a literate presentation of useful tooling. That's the goal.

Tic-tac-toe and the minimax algorithm: playing imperfect players

I saw an interesting here's-how-it-works post about the minimax algorithm in the context of Tic Tac Toe. I wrote a tic-tac-toe player myself a few years ago, for a HackerRank puzzle, and of course back in the Dark Ages when I was doing my MS in Computer Science. But in this particular post, Jason Fox makes an interesting point: a perfect tic-tac-toe strategy is fatalistic.

While testing his player, he noticed that once the bot saw it couldn't win, it didn't seem to care how long it took to lose. He modified his algorithm to prefer longer fights, just to make it seem like it was putting up a fight.

Why would this behavior occur? And why does it bother us? Turns out it's simple: a perfect strategy assumes that its opponent is playing a perfect strategy as well, because that's easier to code. In other words, when scoring the different options, the perfect minimax strategy assigns its values by playing the other side perfectly as well. Mathematically, it doesn't matter whether we lose perfectly earlier or lose perfectly later, does it?

But that glosses over a fundamental truth out here in the world, and it's why it rankles to see this algorithm playing so oddly - when playing actual games, it's not at all uncommon for our opponent to screw up. And the longer we draw out the game, the more likely it is that they will. So intuitively, we want to fight a delaying action, because our own internal algorithms tell us that the longer we're in the game, the better off we are in terms of an eventual surprise win.

To account for that, the tic-tac-toe minimax algorithm should really be modified to reflect the probability of error on the part of the opponent. Even a small adjustment to the score should weight longer fights a little higher, and that's enough for the algorithm to choose that path.
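Here's a rough sketch of what that modification might look like in Python. The modeling choice is my own assumption, not something from Jason Fox's post: the opponent plays the perfect reply with probability 1 - eps and a uniformly random legal move with probability eps. With eps = 0 this collapses back to plain minimax. The names value and best_move, and the string board representation, are made up for illustration.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def value(board: str, to_move: str, bot: str, eps: float) -> float:
    """Expected outcome for `bot`: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1.0 if w == bot else -1.0
    moves = [i for i, c in enumerate(board) if c == " "]
    if not moves:
        return 0.0
    nxt = "O" if to_move == "X" else "X"
    scores = [value(board[:i] + to_move + board[i + 1:], nxt, bot, eps)
              for i in moves]
    if to_move == bot:
        return max(scores)  # the bot always picks its best option
    # Assumed opponent model: mostly perfect, occasionally random.
    return (1 - eps) * min(scores) + eps * sum(scores) / len(scores)


def best_move(board: str, bot: str, eps: float = 0.1) -> int:
    """Pick the square that maximizes the bot's expected outcome."""
    nxt = "O" if bot == "X" else "X"
    moves = [i for i, c in enumerate(board) if c == " "]
    return max(moves,
               key=lambda i: value(board[:i] + bot + board[i + 1:], nxt, bot, eps))


print(value(" " * 9, "X", "X", 0.0))   # 0.0: perfect play on both sides is a draw
print(best_move("XX OO    ", "X"))     # 2: take the immediate win
```

With eps above zero, a position that is dead lost against perfect play no longer scores a flat -1 on every branch, so lines that give the opponent more chances to slip should come out slightly ahead. That's the delaying action, falling out of the scoring rather than being bolted on.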

It would be instructive to write that up as a real article, but man, I've got stuff to do that's higher priority than that.

Thursday, August 20, 2015

I've been busy

I see I haven't posted here since March... Partly because I've been busy, but also because increasingly I find myself wanting to do something real instead of just posting a gallery of things others are doing. And since I'm slowly making some progress on that end - the third go at Decl is starting to come together into something that makes some sense - I think my brain cycles are going there instead of here.

That said, I have a ridiculous queue of interesting things to note. But new rules. I think I'm not going to post unless I have something to say. Which really shouldn't be a problem, should it? I've always got plenty to say.

Monday, March 9, 2015

Railway oriented programming

Here's a pretty fascinating talk about how to think of mature error handling as a kind of two-track railway, with errors forcing execution down onto a Failure track. I really like this.
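A minimal sketch of the two-track idea in Python, as I understand it. Everything here (Success, Failure, bind, and the three example steps) is named by me for illustration and isn't taken from the talk; the only real point is that once a step returns a Failure, the remaining steps are skipped and the failure rides through to the end.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Success:
    value: Any


@dataclass(frozen=True)
class Failure:
    error: str


def bind(result, step):
    """Run `step` only if we're still on the success track."""
    if isinstance(result, Failure):
        return result
    return step(result.value)


def parse_int(text):
    try:
        return Success(int(text))
    except ValueError:
        return Failure(f"not an integer: {text!r}")


def require_positive(n):
    return Success(n) if n > 0 else Failure(f"not positive: {n}")


def reciprocal(n):
    return Success(1 / n)


def pipeline(text):
    """Chain the steps; the first Failure short-circuits everything after it."""
    result = Success(text)
    for step in (parse_int, require_positive, reciprocal):
        result = bind(result, step)
    return result


print(pipeline("4"))    # Success(value=0.25)
print(pipeline("-2"))   # Failure(error='not positive: -2')
print(pipeline("abc"))  # Failure(error="not an integer: 'abc'")
```

Each step only has to know about its own happy path and its own way of failing; the plumbing between tracks lives in one place.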

Saturday, March 7, 2015

The data engineering ecosystem

Interesting. Very interesting, actually. Sort of an architectural mapping approach.

Explorable explanations

This is a cool post on "the rise of explorable explanations". Very interesting collection of stuff there. On that note, a little musing about combining R and D3.

Thursday, February 19, 2015

Nial

Nial is a programming language built on arrays - it compiles in-situ for execution, apparently, so it would seem to be very efficient for data manipulation.

Another spam recipe posted to a blog instead of being processed

Interesting stuff: http://pastebin.com/U7jv9jxS [hnn]

Thursday, January 29, 2015

Structured proofs

I know, I know, I'm never going to be a mathematician, but here is an article about a talk by Leslie Lamport about how 21st century proofs should be written. And here is Lamport's own paper on the topic. I'm totally with him on the structure - I think a hierarchical proof structure is a great idea, and a much more careful and explicit structure is also much more understandable. But when he goes to defining a formal language TLA+ for expressing proofs, I have to say that the sheer ugliness of the format is an instant dealbreaker for me. No way. That is not the elegant way to express proofs.

Further thought is necessary. But I'll bet I can come up with a prettier language.

How Patrick McKenzie uses Twilio

Short and sweet - a boon to the international traveler.

Tuesday, January 27, 2015

Beautiful JS chessboard library

This is how everything should be handled.

Thursday, January 22, 2015

Top Python mistakes when dealing with Big Data

Interesting article.

Reinventing the wheel. For example, writing one-off code to read a CSV instead of using a convenient purpose-built library that could offer deeper functionality. (Python Pandas, to be specific, in this case - interesting stuff, actually!)
Failing to tune for performance. Cuts down on testing cycles per day.
Failing to understand time and timezones. Ain't that the truth.
Manual integration of different technologies in a solution (copying results files back and forth by hand, etc.)
Not keeping track of data types and schemata.
Failing to include data provenance tracking. Oooh, I like this notion.
No testing, especially no regression testing.

All good points.

Identifying programmers

Another application of NLP techniques to source code. Interesting.

Sunday, January 18, 2015

Data structures

Here's a great open resource for data structures.

Why Perl didn't win

Whew. Yeah.

Friday, January 16, 2015

Torch

Torch is a Lua-powered numerical environment that I'd never heard of, but that looks incredibly neat. I only mention this because Facebook released some open-source deep-learning modules for Torch today.

Rosetta Code analysis

Neat paper analyzing programming languages based on their Rosetta Code submissions. Comparative programming linguistics, I guess.

Cquence for really simple JS animation

Cquence looks useful.

Tuesday, January 13, 2015

What happened to Old School NLP?

Answer: statistical methods work OK and they're way easier. 'Struth!

Parsley

Parsley is a data extractor for HTML.

Color

I don't know why this kind of database always interests me.

Sunday, January 11, 2015

Flow-Based Programming

Flow-based programming is not a particularly new idea, but it's getting some more attention lately.

Exegesis, literate programming, and Decl

The basic outline of Decl's new syntax parser is complete and passing tests, and as always when a milestone is reached and I look around the corner at what's next, I'm a little overwhelmed. It makes me philosophical.

My initial Marpa NLP article was actually a very simple exegetical analysis of a prototype script I wrote myself, and as usual when doing a first stab, I ended up writing a lot of special-case code and syntax to handle things, representing a technical debt that in such situations nearly always strangles its host. Fortunately, I iterated quickly this time, taking the insights from that and putting them into a new Decl-based plan.

Slowly, I'm feeling my way towards a Decl-based system for literate-style transformational exegesis, one that I hope will eventually encompass everything this blog has been about.

The advantage to basing everything on Decl is that parsing is done. I can now parse a very rich, informationally dense data structure that is designed right from the start to group things in more or less the same way natural language does. It's easily extended and configured, easily indexed - in short, it's a way of taking notes about program structure and using them to evolve a program.

So that's where I'm going. Very slowly.

Sunday, January 4, 2015

Some thoughts on decisions and other things in workflow

So the general areas of functionality of the new wftk (pretty much the same as the old wftk) are: data, actions, events, decisions, agents, workflow, notifications, and schedules. That doesn't match up all that well with the rough functionality outline taken from the chapter headings of the old book, but these are kind of the things that make up business processes.

The data organizer is a nice piece of functionality that breaks off conveniently, defining and naming data in terms of stores, indexes, and documents. My declarative language Decl is taking shape as a document-based language (its syntax and semantics are governed by the metadata of the document in question), and so it's becoming clear to me that a business action is also a document-based kind of thing, which can be expressed in Decl, use various resources, consume input documents, and create a result document (as enactment) and a series of attached output documents.

Events are essentially input queues. From a technical standpoint they're not terribly interesting, but events drive the machinery of a process. An event source pushes flow.
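
As a rough sketch of what "events are input queues" could look like (this is not wftk code; the class and function names are hypothetical, and Python is used purely for illustration):

```python
from collections import deque


class EventQueue:
    """Hypothetical input queue: sources push events, the process pulls them."""

    def __init__(self):
        self._pending = deque()

    def push(self, event):
        # Called by an event source; this is the "pushes flow" part.
        self._pending.append(event)

    def drain(self, handler):
        # Hand each pending event to the process, in arrival order.
        results = []
        while self._pending:
            results.append(handler(self._pending.popleft()))
        return results


def handle(event):
    # Stand-in for a process step reacting to an event.
    return f"handled {event['type']}"
```

A source would call push() whenever something happens, and the process machinery would call drain() with whatever handler applies; nothing moves until an event arrives.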

Which brings us to decisions.

A decision in a workflow context can be as simple as a logical test of variables, or as complex as an entire subprocess that involves not only calculation but even the execution of experiments that cost money and the input of human decision-makers. Decisions are arguably one of the most important aspects of a business. They need to be given careful thought.

There are a number of different ways to present decision structures, including simply as logical combinatorics, tables, and flowcharts, and they can be learned as decision trees (categorization is a generalization of decision - a non-binary decision, effectively).
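
To illustrate the two simplest presentations, here is a small Python sketch of a decision as a plain logical test and the same sort of decision as a table. The rules, thresholds, and names are all invented for the example; it's only meant to show the shape, not how wftk will do it.

```python
def approve_simple(amount, limit):
    """Simplest decision: a logical test of variables."""
    return amount <= limit


# The same kind of decision as a table: the first matching row wins.
# Each row is (condition, outcome); the thresholds are invented.
DECISION_TABLE = [
    (lambda order: order["amount"] <= 100, "auto-approve"),
    (lambda order: order["customer"] == "trusted", "approve"),
    (lambda order: True, "escalate to a human"),
]


def decide(order, table=DECISION_TABLE):
    """Walk the table top to bottom and return the first matching outcome."""
    for condition, outcome in table:
        if condition(order):
            return outcome
    raise ValueError("no rule matched")
```

Note that the table version already returns one of three outcomes rather than yes/no, which is the sense in which categorization generalizes decision, and its last row hands the question off to a human decision-maker.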

I don't think they're going to rise to the level of being a separate module like the data organizer, but I think it's still valid to put decisions into a separate chapter of discourse, as it were.

Saturday, January 3, 2015

Generative ebook covers

A fun article on generative graphics.

PyScribe

PyScribe is a kind of neat debugging logger for Python. I've always been partial to printf-style debugging myself, so I like the look of this.

Friday, January 2, 2015

How vulnerable are Quora posts to writing style analysis?

Verdict: pretty vulnerable. The tools used are of interest.

OCCRP

The Organized Crime and Corruption Reporting Project (OCCRP) is a data-journalism kind of thing - mostly journalism, but with some data aspects I find interesting. They seem to have collaborated on an investigative tool, the Investigative Dashboard, which does seem pretty data-oriented.