Friday, September 28, 2012

GoogleTalk from Perl

Neat use of a Google API.

Thoughts about being a good programmer

Nice article, suitable for contemplation.

Learnable programming

An absolutely fascinating system for linking code to graphical results to facilitate understanding of the algorithm.  Wow.

Spam starting point

Forum spam. We loves it, preciouss.

Freemind

I've encountered Freemind before, but a very interesting programming strategy article popped up today: Freemind lets you extend your working memory by indexing knowledge in an accessible manner.

On the one hand, I do find this a very exciting concept. I had to do something on Unix just the other day and - as freaking always - it took me twenty times the amount of time it should have, because I have no memory of commands and parameter structures at all.  I never have.  So I certainly need something like this to back me up.

But on the other hand, I just can't believe that this somewhat simplistic structure is the thing to do that job.  Maybe it is - it would certainly be better than nothing, which is what I use now.  And I certainly have to admit that the look of the program is fantastic - much, much better than the last time it crossed my sights.

But it's Java.  Not that I have a personal and burning hatred for Java or anything, but it's a way conservative language and I'm a way liberal programmer, so it fits my style poorly.  I've never been able to get mental traction with it.  (Maybe now would be the time.)

And the other thing: it's too restricted.  It's a single hierarchical structure (or so it appears right now), with no multidimensional tagging or sorting, no alternative structures like maybe tables embedded, the text seems restricted in size...  If I had a more extensible structure I'd be more interested.

Something like it based on Decl or actually integrated into the language would probably be closer to what I want.  After all, Decl is supposed to be there to model mental maps.  A graphical interface seems like a logical tool.

Wednesday, September 26, 2012

Complexity in data systems

Here's a slideshow (I know, sigh) about runaway complexity in data systems and a strategy for mastering it.  This is really one of the things I want to address with a programming system: I want to be able to understand the complexity with artificial tools.  I've added a tag "software complexity".

The Data Science Loop

Ha.

Arch Linux

Another simple distro.

Skip lists

Another data structures and algorithms post.

Simple PHP example

Prime factorization.

Websockets 101

Nice overview.

JavaScript design stuff

A couple of neat things:

Tuesday, September 25, 2012

Decl is dead. Long live Decl.

I just pulled the plug on the old Decl at Github, replacing it with a tabula rasa generated by module-starter.  The goal at hand is simple: rethink everything from the ground up.  So first things first - our first program to implement is this:

text "Hello, World!"

To make that work, we need a few things:
  • The interpreter environment itself (an object of type "Decl").  The environment was itself a node in the last iteration, and I suspect that's a mistake.  There should be a root node for ease of self-printing, but that should be a child of the environment, not the environment itself.
  • A "decl" command right from the start that acts like an interpreter.  The environment should be a shell, I think, so we can interact with the environment.  It's not a Python shell - the Python shell builds the environment as the result of a series of verbs, and that's specifically what Decl doesn't want to do.
  • Loading code into an environment has to be easier: (1) with a source filter, (2) passing a string in, (3) from a file, and (4) passing some intermediate data structure in - all those have to be supported more transparently than the last version.
  • The output handling system has to be in place in at least a rudimentary fashion here.
  • Marpa parser.  I'll probably need to refine the grammar as I go.
About that. In the old version, I had sigils to determine how the body of a node would be handled.  I'm eliminating that.  Instead, a trailing quote will mark a text body, and brackets a code body.  Anything else is vanilla nodes. Sigils are simply too hard to remember.
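
To make the new body rule concrete, here's a rough sketch of how a single line might be classified. It's in Python rather than Perl purely for illustration, and it's only one reading of the rule: the function name `classify_body` and the exact matching logic are made up here, not anything in Decl or Marpa.

```python
def classify_body(line):
    """Split one node line into (tag, kind, body) using hypothetical rules.

    kind is "text" for a quoted body, "code" for a bracketed body,
    and "node" for anything else.
    """
    tag, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    if len(rest) >= 2 and rest.startswith('"') and rest.endswith('"'):
        return tag, "text", rest[1:-1]
    if rest.startswith("{") and rest.endswith("}"):
        return tag, "code", rest[1:-1].strip()
    return tag, "node", rest


print(classify_body('text "Hello, World!"'))
print(classify_body("do { print 1 }"))
print(classify_body("window main"))
```

The point is that the reader (and the parser) can tell the body type from the punctuation alone, with no sigil to remember.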

Similarly, the language a code body is in was marked with perl < { } or python < { }.  I doubt that's a good idea. It's clunky and ugly - and I want to be able to use Python with ease and elegance (as well as C).  I'm not yet sure what the solution is.  For multiline code bodies, I can see "{ (perl)" as an override of the default language - the default language being set sometime earlier.  For single-line code bodies, though, I don't see that as sufficiently elegant.

Another thing I've been thinking of.  When using a semantic domain module, I want a much, much more explicit definition of the tags in the domain.  To date this has really sucked.  If a module is used from within a Decl environment, then a lot more information should be provided right at the start.  If the same module is used from Perl, though, it should act like a Perl module - and load Decl itself.

This won't come up for a little while, of course - definition of domains is a little way down the rebooted road.

This might take a while.

Sunday, September 23, 2012

Dexy

Dexy is a neat documentation tool - here's its examples page.  Since I've started looking more closely at Heckle again for use in rebooting Decl by writing the tutorial first, Dexy seems like a good model.  (Yeah, I know, I should just use Dexy - but I'm like that.)

Anyway, its flexibility is fantastic.  That's a great model.

Testing mail

Nice framework for handling mail in a test environment.  In Rails.

Thursday, September 20, 2012

Editing text

A computer science view of distributed editing of text.  I need to reread it.

Common Crawl code contest

Cool.  They want volunteers (I don't (yet) qualify as a paid coder there).  I should really look into that.

Portable Perl script testing

Testing scripts is hard to do portably, so a lot of module writers just don't.  Now it's easier.

Google Spanner

A scalable database.  What won't they think of next?

Bookkeeply

Bookkeeping for freelancers.  Definitely a target application.

LDTP/Cobra

LDTP (on Linux) or Cobra (on Windows) is a desktop automation and testing tool.  Looks neat.

Monday, September 17, 2012

Sorting and searching at the library

A fantastic article thinking about sorting algorithms that involve humans and why they're fundamentally different from automated ones, ranging on into algorithm design and ... well, all kinds of thought-provoking stuff.  10 out of 10, would read again.  And should.

Coroutines vs. async APIs

Another reconstrual of asynchronous APIs, now that node.js is getting popular.

Fenwick trees

Another nice article about neat algorithms.

Animated GIFs

I'm not even sure where to file this, but it's such a cool application of graphics that I have to remember it: the animated code screenshots for Sublime Text 2.0.

RTF format documentation

Voluminous and honestly not all that helpful, but this is the documentation available for RTF.

Perl-based open-source applications

Szabó Gábor put together a list of Perl-based open-source apps that are actual applications on the market.  They should all really be considered open-source targets.

Website elements

Yessss, yessss, boilerplate.  Processessssssss.

Calling the shell

I forget why I was researching calling shell programs from scripting languages, but:

HTML snippet gallery for Bootstrap

Snippet galleries and boilerplate rock my world.  Here's a gallery of HTML snippets for Bootstrap.  I don't think it's getting much traction (I should probably archive it), but it's a snazzy idea.

Matching for JavaScript

Yeah, matching is a good concept.

Unicode database

This is cool!

Also, just so I don't forget it: Unicode character folding.

Sunday, September 16, 2012

Friday, September 14, 2012

Open-source voter registration tool

The Democrats have open-sourced a voter registration tool.  Simple, quick, Ruby app that spits out a PDF of the completed form.

Lisp-ish conditions

The condition is Lisp's general-purpose, and dare I say elegant, way of handling errors and other out-of-band ... things.  Here's a good description.  Conditions and events might end up being the same thing in Decl.  We'll see.

Oh, that's right - here was the "Conditions for non-Lispers" article that got me thinking in this direction this month, along with a 2003 post by Ovid on the Monks about exception handling in Perl specifically.  Perl is essentially not capable of implementing Lisp-like conditions, or at least not without a lot of very ugly boilerplate.  But Decl would be.  Maybe even (under certain circumstances) within embedded Perl.

Towards a Decl reboot

In a month, it will have been a year since my last active work on Decl.  At this point, I think it's time to reinvent that wheel again, and start clean; I can always steal the good stuff.

So ... what was Decl?  What should Decl be?  I have a few principles I want to adhere to, and maybe a bit of a plan of action as well.

1. Somebody else is doing better parsing than I am (Marpa), so first and foremost, eliminate my HOP-based parser code, which had performance problems anyway.

2. The internals of Decl are pure cruft.  Now that I know how some of this stuff works, I can do a better job.  I can also do a much better job in providing Decl support for definition of Decl, and better syntactic sugar (thanks to some pointers from Dave Rolsky).  So ... redo that stuff.

3. The use cases for Decl are too difficult.  It has to be easy to define a module based on Decl that I can call from regular Perl without worrying about it being Decl.  (I'm looking at you, Word module.)  Not that calling Decl as an independent language has to be avoided (I know how to define that better now, too), but ease of use has to take higher priority now that I know how I want to use it anyway.

4. I need to be able to define something in Decl and compile to non-Decl-dependent Perl data structures.  If possible.  At least I need to be thinking about that at every step.  Lightweight flexibility.

5. The definition of domains has to be more semantically motivated than it is right now.  All my example domains ended up being horrible jumbles of cruft, with helper functions defined haphazardly and without easy ways to self-discover.  About as non-semantic as I could have done it.

6. Similarly, the placement of all tags into a hard directory is a mistake.  Tag discovery has to work by looking at @INC, simple as that.

7. The basic concepts that I started building into Decl were valid.  Here's a list I've come up with right now; I want to refine it and organize it, and motivate things better.  The kitchen-sink approach is ... maybe too much.  On the other hand, the point of Decl is to provide a framework for representing the cognitive structures programmers actually use in thinking about programs, so ... does unification need to be in the core?  Maybe?
  • Parser support
  • Events - this should be expanded to a Lisp-like condition system for error handling
  • Shell support (text commands) at every object level, including the main program
  • Templates and filters and macros.  Oh, my.
  • Iterators and lazy evaluation.
  • Code munging and generation.  This should do more than I was doing, using templates, probably.
  • Output handling up the tree, just started scratching the surface there
  • Tabular data and DBI support in the box.
  • Hierarchical data, the file system, and walking iterators.
  • Text as a special thing, with rich text and so on.
  • Handling of the system and command line 
  • Control flow and workflow - handle the user in a sophisticated manner
  • Logic programming, unification, and matching
  • Explicitly represented state machines
  • Gulp - explicitly represented data types.  When I want to.
  • Assertions and explicitly represented behavioral tests?
  • Bidirectional maps.
  • If I can figure out how to do it, views of code at different levels of detail supported in the basic language.  Perhaps some system of annotation?
That's a heavy list.  Honestly, there's almost nothing I can take out of the core without feeling as though I'm missing something, and in fact there are some things I didn't put in the core the first time that obviously should have been there (error handling, I'm looking at you).  What I need to do is to write a principled primer of Decl, going down this list and writing example code for each and every thing.  Then make it all work.

Write the tutorial first.  Odd thought.  Here's the first chapter; it's the Hello, world for Decl:

text "Hello, world."

That's it.  By default, text's action is output, and by default, output goes to stdout.  From a semantic standpoint, how could this be any different?
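
A toy sketch of those two defaults, in Python for illustration only. The `run` function and its `out` parameter are hypothetical, not Decl's real interface: a `text` node's action is output, and output goes to stdout unless something overrides it.

```python
import io
import sys


def run(source, out=None):
    """Run a tiny subset of the language: each `text "..."` line is output."""
    out = sys.stdout if out is None else out  # default output is stdout
    for line in source.splitlines():
        tag, _, body = line.strip().partition(" ")
        if tag == "text":  # default action of text is output
            out.write(body.strip().strip('"') + "\n")


run('text "Hello, world."')  # written to stdout

buffer = io.StringIO()
run('text "Hello, world."', out=buffer)  # same program, output redirected
```

Nothing in the program itself says "print" or names a destination; both come from defaults that the environment can replace.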

Wednesday, September 12, 2012

Best practices in email server management, 2012 edition

Best practices are our friends!

Time series database

Neat post by a guy playing with a time series database filled with data on his personal life.

Couple more deployment links

First is Bower, a "package manager for the Web".  (Allows you to install packages into a Webapp for development.)

Then there's Yeoman, which scaffolds out various Webapp formats based on questions asked.

Scaling intro

Very nice for-dummies overview of scaling architectures.

Music mashups

Steve Streza made a video/music mashup.  It went very viral. [hnn]

Damn cool algorithms

This is a good blog!  [hnn]

PostgreSQL vs. MySQL

Fantastic look at some of what PostgreSQL can do - which is a lot.  PostgreSQL has the notion of modeling a lot of logic right in the database.  As such, it's pretty declarative.  This is something that deserves thought.

To-do lists

An interesting post on to-do lists and why they don't always work.  Food for thought. Filed under workflow because workflow.

Then there's a post on cognitive load and flow - more or less germane.

Simple Linuces

So something people do to gain their hacker chops is build a Linux core from nothing.  This might be fun sometime, so I gathered a few links.

Toolbox for ML/data science

A nice rundown of useful Python tools for machine learning and data science.

Madison

Hmm.  A collaborative document creation tool.  This is where democracy's going, folks.

Data handling

I'm lumping a bunch of stuff together into the general rubric of "data handling" that is really kind of poorly defined.  But no matter how poorly defined it is, it appears that people keep writing about it, and a not insignificant portion of many practical machine learning books is devoted to it.

Anyway, it basically involves all the moving around of files and databases that you wave your hands at, and that ends up being most of the fiddly work of any practical project.  I'd like to set aside a little time to think about how to do it right (kind of a best-practices semantic domain, as usual).  And I'm getting the occasional link about it.

My fileset module has to do with data handling (as a way to define files that should be subjected to an action).  The Data::Table module is a handy in-memory way to cut off blobs of relational data and manipulate them in handy ways.  Excel is a good place to stash tables like this in a file.  And so on.

A lot of workflow involves "data handling" - grouping things into documents and that kind of thing.  Taking items from this document and summarizing them into that one.  The "bizop" semantic-level language/view I've been musing on is largely a matter of data handling.

Another simple but good idea

Deadman.io is a cloud deadman's switch.  If it doesn't hear from you, it initiates a notification process.  Pretty neat.

Tuesday, September 11, 2012

Fun with JavaScript

I can't really just link to all the fun JS toys I see, right?

edn: extensible data notation

Interesting.

Python for economists

Python, Perl, whatever.  The idea is tools for data handling.

EPJ Data Science

A Springer open journal on data science.  No excuse not to read everything in here as it comes out.

BYOVoIP system

Here's a neat little series on cloning Skype.  Ha.

Translation of chemical names

Here's a pretty fascinating survey of chemical name translation (I've been doing a lot of pharma translation this month).  Turns out it's pretty tricky - looking at it, I'm not 100% sure it's as tricky as people make it out to be, because it's typical of language people that they find software magical, and typical of programmers to find natural language unreasonably hairy.  But still - I think there's probably a (small) market for this kind of tool.

Cross-posted.

Target app: Boomerang

Boomerang is an add-on for Gmail that lets you manage things on a schedule.  You can hide things from your Inbox until a later date, send messages on a schedule, track whether you've gotten a response to an outgoing message, and so on.

Stuff any mail client should actually be able to do.  We've forgotten email; it's been static for 15 years or so.  I blame the damn spammers.

Sunday, September 9, 2012

Source maps in JavaScript

Now here is a concept I love - the new JavaScript source map.  It allows you to start with a given source representation, then crunch and minify your script - but given the map, you can get back from a given execution state to the point in your code it reflects.

That's brilliant.
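
The core idea can be sketched as a sorted table of segments plus a lookup. This is a deliberately simplified illustration in Python: the `MAPPINGS` data and `original_position` are made up, and the real source map format encodes its segments as Base64 VLQ strings rather than a plain list.

```python
import bisect

# Hypothetical, simplified map for one line of minified output:
# (generated_column, source_file, source_line, source_column),
# sorted by generated column.
MAPPINGS = [
    (0, "app.js", 1, 0),
    (12, "app.js", 4, 2),
    (30, "util.js", 10, 4),
]


def original_position(generated_column, mappings=MAPPINGS):
    """Find the last segment that starts at or before the given column."""
    starts = [segment[0] for segment in mappings]
    index = bisect.bisect_right(starts, generated_column) - 1
    if index < 0:
        return None  # column comes before the first mapped segment
    _, source, line, column = mappings[index]
    return source, line, column


print(original_position(15))  # ('app.js', 4, 2)
print(original_position(30))  # ('util.js', 10, 4)
```

So an error reported at column 15 of the crunched script resolves to line 4 of the original `app.js`.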

Saturday, September 8, 2012

REBOL

Here's a cool new find, for a language that's on its last dying gasps (maybe): REBOL.  I love the look and feel of the language; it's a lot like a more flexible version of what I'm trying to do with Decl.  Unfortunately, the author attempted to go with a closed-source business model and essentially ran out of steam - the question currently on the table is whether to open-source the language and let a committee save it from otherwise certain death.

Tough decision to make - maybe especially because there's only one real answer.

Tuesday, September 4, 2012

Calculating Web things with a solver

Interesting design approach.

KHC text analysis library

Another.

"Me-too" products are just fine

Somebody agrees with me.  One of the things I want to be able to do is clone a Website quickly - by describing it.  Lots of low-hanging fruit available on the job boards doing this sort of thing.

Typicons for simple icons

Here's a neat little icon set.

Architecture of the BBC Olympics websites

This is fun.

SymPy

Symbolic algebra in Python!  Cool!

Many-to-many relationships using Bloom filters

Here's a neat idea - using a lossy link to improve many-to-many performance.
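
As a reminder of why the link is "lossy", here's a minimal Bloom filter sketch in Python. It's a generic illustration, not the linked post's implementation: the class, its default sizes and the posts-and-tags example are all assumptions. A lookup can give a false positive but never a false negative, so a "no" lets you skip the expensive join entirely.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter that uses a Python int as its bit array."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive several bit positions from differently seeded hashes.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits |= 1 << position

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


# One filter per post, summarizing the tags linked to it.
post_tags = BloomFilter()
for tag in ("perl", "parsing"):
    post_tags.add(tag)

print("perl" in post_tags)  # True: added items are always found
print("java" in post_tags)  # usually False, but a false positive is possible
```

A "maybe" still has to be confirmed against the real link table; only the definite misses come for free.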

Why C is better than C++ for low-level infrastructure

This makes sense, actually.  It's a post (in a series of posts) by the author of ZeroMQ.

Saturday, September 1, 2012

Marpa

Marpa is a new parser module (and algorithm) that somehow God has seen fit to provide in Perl first.  I mean, not God, but His messenger on Earth, Jeffrey Kegler.

The list of Marpa's features seems nothing short of, well, everything I ever wanted in a parser, and so I will be rebuilding Decl's parsing on Marpa and throwing out my own parsing code.  I'm actually pretty darn proud of that parsing code, of course, having invested a couple of months in it, but - well, I learned a lot and it's a thing to remain proud of, but if somebody else is doing a more thorough job with something, it's always true that you're better off stealing their work.

I wonder how successful Marpa would be in analyzing natural language?