Thursday, December 25, 2014
Impulse Tracker open-sourced
Impulse Tracker was an influential piece of software in the 90's electronic music scene (and is doubtless still influential today, but isn't used as much, naturally). Its author just open-sourced the whole thing. It would be an interesting code understanding/exegesis target.
Tuesday, December 23, 2014
Generating mazes and dungeons
Truly fascinating and well-written article on maze/dungeon generation, with interactive JS illustrations. This is the kind of writing about programming I love. [hnn]
Sunday, December 14, 2014
Really teensy ELF executables
Binary data parsing - well, reverse binary data parsing... or something. Cool article, anyway.
Remote controlling browsers
So a significant portion of "business things" that a workflow/business process system has to handle consist of things done in browsers.
Sure, sure, you can automate Web things effectively with a bot, but sometimes what you're controlling is a JavaScript application that, honestly, will only run well in an actual browser. It's a pain, but there you go.
One avenue has traditionally been IEMech (moribund at the moment, since the OLE/COM interfaces it relies on have changed in later versions), but there are also different remote control solutions available for Firefox and Chrome.
Firefox's FF-Remote-Control is a great little add-on that works quite well. For the time being, therefore, Firefox is going to be my automated browser of choice even though Chrome is currently my actual browser.
For Chrome, the situation is somewhat different, as Chrome's security model doesn't permit an add-on to listen on a port. As a result, the Chromi extension hits a running server on localhost (the Chromix remote-control system). It doesn't seem as flexible as FF-Remote-Control, but I haven't spent much time with it yet.
So: for now, Firefox.
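For later reference, driving FF-Remote-Control from Perl is about as simple as remote control gets: open a TCP socket to the port the add-on listens on and send it JavaScript, one command per line. A minimal sketch (the port is what I recall as the add-on's default, so treat it as an assumption):

```perl
#!/usr/bin/perl
# Minimal sketch: send one JavaScript command to FF-Remote-Control.
# Assumption: the add-on is listening on localhost:32000 (its default,
# if I'm remembering right) and answers each command with a JSON line.
use strict;
use warnings;
use IO::Socket::INET;

my $ff = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => 32000,
    Proto    => 'tcp',
) or die "Can't reach FF-Remote-Control: $!";

print $ff qq{window.location = "http://example.com/";\n};
my $reply = <$ff>;    # the add-on replies with a JSON line
print $reply;
close $ff;
```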
Monday, December 1, 2014
Fuzz testing
Fuzz testing is throwing randomly perturbed inputs at a given piece of software to see what breaks. I was entirely unaware of the state of the art of fuzz testing, though. afl-fuzz is a tool that watches the execution traces of its target and uses them to steer further mutation of the input. It can synthesize a legal bash script from nothing, just by watching how bash reacts to different byte sequences. (And it discovered that bash vulnerability that made everybody upgrade last month or so.)
To which I can only say: holy Toledo. I have seen the future.
It discovered CDATA sections in XML. Randomly. Against the expectations of its author, who says, "it's an example of the fuzzer defiantly and secretly working around one of its intentional and explicit design limitations". Evolution is weird. Almost magic.
It made a legal JPG from the seed string "Hello" - again, by noticing different execution paths taken in response to different bytes of input. Here are some test sets for graphics. Interesting stuff.
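To pin the core idea down for myself, here's a toy sketch of the coverage-guided loop in Perl - nothing like afl's compile-time instrumentation, just the "keep any mutant that reaches a branch we haven't seen before" logic, run against a made-up target:

```perl
#!/usr/bin/perl
# Toy illustration of coverage-guided fuzzing (the idea behind afl-fuzz),
# not afl itself. The target and its branch IDs are invented.
use strict;
use warnings;

# Stand-in target: returns the list of branch IDs the input exercised.
sub target_coverage {
    my ($input) = @_;
    my @hits = ('start');
    push @hits, 'has_magic'   if $input =~ /^MZ/;
    push @hits, 'has_version' if $input =~ /^MZ\x01/;
    push @hits, 'long_input'  if length($input) > 8;
    return @hits;
}

my %seen_branch;
my @corpus = ('hello');    # seed corpus, a la afl's -i directory

for my $round ( 1 .. 2000 ) {
    my $input  = $corpus[ int rand @corpus ];
    my $mutant = $input;

    # Cheap byte-level mutations: overwrite, append, or truncate.
    my $op = int rand 3;
    if    ( $op == 0 ) { substr( $mutant, int rand length($mutant), 1 ) = chr( int rand 256 ) }
    elsif ( $op == 1 ) { $mutant .= chr( int rand 256 ) }
    else               { $mutant = substr( $mutant, 0, int rand( length($mutant) + 1 ) ) }

    # Keep the mutant only if it reaches a branch we have never seen before.
    my @new = grep { !$seen_branch{$_}++ } target_coverage($mutant);
    push @corpus, $mutant if @new;
}

print "corpus size: ", scalar @corpus, "\n";
```

Obviously the real trick is getting branch coverage out of an arbitrary binary cheaply, which is exactly what afl's instrumentation is for.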
Breadcrumb
Just a note on progress, for later calibration of my past timelines: I've got Decl 2.0 parsing working rather nicely (still a few loose ends) and I have an excellent plan for integrating it into a notes application for literate programming. I think the combination will end up being something pretty powerful.
My first real target for transformational exegesis using this kind of tool will be Melanie Mitchell's Copycat - I want to get it converted to Clojure so it will run on something I actually own. But I'm also exploring various programming sequences and contests as a way of provoking thought about the actual writing of software.
Anyway, that's all pretty jargon-laden but it means something to me. At a later date I hope to circle back around and write about this stuff in more detail, but I'm not in that phase of the cycle at the moment.
Friday, November 28, 2014
Clappr: open-source media player
Here's a cool thing: Clappr is an open-source pluggable media player for the Web.
UFla Sparse Matrix Collection
A nifty set of real-world sparse matrices, with links to programming challenges and algorithms. Not to mention pretty pictures.
Ransom: the new spam?
So online extortion is a thing. And it seems that vigilante justice, the old-fashioned way, might be an answer. [hnn]
Monday, October 27, 2014
Language-oriented programming
Oddly, following the Wikipedia list of different programming paradigms to "language-oriented programming" led me right back to MPS, and a raft of fascinating articles by Martin Fowler about the notion of a "language workbench". [Here], [here], and [here].
Chief among the things that people seem iffy about in LOP is the idea that the stored representation is in fact no longer text (and what that does to version control) - but you know what? Decl is probably an ideal representation language for a DSL-oriented approach (one of the things I was struggling towards in the first iteration) and is entirely text-based.
So that begs further exploration.
Constraint programming
Wikipedia on constraint programming. I'm working through a mathematical modeling course and linear programming is a subset of this.
Saturday, October 25, 2014
Eco
Holier Toledo! Eco is a text editor that parses arbitrary languages on the fly and allows you to insert "language boxes" to embed other languages within them. The data structure actually saved is the composite parse tree, not the text, which is mildly alarming given the reliance of the existing programming infrastructure on text source code (although Laurence Tratt, the editor's promulgator, notes that Smalltalk has solved many of these problems).
This is really quite attractive. I can't think of a good reason it couldn't be duplicated in something along the lines of Padre, using Marpa.
JetBrains MPS
Holy Toledo - this is amazing! It's an editor/DSL integrated environment that allows you to compose your own DSLs into code in the editor, then autogenerate the output language when needed - through multiple levels of abstraction if necessary. Just watching the demo screencasts is making me smarter.
Anyway, this is kind of what I want to do, except, like the emperor in Amadeus, I find there are "too many notes". I need more simplicity. Maybe. Although damn. It sure is pretty. There's one embedded decision table - right in the C code - that gets translated into a C-like language with a "gswitch" statement and then further into straight C. That is tasty.
Friday, October 24, 2014
Binary patching on the fly
Binary patching is a cool notion (not the first time I've said that), but this is the first time I've considered that it might be done on the fly as a binary comes down over the wire. That's pretty freaking radical, actually.
Thursday, October 23, 2014
VST (Virtual Studio Technology)
VST is an interaction protocol for music generation software. Here's a neat, very simple VST host called VSTHost. The current version 1.5.4 is closed source, with the latest open source version at 1.16k. (He got tired of people ripping off his features without crediting him - can't blame him for that.)
I only ran across this because the Windows-native version of ZynAddSubFX is VST-capable and bundles VSTHost. ZynAddSubFX is an open-source software synthesizer from the Unix world. I ran across it in a short article summarizing Linux music software, which in turn I had found on a search for "open source composer software" after seeing references to Mario Paint Composer for building 8-bit game music, along with a masterwork of the medley art here. (That must have taken forever to do.)
So that was my recent music software trajectory.
Tuesday, October 21, 2014
yEd diagrammer
Wow. This is what a fantastic tool looks like. yEd basically looks like magic. But if you really want magic, look at the gallery.
Basic tips for writing Unix tools
Here. Except really I'd like a superset of Unix-y functionality. Wouldn't it be nice to have the same kind of tinkertoy approach but with SQLite or something? (Kind of where my SQP is going. Slowly.)
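Something like this hypothetical little filter, say - the tool and its behavior are entirely made up, it just shows the shape of the idea:

```perl
#!/usr/bin/perl
# Hypothetical "sqlpipe" sketch: slurp stdin lines into a throwaway
# in-memory SQLite table, run the SQL given on the command line, and
# print the result rows tab-separated, Unix-filter style.
use strict;
use warnings;
use DBI;

my $sql = shift @ARGV or die "usage: sqlpipe 'SELECT ...'\n";

my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 } );
$dbh->do('CREATE TABLE lines (n INTEGER, line TEXT)');

my $ins = $dbh->prepare('INSERT INTO lines (n, line) VALUES (?, ?)');
my $n   = 0;
while ( my $line = <STDIN> ) {
    chomp $line;
    $ins->execute( ++$n, $line );
}

my $sth = $dbh->prepare($sql);
$sth->execute;
while ( my @row = $sth->fetchrow_array ) {
    print join( "\t", map { defined $_ ? $_ : '' } @row ), "\n";
}
```

Then something like `ls -l | sqlpipe "SELECT count(*) FROM lines WHERE line LIKE '%.pm'"` would slot into a pipeline the same way grep and wc do.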
The semantics of event schedules
I seem to be stuck in survey mode tonight - and ran across this groovy little lightweight event scheduler from the Czech Republic, EasyTime. And really, event scheduling is pretty ramified if you start really thinking about it. When I was working out the basic feature list for the wftk, I sort of intuited that an event scheduler would have to be part of it - but seriously, you could do worse than looking at the feature list of EasyTime.
Friday, October 10, 2014
Rich command shells
A very nice overview of what's out there at the moment (with history). I keep thinking I want to do this with SQP, but I'm still just not sure how it should look. (Probably the notebook thing.)
(Note to self: merge CLI and command line tags - they're the same thing...) (Or are they?!?)
Also, holy schemoley - xiki: xiki.org/screencasts/
Wednesday, October 8, 2014
Article up, and also: exegesis and code analysis/understanding
My first article kinda-sorta based on an exegetical approach (of my own prototype code) is up on the Vivtek site, and the six-day code rush to write code to build the article from a note database has really whetted my appetite for more of the same. There are all kinds of exegetical efforts I want to make, getting into code reading in a big way.
So I trolled around Google for salient things. Here's a list of interesting things.
- TipsForReadingCode at c2.com was quite helpful as a set of ... well, tips for reading code.
- Code comprehension tools at grok2.com; the vast majority of this class of tool is closed-source and rather expensive. I think this is largely because large codebases are not typical for open source projects, but rather enterprise code, and enterprises have the money to pay for expensive tools. That's my reading, anyway. But:
- cscope is a venerable tool for static analysis of C code, anyway, and has been perverted to handle other large-scale grep-like analyses of large numbers of files. The tool itself may or may not be something I'm interested in, but its approach is probably pretty valid.
- cflow is another flow dependency analysis tool, also open source.
- Perl is, as always, a special case.
- Well, what about static code analysis in general? Here's another list of tools.
- There's a clang-based analyzer.
- This moribund project on code "aspects" looks fascinating.
- Finally, a book on pattern-based OO refactoring, which also looks pretty fascinating.
So, as always, lots of people are doing things tangentially related to what I want to do, but nothing is 100% there - because what I want to do is an extract-then-literate-programming kind of thing. We'll see how much sense this approach makes. Wish me luck.
Labels: articles, code analysis, code understanding, exegesis
Monday, October 6, 2014
Stamplay: back-end development by drag-and-drop
Here we go, folks. The future. High-level descriptive language (visually expressed) for website construction by component. Exactly what I was thinking about three years ago, only done slicker than I would have managed.
Sunday, September 28, 2014
Scratching the surface of German NLP, from ParZu down
Back in June, looking for parsers for the German language, I ran across ParZu, which is from the University of Zurich. Test sentences thrown against its online demo were parsed handily, and all in all it's a convincing parser, so I'm going to be working with it for a while to get a handle on things. It is written in Prolog.
For the past three days, I've gone down the rabbit hole of NLP tools for German, starting from ParZu. There is (of course) a vast amount of previous work, and it's really difficult to get a comprehensive grasp, but this post should at least link to some of it, with initial thoughts, and I can go from there later. I had considered writing an article, but honestly none of this is sufficiently coherent for an article. There's kind of a threshold of effort I expect from articles on the Vivtek site, and that's not there. Yet.
OK. So ParZu can work with any tool that delivers text in a tab-delimited format (token-tab-tag) using the STTS tagset (Stuttgart-Tübingen TagSet, if you were wondering). My Lex::DE can already be converted to generate some of these, so my best bet at the moment would simply be to continue work on Lex::DE and feed it directly into ParZu. Even better, of course, would be to do this online by talking directly to Prolog, probably ideally through HTTP to avoid 32/64-bit process boundaries. More on this notion later. The cheap way to do this is just to kick out tagged text and go on.
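For the record, that input format really is about as plain as it gets - one token per line, token, tab, STTS tag, with a blank line between sentences. A trivial sketch (tags hand-assigned here for illustration, not real Lex::DE output):

```perl
#!/usr/bin/perl
# Emit one sentence in the tab-delimited token/STTS-tag format ParZu reads.
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my @sentence = (
    [ 'Der',     'ART'   ],
    [ 'Hund',    'NN'    ],
    [ 'schläft', 'VVFIN' ],
    [ '.',       '$.'    ],
);

print join( "\t", @$_ ), "\n" for @sentence;
print "\n";    # blank line marks the sentence boundary
```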
The output from ParZu uses the CoNLL format, which seems pretty straightforward.
Which is all very nice and self-contained, but how do the Zurchers do their tagging? I'm glad you asked! The main tagger is clevertagger, which works on the output of Zmorge. Zmorge is the Zurich variant of SMOR, which is the Stuttgart morphological analyzer, although active development seems to have moved to Munich.
clevertagger has a statistical component that uses CRF (Conditional Random Field) training to judge, based on the Zmorge lemmatization output, which POS is most likely for the word based on your corpus. You can use either Wapiti or CRF++. The point of doing this is to eliminate POS ambiguity (or to quantize it? but no, I think it's a disambiguation step), which is what I hope to use Marpa to do directly - instead of providing unambiguous parts of speech, with Marpa I'll be able to provide alternatives for a given word, and disambiguate after parsing. Well, that's the idea, anyway - but that's going to take some effort.
(Note, by the way, that since ParZu is coded in Prolog, I can probably cannibalize it relatively smoothly to convert to a Marpa grammar, so none of this effort will be lost even if I do switch to Marpa later.)
Anyway, the CRF thing leaves me relatively unexcited. It would be nice to take an aside and figure out just what the heck it's doing, but that's pretty low priority.
Zmorge is based (somehow) on a crawl of the Wiktionary lexicon for German, and uses a variant of SMOR, SMORlemma, for the meat of the processing. I'm unclear on exactly how this step is done, but I do know that SMOR has a lexicon that is read into the FST on a more-or-less one-to-one basis, so I presume that Zmorge is putting the Wiktionary data into that lexicon, and then using updated rules for the rest of the morphological analysis. It would take a little exegesis to confirm that supposition. Maybe later.
SMOR and SMORlemma are both written in an FST-specific language SFST, which is just one example of a general FST language. It's roughly a tool for writing very, very extensive regular expressions (well, that's nearly tautological, in a sense). There are other FST-specific languages originating in different lineages, including OpenFST (developed by Google Research and NYU), AFST (an SFST fork developed in Helsinki - notice that a lot of the original FST work in NLP was done in Helsinki), and the umbrella library that sort of combines all of the above and some other stuff as well, HFST (Helsinki Finite State Technology). Overall, there's been a lot of work in finite-state transducers for the processing of natural language.
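Since "a very, very extensive regular expression with output" is still pretty abstract, here's a toy character-level transducer in Perl - a hand-built stand-in for what SMOR-style morphologies do at vastly greater scale, with made-up output tags:

```perl
#!/usr/bin/perl
# Toy finite-state transducer: maps "Hund" and "Hunde" to a lemma plus
# invented morphological tags, and rejects anything else. Real SFST/SMOR
# networks are compiled and enormously larger; this only shows the machinery.
use strict;
use warnings;

# transitions: state => { input char => [ next state, output string ] }
my %delta = (
    0 => { H => [ 1, 'H' ] },
    1 => { u => [ 2, 'u' ] },
    2 => { n => [ 3, 'n' ] },
    3 => { d => [ 4, 'd' ] },
    4 => { e => [ 5, '' ] },    # plural suffix consumed, nothing emitted
);
my %final = (
    4 => '<NN><Sg>',    # "Hund"
    5 => '<NN><Pl>',    # "Hunde"
);

sub transduce {
    my ($word) = @_;
    my ( $state, $out ) = ( 0, '' );
    for my $ch ( split //, $word ) {
        my $edge = $delta{$state}{$ch} or return undef;    # no path: reject
        ( $state, my $emit ) = @$edge;
        $out .= $emit;
    }
    return defined $final{$state} ? $out . $final{$state} : undef;
}

print transduce($_) // 'REJECT', "\n" for qw(Hund Hunde Hunds);
```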
There are some tasty-looking links proceeding from the OpenFST project, by the way.
From my point of view, what I'd like to do might consist of a couple of different threads. First, it would be nice to look at each of these toolsets and produce Perl modules to work with them. Maybe. That, or possibly some kind of exegetical approach that could approximate some kind of general semantics of FSTs and allow implementation of the ideas in any appropriate library or something. I'm not even sure.
But second, it would be ideal to take some of the morphological information already contained in the various open-source morphologies here (note: OMor at Helsinki, which aims to do something along these lines, and of course our old friend Freeling) and build that knowledge into Lex::DE where it can do me some good. How that would specifically work is still up in the air, but to get good parses from ParZu (and later from Marpa), it's clear that solid morphological analysis is going to be crucial.
Third, I still want to look at compilation of FSTs and friends into fast C select structures as a speed optimization. I'm not sure what work has already been done here, but the various FST tools above all seem to compile to some binary structure that calls into complex code. I'm not sure how necessary that is - until I examine those libraries, anyway. Also, I'd really like to get something out of lemmatization that isn't a string. Those structures bug the hell out of me, because I still need to parse them again next time I do something. I want something in memory that I can use directly. (Although truth be told I have no idea whether that's premature optimization or not - until I try it out.)
Fourth, there are other POS systems as well. One that naturally caught my eye is hunpos.
So that's the state of the German parsing effort as of today. Lots of things to try, not much actually tried yet.
Update 2014-09-30: A closer look at the underlying technology of ParZu, the Pro3gres parser originally written for English, as described in a technical report by the author, has me somewhat dismayed. I'm simply not convinced that a probabilistic approach is ideal - sure, I might be wrong about this, but first I want to try the Marpa route. Yesterday I sat down to try parsing something with ParZu, and found myself writing an initial Marpa parser for German, working from my own tokenizer (which, granted, has absolutely horrible lemmatization and POS assignment). I think I'm going to continue down that path for now.
That said, SFST is a fascinating system and the German morphologies written in it are really going to come in handy - so I might end up using that before even considering the parser level.
Saturday, September 27, 2014
Interesting simple workflow tool
I don't have much time to look closer, but tasklet looks pretty slick.
Friday, September 12, 2014
Videogrep: automatic supercuts using Python
This is a cool little thing! [github] It searches subtitle files and then uses the moviepy library to splice together video based on the subtitle timing. Neat!
Wednesday, September 10, 2014
Plagiarism detection competition
This is cool: PAN is a yearly contest for plagiarism detection. Definitely an interesting task to look at for NLP.
Tuesday, September 2, 2014
Machine learning technique flowchart
Another flowchart for choosing between machine learning techniques.
Man, posting has been thin on the ground lately - our summer in the States was fantastic, though.
Tuesday, August 5, 2014
Bot becomes trusted member of social network
So some guys in Italy were doing research on social networks and found surprising behavior when their bot's regular visits to people were noticed. Turns out people could see those visits and responded by trusting the bot, so the researchers had it start making random recommendations, which the people who trusted it responded to quite well.
Social aping online can be incredibly simple (on the Internet, nobody knows you're Eliza) and as always I want to extrapolate to a bot that can operate a simple small business.
Tuesday, July 8, 2014
Scanners
Scanning is one of those places where confusion reigns and there is essentially no good open-source software to be found. XSane is the closest the open-source world has come, and it's not very Windows-compatible.
So it might be a reasonable idea to investigate this as a possible relatively simple target UI application.
Anyway, SANE is the Linux project for scanner drivers. There is no Windows support.
Sunday, July 6, 2014
Comparative literate programming
Now here is an article that is bang-on the kind of stuff I want to write: a comparison of JavaScript typing completion code with a newer, cleaner, Clojure one that is literately woven from the article itself. This is transformational exegesis, or at least a first stab in its direction. (The jQuery code isn't actually quoted all that much.)
BONES Scheme
A Scheme-to-assembly compiler for all your bare-bones Scheme needs. It's a little stripped-down for performance (some error checking omitted, and so on).
Marpa, German, and ParZu, oh my!
I spent most of May working through my old natural-language tokenizer, adding a vocabulary-driven lexer/lexicon for German, all in preparation for undertaking a Marpa-based German parser. That's looking halfway decent at this point (except I need to do much better stemming), and then I decided to do a general search on German parsers and found ParZu.
The unusual thing about ParZu, among parsers especially, is that it's fully open source. That is, it has a free license, not a free-for-academics-only license - and it's hosted on GitHub. Also, I can try it online. So I fed it some more-or-less hairy sentences from my current translation in progress - and it parsed them perfectly.
So here's the thing. I kind of want to do my own work and come to terms with the hairiness of things myself. And then on the other hand, parsing German by any means would allow me to jump ahead and maybe start doing translation-related tasks directly....
It's a dilemma.
Thursday, June 5, 2014
MozRepl
One of the recurring problems I have with Mozilla products is that they are essentially unscriptable using technology I know how to use. XPCOM has no Perl support (apparently it started to, at one time, but that module seems to have been dropped along the way), and the only real recourse appears to be embedding of JavaScript through mechanisms I don't really have the time to understand.
Well - no more. I learned of the existence of MozRepl, which is a plugin that provides a local telnet command line that can be used to inject JS into running Mozilla-based applications (Firefox and Thunderbird being the ones I actually care about). And MozRepl does have a Perl module to talk to it.
So that's another option I have for automating my work. I'm not 100% sure yet how best to use it, but at least I know it's there.
Tuesday, June 3, 2014
docopt
At docopt.org - a specification language for command-line interfaces. Shades of ... that groovy Perl module I can't remember the name of. Right - Getopt::Euclid! I used it for the Marpa tester. Shades of that! Except a general specification language. I think I'll steal this idea eventually.
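For later reference, a docopt spec is just the usage message itself, written in a conventionalized form that the library parses to build the option handler. A made-up example for a hypothetical tool:

```
Usage:
  declnote weave <file> [--out=<dir>]
  declnote tangle <file>
  declnote (-h | --help)
  declnote --version

Options:
  -h --help     Show this screen.
  --version     Show the version.
  --out=<dir>   Output directory [default: ./out].
```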
Thursday, May 29, 2014
And speaking of cool C things
Here's a package that brings Lisp macro syntax to C. Do want! (Run C::Blocks through that!)
C::Blocks and Devel::Declare
There are a couple of different ways to hook into the Perl parser and define a keyword that flags your own funky syntax - neither of which I'd heard of until encountering the absolute genius that is C::Blocks.
C::Blocks embeds C code right into your Perl, using a cblock { this is C here } syntax that is just incredibly groovy. It's way more inline than Inline. OK, so it's not production-ready yet, but still - it will be. And I'm going to do that same thing for HBScheme.
So anyway, C::Blocks uses the shiny new pluggable keyword API (introduced in 5.12) to do that, while Devel::Declare lets you do pretty much the same thing, just less elegantly and less safely. The difference appears to be that the pluggable keyword API is available from XS and expects you to return opcodes (that is, it's a real live hook into the interpreter), while Devel::Declare runs in Perl and returns strings that will then be interpreted by the Perl interpreter. I might be wrong about that, but I haven't really gotten into it yet.
Yet.
Tuesday, May 20, 2014
Article: Perl and Windows UAC
New article written on the Vivtek site, a little in-depth investigation of Windows UAC and how to manipulate it from Perl, along with the release of a CPAN module.
Tuesday, May 13, 2014
Marpa stuff
As I get further into Marpa, I'm starting to see there's a whole little world of cool stuff out there based on it. Here are a couple of bookmarks for later.
- A fantastic article on using Marpa to convert Excel spreadsheet formulas into Perl using AST transformations.
- Kegler's "Ruby slippers" parsing technique: essentially ways to trick a simple grammar into functioning within a larger whole by using invisible tokens and wishing the language were easier to parse. Marpa is ... well, it's beyond cool and into virgin territory.
- Another Kegler post on mixing declarative and procedural parsing that should come in handy here and there.
- Here's a gist showing a Marpa parser for CSS that uses a tokenizer external to Marpa - the key technique is in the loop starting on line 187, where we pass each individual token to the recognizer. Only after the token stream is complete do we read the value from the recognizer. (So for a series of sentences, do we have to create a new recognizer for each sentence? I think we actually do. That will be something for experimentation later.)
Thursday, May 8, 2014
Using a spreadsheet as data for templating
The Python copytext module takes a spreadsheet and loads it as a data structure for expression in a template; that's kind of a neat ... I'm trying to think of the phrase I want to use ... "dataflow component" seems to be as close as I can come tonight.
Anyway, this is kind of a neat idea and is probably a way forward for the Data::Table::Lazy module. Note also that D::T::L should directly know how to work with Excel and with Google spreadsheets for maximum fun and profit.
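A rough Perl sketch of the same dataflow component - spreadsheet rows in, template expansion out - using a CSV file and Template Toolkit as stand-ins (the file name and column names are invented):

```perl
#!/usr/bin/perl
# Load a CSV "copy deck" as a list of row hashes and expand a template.
use strict;
use warnings;
use Text::CSV;
use Template;

my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
open my $fh, '<:encoding(UTF-8)', 'copy.csv' or die "copy.csv: $!";
my @cols = @{ $csv->getline($fh) };    # header row names the columns
my @rows;
while ( my $row = $csv->getline($fh) ) {
    my %record;
    @record{@cols} = @$row;
    push @rows, \%record;
}
close $fh;

my $template = <<'TMPL';
[% FOREACH row IN rows %]
<h2>[% row.headline %]</h2>
<p>[% row.body %]</p>
[% END %]
TMPL

my $tt = Template->new;
$tt->process( \$template, { rows => \@rows } ) or die $tt->error;
```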
FreeLing
Another NLP package I'd never heard of. I gotta take another sabbatical week soon and get all this NLP stuff under control. Anyway, I found this one on a more-or-less random sweep through CPAN - I forget what I was searching on, but ran across Lingua::FreeLing2::Bindings. Perl bindings make me happy, so I think I'm going to poke around here as soon as possible.
Saturday, May 3, 2014
Irssi
So Irssi is a text-based IRC client that I believe I love already. I only hit IRC about every five years or so; maybe this time will be the time I stick there. It's very popular for software development groups.
Anyway, Irssi seems to embed some kind of Perl automation. I'm going to figure it out eventually.
Thursday, May 1, 2014
Marpa
So I decided to sit down finally and write the line parser for the new Decl, and since it was parsing, I decided not to unearth my old HOP-inspired parsing code but rather take the plunge and try Marpa, to avoid getting bogged down in parser issues.
I am in love.
It basically looks like Marpa can do anything related to parsing. It can even handle ambiguous parses! One of the test cases is literally "time flies like an arrow"!
But what doesn't yet exist (there's a partial beginning) is a tutorial set, a "Gentle Guide to Marpa". I think I'll write one.
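As a note to myself, the minimal shape of an ambiguous parse in the scanless interface looks something like this - not the "time flies" grammar, just an arithmetic grammar with no precedence, so "2 + 3 * 4" gets two parse trees and value() hands them back one at a time:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;

# Deliberately ambiguous: no operator precedence, so "2 + 3 * 4" parses two ways.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::array
E          ::= E op E | number
op           ~ [+*]
number       ~ [\d]+
:discard     ~ whitespace
whitespace   ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = '2 + 3 * 4';
$recce->read( \$input );

# Each call to value() returns a ref to the next parse tree; undef when done.
while ( my $value_ref = $recce->value() ) {
    print Dumper( ${$value_ref} );
}
```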
Tuesday, April 29, 2014
Code challenges
A nice article recommending the solution of code challenges against an autojudge to improve programming skill.
Saturday, April 26, 2014
Decl and top-heaviness
Man, reading through all the stuff the v0.11 Decl::Node object supports, it's really no wonder I bogged down. It was just doing too much. I really hope that splitting things out into syntactic and semantic poles will make a difference. (Or really, even more than just the two, given the declarative extraction phase in the middle.)
So yeah, I suppose a post on that is in order. The new regime is finally getting underway, given that the last update to Decl was in 2011 and it's 2014 now. I've started coding Decl::Syntax, which is the handling of syntactic nodes.
Note that a syntactic node is used to derive two different sets of semantics. The first is the machine semantics, the second being the human semantics. This is equivalent to the concept of literate programming, except that literate programming also parses the code chunks for indexing, which (initially, and maybe permanently) we will not be doing.
So the surface structure is the indented stuff. To derive the machine semantics, we go through two more phases. Actually, three.
The first is markup. During markup, a Markdown ruleset is used to convert all Markdown nodes into X-expressions. The ruleset can be specified in the input, or can be one of a few named ones.
After markup comes declarative extraction. Here, we extract a tree of declarative nodes from the syntactic structure. These contain only the "true children" of each tag. X-expressions are converted to tag structures during this phase, and transclusions are resolved. Annotations are inserted into structured parameter values. Macros might be expressed here as well; I won't know until I try expressing some things with prototypes.
The result of declarative extraction is a thinner tag structure that contains only machine-meaningful information. Anything explanatory is discarded, although obviously it's still available for examination if there's a need.
After extraction comes semantic mapping. Here, a set of vocabularies map declarative structure onto data structures. A default vocabulary might just map everything into vanilla Perl structures or objects, but more interesting vocabularies will build more interesting objects.
Finally, execution does whatever action is encoded by the semantic structure. This runs code, builds documents, activates the GUI, or whatever.
Keeping these phases strictly separate makes it possible to build all that detailed functionality into this system without losing sight of what's where. Or so I fervently hope.
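Just to pin the data flow down, a throwaway sketch of the phase pipeline - every name here is hypothetical, none of it is real Decl::Syntax code:

```perl
#!/usr/bin/perl
# Hypothetical stubs only: the point is the shape of the pipeline, with
# each phase consuming the structure produced by the previous one.
use strict;
use warnings;

sub parse_indented_source    { return { phase => 'syntax',      text => $_[0] } }
sub apply_markup_ruleset     { return { phase => 'markup',      from => $_[0] } }    # Markdown -> X-expressions
sub extract_declarative_tree { return { phase => 'declarative', from => $_[0] } }    # "true children" only
sub map_with_vocabulary      { return { phase => 'semantics',   from => $_[0] } }    # tags -> objects
sub execute_semantics        { print "would now act on the $_[0]{phase} structure\n" }

my $syntax      = parse_indented_source("some indented Decl source");
my $marked_up   = apply_markup_ruleset($syntax);
my $declarative = extract_declarative_tree($marked_up);
my $semantics   = map_with_vocabulary($declarative);
execute_semantics($semantics);
```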
Friday, April 25, 2014
Reactive programming again
Two little JS frameworks: ripple.js, which aims to be tiny, and "React.js and Bacon", a look at another way to do reactive stuff.
Monday, April 21, 2014
Article: KeePass through SSL with Perl
New article at the Vivtek site on accessing KeePass using the KeePass plugin from Perl. I ran through progressively more elegant prototypes before coming up with a nice wrapper. I released the whole shebang as a CPAN module WWW::KeePassRest, which uses a new JSON API wrapper that is ... minimalistic in its design.
It'd be nice to be nice and principled about API wrappers on something like this basis, but that's definitely way down the priority list.
Friday, April 18, 2014
Code reading (and by extension, code presentation)
Here's an article by Peter Seibel I missed in January: Code is not Literature. Instead of reading code like literature seminars, we should rather consider presentation of code more like what naturalists do: "Look at the antenna on this monster! They look incredibly ungainly but the male of the species can use these to kill small frogs in whose carcass the females lay their eggs."
This really resonates with me, as it's more or less what I've got in mind with exegesis: a list of articles focusing on sections of the code, highlighting interesting techniques and extracting the knowledge embedded in it (and as the technology matures, also extracting some of that knowledge in a reusable form of some kind).
James Hague then weighs in today with "You don't read code, you explore it," saying essentially the same thing, and adding that only by interacting with the code does he feel as though he achieves true understanding (and mentioning Ken Iverson's interactive J presentations, which sound pretty interesting as well).
So there you go. What people are thinking about writing about code.
As practice, I've written two articles on vivtek.com in the past week and am well into a third: one on TRADOS 2007 and its language codes, so far presenting only a prototype script, a list of the codes used in a convenient format, and explaining a little about discoveries I made on the way; and one on Windows UAC and how to use it from Perl, which I backed up by publishing the module Win32::RunAsAdmin to CPAN.
If I can keep up something like this pace, I'll have fifty articles in a year. That's a lot of writing - and honestly, I have a lot of things to write about. I just wish there were more examples of writing about code for me to emulate. I'm still looking for source material.
GlobalSight
GlobalSight is a translation management system that was closed-source until 2008 (I believe). After its acquisition it was open-sourced by replacing a few dependencies with open-source equivalents, which is pretty excellent.
At any rate, this is an open-source target I'd like to put a little effort into, given my actual income structure.
Thursday, April 17, 2014
ScraperWiki closed?
Huh. The open ScraperWiki forum structure seems to have been closed up. That's a shame. I wonder where people interested in scraping congregate now. (Well, now it's Big Data and monetized, I guess. Maybe there is no such general-interest forum now that it's getting ramified like that.)
Wednesday, April 16, 2014
Analytics
Whom the gods would destroy, they first give real-time analytics. (Ha.) Because not waiting for a reliable sample is bad, bad statistics.
That said, I do want real-time reporting on incoming links and searches, and Google Analytics is abysmal on that front, as I've mentioned in the past. Now that I've moved the static content at Vivtek.com over onto Github ... well. I did that a year and a half ago, but now that I'm writing again and care about incoming interest, and given that I don't have my raw traffic logs any more because that's not something Github does, I need something better than Google stats.
The answer is a system I've noted in passing before: Piwik. It not only includes the JS bug to phone home, it also provides full reporting in a dashboard you configure on your own host. As soon as I get two minutes, I'm going to go ahead and convert Vivtek.com to Piwik, and then I can actually know what people want to read about.
Monday, April 14, 2014
Code read through Plack
I'm studying ways to write about code, and here is a short article series about Plack.
Win32::Exe
Oh, here's a cool little thing Mark Dootson did to manipulate executable files on Windows: Win32::Exe.
Sunday, April 13, 2014
Article: TRADOS 2007 and its language codes
I wrote a technical article for the Vivtek site today for the first time since 2009. I had to rewrite the publication system for the whole site to make that work, too. Very instructive!
Anyway, it's the saga of building a useful tool for my technical translation business. It's just a prototype; eventually I'll wrap it all up into a nice module and write another article on that.
Saturday, April 12, 2014
Chrome extension boilerplate generator in Perl
Posted today to the PerlMonks: a Chrome extension boilerplate generator.
Thursday, April 10, 2014
Log4Perl
Wow. Log::Log4perl implements the perfect in-code logging system for all your Perl coding needs. It is a thing of sheer beauty.
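The :easy interface alone is worth the price of admission - roughly like this:

```perl
use strict;
use warnings;
use Log::Log4perl qw(:easy);

# One init call sets up a root logger to the screen at the given level.
Log::Log4perl->easy_init($INFO);

INFO  'publication run starting';
DEBUG 'below the configured level, so this is suppressed';
WARN  'source file missing, skipping';
```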
Saturday, April 5, 2014
Programming the Lego RCX and NXT
The RCX and NXT are little embeddable processors for robot control. There's a lot of RCX/NXT hacking information out there. Great RCX page here, and two languages, NBC and NQC.
Autocommitting under git
Since I use Github to serve my site, a git autocommit has to be part of my publishing process. Here are ways to do that, at StackExchange.
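Since the publish script is Perl anyway, the autocommit step can just be a few system calls - a sketch, with the directory, remote, and branch names as assumptions:

```perl
#!/usr/bin/perl
# Autocommit step of a publish script: stage everything, commit with a
# timestamp, push. The "site" directory and origin/master are assumptions.
use strict;
use warnings;
use POSIX qw(strftime);

chdir 'site' or die "can't chdir to site: $!";

system( 'git', 'add', '-A' ) == 0 or die 'git add failed';

# "git diff --cached --quiet" exits non-zero only when something is staged.
if ( system( 'git', 'diff', '--cached', '--quiet' ) != 0 ) {
    my $msg = strftime( 'autopublish %Y-%m-%d %H:%M', localtime );
    system( 'git', 'commit', '-m', $msg ) == 0      or die 'git commit failed';
    system( 'git', 'push', 'origin', 'master' ) == 0 or die 'git push failed';
}
```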
Directory Monitor
Here's a useful little tool for Windows automation: Directory Monitor. The same guy has a good command-line mailer, too (which can handle attachments, a real problem with command-line mailing under Windows).
Friday, April 4, 2014
Exegesis as normal publishing tool
I've been kicking around the notion of a "code exegesis" for a little while, which is the attempt to take some software project (in the simplest case a single file) and to "back-comment" it, that is, explain the author's intent and strategies in the development of the code as well as possible, and also to focus on different aspects of it in a series of separate articles (or chapters if the whole work is considered a book).
This is an exegesis as classically understood - detailed commentary on the ideas and history behind a given work, often scripture but also e.g. Homer. I call this "interpretive exegesis" to distinguish it from literate programming, which is essentially the same thing except that it independently *generates* the code, so I call it "generative exegesis".
With me so far?
All the publishing I want to do at this point is code-based. So far I had considered doing a Markdown-enabled Perl weaver that is essentially Jekyll in Perl, so I called it Heckle. It was entirely vapor, fortunately - because I'm renaming it "Exegete" instead. I'm going to use exegesis as the basis for all my publishing, because I'm going to be quoting from things all the time anyway. The same document organization tools could be used for anything, not just exegesis, but honestly, it's still a great name.
There are a couple more ideas here.
First is the realization that the same explanatory exegetical structure would be doubly appropriate for binaries, for disassembly and reverse-engineering. Here, instead of a dynamic picture like a conventional disassembly tool (which can be seen as a kind of explorer/browser), we'd explicitly be supporting the writing of articles about a given binary structure, but overall the same principles as IDA or Radare would apply: the identification of small extents that express a given set of actions and ideas.
And then there's the notion of a "derivative work" - a kind of hybrid of interpretation and generation which transforms the original into a new work with changes. This is not going to be a very normal mode for most purposes, because it's not the same as normal maintenance, which is typically done in a more evolutionary fashion. This is definitely intended for those punctuational cases like porting, or reimplementation of archeological finds from the 70's or something. A good term for this would be a "transformational exegesis".
And of course it would be perfect for patching binaries or similar reverse-engineering tasks.
So that's kind of where my thinking is at. Since all this involves the writing of text, probably extensive text, that includes references to and quotations of code objects, it's pretty much ideal for the kind of tech writing I want to do anyway.
Wednesday, April 2, 2014
Attempto controlled English
A controlled/minimal grammar for pseudo-English that can be used for expressing specifications and so forth. Neat project, and parsable without leaving the Slow Zone.
Keybase
A little article about Keybase that I'm too tired to understand right now. I'll get back to it.
Tuesday, April 1, 2014
Bootstrapping a compiler from ... the editor
This is a fun little thing: an article from 2001 about bootstrapping a compiler for a simple language, starting from nothing but a text editor used to enter hex files.
Source open news
Remember Source, the open news tool consortium/group/whatever? They've got a code directory. Good target.
Sunday, March 30, 2014
Tuesday, March 25, 2014
Editing binaries
Here's a cool article on the why and how of editing binaries, with a convenient link to an open-source disassembler. (About time!)
So ... the analysis of binaries is really the same thing as exegesis, just at a lower level. That makes it really tasty from my point of view, if I only had a sabbatical coming up. (I'm seriously thinking of that, by the way. I need to do some technical things and it's clearly never going to happen at the profitable workload I've got going lately. The only time I slow down is when I'm sick, and nothing technical happens then, either, for obvious reasons.)
Monday, March 24, 2014
Duktape
One-file embeddable ECMAScript engine (that's JavaScript to you and me). This would be relatively easy to put into Perl...
Sunday, March 23, 2014
Nightingale translator
Enter a string, click the button, and this generator assembles a string of nightingale song encoding your string. Pretty!
Saturday, March 22, 2014
2048
This one's not from the archives, it's timely! 2048 [github] is the latest popular Internet game (you can tell because even Randall Munroe is playing it [xkcd 1344]). And of course, StackOverflow has discussed strategy, and said strategy has been automated (a simple minimax approach).
Life moves pretty fast. If you don't stop and look around once in a while, you might miss it.
Buffer not empty after all!
I found a bookmark trove from 2008-2011. A good quarter of the links are dead, which is a little worrisome, but some are still good, and quite interesting. That'll keep me off the streets for a few more days.
Wednesday, March 19, 2014
I've run out of buffer
I've been so used to having a three-month buffer of bookmarks that it's a very strange feeling to ... have caught up today. Posts will now only happen as I find actual new things; since I've fallen out of the habit of frequent scans of HNN, that means posting will probably get more scarce. On the other hand, the posts I do write will probably be longer than ten words now. Like, actual thoughts, not just bookmarks.
We'll see. I thought it was interesting to have caught up, anyway.
The 17 equations that changed the world
From Business Insider. When did they get to be such interesting journalists?
Writing a planner
Here's an article about a STRIPS-like planner that chains together different calls to utilities according to the starting situation - I'd like to do this in the general case.
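Just for flavor, here's a toy forward-chaining planner along those lines - nothing to do with the linked article's implementation, just the general shape: states are sets of facts, actions (here standing in for utility calls, all made up) have preconditions and effects, and the planner searches for a chain that reaches the goal.

    use strict;
    use warnings;

    # Toy STRIPS-ish planner: breadth-first search over sets of facts.
    # The actions (fetch/parse/report) are invented for illustration.
    my @actions = (
        { name => 'fetch',  pre => ['url'],      add => ['raw_data'] },
        { name => 'parse',  pre => ['raw_data'], add => ['records']  },
        { name => 'report', pre => ['records'],  add => ['report']   },
    );

    sub plan {
        my ($start, $goal) = @_;
        my @queue = ([ { map { $_ => 1 } @$start }, [] ]);
        my %seen;
        while (my $item = shift @queue) {
            my ($state, $path) = @$item;
            return $path unless grep { !$state->{$_} } @$goal;   # all goals met
            my $key = join ',', sort keys %$state;
            next if $seen{$key}++;
            for my $a (@actions) {
                next if grep { !$state->{$_} } @{ $a->{pre} };   # unmet precondition
                my %next = (%$state, map { $_ => 1 } @{ $a->{add} });
                push @queue, [ \%next, [ @$path, $a->{name} ] ];
            }
        }
        return;    # no plan found
    }

    my $plan = plan(['url'], ['report']);
    print join(' -> ', @$plan), "\n";    # fetch -> parse -> report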
Monday, March 17, 2014
Obesity system influence map
I'm not sure what this really is - a kind of mind map of the interacting concepts and behavior patterns in the mind that influence obesity, more or less. I find it fascinating both from a technical standpoint and for the fact that it's an impressive map in and of itself.
Filed under mental models and diagramming...
Shell explainer
This is utterly fantastic! Give it a shell command of arbitrary complexity and it will draw a nifty chart-slash-diagram with explanatory text for it. I love this.
Sunday, March 16, 2014
COM stuff
And then there is just a whole passel of stuff about COM in Perl - again.
- Get the type name of a COM object
- A truly excellent rundown of C++ COMming.
- AutoHotKey can do it, too.
- Getting to the Registry in Perl is boring.
- Registry.
- ProgIDs in the Registry - this would probably be a smaller doable module.
- Registry. Registry.
- And finally Tcl for the win with COM!
Saturday, March 15, 2014
More Windows stuff
You can also use IE.ExecWB to download stuff using IE on Windows. You can do a lot of interesting automation on Windows, it's just that today's Windows security models make it a royal pain. For the obvious reasons, of course - they're trying to slow down the botnets.
Statistics Done Wrong
What the title says. "The woefully complete guide." I'd be willing to bet it's not.
Trials and tribulations of IE::Mechanize
So yeah, Internet Explorer is a very broken and odd application, everybody knows it. In November I hurled myself into that breach, and here are some of the links I ran across in the attempt to figure out how to drive IE from Perl under Windows (a bare-bones sketch of the basic OLE dance follows the list).
- The OLE IE API.
- Capturing IE screenshots with Perl using Imager::Screenshot, a tasty module that looks quite useful.
- Navigation between security domains silently starts a new IE instance, so OLE automation breaks. Thanks, Bill. Here are some clues about keeping that handle, which really use some stupid shit.
- And how to find the IWebBrowser object given an HWND. Again - stupid. Direct link to the sanctioned hack. And again in Perl (quite a useful link). And on StackOverflow.
- SAMIE is a non-CPAN competitor of IEMech, who knew? Also obsolete at the moment.
- A little more information about the security (integrity level) problem with IE>7. It might be solvable with this method. However, for the sake of testing I discovered a workaround with the Mark of the Web (oy): motw. (If it weren't for the fact that everything Microsoft does is this hacked...)
- Then the Monks come up with a beaut: accessing C++ COM objects from Perl. This can probably be polished up and made usable. Another take on it here, I believe.
- Counterpoint, kinda.
- More obsolete Windows manipulation.
- Here's something I bookmarked about IE security management. This might also have something to do with it. Or this might.
- The security zone for a given WebBrowser object can be downgraded, but not upgraded. I think.
- A nice way to manipulate package stashes in XS.
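For reference, after wading through all of that, the basic Win32::OLE dance for driving IE from Perl looks roughly like this - the happy path only, ignoring the cross-zone and integrity-level headaches described above:

    use strict;
    use warnings;
    use Win32::OLE;

    # Minimal OLE automation of Internet Explorer.
    my $ie = Win32::OLE->new('InternetExplorer.Application')
        or die "Can't start IE: " . Win32::OLE->LastError;
    $ie->{Visible} = 1;
    $ie->Navigate('http://example.com/');

    # Wait until the page has finished loading (READYSTATE_COMPLETE == 4).
    sleep 1 while $ie->{Busy} || $ie->{ReadyState} != 4;

    # The DOM is reachable through the Document property.
    my $title = $ie->{Document}{title};
    print "Loaded: $title\n";

    $ie->Quit;

The moment you navigate across a security zone, of course, that $ie handle may end up pointing at a window that no longer exists - which is what half the links above are about.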
Analog literals for C++
Cute. This is kind of along the lines of what I'd like to do for layout in Decl.
Sprite-based games with HTML5 Canvas
Cool!
But a full list of HTML5 gaming engines is better found here, with rankings by feature and price. Scirra's Construct 2 appears to be the current winner, at a basic price of GBP 79 and going up to GBP 259.
Tuesday, February 25, 2014
Leap.se email client
Here's another email client doing crypto, pushed by the Freedom of the Press Foundation. Probably worth supporting.
Wednesday, February 12, 2014
PSTricks and TeX
Graphics in TeX can do some amazing things: here's a library that builds figures for chemical lab setups (and they're beautiful!), and here is a graphical description language for embedding in TeX.
Tuesday, January 21, 2014
Musical score tools
There are, of course, a lot of different tools for musical scoring.
- Lilypond seems to be one of the biggies in terms of making pretty scores from a text definition language.
- Sibelius is one of the two major commercial software packages.
- MuseScore is an open alternative to Sibelius. These are both GUI tools for composition that also permit MIDI input, etc.
- Here's a list of six tools for notation.
Decl 2.0 syntax parser
I've been working on a lot of thoughts about the Decl reboot lately, including a ground-up rethinking of the basic way of handling syntax, and I've come to some conclusions.
Indentation is a misleading way of thinking about this. Indentation is just an indication of the two-dimensionality of text. Especially if we look at Markdown and its friends and relatives, we really have to realize that the block elements, at least, are there to exploit that two-dimensionality - to arrange information vertically as well as horizontally in order to present and shape it.
In fact, I'm getting a lot closer to just saying that Decl syntax and Markdown are sort of the same thing. And so I want to come up with a parsing language for two-dimensional text that is not a grammar built for one-dimensional sentences. Or at least is only partly a one-dimensional grammar.
Along the way, I hope to start looking at some naturally two-dimensional text items:
- Diagrams
- Musical scores
- Other timing diagrams
- Workflow charts, Gantt charts, etc.
- Page layouts and screen layouts for forms, buttons, etc.
And all that could be directly supported by at least part of the parser. Using indentation and block rules, we can do a "terraced scan", as it were, identifying blocks first and then drilling into them to identify more details.
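A first cut at that terraced scan might look something like this - purely a sketch, not actual Decl code: pass one only groups physical lines into indented blocks, and the line-level parse of each block's contents is deferred to a later pass.

    use strict;
    use warnings;

    # Pass 1 of a "terraced scan": find blocks by indentation only.
    # Each block is a tag line plus the more-deeply-indented lines under it;
    # drilling into the children is deliberately left for a later pass.
    sub scan_blocks {
        my ($text) = @_;
        my @lines = split /\n/, $text;
        my @blocks;
        while (@lines) {
            my $line = shift @lines;
            next if $line =~ /^\s*$/;                  # skip blank lines
            my ($lead) = $line =~ /^(\s*)/;
            my @children;
            while (@lines) {
                my ($next_lead) = $lines[0] =~ /^(\s*)/;
                last if $lines[0] =~ /\S/ and length($next_lead) <= length($lead);
                push @children, shift @lines;
            }
            push @blocks, { tag => $line, children => \@children };
        }
        return \@blocks;
    }

    my $sample = join "\n",
        'door',
        '  description A sturdy oak door.',
        '  on open',
        '    say "The door creaks open."',
        'window',
        '  description A small round window.';
    my $blocks = scan_blocks($sample);
    printf "%d top-level blocks\n", scalar @$blocks;   # prints: 2 top-level blocks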
The combination of Markdown with the Decl parser and interpreter, moreover, gives me a very natural way to implement literate programming tools in a way that finally makes sense to me.
I think this is going to be very fruitful.
Friday, January 17, 2014
Binary formats: PE presentation by the guy that did that PE poster
Excellent (if quite dense) slide show about the PE format and doing surprising things with it. Here is his Google Code site.
Binary data structure construction is another kind of templating, and unboiling a data file should be pretty much equivalent to unboiling text. A form of exegesis, in other words. Just a thought.
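In that spirit, Perl's unpack is already a crude unboiler for binary structures: the template is the boilerplate, the extracted fields are the record. A quick sketch against the DOS stub of a PE file, using only the stable facts of the format (the MZ magic at offset 0 and the 32-bit e_lfanew pointer at offset 0x3C):

    use strict;
    use warnings;

    # "Unboiling" a binary structure: the unpack template plays the role of
    # the boilerplate, and the extracted fields are the record.
    my $file = shift @ARGV or die "usage: $0 some.exe\n";
    open my $fh, '<:raw', $file or die "Can't open $file: $!";
    read $fh, my $dos_header, 64 or die "Short read on $file\n";

    # Bytes 0-1: "MZ" magic; byte 0x3C: little-endian offset of the PE header.
    my ($magic, $e_lfanew) = unpack 'a2 x58 V', $dos_header;
    die "$file is not an MZ executable\n" unless $magic eq 'MZ';

    seek $fh, $e_lfanew, 0;
    read $fh, my $sig, 4;
    printf "PE header at 0x%x, signature %s\n",
        $e_lfanew, $sig eq "PE\0\0" ? 'OK' : 'missing';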
Monday, January 13, 2014
Name explorer
This has to be one of the neatest data explorers I've ever seen: popularity of names by year in the United States. (And a fantastic HuffPost article about it.)
Sunday, January 12, 2014
Boilerplate
So OK, what about boilerplate? (Again...)
First, there are a number of modules in CPAN that produce boilerplate of various description, the first being naturally Module::Starter, which I use on a weekly basis. Its structure is surprisingly straightforward, but that's just another way of saying that it is expressing things in Perl that (in my opinion) could better be expressed in a specific boilerplate DSL.
There are others, such as Module::Starter::PBP by Damian Conway, based on "Perl Best Practices". Drupal::Module::Start. A Padre plugin (I did not know this until just now). There's Test::STDmaker, which takes Perl test output and puts it into a boilerplated document to conform to military purchasing standards. HTML::HTML5::Builder, which "erects scaffolding" for Web apps. Even WWW::Mechanize::Boilerplate, which I really would like to look at more closely.
In other words, CPAN contains a lot of knowledge about boilerplate, both Perl-specific and otherwise. (Another reason for a survey, yeah?) But what occurs to me is that I don't just want to look at boilerplate generation. I also want to explore boilerplate degeneration, as it were - extraction of higher-level information from a given text based on recognition and abstraction of boilerplate. This is just phrase-based parsing writ large, using more complex lexical entities, but I think it would be - well. It's a lot of what I expect an exegesis to be, to be honest; an abstract "understanding" of syntactic forms.
So there is actually a boilerplate extractor on CPAN, Text::Identify::Boilerplate, which, given a set of files, will do a line-by-line diff and extract the boilerplate. That's pretty slick!
But first, I propose Text::Boiler, which will take some kind of declarative boilerplate description and build that. Then Text::Unboiler, which will undo boilerplate (perhaps with overrides for changes to the boilerplate itself) and return to you the original record used to create the final files.
Ah. Right. Boilerplate + information = syntax. Boilerplate contains named fields, probably also lists and so forth, and the record contains those fields (which can also have default values if the record omits them). But the record can also override anything in the boilerplate. If the boilerplate also has named sections to make that easier, then unboiling should be pretty flexible indeed!
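Neither module exists yet, obviously, but the boiling direction is just named-field substitution over a template, and unboiling is the same template turned into a capture regex. The [% field %] delimiters below are invented for illustration (they just happen to look like Template Toolkit):

    use strict;
    use warnings;

    # Hypothetical Text::Boiler / Text::Unboiler core: a boilerplate template
    # with [% field %] slots, a record of values, and the reverse operation
    # that recovers the record from a finished file.
    sub boil {
        my ($template, $record) = @_;
        (my $out = $template) =~ s{\[%\s*(\w+)\s*%\]}{ $record->{$1} // '' }ge;
        return $out;
    }

    sub unboil {
        my ($template, $text) = @_;
        my @fields;
        my $re = join '', map {
            /^\[%\s*(\w+)\s*%\]$/ ? do { push @fields, $1; '(.*?)' } : quotemeta $_
        } split /(\[%\s*\w+\s*%\])/, $template;
        my @values = $text =~ /^$re$/s or return;
        my %record;
        @record{@fields} = @values;
        return \%record;
    }

    my $template = "package [% module %];\n# Author: [% author %]\n1;\n";
    my $built    = boil($template, { module => 'My::Thing', author => 'Me' });
    my $record   = unboil($template, $built);
    print "$record->{module}\n";    # My::Thing

Overrides and named sections would sit on top of this: a section is just a field whose default value is itself a (nested) boilerplate.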
I think this is going to be a pretty profitable way of looking at things, especially in terms of exegesis, which is kind of a "manual unboiling".
Afterthought on Markdown: Hoedown
The standard Apache module for Markdown is now Hoedown, which supports lots of extensions, is written in no-dependency C, and is essentially bulletproof. It also separates the parser from the renderer, which is important if you want to index various text pieces. There is a CPAN Text::Markdown::Hoedown module which compiles and passes tests on my box; we'll see how easy it is to use (not much documentation, but that can be fixed with a pull request...)
In poking around and reading the documentation for Markdent, I found a couple of interesting proposals for Markdown extensions from Dave Wheeler: definition lists and better tables with multiline content.
So what I really want is a Hoedown-based parser generator that can add extensions at will. (Maybe later...) Because honestly, what this is all about is different ways to use simple 2-dimensional, as opposed to 1-dimensional, arrangements of punctuation to delimit different items in text in as general a way as possible, build data structures based on that text, then render presentations based on those data structures. Each level of that process is interesting in and of itself.
Thoughts on Cookie Clicker
So back in December, BoingBoing posted a list of a few browser games. I bookmarked it because my son is interested in the gaming industry (like all teenagers, I suppose, and to a certain extent like myself). Then a couple of days ago we took a look at Cookie Clicker.
It's fun. It's essentially an investment game: you click on a cookie to bake cookies. You use cookies to buy various implements that can bake cookies automatically. You use cookies to buy upgrades, which can adjust all kinds of variables in the game. Achievements translate into additional upgrades, and so on. There are a lot of fiddly details, it's all written with a great sense of humor, and the jargon makes it even more fun ("I got a click frenzy last night that put me over the top to buy an antimatter condenser and that got my cookies-per-second up above ten million." And so on, with the Grandmapocalypse and the larval stages of Santa Claus and reindeer giving you three billion cookies and kitten engineers that work for milk, etc.)
It's really fun. And there is a community of people who play pretty seriously. It's written in JavaScript, so its guts are wide open for the enterprising hacker. (There are even specific achievement badges if the game detects you cheating!) There's a community Wiki that explains some of the math, explains the back story, and suggests strategy.
And there's an add-on, Cookie Monster, that lets you work at a higher level. It shows you costs in terms of seconds until you'll have that many cookies, it works out BCI and ROI on various investments, and so on. (The Github development branch is here; it looks like it might be based on Node, so it's pretty interesting in its own right.)
Which got me to thinking.
This is a perfect context in which to explore strategy. In other words, we can define strategies in terms of "if this condition holds, take this action" or "if this condition holds, prefer this bias" - and run them multiple times to compare them. We could look at specific events, draw graphs, I dunno - interesting things! But in general, this kind of approach could probably map well onto real-world business decisions (and of course the theory involved is already out there).
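As a sketch of what I mean - completely made-up numbers, nothing to do with Cookie Clicker's actual balance - strategies are just ordered condition/action rules, run against a dumb simulation and compared:

    use strict;
    use warnings;

    # Toy idle-game simulation for comparing strategies expressed as ordered
    # condition/action rules. Prices and rates are invented.
    my %buildings = (
        cursor  => { cost => 15,   rate => 0.1 },
        grandma => { cost => 100,  rate => 1.0 },
        factory => { cost => 3000, rate => 20  },
    );

    sub simulate {
        my ($strategy, $ticks) = @_;
        my %state = ( cookies => 0, cps => 0, owned => {} );
        for (1 .. $ticks) {
            $state{cookies} += 1 + $state{cps};   # one click per tick plus passive income
            for my $rule (@$strategy) {
                next unless $rule->{if}->(\%state);
                my $b = $buildings{ $rule->{buy} };
                next if $state{cookies} < $b->{cost};
                $state{cookies} -= $b->{cost};
                $state{cps}     += $b->{rate};
                $state{owned}{ $rule->{buy} }++;
                last;                             # at most one purchase per tick
            }
        }
        return \%state;
    }

    my %strategies = (
        cheap_first => [
            { if => sub { 1 }, buy => 'cursor' },
        ],
        save_for_factory => [
            { if => sub { $_[0]{cookies} >= 3000 }, buy => 'factory' },
            { if => sub { $_[0]{cps} < 5 },         buy => 'grandma' },
        ],
    );

    for my $name (sort keys %strategies) {
        my $end = simulate($strategies{$name}, 5000);
        printf "%-18s cookies=%.0f cps=%.1f\n", $name, $end->{cookies}, $end->{cps};
    }

The real thing would hook the condition and action callbacks up to the live game state in the browser instead of a simulation, but the comparison harness stays the same.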
I started down this path once before, back when Tower Generator games in Flash roamed the earth, and wrote some interesting DSLs in Python to fire events on a schedule - but I couldn't easily work with strategy at an adaptive level because I couldn't read the numbers (open-source OCR is still a no-man's land, actually). It would be really fulfilling to do it right now - in a JavaScript game, after all, nothing needs to be OCR'd because it's already all available in the processor state.
Then, too, analyzing the design of CC while also trying to determine what players enjoy would be instructive. Split testing of games? Sometimes people want to play the same game again so they can improve their intuitive strategy, so split testing can't be universal. But you'd be bound to learn some interesting things, probably involving categories of player that could each be addressed (or you could simply target your games - or the game play during the game - to the player type).
(You could see that as the strategy of the game writer...)
The notion of an open-ended strategy game of this sort is also attractive. All this is already based on upgrades and rule-changing things - why not make it truly open-ended with a little programming language and let people come up with them in a network or something? Just a thought.
It would also be fun to explore the idea of games within games, as CC is already exploring (he has a beta dungeon game you can access through the factories). A whole Internet of games or something, I dunno. A metagame, perhaps?
A "game toolkit" is a fun idea, as always (and see another BoingBoing feature, Playfic, a toolkit/community/hosting ... thing for Zorkoidal text adventure games). (See also the Tower Defense generator I ran into last year.)
Friday, January 10, 2014
Multimarkdown
Multimarkdown is sort of next-generation Markdown; the main project has transitioned out of Perl into C for performance and now addresses all kinds of output.
The output I'm interested in is just the parsed structure - the DOM, if you will. I'm torn between recycling their parsing code (which doesn't actually parse everything I want, but works and works well) and writing my own parser (which usually leads to everything in flames and me losing my hat).
There is, of course, a Text::MultiMarkdown - but it doesn't quite parse correctly and would have to be subclassed to add additional features. And there's Dave Rolsky's Markdent, which does actually provide an event-based Markdown parser but comes with an entire ecosystem of modules and doesn't appear all that easy to extend (unless you're quite familiar with the Moose paradigm).
But overall, you just can't get away from Markdown these days when writing content. So I just keep coming back to it.
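For the record, the baseline usage is trivial; the catch, per the above, is that it hands you rendered HTML rather than the parsed structure:

    use strict;
    use warnings;
    use Text::MultiMarkdown qw(markdown);

    # Straight Markdown-to-HTML. Getting at a DOM-like parse tree, or adding
    # syntax, is where the subclassing (or an event-based parser) comes in.
    my $text = "# A heading\n\nSome *emphasized* text and a [link](http://example.com/).\n";
    print markdown($text);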
Mojolicious!
Here is some Mojolicious stuff:
- Its developer's blog.
- Another blog featuring lots of Mojo.
- Mojolicious + TwiML = Twilio serving stuff!
Wednesday, January 8, 2014
Command-line in JavaScript
So I have a convenient little library-slash-utility kit (SQP - and I can't remember what it's supposed to stand for) that I use for private tools: my invoicing, my notes, household finances, that kind of thing. It consists of a wrapper for Perl's Term::Shell, includes convenience functions for arbitrary SQL (which it knows how to format nicely), and provides a useful framework for quick-and-dirty list and modification commands.
But it's still pretty clunky, and for a while now I've toyed with the idea of putting the CLI into JavaScript and fronting the whole thing from a local HTTP server, like Mojolicious.
Yesterday, I did just that, in about four hours. Turns out Mojolicious is very easy to wrap my head around, and it's been long past time to start hacking JavaScript, so ... all is well.
I hit a number of fascinating things while poking around looking for stuff:
- Here's how easy it is to slap together a bare-bones CLI in jQuery. Look at the demo! It's freaking cool! And that's probably the best framework for my son's planned text-based adventure game.
- Naturally I also found things about JS from the command line, like this GlueScript: a Wx/JavaScript monstrosity that has me thinking hard (it looks pretty groovy; hack up your UI in a bundlable thing and attach an embedded Perl for CPAN-y things and man, you'd be cooking with gas!)
- Another jQuery terminal - lots of features.
- I ended up going with Termlib, and not only because its author is in Vienna while I sit in Budapest; it also has no jQuery dependency and it offers AJAX command handling out of the box without my thinking hard about it. So it was a natural choice.
- Somewhere along the line I ran across PPI, a Perl DOM parser ... um, the Perl DOM parser actually.
Anyway, turns out it was dead easy to capture stdout and route it back to the browser. Where I'd really like to go with this is something a little more elaborate, though - more "textured" objects and larger text snippets returned from scripts should be placed into a return, and there should be a log.
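The core of that round trip is tiny. A stripped-down sketch of the idea - not my actual SQP code; the /cmd route and run_command() are placeholders - looks about like this:

    use Mojolicious::Lite;
    use Capture::Tiny qw(capture_stdout);

    # The JS terminal POSTs the command line here via AJAX; we run it, capture
    # whatever it printed, and send that back as JSON for the terminal to show.
    # run_command() stands in for the real command dispatcher.
    sub run_command {
        my ($line) = @_;
        print "you said: $line\n";
    }

    post '/cmd' => sub {
        my $c    = shift;
        my $line = $c->param('line') // '';
        my $out  = capture_stdout { run_command($line) };
        $c->render(json => { output => $out });
    };

    app->start;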
What really kicked all this off is IPython, because Peter Norvig used it to analyze XKCD's recent regexp meta-golf (now you have infinite problems, hee) - for the exact same reason I wanted to: to play actual regexp meta-golf. IPython lets you work interactively with data structures and run code against them, but in a way I personally don't like. I want little editors to pop up, and to still keep that history, and to be able to switch to a notebook editor while I'm still working. All that jazz. And if I do that in Wx (say) it will take me another twenty years - but in the browser I can do it in a couple of weeks.
So I'm gonna.
SQP is going to be my quasi-REPL, except I hate REPLs. I want a rich REPL, so that's what I'm going to write. But of course it will also default back down to the command line if you're not in browser mode. (That actually also exists, in e.g. Perl::Shell and Shell::Perl and surely many others.) Well, we'll see what I actually end up doing. The basic idea is the same, though. I want an interactive environment kind of like Mathematica, where I can see things in a nicely formatted way, put things into files, build files using templates and literate programming techniques, make schematics and diagrams with live links to items - who knows? All that and more.
Thursday, January 2, 2014
Sleuthing back doors in routers
This stuff is so cool. Guy in Holland finds a Linksys back door (blow-by-blow courtesy HNN) and figures marvy stuff out using magic. Here's another one for another situation. I freaking love this stuff.
A retrospective on Bazaar
This kind of history is pretty fascinating. I never really got into Bazaar - I was one of the ones that hit git first when people started to phase out CVS. But it was Github that really made the difference to me.