Monday, March 19, 2012
Devel::Declare
Sunday, March 18, 2012
Porter stemming algorithm
NLP class assignment 1
Saturday, March 17, 2012
A little Word scripting
Friday, March 16, 2012
Caltech ML
Plucene
NLP class
Monday, March 12, 2012
Neat MakeMaker feature
Saturday, March 10, 2012
Task: write a new Perl interface to ImageMagick
Friday, March 9, 2012
CPAN is big
Thursday, March 8, 2012
Archive::Tar
OK, so CPAN HTTP client survey first
Tuesday, March 6, 2012
First part of my CPAN Web API client survey article
A Survey of Web API client code on CPAN
Why a survey? And how do you start?
For the past few years, I've organized most of my thinking on Blogger – I first got into it while keeping various friends posted on my efforts with house renovation, and it just kind of stuck. Now I tend to start a new blog for every project I undertake. At some point (actually, on December 17, 2011) I had the bright idea that I should be able to do my task management right in Blogger as well, perhaps by the simple expedient of typing a title like "Task: do XXX" right into a blog post.
Earlier that day, I had realized that Blogger has an API, and suddenly, it was obvious how to proceed with this plan. I needed to write a Web API client to build my task indexer.
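The "Task: do XXX" title convention is simple enough to sketch. The snippet below is only an illustration, written in Python for brevity rather than Perl; it assumes the post titles have already been fetched somehow (no Blogger API call is shown), and the `extract_tasks` helper and the sample list are made up for the example.

```python
import re

# Sample post titles (an assumption for illustration); in a real indexer
# these would come from the blog's feed or API.
titles = [
    "CPAN is big",
    "Task: write a new Perl interface to ImageMagick",
    "Archive::Tar",
]

# A title counts as a task when it starts with "Task:" followed by some text.
TASK_RE = re.compile(r"^Task:\s*(.+)$")

def extract_tasks(post_titles):
    """Return the task text for every title of the form 'Task: ...'."""
    tasks = []
    for title in post_titles:
        match = TASK_RE.match(title.strip())
        if match:
            tasks.append(match.group(1))
    return tasks

print(extract_tasks(titles))
```

Everything else about the indexer (authentication, paging through posts, tracking completion) is left out here.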
But like nearly everything I do, I was beset by the sudden fear that I might do it wrong. Maybe I'd be making assumptions I'd regret. Maybe other people were doing it better. (Note to self: this is why you never get anything done.)
I've got very little time to work on side projects – two teenagers, a full-time freelance translation business, and the aforementioned house renovation project make sure of that – so essentially everything technical is on the back burner, and so this one stayed as well, while I chewed on my fear. Occasionally in an off-moment I'd hit CPAN and look for modules that implemented other API clients, and I'd wonder what sorts of functionality might be nice in a more general Web API client support module. Finally, I just started scanning down the list of modules a search returned for "RESTful API", with the vague idea of doing a more or less comprehensive survey. Then I saw the WebService namespace and realized it contains over thirteen hundred modules. Good God. Not something I could actually survey in any meaningful way.
Clearly I needed to search CPAN in a more specifically useful manner. And just as clearly, I needed to do that locally. Which led me to CPAN::Mini. Randal Schwartz wrote the original script in 2002, when a colleague asked him for a CD with CPAN burned on it and he realized that "the CPAN" (when did we drop the "the"? Or is it just me?) was far too large, but a "mini-CPAN" with just the latest version of each module would be 200 MB and easily fit on a CD.
As of this writing, of course, even a mini-CPAN won't fit on a CD, being 1.84 GB in over 30,000 files. But I downloaded it anyway. I have a CPAN.
What I'm going to do first is just to find all the dependencies on LWP, WWW::Curl, Net::Curl, HTTP::Client, HTTP::Client::Parallel, HTTP::Tiny, and HTTP::Lite. If I run across any other basic HTTP clients, I'll include them in the seed list as well.
No, wait, I guess what I'm going to do first is to try to come up with a more or less complete list of HTTP clients on CPAN, while whistling past the infinite-regress graveyard. (Note: this is a TODO in the article.)
Anyway, the modules we find that way will break down into three categories: (1) modules that implement an API client, (2) support modules that provide an API client framework, and (3) modules that just fetch content over HTTP for other purposes, which we'll ignore. Then I'll repeat the step for the modules found in (2) to find indirect dependencies. Obviously, the tool I want is something that can take an input module name and return a list of all modules that depend on it, so I'll do that in the next section.
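As a rough illustration of that tool, here is a minimal sketch of a reverse-dependency lookup, in Python rather than Perl. It assumes the forward dependencies have already been extracted into a dictionary; `WebService::Foo`, `WebService::Bar`, `REST::Helper`, and `Text::Thing` are invented names, and the function names are hypothetical. Unlike the plan above, it follows every dependent rather than only the category (2) support modules, so the sorting into categories would still have to happen by hand.

```python
from collections import defaultdict, deque

# Hypothetical forward dependency data: module -> modules it requires.
# In practice this would be pulled from each distribution's metadata
# in the local mini-CPAN.
requires = {
    "WebService::Foo": ["LWP::UserAgent", "JSON"],
    "WebService::Bar": ["REST::Helper"],
    "REST::Helper": ["HTTP::Tiny"],
    "Text::Thing": ["JSON"],
}

def build_reverse_index(requires):
    """Invert the mapping: module -> set of modules that require it."""
    reverse = defaultdict(set)
    for module, deps in requires.items():
        for dep in deps:
            reverse[dep].add(module)
    return reverse

def dependents(reverse, seeds):
    """All modules that depend, directly or indirectly, on any seed."""
    found = set()
    queue = deque(seeds)
    while queue:
        module = queue.popleft()
        for user in reverse.get(module, ()):
            if user not in found:
                found.add(user)
                queue.append(user)
    return found

reverse = build_reverse_index(requires)
print(sorted(reverse["HTTP::Tiny"]))
print(sorted(dependents(reverse, ["LWP::UserAgent", "HTTP::Tiny"])))
```

The breadth-first walk is what picks up indirect dependencies: `WebService::Bar` never mentions an HTTP client itself, but it is found through `REST::Helper`.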
It might be instructive to get a list of all the URLs used in these APIs. But my ultimate goal here is to see how people are doing things, and see how many of these implementations might be useful in coming up with best Perl practices for writing a Web API client.
Monday, March 5, 2012
New project: Toonchecker.com
- Perl walker to scan a list of Web comic sites for each user. (Obviously the sites are shared.) This spider checks for updates on, say, an hourly basis. If the site has a feed, I'll use that. If the site pushes an email notification, I'll use that. One way or another, though, I'll figure out what changes and when.
- For each list of toons, then, we can present a list of updates since the user last checked in and read. That list will show ads, but only that list will show ads. My ads will never appear on the screen at the same time as any comic. That's pretty thin monetization, but it will have to do.
- The reader consists of a very thin frame at the top with forward and back buttons and a title. No ads on the frame. No ads on the frame. No ads on the frame. The bottom frame is then the entire target URL, with the cartoonist's own ads.
- A comic counts as read when you've gone to the next page (in case you get called away, lose your connection, whatever). So we have a bookmark for each and every comic we read.
- With multiple users, we'll be able to start forming a similarity metric for recommendations.
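One possible starting point for the similarity metric in the last item is Jaccard similarity over the sets of comics each user reads. This is just one candidate, not a settled design, and the sketch is in Python rather than Perl; the user names, comic identifiers, and function names are all made up.

```python
def jaccard(a, b):
    """Size of the overlap divided by size of the union (0.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical reading lists: user -> set of comics they follow.
reading_lists = {
    "alice": {"comic-a", "comic-b", "comic-c"},
    "bob": {"comic-b", "comic-c", "comic-d"},
    "carol": {"comic-e"},
}

def recommend(user, reading_lists):
    """Suggest comics read by the most similar other user but not by `user`."""
    mine = reading_lists[user]
    best = max(
        (other for other in reading_lists if other != user),
        key=lambda other: jaccard(mine, reading_lists[other]),
    )
    return sorted(reading_lists[best] - mine)

print(jaccard(reading_lists["alice"], reading_lists["bob"]))
print(recommend("alice", reading_lists))
```

With only a handful of users this looks at a single nearest neighbor; a real version would probably weigh several similar users and take the per-comic bookmarks into account.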
WebService:: namespace
API modules
- http://search.cpan.org/~mpgutta/WebService-Soundcloud/
- http://search.cpan.org/~cvicente/Netdot-Client-REST-1.02/lib/Netdot/Client/REST.pm
- http://search.cpan.org/~sschneid/REST-Google-Apps-Provisioning-1.1.9/lib/REST/Google/Apps/Provisioning.pod
- http://search.cpan.org/~sschneid/REST-Google-Apps-EmailSettings-1.1.6/lib/REST/Google/Apps/EmailSettings.pod
- http://search.cpan.org/~tokuhirom/Cache-KyotoTycoon-REST-0.03/lib/Cache/KyotoTycoon/REST.pm
- http://search.cpan.org/~drtech/ElasticSearch-0.51/lib/ElasticSearch.pm
- http://search.cpan.org/~imalpass/WebService-Etsy-0.7/lib/WebService/Etsy.pm
- http://search.cpan.org/~manwar/Filter-DisposableEmail-0.02/lib/Filter/DisposableEmail.pm
- http://search.cpan.org/~bklaas/Blitz-0.01/lib/Blitz/API.pm
- http://search.cpan.org/~cvega/WWW-MediaTemple-0.02/lib/WWW/MediaTemple.pm
- http://search.cpan.org/~cvega/WWW-RottenTomatoes-0.03/lib/WWW/RottenTomatoes.pm
- http://search.cpan.org/~manwar/IP-Info-0.05/lib/IP/Info.pm
- http://search.cpan.org/~manwar/WWW-MovieReviews-NYT-0.04/lib/WWW/MovieReviews/NYT.pm
- http://search.cpan.org/~gbudd/IPsonar-0.23/lib/IPsonar.pm
- http://search.cpan.org/~mndrix/RDF-Sesame-0.17/lib/RDF/Sesame.pm
- http://search.cpan.org/~bricas/SRU-0.99/lib/SRU.pm
- http://search.cpan.org/~bkaney/Bio-Cellucidate-0.03/lib/Bio/Cellucidate.pm
- http://search.cpan.org/~doggy/Net-UpYun-0.001/lib/Net/UpYun.pm
- http://search.cpan.org/~lyokato/Net-OpenSocial-Client-0.01_05/lib/Net/OpenSocial/Client.pm
- http://search.cpan.org/~jwied/BZ-Client-1.04/lib/BZ/Client.pm (very complex)
- http://search.cpan.org/~shiriru/WebService-GData-0.0501/lib/WebService/GData/YouTube/Doc/GeneralOverview.pod
- http://search.cpan.org/~nheinric/WebService-MyGengo-0.012/lib/WebService/MyGengo/Client.pm
- http://search.cpan.org/~symkat/WebService-CloudFlare-Host-000100/lib/WebService/CloudFlare/Host.pm
- http://search.cpan.org/~rplatel/Net-OpenSRS-OMA-0.02/lib/Net/OpenSRS/OMA.pm
- http://search.cpan.org/~lkundrak/WWW-GoodData-1.6/lib/WWW/GoodData.pm
- http://search.cpan.org/~oalders/Net-FreshBooks-API-0.23/lib/Net/FreshBooks/API/Client.pm
- http://search.cpan.org/~franckc/Net-Backtype-0.03/lib/Net/Backtype.pm
- http://search.cpan.org/~cjm/WebService-NFSN-1.02/lib/WebService/NFSN.pm
- http://search.cpan.org/~pjobson/WWW-TheMovieDB-Search-0.03/lib/WWW/TheMovieDB/Search.pm
- http://search.cpan.org/~mramberg/WebService-PutIo-0.3/lib/WebService/PutIo.pm
- http://search.cpan.org/~lukec/Net-Stripe-0.06/lib/Net/Stripe.pm
- http://search.cpan.org/~miyagawa/XML-Atom-0.41/lib/XML/Atom/Client.pm
- http://search.cpan.org/~franckc/Net-Backtype-0.03/lib/Net/Backtweet.pm
- http://search.cpan.org/~symkat/WebService-VaultPress-Partner-0.05/lib/WebService/VaultPress/Partner.pm
- http://search.cpan.org/~dpmeyer/WWW-Instapaper-Client-0.901/lib/WWW/Instapaper/Client.pm
API support modules
More on the CPAN API survey
- Individual specific APIs and
- API support modules.
- Find (what I believe to be) a complete list of all web API modules on CPAN, with authors and place in the nomenclature. List any support modules they use.
- Find any support modules that look like likely candidates but aren't in use by existing API modules on CPAN.
- Provide an initial statistical analysis of some sort.
- Compare code and techniques between all these modules.
- Derive a descriptive language for the client side of an API and a mapping between this language and the modules in existence. Or something. Mostly I just want to do the comparison.
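For the initial statistical analysis, even a simple count of modules per top-level namespace would say something about where API clients end up on CPAN. The Python sketch below works from a few of the search.cpan.org URLs collected above; it assumes the URL path always has the form `/~author/Dist-Name-version/...`, and `top_level_namespace` is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlparse

# A handful of URLs from the API module list above.
urls = [
    "http://search.cpan.org/~imalpass/WebService-Etsy-0.7/lib/WebService/Etsy.pm",
    "http://search.cpan.org/~cjm/WebService-NFSN-1.02/lib/WebService/NFSN.pm",
    "http://search.cpan.org/~lukec/Net-Stripe-0.06/lib/Net/Stripe.pm",
    "http://search.cpan.org/~cvega/WWW-RottenTomatoes-0.03/lib/WWW/RottenTomatoes.pm",
    "http://search.cpan.org/~cvega/WWW-MediaTemple-0.02/lib/WWW/MediaTemple.pm",
]

def top_level_namespace(url):
    """Return the first component of the distribution name in the URL."""
    parts = urlparse(url).path.strip("/").split("/")
    # parts[0] is "~author"; parts[1] is "Dist-Name-version".
    return parts[1].split("-")[0]

counts = Counter(top_level_namespace(u) for u in urls)
for namespace, n in counts.most_common():
    print(namespace, n)
```

The same counter could be keyed on the author segment instead, to see who writes more than one API client.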