Monday, October 31, 2011
Less talking, more doing
Sunday, October 30, 2011
IPEDS
Saturday, October 29, 2011
Unbounce: component and target
Fast test for startup ideas
Shakespeare, the programming language
Thursday, October 27, 2011
GUI vs. CLI
The not-so-secret capitalist cabal that owns us all
Wednesday, October 26, 2011
Language: Elephant
Math: sympy
- Handwriting recognition on a tablet PC to be translated into OpenMath and thence TeX.
- Selection of portions of a large mathematical formula and specification of specific operations to be carried out (e.g. "solve for this" or "call this theta" or what have you, said operations to be discovered by observation of my private theoretical physicist)
- Maintenance of a log of the trajectory through formula space
- n-fold productivity increases for theoretical physicists
- Public perception of my private theoretical physicist as highly productive physics genius
- Live on p.t.p.'s CERN salary while enjoying Geneva
Tuesday, October 25, 2011
Nice interactive graphic
Analysis of Steve Jobs tribute messages
JavaScript roundup
- So you want to write JavaScript for a living. Interesting list of some of the things one should know about JS.
- Badass JavaScript, a blog.
Tangle: a JS library for reactive documents
Monday, October 24, 2011
Some more open source projects
- Qt has officially been spun off by Nokia. Along the same lines would of course be Tk and Wx, and I suppose native Win32 by direct DLL access. All these share a lot of concepts that should be organized in parallel, and ultimately a feature in one should always migrate into the others so we're all working with the same set of concepts. They do eventually anyway, so it's kind of an obvious step to formalize that path.
- MediaWiki is, of course, in PHP, and always has bugs outstanding. Hone the semantic understanding tools on that. Same goes for Drupal and WordPress, of course.
- Which brings us to open-access science. This guy, a chemist at Cambridge, appears to be doing some actual data mining of open-access journals. I need to look a little closer at that. And remember: closed source kills.
- And then there's WikiData.
Sunday, October 23, 2011
Decl striving mightily to hit CPAN
Thursday, October 20, 2011
Decl doesn't actually hit CPAN
Decl hits CPAN
Google AI challenge
Graphics by Kevin Karsch
Tuesday, October 18, 2011
Oh, what a tangled web we weave
Monday, October 17, 2011
Sunday, October 16, 2011
NLP
- NLTK has a book. It might be a reasonable place to start, just working through that. And there are online courses available.
- I actually got a lot of useful information from Wikipedia, starting with UIMA, the Unstructured Information Management Architecture.
- GATE comes up a lot. It's Java-based.
- Apache OpenNLP is out there. Java.
- Book: Handbook of Natural Language Processing
- Oh, and Amazon recommendations come up with Syntax-Based Collocation Extraction
- Looking for the individual chapters of HNLP seems fruitful: Bing Liu has a whole page on opinion mining and sentiment analysis and even links to a PDF of his chapter of the book (I wonder if the entire book couldn't be reassembled in that manner)
- Liu has his own book on Web data mining.
Hyde
A possible approach
Windows PE format in painstaking detail
Saturday, October 15, 2011
Stanford's NLP class
Data journalism
- Be mercenary: do what works. But do it.
- Shave yaks as needed: take the time to learn details when you need them.
- Develop sources
- Become the resident expert
- Be the data project you want to see on the Web
Friday, October 14, 2011
Target application: web automation
Description of Djuggler Enterprise
Data Juggler automates repetitive Web & data tasks without programming code. Use it to create sophisticated scripts for collecting data from the Web, filling Web forms, transforming text files, XML, CSV and database data. The easy-to-use drag-and-drop interface creates scripts that can be deployed as stand-alone Windows executables. Typical application examples:
- Extract competitor's price list from Web pages regularly.
- Extract people data from Web pages.
- Download Web images on a regular basis.
- Get search results from multiple search engines.
- Automated Web testing and load testing.
- Export data to Web-based applications by filling Web forms.
- Automate web based workflow processes like timesheets.
- Search & replace actions to clean data.
- Transform data from one format to another.
- Convert data from legacy applications to industry standards.
- Automate database migration with Business Intelligence.
- Compare data and create reports.
- Send emails with personalized attachments.
- Server monitoring and reporting.
- Synchronize folders, databases, etc.
- Automate file management & data backup.
Automate IT operations by deploying stand-alone Djuggler scripts. The powerful script designer has many actions and functions like loops, 'if then else' conditions, get text between from html, get html table, get pictures, strip HTML, web macros, read and save Excel, support for popular databases and many more. Demos are included in the setup. Visit www.djuggler.com for the script repository and script service. A Djuggler Personal edition is available as freeware.
Keywords: Web data collection, Application Integration, Data Aggregation, Data Transformation, Report Generation, Batch Processing, Business Intelligence, System Monitoring, Form Filling, Web Scripting, Data Extraction, Web Testing.
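For a sense of what the "get html table" and "strip HTML" actions amount to in code, here is a minimal sketch using only Python's standard `html.parser`. It has nothing to do with how Djuggler itself is implemented; `TableExtractor` and `extract_table` are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        # Markup inside a cell (<b>, <a>, ...) never reaches this method,
        # so keeping only the data is what strips the HTML.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Fed a competitor's price page, `extract_table` would hand back rows like `["Widget", "9.99"]`, ready to be written out as CSV. Nested tables are not handled.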
Postmark spam filter has an API - Despammed should, too
- SpamAssassin
- Procmail
- Green and redlighting of known-good, known-bad actors on a per-account basis
- CRM114
- Bayesian training
- Tracking of spamvertised URLs
- Both forwarding and Webmail access
- Arbitrary forwarding (including taking Web API action or Twilio phone action) based on rules, including rules that can be expressed in arbitrary JavaScript
- Spam discussion with specific examples and other community action
- Blogging about spam topics, including botnet identification and such
- Uniform treatment of both email and Web spam
- and yeah, an API...
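The "Bayesian training" item in the list above can be sketched in a few lines. This is a plain naive Bayes classifier with add-one smoothing, written only to show the idea; it is not the algorithm used by SpamAssassin or CRM114, and the class name `BayesFilter` and its methods are hypothetical.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Tiny naive Bayes spam scorer trained on labeled messages."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one message as 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) from the training seen so far."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        words = self.tokenize(text)
        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior, so an untrained filter starts at 50/50.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            score = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability without underflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

Per-account training would just mean keeping one `BayesFilter` per account, which also fits the per-account green- and redlighting above.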
CSS tricks
Tuesday, October 11, 2011
CRM114
Sunday, October 9, 2011
CmdrTaco: not dead - scaling
Puppet vs. Chef
Saturday, October 8, 2011
Concurrent Constraint Programming in Oz for Natural Language Processing
XSB Prolog
HNN: what data structure does the brain use?
TermL: another specification for expressing symbolic trees
OMeta: pattern-matching language
One-liner music
Linear regression and linear algebra
- Linear regression in financial analysis [investopedia] - this is magic to a lot of people.
- Numerical linear algebra software is nearly universally built on BLAS: the Basic Linear Algebra Subprograms, whose reference implementation is written in Fortran.
- Here's a textbook on elementary linear algebra.
- ATLAS (Automatically Tuned Linear Algebra Software) is an automatically tuned implementation of BLAS, plus a handful of LAPACK routines.
- And in general this all leads into numerical linear algebra.
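To show where the regression and the linear algebra meet, here is a minimal pure-Python sketch of fitting a line by least squares. It builds the 2x2 normal equations and solves them by Cramer's rule. The function name `fit_line` is made up for illustration; a real analysis would hand the same system to a BLAS/LAPACK-backed library rather than solve it by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Solves the normal equations
        [ n       sum(x)   ] [intercept]   [ sum(y)   ]
        [ sum(x)  sum(x^2) ] [slope    ] = [ sum(x*y) ]
    by Cramer's rule.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    intercept = (sum_y * sum_xx - sum_x * sum_xy) / det
    slope = (n * sum_xy - sum_x * sum_y) / det
    return slope, intercept
```

For points lying exactly on y = 1 + 2x, such as x = 0, 1, 2, 3, this recovers a slope of 2 and an intercept of 1.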
Monday, October 3, 2011
E-discovery
Saturday, October 1, 2011
OpenMath
- XML, binary, and Declarative versions of representation
- LaTeX output
- Octave output and manipulation and parsing back in
- Some kind of overarching systems description a la "semantic Excel"
- Some kind of graphical presentation as active areas a la Equation Editor (but better)
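As a rough sketch of the XML-in, LaTeX-out path, the following reads a small OpenMath object and prints it as LaTeX. It covers only `plus`, `times`, and `divide` from the `arith1` content dictionary, and it assumes the XML carries no namespace declaration (real OpenMath documents usually declare one). `to_latex` is a hypothetical helper, not part of any OpenMath library.

```python
import xml.etree.ElementTree as ET

# x + 2*y as an OpenMath application tree (namespace omitted for brevity).
SAMPLE = """
<OMOBJ>
  <OMA>
    <OMS cd="arith1" name="plus"/>
    <OMV name="x"/>
    <OMA>
      <OMS cd="arith1" name="times"/>
      <OMI>2</OMI>
      <OMV name="y"/>
    </OMA>
  </OMA>
</OMOBJ>
"""


def _is_sum(node):
    """True if node is an application of arith1 plus."""
    return node.tag == "OMA" and len(node) > 0 and node[0].get("name") == "plus"


def to_latex(node):
    """Render a small subset of OpenMath XML as a LaTeX string."""
    if node.tag == "OMOBJ":
        return to_latex(node[0])
    if node.tag == "OMV":            # variable
        return node.get("name")
    if node.tag == "OMI":            # integer literal
        return node.text.strip()
    if node.tag == "OMA":            # application: head symbol, then arguments
        head, *args = list(node)
        if head.tag != "OMS" or head.get("cd") != "arith1":
            raise ValueError("only arith1 symbols are handled in this sketch")
        name = head.get("name")
        if name == "plus":
            return " + ".join(to_latex(a) for a in args)
        if name == "times":
            # Parenthesize sums so that (x + 1) * y keeps its meaning.
            return " ".join(
                "\\left(" + to_latex(a) + "\\right)" if _is_sum(a) else to_latex(a)
                for a in args
            )
        if name == "divide":
            return "\\frac{" + to_latex(args[0]) + "}{" + to_latex(args[1]) + "}"
        raise ValueError("unsupported symbol: " + name)
    raise ValueError("unsupported element: " + node.tag)


if __name__ == "__main__":
    print(to_latex(ET.fromstring(SAMPLE.strip())))
```

Because the object is a tree rather than a string, the same walk could just as well emit Octave syntax or a declarative form instead of LaTeX.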
Visual Modeling and Programming with Graph Transformations
Math
Gamification
Notificon
Spambot combat
- Timestamp: don't allow a long period between reading and posting. (I had mixed success with this way back when.)
- Hash: check the IP, timestamp, post # - prevents replay attacks.
- Randomized field names.
- Honeypot fields: invisible (not hidden) fields that, if filled in, are a spam indicator.
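The four checks above fit together roughly as follows. This is a minimal sketch using only the Python standard library. `SECRET_KEY`, `MAX_AGE_SECONDS`, and every function name are assumptions made up for illustration, and the one-hour limit is an arbitrary choice.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-site secret
MAX_AGE_SECONDS = 3600                # assumed limit between reading and posting


def _sign(ip, timestamp, post_id):
    """Hash of the IP, timestamp and post number, keyed with the site secret."""
    message = f"{ip}|{timestamp}|{post_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def field_name(label, token):
    """Randomized field name: different for every issued form."""
    return "f" + hashlib.sha256(f"{label}|{token}".encode()).hexdigest()[:12]


def issue_form(ip, post_id, now=None):
    """Values to embed when rendering the comment form."""
    timestamp = int(time.time() if now is None else now)
    token = _sign(ip, timestamp, post_id)
    return {
        "timestamp": timestamp,
        "token": token,
        "comment_field": field_name("comment", token),
        # Rendered invisible via CSS rather than as type="hidden".
        "honeypot_field": field_name("honeypot", token),
    }


def check_submission(ip, post_id, timestamp, token, fields, now=None):
    """Return a list of spam indicators; an empty list means the post passes."""
    now = time.time() if now is None else now
    reasons = []
    if now - timestamp > MAX_AGE_SECONDS:
        reasons.append("stale")
    if not hmac.compare_digest(_sign(ip, timestamp, post_id), token):
        reasons.append("bad hash")
    if fields.get(field_name("honeypot", token)):
        reasons.append("honeypot filled")
    if not fields.get(field_name("comment", token)):
        reasons.append("missing comment")
    return reasons
```

A bot that replays a captured form from another address or against another post fails the hash check, and one that fills in every field it sees trips the honeypot.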