Monday, October 31, 2011
Sunday, October 30, 2011
Saturday, October 29, 2011
Thursday, October 27, 2011
Wednesday, October 26, 2011
- Handwriting recognition on a tablet PC to be translated into OpenMath and thence TeX.
- Selection of portions of a large mathematical formula and specification of specific operations to be carried out (e.g. "solve for this" or "call this theta" or what have you, said operations to be discovered by observation of my private theoretical physicist)
- Maintenance of a log of the trajectory through formula space
- n-fold productivity increases for theoretical physicists
- Public perception of my private theoretical physicist as highly productive physics genius
- Live on p.t.p.'s CERN salary while enjoying Geneva
Tuesday, October 25, 2011
Monday, October 24, 2011
- Qt has officially been spun off by Nokia. Along the same lines would of course be Tk and Wx, and I suppose native W32 by direct DLL access. All these share a lot of concepts that should be organized in parallel, and ultimately a feature in one should always migrate into the others so we're all working with the same set of concepts. They do eventually anyway, so it's kind of an obvious step to formalize that path.
- MediaWiki is, of course, in PHP, and always has bugs outstanding. Hone the semantic understanding tools on that. Same goes for Drupal and WordPress, of course.
- Which brings us to open-access science. This guy, a chemist at Cambridge, appears to be doing some actual data mining of open-access journals. I need to look a little closer at that. And remember: closed source kills.
- And then there's WikiData.
Sunday, October 23, 2011
Thursday, October 20, 2011
Tuesday, October 18, 2011
Monday, October 17, 2011
Sunday, October 16, 2011
- NLTK has a book. It might be a reasonable place to start, just working through that. And there are online courses available.
- I actually got a lot of useful information from Wikipedia, starting with UIMA, the Unstructured Information Management Architecture.
- GATE comes up a lot. It's Java-based.
- Apache OpenNLP is out there. Java.
- Book: Handbook of Natural Language Processing
- Oh, and Amazon recommendations come up with Syntax-Based Collocation Extraction
- Looking for the individual chapters of HNLP seems fruitful: Bing Liu has a whole page on opinion mining and sentiment analysis and even links to a PDF of his chapter of the book (I wonder if the entire book couldn't be reassembled in that manner)
- Liu has his own book on Web data mining.
Saturday, October 15, 2011
- Be mercenary: do what works. But do it.
- Shave yaks as needed: take the time to learn details when you need them.
- Develop sources
- Become the resident expert
- Be the data project you want to see on the Web
Friday, October 14, 2011
Description of Djuggler Enterprise
Data Juggler automates repetitive Web & data tasks without programming code. Use it to create sophisticated scripts for collecting data from the Web, filling Web forms, transforming text files, XML, CSV and database data. The easy-to-use drag-and-drop interface creates scripts that can be deployed as stand-alone Windows executables. Typical application examples:
- Extract competitor's price list from Web pages regularly.
- Extract people data from Web pages.
- Download Web images on a regular basis.
- Get search results from multiple search engines.
- Automated Web testing and load testing.
- Export data to Web-based applications by filling in Web forms.
- Automate web based workflow processes like timesheets.
- Search & replace actions to clean data.
- Transform data from one format to another.
- Convert data from legacy applications to industry standards.
- Automate database migration with Business Intelligence.
- Compare data and create reports.
- Send emails with personalized attachments.
- Server monitoring and reporting.
- Synchronize folders, databases, etc.
- Automate file management & data backup.
Automate IT operations by deploying stand-alone Djuggler scripts. The powerful script designer has many actions and functions like loops, 'if then else' conditions, get text between from html, get html table, get pictures, strip HTML, web macros, read and save Excel, support for popular databases and many more. Demos are included in the setup. Visit www.djuggler.com for the script repository and script service. A Djuggler Personal edition is available as freeware.
Keywords: Web data collection, Application Integration, Data Aggregation, Data Transformation, Report Generation, Batch Processing, Business Intelligence, System Monitoring, Form Filling, Web Scripting, Data Extraction, Web Testing.
- Green and redlighting of known-good, known-bad actors on a per-account basis
- Bayesian training
- Tracking of spamvertised URLs
- Both forwarding and Webmail access
- Spam discussion with specific examples and other community action
- Blogging about spam topics, including botnet identification and such
- Uniform treatment of both email and Web spam
- and yeah, an API...
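As a rough illustration of the "Bayesian training" item above, here is a minimal naive Bayes sketch in Python. The class name `SpamFilter`, the whitespace tokenizer, the add-one smoothing, and the training messages are all assumptions made for this example, not a description of any existing service.

```python
import math
from collections import Counter


class SpamFilter:
    """Toy naive Bayes classifier (hypothetical name, illustration only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace splitting is good enough for a sketch.
        return text.lower().split()

    def train(self, text, label):
        """Record one message under the label 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Return P(spam | text) under the naive independence assumption."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior, smoothed over the two classes.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                # Add-one smoothing so unseen words do not zero the product.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Normalize the two log scores into a probability.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Made-up training data standing in for per-account user feedback.
spam_filter = SpamFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting notes for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")
```

Each user marking a message as spam or not would become one `train` call, which is what would make the filter per-account. A real filter would need a better tokenizer and far more data.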
Tuesday, October 11, 2011
Sunday, October 9, 2011
Saturday, October 8, 2011
- Linear regression in financial analysis [investopedia] - this is magic to a lot of people.
- Numerical linear algebra software is nearly universally built on BLAS, the Basic Linear Algebra Subprograms, whose reference implementation is written in Fortran.
- Here's a textbook on elementary linear algebra.
- ATLAS (Automatically Tuned Linear Algebra Software) is an automatically tuned implementation of the BLAS interface, plus a few LAPACK routines.
- And in general this all leads into numerical linear algebra.
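To take some of the magic out of the linear regression item above, here is a minimal ordinary least squares fit of a line y = intercept + slope * x in plain Python. The function name `fit_line` and the sample data are made up for illustration. Real code would normally call a BLAS/LAPACK-backed library rather than loop by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the same two sums give the best-fit line in the least squares sense. The general multi-variable case is where the linear algebra libraries above come in.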
Monday, October 3, 2011
Saturday, October 1, 2011
- XML, binary, and declarative versions of the representation
- LaTeX output
- Octave output and manipulation and parsing back in
- Some kind of overarching systems description a la "semantic Excel"
- Some kind of graphical presentation as active areas a la Equation Editor (but better)
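One way to picture the list above is a single expression tree with several output back ends. The sketch below is only an illustration under stated assumptions: the tuple-based tree, the four-operator set, and the function names `to_latex` and `to_octave` are invented for this example and do not correspond to OpenMath or any existing library. Parsing Octave output back in is not covered.

```python
# An expression is a number, a variable name (str), or a tuple (op, left, right).
# This representation is hypothetical and chosen only for brevity.

def to_latex(expr):
    """Render an expression tree as a LaTeX string."""
    if isinstance(expr, (int, float)):
        return str(expr)
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    a, b = to_latex(left), to_latex(right)
    if op == "+":
        return f"{a} + {b}"
    if op == "*":
        # Parenthesize sums inside a product to preserve precedence.
        if isinstance(left, tuple) and left[0] == "+":
            a = f"\\left({a}\\right)"
        if isinstance(right, tuple) and right[0] == "+":
            b = f"\\left({b}\\right)"
        return f"{a} \\cdot {b}"
    if op == "/":
        return f"\\frac{{{a}}}{{{b}}}"
    if op == "^":
        # Parenthesize any compound base before raising it to a power.
        if isinstance(left, tuple):
            a = f"\\left({a}\\right)"
        return f"{a}^{{{b}}}"
    raise ValueError(f"unknown operator: {op!r}")


def to_octave(expr):
    """Render the same tree as an Octave expression, fully parenthesized."""
    if isinstance(expr, (int, float)):
        return str(expr)
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    if op not in {"+", "*", "/", "^"}:
        raise ValueError(f"unknown operator: {op!r}")
    return f"({to_octave(left)} {op} {to_octave(right)})"


# Kinetic energy, (m * v^2) / 2, as a sample tree.
energy = ("/", ("*", "m", ("^", "v", 2)), 2)
```

Calling `to_latex(energy)` and `to_octave(energy)` on the same tree gives two renderings of one formula. An XML or binary serializer would be a third walker over the same structure.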
- Economic analyses of popular games
- Simulations of popular games
- Genetic algorithm to devise new ones. Hee.
- Timestamp: don't allow a long period between reading and posting. (I had mixed success with this way back when.)
- Hash: embed a hash of the IP, timestamp, and post # in the form and check it on submission; this prevents replay attacks.
- Randomized field names.
- Honeypot fields: invisible (not hidden) fields that, if filled in, are a spam indicator.
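The four checks above can be combined in one form token. The following Python sketch assumes a server-side secret, a 30-minute posting window, and hypothetical names (`SECRET_KEY`, `make_token`, `field_name`, `render_form_fields`, `is_probably_spam`); none of these come from a specific framework.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: kept server-side only
MAX_AGE_SECONDS = 30 * 60                   # assumption: 30-minute posting window


def make_token(ip, timestamp, post_id):
    """Keyed hash binding the form to the IP, render time, and post number."""
    message = f"{ip}|{timestamp}|{post_id}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def field_name(token, logical_name):
    """Randomized-looking field name derived from the per-form token."""
    digest = hashlib.sha256(f"{token}|{logical_name}".encode()).hexdigest()
    return "f" + digest[:12]


def render_form_fields(ip, post_id, now=None):
    """Values the server would embed when rendering the form."""
    timestamp = int(time.time() if now is None else now)
    token = make_token(ip, timestamp, post_id)
    return {"ts": str(timestamp), "token": token,
            "comment": field_name(token, "comment"),
            "honeypot": field_name(token, "honeypot")}


def is_probably_spam(form, ip, post_id, now=None):
    """Apply the timestamp, hash, field-name, and honeypot checks."""
    now = time.time() if now is None else now
    try:
        timestamp = int(form["ts"])
    except (KeyError, ValueError):
        return True
    # Timestamp: reject forms posted too long after (or before) rendering.
    if not 0 <= now - timestamp <= MAX_AGE_SECONDS:
        return True
    # Hash: the token must match this IP, timestamp, and post number.
    expected = make_token(ip, timestamp, post_id)
    submitted = str(form.get("token", ""))
    if not hmac.compare_digest(expected.encode(), submitted.encode()):
        return True
    # Honeypot: a field humans never see must come back empty.
    if form.get(field_name(expected, "honeypot"), ""):
        return True
    # Randomized names: the real comment must arrive under its derived name.
    return field_name(expected, "comment") not in form
```

Because the token covers the IP, timestamp, and post number, a captured form cannot be replayed from another address, against another post, or after the window closes. The honeypot field would be hidden from humans with CSS rather than `type="hidden"`, since bots tend to skip hidden inputs but fill in visible-looking ones.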