Well, it turns out that that sentence was incorporated into a whole series of spammy landing pages inserted all over the web, pointing back to e-loan.expert.com via Javascript redirect. This was a couple of weeks ago, so many of these are getting rolled back and fixed, but ... it would be absolutely fascinating to make it a statistical project.
I know, I know, I'm a sucker for Web spidering, too. Sigh.
This is how it would work:
- Seed it with one or more of the target sentences.
- Google a sentence.
- Try to find its actual origin.
- Store all the other URLs and text; break the text up into sentences.
- Spread from there.
- Index all the Javascript, in case techniques varied.
- Try to categorize the type of host (different break-in techniques were probably in use).
- Track down everybody making this possible, and fix each and every one of them. Oy.
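Here is a minimal Python sketch of the spreading part, assuming a hypothetical `search` callable that returns `(url, page_text)` pairs for an exact-phrase query (Google itself isn't wired in here) and a deliberately naive regex sentence splitter. It covers seeding, searching, storing pages, breaking text into sentences, and spreading; finding the origin, indexing the Javascript, and categorizing hosts are left out.

```python
import re
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Set, Tuple

# Hypothetical search backend: exact-phrase query -> (url, page_text) hits.
SearchFn = Callable[[str], Iterable[Tuple[str, str]]]

# Naive splitter: break after ., ! or ? when followed by whitespace.
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Break page text into (roughly) sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_SPLIT.split(text) if s.strip()]


def spread(seeds: Iterable[str], search: SearchFn, max_queries: int = 100) -> Dict[str, List[str]]:
    """Breadth-first crawl over sentences.

    Search each queued sentence, store every page that comes back (first
    version seen wins), and queue any sentence on those pages that hasn't
    been seen yet. Returns a mapping of URL -> sentences on that page.
    """
    pages: Dict[str, List[str]] = {}
    seen: Set[str] = set()
    queue: Deque[str] = deque()
    for s in seeds:
        if s not in seen:
            seen.add(s)
            queue.append(s)

    queries = 0
    while queue and queries < max_queries:
        sentence = queue.popleft()
        queries += 1
        for url, text in search(sentence):
            sentences = split_sentences(text)
            pages.setdefault(url, sentences)
            # Spread: every new sentence becomes a future query.
            for s in sentences:
                if s not in seen:
                    seen.add(s)
                    queue.append(s)
    return pages
```

Passing in a fake `search` backed by a small dict of pages is enough to try it offline. The `max_queries` cap matters, since every spam page can add dozens of new sentences to the queue.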
Wouldn't that be cool?
Seed sentences:
- Toonbots forum, being able to avoid spammers and trolls and whatnot.
- The world is awash in fast money, he said, and it is changing the structure of capital markets.
- When you are pre-approved by spruce mortgage, you will have access to hundreds of loan programs.
- Many sellers would rather have a monthly check than a lump sum settlement when they sell.
Example page: http://cas.ncat.edu/Departments/dance/js/dojo/pbs/ohioautotax.html - hasn't been cleaned up yet! Note that it's inside a Javascript library directory (js/dojo). This is the kind of thing it would be cool to track down.
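For the "index all the Javascript" step, a first pass over a page like this could just pull out the redirect targets. This sketch assumes the redirect uses one of a few common idioms (`location = "url"`, `location.href = "url"`, `location.replace("url")`, `location.assign("url")`); those patterns are my guess, not something taken from these pages, and anything obfuscated or built up by string concatenation would slip right past a regex like this.

```python
import re
from typing import List

# Assumed redirect idioms; real spam pages may obfuscate these further.
_REDIRECT_PATTERNS = [
    # [window.|document.|top.]location[.href] = "url"  (not == comparisons)
    re.compile(r"""(?:window\.|document\.|top\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""),
    # location.replace("url") / location.assign("url")
    re.compile(r"""location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)"""),
]


def find_js_redirects(html: str) -> List[str]:
    """Return literal redirect target URLs in the order they appear in the page."""
    hits = []
    for pattern in _REDIRECT_PATTERNS:
        for m in pattern.finditer(html):
            hits.append((m.start(), m.group(1)))
    return [url for _, url in sorted(hits)]
```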
Update 11/19/11: That particular link has been cleaned up, but the network as a whole is still there and still forwarding to the same ... actually, just a very similar site. Of course, there may have been multiple sites all along. So this project is still waiting to be done.