Die, Robot
If you're going to play with bots, best to know defense and offense
2:00 PM -- I was recently asked to find a robot, one that looks and acts like a human. It doesn't rip the site quickly, it doesn't have robot headers, it doesn't respect or even look at robots.txt, yet it was costing this company hundreds of thousands of dollars in lost revenue due to content theft. My job was to seek and destroy it.
Fortunately, in this case it was trivial to find the robot: its IP addresses did not rotate, and the software behind it was off the shelf, so I could test against it. But sometimes it's just not that easy.
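To make the non-rotating IP point concrete, here is a minimal sketch of one way a slow, polite-looking scraper can still stand out when it keeps the same address: count how many distinct pages each IP has read. The function names, the `max_pages` threshold, and the sample addresses are all assumptions for illustration, not what was used in the case above.

```python
from collections import defaultdict


def pages_per_ip(requests):
    """Map each client IP to the number of distinct paths it requested."""
    seen = defaultdict(set)
    for ip, path in requests:
        seen[ip].add(path)
    return {ip: len(paths) for ip, paths in seen.items()}


def heavy_readers(requests, max_pages=200):
    """Return IPs that read more distinct pages than max_pages (an assumed cutoff)."""
    counts = pages_per_ip(requests)
    return sorted(ip for ip, n in counts.items() if n > max_pages)


# Hypothetical log entries: (client IP, requested path).
requests = [("198.51.100.7", f"/articles/{i}") for i in range(500)]
requests += [("192.0.2.10", "/articles/1"), ("192.0.2.10", "/articles/2")]

suspects = heavy_readers(requests)
print(suspects)
```

A scraper that rotates its addresses spreads those page counts thin across many IPs, which is exactly why this kind of check stops working.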
I spent the last several years building strategies against robots -- traps that would help detect them. But recently I happened across a tool that actually worried me. It's called iOpus iMacros for Firefox. Essentially the tool is no more complex than a VCR for your browser. It records and plays back whatever you ask it to. It not only looks and acts like your browser, it is your browser. It downloads images and full-motion video, and renders JavaScript, all in the same way your browser does.
This could easily be turned into an automated scanner that crawls a site, if you are clever enough. Take, for instance, the Greasemonkey plugin. Using Greasemonkey, you can put any JavaScript on any page you want in Firefox. The point is to reduce the annoyances of the Web, streamline sites, or otherwise add features and functionality without requiring site developers to do anything.
This also allows quick and dirty hooks to be built into the browser for macros to use. By writing a spider in JavaScript and having the iMacros plugin move the mouse cursor around, click back and forth, and perform all sorts of other actions that a user would, an attacker can simulate real traffic while logging what the browser sees using tcpdump, Wireshark, or some other packet sniffing software. Rotating IP addresses automatically is trivial if you know what you're doing or use something like Tor onion routing.
There are only a few defenses against a spider like that:
Monitor changes to the document object model.
Modify the page and look for erroneous clicks using something like Click Density.
Request user input based on visual changes to the page (like CAPTCHAs or other Turing tests).
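The second defense can be sketched roughly as follows. The idea is that once the page layout changes, a replayed macro keeps clicking where things used to be, while a human clicks where they are now. This is a minimal illustration under stated assumptions: the `Rect` type, the `threshold` and `min_clicks` values, and the sample coordinates are all hypothetical, and it is not how Click Density itself works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Bounding box of a clickable element, in page pixels."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x, y):
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def miss_rate(clicks, targets):
    """Fraction of clicks that land outside every clickable target."""
    if not clicks:
        return 0.0
    misses = sum(
        1 for x, y in clicks
        if not any(t.contains(x, y) for t in targets)
    )
    return misses / len(clicks)


def flag_sessions(clicks_by_session, targets, threshold=0.5, min_clicks=3):
    """Return session IDs whose clicks mostly miss the current layout."""
    flagged = []
    for session_id, clicks in clicks_by_session.items():
        if len(clicks) >= min_clicks and miss_rate(clicks, targets) >= threshold:
            flagged.append(session_id)
    return sorted(flagged)


# Hypothetical layout after moving the "next" link from x=100 to x=400.
new_layout = [Rect(400, 200, 80, 20), Rect(10, 10, 120, 30)]
sessions = {
    "human": [(410, 205), (20, 15), (430, 210)],
    "replayed-macro": [(110, 205), (110, 205), (20, 15), (110, 205)],
}

suspects = flag_sessions(sessions, new_layout)
print(suspects)
```

A macro that locates elements by name rather than by screen position would not be caught this way, which is part of why no single defense on this list is enough.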
All of these things can slow down the user and can make programming the page much more complex. The age of the robots is here, and they are looking more and more like us every day.
— RSnake is a red-blooded lumberjack whose rants can also be found at Ha.ckers and F*the.net. Special to Dark Reading