Web 2.0 Summit: Google, Wikipedia's Tips On Thwarting Spam Abuse

Panelists suggest using spam filtering services, "no follow" links, CAPTCHAs, patching religiously, and possibly even charging money to deter spammer registrations.

Thomas Claburn, Editor at Large, Enterprise Mobility

November 5, 2008

4 Min Read

In Hollywood, you know you're a success when you get stalked. In Silicon Valley, you know your Web company is a success when your site gets spammed.

At the Web 2.0 Summit panel "Defending Web 2.0 From Virtual Blight" on Wednesday morning, representatives from Google, Reddit, Pramana, and Wikipedia discussed strategies and tactics for dealing with the spam and abuse that follows online success.

"If you provide a service where people get links, you will be hit," said Matt Cutts, head of Google's anti-spam team.

Cutts identified three major forms of Web spam: spam links, which spammers use to promote their sites; parasitic landing pages, where legitimate sites that allow users to upload content get used to host malicious code or spam links; and hacked sites, where attackers seek out installations running outdated content management software and compromise them to host malware or spam.

Cutts recommended the standard arsenal of defenses: spam filtering services like Akismet, using "no follow" links to reduce the incentive to spam, deploying CAPTCHAs, patching religiously, adding some sort of karma system, and possibly even charging money to deter spammer registrations.
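The "no follow" defense is straightforward to apply wherever user-submitted links are rendered. Here is a minimal sketch in Python; the function name and the plain-string rendering are illustrative assumptions, not code from any of the panelists' sites.

```python
from html import escape

def render_user_link(url: str, text: str) -> str:
    """Render a user-submitted link with rel="nofollow" so search
    engines pass no ranking credit to the target, which removes much
    of the incentive to spam comment forms for links."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(url, quote=True), escape(text)
    )

# A spammy comment link still renders, but earns the spammer nothing.
print(render_user_link("http://example.com/cheap-pills", "great deal!"))
```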

He also encouraged people to think outside the box and come up with unexpected barriers for spammers. "If you can frustrate a spammer, that's worth just as much as charging them money," he said.

Steve Huffman, founder of Reddit, discussed one such tactic that his site employs. "We've gotten really far by misdirecting spammers and cheaters," he said.

Reddit, he said, is "like Digg, but better." Reddit, like Digg, allows users to submit links to news stories and other sites and to vote on those submissions. Getting lots of votes makes posts more visible to the community.

But on Reddit, there's a difference: "Some votes just don't count," Huffman explained.

Once its systems detect suspicious behavior, Reddit may ban accounts without any notice to the account holder. To a banned user, submissions appear to post normally, but those posts are not visible to the rest of the community.
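The effect Huffman describes, silently discarding votes and hiding a banned account's posts from everyone but its owner, can be sketched in a few lines. The data model below is a hypothetical illustration of the general shadow-banning idea, not Reddit's actual code.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    title: str
    votes: int = 0

banned_accounts: set[str] = set()  # accounts flagged by abuse detection

def record_vote(post: Post, voter: str) -> None:
    # Votes from banned accounts are accepted but never counted,
    # so a cheater sees no signal that anything has changed.
    if voter not in banned_accounts:
        post.votes += 1

def visible_posts(all_posts: list[Post], viewer: str) -> list[Post]:
    # A banned user's submissions show up only in that user's own view;
    # the rest of the community never sees them.
    return [p for p in all_posts
            if p.author not in banned_accounts or p.author == viewer]
```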

"We try to do a lot of tricks to make spammers think they're winning," he said. Jonathan Hochman, an administrator with Wikipedia, discussed some of the strategies used to police Wikipedia, which is under constant siege from all manner of spammers and vandals.

"It really takes a lot of work to keep all this clean," he said. Between 20% and 30% of edits are vandalism or vandalism repair, he added.

Wikipedia, Hochman explained, is known for being an online encyclopedia that anyone can edit. What's less well known is that the site relies on computer-driven editing. "There actually are bots that are allowed to edit," he said.

ClueBot, for example, gets credit for almost 800,000 contributions to Wikipedia. ClueBot does a lot of article reversion, rolling vandalized articles back to the last version before the defacement.
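The basic patrol-and-revert loop Hochman describes can be sketched as follows. The revision structure and the looks_like_vandalism heuristic are hypothetical placeholders; ClueBot's real classifier and Wikipedia's revision machinery are far more sophisticated.

```python
def looks_like_vandalism(old_text: str, new_text: str) -> bool:
    """Hypothetical heuristic: flag edits that blank most of a page or
    inject obvious junk. A production bot uses much richer signals."""
    if len(new_text) < 0.2 * len(old_text):
        return True
    return any(junk in new_text.lower() for junk in ("!!!!!", "lol lol lol"))

def patrol(revisions: list[dict]) -> str:
    """Given a page's revision history (oldest first, each revision a
    dict with a 'text' field), return the text the article should show:
    the latest revision that does not look like vandalism."""
    good_text = revisions[0]["text"]
    for rev in revisions[1:]:
        if looks_like_vandalism(good_text, rev["text"]):
            continue  # revert by refusing to accept this revision
        good_text = rev["text"]
    return good_text
```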

Wikipedia's bots help spot copyright violations and they even call for help on IRC if needed. They're trained to avoid fights with people, so in cases where someone repeatedly re-vandalizes a page, they will ask for human intervention.

Sanjay Sehgal, founder and CEO of Pramana, made the case for his company's HumanPresent technology, which the startup has deployed to prevent abuse in an unnamed, recently launched, massively multiplayer game. CAPTCHAs, he said, are too easy to defeat.

Pramana's technology attempts to differentiate between bot and human activity -- too many bots can ruin a game -- but it's about more than stopping spam, he said. It helps improve the user experience and provides traffic pattern data, which can lead to better ad revenue and lower bandwidth costs. When deployed for the unnamed online game, he said, the game publisher was able to determine that between 12% and 15% of its traffic came from bots.
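Sehgal didn't describe HumanPresent's internals, but behavioral bot detection generally looks for interaction patterns too regular to be human. The timing-variance check below is a hypothetical illustration of that idea, not Pramana's method.

```python
from statistics import pstdev

def probably_a_bot(event_timestamps: list[float],
                   min_events: int = 10,
                   min_jitter_seconds: float = 0.05) -> bool:
    """Flag clients whose actions arrive at suspiciously regular
    intervals. Humans are jittery; simple bots fire on a timer."""
    if len(event_timestamps) < min_events:
        return False  # not enough evidence yet
    gaps = [b - a for a, b in zip(event_timestamps, event_timestamps[1:])]
    return pstdev(gaps) < min_jitter_seconds
```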

Panel moderator Jonah Stein, an SEO consultant, observed that advertisers can play a role in reducing online blight too. For example, he pointed out that online ad network Right Media allows its advertisers to opt out of deceptive advertising. He didn't provide any information about how many advertisers choose to do so.

About the Author

Thomas Claburn

Editor at Large, Enterprise Mobility

Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful master's degree in film production. He wrote the original treatment for 3DO's Killing Time, a short story that appeared in On Spec, and the screenplay for an independent film called The Hanged Man, which he would later direct. He's the author of a science fiction novel, Reflecting Fires, and a sadly neglected blog, Lot 49. His iPhone game, Blocfall, is available through the iTunes App Store. His wife is a talented jazz singer; he does not sing, which is for the best.
