Lately I have been really bothered by Spam in my various forums, blogs and email accounts.
It regularly costs me hours spent setting up anti-Spam protection, sorting out false positives, reconfiguring my Captchas, and cleaning my databases.

So I decided to better understand this phenomenon: why it still exists, how it works, how much money it represents, and how to get rid of it efficiently.

Why does Spam exist?
How does it work?
How does it translate into figures?
How to fight against spam?

Why does Spam exist?

Basically, Spam aims at making people buy things. Simple: this is just a selling strategy.

There is however a big difference between Spam and other selling strategies: Spam achieves its goal by blindly reaching billions of people repeatedly, using unsolicited media.

A selling strategy can only be viable if its running costs are low compared to the sales it generates. In other words, the Return-On-Investment has to be positive. Reaching billions of people using phone calls or regular mail would therefore cost too much (both in time and money).

That’s why Spam uses other channels: direct email, links posted in forums and blogs, search engines, social networks, Google Ads. These channels cost very little and can reach billions of people very easily.

How does it work?

First, Spammers usually work for a website selling products (drugs, boots, bags, books, miracle pills…). Then the goal is to direct people to this website.

There are several ways to do this:

Direct email

Automated bots scrape websites, forums and blogs to harvest email addresses. Another way is to infiltrate people’s address books, either in their social network accounts or on their own computers, to get more email addresses. This is usually done with computer viruses (trojans…), phishing emails mimicking social networks, or exploits of software bugs.

Once millions of email addresses have been harvested, Spam email is sent directly to them, including links to the selling website: these are the Spam emails we all get in our Spam folders. The sending is usually done through bot-nets, so that it cannot be easily traced. Bot-nets are usually made of computers that have been infected by viruses and that send emails (zombie style) without their owners being aware of it.

Links posted in forums, blogs and social networks

During web crawling, automated bots (and sometimes humans) try to post messages containing links to the selling website. The targeted platforms are very often forums, blogs and social networks.

If anonymous posting is disabled, they try to create a user account.

If there is a Captcha protection, they try to pass it. And usually they succeed, at least for the most common Captchas (reCaptcha, phpBB default Captchas…).

Then they try to fool moderators by first waiting for a few days before posting, then posting messages without links to gain their trust, and then posting messages with links to the selling website. As a moderator, when I first got Spam messages without any link (and sometimes just containing random characters), I laughed, thinking “Spam, you’re doing it wrong”, but then I understood their strategy.

Search engines

A great way to direct people to the selling website is to make it rank highly on search engines. This is done by creating a network of fake blogs (also called Splogs) that reference each other and sometimes reference the selling website. As search engines value blogs and sites that are referenced by others, Spammers manage to increase the page rank of their selling website and splogs using this technique. This also leads to links to those splogs being posted in forum and blog messages all over the web.

Another use of this technique is through blog track-backs, also called Spings.
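To get a feel for why a ring of splogs helps, here is a rough sketch using a simplified PageRank-style score. It is only an illustration: real search engines combine many more signals, and the site names, the `pagerank` helper and the damping value of 0.85 are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    `links` maps each site to the list of sites it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A site with no outgoing links spreads its score evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical tiny web: nobody links to the selling website ("shop").
honest_web = {
    "news": ["blog"],
    "blog": ["news"],
    "shop": ["news"],
}

# Same web plus three splogs that link to each other and to the shop.
spammed_web = dict(honest_web)
spammed_web.update({
    "splog1": ["splog2", "splog3", "shop"],
    "splog2": ["splog1", "splog3", "shop"],
    "splog3": ["splog1", "splog2", "shop"],
})

before = pagerank(honest_web)
after = pagerank(spammed_web)
print(f"shop score without splogs: {before['shop']:.3f}")
print(f"shop score with splogs:    {after['shop']:.3f}")
```

In this toy web the shop's score goes up once the splogs point at it, even though the total score is now shared between twice as many sites.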

Social networks

Fake Twitter or MySpace accounts trying to attract as many followers as possible are also part of this strategy. The more those accounts link to other accounts, the better they will be ranked, both in search engines and in the social network itself. Spammers then post links to selling websites or splogs from those accounts, and these are spread automatically to plenty of people’s walls.

Google Ads

This one is even more insidious. The Spammer buys a Google AdWords keyword that is quite irrelevant (like a random sequence of characters) for their selling website. Then Spam containing that keyword is posted, with no link needed (in comments on blogs, forums and social networks, or in direct emails).

People might not understand the purpose of Spam that includes no link. But if a Google Ad happens to be present on the page displaying the Spam message, there is a very high probability that the ad will display a link to the Spammer’s selling website. This can happen in online email clients (GMail, Yahoo, Hotmail…), or on any website that displays ads alongside its members’ comments.

How does it translate into figures?

Most Spam messages are rather stupidly written, some are even gibberish, and most Spam filters are able to catch them.
So why is Spam still so active?

I stumbled upon a 2008 study by researchers from Berkeley that found something very interesting:

They set up a fake selling website for drugs, with items sold at around $100 each.
They used a bot-net of 75,800 infected computers, which gave them great firepower to mass-send emails and post messages. Renting this bot-net costs $80 per million emails sent.
They sent 350 million emails containing direct links to their selling website over 26 days. Sending them cost $28,000.
Only 28 people actually bought from the selling website: that is just 0.000008% of the Spam sent, and it earned them $2,800 in 26 days.

At this point, there is a problem: using the bot-net costs $28,000 and they earned $2,800. They are clearly in the red: Spamming is not a lucrative activity if you have to rent a bot-net.
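The arithmetic behind those figures can be checked with a few lines of Python. This is just a sketch using the numbers quoted above. The $100 price is the approximate item price, and the `campaign_economics` helper is a name made up for this example.

```python
# Figures as quoted above from the study.
EMAILS_SENT = 350_000_000
RENT_PER_MILLION_USD = 80
SALES = 28
PRICE_PER_SALE_USD = 100  # items were "around $100 each", so an approximation


def campaign_economics(emails, rent_per_million, sales, price):
    """Return (cost, revenue, profit, conversion rate in percent)."""
    cost = emails / 1_000_000 * rent_per_million
    revenue = sales * price
    conversion_percent = sales / emails * 100
    return cost, revenue, revenue - cost, conversion_percent


cost, revenue, profit, conversion = campaign_economics(
    EMAILS_SENT, RENT_PER_MILLION_USD, SALES, PRICE_PER_SALE_USD
)
print(f"Cost of sending:  ${cost:,.0f}")
print(f"Revenue:          ${revenue:,.0f}")
print(f"Profit:           ${profit:,.0f}")
print(f"Conversion rate:  {conversion:.6f} %")
```

With a rented bot-net, the campaign loses about $25,200: the revenue covers only a tenth of the sending cost.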

So what happens if you are a Spammer that owns a bot-net?

Things are different: in the same study, the researchers rented just 1.5% of the bot-net’s power. This means that they could have sent much more Spam, leading to many more sales in the end, had they been able to use the whole bot-net.

Here the study estimates the owner’s earnings at up to $7,000 a day, and more than $2 million a year.
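A naive way to see where numbers of that size come from is to scale the campaign's revenue up to the whole bot-net. The sketch below assumes that sales grow linearly with the amount of Spam sent and that the owner pays nothing to send it. The study's own estimate may have been computed differently, and the helper name is made up for this example.

```python
# Figures as quoted above; the linear scaling is an assumption of this sketch.
REVENUE_USD = 2_800       # earned with the rented share of the bot-net
CAMPAIGN_DAYS = 26
RENTED_FRACTION = 0.015   # 1.5% of the bot-net's power


def full_botnet_revenue_per_day(revenue, days, fraction):
    """Extrapolate the daily revenue if the whole bot-net were used."""
    return revenue / days / fraction


daily = full_botnet_revenue_per_day(REVENUE_USD, CAMPAIGN_DAYS, RENTED_FRACTION)
yearly = daily * 365
print(f"Estimated owner revenue: ${daily:,.0f} per day, ${yearly:,.0f} per year")
```

This rough extrapolation lands a little above $7,000 a day and above $2 million a year, which is in line with the figures reported above.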

It is now easy to understand that the Spam business is not for the small fish, but is profitable for the few who have the infrastructure to send Spam in large quantities at nearly no cost.
The fact that Spam usually targets the same industry (online pharmacies selling drugs and pills) also supports this explanation.

How to fight against spam?

Several means are available:

Spam filters: Those are tricky: depending on the filter you use, you can get lots of false positives. Email clients usually have built-in Spam filters, which is why you have a Spam folder there. For forums and blogs, I tested Akismet and I am very happy with it so far.
Captchas: I learned that common Captchas are becoming more and more useless (reCaptcha, default blog and forum Captchas…). Bots manage to get past them, and they annoy most of your users. However, a customized Captcha is quite effective.
My favorite is Q/A, where I write the questions and answers myself. Avoid simple questions, as Bots manage to find the answers on Google. Questions like “Type the first 5 letters of the capital of England” are better than “How much is 4+5?”.

You will also have to change your customized Captcha configuration regularly, especially if your site has enough visibility for human Spammers to take an interest in it: they could adapt their Bots to your Q/A.
No automatic follow on social networks: Spam accounts on Twitter and other social networks will follow you (they usually have hot girls as avatars, and their posts are either empty or the same bullshit repeated over and over). Do not follow them back: following them gives them power.
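For the customized Q/A Captcha mentioned above, a minimal sketch could look like the following. It is not tied to any forum software. The second question and the function names are made up for this example, and the question bank is meant to be edited and rotated by the site owner.

```python
import random

# Hypothetical question bank written by the site owner.
# Change it regularly so that Bots tuned to old questions stop working.
QUESTIONS = {
    "Type the first 5 letters of the capital of England": "londo",
    "Type the last 3 letters of the word 'forum'": "rum",
}


def normalize(answer):
    """Ignore case and surrounding spaces when comparing answers."""
    return answer.strip().casefold()


def pick_question():
    """Choose a random question to show on the registration form."""
    return random.choice(list(QUESTIONS))


def check_answer(question, answer):
    """Return True only if the answer matches the one stored for the question."""
    expected = QUESTIONS.get(question)
    return expected is not None and normalize(answer) == expected


question = pick_question()
print(question)
print(check_answer("Type the first 5 letters of the capital of England", " Londo "))
```

On a real site, the question that was asked should be remembered on the server side (for example in the session) rather than taken from the submitted form, otherwise a Bot could simply choose the question it already knows how to answer.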

For phpBB users, you have great recommendations here.
Google is also full of good practices.

On a funnier note, there is a smart way to profit from Spam. Check this out 😉

Hope this explanation will be useful to many.
