Speedy Googlebot

Googlebot is a frequent visitor(*) of drupal.org, accounting for over 50 GB of traffic across more than 11 million requests in March. What I didn't know is that its postprocessing is also quite fast.

Today, one of our helpful users reported some spam which had been created by a new user. The user had been a member for 2 hours, and the spam itself was just over 1 hour old when I deleted it.

After I deleted it, I thought that it would have been entertaining to show it to somebody (the subject of the spam was hardware for your shower), so I decided to look for it on Google. I didn't really expect it to be there, but there it was: "found one hour ago". That means that Googlebot picked it up within the first few minutes after its creation and that it was in the search results shortly afterwards. Considering the huge number of websites that Googlebot looks at, I find this pretty impressive.

Spammers read my blog

Maybe they don't, but they have realized that their spam profiles on drupal.org are too short-lived to get them much traffic. As a result, the number of new spam profiles seems to be down.

As a side note: in part due to the spam profiles and the Google traffic that they generated, drupal.org served more than 30 million pages in March. This is an increase of about 38% compared to February, with 22 million viewed pages.

The bigger part of this surge can probably be attributed to DrupalCon at the beginning of March.
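As a quick check of the growth figure quoted above, here is a minimal Python sketch. It uses only the rounded page counts from the post, so it lands slightly below the quoted 38%. The assumption is that the 38% was computed from the exact, unrounded counts, which the post does not give.

```python
def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0

feb_pages = 22_000_000  # "22 million viewed pages" (rounded figure from the post)
mar_pages = 30_000_000  # "more than 30 million pages" (lower bound from the post)

growth = percent_increase(feb_pages, mar_pages)
print(f"Growth from the rounded figures: {growth:.1f}%")

# March page count that an increase of exactly 38% over February would imply.
implied_march = feb_pages * 1.38
print(f"March pages implied by 38%: {implied_march / 1e6:.2f} million")
```

The implied March figure is a little above 30 million, which is consistent with the "more than 30 million" wording.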

Here is the number of 403 (access denied) pages served per month:

Month                 Count
January               66000
February              65000
March                 110000
April (until 19th)    1.3 million

Why spam works

I've recently been looking into the spam that hits drupal.org, and yesterday I finally found out why the spammers do it and that it actually works. Until I block the accounts, at least.

A blocked account will give any visitor a "403 access denied" message. Drupal logs these incidents. It also logs the referer of these requests, so I am able to see which page the visitor was looking at when they clicked on the link to the blocked account. Most of these pages are search results from Google and other search engines. And of course the visitors were looking for porn of all different flavours.
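To illustrate how the logged referers can be sorted into search-engine hits and everything else, here is a rough Python sketch. The referer values are made-up placeholders, not real log entries, and the sketch assumes that the search engine puts the query into a `q` URL parameter; engines that use a different parameter name would need their own rule.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical referer values, standing in for what the 403 log entries record.
referers = [
    "http://www.google.com/search?q=example+query",
    "http://search.example.net/results?q=another+query",
    "http://drupal.org/forum",
    "",  # direct request, no referer sent
]

def search_query(referer):
    """Return (host, query) if the referer looks like a search result page, else None."""
    parsed = urlparse(referer)
    # Assumption: the search terms are carried in a "q" parameter.
    terms = parse_qs(parsed.query).get("q")
    if parsed.hostname and terms:
        return parsed.hostname, terms[0]
    return None

hits = [hit for hit in map(search_query, referers) if hit is not None]
by_host = Counter(host for host, _ in hits)

for host, count in by_host.most_common():
    print(host, count)
```

Counting by host shows which search engines send the visitors, and the extracted query strings show what they were looking for.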

Spammer update

Last week I blogged about the spammers on drupal.org and how we remove their accounts. This week I looked at the newly created accounts again and added some more domains to the access rules (mainly aliases of mailinator.com).

There is one new player on the mail provider list. Apparently somebody created a domain to use for mail in order to be able to register at sites like drupal.org. And that they did: they created almost 500 accounts on d.o during the last week. They are of course all blocked now.
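The post does not show what the access rules look like. As a rough sketch only, the following Python assumes that a rule is a mask with SQL LIKE-style wildcards (`%` for any run of characters, `_` for a single character) that is matched case-insensitively against the whole mail address. The domain `alias.example` is a made-up placeholder, not one of the domains that were actually blocked.

```python
import re

def like_to_regex(mask):
    """Translate a LIKE-style mask into a compiled, case-insensitive regex."""
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".*")  # any run of characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def is_denied(mail, deny_masks):
    """True if the mail address matches at least one deny mask in full."""
    return any(like_to_regex(mask).fullmatch(mail) for mask in deny_masks)

# Assumed rule set: one mask per throwaway mail domain.
deny_masks = ["%@mailinator.com", "%@alias.example"]

print(is_denied("someone@mailinator.com", deny_masks))
print(is_denied("someone@example.org", deny_masks))
```

One mask per domain keeps the rule list easy to extend whenever a new alias domain shows up in the registrations.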

Spammers on drupal.org

So, after I claimed we'd have fewer spammers than others, I wanted to find out how many spammers we've actually had.

mysql> select EXTRACT(YEAR_MONTH FROM from_unixtime(created)) as yearmonth, count(*) as count from users where status = 0 and login != 0 group by yearmonth order by yearmonth desc ;

Year/Month    # of spammers
2009 / 04     820
2009 / 03     710
2009 / 02     1101
2009 / 01     371
2008 / 12     171
2008 / 11     145
2008 / 10     136
2008 / 09     268
2008 / 08     486
2008 / 07     639
2008 / 06     145
2008 / 05     132
2008 / 04     149
2008 / 03     206
2008 / 02     167
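For readers who want to see what the query above does step by step, here is a small Python sketch of the same grouping. The rows are invented sample data, not values from drupal.org. The sketch also assumes UTC, whereas MySQL's `from_unixtime` uses the session time zone, so accounts created right at a month boundary could land in a different bucket.

```python
from collections import Counter
from datetime import datetime, timezone

def year_month(created):
    """Mimic EXTRACT(YEAR_MONTH FROM from_unixtime(created)), e.g. 200904."""
    moment = datetime.fromtimestamp(created, tz=timezone.utc)  # assumption: UTC
    return int(moment.strftime("%Y%m"))

# Hypothetical (created, status, login) rows; status 0 means blocked,
# login 0 means the account never logged in.
rows = [
    (1238544000, 0, 1238547600),  # 2009-04-01, blocked, has logged in
    (1238630400, 0, 1238634000),  # 2009-04-02, blocked, has logged in
    (1235865600, 0, 1235869200),  # 2009-03-01, blocked, has logged in
    (1235952000, 1, 1235955600),  # 2009-03-02, active account: skipped
    (1236038400, 0, 0),           # 2009-03-03, blocked, never logged in: skipped
]

# Same filter as the WHERE clause: status = 0 and login != 0.
counts = Counter(
    year_month(created)
    for created, status, login in rows
    if status == 0 and login != 0
)

# Same ordering as ORDER BY yearmonth DESC.
for month, count in sorted(counts.items(), reverse=True):
    print(month, count)
```

The `login != 0` condition is what separates blocked accounts that were actually used from accounts that never logged in.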

Spammers by mailprovider

On drupal.org we have far fewer spammers than other websites. One reason is that we do not allow anonymous users to post anything and that every user needs a valid mail address in order to use their account.

This raises the question: which email providers do our spammers use?

Luckily, this is rather easy to answer:

mysql> select substring_index(substring_index(init, '@', -1), '.', 1) as provider, count(substring_index(substring_index(init, '@', -1), '.', 1)) as count from users where status = 0 and login != 0 group by provider order by count;
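The nested `substring_index` calls in the query above can be hard to read, so here is a minimal Python sketch of the same extraction. The inner call keeps everything after the last `@`, and the outer call keeps everything before the first `.` of that remainder. The addresses are invented placeholders, and the list stands in for the `init` column of the blocked accounts.

```python
from collections import Counter

def provider(init):
    """Mimic substring_index(substring_index(init, '@', -1), '.', 1)."""
    domain = init.rsplit("@", 1)[-1]  # part after the last '@' (whole string if none)
    return domain.split(".", 1)[0]    # part before the first '.'

# Hypothetical mail addresses of blocked accounts.
inits = [
    "someone@mailinator.com",
    "other@mailinator.com",
    "third@mail.example.com",
]

counts = Counter(provider(init) for init in inits)

# Same ordering as ORDER BY count (ascending).
for name, count in sorted(counts.items(), key=lambda item: item[1]):
    print(name, count)
```

Note that only the first label of the domain is kept, so an address at `mail.example.com` is counted under `mail` rather than `example`, and the same provider under different top-level domains is counted together.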
