We are not alone!

Last week we had some "fun" with Microsoft's msnbot. They were apparently trying out their new beta, and it didn't work that well: it ignored drupal.org's robots.txt and kept crawling 20 to 40 pages per second. You could call that a denial-of-service attack; drupal.org certainly had problems.

We then resolved this by banning the whole subnet from our webservers.
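The post doesn't say how the ban was implemented (it could be a firewall rule or a deny directive in the web server configuration). As a minimal sketch of the underlying check, the Python below tests whether a client address falls inside a banned network. The subnet 203.0.113.0/24 is a hypothetical stand-in taken from the documentation-only range; it is not the crawler's real subnet.

```python
import ipaddress

# Hypothetical banned range: 203.0.113.0/24 is reserved for documentation
# (RFC 5737) and stands in for the real crawler subnet here.
BANNED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_banned(client_ip: str) -> bool:
    """Return True if client_ip lies inside any banned network."""
    addr = ipaddress.ip_address(client_ip)
    # 'in' is simply False when the address and network are
    # different IP versions (IPv4 vs IPv6).
    return any(addr in net for net in BANNED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.7"):
        print(ip, "banned" if is_banned(ip) else "allowed")
```

Banning a whole network rather than individual addresses matters here because a large crawler spreads its requests over many hosts in the same range.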

Today neclimdul pointed out to me that the people running CPAN Testers have had the same fun and resolved it in the same manner.

If Microsoft keeps doing this, it will get cut off from indexing many Open Source projects. One could think it was intentional.

The Drupal Association supports OSUOSL

The Oregon State University Open Source Lab (OSUOSL) has been one of the most generous organizations to the Drupal project. In mid-2005 they stepped up and offered to host drupal.org at a time when the website was crashing due to insufficient hardware. OSUOSL offered rack space and bandwidth, all of it donated. Since this donation the drupal.org infrastructure has grown from a single server to more than a dozen, traffic has increased exponentially, and overall growth has exploded. OSUOSL handled all of this in stride and even provided student interns' time to assist with hosting and infrastructure issues.

More computer power

The Drupal Association has used some of the money it received from the Drupal community and its sponsors to buy more computing power for the infrastructure that hosts all drupal.org services.

Oh no, we have too many links!

Once in a while I log into Google's Webmaster Tools to see what they say about drupal.org.

Yesterday I did that again after a hiatus of several weeks. I noticed that Google had sent me three emails which never reached my inbox because I hadn't configured email forwarding.

After fixing that, I read the emails.

One was about their services: they want me to use AdWords and offer some free budget. The other two were almost identical; they probably sent the second one after I didn't react to the first:

Google thinks we have too many links!

They said that the huge number of links may prevent Googlebot from indexing all of them and that we should consider excluding some of them through robots.txt.
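Google's mail doesn't say which links to exclude, and neither does this post. As a minimal sketch, assuming a couple of hypothetical Disallow rules (the paths /search/ and /tracker/ are illustrative, not drupal.org's actual robots.txt), Python's standard urllib.robotparser can show which URLs a compliant crawler would skip:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tracker/
"""

parser = RobotFileParser()
# parse() accepts the file as a list of lines and marks it as read.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler honouring these rules may fetch url."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://drupal.org/node/1",
                "https://drupal.org/search/node/views",
                "https://drupal.org/tracker/12345"):
        print(url, "->", "crawl" if allowed(url) else "skip")
```

Note that urllib.robotparser implements plain prefix matching on paths; Google's crawler additionally understands wildcard patterns, which this parser does not.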

Getroffene Hunde bellen

The title is a German proverb which translates as "hit dogs bark" and means that people will react once they feel sufficiently threatened.

The proverb apparently applies to Phorm, a UK company that wants to use deep packet inspection to serve ads. They have been criticized for their plan, and I also wrote them a letter asking them to exempt drupal.org from their plans (apart from an auto-reply, I received no answer).

Speedy google bot

Googlebot is a frequent visitor(*) of drupal.org: in March it used over 50GB of traffic in over 11 million requests. What I didn't know is that its post-processing is also quite fast.

Today one of our helpful users reported some spam created by a new user. The user had been a member for 2 hours, and the spam itself was just over an hour old when I deleted it.

After deleting it, I thought it would have been entertaining to show it to somebody (the spam was about hardware for your shower), so I looked in Google. I didn't really expect it to be there, but there it was, "found one hour ago". That means Googlebot picked it up within the first few minutes after its creation, and it appeared in the search results shortly afterwards. Considering the huge number of websites Googlebot looks at, I find this pretty impressive.

Spammers read my blog

Maybe they don't, but they have realized that their spam profiles on drupal.org are too short-lived to get them much traffic. As a result, the number of new spam profiles seems to be down.

As a side note: partly due to the spam profiles and the Google traffic they generated, drupal.org served more than 30 million pages in March. This is an increase of about 38% compared to February's 22 million page views.

The bigger part of this surge can probably be attributed to DrupalCon at the beginning of March.

Here's the number of 403 ("access denied") pages served per month:

Month                 403 pages
January               66,000
February              65,000
March                 110,000
April (until 19th)    1.3 million

Goodbye Phorm

After Amazon and Wikimedia, the Drupal Association has decided to opt out of Phorm's web traffic snooping scheme. It is quite scandalous that one has to opt out instead of having the option to opt in, but we did it anyway. We got the same auto-reply that Wikimedia got; let's see if we get a more detailed response later.

Why spam works

I've recently been looking into the spam that hits drupal.org, and yesterday I finally found out why spammers do it and that it actually works, at least until I block the accounts.

A blocked account gives any visitor a "403 access denied" message. Drupal logs these incidents, including the referer of each request, so I can see which page the visitor was on when clicking the link to the blocked account. Most of these pages are search results from Google and other search engines. And of course the visitors were looking for porn of all different flavours.
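The analysis described above boils down to grouping the logged 403 events by the host in their referer. The Python below is a minimal sketch of that idea; the sample log entries are hypothetical (path, referer) pairs, and how you export them from Drupal's log is left out as an assumption.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample of logged "access denied" events as
# (requested path, referer) pairs; these are not real drupal.org data.
ACCESS_DENIED_LOG = [
    ("/user/1001", "https://www.google.com/search?q=example"),
    ("/user/1001", "https://www.google.com/search?q=another"),
    ("/user/1002", "https://search.yahoo.com/search?p=example"),
    ("/user/1003", ""),  # the browser sent no referer
]


def referer_hosts(log):
    """Count 403 hits per referring host; empty referers become '(none)'."""
    counts = Counter()
    for _path, referer in log:
        host = urlparse(referer).netloc if referer else ""
        counts[host or "(none)"] += 1
    return counts


if __name__ == "__main__":
    for host, hits in referer_hosts(ACCESS_DENIED_LOG).most_common():
        print(f"{hits:5d}  {host}")
```

Counting by host rather than by full referer URL keeps the many different search queries from scattering the totals.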

Spammer update

Last week I blogged about the spammers on drupal.org and how we remove their accounts. This week I again looked at the newly created accounts and added some more domains to the access rules (mainly aliases of mailinator.com).

There is one new player on the mail provider list: apparently somebody created a domain purely for email in order to register on sites like drupal.org. And register they did: they created almost 500 accounts on drupal.org during the last week. They are, of course, all blocked now.
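To illustrate how domain-based access rules catch such registrations, here is a Python sketch of mask matching. It assumes LIKE-style wildcards ("%" for any run of characters, "_" for exactly one character); it is not Drupal's own code, and the mask list is hypothetical (example-throwaway.test stands in for the unnamed new domain).

```python
import re

# Hypothetical deny masks using LIKE-style wildcards:
# '%' matches any run of characters, '_' matches exactly one.
DENY_MAIL_MASKS = ["%@mailinator.com", "%@example-throwaway.test"]


def like_to_regex(mask: str) -> "re.Pattern[str]":
    """Translate a LIKE-style mask into a case-insensitive regex.

    The result is meant to be used with fullmatch(), so the whole
    address must match the mask, not just a part of it.
    """
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Domain names are case-insensitive, so ignore case.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


DENY_PATTERNS = [like_to_regex(m) for m in DENY_MAIL_MASKS]


def is_denied(mail: str) -> bool:
    """Return True if the address matches any deny mask."""
    return any(p.fullmatch(mail) for p in DENY_PATTERNS)


if __name__ == "__main__":
    for mail in ("spammer@mailinator.com", "dev@example.org"):
        print(mail, "denied" if is_denied(mail) else "allowed")
```

Anchoring the match to the whole address means a mask like "%@mailinator.com" does not accidentally block an unrelated domain such as mailinator.com.example.org.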
