Oh no, we have too many links!

Once in a while I log into Google to look at their webmaster tools and see what they say about drupal.org.

Yesterday I did that again, after a hiatus of several weeks. I noticed that Google had sent me three mails which didn't make it to my inbox because I hadn't configured email forwarding.

After fixing that I looked at the mails.

One was about their services: they want me to use AdWords and offered some free budget. The other two were almost identical, and they had probably sent the second one after I didn't react to the first:

Google thinks we have too many links!

They said that the huge number of links may prevent Googlebot from indexing all of them, and that we should consider excluding some through robots.txt.
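Before changing robots.txt on a live site, it can help to check which URLs a given set of rules would actually block. The following is only a minimal sketch using Python's standard `urllib.robotparser`. The `Disallow` rules, the example URLs, and the helper name `is_crawlable` are assumptions made up for illustration, not the rules drupal.org uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not drupal.org's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /tracker
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one ordinary content page, two listing pages.
    for url in (
        "https://drupal.org/node/1",
        "https://drupal.org/search/node/shower",
        "https://drupal.org/tracker",
    ):
        verdict = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {verdict}")
```

Running a list of representative URLs through a check like this makes it easier to see whether a rule excludes only the link-heavy listing pages and leaves the actual content crawlable.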

Speedy Googlebot

Googlebot is a frequent visitor(*) of drupal.org, eating over 50GB of traffic in over 11 million requests in March. What I didn't know was that its postprocessing is also quite fast.

Today, one of our helpful users reported some spam which had been created by a new user. The user had been a member for 2 hours, and the spam itself was just over 1 hour old when I deleted it.

After I deleted it, I thought that it would have been entertaining to show it to somebody (the subject of the spam was hardware for your shower), so I decided to look for it in Google. I didn't really expect it to be there, but there it was: "found one hour ago". That means that Googlebot picked it up within the first few minutes after its creation, and it was in the search results shortly afterwards. Considering the huge number of websites that Googlebot looks at, I find this pretty impressive.

Googlebot likes Drupal 6

It is now several weeks since the upgrade of drupal.org to Drupal 6, and I've taken a look at Google's crawling statistics for drupal.org.

This is the most interesting graph for me as infrastructure manager: it shows the average time that Googlebot needs to download an HTML page from drupal.org. We apparently had a bit of a rough ride in January, but recently this has smoothed out. About 600ms per page seems quite a good value to me.
