Search engine listing delays, which have come to be called the Google Sandbox effect, are real in practice at each of the four top-tier search engines in one form or another. MSN, it seems, has the shortest indexing delay at 30 days. This article is the second in a series following the spiders through a brand new web site, beginning on May 11, 2005, when the site first went live under a newly purchased domain name.
Previously we looked at the first 35 days and detailed the crawling behavior of Googlebot, Teoma, MSNbot and Slurp as they traversed the pages of this new site. We discovered that each robot spider displays distinctly different crawling frequency and similarly different indexing patterns.
For reference, about 15 to 20 new pages are added to the site daily, each of which is linked from the home page for a day. The site structure is non-traditional, with no categories. Its linking structure is tied to author pages that list each author's articles, plus a "related articles" index that links to relevant pages containing similar content.
So let's review where we are with each spider, looking at pages crawled and comparing pages indexed by each engine.
Teoma, the AskJeeves spider, has crawled most of the pages on the site, yet no pages are indexed 60 days later at this writing. This is clearly a site aging delay that's modeled on Google's Sandbox behavior. The Teoma spider from Ask.com has crawled more pages on this site than any other engine over the 60-day period, but it appears to be tired of crawling: it has not returned since July 13, its first break in 60 days.
In the first two days, Googlebot gobbled up 250 pages, then didn't return until 60 days later, and it has not indexed a single page in the 60 days since that initial crawl. But Googlebot is showing renewed interest in crawling the site since this crawling case study was published on several high-traffic sites. Now Googlebot is looking at a few pages each day: so far no more than about 20 pages, at a decidedly lackluster pace, a true "crawl" that will keep it occupied for years if it continues that slowly.
MSNbot crawled timidly for the first 45 days, looking over 30 to 50 pages daily, but only after it found a robots.txt file. We had neglected to post one to the site for the first week, then bobbled the ball when we changed the site structure and failed to implement robots.txt in the new subdomains until day 25, and THEN MSNbot didn't return until day 30. If little else has been discovered about initial crawls and indexing, we have seen that MSNbot relies heavily on the robots.txt file and that proper implementation of that file will speed crawling.
MSNbot is now crawling with enthusiasm at anywhere between 200 and 800 pages daily. As a matter of fact, we had to add a "crawl-delay" directive to the robots.txt file after MSNbot began hitting 6 pages per second last week. The MSN index now shows 4905 pages 60 days into this experiment, and cached pages change weekly. MSNbot has apparently found that it likes how we changed the page structure to include a new feature that links to questions from several other article pages.
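As a rough illustration of the "crawl-delay" setting mentioned above, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` and reads back the delay that applies to MSNbot. The file contents and the 10-second value are assumptions made up for this example, not the settings used on the site in this case study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The 10-second delay is an illustrative value,
# not the setting actually used on the site described in this article.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) requested of MSNbot, and of a bot with no delay set.
msn_delay = parser.crawl_delay("msnbot")
other_delay = parser.crawl_delay("Googlebot")

print("msnbot delay:", msn_delay)
print("Googlebot delay:", other_delay)
```

Crawl-delay is a non-standard directive, so whether a given crawler honors it is up to that crawler. Checking the file with a parser like this only confirms that the syntax says what you intended.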
Slurp alternates between strangely inactive and hyperactive for periods of time. The Yahoo crawler will look at 40 pages one day and then 4000 the next, then simply look at the home page for a few days, then jump back in for 3000 pages the next day and go back to only reviewing robots.txt for two days. Consistency is not a curse suffered by Slurp. Yahoo now shows 6 pages in its index; one is an errors page and another is an "index/of" page, as we have not posted a home page to several subdomains. But Slurp has easily crawled 15,000 pages to date.
Lessons learned in the first 60 days on a new site follow:
1) Google crawls 250 pages on first discovery of links to the site. Then it doesn't return until it finds more links, and then it crawls slowly. Google has failed to index the new domain for 60 days.
2) Yahoo looks for errors pages and, once it finds bad links, will crawl them ceaselessly until you tell it to stop. Then it won't crawl at all for weeks, until it crawls heavily one day and lightly the next in random fashion.
3) MSNbot requires robots.txt files and, once it decides it likes your site, may crawl too fast, requiring "crawl-delay" instructions in that robots.txt file. Implement robots.txt immediately.
4) Bad bots can strain resources and hit too many pages too quickly until you tell them to stay out. We banned 3 bots outright after they slammed our servers for a day or two. We noted that "aipbot" crawled first, then "BecomeBot" came along, and then "Pbot" from Picsearch.com crawled heavily looking for image files we don't have. Bad bots, stay out. It is best to implement robots.txt exclusions for all but the top engines if their crawlers strain your server resources. We considered excluding the Chinese search engine Baidu.com when it began crawling heavily early on. We don't expect much traffic from China, but why exclude one billion people? Especially since Google is rumored to be considering a possible purchase of Baidu.com as an entry to the Chinese market.
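The robots.txt exclusions described in lesson 4 could be generated and sanity-checked along these lines. This is only a sketch: the bot names are copied from this article as written, the user-agent tokens those crawlers actually send may differ, and the helper names `build_robots_txt` and `is_allowed` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Bot names as reported in this article. The user-agent tokens a given
# crawler really sends may differ, so check your own logs first.
BANNED_BOTS = ["aipbot", "BecomeBot", "Pbot"]


def build_robots_txt(banned):
    """Return robots.txt text that shuts each banned bot out of the whole site."""
    blocks = [f"User-agent: {name}\nDisallow: /\n" for name in banned]
    # Every other crawler may fetch everything (an empty Disallow allows all).
    blocks.append("User-agent: *\nDisallow:\n")
    return "\n".join(blocks)


robots_txt = build_robots_txt(BANNED_BOTS)

# Parse the generated text to confirm it says what we meant.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def is_allowed(agent, path="/"):
    """True if the parsed rules let this user agent fetch the path."""
    return parser.can_fetch(agent, path)


print(robots_txt)
for agent in BANNED_BOTS + ["Googlebot", "msnbot"]:
    print(agent, "allowed:", is_allowed(agent))
```

Keep in mind that robots.txt is only a request. Well-behaved crawlers obey it, but a bot that ignores the file has to be blocked at the server instead.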
The bottom line is that all engines seem to delay indexing of new domain names for at least thirty days. Google so far has delayed indexing THIS new domain for 60 days since first crawling it. AskJeeves has crawled thousands of pages while indexing none of them. MSN indexes faster than the other engines but requires a robots.txt file. Yahoo's Slurp has crawled on again, off again for 60 days, but Yahoo has indexed only six of the 15,000 or more pages crawled to date.
We seem to have settled that there is a clear indexing delay, but whether this site specifically is "Sandboxed" and whether delays apply universally is less clear. Many webmasters claim that they have been indexed fully within 30 days of first posting a new domain. We'd love to see others track spiders through new sites following launch and document their results publicly so that indexing and crawling behavior can be verified.
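For anyone who wants to track spiders on their own site, one possible approach is to tally crawler requests per day from the server access log. The sketch below assumes logs in the common "combined" format, with the user agent as the last quoted field. The sample lines and their user-agent strings are simplified, made-up examples, and `count_spider_hits` is a hypothetical helper, not the method used for this case study.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field (spider names from this article).
SPIDERS = ["Googlebot", "msnbot", "Slurp", "Teoma"]

# Captures the date and the last quoted field (the user agent) of a
# combined-format access log line.
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\].*"([^"]*)"\s*$')


def count_spider_hits(lines):
    """Return a Counter keyed by (date, spider) with requests per day."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        day, agent = match.groups()
        for spider in SPIDERS:
            if spider.lower() in agent.lower():
                hits[(day, spider)] += 1
                break
    return hits


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.1 - - [11/May/2005:10:00:00 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.2 - - [11/May/2005:10:05:00 -0700] "GET /a.html HTTP/1.1" 200 900 "-" "msnbot/1.0"',
    '192.0.2.1 - - [12/May/2005:09:00:00 -0700] "GET /b.html HTTP/1.1" 200 700 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.3 - - [12/May/2005:09:01:00 -0700] "GET /b.html HTTP/1.1" 200 700 "-" "Mozilla/4.0 (ordinary browser)"',
]

hits = count_spider_hits(sample)
for (day, spider), count in sorted(hits.items()):
    print(day, spider, count)
```

Crawl counts gathered this way say nothing about indexing. Pages indexed still have to be checked against each engine's own listings, as was done for the figures in this article.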
© Copyright July 18, 2005 Mike Banks Valentine
Mike Banks Valentine is a search engine optimization specialist who operates WebSite101 eCommerce Tutorial and will continue reporting on this case study chronicling search engine indexing of the Publish101 Article Resource.