What Are Search Engine Robots?
Search engine robots, sometimes called "spiders", "bots" or "crawlers", are programs used by search engines to explore the web, seeking out new pages and checking and updating the content of known pages. The information that a search engine uses to rank your pages, and every other page it holds in its database, was found by these spiders before being indexed and added to the database.
There are four reasons a robot might visit your pages:
- You submitted the URL to the search engine
- The engine knows about your pages and is checking to see if the content has changed
- The robot has followed an internal link to a new page you have recently uploaded
- The robot has followed an external link from another site that links to you
You might think that with such a major role to play in indexing the web, robots would be powerful and sophisticated animals. Well, you would be wrong! Robots are relatively simple programs with limited functionality, not unlike early browsers.
Robots don't understand, or have difficulty understanding:
- Frames
- Flash Movies
- Flash Intros
- Invalid Code
- JavaScript
- Image Maps
- Dynamically Generated URLs
- JavaScript Navigation
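To illustrate why JavaScript navigation is a problem, here is a minimal sketch of how a simple robot might collect links, assuming it only reads plain `<a href>` tags and never runs scripts. The `LinkExtractor` class, the `extract_links` helper and the sample page are made up for illustration; real search engine robots are not documented here and may behave differently.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from plain <a> tags, as a simple robot might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only ordinary anchor tags are recognised; scripts are never executed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: one plain link and one "link" that only works via JavaScript.
SAMPLE_PAGE = """
<html><body>
<a href="/products.html">Products</a>
<span onclick="window.location='/offers.html'">Offers</span>
</body></html>
"""


def extract_links(html):
    """Return the list of link targets a script-blind robot would find."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links(SAMPLE_PAGE)` finds the products page but not the offers page, because the second target exists only inside a script handler.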
When a robot arrives at your website, the first thing it does is check your robots.txt file, if you have one. This file tells robots which pages or directories you don't want indexed; these may be directories containing legacy pages or printer-friendly pages. A robot gathers as much information as it can about a page before following any links through to other pages.
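The robots.txt check can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content below is a made-up example that hides a legacy directory and a printer-friendly directory; the directory names and the `build_parser` and `allowed` helpers are assumptions for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /legacy/
Disallow: /print/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, user_agent, path):
    """Return True if the rules let this user agent fetch the path."""
    return parser.can_fetch(user_agent, path)
```

A well-behaved robot would call something like `allowed(parser, "Googlebot", "/legacy/old.html")` before requesting each page and skip the page when the answer is `False`. Nothing forces a robot to do this, which is why the file only works for robots that choose to honour it.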
Not all robots are friendly. Some malicious spiders are designed to scrape e-mail addresses that will later be used to send unsolicited spam e-mail.
If you have access to your server logs or a log statistics program, you will be able to see which robots visited your site, when they visited, which pages they visited and how often they visit. Some robots are easily spotted from their user agent names, such as "Googlebot", Google's spider.
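If you prefer to read the raw logs yourself, the idea can be sketched in a few lines of Python. This assumes your server writes Apache-style "combined" log lines, which may not match your setup. The sample log lines, the dates in them, and the `robot_visits` helper are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout: Apache-style "combined" log lines. Adjust for other formats.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Lower-case fragments of user agent names to look for.
ROBOT_TOKENS = ("googlebot", "slurp", "scooter")

# Made-up log lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2004:13:55:36 +0000] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.20 - - [10/Oct/2004:14:02:11 +0000] "GET /about.html HTTP/1.1" '
    '200 1840 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.30 - - [11/Oct/2004:03:17:45 +0000] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "Slurp.so/1.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)"',
    '192.0.2.10 - - [12/Oct/2004:09:40:02 +0000] "GET /contact.html HTTP/1.0" '
    '200 912 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]


def robot_visits(lines, tokens=ROBOT_TOKENS):
    """Group (time, path) pairs by the robot token found in the user agent."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        for token in tokens:
            if token in agent:
                visits[token].append((match.group("time"), match.group("path")))
                break
    return dict(visits)
```

Running `robot_visits(SAMPLE_LOG)` groups the requests by robot, so you can see when each one called and which pages it asked for. The ordinary browser request is ignored because its user agent contains none of the tokens.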
If you identify activity from these spiders in your server logs or log statistics program, your pages are probably about to be listed on that particular search engine. However, be patient: some search engines can take 3 to 6 months to update their databases.
The following information is intended to help you identify the search engine spiders and robots that visit your site, based on what you can see in your site's visitor log reports.
For information on blocking any of these robots using the robots.txt exclusion standard, see
http://www.robotstxt.org/wc/exclusion.html
| Company | Alta Vista |
| User Agent | Scooter-3.0.3 (Many variations. Most contain the word Scooter) |
| robots.txt Identifier | User-agent: Scooter |
| Details | |
| Company | Ask Jeeves |
| User Agent | Mozilla/2.0 (compatible; Ask Jeeves) |
| robots.txt Identifier | User-agent: directhit, User-agent: teomaagent1 |
| Details | |
| Company | Fast Search and Transfer ASA |
| User Agent | FAST-WebCrawler/3.4/Nirvana) AKA: Mozilla/4.0 (compatible; FastCrawler3, support-fastcrawler3@fast.no) |
| robots.txt Identifier | User-agent: fast |
| Details | http://fast.no/support/crawler.asp Powers Alltheweb.com, Lycos and many smaller search engines |
| Company | Google |
| User Agent | Googlebot/2.1 (+http://www.googlebot.com/bot.html) AKA: Wget/1.5.3 AKA: Googlebot-image (+http://www.googlebot.com/bot.html) |
| robots.txt Identifier | User-agent: googlebot |
| Details | http://www.googlebot.com/bot.html |
| Company | Inktomi |
| User Agent | Slurp (Slurp.so/1.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html) |
| robots.txt Identifier | User-agent: slurp |
| Details | http://www.inktomi.com/slurp.html Powers AOL, MSN and many others |
| Company | Lycos |
| User Agent | Lycos_Spider_(modspider) AKA: T-Rex |
| robots.txt Identifier | User-agent: lycos |
| Details | Lycos is powered by the search engine at Fast. We don't know why they continue to operate their own spider. |
| Company | Microsoft / MSN |
| User Agent | MSNbot / MSRbot |
| robots.txt Identifier | N/A |
| Details | MSNbot is the development bot for their new search engine. MSRbot is supposed to be a research bot. |
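As a rough sketch of how the table above could be used, the Python below looks up the robots.txt identifier for a user agent string seen in your logs and builds an exclusion record for it. The substrings chosen for matching are a guess based on the user agents listed above, and the `identifier_for` and `exclusion_record` helpers and the `/print/` path are made up for illustration.

```python
# Substring seen in a logged user agent -> robots.txt identifier from the table.
# The substrings are assumptions; user agent strings vary between versions.
ROBOTS_TXT_IDENTIFIERS = {
    "scooter": "Scooter",
    "fast-webcrawler": "fast",
    "fastcrawler": "fast",
    "googlebot": "googlebot",
    "slurp": "slurp",
    "lycos_spider": "lycos",
}


def identifier_for(user_agent):
    """Return the robots.txt identifier for a user agent, or None if unknown."""
    lowered = user_agent.lower()
    for token, identifier in ROBOTS_TXT_IDENTIFIERS.items():
        if token in lowered:
            return identifier
    return None


def exclusion_record(user_agent, disallowed_path):
    """Build a robots.txt record that keeps one robot out of one path."""
    identifier = identifier_for(user_agent)
    if identifier is None:
        return None
    return f"User-agent: {identifier}\nDisallow: {disallowed_path}\n"
```

For example, `exclusion_record("Scooter-3.0.3", "/print/")` produces a two-line record you could paste into robots.txt. Robots with no identifier in the table, such as MSNbot, return `None`.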