Thread: Google Robots.txt Specifications

  1. #1
    Mike-XS

    Hi, I spotted this in my travels recently; it might be of interest to some of you guys.

    Controlling Crawling and Indexing
    Controlling Crawling and Indexing - Google Code

    Robots.txt Specifications

    This document details how Google handles the robots.txt file that allows you to control how Google's website crawlers crawl and index publicly accessible websites.
    Robots.txt Specifications - Controlling Crawling and Indexing - Google Code

    -

    A little discussion:
    Google's Current Specifications for Robots Directives Sitemaps, Meta Data, and robots.txt

  2. Thanked by:

    TopDogger (29 November, 2010)

  3. #2
    TopDogger
    Just keep in mind that Google supports robots.txt extensions that many other spiders do not recognize, such as the Allow directive and the asterisk wildcard (*) in a directive's argument. These are not part of the original robots.txt standard.

    You should set up a Google section in robots.txt if you want to use the Google extensions. Once a Google section exists, Google ignores the directives written for other spiders elsewhere in the file, so you need to repeat in that section every area of your site that you do not want Google to crawl.
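
    A minimal sketch of that section behaviour, using Python's standard-library urllib.robotparser as a stand-in. It is not Google's parser, and the paths and the bot name OtherBot are made up for illustration; the point is only that a spider with its own section stops seeing the generic rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a generic section plus a Googlebot section
# that forgets to repeat the /private/ rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/

User-agent: Googlebot
Disallow: /tmp/
"""


def build_parser(text):
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# OtherBot (a made-up name) has no section of its own,
# so the generic "User-agent: *" rules apply to it.
other_private = parser.can_fetch("OtherBot", "/private/page.html")

# Googlebot matches its own section, which does not mention /private/,
# so the generic Disallow is not applied to it.
google_private = parser.can_fetch("Googlebot", "/private/page.html")
google_tmp = parser.can_fetch("Googlebot", "/tmp/cache.html")

print(other_private, google_private, google_tmp)
```

    With this file, /private/ stays closed to OtherBot but is open to Googlebot until the Disallow line is repeated in the Googlebot section.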

    Examples. The first example (from the Google page) is actually incorrect because not all spiders recognize the Allow directive. Yahoo recognizes it, but it looks like Bing does not. This is a good example of Google viewing the Internet through a mirror.

    Code:
    User-agent: *
    Allow: /
    Code:
    User-agent: Googlebot
    Disallow: /*.php
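
    To see why the Disallow: /*.php rule above only works for spiders that understand wildcards, here is a rough sketch comparing two matchers. The function names are made up for illustration. The first follows Google's documented pattern rules ('*' matches any run of characters, a trailing '$' anchors the end of the URL); the second assumes that a spider without the extension does a plain prefix comparison, which is an assumption about such spiders, not a documented fact about any one of them.

```python
import re


def google_style_match(pattern, path):
    """Match a path against a rule argument with Google-style wildcards:
    '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start, so an unanchored rule acts as a prefix.
    return re.match(regex, path) is not None


def prefix_only_match(pattern, path):
    """Match the way a spider without wildcard support is assumed to:
    the argument is treated as a literal path prefix."""
    return path.startswith(pattern)


RULE = "/*.php"  # argument of the Disallow line above

wildcard_hit = google_style_match(RULE, "/index.php")
wildcard_miss = google_style_match(RULE, "/index.html")
prefix_hit = prefix_only_match(RULE, "/index.php")

print(wildcard_hit, wildcard_miss, prefix_hit)
```

    Under the prefix-only assumption the rule blocks nothing useful, because no real path starts with the literal characters /*.php.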

  4. #3
    iowadawg
    I was wondering why, all of a sudden, I am getting a lot of requests for robots.txt on some of my sites.
    I don't have a robots.txt, though.

    Should I really have one?
