Thread: robots.txt help

  1. #1
    Sami4u is offline Butterflies Forever
    Join Date
    Sep 2009
    Location
    USA
    Posts
    1,421
    Thanks
    695
    Thanked 292 Times in 220 Posts

    robots.txt help

    Hi,

    Here is the robots.txt that I am using on my sites.

    Is there anything that you would add or change, and why?

    Code:
     
    Sitemap: http://britneyspearsfans.info/sitemap.xml
     
    User-agent: ADSAComponent
    Disallow: /
    User-agent: Alexibot
    Disallow: /
    User-agent: Aqua_Products
    Disallow: /
    User-agent: BackDoorBot
    Disallow: /
    User-agent: BecomeBot
    Disallow: /
    User-agent: BlowFish
    Disallow: /
    User-agent: Bookmark search tool
    Disallow: /
    User-agent: BotALot
    Disallow: /
    User-agent: BruinBot
    Disallow: /
    User-agent: BuiltBotTough
    Disallow: /
    User-agent: Bullseye
    Disallow: /
    User-agent: BunnySlippers
    Disallow: /
    User-agent: CheeseBot
    Disallow: /
    User-agent: CazoodleBot
    Disallow: /
    User-agent: CherryPicker
    Disallow: /
    User-agent: CherryPickerElite
    Disallow: /
    User-agent: CherryPickerSE
    Disallow: /
    User-agent: Copernic
    Disallow: /
    User-agent: CopyRightCheck
    Disallow: /
    User-agent: Crescent
    Disallow: /
    User-agent: DittoSpyder
    Disallow: /
    User-agent: DomainsDB.net
    Disallow: /
    User-agent: EmailCollector
    Disallow: /
    User-agent: EmailSiphon
    Disallow: /
    User-agent: EmailWolf
    Disallow: /
    User-agent: Enterprise_Search
    Disallow: /
    User-agent: EroCrawler
    Disallow: /
    User-agent: ExtractorPro
    Disallow: /
    User-agent: FairAd Client
    Disallow: /
    User-agent: Flaming AttackBot
    Disallow: /
    User-agent: Foobot
    Disallow: /
    User-agent: FreeFind
    Disallow: /
    User-agent: Gaisbot
    Disallow: /
    User-agent: GetRight
    Disallow: /
    User-agent: Harvest
    Disallow: /
    User-agent: Hatena Antenna
    Disallow: /
    User-agent: ia_archiver
    Disallow: /
    User-agent: InfoNaviRobot
    Disallow: /
    User-agent: Iron33
    Disallow: /
    User-agent: JennyBot
    Disallow: /
    User-agent: Jetbot
    Disallow: /
    User-agent: Kenjin Spider
    Disallow: /
    User-agent: Keyword Density
    Disallow: /
    User-agent: LNSpiderguy
    Disallow: /
    User-agent: LexiBot
    Disallow: /
    User-agent: LinkScan
    Disallow: /
    User-agent: LinkWalker
    Disallow: /
    User-agent: LinkextractorPro
    Disallow: /
    User-agent: MIIxpc
    Disallow: /
    User-agent: Mata Hari
    Disallow: /
    User-agent: Microsoft URL Control
    Disallow: /
    User-agent: Mister PiX
    Disallow: /
    User-agent: NICErsPRO
    Disallow: /
    User-agent: NPBot
    Disallow: /
    User-agent: NetAnts
    Disallow: /
    User-agent: NetMechanic
    Disallow: /
    User-agent: Nutch
    Disallow: /
    User-agent: Offline Explorer
    Disallow: /
    User-agent: OmniExplorer_Bot
    Disallow: /
    User-agent: Openbot
    Disallow: /
    User-agent: Openfind
    Disallow: /
    User-agent: Oracle Ultra Search
    Disallow: /
    User-agent: PerMan
    Disallow: /
    User-agent: ProPowerBot
    Disallow: /
    User-agent: ProWebWalker
    Disallow: /
    User-agent: Python-urllib
    Disallow: /
    User-agent: QueryN Metasearch
    Disallow: /
    User-agent: RMA
    Disallow: /
    User-agent: Radiation Retriever
    Disallow: /
    User-agent: RepoMonkey
    Disallow: /
    User-agent: SBIder
    Disallow: /
    User-agent: SiteSnagger
    Disallow: /
    User-agent: SpankBot
    Disallow: /
    User-agent: Stanford
    Disallow: /
    User-agent: Stanford Comp Sci
    Disallow: /
    User-agent: SurveyBot
    Disallow: /
    User-agent: Swooglebot
    Disallow: /
    User-agent: Szukacz
    Disallow: /
    User-agent: Teleport
    Disallow: /
    User-agent: TeleportPro
    Disallow: /
    User-agent: Telesoft
    Disallow: /
    User-agent: The Intraformant
    Disallow: /
    User-agent: TheNomad
    Disallow: /
    User-agent: True_Robot
    Disallow: /
    User-agent: TurnitinBot
    Disallow: /
    User-agent: URL Control
    Disallow: /
    User-agent: URL_Spider_Pro
    Disallow: /
    User-agent: URLy Warning
    Disallow: /
    User-agent: VCI
    Disallow: /
    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /
    User-agent: WWW-Collector-E
    Disallow: /
    User-agent: Web Image Collector
    Disallow: /
    User-agent: WebAuto
    Disallow: /
    User-agent: WebBandit
    Disallow: /
    User-agent: WebCopier
    Disallow: /
    User-agent: WebEnhancer
    Disallow: /
    User-agent: WebSauger
    Disallow: /
    User-agent: WebStripper
    Disallow: /
    User-agent: WebVac
    Disallow: /
    User-agent: WebZip
    Disallow: /
    User-agent: Website Quester
    Disallow: /
    User-agent: Webster Pro
    Disallow: /
    User-agent: Wget
    Disallow: /
    User-agent: Xenu's
    Disallow: /
    User-agent: Zeus
    Disallow: /
    User-agent: Zeus Link Scout
    Disallow: /
    User-agent: asterias
    Disallow: /
    User-agent: b2w/0.1
    Disallow: /
    User-agent: cosmos
    Disallow: /
    User-agent: dumbot
    Disallow: /
    User-agent: es
    Disallow: /
    User-agent: grub
    Disallow: /
    User-agent: grub-client
    Disallow: /
    User-agent: hloader
    Disallow: /
    User-agent: httplib
    Disallow: /
    User-agent: humanlinks
    Disallow: /
    User-agent: larbin
    Disallow: /
    User-agent: libWeb/clsHTTP
    Disallow: /
    User-agent: lwp-trivial
    Disallow: /
    User-agent: moget
    Disallow: /
    User-agent: naver
    Disallow: /
    User-agent: polybot
    Disallow: /
    User-agent: psbot
    Disallow: /
    User-agent: ru-robot
    Disallow: /
    User-agent: searchpreview
    Disallow: /
    User-agent: sootle
    Disallow: /
    User-agent: spanner
    Disallow: /
    User-agent: suzuran
    Disallow: /
    User-agent: toCrawl/UrlDispatcher
    Disallow: /
    User-agent: turingos
    Disallow: /
    User-agent: whowhere
    Disallow: /
    User-agent: Googlebot-Image
    Disallow: /
    User-agent: Yahoo-MMCrawler
    Disallow: /
    User-agent: msnbot
    Disallow: /*.doc$
    Disallow: /*.PDF$
    Disallow: /*.jpeg$
    Disallow: /*.jpg$
    Disallow: /*.png$
    Disallow: /*.gif$
    Disallow: /*.exe$
    Disallow: /*.mp3$
    Disallow: /*.mid$
    Disallow: /*.wav$
    Disallow: /*.swf$
    User-agent: *
    Disallow: /admin/
    Disallow: /axs/
    Disallow: /carp/
    Disallow: /cgi-bin/
    Disallow: /generator/
    Disallow: /guardian/
    Disallow: /images/
    Disallow: /inc/
    Disallow: /news/
    Disallow: /rss/
    Disallow: /spamtrap/
    Disallow: /config.php
    Disallow: /carpsetup.php
    Disallow: /*.doc$
    Disallow: /*.PDF$
    Disallow: /*.jpeg$
    Disallow: /*.jpg$
    Disallow: /*.png$
    Disallow: /*.gif$
    Disallow: /*.exe$
    Disallow: /*.mp3$
    Disallow: /*.mid$
    Disallow: /*.wav$
    Disallow: /*.swf$
    #Crawl-delay: 30

    Thanks in advance.

    Butterflies Forever

    Sami
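
A note on the `/*.PDF$` style rules in the file above: `*` and `$` are extensions understood by the major search engines, not part of the original robots.txt convention, and path matching is generally case-sensitive. The sketch below is only an approximation of that matching, written to show how such a rule behaves. The helper names `rule_to_regex` and `rule_matches` and the sample paths are made up for illustration.

```python
import re

def rule_to_regex(rule):
    """Translate a Disallow path using the `*` and `$` extensions into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal text, then let each `*` match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def rule_matches(rule, path):
    # re.match anchors at the start of the path, which gives prefix matching.
    return rule_to_regex(rule).match(path) is not None

print(rule_matches("/*.PDF$", "/docs/report.PDF"))   # True
print(rule_matches("/*.PDF$", "/docs/report.pdf"))   # False: matching is case-sensitive
print(rule_matches("/*.jpg$", "/images/a.jpg?x=1"))  # False: `$` anchors the end
print(rule_matches("/admin/", "/admin/login.php"))   # True: plain prefix match
```

Under these assumptions, `Disallow: /*.PDF$` would not cover files ending in lowercase `.pdf`, so it may be worth listing both spellings if both exist on the site.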

  2. #2
    Snak3 is offline Moderator
    Join Date
    Jul 2009
    Location
    Undisclosed Location
    Posts
    629
    Thanks
    155
    Thanked 190 Times in 121 Posts
    I doubt malicious bots will even bother to check robots.txt, let alone follow its instructions.

  3. #3
    Sami4u
    Originally posted by Snak3:
    I doubt malicious bots will even bother to check robots.txt, let alone follow its instructions.
    Hi,

    So what you are saying is that this is worthless?

    I was hoping not.

    Butterflies Forever

    Sami

  4. #4
    Will.Spencer is offline Retired
    Join Date
    Dec 2008
    Posts
    5,033
    Blog Entries
    1
    Thanks
    1,010
    Thanked 2,329 Times in 1,259 Posts
    Originally posted by Snak3:
    I doubt malicious bots will even bother to check robots.txt, let alone follow its instructions.
    But sometimes they do, and robots.txt is nearly-free protection.

  5. #5
    Kovich is offline Community Guardian
    Join Date
    Jan 2009
    Location
    Philadelphia, Pennsylvania
    Posts
    1,797
    Blog Entries
    30
    Thanks
    453
    Thanked 420 Times in 279 Posts
    Exactly! It never hurts to make the robots.txt file disallow malicious bots - as Will said, it's nearly-free protection.

    It will definitely not stop all malicious bots, but it could stop a few, and that's always better than none at all.
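
The replies above turn on robots.txt being advisory: it only affects clients that read the file and choose to obey it. As a rough sketch of what a well-behaved crawler does before each request, here is a check using Python's standard-library `urllib.robotparser`. The excerpt is a shortened stand-in for the file in post #1, and `example.com`, `SomeBot`, and `may_fetch` are placeholders invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt in the style of the file from post #1 (not the full file).
EXCERPT = """\
User-agent: Wget
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
"""

def may_fetch(user_agent, path):
    """Return True if this parser thinks user_agent may request path."""
    parser = RobotFileParser()
    parser.parse(EXCERPT.splitlines())
    return parser.can_fetch(user_agent, "http://example.com" + path)

print(may_fetch("Wget", "/index.html"))          # False: Wget is blocked everywhere
print(may_fetch("SomeBot", "/index.html"))       # True: falls under the * group
print(may_fetch("SomeBot", "/admin/login.php"))  # False: /admin/ is disallowed
```

A client that never runs a check like this is unaffected by the file, which is why blocking badly behaved bots reliably takes server-side measures as well.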

  6. #6
    Sami4u
    Hi,

    Thanks, all, but I still have the same question.

    Is there anything that you would add or change, and why?

    Is it missing any bots that are out there? If so, which ones?

    Butterflies Forever

    Sami

  7. #7
    Aziz is offline no investment, no glory
    Join Date
    May 2009
    Location
    IL
    Posts
    736
    Thanks
    588
    Thanked 243 Times in 168 Posts
    Disallow all agents (*), then allow only the bots you'd like to index your pages.

    Example:

    Code:
    User-agent: *
    Disallow: /

    User-agent: Googlebot
    Allow: /

    etc.
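
To see how a parser reads an allowlist file like the one in this post, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `example.com` URL, the `SomeOtherBot` name, and the `is_allowed` helper are placeholders, not anything from the thread. Real crawlers may resolve groups differently, so treat the result as what this one parser decides.

```python
from urllib.robotparser import RobotFileParser

# Allowlist-style robots.txt: block everyone, then open the site to one named bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

def is_allowed(robots_txt, user_agent, url):
    """Parse robots_txt and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/page.html"))     # True
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/page.html"))  # False
```

The named group wins for Googlebot even though the `*` group comes first, because this parser only falls back to `*` when no specific group matches.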

  8. Thanked by:

    Sami4u (27 September, 2009)

  9. #8
    Sami4u
    Hi,

    Originally posted by Aziz:
    Disallow all agents (*), then allow only the bots you'd like to index your pages.

    example:

    Code:
    User-agent: *
    Disallow: /
    User-agent: Googlebot
    Allow: /
    etc
    So do I have that part wrong near the bottom of my file, or is it right?

    Thanks

    Butterflies Forever

    Sami

  10. #9
    Kovich
    Originally posted by Aziz:
    Disallow all agents (*), then allow only the bots you'd like to index your pages.

    example:

    Code:
    User-agent: *
    Disallow: /
    User-agent: Googlebot
    Allow: /
    etc
    That could be a bad idea because if you forget to include a legitimate bot, you're denying yourself traffic and exposure.
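
One way to guard against the risk described above is to test an allowlist file against the crawlers you care about before deploying it. This is a sketch under assumptions: the `WANTED_CRAWLERS` list is only an illustration of user-agent tokens a site owner might want, `blocked_crawlers` is a made-up helper, and Python's `urllib.robotparser` may not decide exactly as each real crawler does.

```python
from urllib.robotparser import RobotFileParser

# Allowlist file that names only one bot, as in the quoted example.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

# Illustrative list of crawlers the site owner wants; adjust to your own needs.
WANTED_CRAWLERS = ["Googlebot", "Slurp", "msnbot"]

def blocked_crawlers(robots_txt, crawlers, url="http://example.com/"):
    """Return the crawlers from the list that robots_txt would shut out of url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in crawlers if not parser.can_fetch(ua, url)]

print(blocked_crawlers(ALLOWLIST_ROBOTS_TXT, WANTED_CRAWLERS))  # ['Slurp', 'msnbot']
```

Any name that shows up in the result is a bot you meant to allow but forgot to list.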

  11. #10
    Aziz
    You might be right, but it's an option.
