Thread: how to block all spider ?

  1. #1
    SonnyCooL

    how to block all spider ?

    Guys, how do I block all spiders with .htaccess?
    Yes, everyone, including all the major search engines, because I have set up a few live test sites with a lot of content and would prefer not to show them to any bot.

    I tried the WordPress setting that blocks all search engines, but Google still indexes the site, showing the title with no description.

    thanks

  2. #2
    anantshri

    I would suggest adding entries to your .htaccess file.

    I wrote an article for a similar task, though not for exactly all bots.

    However, with a bit of tweaking this should work.

    htaccess based spamBot and Leacher Blocking Code | Anant Shrivastava : Techno Enthusiast

  3. #3
    TopDogger

    Bad spiders will ignore the robots.txt file. Only legitimate spiders use the robots.txt.

    For bad spiders, the best solution is to forcefully block them using the .htaccess file.

    You can use the robots.txt file to block Google, Yahoo, Bing/MSN and others. Just use the following in the file.

    Code:
    User-agent: * 
    Disallow: /

  4. #4
    iowadawg

    Popup blocker?

    Hahahahhahaha....

    Thanks, I needed a laugh!

  5. #5
    Leftfield

    @SonnyCooL

    You need to use robots.txt as TopDogger wrote. Then add this meta tag to your pages:
    <meta name="robots" content="noindex, nofollow">

    The "nofollow" is not necessary in your case; you can replace it with "follow". For bad spiders, use your .htaccess to block them.
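
    The same robots directive can also be sent as an HTTP response header from .htaccess, which covers non-HTML files such as images and PDFs as well. A minimal sketch, assuming mod_headers is enabled on the server:

    ```apache
    <IfModule mod_headers.c>
        # Ask compliant crawlers not to index or follow anything served from this directory
        Header set X-Robots-Tag "noindex, nofollow"
    </IfModule>
    ```

    A crawler has to be able to fetch a page before it can see either the meta tag or this header, so a URL that is disallowed in robots.txt may still be listed by title alone if other sites link to it.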

  6. Thanked by:

    SonnyCooL (3 February, 2011)

  7. #6
    Mike-XS

    Hey Sonny .. If your IP address is static, then you could set up a rule in a .htaccess file to allow only your IP. Only problem is that if your IP is dynamic then you would have to update the rule each time it changes.

    This will automatically block everyone else, and you could also redirect them to a coming soon / maintenance page until the site is ready.

    Allow only your IP and give everyone else a 403 Forbidden.

    Code:
    RewriteEngine On
    # Everyone except 111.222.33.4 gets 403 Forbidden
    RewriteCond %{REMOTE_ADDR} !^111\.222\.33\.4$
    RewriteRule ^(.*)$ - [F,L]
    Allow only your IP and 302 redirect everyone else to a temporary holding page such as index.html.

    Code:
    RewriteEngine On
    # Everyone except 111.222.33.4 is sent to the holding page
    RewriteCond %{REMOTE_ADDR} !^111\.222\.33\.4$
    RewriteCond %{REQUEST_URI} !/index\.html$ [NC]
    RewriteRule ^(.*)$ /index.html [R=302,L]
    There are far too many bots to block them all by name in .htaccess. You would have better results using some kind of anti-bot script.
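
    On Apache 2.4 and later, the allow-one-IP rule can also be written without mod_rewrite. A minimal sketch, assuming mod_authz_core is loaded and the host permits AuthConfig overrides in .htaccess (111.222.33.4 is the same placeholder address as in the rules above):

    ```apache
    # Apache 2.4+: only 111.222.33.4 is allowed; every other client gets 403 Forbidden
    Require ip 111.222.33.4
    ```

    Because this matches the address itself rather than a regular expression, there is no pattern to anchor or escape.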

  8. Thanked by:

    SonnyCooL (3 February, 2011)

  9. #7
    canadaimmigration

    After a long time, I have come across a very good question. There are two methods to block spiders through .htaccess.

    1. Use the SetEnvIfNoCase directive in combination with FilesMatch.

    SetEnvIfNoCase user-agent "^Custo" bad_bot=1
    SetEnvIfNoCase user-agent "^Bot\ mailto:craftbot@yahoo.com" bad_bot=1
    <FilesMatch "(.*)">
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </FilesMatch>
    Just define all the bad bots with SetEnvIfNoCase.


    2. ModRewrite Method

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
    # No [OR] on the last condition, otherwise the chain is left open
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo\.com
    RewriteRule ^.* - [F,L]
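
    Method 1 uses the Order/Allow/Deny directives from Apache 2.2, which Apache 2.4 only supports through mod_access_compat. A rough 2.4 equivalent might look like the sketch below, assuming mod_setenvif and mod_authz_core are loaded; the two user-agent patterns are the ones from method 1:

    ```apache
    # Flag unwanted user agents (case-insensitive match on the User-Agent header)
    SetEnvIfNoCase User-Agent "^Custo" bad_bot
    SetEnvIfNoCase User-Agent "^Bot\ mailto:craftbot@yahoo\.com" bad_bot

    # Apache 2.4+: grant access to everyone except requests flagged above
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
    ```

    Either way, this only stops bots that send a recognisable User-Agent string; a bot that fakes a browser user agent will get through.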
