
Thread: Block Web Content Scrapers and Downloaders

    I have found Bot Trap to be extremely effective in blocking the sneakiest of the web scraper bots.

    Bot Trap works by placing a hidden link on your homepage. That link is visible only in the page's HTML source, not in the rendered page. Ergo, the only things that should see and follow that link are web robots. But the link's target is disallowed in robots.txt, so polite robots will never try to follow it.
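    A minimal Python sketch of the trap mechanism, assuming a hypothetical trap path `/bot-trap/` and hypothetical hidden-link markup (Bot Trap's actual path and HTML may differ). It uses the standard library's `urllib.robotparser` to show that a robots.txt-respecting crawler would skip the trap URL while still fetching ordinary pages, so only robots that ignore robots.txt ever request it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap location; the real Bot Trap script may use another path.
TRAP_PATH = "/bot-trap/"

# robots.txt rules telling every polite robot to stay out of the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Hypothetical hidden link: it has no visible text and is hidden with CSS,
# so humans never see it, but it is present in the page's HTML source.
HIDDEN_LINK = f'<a href="{TRAP_PATH}" style="display:none"></a>'


def polite_robot_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that obeys ROBOTS_TXT would fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(polite_robot_may_fetch("https://example.com/bot-trap/"))   # False
    print(polite_robot_may_fetch("https://example.com/index.html"))  # True
```

    Any client that requests `/bot-trap/` has therefore ignored robots.txt, which is what makes blocking it reasonably safe.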

    Robots that do follow that link are automatically added as Deny statements to your site's .htaccess file.
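    A sketch of the blocking step, assuming the script simply appends Apache `Deny from` lines to the .htaccess text. `add_deny` is a hypothetical helper, not Bot Trap's own code; it works on a string, so reading, locking and writing the real file are left out.

```python
import ipaddress


def add_deny(htaccess_text: str, ip: str) -> str:
    """Return htaccess_text with a 'Deny from <ip>' line appended.

    The address is validated first (ValueError on malformed input), and the
    text is returned unchanged if that exact Deny line is already present.
    """
    addr = ipaddress.ip_address(ip)
    deny_line = f"Deny from {addr}"
    if deny_line in (line.strip() for line in htaccess_text.splitlines()):
        return htaccess_text
    if htaccess_text and not htaccess_text.endswith("\n"):
        htaccess_text += "\n"
    return htaccess_text + deny_line + "\n"


if __name__ == "__main__":
    rules = "Order Deny,Allow\n"
    rules = add_deny(rules, "203.0.113.7")
    rules = add_deny(rules, "203.0.113.7")  # second call is a no-op
    print(rules, end="")
```

    Validating the address before writing keeps a forged client value from injecting arbitrary directives into .htaccess, and the duplicate check stops a robot that hits the trap repeatedly from bloating the file.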

    Sometimes legitimate web robots get out of sync with your robots.txt and follow the link anyway, so if you want the script to run unattended, I recommend that you whitelist them in your .htaccess file.

    Here's my current whitelist:
    Allow from
    Allow from      # Google
    Allow from 65.55                # MSN
    Allow from 207.46               # MSN
    Allow from 66.249               # Google
    Allow from 67.195               # Yahoo!
    Allow from       # Google
    Allow from 72.30                # Yahoo!
    Allow from 74.6                 # Yahoo!
    Allow from        # Google
    Allow from       # Baidu
    Allow from 202.160              # Yahoo!
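    Apache treats a partial address such as `Allow from 66.249` as a match on the leading whole octets. The Python sketch below mirrors that rule so a blocking script could skip whitelisted crawlers before writing any Deny line. The table copies only the entries whose prefixes appear in full in the whitelist above; entries with missing addresses are left out rather than guessed, and `whitelisted_as` is a hypothetical helper name.

```python
from typing import Optional

# Prefixes copied from the whitelist in the post; entries whose
# addresses are missing there are deliberately omitted.
WHITELIST = {
    "65.55": "MSN",
    "207.46": "MSN",
    "66.249": "Google",
    "67.195": "Yahoo!",
    "72.30": "Yahoo!",
    "74.6": "Yahoo!",
    "202.160": "Yahoo!",
}


def whitelisted_as(ip: str) -> Optional[str]:
    """Return the crawler name if ip starts with a whitelisted prefix.

    Comparison is by whole octets, like Apache's partial-IP matching, so
    "74.6" matches 74.6.x.x but not 74.60.x.x.
    """
    octets = ip.split(".")
    for prefix, owner in WHITELIST.items():
        wanted = prefix.split(".")
        if octets[: len(wanted)] == wanted:
            return owner
    return None


if __name__ == "__main__":
    print(whitelisted_as("66.249.66.1"))  # Google
    print(whitelisted_as("74.60.1.1"))    # None
```

    Comparing octet lists rather than raw string prefixes is the important choice here: a plain `startswith("74.6")` would wrongly whitelist 74.60.x.x through 74.69.x.x.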
    Bot Trap blocks people who steal your content to place on MFA (made-for-AdSense) sites, and it blocks people who download entire websites for offline reading. I constantly see people trying to download my 50,000-page websites, which is a complete waste of bandwidth.

    Bot Trap is friendly, though. Blocked users see a message telling them that they are blocked, and they only have to enter the word "access" into a form to be unblocked automatically.
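    A sketch of the self-service unblock, assuming the form handler receives the visitor's IP address and the submitted word, and that unblocking means deleting that visitor's `Deny from` line. `unblock` is a hypothetical helper; trimming whitespace and ignoring case in the answer are assumptions, not documented Bot Trap behavior.

```python
from typing import Tuple

UNLOCK_WORD = "access"  # the word the post says users must enter


def unblock(htaccess_text: str, ip: str, answer: str) -> Tuple[str, bool]:
    """Remove 'Deny from <ip>' if the visitor typed the unlock word.

    Returns the (possibly updated) .htaccess text and whether a Deny line
    was actually removed. Trimming and case-folding the answer is an
    assumption, not documented Bot Trap behavior.
    """
    if answer.strip().lower() != UNLOCK_WORD:
        return htaccess_text, False
    target = f"Deny from {ip}"
    lines = htaccess_text.splitlines()
    kept = [line for line in lines if line.strip() != target]
    if len(kept) == len(lines):
        return htaccess_text, False
    return "".join(line + "\n" for line in kept), True
```

    Returning the "removed" flag lets the caller decide whether to write the file and send a notification only when something actually changed.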

    Every blocking or unblocking action generates an email to the site admin.
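    A sketch of the admin notification using the standard library's `email.message.EmailMessage`. The subject format, body text and `bottrap@<site>` sender address are hypothetical; the function only builds the message and does not send it.

```python
from email.message import EmailMessage


def build_notice(action: str, ip: str, site: str, admin: str) -> EmailMessage:
    """Build (but do not send) an admin notification for a block/unblock."""
    msg = EmailMessage()
    msg["Subject"] = f"[{site}] {ip} {action}"
    msg["From"] = f"bottrap@{site}"  # hypothetical sender address
    msg["To"] = admin
    msg.set_content(f"Bot Trap {action} {ip} on {site}.\n")
    return msg
```

    Assuming a mail server is listening on the same host, the message could be sent with `with smtplib.SMTP("localhost") as smtp: smtp.send_message(msg)`.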

    It's really a beautiful script.
    Last edited by Will.Spencer; 25 December, 2010 at 02:05 AM.

