
Thread: Block Web Content Scrapers and Downloaders

  1. I received a notification from bot-trap that Microsoft is using a new crawling address for Bingbot:
    A bad robot hit /bot-trap/index.php 2010-12-24 (Fri) 18:03:52
    address is 207.46.12.240, hostname is msnbot-207-46-12-240.search.msn.com, agent is Mozilla/5.0 (compatible; bingbot/2.0; +Bing Webmaster Center)

    207.46.12.240 is in the network range 207.46.0.0 - 207.46.255.255, which is 207.46.0.0/16 in CIDR notation.

    If you use bot-trap, you'll want to add a line like this to your .htaccess file:
    Allow from 207.46
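As a quick sanity check on the range math above, here is a sketch using Python's standard ipaddress module to confirm the crawler address really does fall inside that /16:

```python
import ipaddress

# bot-trap reported this crawler address:
addr = ipaddress.ip_address("207.46.12.240")

# The range 207.46.0.0 - 207.46.255.255 expressed in CIDR notation:
net = ipaddress.ip_network("207.46.0.0/16")

print(addr in net)        # True: the address is inside the /16
print(net.num_addresses)  # 65536 addresses in a /16
```

Note that `Allow from 207.46` uses Apache's partial-IP syntax, which covers the same /16 as the CIDR form.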
    Submit Your Webmaster Related Sites to the NB Directory
    I swear, by my life and my love of it, that I will never live for the sake of another man, nor ask another man to live for mine.

  2. #32
Does it matter whether I use the CIDR notation or just 207.46?

    Does one work any better than the other?

    I already added the CIDR notation.

I have noticed that bad spiders sometimes slip past Deny statements that use CIDR and end up getting caught by bot-trap. If the CIDR Deny statements were working, those spiders would never have the opportunity to reach bot-trap.

bot-trap snagged 775 bad spiders on my sites in the last two months.
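For reference, Deny statements using CIDR in Apache 2.2-style .htaccess look like this (a sketch; the ranges below are illustrative placeholders, not a vetted bot list):

```apache
# Apache 2.2 access control: with "Order Allow,Deny",
# Deny directives are evaluated last, so a matching Deny wins.
Order Allow,Deny
Allow from all
Deny from 77.88.0.0/18
Deny from 93.158.128.0/18
```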
Last edited by TopDogger; 25 December, 2010 at 21:02.
    "Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote." -- Benjamin Franklin


  3. I wasn't sure that CIDR worked in .htaccess, but I found How to use a CIDR netmask to block an IP address range in .htaccess so I guess it does.

  4. #34
    CIDR does appear to work about 99% of the time. I used it to block the multiple ranges of Yandex bots that were ignoring the robots.txt file and killing my bandwidth. However, every once in a while a Yandex spider appears to slip through.


  5. Microsoft is making more changes:
    Code:
    A bad robot hit /bot-trap/index.php 2010-12-26 (Sun) 07:23:51 
    address is 157.55.116.57, hostname is msnbot-157-55-116-57.search.msn.com, agent is msnbot/2.0b (+http://search.msn.com/msnbot.htm).
WHOIS gives us three CIDR ranges for that IP address:
    Allow from 157.54.0.0/15
    Allow from 157.56.0.0/14
    Allow from 157.60.0.0/16
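To see which of those three WHOIS ranges actually covers the new crawler address, a quick check with Python's standard ipaddress module:

```python
import ipaddress

addr = ipaddress.ip_address("157.55.116.57")
ranges = ["157.54.0.0/15", "157.56.0.0/14", "157.60.0.0/16"]

# Which of the three WHOIS ranges contains the new crawler address?
matches = [r for r in ranges if addr in ipaddress.ip_network(r)]
print(matches)  # ['157.54.0.0/15']: a /15 spans 157.54.0.0 - 157.55.255.255
```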

  6. #36
How do I block browser scrapers? Since they use dynamic IPs, there's basically not much I can do other than manual monitoring, right?

They're generating 10 MB of inbound traffic and so much latency that my server almost stops...

  7. Quote Originally Posted by SonnyCooL View Post
How do I block browser scrapers? Since they use dynamic IPs, there's basically not much I can do other than manual monitoring, right?
    bot-trap finds the scrapers at each new IP. Give it a try. Seriously.

  8. #38
    Quote Originally Posted by Will.Spencer View Post
    bot-trap finds the scrapers at each new IP. Give it a try. Seriously.
But those dynamic IPs will be reused by other humans sooner or later.

OK, I saw the CAPTCHA-for-humans feature; I'll try it out later with my limited knowledge.

  9. Quote Originally Posted by SonnyCooL View Post
But those dynamic IPs will be reused by other humans sooner or later.
    If you're worried, you can clean the banned IP addresses out of .htaccess once a month or so.
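A monthly cleanup like that can be scripted. Here is a minimal sketch, assuming the bans bot-trap appends look exactly like `Deny from 1.2.3.4` (one IP per line); adjust the pattern if your setup differs:

```python
import re

def prune_banned_ips(htaccess_text: str) -> str:
    """Remove single-IP 'Deny from a.b.c.d' lines while keeping
    CIDR and partial-IP Deny lines, and everything else, intact."""
    single_ip = re.compile(r"^\s*Deny from \d{1,3}(?:\.\d{1,3}){3}\s*$")
    kept = [line for line in htaccess_text.splitlines()
            if not single_ip.match(line)]
    return "\n".join(kept) + "\n"

sample = ("Order Allow,Deny\n"
          "Allow from all\n"
          "Deny from 1.2.3.4\n"        # single banned IP: pruned
          "Deny from 77.88.0.0/18\n")  # CIDR range: kept
print(prune_banned_ips(sample))
```

Back up .htaccess before running anything like this against it.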

  10. #40
This looks like an interesting piece of software; I will try it on my website.

In the meantime, I generally block spam bots and leechers using user-agent strings through .htaccess.

    htaccess based spamBot and Leacher Blocking Code | Anant Shrivastava : Techno Enthusiast
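The user-agent approach can be sketched like this in Apache 2.2-style .htaccess (the agent strings below are illustrative examples, not a maintained blocklist):

```apache
# Tag requests whose User-Agent matches a known scraper or leecher,
# then deny anything carrying that tag.
SetEnvIfNoCase User-Agent "HTTrack"     bad_bot
SetEnvIfNoCase User-Agent "WebCopier"   bad_bot
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Keep in mind that user-agent strings are trivially spoofed, which is why traps like bot-trap catch things this method misses.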
