
Thread: Block Web Content Scrapers and Downloaders

  1. #31
    Will.Spencer's Avatar
    Will.Spencer is offline Retired
    Join Date
    Dec 2008
    Posts
    5,034
    Blog Entries
    1
    Thanks
    1,010
    Thanked 2,329 Times in 1,259 Posts
    I received a notification from bot-trap that Microsoft is using a new crawling address for Bingbot:
    A bad robot hit /bot-trap/index.php 2010-12-24 (Fri) 18:03:52
    address is 207.46.12.240, hostname is msnbot-207-46-12-240.search.msn.com, agent is Mozilla/5.0 (compatible; bingbot/2.0; +Bing Webmaster Center)

    207.46.12.240 is in the network range 207.46.0.0 - 207.46.255.255, which is 207.46.0.0/16 in CIDR notation.

    If you use bot-trap, you'll want to add a line like this to your .htaccess file:
    Allow from 207.46
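
    The range arithmetic above can be double-checked with a short script. This is only a sketch using Python's standard ipaddress module; the address and network come from the post, and the helper name describe is made up for illustration.

```python
import ipaddress

# Values taken from the bot-trap notification quoted above.
bot_ip = ipaddress.ip_address("207.46.12.240")
network = ipaddress.ip_network("207.46.0.0/16")


def describe(net):
    """Return (first address, last address, address count) for a network."""
    return str(net.network_address), str(net.broadcast_address), net.num_addresses


print(bot_ip in network)  # True
print(describe(network))  # ('207.46.0.0', '207.46.255.255', 65536)
```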
    Submit Your Webmaster Related Sites to the NB Directory
    I swear, by my life and my love of it, that I will never live for the sake of another man, nor ask another man to live for mine.

  2. #32
    TopDogger's Avatar
    TopDogger is offline Über Hund
    Join Date
    Jan 2009
    Location
    Hellfire, AZ
    Posts
    3,139
    Thanks
    350
    Thanked 924 Times in 707 Posts
    Does it matter whether I use the CIDR notation or 207.46?

    Does one work any better than the other?

    I already added the CIDR notation.
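
    For what it's worth, Apache documents a partial address such as 207.46 as matching on the leading bytes of the IP, so it should cover the same addresses as 207.46.0.0/16. The sketch below shows that equivalence in Python; partial_to_network is a hypothetical helper written for this illustration, not something Apache or bot-trap provides.

```python
import ipaddress


def partial_to_network(prefix):
    """Convert a partial dotted prefix such as '207.46' to the equivalent network."""
    octets = [o for o in prefix.split(".") if o]
    if not 1 <= len(octets) <= 3:
        raise ValueError("expected one to three leading octets")
    # Pad the missing octets with zeros and use 8 prefix bits per given octet.
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(padded) + "/" + str(8 * len(octets)))


print(partial_to_network("207.46"))  # 207.46.0.0/16
```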

    I have noticed that bad spiders sometimes slip past Deny statements that use CIDR and end up getting caught by bot-trap. If the CIDR Deny statement were working, they would never reach bot-trap in the first place.

    bot-trap snagged 775 bad spiders on my sites in the last two months.
    Last edited by TopDogger; 25 December, 2010 at 21:02 PM.
    "Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote." -- Benjamin Franklin


  3. #33
    Will.Spencer
    I wasn't sure that CIDR worked in .htaccess, but I found "How to use a CIDR netmask to block an IP address range in .htaccess", so I guess it does.

  4. #34
    TopDogger
    CIDR does appear to work about 99% of the time. I used it to block the multiple ranges of Yandex bots that were ignoring the robots.txt file and killing my bandwidth. However, every once in a while a Yandex spider appears to slip through.
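
    One way to investigate a spider that seems to slip through is to check whether its logged IP really falls inside any of the ranges being denied; a miss would point to a gap in the list rather than a CIDR failure. The following is a rough sketch only. The ranges and addresses are placeholders from the reserved documentation blocks, not real Yandex ranges, and covering_range is a hypothetical helper.

```python
import ipaddress


def covering_range(ip, deny_ranges):
    """Return the first CIDR range that contains ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    for cidr in deny_ranges:
        if addr in ipaddress.ip_network(cidr, strict=False):
            return cidr
    return None


# Placeholder ranges (documentation address space), standing in for a real Deny list.
deny = ["192.0.2.0/25", "198.51.100.0/24"]

print(covering_range("192.0.2.10", deny))   # 192.0.2.0/25
print(covering_range("192.0.2.200", deny))  # None, so this address is not covered
```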


  5. #35
    Will.Spencer
    Microsoft is making more changes:
    Code:
    A bad robot hit /bot-trap/index.php 2010-12-26 (Sun) 07:23:51 
    address is 157.55.116.57, hostname is msnbot-157-55-116-57.search.msn.com, agent is msnbot/2.0b (+http://search.msn.com/msnbot.htm).
    WHOIS gives us three CIDR ranges for the netblock containing that IP address:
    Allow from 157.54.0.0/15
    Allow from 157.56.0.0/14
    Allow from 157.60.0.0/16
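
    If WHOIS only shows a start and end address, the CIDR blocks can be derived with Python's ipaddress module. This sketch assumes the WHOIS record covers 157.54.0.0 through 157.60.255.255; that range is inferred from the three blocks above and is not quoted in the post. range_to_cidrs is a hypothetical helper name.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Split an inclusive address range into the minimal list of CIDR blocks."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(block) for block in blocks]


# Assumed WHOIS range (inferred, see note above).
for cidr in range_to_cidrs("157.54.0.0", "157.60.255.255"):
    print("Allow from", cidr)
```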

  6. #36
    SonnyCooL's Avatar
    SonnyCooL is offline HeeHa
    Join Date
    Jan 2010
    Location
    Melb/Malaysia
    Posts
    920
    Thanks
    250
    Thanked 92 Times in 78 Posts
    How do I block browser-based scrapers? They use dynamic IPs, so there is basically nothing I can do other than monitor them manually, right?

    They are generating 10 MB of inbound traffic and high latency, which almost brought my server to a stop.

  7. #37
    Will.Spencer
    Quote Originally Posted by SonnyCooL View Post
    How do I block browser-based scrapers? They use dynamic IPs, so there is basically nothing I can do other than monitor them manually, right?
    bot-trap finds the scrapers at each new IP. Give it a try. Seriously.

  8. #38
    SonnyCooL
    Quote Originally Posted by Will.Spencer View Post
    bot-trap finds the scrapers at each new IP. Give it a try. Seriously.
    But those dynamic IPs will be reused by another human sooner or later.

    OK, I saw the CAPTCHA for humans. I will try it out later with my minimal knowledge.

  9. #39
    Will.Spencer
    Quote Originally Posted by SonnyCooL View Post
    But those dynamic IPs will be reused by another human sooner or later.
    If you're worried, you can clean the banned IP addresses out of .htaccess once a month or so.
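
    The monthly cleanup could be scripted. The sketch below assumes the bans appear in .htaccess as lines of the form "Deny from a.b.c.d", which is a guess about the file layout rather than bot-trap's documented format. It drops those single-address lines and leaves CIDR or partial-range lines alone. prune_single_ip_bans is a hypothetical helper; back up the file before trying anything like this.

```python
import re

# Matches a Deny line for exactly one full IPv4 address (assumed ban format).
SINGLE_IP_DENY = re.compile(
    r"^\s*deny\s+from\s+\d{1,3}(\.\d{1,3}){3}\s*$", re.IGNORECASE
)


def prune_single_ip_bans(htaccess_text):
    """Return the text with single-address 'Deny from' lines removed."""
    lines = htaccess_text.splitlines(keepends=True)
    return "".join(line for line in lines if not SINGLE_IP_DENY.match(line))


# Placeholder content using documentation addresses, not a real ban list.
sample = (
    "Order Allow,Deny\n"
    "Allow from all\n"
    "Deny from 203.0.113.7\n"
    "Deny from 198.51.100.0/24\n"
    "Deny from 203.0.113.99\n"
)

print(prune_single_ip_bans(sample), end="")
```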

  10. #40
    anantshri is offline on leave from Net Builders : will post rarely
    Join Date
    Apr 2010
    Location
    india
    Posts
    338
    Thanks
    80
    Thanked 47 Times in 40 Posts
    This looks like an interesting piece of software. I will try it on my website.

    In the meantime, I generally block spam bots and leechers using user-agent strings through .htaccess.

    htaccess based spamBot and Leacher Blocking Code | Anant Shrivastava : Techno Enthusiast
