Thread: .htaccess Code for Proxy Sites to Block Bad Bots

  1. #1
    vectro

    .htaccess Code for Proxy Sites to Block Bad Bots

    Scrapers, spammers, proxy abusers, proxy blockers and other web junk can get in the way of operating a clean proxy site. I've been using some .htaccess code to block some of the more annoying and malicious bots. UncleP also had a list of bots and was kind enough to combine the two, check for duplicate entries and organize the result quite nicely. This is somewhat of a continuation of the "Who Is Blocking You?" thread.

    Here is what we came up with. To use this, place the code in your .htaccess file in the public_html directory of your hosting account.

    Code:
    Options -Indexes
     
    <IfModule mod_rewrite.c>
    RewriteEngine on
     
    # block fishzone.ru and daynews.com.cn
    RewriteCond %{HTTP_REFERER} daynews\.com\.cn [NC,OR]
    RewriteCond %{HTTP_REFERER} fishzone\.ru [NC,OR]
    #Block comment spammers, bad bots and some proxies
    # Match on REMOTE_ADDR (the client IP); dots are escaped and patterns anchored
    # so that each line matches only the address it names
    RewriteCond %{REMOTE_ADDR} ^12\.226\.240\.248$ [OR]
    RewriteCond %{REMOTE_ADDR} ^24\.111\.102\.26$ [OR]
    RewriteCond %{REMOTE_ADDR} ^24\.117\.121\.113$ [OR]
    RewriteCond %{REMOTE_ADDR} ^65\.30\.216\.140$ [OR]
    RewriteCond %{REMOTE_ADDR} ^67\.87\.64\.23$ [OR]
    RewriteCond %{REMOTE_ADDR} ^68\.12\.149\.198$ [OR]
    RewriteCond %{REMOTE_ADDR} ^69\.139\.167\.203$ [OR]
    RewriteCond %{REMOTE_ADDR} ^74\.95\.182\.57$ [OR]
    RewriteCond %{REMOTE_ADDR} ^91\.121\.3\.29$ [OR]
    RewriteCond %{REMOTE_ADDR} ^203\.94\.229\.227$ [OR]
    RewriteCond %{REMOTE_ADDR} ^208\.96\.122\.142$ [OR]
    RewriteCond %{REMOTE_ADDR} ^210\.0\.141\.247$ [OR]
    RewriteCond %{REMOTE_ADDR} ^210\.197\.97\.67$ [OR]
    RewriteCond %{REMOTE_ADDR} ^211\.138\.198\. [OR]
    RewriteCond %{REMOTE_ADDR} ^212\.179\.127\.188$ [OR]
    RewriteCond %{REMOTE_ADDR} ^216\.246\.60\.183$ [OR]
    RewriteCond %{REMOTE_ADDR} ^220\.156\.189\.233$ [OR]
    RewriteCond %{REMOTE_ADDR} ^222\.36\.12\.42$ [OR]
    # Abuse Agent Blocking
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bolt\ 0 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot\@yahoo\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} CazoodleBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Default\ Browser\ 0 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DIIbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} discobot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ecxi [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} GT::WWW [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} HTTP::Lite [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} IDBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} id-search [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} id-search\.org [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InternetSeer\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} IRLbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ISC\ Systems\ iRc\ Search\ 2\.1 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} libwww [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Link [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} LinksManager\.com_bot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} linkwalker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Maxthon$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} MFC_Tear_Sample [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^microsoft\.url [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Nutch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} panscient\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PECL::HTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PeoplePal [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Ping [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PHPCrawl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PleaseCrawl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Rippers\ 0 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} SBIder [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SeaMonkey$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Snoopy [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Steeler [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Toata\ dragostea\ mea\ pentru\ diavola [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} URI::Fetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} urllib [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} User-Agent [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} webalta [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} WebCollage [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Wells\ Search\ II [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} WEP\ Search [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWW-Mechanize [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} zermelo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC]
    RewriteRule ^.* - [F,L]
    # Abuse bot blocking rule end
    </IfModule>
     
    <Files 403.shtml>
    order allow,deny
    allow from all
    </Files>
     
    # Start Custom Blocks
    # Bluecoat
    deny from 8.21.4.254
    deny from 65.46.48.192/30
    deny from 65.160.238.176/28
    deny from 85.92.222.0/24
    deny from 206.51.36.0/22
    deny from 216.52.23.0/24
    # cyveillance (already blocked 38.*)
    deny from 38.100.19.8/29
    deny from 38.100.21.0/24
    deny from 38.100.41.64/26
    deny from 38.105.71.0/25
    deny from 38.105.83.0/27
    deny from 38.112.21.140/30
    deny from 38.118.42.32/29
    deny from 65.213.208.128/27
    deny from 65.222.176.96/27
    deny from 65.222.185.72/29
    # Cyberpatrol
    deny from 38.103.17.160/27
    # Internet Identity - Anti-Phishing
    deny from 66.113.96.0/20
    deny from 70.35.113.192/27
    # Ironport
    deny from 204.15.80.0/22
    # Lightspeed Systems Security
    deny from 66.17.15.128/26
    deny from 69.84.207.32/27
    deny from 69.84.207.128/25
    # Layered Technologies
    deny from 72.36.128.0/17
    deny from 72.232.0.0/16
    deny from 72.233.0.0/17
    deny from 216.32.0.0/14
    # M86
    deny from 67.192.231.224/29
    deny from 208.90.236.0/22
    # McAfee-Secure-Computing
    deny from 69.48.241.64/26
    deny from 80.66.0.0/19
    deny from 192.55.214.0/24
    deny from 207.67.117.0/24
    # Phish-Inspector.com
    deny from 209.147.127.208/28
    # Prescient Software, Inc. Phishmongers
    deny from 198.186.190.0/23
    deny from 198.186.192.0/23
    deny from 198.186.194.0/24
    # PSI network
    deny from 38.0.0.0/8
    # urlfilterdb
    deny from 207.210.99.32/29
    # websense-in.car1.sandiego1.level3.net
    deny from 4.53.120.22
    # Websense 
    deny from 66.194.6.0/24
    deny from 67.117.201.128/28
    deny from 69.67.32.0/20
    deny from 131.191.87.0/24
    deny from 204.15.64.0/21
    deny from 208.80.192.0/21
    deny from 212.62.26.64/27
    deny from 213.168.226.0/24
    deny from 213.168.241.0/30
    deny from 213.168.242.0/30
    deny from 213.236.150.16/28
    # Yandex 
    deny from 77.88.0.0/18
    deny from 77.88.22.0/23
    deny from 77.88.24.0/21
    deny from 77.88.24.0/22
    deny from 77.88.28.0/22
    deny from 77.88.36.0/23
    deny from 77.88.42.0/23
    deny from 77.88.44.0/24
    deny from 77.88.50.0/23
    deny from 87.250.224.0/19
    deny from 87.250.230.0/23
    deny from 87.250.252.0/22
    deny from 93.158.128.0/18
    deny from 93.158.137.0/24
    deny from 93.158.144.0/21
    deny from 93.158.144.0/23
    deny from 93.158.146.0/23
    deny from 93.158.148.0/22
    deny from 95.108.128.0/17
    deny from 95.108.128.0/24
    deny from 95.108.152.0/22
    deny from 95.108.216.0/23
    deny from 95.108.240.0/21
    deny from 95.108.248.0/23
    deny from 178.154.128.0/17
    deny from 178.154.160.0/22
    deny from 178.154.164.0/23
    deny from 199.36.240.0/22
    deny from 213.180.192.0/19
    deny from 213.180.204.0/24
    deny from 213.180.206.0/23
    deny from 213.180.209.0/24
    deny from 213.180.218.0/23
    deny from 213.180.220.0/23
    # End Custom Blocks
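
    A side note on the User-Agent section: the long [OR] chain can probably be shortened by folding several names into one alternation. This is only a minimal sketch, assuming every name in the group shares the same ^ anchor and [NC] flag. Only four agents from the list are shown, and the grouping is an illustration rather than part of the posted list.

    ```apache
    <IfModule mod_rewrite.c>
    RewriteEngine On
    # One condition with alternation stands in for four [OR]-chained lines.
    # Sample only: BlackWidow, ChinaClaw, EmailSiphon and WebZIP from the list.
    RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|ChinaClaw|EmailSiphon|WebZIP) [NC]
    # Answer matching requests with 403 Forbidden.
    RewriteRule ^ - [F]
    </IfModule>
    ```

    Names that the list matches anywhere in the string (HTTrack, libwww and so on) would need a second group without the ^ anchor.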
    Last edited by vectro; 26 January, 2011 at 02:07 AM.

  3. #2
    codename_B
    That's lovely

  4. #3
    Mike-XS
    # Websense
    deny from 38.0.0.0/8
    That's nowhere near Websense, guys. Maybe just call it PSI.

    For anyone just copying and pasting this: the 38.* entry will block residential IPs too.

  6. #4
    UncleP
    Quote (originally posted by Mike-XS):
    # Websense
    deny from 38.0.0.0/8
    That's nowhere near Websense, guys. Maybe just call it PSI.

    For anyone just copying and pasting this: the 38.* entry will block residential IPs too.

    Are there any residences using that range? Maybe it should have its own spot. From personal experience I've only ever had rubbish from IPs starting with 38.anything: persistent bots and other proxies chaining. It covers some datacenters too, and possibly some US gov agencies (not proven). I'm happy to block it but it's not compulsory. It might be like playing whack-a-mole with a spade instead of a spoon, but there's a saying: "better safe than sorry".

    It seems there are a few of us using something similar, and I'd think of it as an ongoing project. Contributions welcome, let's make the best list we can.

  8. #5
    Mike-XS
    Yeah, I've had complaints from people whose ISP is blocked in there, but I'd still prefer to block it (-> 38.*) too for all the other garbage that goes through PSI. It's still not Websense though.
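
    For anyone who wants to keep the 38.* block but let a complaining ISP back in, a carve-out along these lines may work. It's a rough sketch, assuming Apache 2.2-style access directives like the ones in the list. The range 38.99.128.0/20 is a made-up placeholder, not a real ISP allocation.

    ```apache
    # Apache 2.2 default ordering, spelled out: Deny rules are read first,
    # then Allow rules, and a matching Allow overrides a matching Deny.
    Order Deny,Allow
    Deny from 38.0.0.0/8
    # Placeholder sub-range standing in for one regional ISP.
    Allow from 38.99.128.0/20
    ```

    Keep in mind the Allow line also lets that sub-range past any other deny line that happens to cover it.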

  9. #6
    Kaustubh
    Vectro (sorry, I don't know your name; please let me know if you'd like to share it),

    Can you make this part of the default .htaccess for all current proxy hosting accounts?

    It would be a great help.

    Thanks



    Quote (originally posted by vectro):
    Scrapers, spammers, proxy abusers, proxy blockers and other web junk can get in the way of operating a clean proxy site. I've been using some .htaccess code to block some of the more annoying and malicious bots. UncleP also had a list of bots and was kind enough to combine the two, check for duplicate entries and organize the result quite nicely. This is somewhat of a continuation of the "Who Is Blocking You?" thread.

    Here is what we came up with. To use this, place the code in your .htaccess file in the public_html directory of your hosting account.

    Code:
    RewriteEngine On
    
    #Block comment spammers, bad bots and some proxies
    RewriteCond %{HTTP_REFERER} daynews\.com\.cn [NC,OR]
    RewriteCond %{HTTP_REFERER} fishzone\.ru [NC,OR]
    # Match on REMOTE_ADDR (the client IP); dots are escaped and patterns anchored
    # so that each line matches only the address it names
    RewriteCond %{REMOTE_ADDR} ^12\.226\.240\.248$ [OR]
    RewriteCond %{REMOTE_ADDR} ^24\.111\.102\.26$ [OR]
    RewriteCond %{REMOTE_ADDR} ^24\.117\.121\.113$ [OR]
    RewriteCond %{REMOTE_ADDR} ^65\.30\.216\.140$ [OR]
    RewriteCond %{REMOTE_ADDR} ^67\.87\.64\.23$ [OR]
    RewriteCond %{REMOTE_ADDR} ^68\.12\.149\.198$ [OR]
    RewriteCond %{REMOTE_ADDR} ^69\.139\.167\.203$ [OR]
    RewriteCond %{REMOTE_ADDR} ^74\.95\.182\.57$ [OR]
    RewriteCond %{REMOTE_ADDR} ^91\.121\.3\.29$ [OR]
    RewriteCond %{REMOTE_ADDR} ^203\.94\.229\.227$ [OR]
    RewriteCond %{REMOTE_ADDR} ^208\.96\.122\.142$ [OR]
    RewriteCond %{REMOTE_ADDR} ^210\.0\.141\.247$ [OR]
    RewriteCond %{REMOTE_ADDR} ^210\.197\.97\.67$ [OR]
    RewriteCond %{REMOTE_ADDR} ^211\.138\.198\. [OR]
    RewriteCond %{REMOTE_ADDR} ^212\.179\.127\.188$ [OR]
    RewriteCond %{REMOTE_ADDR} ^216\.246\.60\.183$ [OR]
    RewriteCond %{REMOTE_ADDR} ^220\.156\.189\.233$ [OR]
    RewriteCond %{REMOTE_ADDR} ^222\.36\.12\.42$ [OR]
    # Abuse Agent Blocking
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bolt\ 0 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} CazoodleBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Default\ Browser\ 0 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DIIbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} discobot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ecxi [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} GT::WWW [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} HTTP::Lite [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} IDBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} id-search [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} id-search\.org [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InternetSeer\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} IRLbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ISC\ Systems\ iRc\ Search\ 2\.1 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} libwww [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Link [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} LinksManager\.com_bot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} linkwalker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Maxthon$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} MFC_Tear_Sample [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^microsoft\.url [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Nutch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} panscient\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PECL::HTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PeoplePal [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Ping [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PHPCrawl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} PleaseCrawl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Rippers\ 0 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} SBIder [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SeaMonkey$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Snoopy [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Steeler [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Toata\ dragostea\ mea\ pentru\ diavola [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} URI::Fetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} urllib [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} User-Agent [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} webalta [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} WebCollage [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Wells\ Search\ II [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} WEP\ Search [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWW-Mechanize [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} zermelo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^$
    RewriteRule ^.* - [F,L]
    # Abuse bot blocking rule end
    
    
    <Files 403.shtml>
    order allow,deny
    allow from all
    </Files>
    
    
    # Start Custom Blocks
    # Bluecoat
    deny from 8.21.4.254
    deny from 65.46.48.192/30
    deny from 65.160.238.176/28
    deny from 85.92.222.0/24
    deny from 206.51.36.0/22
    deny from 216.52.23.0/24
    # cyveillance (already blocked 38.*)
    deny from 38.100.19.8/29
    deny from 38.100.21.0/24
    deny from 38.100.41.64/26
    deny from 38.105.71.0/25
    deny from 38.105.83.0/27
    deny from 38.112.21.140/30
    deny from 38.118.42.32/29
    deny from 65.213.208.128/27
    deny from 65.222.176.96/27
    deny from 65.222.185.72/29
    # Cyberpatrol
    deny from 38.103.17.160/27
    # Internet Identity - Anti-Phishing
    deny from 66.113.96.0/20
    deny from 70.35.113.192/27
    # Ironport
    deny from 204.15.80.0/22
    # Lightspeed Systems Security
    deny from 66.17.15.128/26
    deny from 69.84.207.32/27
    deny from 69.84.207.128/25
    # Layered Technologies
    deny from 72.36.128.0/17
    deny from 72.232.0.0/16
    deny from 72.233.0.0/17
    deny from 216.32.0.0/14
    # M86
    deny from 67.192.231.224/29
    deny from 208.90.236.0/22
    # McAfee-Secure-Computing
    deny from 69.48.241.64/26
    deny from 80.66.0.0/19
    deny from 192.55.214.0/24
    deny from 207.67.117.0/24
    # Phish-Inspector.com
    deny from 209.147.127.208/28
    # Prescient Software, Inc. Phishmongers
    deny from 198.186.190.0/23
    deny from 198.186.192.0/23
    deny from 198.186.194.0/24
    # urlfilterdb
    deny from 207.210.99.32/29
    # websense-in.car1.sandiego1.level3.net
    deny from 4.53.120.22
    # Websense
    deny from 38.0.0.0/8
    deny from 66.194.6.0/24
    deny from 67.117.201.128/28
    deny from 69.67.32.0/20
    deny from 131.191.87.0/24
    deny from 204.15.64.0/21
    deny from 208.80.192.0/21
    deny from 212.62.26.64/27
    deny from 213.168.226.0/24
    deny from 213.168.241.0/30
    deny from 213.168.242.0/30
    deny from 213.236.150.16/28
    # Yandex
    deny from 77.88.0.0/18
    deny from 77.88.22.0/23
    deny from 77.88.24.0/21
    deny from 77.88.24.0/22
    deny from 77.88.28.0/22
    deny from 77.88.36.0/23
    deny from 77.88.42.0/23
    deny from 77.88.44.0/24
    deny from 77.88.50.0/23
    deny from 87.250.224.0/19
    deny from 87.250.230.0/23
    deny from 87.250.252.0/22
    deny from 93.158.128.0/18
    deny from 93.158.137.0/24
    deny from 93.158.144.0/21
    deny from 93.158.144.0/23
    deny from 93.158.146.0/23
    deny from 93.158.148.0/22
    deny from 95.108.128.0/17
    deny from 95.108.128.0/24
    deny from 95.108.152.0/22
    deny from 95.108.216.0/23
    deny from 95.108.240.0/21
    deny from 95.108.248.0/23
    deny from 178.154.128.0/17
    deny from 178.154.160.0/22
    deny from 178.154.164.0/23
    deny from 199.36.240.0/22
    deny from 213.180.192.0/19
    deny from 213.180.204.0/24
    deny from 213.180.206.0/23
    deny from 213.180.209.0/24
    deny from 213.180.218.0/23
    deny from 213.180.220.0/23
    # End Custom Blocks

  10. #7
    IProx
    A few points for consideration:

    - The filtering companies that you seek to block are large and have tremendous resources at their disposal. Blocking part of them is equivalent to not blocking them at all. If they can connect to your site from any part of any of their networks, they will get what they want and your efforts will have been for nothing.

    - Many scrapers cloak their UserAgent and will not transmit a UserAgent that identifies them as a bot. The bots that try to avoid detection are the ones worth blocking.

    - The PSI range (38.0.0.0/8) is sub-delegated to tens of thousands of Cogent customers which include regional ISPs. Blocking them all is expedient but extremely crude and will certainly block countless real users.

  12. #8
    codename_B
    Also, I discovered that doing this isn't good if you want to submit to a proxy list that requires a backlink and has an auto checker, because they can't check it!

  14. #9
    vectro
    I have updated the list and removed the 38.x.x.x range, at least for now. I can't find any accurate info on Google about PSI IPs. It seems any page I came across discussed the same issue as us, like this one.

    Quote (originally posted by Kaustubh):
    Can you make this part of the default .htaccess for all current proxy hosting accounts?

    I try not to force too much on accounts except for some really crucial things. If you (or anyone else) want this on your account, just reply here, PM me or open a support ticket.

    Quote (originally posted by IProx):
    - The filtering companies that you seek to block are large and have tremendous resources at their disposal. Blocking part of them is equivalent to not blocking them at all. If they can connect to your site from any part of any of their networks, they will get what they want and your efforts will have been for nothing.

    I basically agree with you but still think a list like this would end up keeping some things out. I still see your point that it's harder to block the larger filtering operations.

    Quote (originally posted by IProx):
    - Many scrapers cloak their UserAgent and will not transmit a UserAgent that identifies them as a bot. The bots that try to avoid detection are the ones worth blocking.

    I've seen UserAgents whose names were just random strings of characters. They show up in the logs all the time. This list is one of those "better than nothing" things. I don't expect it to perform miracles.

    Quote (originally posted by IProx):
    - The PSI range (38.0.0.0/8) is sub-delegated to tens of thousands of Cogent customers which include regional ISPs. Blocking them all is expedient but extremely crude and will certainly block countless real users.

    I've removed that for now.

    Quote (originally posted by codename_B):
    Also, I discovered that doing this isn't good if you want to submit to a proxy list that requires a backlink and has an auto checker, because they can't check it!

    How so? Wouldn't that only happen if the proxy list has the same IP or UserAgent as the scrapers and filters? I doubt that would be an issue.

    Besides, it's possible to explicitly allow things, so the UserAgents of common proxy list scripts can be included that way, too.
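
    A minimal sketch of that idea, assuming the checker identifies itself with a fixed UserAgent. ExampleProxyListChecker is a made-up name, not the agent of any real proxy list script. The pass-through rule has to come before the blocking conditions:

    ```apache
    <IfModule mod_rewrite.c>
    RewriteEngine On
    # Hypothetical checker agent; substitute the string the proxy list really sends.
    RewriteCond %{HTTP_USER_AGENT} ExampleProxyListChecker [NC]
    # "-" leaves the URL untouched and [L] stops rule processing here,
    # so the blocking rule further down is never reached for this agent.
    RewriteRule ^ - [L]

    # ...the blocking RewriteCond lines and the [F] rule follow here...
    </IfModule>
    ```

    This only skips the mod_rewrite rules. The deny from lines are handled separately, so a checker coming from a denied range would still need an Allow from entry of its own.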

  15. #10
    Sami4u
    Hi,

    I'm just a dummy, but is there a way that could be added to the server once instead of copying it into every different site?

    Sami
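
    For what it's worth, a server-wide setup is usually possible when you control the main Apache configuration rather than only per-site .htaccess files. The following is a rough sketch under that assumption, using Apache 2.2-style directives. The /home path is a hypothetical parent directory for all hosted sites, and only two agents and one IP from the list are shown.

    ```apache
    # Goes in the main server config (or a file it includes), not in .htaccess.
    # Flag unwanted agents with an environment variable instead of mod_rewrite.
    SetEnvIfNoCase User-Agent "^BlackWidow" bad_bot
    SetEnvIfNoCase User-Agent "HTTrack" bad_bot

    # Hypothetical directory that contains every hosted site.
    <Directory "/home">
        Order Allow,Deny
        Allow from all
        # Refuse flagged agents and listed addresses for every site underneath.
        Deny from env=bad_bot
        Deny from 8.21.4.254
    </Directory>
    ```

    A site whose own .htaccess sets Order, Allow or Deny can still override this if the server permits it, and on Apache 2.4 these directives depend on mod_access_compat.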
