NetBuilders

Old 2 February, 2009, 18:18 PM   #11 (permalink)
Gozer
 
Will.Spencer's Avatar
 
Location: Singapore
iTrader: (45)
Blog Entries: 1
Thanked 1,622 Times in 890 Posts
Posts: 4,958
$NetBucks: 8,564
Join Date: Dec 2008
Last Online: Today 08:07 AM
Default

Quote:
Originally Posted by TopDogger View Post
Will, does your white list cover all of the Google spiders?
It blocks every one I've seen. Each time an IP gets blocked by bot-trap, I get an email. I look at the email to see if the User-Agent is a bot that should not be blocked. If it is, I check its IP address in WHOIS to make sure it's not some content thief faking their User-Agent. If the User-Agent and the WHOIS data match, I add the new IP address range to my white list.
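
As a rough illustration of the white-list idea in this post, here is a minimal Python sketch that tests whether an address falls inside a list of CIDR ranges. It is not the bot-trap script's own code. The function name and the sample ranges are assumptions made for the example: 64.233.160.0/19 is the range asked about later in this thread, and 192.0.2.0/24 is a documentation-only range.

```python
import ipaddress

# Hypothetical white list; real entries would come from your own WHOIS checks.
WHITELIST = [
    ipaddress.ip_network("64.233.160.0/19"),
    ipaddress.ip_network("192.0.2.0/24"),
]

def is_whitelisted(ip, networks=WHITELIST):
    """Return True if ip falls inside any white-listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

print(is_whitelisted("64.233.170.5"))   # inside 64.233.160.0/19
print(is_whitelisted("80.57.190.67"))   # not in any listed range
```

A check like this only confirms that the IP sits in a range you already trust. It does not replace the manual WHOIS comparison described above.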
Old 18 February, 2009, 11:53 AM   #12 (permalink)
Net Builder
 
TopDogger's Avatar
 
iTrader: (3)
Thanked 186 Times in 128 Posts
Posts: 506
$NetBucks: 726
Join Date: Jan 2009
Last Online: Today 02:28 AM
Default

I'm taking a look at this today.

If I understand this correctly, in addition to copying the other files to the server and modifying the robots.txt file, I need to set 777 permissions on the .htaccess file in the root directory and add the whitelist IPs. Is that correct?

I currently have the following list of spammer IPs blocked in the .htaccess on the test site. Do I just add the whitelist IPs to this block or keep them in a separate group?

Code:
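<Limit GET HEAD POST>
# Allow is evaluated first, then Deny, so a matching deny wins
order allow,deny
deny from 24.129.33.46
deny from 69.94.108.180
# A partial address blocks the whole range it begins
deny from 82.128.
deny from 208.66.195.
allow from all
</Limit>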
Where is the blacklist being created? Could this system fill up the .htaccess file with blacklisted IPs over time?

Are you using a single pixel image for the link trap or did you place a larger image on the page somewhere?

BTW, the .htaccess syntax does not allow comments on the same line as a directive. I was making notations just like those in your whitelist for IPs that I was manually banning, and my server error log filled up with error messages. You have to place comments on separate lines.

Old 18 February, 2009, 12:38 PM   #13 (permalink)
Will.Spencer

Quote:
Originally Posted by TopDogger View Post
If I understand this correctly, in addition to copying the other files to the server and modifying the robots.txt file, I need to set 777 permissions on the .htaccess file in the root directory and add the whitelist IPs. Is that correct?
My .htaccess files seem to work with 644 (rw-r--r--) permissions, but that's because they are owned by the same user ID that the web server runs under.

Quote:
Originally Posted by TopDogger View Post
I currently have the following list of spammer IPs blocked in the .htaccess on the test site. Do I just add the whitelist IPs to this block or keep them in a separate group?
Hmmm... you got me... I think they can be separate.

Quote:
Originally Posted by TopDogger View Post
Where is the blacklist being created? Could this system fill up the .htaccess file with blacklisted IPs over time?
It's appended to the end of .htaccess. The .htaccess can grow large over time. Every month or two I delete the older deny statements.
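
The periodic cleanup could be scripted along these lines. This is only a sketch under stated assumptions: it treats every top-level "deny from" line (one outside any <Limit>-style section) as an automatically appended entry, and it assumes the oldest entries come first because new ones are appended at the end. The function name and the keep_last parameter are made up for the example.

```python
def prune_denies(htaccess_text, keep_last):
    """Keep only the newest keep_last top-level 'deny from' lines."""
    lines = htaccess_text.splitlines()
    depth = 0
    top_level = []  # indexes of deny lines outside any <Section>
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("</"):
            depth -= 1
        elif stripped.startswith("<"):
            depth += 1
        elif depth == 0 and stripped.lower().startswith("deny from"):
            top_level.append(i)
    # Everything except the last keep_last appended entries gets dropped.
    drop = set(top_level[:-keep_last]) if keep_last > 0 else set(top_level)
    return "\n".join(l for i, l in enumerate(lines) if i not in drop)

sample = """<Limit GET HEAD POST>
order allow,deny
deny from 24.129.33.46
allow from all
</Limit>
deny from 192.0.2.1
deny from 192.0.2.2
deny from 192.0.2.3"""

print(prune_denies(sample, keep_last=2))
```

The hand-maintained block inside <Limit> is left alone. Only the oldest appended line (192.0.2.1) is removed. Back up .htaccess before running anything like this on a live site.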

Quote:
Originally Posted by TopDogger View Post
Are you using a single pixel image for the link trap or did you place a larger image on the page somewhere?
I'm using a single pixel image.

Quote:
Originally Posted by TopDogger View Post
BTW, the .htaccess syntax does not allow comments on the same line as a directive. I was making notations just like those in your whitelist for IPs that I was manually banning, and my server error log filled up with error messages. You have to place comments on separate lines.
That's odd -- it seems to be working fine here under Apache 2.2. Are you running Apache 1.3 or Apache 2.2?

Old 18 February, 2009, 14:53 PM   #14 (permalink)
TopDogger

Thanks for the quick update.

Quote:
Originally Posted by Will.Spencer View Post
My .htaccess files seem to work with 644 (rw-r--r--) permissions, but that's because they are owned by the same user ID that the web server runs under.
644 is the standard permission setting for .htaccess. The instructions say to "Make blacklist.dat and .htaccess writable by the web server user." I interpret that to mean that the file needs to be writable by the script being run by a user, which 644 doesn't cover. Maybe I'm interpreting this wrong. I'm looking at it as being similar to a cache directory, where the permissions typically need to be set to 666 or 777. Are the permissions on the blacklist.dat file set to 644 also?

Quote:
Originally Posted by Will.Spencer View Post
That's odd -- it seems to be working fine here under Apache 2.2. Are you running Apache 1.3 or Apache 2.2?
I'm running Apache 2.2.10. One of the techs at my hosting company pointed that out a few months back while we were troubleshooting a server issue. He noticed that the Apache error log was pretty fat and packed with hundreds of messages pointing to the .htaccess files. He took a look at one of the .htaccess files and said that comments had to be on separate lines. I never saw an error with my sites, so the messages may have been warnings. It was something new to me.

Old 18 February, 2009, 16:37 PM   #15 (permalink)
Will.Spencer

Quote:
Originally Posted by TopDogger View Post
644 is the standard permission setting for .htaccess. The instructions say to "Make blacklist.dat and .htaccess writable by the web server user." I interpret that to mean that the file needs to be writable by the script being run by a user, which 644 doesn't cover. Maybe I'm interpreting this wrong. I'm looking at it as being similar to a cache directory, where the permissions typically need to be set to 666 or 777. Are the permissions on the blacklist.dat file set to 644 also?
I think it means "the script being run by the web server." On my system, the web server runs as the user www and .htaccess is owned by the user www.

But, 777 works too -- no matter who owns the file.
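
The ownership point can be made concrete with a small Python sketch of the usual Unix owner/group/other write check. The user IDs below are hypothetical (a web server running as uid 80 and an FTP account with uid 1001), and the sketch ignores root and ACLs.

```python
import stat

def can_write(mode, file_uid, file_gid, uid, gids):
    """Apply the usual Unix owner/group/other check for write access.

    root (uid 0) and ACLs are ignored to keep the sketch short.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)

# Hypothetical IDs: the web server runs as uid 80, an FTP account as uid 1001.
WWW, FTP_USER = 80, 1001

for mode in (0o644, 0o666, 0o777):
    print(oct(mode), stat.filemode(stat.S_IFREG | mode),
          "owned by www:", can_write(mode, WWW, WWW, WWW, [WWW]),
          "owned by FTP user:", can_write(mode, FTP_USER, FTP_USER, WWW, [WWW]))
```

Under these assumptions, 644 is enough only when the web server itself owns the file. If another account owns it, the web server falls under "other" and needs the write bit that 666 or 777 provides.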

Quote:
Originally Posted by TopDogger View Post
I'm running Apache 2.2.10. One of the techs at my hosting company pointed that out a few months back while we were troubleshooting a server issue. He noticed that the Apache error log was pretty fat and packed with hundreds of messages pointing to the .htaccess files. He took a look at one of the .htaccess files and said that comments had to be on separate lines. I never saw an error with my sites, so the messages may have been warnings. It was something new to me.
Very odd -- not a single mention of this in my error logs.

But I did some Googling and found other people with similar issues -- particularly when the comments contained forward slashes. So, it seems best to move the comments to separate lines.
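
Moving the comments by hand gets tedious in a long file, so here is a minimal Python sketch that does it. It assumes a "#" never appears inside a directive's arguments (for example in a RewriteRule pattern or a quoted string), which will not hold for every .htaccess file. The function name is made up for the example.

```python
def split_inline_comments(htaccess_text):
    """Move a trailing '# comment' onto its own line above the directive.

    Assumes '#' never appears inside a directive's arguments
    (for example in a RewriteRule pattern or a quoted string).
    """
    out = []
    for line in htaccess_text.splitlines():
        directive, sep, comment = line.partition("#")
        if sep and directive.strip():
            # Keep the directive's indentation on the new comment line.
            indent = line[:len(line) - len(line.lstrip())]
            out.append(indent + "# " + comment.strip())
            out.append(directive.rstrip())
        else:
            # Blank lines, plain directives, and whole-line comments pass through.
            out.append(line)
    return "\n".join(out)

print(split_inline_comments("allow from 192.0.2.0/24 # example crawler range"))
```

The sample line comes out as a comment line followed by the bare directive. Lines that are already whole-line comments are left untouched.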

Old 18 February, 2009, 17:25 PM   #16 (permalink)
TopDogger

Quote:
Originally Posted by Will.Spencer View Post
I think it means "the script being run by the web server." On my system, the web server runs as the user www and .htaccess is owned by the user www.
OK. After I see how it logs the first spider, I will try setting the permissions to 644 to see what happens. If it does not work, an error will probably show up in the site's error log. Personally, I prefer not setting anything to 777.


Quote:
Originally Posted by Will.Spencer View Post
Very odd -- not a single mention of this in my error logs.

But I did some Googling and found other people with similar issues -- particularly when the comments contained forward slashes. So, it seems best to move the comments to separate lines.
All I use is a hash for comments in the .htaccess. There might be some kind of obscure server configuration issue that causes it to log errors on some servers, but not on others.

Old 19 February, 2009, 14:17 PM   #17 (permalink)
Will.Spencer

Quote:
Originally Posted by TopDogger View Post
OK. After I see how it logs the first spider, I will try setting the permissions to 644 to see what happens. If it does not work, an error will probably show up in the site's error log. Personally, I prefer not setting anything to 777.
As an old Unix guy, 777 makes me feel weird.

Quote:
Originally Posted by TopDogger View Post
All I use is a hash for comments in the .htaccess. There might be some kind of obscure server configuration issue that causes it to log errors on some servers, but not on others.
The forward slashes were apparently being misinterpreted as part of a CIDR (Classless Inter-Domain Routing) statement, like 192.168.0.0/24.

Old 19 February, 2009, 17:43 PM   #18 (permalink)
TopDogger

Quote:
Originally Posted by Will.Spencer View Post
As an old Unix guy, 777 makes me feel weird.
I tested it again this morning. It would not work with 644, so I had to settle for 666.

Quote:
Originally Posted by Will.Spencer View Post
The forward slashes were apparently being misinterpreted as part of a CIDR (Classless Inter-Domain Routing) statement, like 192.168.0.0/24.
I do not understand how the CIDR number works. What range of IPs does 64.233.160.0/19 cover?

I have a long list of G spider IPs. Most are in different IP ranges.

Old 19 February, 2009, 18:58 PM   #19 (permalink)
Will.Spencer

Quote:
Originally Posted by TopDogger View Post
I do not understand how the CIDR number works. What range of IPs does 64.233.160.0/19 cover?
CIDR notation uses binary math. /19 means that the first 19 of the 32 binary digits in the address identify the network and the remaining 13 identify the individual IP addresses within it. When you make the number after the slash larger, the networks get smaller. When you make the number after the slash smaller, the networks get larger.

10.0.0.0/8 is a traditional Class A network, i.e. 10.0.0.0 to 10.255.255.255.

192.168.0.0/24 is a traditional Class C network, i.e. 192.168.0.0 to 192.168.0.255.

Raising the number after the slash by one cuts the size of the network in half. Lowering it by one doubles the size of the network.

I've never liked doing math, so I use a CIDR calculator for these calculations.
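
For anyone without a CIDR calculator handy, the same arithmetic can be sketched with Python's standard ipaddress module. The helper name cidr_range is just for this example.

```python
import ipaddress

def cidr_range(cidr):
    """Return (first address, last address, address count) for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address), net.num_addresses

for block in ("10.0.0.0/8", "192.168.0.0/24", "64.233.160.0/19"):
    print(block, cidr_range(block))
```

For 64.233.160.0/19 this gives 64.233.160.0 to 64.233.191.255. With 32 - 19 = 13 bits left for addresses, the block holds 2^13 = 8,192 of them.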

Old 21 February, 2009, 11:42 AM   #20 (permalink)
TopDogger

I snagged my first spider yesterday.

Code:
address is 80.57.190.67, hostname is g190067.upc-g.chello.nl, agent is Java/1.6.0-oem
I feel like I've gone fishing.

Will, are you running the spider trap link on multiple pages? Right now, I just have mine on the home page.

All times are GMT.