
Thread: Deindexing domains with proxies

  1. #1
    Szise is offline Net Builder
    Join Date
    Dec 2008
    Posts
    426
    Thanks
    14
    Thanked 33 Times in 29 Posts

    Deindexing domains with proxies

    I think this is what happened to my site: a proxy created a mirror copy of my site, the copy was indexed, and my site was penalized for duplicate content.

    This is the identical copy, which is now blocked by a script on my site:

    Take a look here: http://www.localdomain.ws/wml6eb23q/...6p75732r6r6574

    This is the proxy indexed with my content:

    Link to Google

    Is this the first time? No, it has happened twice. The first time another site was penalized, and I could no longer rank for my own domain name ("domain-name.tld"), even after a few months.

    Also saw this article: Google Proxy Hacking: How A Third Party Can Remove Your Site From Google SERPs

    Latest conclusion from this blog:

    Update again: September 2009 - damned if this thing hasn't cropped up again – now it looks like Google's replacing the duped URL with the copy's URL – and even RANKING the duplicates… (similar to the already-known-and-passed-off-as-a-feature 302 redirect bug).
    What do you think?
    Last edited by Szise; 3 November, 2009 at 11:09 AM.

  2. #2
    Szise is offline Net Builder
    Another article found:

    BlackHat SEO (Negative SEO – Duplicate content)

    Duplicate content has been a problem for people for a long time. The basic idea is to create many copies of the content on someone elses website to confuse search engines about which one is the true author of the content and force them into supplemental results. Generally this method is unlikely to work on a large, long standing authoritative website but for reasonably newly launched websites, sites with low authority (low links, low quality pages etc) then it can be an effective method of keeping them out of the serps.....
    http://www.esrun.co.uk/blog/negative-seo-duplicate-content/

  3. #3
    Szise is offline Net Builder
    Localdomain.ws is a spam proxy, or a proxy used for black hat purposes. Take a look: it has more than 57,000 duplicate (copied) pages indexed.
    It uses the proxified (copied) pages as static pages to publish new content, as a free traffic and content generator. When you update your site, the proxy updates the copied page in real time.

    Profile: Localdomain.ws
    Last edited by Szise; 7 November, 2009 at 13:37 PM.

  4. #4
    Keldorn is offline Net Builder
    Join Date
    Dec 2008
    Location
    Canada
    Posts
    400
    Thanks
    21
    Thanked 60 Times in 52 Posts
    I would contact the owner and tell them to stick a robots.txt on their site, like this:

    User-agent: *
    Disallow: /wml6eb23q/


    That will tell Google and other search bots not to index the proxy pages, unless they're doing this maliciously on purpose. Their Alexa rank is getting quite low, so I suspect some of their traffic is coming from Google via hijacked pages. So you're probably not the only one who has had their site hijacked.
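
    A robots.txt like the one above can be sanity-checked before it is sent to the proxy owner. The sketch below uses Python's standard urllib.robotparser; the function name and the proxied URL are made up for illustration, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested above, as the proxy owner would publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wml6eb23q/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical proxied URL under the disallowed directory.
PROXIED_URL = "http://www.localdomain.ws/wml6eb23q/example-proxied-page"
# A normal page of the proxy site itself, outside that directory.
NORMAL_URL = "http://www.localdomain.ws/about.html"
```

    Calling is_crawlable(ROBOTS_TXT, "Googlebot", PROXIED_URL) should report the proxied page as off limits, while NORMAL_URL stays crawlable, which is the behaviour the rule is meant to have.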


    Put this in your .htaccess to block them. When Google crawls the pages again, it will crawl a copy of your 403 and then probably drop their proxy page.

    Code:
    <Limit GET HEAD POST>
    order allow,deny
    # FDCservers IP range used by localdomain.ws
    deny from 76.73.0.0/17
    allow from all
    </Limit>
    Use the spam report form to report their website. If the owner of localdomain.ws doesn't put up a robots.txt, Google will probably drop their whole site once they see it has thousands of pages indexed like that.
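
    Before deploying a range block like the one above, it helps to see exactly which addresses a /17 covers, since it is a wide net. This is a small sketch with Python's standard ipaddress module; the helper name is hypothetical, and the sample addresses are ones reported elsewhere in this thread.

```python
import ipaddress

# The range denied in the .htaccess snippet above.
BLOCKED_RANGE = ipaddress.ip_network("76.73.0.0/17")


def is_blocked(ip: str, network=BLOCKED_RANGE) -> bool:
    """Return True if ip falls inside the denied network."""
    return ipaddress.ip_address(ip) in network


# How many addresses the deny rule covers, and where the range starts and ends.
RANGE_SIZE = BLOCKED_RANGE.num_addresses
FIRST_ADDRESS = str(BLOCKED_RANGE.network_address)
LAST_ADDRESS = str(BLOCKED_RANGE.broadcast_address)
```

    For example, is_blocked("76.73.78.218") covers the address localdomain.ws resolves to in this thread, while an address in a different FDC range such as 67.159.44.138 would need its own deny line.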

  5. #5
    Szise is offline Net Builder
    Yesterday I redirected the proxified page to my site's main domain and used ping services to notify the search engines about the change. The proxified page has been deindexed; now I'm waiting for results and hoping everything gets back to normal.

    The owner is using private registration; I wasn't able to find anything about him.

  6. #6
    Mike-XS is offline XeroAgent
    Join Date
    Sep 2009
    Location
    OZ
    Posts
    209
    Thanks
    30
    Thanked 109 Times in 71 Posts
    Quote Szise: What do you think?

    Szise, I've run into this site recently and what they are doing appears to be intentional and targeted.

    Quote Szise: The owner is using private registration, i wasn't able to find anything about him

    Here are some details I have:

    - localdomain.ws - 76.73.78.218
    - http://www.robtex.com/dns/localdomain.ws.html
    - http://www.robtex.com/ip/76.73.78.218.html

    About Local Domain

    hxxp://localdomain.ws/about.html

    Local Domain is part of the Acme Serve network of sites. The owner is Jim Ohlstein, a webmaster and webhost with many years in the field. Our goal is to help you preserve your anonymity and privacy online, as well as to help bypass filters that serve to stifle the free exchange of information. The free and unfetttered dissemination of information on the World Wide Web is important to us. So is our privacy.

    That's why we're here.
    If you have an abuse complaint, please send to abuse [at] acmeserv.com.
    -
    Domain Name: LOCALDOMAIN.WS

    Registrar Name: Directi Internet Solutions Pvt. Ltd. DBA
    Registrar Email: tldadmin@logicboxes.com
    Registrar Telephone: 832-295-1535
    Registrar Whois: whois.publicdomainregistry.com

    Domain Created: 2009-05-21
    Domain Last Updated: 2009-05-21
    Domain Currently Expires: 2014-05-21

    Current Nameservers:
    ns1.localdomain.ws ..... 76.73.76.123
    rwhois V-1.5:003eff:00 rwhois.fdcservers.net (by Network Solutions, Inc. V-1.5.10-pre6)
    network:Auth-Area:76.73.0.0/18
    network:Class-Name:network
    network:OrgName:RDS.inc
    network:OrgID;I:RDSDATA-RDSDATACENTERCOM
    network:Address:48 monmouth rd
    network:City:glen rock
    network:StateProv:N/A
    network:PostalCode:07452
    network:Country:US
    network:NetRange:76.73.0.0-76.73.0.31
    network:CIDR:76.73.0.0/27
    -
    network:NetName:RDSDATA-RDSDATACENTERCOM
    network:OrgAbuseHandle:FDCservers Customer
    network:OrgAbuseName:eddie bonica
    network:OrgAbusePhone:201.445.7336
    network:OrgAbuseEmail:rdsdata@rdsdatacenter.com
    network:OrgNOCHandle:NOC1402-ARIN
    network:OrgNOCName:Network Operations Center
    network:OrgNOCPhone:+1-312-913-9304
    network:OrgNOCEmail:support@fdcservers.net
    network:OrgTechHandle:PKR5-ARIN
    network:OrgTechName:Petr Kral
    network:OrgTechPhone:+1-312-933-1046
    network:OrgTechEmail:petr@fdcservers.net
    %ok
    Code:
    8 43 49 91   75.149.229.10 comcast01.den.fdcservers.net 
    9 43 49 133   10.10.95.1 
    10 44 43 43   76.73.78.218
    --

    acmeserv.com
    76.73.76.125

    Code:
    8 43 45 42   75.149.229.10 comcast01.den.fdcservers.net 
    9 90 43 43   10.10.95.1  
    10 102 45 62   76.73.76.125
    --

    Jim also owns a bunch of other sites like jlkhosting and proxy sites like:

    proxyplace.net
    76.73.76.126
    NetRange: 76.73.0.0 - 76.73.127.255
    CIDR: 76.73.0.0/17
    Code:
    8 43 43 43   75.149.229.10 comcast01.den.fdcservers.net 
    9 44 43 43   10.10.95.1 
    10 43 43 44   76.73.76.126
    --


    Registration Service Provided By: JLK HOSTING
    Contact: +1.7574813006
    Website: http://www.jlkhosting.com


    Domain Name: PROXYPLACE.NET

    Registrant:
    JLK Hosting
    James M Ohlstein (jlkhosting@gmail.com)
    1340 N Great Neck Rd
    #1272-364
    Virginia Beach
    Virginia,23454
    US
    Tel. +757.4813006
    --



    jlkhosting.com
    69.4.237.31


    Domain Name: JLKHOSTING.COM
    Registrar: DIRECTI INTERNET SOLUTIONS PVT. LTD. D/B/A PUBLICDOMAINREGISTRY.COM
    Whois Server: whois.PublicDomainRegistry.com
    Name Server: NS1.JLKHOSTING.COM
    Name Server: NS2.JLKHOSTING.COM
    Status: ok
    Updated Date: 23-mar-2009
    Creation Date: 14-mar-2006
    Expiration Date: 14-mar-2012

    >>> Last update of whois database: Sun, 06 Dec 2009 16:00:14 UTC <<<

    Queried whois.publicdomainregistry.com with "jlkhosting.com"...

    Registration Service Provided By: JLK HOSTING
    Contact: +1.7574813006
    Website: http://www.jlkhosting.com

    Domain Name: JLKHOSTING.COM

    Registrant:
    JLK Hosting
    James M Ohlstein ( jlkhosting @ gmail.com)
    1340 N Great Neck Rd
    #1272-364
    Virginia Beach
    Virginia,23454
    US
    Tel. +757.4813006

    Creation Date: 14-Mar-2006
    Expiration Date: 14-Mar-2012
    --

    It appears the whois details, and the guy(s) behind them, may be faked. Information like the rdsdata addresses at FDC doesn't even exist either.

    My wife investigated and posted about it here in more detail: Google proxy hijacking - BISS Forums

    -

    I started seeing a number of fake Googlebot alerts, one of which links directly back to the localdomain.ws server.

    76.73.76.122
    [12-03-2009-00:34:50] bad-behavior 403

    User-Agent claimed to be Googlebot- claim appears to be false. (f1182195) xerosurf.com
    Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    76.73.76.122
    -

    - 76.73.76.122 - A Record = mail.localdomain.ws
    - http://www.robtex.com/ip/76.73.76.122.html

    --

    Another one going through hidemyass, also on FDC servers.


    67.159.44.138

    [12-04-2009-03:48:41] bad-behavior 403

    User-Agent claimed to be Googlebot- claim appears to be false. (f1182195) xerosurf.com
    Agent: Mediapartners-Google
    67.159.44.138
    w2.hidemyass.com
    67.159.44.138
    NetRange: 67.159.0.0 - 67.159.63.255
    CIDR: 67.159.0.0/18
    NetName: FDCSERVERS
    --

    A few others not directly linked to any proxy yet:

    --
    84.222.12.253
    [12-03-2009-12:26:34] bad-behavior 403

    User-Agent claimed to be Googlebot- claim appears to be false. (f1182195) xerosurf.com
    Agent: Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
    84.222.12.253 host-84-222-12-253.cust-adsl.tiscali.it
    --
    137.204.242.252
    [12-11-2009-11:48:21] bad-behavior 403

    User-Agent claimed to be Googlebot- claim appears to be false. (f1182195) xerosurf.com
    Agent: Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
    137.204.242.252
    --
    [12-19-2009-22:31:43] bad-behavior 403

    User-Agent claimed to be Googlebot- claim appears to be false. (f1182195) xerosurf.com
    Agent: Mediapartners-Google
    98.130.2.91 web912.opentransfer.com
    --
    [12-20-2009-14:31:34] bad-behavior 403

    User-Agent claimed to be Googlebot- claim appears to be false. (f1182195) xerosurf.com
    Agent: Mediapartners-Google
    98.130.2.91 web912.opentransfer.com
    --


    OrgName: Ecommerce Corporation
    OrgID: ECOMM-5
    Address: 247 Mitch Lane
    City: Hopkinsville
    StateProv: KY
    PostalCode: 42240
    Country: US

    NetRange: 98.130.0.0 - 98.131.255.255
    CIDR: 98.130.0.0/15
    NetName: ECOMMERCE-HOSTING
    NameServer: NS1.OPENTRANSFER.COM
    NameServer: NS2.OPENTRANSFER.COM
    RegDate: 2007-10-15
    Updated: 2008-06-19
    --

    I think you'll find many more attempts by going through your server logs and pulling out all the entries with Google user agents. Verify each IP address, then block every one that does not belong to Google in your .htaccess or firewall.
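
    One widely used way to verify an address that claims to be Googlebot is a reverse DNS lookup followed by a forward lookup: the hostname should end in googlebot.com or google.com, and it should resolve back to the same IP. The sketch below assumes that approach; the function names are hypothetical, and the resolver functions are passed in as parameters so the logic can be tried without touching the network.

```python
import socket
from typing import Callable, List

# Hostname suffixes accepted as belonging to Google's crawlers (assumption).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Return True only if reverse and forward DNS both agree on Google."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        # No PTR record at all: cannot be confirmed as Google.
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # Anyone can set a fake PTR record, so confirm the forward lookup.
        return ip in forward(host)
    except OSError:
        return False
```

    With this check, 76.73.76.122 from the log above would fail straight away, because its reverse record is mail.localdomain.ws rather than a Google hostname.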

    You could then do a batch whois with something like SmartWhois: save each IP address to a text file and process the lot in one hit. Or set up a script to detect and block fake Googlebots and other proxies automatically.
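
    Pulling the candidate addresses out of the logs can be scripted too. This is a rough sketch that assumes Apache's combined log format (IP first, user agent as the last quoted field); the function names and the file names in the usage note are hypothetical, so adjust the pattern if your logs look different.

```python
import re
from typing import Iterable, Set

# Combined log format: client IP is the first field,
# the user agent is the last double-quoted field on the line.
LINE_RE = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"\s*$')

# User agents that claim to be a Google crawler.
GOOGLE_AGENT_RE = re.compile(r"googlebot|mediapartners-google", re.IGNORECASE)


def google_claiming_ips(lines: Iterable[str]) -> Set[str]:
    """Collect every client IP whose user agent claims to be Google."""
    ips = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match and GOOGLE_AGENT_RE.search(match.group("agent")):
            ips.add(match.group("ip"))
    return ips


def write_ip_list(ips: Iterable[str], path: str) -> None:
    """Save one IP per line, ready for a batch whois tool."""
    with open(path, "w", encoding="ascii") as handle:
        for ip in sorted(ips):
            handle.write(ip + "\n")
```

    Typical use would be to open the access log, pass the file object to google_claiming_ips(), and hand the result to write_ip_list(ips, "google-claims.txt"). Each address in that file still has to be verified before it is blocked, since the real Googlebot will be in the list as well.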


    It seems a bit difficult for people to wrap their heads around... hijacked within Google.

    The symptoms: A search for "My Company Name" would bring up the hijacker's site and not mine, all rankings for all keywords gone, massive traffic and sales loss.
    Quote incrediBILL:

    a) Scrapers spoof as Google to rip-off the naive people depending on shoddy .htaccess files blocking bad user agents

    b) Google is fed lists of cloaked links by proxy sites and they crawl thru the proxy sites and hijack your listings via the proxy. In some cases they give the proxy site ownership of your page.

    c) Scrapers even scrape via Google proxy services such as translate.google.com and the Web Accelerator so just limiting Googlebot to a Google IP range is insufficient to stop abuse.

    The Never Ending SERPs Hijacking Problem: Is there a definite solution?
    http://hamletbatista.com/2007/07/03/the-never-ending-serps-hijacking-problem-is-there-a-definite-solution/

    The problem is basically the same. Two URLs pointing to the same content. Google's duplicate content filters kick in and drop one of the URLs. They normally drop the page with the lower PageRank. That is Google's core problem. They need to find a better way to identify the original author of the page.

    When someone blatantly copies your content and hosts it on their site, you can take the offending page down by sending a DMCA complaint to Google, et al.

    The problem with 302 redirects and cgi proxies is that there is no content being copied. They are simply tricking the search engine into believing there are multiple URLs hosting the same content.
    Go to http://www.google.com/alerts
    and type in the search term mysite.com, plus some specific unique sentences from your site.
    You can set the email frequency. This makes it really easy to find scrapers: when Google hits a proxy and the page gets cached, you will know immediately, and you can immediately block the IP and fill out a spam report.
    It's easier to take care of scrapers and proxies when the first page is indexed than to deal with hundreds of them later.
    --

    Proxy Server URLs Can Hijack Your Google Ranking - how to defend?

    Take preventative action now by doing the following...
    1. Add this to all of your headers:
    <base href="http://www.yoursite.com/" />

    and if you see an attempted hijack...
    2. Block the site via .htaccess (needs mod_rewrite with RewriteEngine On):
    RewriteCond %{HTTP_REFERER} yourproblemproxy\.com [NC]
    RewriteRule .* - [F]

    3. Block the IP address of the proxy
    order allow,deny
    deny from 11.22.33.44
    allow from all
    4. Do your research and file a spam report with Google.
    http://www.google.com/contact/spamreport.html
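
    Step 1 above is easy to forget on individual templates, so it can be worth auditing pages for the tag. The sketch below uses Python's standard html.parser to report the base href a page declares, if any; the class and function names are hypothetical, and it only checks that the tag is present, not whether a given proxy honours it.

```python
from html.parser import HTMLParser
from typing import Optional


class BaseHrefFinder(HTMLParser):
    """Remember the href of the first <base> tag seen in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <base ... />.
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def find_base_href(html: str) -> Optional[str]:
    """Return the page's base href, or None if it has no <base> tag."""
    finder = BaseHrefFinder()
    finder.feed(html)
    return finder.base_href
```

    Running find_base_href() over the saved HTML of each page and flagging every None result gives a quick list of pages that still need the tag.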
    --

    For anyone else who wants to see whether their site has also been picked up by this particular proxy, search Google using the site: operator, like this:

    netbuilders site:localdomain.ws
    or
    glype site:localdomain.ws

  7. #7
    Keldorn is offline Net Builder
    I've seen that site advertising on Proxy.org.
    It's likely they don't know about the robots.txt thing; a lot of people who start with proxies as their first site are ignorant of the protocols.

    I have seen worse from big sites, for example not sending HTTP/1.0 404 Not Found headers for missing content. That creates basically the same thing: infinite pages that respond 200 OK to a search engine but are actually 404s. Search engines like Google have to come up with heuristic hacks to detect those pages because of idiotic crap like that.
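
    A site owner can test for that kind of "soft 404" by requesting a URL that cannot exist and looking at the status code. This is a minimal sketch under that assumption; the function names are hypothetical, and the fetch function is a parameter so the check can be tried without a live server.

```python
import urllib.error
import urllib.request
import uuid
from typing import Callable


def fetch_status(url: str) -> int:
    """Return the HTTP status code for url (final status after redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the code is what we want here.
        return err.code


def has_soft_404(base_url: str, fetch: Callable[[str], int] = fetch_status) -> bool:
    """Probe a random path that should not exist; 200 means a soft 404."""
    probe = base_url.rstrip("/") + "/" + uuid.uuid4().hex
    return fetch(probe) == 200
```

    A properly configured server answers the random probe with 404 (or 410), so has_soft_404() returning True is a sign that missing pages are being served as if they were real content.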