Thread: SimilarPages

  1. SimilarPages

    A web robot claiming to be from SimilarPages has been crawling my sites, while completely ignoring robots.txt.

    It's difficult to tell for certain that this robot really is from SimilarPages.com, because the crawling is coming from Amazon's AWS cloud computing system.

    The robot visits look like this:
    2009-02-25 (Wed) 02:20:40 address is 67.202.52.58, hostname is ec2-67-202-52-58.compute-1.amazonaws.com, agent is SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; http://www.similarpages.com; info at similarpages dot com)
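    For anyone who wants to pull similar visits out of their own logs, here is a minimal Python sketch. It assumes your log lines follow the same "address is ..., hostname is ..., agent is ..." layout as the one quoted above; the helper names `parse_visit` and `looks_like_ec2` are made up for illustration, and the hostname test is only a naming heuristic, not proof of who runs the crawler.

```python
import re

# Assumed layout, inferred from the single log line quoted in this post:
#   "<timestamp> address is <ip>, hostname is <host>, agent is <user agent>"
VISIT_RE = re.compile(
    r"address is (?P<ip>[\d.]+), "
    r"hostname is (?P<host>[^,\s]+), "
    r"agent is (?P<agent>.+)$"
)


def parse_visit(line):
    """Return a dict with ip, host and agent, or None if the line does not match."""
    match = VISIT_RE.search(line)
    return match.groupdict() if match else None


def looks_like_ec2(host):
    """Heuristic: hostname follows the ec2-...amazonaws.com naming seen above."""
    return host.startswith("ec2-") and host.endswith(".amazonaws.com")


line = (
    "2009-02-25 (Wed) 02:20:40 address is 67.202.52.58, "
    "hostname is ec2-67-202-52-58.compute-1.amazonaws.com, "
    "agent is SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; "
    "http://www.similarpages.com; info at similarpages dot com)"
)
visit = parse_visit(line)
```

    With the line above, `visit["host"]` passes the `looks_like_ec2` check, which matches what I saw: the user agent claims SimilarPages, but the machine is a rented EC2 instance.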
    If this bot really is from SimilarPages.com, I want to tell them that they aren't going to beat Google by building a search engine from an open source project (Nutch) and hosting it at Amazon Web Services.

    If this is just another blackhat crawler hosted at Amazon, I want to tell Amazon that their customers are hurting their brand value.

    I think that I will block Amazon Web Services's IP ranges on our external firewall.
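    The real block belongs on the firewall, but the range check itself can be sketched in Python with the standard `ipaddress` module. The network below is only a placeholder chosen to contain the address from my log; it is not Amazon's published range list, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder network for illustration only: it simply contains the address
# from the log line in this post. Substitute the ranges you actually want to block.
BLOCKED_NETWORKS = [ipaddress.ip_network("67.202.52.0/24")]


def is_blocked(addr, networks=BLOCKED_NETWORKS):
    """Return True if addr falls inside any of the given networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)
```

    For example, `is_blocked("67.202.52.58")` is true with the placeholder above, while an address outside that network is let through.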
    Submit Your Webmaster Related Sites to the NB Directory
    I swear, by my life and my love of it, that I will never live for the sake of another man, nor ask another man to live for mine.

  2. #2
    That one looks very suspicious. Even if it is from SimilarPages.com, what would be the advantage of letting it crawl your sites? Does anyone actually use SimilarPages.com?

    I would block it.

  3. #3
    I would block it as well. Even if it is from similarpages.com, it wouldn't be a big loss.

  4. #4
    They're trying to build a new Copyscape-style website, maybe. Maybe they will make it paid-only and get only 100 people buying it monthly, but if it costs $10 they will have an extra $1000.

    Just a quick, stupid theory of mine, but their idea is probably based on Copyscape (as their name is SimilarPages).

    Well block them, that's all I can say


    Greetz
    Nico Lawsons

  5. #5
    Their website is just a logo, and here is some whois data:
    Similarpages.com - Similar Pages

    I would say either someone is trying to build something new or just harvesting for their own purpose.

  6. #6
    It cannot be a serious site when it is hosted at Yahoo and sneaking in through Amazon's AWS cloud computing.

    I think they are trying to create similar pages, AKA spamola.

  7. #7
    That bot has been roaming around my blog as well. I like him.

  8. #8
    Quote Originally Posted by stickycarrots
    That bot has been roaming around my blog as well. I like him.
    Is he your pet?

  9. #9
    Quote Originally Posted by Shawn
    Is he your pet?
    I have named him Spot.

  10. #10
    Quote Originally Posted by stickycarrots
    I have named him Spot.
    Out, damn'd spot!
