Results 11 to 19 of 19

Thread: Google on Content Scrapers

  1. #11
    If they really want to scrape your content, they will. You can put in all the countermeasures you wish, but they will only reduce how often it happens, not stop all of it.

    I don't pay attention to the spammers who link back to me. As for spammers using my content with no link back to my site, sooner or later I will catch them, and then I will report them to the search engines, their hosting company, their IP provider, or my lawyer. I am good at it.

    The latest spam fashion is still copying meta descriptions from different sites and building pages out of them for unethical SEO. I hope search engines like Google will include meta descriptions in their detection algorithms, if they have not done so already.
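As a rough illustration of how copied meta descriptions could be flagged, here is a minimal sketch using word shingles and Jaccard similarity. This is a generic near-duplicate technique, not a description of what Google or any other search engine actually does. The function names and the sample descriptions are made up for the example.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the shingle sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical meta descriptions, for illustration only.
mine = "Hand-made oak furniture built to order in Vermont since 1985"
copied = "Hand-made oak furniture built to order in Vermont since 1985 by AcmeCo"
unrelated = "Cheap flights and hotel deals for your next holiday"

if __name__ == "__main__":
    print("copied:", round(jaccard(mine, copied), 2))
    print("unrelated:", round(jaccard(mine, unrelated), 2))
```

A high score against your own description is only a hint that a page deserves a manual look; the threshold you pick is a judgment call.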




    Those who can make you believe absurdities can make you commit atrocities.

    Voltaire


  2. #12
    Well, my experience has been that most scrapers are lazy, so the related post links get included. If someone wants to manually steal my stuff, they're going to do it if it's online. That's just the way the Internet works... you just have to make it painful when you catch folks doing it. I believe I read about it from Johnny Chow before starting to use the plugin, and he was right on the money from what I've seen... but we all see different stuff, and that's what's nice about NetBuilders: we learn from each other what works and what doesn't in different contexts.
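To check whether a scraped copy kept a link back to you, something like the sketch below could work. It only inspects HTML you have already downloaded. The class name, function name, and domains are hypothetical, and it is not how any particular plugin does it.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html: str, my_domain: str) -> bool:
    """True if any link in html points at my_domain or one of its subdomains."""
    collector = LinkCollector()
    collector.feed(html)
    my_domain = my_domain.lower()
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == my_domain or host.endswith("." + my_domain):
            return True
    return False


# Hypothetical scraped pages, for illustration only.
lazy_copy = '<p>Great tents. <a href="https://www.example.com/related">Related</a></p>'
stripped_copy = '<p>Great tents. <a href="https://notexample.com/">Home</a></p>'

if __name__ == "__main__":
    print(links_back_to(lazy_copy, "example.com"))
    print(links_back_to(stripped_copy, "example.com"))
```

Copies that come back False are the ones with no link back, which are the ones worth chasing first.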

    Just like the difference in kiddy hackers vs professionals...you stop 99% of the kiddies by staying up to date on security patches, best practices, etc...the professional will get you at some point if they want to...

    Cheers,
    James

  3. #13

  4. #14
    Thanks for the article Loko.

    Scrapers linking back to you is Google business appreciation, but when Matt says "scraped content will very rarely rank higher than the original, legitimate content," I am disappointed, because in that case scrapers can still benefit from your content even when you rank higher.

    The traffic your content generates already exists. If they use your content as though they were the copyright holder, changing the company name or things like that, then ranking just below you can split your traffic. So yes, it is important to take these people down, as they can affect your site one way or another.

    That's why scrapers need to be taken down asap.
    Those who can make you believe absurdities can make you commit atrocities.

    Voltaire


  5. #15
    The irony is that Google is maybe the biggest content scraper on the web. Almost all their content is scraped. Full text in the cache, and images in image search where they frame our sites.

    The robot drills down everywhere unless told not to via robots.txt. I think it even traverses web form submit URLs if it can, and possibly goes as far as harvesting WHOIS content.

    Not that I'm complaining.
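For anyone who wants to see what "told not to via robots.txt" looks like in practice, here is a small sketch using Python's standard urllib.robotparser. The rules, paths, and URLs are invented for the example, and a well-behaved crawler honoring the file is an assumption; scrapers generally ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of a search form's
# result URLs and a WHOIS lookup section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /whois/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "https://example.com/articles/camping",
        "https://example.com/search?q=tents",
        "https://example.com/whois/example.org",
    ):
        print(url, parser.can_fetch("Googlebot", url))
```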

  6. #16
    Quote Originally Posted by Natural Elements View Post
    Scrapers linking back to you is Google business appreciation, but when Matt says "scraped content will very rarely rank higher than the original, legitimate content." I am disappointed because in this case scrapers can benefit from your contents even when you rank higher.
    The real problem is that Google and other search engines do not always know who posted the original content. When content thieves can instantly steal content via RSS (i.e. autoblogs), a spider may find the thief's copy before it finds the original.

    There is a reason why Matt stresses adding links back to a page. Links to a page are what they use to determine which copy is the original. The page with the most links is the winner. I saw that a few years ago when I talked to a guy who had a very popular, top-ranking article about camping. One day his page dropped out of sight in Google. When he checked it out, he found that a Boy Scout site had stolen his camping article and posted it without any link back. Their site was much more popular, with more links and higher PR. His page was then flagged as duplicate content, even though it had been on the web for a couple of years before it was stolen.
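To make the camping example concrete, here is a toy model of the "most links wins" heuristic as described in this post. It is only an illustration of that description, not Google's actual algorithm, and every name and number in it is made up.

```python
from dataclasses import dataclass


@dataclass
class PageCopy:
    url: str
    inbound_links: int
    first_published_year: int


def pick_presumed_original(copies: list[PageCopy]) -> PageCopy:
    """Pick the copy with the most inbound links, ignoring publication date."""
    return max(copies, key=lambda c: c.inbound_links)


# Hypothetical figures, for illustration only.
author_page = PageCopy("https://author.example/camping", 40, 2005)
scraper_page = PageCopy("https://bigsite.example/camping", 900, 2007)

winner = pick_presumed_original([author_page, scraper_page])

if __name__ == "__main__":
    # The older page loses because only link counts are compared.
    print(winner.url)
```

Under this heuristic, a link back from the copy to the original adds to the original's count, which is one reading of why the link matters.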
    "Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote." -- Benjamin Franklin


  7. #17
    Quote Originally Posted by TopDogger View Post
    His page was then flagged as duplicate content, even though it had been on the web for a couple of years prior to it being stolen.
    Well, if someone steals a page, an article, or part of my content, the most effective solution is to file a DMCA takedown notice to get the copyright infringement removed.

    Most of the time I try to contact the webmaster/site owner directly to be fair, but will escalate from there if that doesn't work.
    Those who can make you believe absurdities can make you commit atrocities.

    Voltaire


  8. #18
    yeah time wasters... for sure!

  9. #19
    "The irony is that Google is maybe the biggest content scraper on the web. Almost all their content is scraped. Full text in the cache, and images in image search where they frame our sites."
    And then there's Google fastflip:
    http://fastflip.googlelabs.com/

    http://nekkidninjas.com/index.php/2009/10/28/google-fast-flipping-you-out-of-revenue
    Not satisfied controlling almost every advertisement on almost every website in the world, Google has decided that it's going to cut the middleman out of its advertising revenue, the middleman being the content providers for the sites their advertising is on.

    How are they going to do this?

    It appears to be through a new service under development in the Google Labs, called fast flip. Fast flip takes an image of a particular site, at this stage it appears to be only news, and then posts the image with their own advertising into the fast flip service. With a newsfeed page with lots of links, we need to click through one link to get to the full article, but fast flip makes us click through two pages to get to the full article, one of which has half the article and Google's advertising.

    So not only is it not faster, but Google is depriving the content producer of all revenue on their articles much, if not most of the time. I am fairly confident that this will do a damn good job of reducing the amount of useful content that is produced, because authors who aren't getting paid tend to stop writing anything that requires a serious investment of time and energy.
    Google is also the only search engine affected by the 302 redirect hijacks and often still punishes the wrong websites.

    A very effective way to block scrapers, fake googlebots and other junk traffic that has no positive impact on your site is to use something like MM Autoban with Bad Behavior and the HTTP:BL from Project HoneyPot.
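For the fake Googlebot part specifically, one check that is separate from the tools named above is the reverse DNS plus forward DNS lookup that Google documents for verifying its crawler. The sketch below assumes the crawler hostnames end in googlebot.com or google.com. The function names are made up, and the resolvers are passed in as parameters so the logic can be tried without network access. It does not show how MM Autoban, Bad Behavior, or http:BL work internally.

```python
import socket
from typing import Callable

# Assumed hostname suffixes for genuine Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_dns(ip: str) -> str:
    """Look up the hostname an IP address claims via its PTR record."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host: str) -> list[str]:
    """Resolve a hostname back to its IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse_dns,
    forward: Callable[[str], list[str]] = _forward_dns,
) -> bool:
    """True only if the PTR name is a Google host that resolves back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # User-Agent may say Googlebot, but DNS disagrees
    try:
        return ip in forward(host)
    except OSError:
        return False


if __name__ == "__main__":
    # Stub resolvers with invented data, so nothing touches the network.
    print(is_real_googlebot(
        "192.0.2.10",
        reverse=lambda ip: "crawl-192-0-2-10.googlebot.com",
        forward=lambda host: ["192.0.2.10"],
    ))
```

The forward lookup matters because anyone who controls reverse DNS for their own IP range can set a PTR record that claims to be googlebot.com.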
