Thread: Strange GoogleBot URL in WordPress Sites

  1. #1
    TopDogger is offline Über Hund
    Join Date
    Jan 2009
    Location
    Hellfire, AZ
    Posts
    3,103
    Thanks
    349
    Thanked 918 Times in 702 Posts

    Strange GoogleBot URL in WordPress Sites

    I run several WordPress blogs and use Lester Chan's wp-useronline plugin to monitor users. Yesterday I noticed that the GoogleBot spider indexing the site appeared to be stuck on a single strange URL.

    http://myblog.com/&usg=ALkJrhg-i5vaNbgDvusXlAIluxQJsJ7R8w%2F%2Fpage%2F2%2F%2Fpage%2F2%2F%2Fpage%2F3%2F%2Fpage%2F2%2F%2Fpage%2F2%2F%2Fpage%2F2%2F%2Fpage%2F2%2F%2Fpage%2F3%2F%2Fpage%2F2%2F%2Fpage%2F3%2F%2/

    The decoded version of the querystring is simply:

    /&usg=ALkJrhg-i5vaNbgDvusXlAIluxQJsJ7R8w//page/2//page/2//page/3//page/2//page/2//page/2//page/2//page/3//page/2//page/3/%2/
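    For anyone who wants to reproduce the decoding, here is a minimal sketch in Python (my choice of language for illustration; nothing on the site runs Python). It percent-decodes the path and counts how many pagination segments were appended. The variable names are made up for the example.

```python
from urllib.parse import unquote

# Path as it showed up for the spider; the trailing "%2/" is truncated
# in the original and is left exactly as it was.
encoded = (
    "/&usg=ALkJrhg-i5vaNbgDvusXlAIluxQJsJ7R8w"
    "%2F%2Fpage%2F2%2F%2Fpage%2F2%2F%2Fpage%2F3"
    "%2F%2Fpage%2F2%2F%2Fpage%2F2%2F%2Fpage%2F2"
    "%2F%2Fpage%2F2%2F%2Fpage%2F3%2F%2Fpage%2F2"
    "%2F%2Fpage%2F3%2F%2/"
)

# unquote() turns each %2F back into "/" and leaves the malformed "%2" alone.
decoded = unquote(encoded)
print(decoded)

# Each "/page/N" segment is one more trip around the Previous/Next loop.
loop_count = decoded.count("/page/")
print(loop_count)
```

    The count gives a rough idea of how many times the spider followed a Previous or Next link before this URL was recorded.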

    GoogleBot is still stuck on this URL this morning.

    I did a search for WordPress &usg. I found thousands of web pages that reflect variations of this odd URL somewhere on the page. For example, check out this History Channel forum thread:


    Has anyone else seen this? Any ideas as to what GoogleBot might be doing? It looks like it might somehow be related to Google's image bot, which might explain why alt attributes are displaying as text on the page. It looks like a WordPress bug (I am using 2.8.2) may be causing the loop, because the Previous and Next links should not include a reflection of any querystring in the page URL.

    It looks like a WordPress bug, because if you add any test querystring, such as &usg=666 or &aaa=222 to the end of the home page URL and refresh the page, the querystring shows up in the Previous link at the bottom of the page.

    UPDATE

    I could not find any info about this problem on the web. However, the MSN bot also got stuck in the same loop this morning. I removed the Previous and Next links from the home page, and within 10 minutes both spiders were indexing the site properly again. Because MSN somehow picked up the same link with the &usg parameters, I suspect that the link exists somewhere on the web and this may have been a malicious attack.

    The problem still exists for the Previous and Next links on WordPress category pages. All you need to do is add a querystring to the end of the page URL and press Return. The problem is that the querystring is inserted INSIDE the URL, which breaks the URL.

    For example, if you have a category page named /category/web-site-development/ and you add a querystring to the end of the URL, such as /category/web-site-development/&aaa=666, and press Return, the Next link on the resulting page is /category/web-site-development/&aaa=666%2F/page/2/, which never takes you to page 2.
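    To make the expected behavior concrete, here is a rough sketch in Python of what a pagination link builder ought to produce. This is not WordPress's code (WordPress is PHP, and I have not traced its actual link function); next_page_link is a hypothetical helper that simply discards anything from the first stray "&" in the path before appending the page segment.

```python
def next_page_link(request_path: str, page: int) -> str:
    """Build a pagination link from the clean part of the requested path.

    Hypothetical helper for illustration only; not a WordPress function.
    """
    # Parameters glued onto the path with "&" (no "?") are not part of the
    # page address, so keep only what comes before the first "&".
    clean_path, _, _stray = request_path.partition("&")
    if not clean_path.endswith("/"):
        clean_path += "/"
    return f"{clean_path}page/{page}/"


link = next_page_link("/category/web-site-development/&aaa=666", 2)
print(link)
```

    With the stray parameter dropped, the Next link points at the real second page instead of embedding &aaa=666 in the middle of the URL.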

    Has anyone else seen this issue?
    Last edited by TopDogger; 28 July, 2009 at 22:28 PM.
    "Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote." -- Benjamin Franklin


  2. #2
    Will.Spencer is offline Retired
    Join Date
    Dec 2008
    Posts
    5,033
    Blog Entries
    1
    Thanks
    1,010
    Thanked 2,329 Times in 1,259 Posts
    Ah... my old nemesis &USG!

    Where do &usg URLs come from?

    • Do a Google search for "deflation" (while logged in to your Google account).
    • Mouseover the URL of the first result. It will look like this:
      Code:
      http://en.wikipedia.org/wiki/Deflation
    • Right-click and copy the URL. Now, paste the URL into a text editor. It will look like this:
      Code:
      http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FDeflation&ei=g1ZuSuH9GcKAkQXaoOipCw&rct=j&q=deflation&usg=AFQjCNFCt828ApgJLHMCSlyQIRPDSUKq3Q&sig2=zRKhYBxexIwW1ntyeAmKCA
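    If you would rather pull the redirect URL apart programmatically than by eye, a small Python sketch like the following does it with the standard library. The variable names are only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The redirect URL copied from the search result above.
redirect = (
    "http://www.google.com/url?sa=t&ct=res&cd=1"
    "&url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FDeflation"
    "&ei=g1ZuSuH9GcKAkQXaoOipCw&rct=j&q=deflation"
    "&usg=AFQjCNFCt828ApgJLHMCSlyQIRPDSUKq3Q&sig2=zRKhYBxexIwW1ntyeAmKCA"
)

# parse_qs() splits the query string and percent-decodes each value.
params = parse_qs(urlsplit(redirect).query)

destination = params["url"][0]  # the page the searcher actually wanted
usg_value = params["usg"][0]    # the tracking value discussed in this thread
print(destination)
print(usg_value)
```

    Note that usg is a separate parameter of the Google redirect, sitting after the encoded destination URL, not part of the destination itself.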

    Do you see our little friend &USG in the Google URL?

    We know who creates &usg parameters, even if we do not yet know why.

    Here's a quick .htaccess fix to strip &USG parameters:
    Code:
    RewriteBase /
    RewriteRule (.*)&usg=(.*)$ $1 [R=301]
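    To see what that pattern is meant to capture, here is a rough Python equivalent of the regex part only. It is a sketch under a few assumptions: Python's re module stands in for the regex engine mod_rewrite uses, strip_usg is a made-up name, and none of Apache's own URL handling (decoding, flags, per-directory prefix stripping) is modeled.

```python
import re

# Same pattern as the RewriteRule, anchored at the start for clarity.
RULE = re.compile(r"^(.*)&usg=(.*)$")


def strip_usg(path: str) -> str:
    """Return the path with "&usg=..." and everything after it removed."""
    # \1 plays the role of $1 in the RewriteRule substitution.
    return RULE.sub(r"\1", path)


cleaned = strip_usg("category/web-site-development/&usg=666")
untouched = strip_usg("category/web-site-development/")
print(cleaned)
print(untouched)
```

    Everything before &usg= lands in the first group, which is what the redirect target is built from; paths without the parameter pass through unchanged.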
    Submit Your Webmaster Related Sites to the NB Directory
    I swear, by my life and my love of it, that I will never live for the sake of another man, nor ask another man to live for mine.

  3. #3
    TopDogger
    Interesting. Thanks for the code. I was planning to use mod_rewrite to strip it out, but I wanted to first find out what it is and where it is coming from. I am a bit surprised to find a lack of information about this on the web.

    The strange thing is that it is not just a Google issue. The MSN bot was also hitting the site with that URL. Do you suspect that doing a search while logged into a Google account is altering URLs in search results? That still should not affect the MSN bot, because those URLs should not exist in their index unless they find them on a page somewhere.

    Both Google and MSN bots kept returning yesterday using the URLs with the usg parameter, but no longer appeared to be getting stuck in the loop on the home page after I removed the Previous link. GoogleBot is back again this morning with that URL. The URL is generating a status code 200.

    The secondary part of this is the apparent bug within WordPress that allows a querystring attached to a URL to be injected within a URL on the category pages, which would also cause the problem because the Previous link still lands on the first page of the category.

    Will, have you run into this before? Are you actively using the mod_rewrite code on your sites?

    I tried the mod_rewrite rule, but it does not appear to work. I tried adding a caret at the start, but still nada. It looks like it should work. It is the first rule in the .htaccess file. I wonder if some of the WordPress rewrites are messing with it.

    Code:
    RewriteRule ^(.*)&usg=(.*)$ $1 [R=301]
    Last edited by TopDogger; 28 July, 2009 at 22:33 PM.


  4. #4
    Will.Spencer
    I'm using that code on some of my sites.

    For a long time, the &usg versions of many of my URLs were ranking in Google instead of the correct URLs.

    And, of course, once they started ranking people started linking to them...

  5. #5
    TopDogger
    Quote: Originally Posted by Will.Spencer
        For a long time, the &usg versions of many of my URLs were ranking in Google instead of the correct URLs.

        And, of course, once they started ranking people started linking to them...

    That would explain how MSN got the links.

    Google keeps coming back with those links. I am seeing variations in the hash code, so they might be picking up multiple links with different versions of the usg parameter.

    I figured out the issue with the rewrite code. I had to add the L flag at the end so the request does not pass through to the WordPress rewrite rules after it gets snagged.

    Code:
    RewriteRule ^(.*)&usg=(.*)$ $1 [R=301,L]
    I'm still having a problem with it because it works when I use a test parameter, such as usg=666, but does not work with the long parameter values from Google.

    I did some further testing and found that the rewrite does not work whenever there is a % in the value for the usg parameter. Do you have any idea as to why that would be happening?

    It works with &usg=666

    But does not work with &usg=666%

    Furthermore, it is now generating a 301 status code. That tells me it is snagging the URL, but is not rewriting it properly. This is getting pretty strange.
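    One sanity check that can be done outside Apache is to run the same pattern against both test values in Python. This is only a sketch: Python's re module is standing in for the regex engine mod_rewrite uses, and it says nothing about what the server does with a bare % before or after the rule runs.

```python
import re

# Same pattern as the RewriteRule above.
pattern = re.compile(r"^(.*)&usg=(.*)$")

results = {}
for path in ("&usg=666", "&usg=666%"):
    match = pattern.match(path)
    # Record whether the pattern matched and what the usg value group captured.
    results[path] = match.group(2) if match else None
    print(path, "->", results[path])
```

    The pattern matches both values here, which hints that the regex itself may not be the culprit and that the % is being handled somewhere else in the request, though this does not show where.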

    Will, the good news is that when I Google for "google usg parameter", this post comes up as #1. That is a good sign that the forum is getting attention.

    I found the following information about this Google phenomenon. It looks like it is related to Google Analytics. Apparently it tells Google Analytics where the site ranked when someone clicked on the link.

    Google Search To Change Referral Strings: SEOs Discuss

    Google Analytics Blog: An upcoming change to Google.com search referrals; Google Analytics unaffected

    Google Adds Ranking Data to Referrer String?


    I wonder if Google has any idea about the way this is screwing up the Previous and Next links in WordPress when someone follows a link tainted with their tracking code, or if they care. Do No Evil. Bwahahahaha!

    It looks like I should focus on fixing the bug in the WordPress core code.
    Last edited by TopDogger; 29 July, 2009 at 14:18 PM.

