Strange Behavior - Dupe Content Via Query String URLs?
-
Hey y'all, could use community help with some strange behavior I'm seeing with a particular ranking.
A week ago a high-volume keyword that ranked above the fold dropped off the map. I immediately thought it must be an algorithmic Penguin penalty (no manual action message) or a Panda / dupe content issue. I think it's dupe content at this point because I found my former ranking page in the omitted results section for the keyword we used to rank for.
The strange thing is that, without us making any changes, Google would momentarily show our domain ranking high on page one again, but with a strange query string URL. At first it was just domain.com/page/? whereas the old ranking was held by domain.com/page/, but now I see several long query string URLs floating around that the engines don't seem to know what to do with. Canonical tags are in place to canonicalize any query string URL back to the top, and I have now designated query string URLs as unimportant in Search Console parameter filtering, but these URLs persist.
I ended up deduplicating our content against a page on another domain we own (I think that was the original problem) and there seemed to be a positive effect, but now we are at the top of page 2 with a much longer query string URL as the ranking page. It seems Google wants to rank everything but the former ranking URL even though it's the most authoritative by far, has canonical signals in place, and is no longer duplicate content. A content checker tool showed 60% similarity to the other piece, which is a ratio I've never known to cause dupe content.
We found the source of the query string URLs: an external site that links to us. It's a buggy site, so filtering on its page appends the string to our URL, which means Google can find these URLs and thinks they're significant.
Long question short, has anyone had trouble like this, getting weird parameter / query URLs out of the index in favor of the non-parameter folder? Is it possible the main folder page got hit with Penguin and is "banned"? Still, I don't know why Google would go out of its way to rank query string copy pages in its place if that were the case. Any help greatly appreciated.
An example of the URL looks like this:
domain.com/page/?CustomerSubscriptionTrack1PageSize=1&CustomerSubscriptionTrack1Order=Sorter_ID&CustomerSubscriptionTrack1Dir=ASC&CustomerSubscriptionTrack1Page=3&WorkOrder_TBLOrder=Sorter_AssetID&WorkOrder_TBLDir=ASC&ID=106 -
Hey James, sorry to hear you're getting blasted by negative links and appreciate your responses here.
I actually sorted this one out (fingers crossed it stays that way) by having the dev team implement a redirect rule that 301 redirects any query string back to the folder we want ranking. Similar signal to what the canonical tag would send but in my opinion a stronger signal since there is no longer a way to reach those weird query string URLs with a 200 response.
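For anyone who wants to see the logic of that rule spelled out, here is a minimal sketch in Python. It is not the rule our dev team actually wrote; the function name `redirect_target` is made up for illustration, and it assumes (as in our case) that no query string on the site is ever needed. It returns the clean URL to 301 to, or None when the request is already clean.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url: str) -> Optional[str]:
    """Return the query-free URL to 301 to, or None if no redirect is needed.

    Hypothetical helper: assumes every query string on the site is disposable.
    """
    parts = urlsplit(url)
    # urlsplit reports an empty query for a bare trailing "?",
    # so also check the raw URL (ignoring any #fragment) for that case.
    bare_question_mark = url.split("#", 1)[0].endswith("?")
    if not parts.query and not bare_question_mark:
        return None
    # Rebuild the URL from scheme, host and path only.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    for candidate in (
        "https://www.domain.com/page/",
        "https://www.domain.com/page/?",
        "https://www.domain.com/page/?WorkOrder_TBLDir=ASC&ID=106",
    ):
        print(candidate, "->", redirect_target(candidate))
```

In a real setup the same check would normally live in the web server or application routing layer, which answers with a 301 and the returned URL in the Location header.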
Once that was implemented the appropriate page was right back to its old high ranking position and the query strings are hardly to be seen in the index and are no longer preferred to the old ranking page - so looks like all is right with the world again.
We also disavowed the domain that was the source of many of the query string URLs. I don't think it was a case of negative SEO - just bad coding on their side. I'm not sure what exactly did the trick, but I strongly suspect the 301 redirects are what solidified the index, given how closely that change correlated with the ranking recovery.
Maybe you can employ a similar solution whereby you can disavow domains where these links originate or set up server side handling to manage URLs of a specific pattern - for example, any URL containing "pornsite.com" if not any query string altogether (in our case we don't have any use for query strings in our URLs so just bagged them all).
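If you go the disavow route, a rough sketch of collecting the offending domains could look like the following. It assumes the disavow file takes one `domain:example.com` entry per line, which is my understanding of Google's format; the function name `build_disavow_file` and the sample URLs are invented for illustration.

```python
from urllib.parse import urlsplit


def build_disavow_file(link_urls):
    """Build disavow file text with one 'domain:' line per unique linking host.

    Hypothetical helper: URLs without a scheme are skipped because urlsplit
    cannot extract a hostname from them.
    """
    domains = set()
    for url in link_urls:
        host = urlsplit(url).hostname
        if not host:
            continue
        # Treat www.example.com and example.com as the same domain.
        if host.startswith("www."):
            host = host[len("www."):]
        domains.add(host.lower())
    return "".join(f"domain:{domain}\n" for domain in sorted(domains))


if __name__ == "__main__":
    sample_links = [
        "http://www.buggy-site.example/list?filter=1",
        "https://buggy-site.example/other",
    ]
    print(build_disavow_file(sample_links), end="")
```

Review the list by hand before uploading it, since a disavow applies to every link from the listed domain.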
Thanks again,
Matt -
Thanks for the response, James. The odd thing is that canonical tags are implemented correctly as far as I can tell. In the <head> of each variation you can find the following code:
<link rel="canonical" href="https://www.domain.com/page/" />
(still using my example so as to keep the site anonymous)
And this code had been in place well before the issue arose. So yes, we are sending that signal to Google to apply canonical back to the top in every case, without query string.
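A quick way to double check that signal across many variations is to pull the canonical href out of each page's HTML. This is only a sketch using Python's standard library parser; the class and function names are made up, and fetching the pages is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    page = '<html><head><link rel="canonical" href="https://www.domain.com/page/" /></head></html>'
    print(find_canonical(page))
```

Running it over the HTML of each query string variation should return the same folder URL every time if the tags are set up as described.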
Not sure what you're confused by in Search Console - the platform provides a tool to deal with parameter URLs just like the ones I'm seeing. I used it to mark all parameter URLs as not changing content, which should designate to engines to exclude them from the index.
Related Questions
-
Duplicate content question
Hey Mozzers! I received a duplicate content notice from my Cycle7 Communications campaign today. I understand the concept of duplicate content, but none of the suggested fixes quite seems to fit. I have four pages with HubSpot forms embedded in them. (Only two of these pages have showed up so far in my campaign.) Each page contains a title (Content Marketing Consultation, Copywriting Consultation, etc), plus an embedded HubSpot form. The forms are all outwardly identical, but I use a separate form for each service that I offer. I’m not sure how to respond to this crawl issue: Using a 301 redirect doesn’t seem right, because each page/form combo is independent and serves a separate purpose. Using a rel=canonical link doesn’t seem right for the same reason that a 301 redirect doesn’t seem right. Using the Google Search Console URL Parameters tool is clearly contraindicated by Google’s documentation (I don’t have enough pages on my site). Is a meta robots noindex the best way to deal with duplicate content in this case? Thanks in advance for your help. AK
Technical SEO | | AndyKubrin0 -
Folders in url structure?
Hello, I'm revamping an out-of-date website and am wondering if I need to include the folders (categories) in the URL structure. The proposed structure has 8 main folders. I've been reading that Google is OK if the folder is not included in the URL, but is it really? The hesitation I have is that the URLs are getting long and the main folder only has a sub folder beneath it. So, /folder-name/facility-name/treatment-overview. This looks too long, doesn't it? Thanks!
Technical SEO | | lfrazer1230 -
Query string category pagination
I've been reading some posts on the merits and pitfalls of using rel=prev, rel=next and canonical, but I just wanted to double check the right solution. example.com/birth-announcements example.com/birth-announcements?p=2 example.com/birth-announcements?p=3 With a small selection of products on each variation. So at the moment there is a canonical on all of them to the base example.com/birth-announcements. The problem is we are having difficulty getting the products within p=* indexed. I don't think from all I read that rel=prev/rel=next is the way to go. Would the solution (or best way to go) be to create a "view-all" filter and set that to be the canonical URL, so all product URLs are in clear focus for Google. The volume of products won't (shouldn't) have too much of an impact on page load. Or am I wrong and rel=prev/rel=next is a feasible solution?
Technical SEO | | MickEdwards0 -
Devaluing certain content to push better content forward
Hi all, I'm new to Moz, but hoping to learn a lot from it in hopes of growing my business. I have a pretty specific question and hope to get some feedback on how to proceed with some changes to my website. First off, I'm a landscape and travel photographer. My website is at http://www.mickeyshannon.com - you can see that the navigation quickly spreads out to different photo galleries based on location. So if a user was looking for photos from California, they would find galleries for Lake Tahoe, Big Sur, the Redwoods and San Francisco. At this point, there are probably 600-800 photos on my website. At least half of these are either older or just not quite up to par with the quality I'm starting to feel like I should produce. I've been contemplating dumbing down the galleries, and not having it break down so far. So instead of four sub-galleries of California, there would just be one California gallery. In some cases, where there are lots of good images in a location, I would probably keep the sub-galleries, but only if there were dozens of images to work with. In the description of each photo, the exact location is already mentioned, so I'm not sure there's a huge need for these sub-galleries except where there's still tons of good photos to work with. I've been contemplating building a sort of search archive. Where the best of my photos would live in the main galleries, and if a user didn't find what they were looking for, they could go and search the archives for older photos. That way they're still around for licensing purposes, etc. while the best of the best are pushed to the front for those buying fine art prints, etc. These pages for these search archives would probably need to be de-valued somehow, so that the main galleries would be more important SEO-wise. So for the California galleries, four sub-galleries of perhaps 10 images each would become one main California gallery with perhaps 15 images. 
The other 25 images would be thrown in the search archive and could be searched by keyword. The question I have - does this sound like a good plan, or will I really be killing my site when it comes to SEO by making such a large change? My end goal would be to push my better content to the front, while scaling back a lot of the excess. Hopefully I explained this question well. If not, I can try to elaborate further! Thanks, Mickey
Technical SEO | | msphotography0 -
Htaccess query
I'm currently working on a live version of a client's website which has duplication issues. With .htaccess, I need to rewrite URLs of the following format: vacancy.php?id=802 to vacancy/?id=802 I tried adding the following line but it returned a 500, and I don't want to keep taking the site down. RewriteRule ^vacancy/?id=([0-9]+)$ vacancy.php?id=$1 [R=301, L]
Technical SEO | | AndrewAkesson0 -
Duplicate pages, overly dynamic URL’s and long URL’s in Magento
Hi there, I’ve just completed the first crawl of my Magento site and SEOMOZ has picked up 1,000’s of duplicate pages, overly dynamic URL’s and long URL’s due to the sort function which appends URL’s with variables when sorting products (e.g. www.example.com?dir=asc&order=duration). I’m not particularly concerned that this will affect our rankings as Google has stated that they are familiar with the structure of popular CMS’s and Magento is pretty popular. However it completely dominates my crawl diagnostics so I can’t see if there are any real underlying issues. Does anyone know a way of preventing this? Cheers, Al.
Technical SEO | | WendyWuTours1 -
HTML url extension
I've read some information about the extension of a URL, but I couldn't find a clear answer. What is better for SEO, an extension with html or without? /make-money-online/how-to-make-a-million-dollars-in-1-year/ or /make-money-online/how-to-make-a-million-dollars-in-1-year.html/ Is there a difference between a normal website and a blog?
Technical SEO | | PlusPort0 -
Directory URL structure last / in the url
Ok, so my site's URLs work like this: www.site.com/widgets/ If you go to www.site.com/widgets (without the last / ) you get a 404. My site did not use to require the last / to load the page, but it has over the last year, and my rankings have dropped on those pages... But Yahoo and Bing still index all my pages without the last /, and it somehow still loads the page if you go to it from Yahoo or Bing, but it looks like this in the address bar once you arrive: http://www.site.com/404.asp?404;http://site.com:80/widgets/ How do I fix this? Shouldn't all the engines see those pages the same way, with the last / included? What is the best structure for SEO?
Technical SEO | | DavidS-2820610