Does Google see this as duplicate content?
-
I'm working on a site that has too many pages in Google's index as shown in a simple count via a site search (example):
site:http://www.mozquestionexample.com
I ended up getting a full list of these pages and it shows pages that have been supposedly excluded from the index via GWT url parameters and/or canonicalization
For instance, the list of indexed pages shows:
1. http://www.mozquestionexample.com/cool-stuff
2. http://www.mozquestionexample.com/cool-stuff?page=2
3. http://www.mozquestionexample.com?page=3
4. http://www.mozquestionexample.com?mq_source=q-and-a
5. http://www.mozquestionexample.com?type=productss&sort=1date
Example #1 above is the one true page for search and the one that all the canonicals reference.
Examples #2 and #3 shouldn't be in the index because the canonical points to url #1.
Example #4 shouldn't be in the index, because it's just a source code that, again doesn't change the page and the canonical points to #1.
Example #5 shouldn't be in the index because it's excluded in parameters as not affecting page content and the canonical is in place.
Should I worry about these multiple urls for the same page and if so, what should I do about it?
Thanks... Darcy
-
Darcy,
Blocking URLs in the robots.txt file will not remove them from the index if Google has already found them, nor will it prevent them from being added if Google finds links to them, such as internal navigation links or external backlinks. If this is your issue, you'll probably see something like this in the SERPs for those pages:
"We cannot display the content because our crawlers are being blocked by this site's robots.txt file" or something like that.
Here's a good discussion about it on WMW.
If you have parameters set up in GWT and are using a rel canonical tag that points Google to the non-parameter version of the URL you probably don't need to block Googlebot. I would only block them if I thought crawlbudget was an issue, as in seeing Google to continue to crawl these pages within your log files, or when you potentially have millions of these types of pages.
-
Hi Ray,
Thanks for the response. To answer your question, the URL parameters have been set for months, if not years.
I wouldn't know how to set a noindex on a url with a different source code, because it really isn't a whole new url, just different tracking. I'd be setting a noindex for the example 1 page and that would not be good.
So, should I just not worry about it then?
Thanks... Darcy
-
Hi 94501,
Example #1 above is the one true page for search and the one that all the canonicals reference.
If the pages are properly canonicalized then Example #1 will receive nearly all of the authority stemming from pages with this URL as the canonical tag.
I.e. Example #2 and #3 will pass authority to Example #1
Examples #2 and #3 shouldn't be in the index because the canonical points to url #1.
Setting a canonical tag doesn't guarantee that a page will not be indexed. To do that, you'd need to add a 'noindex' tag to the page.
Google chooses whether or not to index these pages and in many situations you want them indexed. For example: User searches for 'product X' and product x resides on the 3rd page of your category. Since Google has this page indexed (although the canonical points to the main page) it makes sense to show the page that contains the product the user was searching for.
Example #4 shouldn't be in the index, because it's just a source code that, again doesn't change the page and the canonical points to #1.
To make sure it is not indexed, you would need to add a 'noidex' tag and/or make sure the parameters are set in GWMT to ignore these pages.
But again, if the canonical is set properly then the authority passes to the main page and having this page indexed may not have negative impact.
Example #5 shouldn't be in the index because it's excluded in parameters as not affecting page content and the canonical is in place.
How long ago was the parameter setting applied in GWMT? Sometimes it takes a couple weeks to deindex pages that were already indexed by Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Country Code Top Level Domains & Duplicate Content
Hi looking to launch in a new market, currently we have a .com.au domain which is geo-targeted to Australia. We want to launch in New Zealand which is ends with .co.nz If i duplicate the Australian based site completely on the new .co.nz domain name, would i face duplicate content issues from a SEO standpoint?
Intermediate & Advanced SEO | | jayoliverwright
Even though it's on a completely separate country code. Or is it still advised tosetup hreflang tag across both of the domains? Cheers.0 -
Google Not Indexing App Content
Hello Mozzers I recently noticed that there has been an increase in crawl errors reported in Google Search console & Google has stopped indexing our app content. Could this be due to the fact that there is a mismatch between the host path name mentioned within the android deeplink (within the alternate tag) and the actual URL of the page. For instance on the following desktop page http://www.example.com.au/page-1 the android deeplink points to http://www.example.com.au/android-app://com.example/http/www.example.com.au/4652374 Please note that the content on both pages (desktop & android) is same.Is this is a correct setup or am I doing something wrong here? Any help would be much appreciated. Thank you so much in advance.
Intermediate & Advanced SEO | | InMarketingWeTrust0 -
How would you handle this duplicate content - noindex or canonical?
Hello Just trying look at how best to deal with this duplicated content. On our Canada holidays page we have a number of holidays listed (PAGE A)
Intermediate & Advanced SEO | | KateWaite
http://www.naturalworldsafaris.com/destinations/north-america/canada/suggested-holidays.aspx We also have a more specific Arctic Canada holidays page with different listings (PAGE B)
http://www.naturalworldsafaris.com/destinations/arctic-and-antarctica/arctic-canada/suggested-holidays.aspx Of the two, the Arctic Canada page (PAGE B) receives a far higher number of visitors from organic search. From a user perspective, people expect to see all holidays in Canada (PAGE A), including the Arctic based ones. We can tag these to appear on both, however it will mean that the PAGE B content will be duplicated on PAGE A. Would it be the best idea to set up a canonical link tag to stop this duplicate content causing an issue. Alternatively would it be best to no index PAGE A? Interested to see others thoughts. I've used this (Jan 2011 so quite old) article for reference in case anyone else enters this topic in search of information on a similar thing: Duplicate Content: Block, Redirect or Canonical - SEO Tips0 -
Moving some content to a new domain - best practices to avoid duplicate content?
Hi We are setting up a new domain to focus on a specific product and want to use some of the content from the original domain on the new site and remove it from the original. The content is appropriate for the new domain and will be irrelevant for the original domain and we want to avoid creating completely new content. There will be a link between the two domains. What is the best practice for this to avoid duplicate content and a potential Panda penalty?
Intermediate & Advanced SEO | | Citybase0 -
Duplicate Content on Product Pages
I'm getting a lot of duplicate content errors on my ecommerce site www.outdoormegastore.co.uk mainly centered around product pages. The products are completely different in terms of the title, meta data, product descriptions and images (with alt tags)but SEOmoz is still identifying them as duplicates and we've noticed a significant drop in google ranking lately. Admittedly the product descriptions are a little bit thin but I don't understand why the pages would be viewed as duplicates and therefore can be ranked lower? The content is definitely unique too. As an example these three pages have been identified as being duplicates of each other. http://www.outdoormegastore.co.uk/regatta-landtrek-25l-rucksack.html http://www.outdoormegastore.co.uk/canyon-bryce-adult-cycling-helmet-9045.html http://www.outdoormegastore.co.uk/outwell-minnesota-6-carpet-for-green-07-08-tent.html
Intermediate & Advanced SEO | | gavinhoman0 -
What is a good content for google?
When we start to study SEO and how google see our webpage, one important point is to have good content. But, for beginners like me, we get lost on this. Is not so black and white: what for you is a good content? the text amount matters? there is any trick that all good content websites need to have?
Intermediate & Advanced SEO | | Naghirniac0 -
Why duplicate content for same page?
Hi, My SEOMOZ crawl diagnostic warn me about duplicate content. However, to me the content is not duplicated. For instance it would give me something like: (URLs/Internal Links/External Links/Page Authority/Linking Root Domains) http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110516 /1/1/31/2 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110711 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110811 0/0/1/0 http://www.nuxeo.com/en/about/contact?utm_source=enews&utm_medium=email&utm_campaign=enews20110911 0/0/1/0 Why is this seen as duplicate content when it is only URL with campaign tracking codes to the same content? Do I need to clean this?Thanks for answer
Intermediate & Advanced SEO | | nuxeo0 -
Do sites with a small number of content pages get penalized by Google?
If my site has just five content pages, instead of 25 or 50, then will it get penalized by Google for a given moderately competitive keyword?
Intermediate & Advanced SEO | | RightDirection0