Does Google see this as duplicate content?
-
I'm working on a site that has too many pages in Google's index, as shown by a simple count via a site: search (example):
site:http://www.mozquestionexample.com
I ended up getting a full list of these pages, and it shows pages that have supposedly been excluded from the index via GWT URL parameter settings and/or canonicalization.
For instance, the list of indexed pages shows:
1. http://www.mozquestionexample.com/cool-stuff
2. http://www.mozquestionexample.com/cool-stuff?page=2
3. http://www.mozquestionexample.com?page=3
4. http://www.mozquestionexample.com?mq_source=q-and-a
5. http://www.mozquestionexample.com?type=productss&sort=1date
Example #1 above is the one true page for search and the one that all the canonicals reference.
Examples #2 and #3 shouldn't be in the index because the canonical points to URL #1.
Example #4 shouldn't be in the index because it's just a tracking source code that, again, doesn't change the page, and the canonical points to #1.
Example #5 shouldn't be in the index because its parameters are flagged in GWT as not affecting page content, and the canonical is in place.
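For reference, this is what the canonical tag in the head of each parameterized variant looks like, using the placeholder domain from the examples above:

```html
<!-- In the <head> of variants like /cool-stuff?page=2 -->
<link rel="canonical" href="http://www.mozquestionexample.com/cool-stuff" />
```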
Should I worry about these multiple URLs for the same page, and if so, what should I do about it?
Thanks... Darcy
-
Darcy,
Blocking URLs in the robots.txt file will not remove them from the index if Google has already found them, nor will it prevent them from being added if Google finds links to them, such as internal navigation links or external backlinks. If this is your issue, you'll probably see something like this in the SERPs for those pages:
"We cannot display the content because our crawlers are being blocked by this site's robots.txt file" or something like that.
There's a good discussion about it on WebmasterWorld (WMW).
If you have parameters set up in GWT and are using a rel canonical tag that points Google to the non-parameter version of the URL, you probably don't need to block Googlebot. I would only block them if I thought crawl budget was an issue, as in seeing Google continue to crawl these pages in your log files, or when you potentially have millions of these types of pages.
-
Hi Ray,
Thanks for the response. To answer your question, the URL parameters have been set for months, if not years.
I wouldn't know how to set a noindex on a URL with a different source code, because it really isn't a whole new URL, just different tracking. I'd be setting a noindex for the Example #1 page, and that would not be good.
So, should I just not worry about it then?
Thanks... Darcy
-
Hi 94501,
"Example #1 above is the one true page for search and the one that all the canonicals reference."
If the pages are properly canonicalized, then Example #1 will receive nearly all of the authority from the pages that point to it in their canonical tag.
That is, Examples #2 and #3 will pass authority to Example #1.
"Examples #2 and #3 shouldn't be in the index because the canonical points to URL #1."
Setting a canonical tag doesn't guarantee that a page will not be indexed. To do that, you'd need to add a 'noindex' tag to the page.
Google chooses whether or not to index these pages, and in many situations you want them indexed. For example: a user searches for 'product X', and product X resides on the 3rd page of your category. Since Google has that page indexed (although its canonical points to the main page), it makes sense to show the page that actually contains the product the user was searching for.
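For completeness, here's a minimal sketch of that 'noindex' tag. It would have to be served conditionally, only when a paging or tracking parameter is present, so the canonical page itself stays indexable:

```html
<!-- Output only on parameterized variants (e.g. ?mq_source=...), never on the canonical URL -->
<meta name="robots" content="noindex, follow" />
```

The 'follow' part lets Googlebot keep crawling the links on the page even though the page itself drops out of the index.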
"Example #4 shouldn't be in the index, because it's just a source code that doesn't change the page, and the canonical points to #1."
To make sure it is not indexed, you would need to add a 'noindex' tag and/or make sure the parameters are set in GWMT to ignore these pages.
But again, if the canonical is set properly, then the authority passes to the main page, and having this page indexed may not have a negative impact.
"Example #5 shouldn't be in the index because it's excluded in parameters as not affecting page content and the canonical is in place."
How long ago was the parameter setting applied in GWMT? Sometimes it takes a couple of weeks to deindex pages that were already indexed by Google.
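One quick way to check progress is to scope a site: search to the parameter with the inurl: operator, using the placeholder domain from the question:

```
site:www.mozquestionexample.com inurl:mq_source
```

If that query stops returning results, the parameter settings and canonicals have done their job.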