Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Quick Fix to "Duplicate page without canonical tag"?
-
When we pull up Google Search Console, in the Index Coverage section, under the category of Excluded, there is a sub-category called ‘Duplicate page without canonical tag’. The majority of the 665 pages in that section are from a test environment.
If we were to include in the robots.txt file, a wildcard to cover every URL that started with the particular root URL ("www.domain.com/host/"), could we eliminate the majority of these errors?
That solution is not one of the 5 or 6 recommended solutions that the Google Search Console Help section text suggests. It seems like a simple effective solution. Are we missing something?
-
No index & test Indexing Before You Launch
The domains are intended for development use and cannot be used for production. A custom or CMS-standard will only work
robots.txt on
Live environments with a custom domain. Adding sub-domains (i.e.,dev.example.com , ``test.example.com
) for DEV or TEST will remove the header only,X-Robots-Tag: noindex
but still, serve the domain.robots.txt
To support pre-launch SEO testing, we allow the following bots access to platform domains:
- Site Auditor by Raven
- SEMrush
- RogerBot by Moz
- Dotbot by Moz
If you’re testing links or SEO with other tools, you may request the addition of the tool to our
robots.txt
Pantheon's documentation on robots.txt: http://pantheon.io/docs/articles/sites/code/bots-and-indexing/User-agent: * Disallow: / User-agent: RavenCrawler User-agent: rogerbot User-agent: dotbot User-agent: SemrushBot User-agent: SemrushBot-SA Allow: /
-
The simplest solution would be to mark every page in your test environment "noindex". This is normally standard operating procedure anyway because most people don't want customers stumbling across the wrong URL in search by mistake and seeing a buggy page that isn't supposed to be "live" for customers.
Updating your robots.txt file would tell Google not to crawl the page, but if they've already crawled it and added it to their index it just means that they will retain the last crawled version of the page and will not crawl it in the future. You have to direct Google to "noindex" the pages. It will take some time as Google refreshes the crawl of each page, but eventually you'll see those errors drop off as Google removes those pages from their index. If I were consulting a client I would tell them to make the change and check back in two or three months.
Hope this helps!
-
The new version of search console will show all the pages available on your site. even the no-index pages, why? I don't know, the truth is even when you set up those pages as no-follow and no-index it will keeping show you the same error. That does not mean that there is something wrong with your site. I would not worry in your case.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate, submitted URL not selected as canonical
Hi all, A number of our pages have dropped out of search rankings. It seems they are being marked as "Duplicate, submitted URL not selected as canonical" However, the page Google is choosing as the canonical is totally different - different headings, titles, metadata, content on the page. We are completely mystified as to why this is happening. If anyone can shed any light, it would be hugely appreciated! Example URL is this one:
Technical SEO | | Eric_S
https://www.vouchedfor.co.uk/IFA-financial-advisor-mortgage/london Which Google seems to think is a duplicate of this: https://www.vouchedfor.co.uk/solicitor/london0 -
Rel=Canonical For Landing Pages
We have PPC landing pages that are also ranking in organic search. We've decided to create new landing pages that have been improved to rank better in natural search. The PPC team however wants to use their original landing pages so we are unable to 301 these pages to the new pages being created. We need to block the old PPC pages from search. Any idea if we can use rel=canonical? The difference between old PPC page and new landing page is much more content to support keyword targeting and provide value to users. Google says it's OK to use rel=canonical if pages are similar but not sure if this applies to us. The old PPC pages have 1 paragraph of content followed by featured products for sale. The new pages have 4-5 paragraphs of content and many more products for sale. The other option would be to add meta noindex to the old PPC landing pages. Curious as to what you guys think. Thanks.
Technical SEO | | SoulSurfer80 -
Does Google read dynamic canonical tags?
Does Google recognize rel=canonical tag if loaded dynamically via javascript? Here's what we're using to load: <script> //Inject canonical link into page head if (window.location.href.indexOf("/subdirname1") != -1) { canonicalLink = window.location.href.replace("/kapiolani", ""); } if (window.location.href.indexOf("/subdirname2") != -1) { canonicalLink = window.location.href.replace("/straub", ""); } if (window.location.href.indexOf("/subdirname3") != -1) { canonicalLink = window.location.href.replace("/pali-momi", ""); } if (window.location.href.indexOf("/subdirname4") != -1) { canonicalLink = window.location.href.replace("/wilcox", ""); } if (canonicalLink != window.location.href) { var link = document.createElement('link'); link.rel = 'canonical'; link.href = canonicalLink; document.head.appendChild(link); } script>
Technical SEO | | SoulSurfer80 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
The Mysterious Case of Pagination, Canonical Tags
Hey guys, My head explodes when I think of this problem. So I will leave it to you guys to find a solution... My root domain (xxx.com) runs on WordPress platform. I use Yoast SEO plugin. The next page of root domain -- page/2/ -- has been canonicalized to the same page -- page/2/ points to page/2/ for example. The page/2/ and remaining pages also have this rel tags: I have also added "noindex,follow" to page/2/ and further -- Yoast does this automatically. Note: Yoast plugin also adds canonical to page/2/...page/3/ automatically. Same is the case with category pages and tag pages. Oh, and the author pages too -- they all have self-canonicalization, rel prev & rel next tags, and have been "noindex, followed." Problem: Am I doing this the way it should be done? I asked a Google Webmaster employee on rel next and prev tags, and this is what she said: "We do not recommend noindexing later pages, nor rel="canonical"izing everything to the first page." (My bad, last year I was canonicalizing pages to first page). One of the popular blog, a competitor, uses none of these tags. Yet they rank higher. Others following this format have been hit with every kind of Google algorithm I could think of. I want to leave it to Google to decide what's better, but then again, Yoast SEO plugin rules my blog -- okay, let's say I am a bad coder. Any help, suggestions, and thoughts are highly appreciated. 🙂 Update 1: Paginated pages -- including category pages and tag pages -- have unique snippets; no full-length posts. Thought I'd make that clear.
Technical SEO | | sidstar0 -
Home Page .index.htm and .com Duplicate Page Content/Title
I have been whittling away at the duplicate content on my clients' sites, thanks to SEOmoz's pro report, and have been getting push back from the account manager at register.com (the site was built here and the owner doesn't want to move it). He says these are the exact same page and he can't access one to redirect to the other. Any suggestions? The SEOmoz report says there is duplicate content on both these urls: Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/index.htm Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/ Your help is greatly appreciated! Sheryl
Technical SEO | | TOMMarketingLtd.0 -
Why "title missing or empty" when title tag exists?
Greetings! On Dec 1, 2011 in a SEOMoz campaign, two crawl metrics shot up from zero (Nov 17, Nov 24). "Title missing or empty" was 9,676. "Duplicate page content" was 9,678. Whoa! Content at site has not changed. I checked a sample of web pages and each seems to have a proper TITLE tag. Page content differs as well -- albeit we list electronic part numbers of hard-to-find parts, which look similar. I found a similar post http://www.seomoz.org/q/why-crawl-error-title-missing-or-empty-when-there-is-already-title-and-meta-desciption-in-place . In answer, Sha ran Screaming Frog crawler. I ran Frog crawler on a few hundred pages. Titles were found and hash codes were unique. Hmmm. Site with errors is http://electronics1.usbid.com Small sample of pages with errors: electronics1.usbid.com/catalog_10.html
Technical SEO | | groovykarma
electronics1.usbid.com/catalog_100.html
electronics1.usbid.com/catalog_1000.html I've tried to reproduce errors yet I cannot. What am I missing please? Thanks kindly, Loren0 -
Robots.txt and canonical tag
In the SEOmoz post - http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it's being said - If you have a robots.txt disallow in place for a page, the canonical tag will never be seen. Does it so happen that if a page is disallowed by robots.txt, spiders DO NOT read the html code ?
Technical SEO | | seoug_20050