Quick Fix to "Duplicate page without canonical tag"?

CREW-MARKETING

When we pull up Google Search Console, in the Index Coverage section, under the category of Excluded, there is a sub-category called ‘Duplicate page without canonical tag’. The majority of the 665 pages in that section are from a test environment.

If we added a wildcard rule to the robots.txt file covering every URL that starts with that particular root URL ("www.domain.com/host/"), could we eliminate the majority of these errors?

That solution is not one of the 5 or 6 recommended fixes that the Google Search Console Help text suggests. It seems like a simple, effective solution. Are we missing something?

BlueprintMarketing

Noindex & Test Indexing Before You Launch

The platform domains are intended for development use and cannot be used for production. A custom or CMS-standard robots.txt will only work on Live environments with a custom domain. Adding sub-domains (i.e., dev.example.com, test.example.com) for DEV or TEST will remove only the X-Robots-Tag: noindex header, but the platform robots.txt will still be served on the domain.

To support pre-launch SEO testing, we allow the following bots access to platform domains:

Site Auditor by Raven
SEMrush
RogerBot by Moz
Dotbot by Moz

If you’re testing links or SEO with other tools, you may request the addition of the tool to our robots.txt.

Pantheon's documentation on robots.txt: http://pantheon.io/docs/articles/sites/code/bots-and-indexing/

User-agent: *
Disallow: /

User-agent: RavenCrawler
User-agent: rogerbot
User-agent: dotbot
User-agent: SemrushBot
User-agent: SemrushBot-SA
Allow: /
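If you want to see how a crawler would read the robots.txt rules quoted in this post, here is a rough sketch using Python's standard urllib.robotparser. The helper name allowed_agents and the host test.example.com are made up for illustration, and Python's parser only approximates how Google itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the platform robots.txt quoted in this post.
PLATFORM_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: RavenCrawler
User-agent: rogerbot
User-agent: dotbot
User-agent: SemrushBot
User-agent: SemrushBot-SA
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: bool} saying whether each agent may fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # test.example.com is a placeholder host, not a real environment.
    result = allowed_agents(
        PLATFORM_ROBOTS_TXT,
        ["Googlebot", "rogerbot", "SemrushBot"],
        "http://test.example.com/some-page",
    )
    for agent, ok in result.items():
        print(agent, "allowed" if ok else "blocked")
```

With these rules, the listed SEO tools fall into the Allow group, while any other crawler (Googlebot included) falls back to the catch-all Disallow. Keep in mind this only describes crawling, not whether a URL stays in the index.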

brettmandoes

The simplest solution would be to mark every page in your test environment "noindex". This is normally standard operating procedure anyway because most people don't want customers stumbling across the wrong URL in search by mistake and seeing a buggy page that isn't supposed to be "live" for customers.

Updating your robots.txt file would tell Google not to crawl the page, but if they've already crawled it and added it to their index it just means that they will retain the last crawled version of the page and will not crawl it in the future. You have to direct Google to "noindex" the pages. It will take some time as Google refreshes the crawl of each page, but eventually you'll see those errors drop off as Google removes those pages from their index. If I were consulting a client I would tell them to make the change and check back in two or three months.
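To spot-check that a test page really is telling Google "noindex", something like the following Python sketch could work. It looks for the directive in either an X-Robots-Tag response header or a robots meta tag. The function name is_noindexed is made up for this example, you would supply the headers and HTML from your own fetch, and it ignores user-agent-scoped header values (for example "googlebot: noindex") to keep things short.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(headers, html):
    """Return True if the header or the meta tag carries a noindex directive.

    Hypothetical helper: headers is a plain dict of response headers,
    html is the page body as a string.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

For example, is_noindexed({"X-Robots-Tag": "noindex"}, "<html></html>") returns True, and a page with only content="index, follow" in its robots meta tag returns False.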

Hope this helps!

Roman-Delcarmen

The new version of Search Console will show all the pages available on your site, even the noindex pages. Why? I don't know. The truth is that even when you set those pages to nofollow and noindex, it will keep showing you the same error. That does not mean there is something wrong with your site. I would not worry in your case.
