Scanning For Duplicate Canonical Tags
-
I'm looking for a solution for identifying pages on a site that have either empty/undefined canonical tags, or duplicate canonical tags (meaning the tag occurs twice within the same page).
I've used Screaming Frog to view sitewide canonical values, but the tool cannot identify when pages use the tag twice, nor can it differentiate between pages that have an empty canonical tag and pages that have no canonical tag at all.
Any help finding a tool of some sort that can assist me in doing this would be much appreciated, as I'm working with tens of thousands of pages and can't do this manually.
-
Paul,
Thanks for your reply! I have used the paid version of Screaming Frog with regex to exclude pages with certain parameters, but I have not tried the custom queries.
Could you give me an example of a custom query that would find empty canonical tags? That would be extremely helpful.
-
I think Screaming Frog is still the solution you want, John, but it's not configured to do what you need "out of the box". You're going to need to write a custom query for Screaming Frog to run while it's crawling your site.
This capability is only available in the paid version of the tool, but you'll need the paid version anyway to crawl a site with tens of thousands of pages, as the free tool cuts out at 500 URLs.
You'll find the Custom settings link under the Configuration tab in the top navigation bar of the tool. Essentially what you're doing is writing custom filters.
You'll need to write a regex (regular expression) that is capable of finding pages with no canonical tag at all, and another which is capable of finding empty canonical tags. If your regex-fu is really strong, you may be able to write a single expression to capture both these states.
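To give a rough idea of the kind of expressions involved, here is a minimal Python sketch rather than a tested Screaming Frog configuration. The pattern names and the `classify_canonical` helper are made up for illustration, and it assumes you already have each page's HTML source as a string. It also covers the duplicate-tag case from the original question.

```python
import re

# Matches any <link ...> tag whose rel attribute starts with "canonical"
# (single, double, or no quotes around the value).
CANONICAL_TAG = re.compile(
    r"""<link\b[^>]*\brel\s*=\s*["']?canonical["']?[^>]*>""",
    re.IGNORECASE,
)

# Captures a quoted href value inside one tag; href="" captures an empty string.
HREF_VALUE = re.compile(r"""\bhref\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def classify_canonical(html: str) -> str:
    """Return 'missing', 'duplicate', 'empty', or 'ok' for one page's HTML."""
    tags = CANONICAL_TAG.findall(html)
    if not tags:
        return "missing"      # no canonical tag at all
    if len(tags) > 1:
        return "duplicate"    # the tag occurs more than once on the page
    match = HREF_VALUE.search(tags[0])
    href = match.group(1) if match else ""
    # A tag with no href, or only whitespace in it, counts as empty.
    return "ok" if href.strip() else "empty"


if __name__ == "__main__":
    sample = '<head><link rel="canonical" href=""></head>'
    print(classify_canonical(sample))
```

The first pattern is roughly what you would adapt for a custom filter, assuming the filter accepts this regex syntax. The sketch is deliberately simple: an unquoted href value would be reported as empty, and a canonical sent as an HTTP header rather than in the page markup would not be seen at all.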
Had you already tried the custom queries with Screaming Frog?
Paul
Related Questions
-
Canonical and Alternate Advice
At the moment, most of our sites have both a desktop and a mobile version. Both versions show the same content and use the same URL structure. The server detects which type of device you're visiting from and displays the relevant version of the site. We are unsure how to properly use the canonical and alternate rel tags. Currently we have a canonical on mobile and an alternate on desktop, both pointing to the same URL, because mobile and desktop share the same URLs as explained above. Is the way we're doing it at the moment correct?
Intermediate & Advanced SEO | | JH_OffLimits3 -
Question on Indexing, Hreflang tag, Canonical
Dear all, I have a question. We have a pharma client with a prescription medicine approved only in the US, and only one global site at .com, which is accessed by their target audience all over the world. For visitors outside the US, we can create a replica of the home page (which actually features that drug), minus any mention of the medicine, and set an IP filter so that non-US traffic sees the duplicate of the home page. The question is how best to tackle this semi-duplicate page. Noindex possibly won't do, because that will block the site from the non-US geography. Hreflang possibly won't work here either, because we are not dealing with different languages; we are dealing with the same language (En) but different geographies. Might canonical be the best way to go? I wanted to get some insight from the experts. Thanks, Suparno (for Jeff)
Intermediate & Advanced SEO | | jrohwer1 -
Some Tools Not Recognizing Meta Tags
I am analyzing a site which has several thousand pages, checking the headers, meta tags, and other on-page factors. I noticed that the spider tool on SEO Book (http://tools.seobook.com/general/spider-test) does not seem to recognize the meta tags for various pages. However, other tools, including Moz, do seem to recognize the meta tags. Normally I wouldn't be that concerned with why one tool is not picking up the tags, but the site suffered a large traffic loss and we're still trying to figure out what remaining issues need to be addressed. Also, many of those pages once ranked in Google and now cannot be found unless you do a site: search. Is it possible that something is blocking the tags so that some tools or crawlers can easily read them but others cannot? This would seem very strange to me, but the above is what I've witnessed recently. Your suggestions and feedback are appreciated, especially as this site continues to battle Panda.
Intermediate & Advanced SEO | | ABK7170 -
Duplicate page content query
Hi forum, For some reason I have recently received a large increase in my Duplicate Page Content issues. Currently it says I have over 7,000 duplicate page content errors! For example it says: Sample URLs with this Duplicate Page Content http://dikelli.com.au/accessories/gowns/news.html http://dikelli.com.au/accessories/news.html http://dikelli.com.au/gallery/dikelli/gowns/gowns/sale_gowns.html However there are no physical links to any of these pages on my site and even when I look at my FTP files (I am using Dreamweaver) these directories and files do not exist. Can anyone please tell me why the SEOMOZ crawl is coming up with these errors and how to solve them?
Intermediate & Advanced SEO | | sterls0 -
Canonical / 301 Redundancy
Suppose I have two dynamic URLs that lead to the identical page: www.example.com/product.php?x=1&y=1 and www.example.com/product.php?y=1 The x=1 parameter had some historical meaning, but is now unused. All references to the x=1 parameter have been removed from internal links and sitemaps. I have implemented two solutions: First, the header of www.example.com/product.php?x=1&y=1 includes: Second, the .htaccess file includes the following: Redirect permanent /product.php?x=1&y=1 http://www.example.com/product.php?y=1 Questions: 1. I assume that since canonical is still relatively new, it's best to play it safe and implement both solutions. Is this correct? 2. When I point my browser to www.example.com/product.php?x=1&y=1, it does NOT redirect to www.example.com/product.php?y=1. The address bar continues to show the non-canonical URL. Is this because the canonical tag somehow takes precedence over the 301 redirect? 3. How long will Google Webmaster Tools continue to show these as duplicates, even though I've implemented BOTH canonical and 301? It's been a few weeks and I thought it would have rolled off by now. Thanks!
Intermediate & Advanced SEO | | ahirai0 -
How To Create Dynamic WordPress Tags
Does anyone know how to make WordPress "tag" pages automatically generate a description based on the posts included in the tag? I have a lot of tags, and most of them rank well for long tail keywords. However, I have noticed that although they have a dynamically generated "title meta tag", they do not generate a "description meta tag". I know WordPress lets you customize the description for each tag, but I have way too many for that. I need the description meta to be auto-generated from the posts that are being tagged, rather than not including one at all. Does anyone know how to do this?
Intermediate & Advanced SEO | | MyNet0 -
Do I need a canonical tag on the 404 error page?
By definition, a 404 is displayed for many different URLs (any URL that does not exist). As I try to clean up my website following SEOmoz Pro advice, SEOmoz notifies me of duplicate content on URLs leading to a 404 🙂 I guess this is not that important, but I'm curious: should we add a canonical tag to the template returning the 404, with a canonical URL such as www.mysite.com/404 ?
Intermediate & Advanced SEO | | nuxeo0 -
Hash as a Replacement for Absolute URL in Canonical Tags?
Any idea why companies like Skechers would be doing this: http://screencast.com/t/ooEkATGN7EX ? I suppose it makes sense, but I've never seen it done before. If this works, why on earth would we be using absolute URLs still?
Intermediate & Advanced SEO | | stevewiideman0