Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Magento Dublicate Content (Noindex and Rel"canonical")
- 
					
					
					
					
Hi All,
Just looking for some advice regarding my website on magento.
We by mistake didnt enable canonical tags and noindex tags so had a big problem with dublicate content from filter pages but also have URLs to Cats as Yes so this didnt help with not having canonical tags enabled.
We now have everything enabled for a few weeks now but dont see much drop in indexed pages in google. (currently 27k and we have only 5k products)
My question basically is how do we speed up noindexation of dublicate content and also would you change URL to cats as No so google just now sees the url to products? (my concerns with this is would leaving it to Yes help because it will hopefully read the canonical tags on products now)
Thank you in advance
Michael
 - 
					
					
					
					
Hi Carson
Thank you for replying and the indepth answers.
I did read somewhere that dublicate content on your own website isnt too bad but im glad you have helped me clear things up.
So would you change cat urls to no or leave them to yes for now till google can see all the canoical tags on products?
Thanks
Mike
 - 
					
					
					
					
I think there's an underlying assumption here that duplicate content will harm your site, and that's not necessarily true. There's no "duplicate content penalty" - it's more than a filter. Google is better than most at recognizing this, especially with common CMS like Magento and WP. Google attempts to look at the links going to both pages and understand their authority together.
Duplicate content is more of an issue if you're pulling content that others are using as well, e.g. on product descriptions provided by manufacturers and other types of content. Google won't "penalize" you, but they will sometimes filter your site out in favor of the most authoritative site with that content. It's also an issue (mostly for Panda) if you're creating keyword pages that contain duplicate of even very-similar content just to rank for a bunch of very similar keywords.
So my first bit of advice is, "don't obsess over intra-site duplicate content."
That said, it's best to reduce and avoid duplicate content 1) for less-sophisticated search engine, 2) for the sake of your own analytics data integrity and simplicity, 3) just in case Google doesn't get it (very rare).
Set the categories up however you think is best for the user (generally just the product name without categories), double-check the canonical URLs, and wait for Google to catch up on the canonical and noindex. It can take many months depending on your site's authority, but it's unlikely to move the needle either way. Keep in mind that Google may keep pages in the index even if they are honoring the canonical tag - they'll just show the canonical version but keep both indexed. That's working as intended - don't worry about that

 
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
- 
		
		
Moz Tools
Chat with the community about the Moz tools.
 
- 
		
		
SEO Tactics
Discuss the SEO process with fellow marketers
 
- 
		
		
Community
Discuss industry events, jobs, and news!
 
- 
		
		
Digital Marketing
Chat about tactics outside of SEO
 
- 
		
		
Research & Trends
Dive into research and trends in the search industry.
 
- 
		
		
Support
Connect on product support and feature requests.
 
Related Questions
- 
		
		
		
		
		
		
Duplicate content, although page has "noindex"
Hello, I had an issue with some pages being listed as duplicate content in my weekly Moz report. I've since discussed it with my web dev team and we decided to stop the pages from being crawled. The web dev team added this coding to the pages <meta name='robots' content='max-image-preview:large, noindex dofollow' />, but the Moz report is still reporting the pages as duplicate content. Note from the developer "So as far as I can see we've added robots to prevent the issue but maybe there is some subtle change that's needed here. You could check in Google Search Console to see how its seeing this content or you could ask Moz why they are still reporting this and see if we've missed something?" Any help much appreciated!
Technical SEO | | rj_dale0 - 
		
		
		
		
		
		
Quick Fix to "Duplicate page without canonical tag"?
When we pull up Google Search Console, in the Index Coverage section, under the category of Excluded, there is a sub-category called ‘Duplicate page without canonical tag’. The majority of the 665 pages in that section are from a test environment. If we were to include in the robots.txt file, a wildcard to cover every URL that started with the particular root URL ("www.domain.com/host/"), could we eliminate the majority of these errors? That solution is not one of the 5 or 6 recommended solutions that the Google Search Console Help section text suggests. It seems like a simple effective solution. Are we missing something?
Technical SEO | | CREW-MARKETING1 - 
		
		
		
		
		
		
Why do some URLs for a specific client have "/index.shtml"?
Reviewing our client's URLs for a 301 redirect strategy, we have noticed that many URLs have "/index.shtml." The part we don'd understand is these URLs aren't the homepage and they have multiple folders followed by "/index.shtml" Does anyone happen to know why this may be occurring? Is there any SEO value in keeping the "/index.shtml" in the URL?
Technical SEO | | FranFerrara0 - 
		
		
		
		
		
		
New "Static" Site with 302s
Hey all, Came across a bit of an interesting challenge recently, one that I was hoping some of you might have had experience with! We're currently in the process of a website rebuild, for which I'm really excited. The new site is using Markdown to create an entirely static site. Load-times are fantastic, and the code is clean. Life is good, apart from the 302s. One of the weird quirks I've realized is that with oldschool, non-server-generated page content is that every page of the site is an Index.html file in a directory. The resulting in a www.website.com/page-title will 302 to www.website.com/page-title/. My solution off the bat has been to just be super diligent and try to stay on top of the link profile and send lots of helpful emails to the staff reminding them about how to build links, but I know that even the best laid plans often fail. Has anyone had a similar challenge with a static site and found a way to overcome it?
Technical SEO | | danny.wood1 - 
		
		
		
		
		
		
Google's "cache:" operator is returning a 404 error.
I'm doing the "cache:" operator on one of my sites and Google is returning a 404 error. I've swapped out the domain with another and it works fine. Has anyone seen this before? I'm wondering if G is crawling the site now? Thx!
Technical SEO | | AZWebWorks0 - 
		
		
		
		
		
		
NoIndex/NoFollow pages showing up when doing a Google search using "Site:" parameter
We recently launched a beta version of our new website in a subdomain of our existing site. The existing site is www.fonts.com with the beta living at new.fonts.com. We do not want Google to crawl the new site until it's out of beta so we have added the following on all pages: However, one of our team members noticed that google is displaying results from new.fonts.com when doing an "site:new.fonts.com" search (see attached screenshot). Is it possible that Google is indexing the content despite the noindex, nofollow tags? We have double checked the syntax and it seems correct except the trailing "/". I know Google still crawls noindexed pages, however, the fact that they're showing up in search results using the site search syntax is unsettling. Any thoughts would be appreciated! DyWRP.png
Technical SEO | | ChrisRoberts-MTI0 - 
		
		
		
		
		
		
How to structure rel=canonical for a e commerce site
Hello, So I have searched the Q & A , Google, the zen cart forum and at this point I am looking for some one to give a concrete answer on what I should do. There is a lot of different opinions on " rel=canonical" and how to apply it , since there are many other variable in place. I have a zen cart site. I am using the latest 1.3.9 version. The default setting ( seem to me) uses the rel=canonical to point back to the specific link product or category respectively. Most of the time I have two scenarios. 1. Main category ---> Sub category----> Product 2. Main Category----> Product I'll give an example http://www.perfectindesign.com/awards ---main category http://www.perfectindesign.com/awards/acrylic-awards sub category http://www.perfectindesign.com/awards/acrylic-awards/slanted-award product (this example has three sub categories with maybe 12 products in one 4 in the second and 5 in the third) From looking at the source code for each url it the rel=canonical just points back to its own url. I want to avoid competing against my self, for the example above keyword "acrylic awards" so should the use of the re=canonical be changes site wide to have products point back to sub categories when they exist and have products point back to main categories when no sub categories exist? I am very new to seo, specifically eCommerce seo. If you have experience and have done this to a site you manage for a client or your own please advise how to proceed. Also if I'm missing some thing that will give me a better understanding of the bigger seo picture that would be great. Thanks, Yevgeny
Technical SEO | | Yevgeny0 - 
		
		
		
		
		
		
301 for "index.php" in Web.config?
Hi there, I'm trying to create a 301 redirect for the file "index.php" but I keep getting a "fail to redirect" message in Firefox whenever I insert it into the Web.config file. <location path="index.php"></location> Is there anyway around this? Thanks for any help According to Open Site Explorer, there are about 500 links to my index file but it only has a 302 status so will not be passing link juice.
Technical SEO | | tdsnet0