We are still seeing duplicate content on SEOmoz even though we have marked those pages as "noindex, follow." Any ideas why?
-
We have many pages on our website that have been set to "no index, follow." However, SEOmoz is indexing them as duplicate content. Why is that?
-
Hi Gary,
Great answer from Daniel.
One thing that you can do is to create a list of noindexed pages in excel, then add all pages identified by SEOmoz as duplicates and run a simple comparison in excel. This will identify any pages that do not match. You will easily see whether the new pages in the report can be ignored.
There is already a feature request in the works with the SEOmoz engineering team which will enable us to "turn off" pages that can be ignored (like those that are already noindexed). In the meantime, keeping track of the pages you can ignore is probably the best option.
You can keep track of progress by following updates on the Feature Request here.
Hope that helps,
Sha
-
Go to Google and search site:yourdomain.com and see if the pages in question come up. If so, Google has indexed them. If not, Google has not indexed them. Like SEOMoz, Google can crawl any page. Doesn't mean they will index the page. If you have noindexed a page, it should not be indexed by Google and should not be problematic for you.
-
So, it indexes issues that Google does see and doesn't see. How do we differentiate between the two?
Additionally, what would be some suggestions as to what we should do?
-
SEOMoz is not a search engine index, it uses a crawler. If those pages are not blocked by the robots.txt file, then SEOMoz will crawl them. They ignore the noindex tag because they don't index anything. Search engines will honor the noindex tag and not index a page if you specify with the robots meta tag. However, to remove pages from the crawl, disallow them in the robots.txt.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content due to numerous sub category level pages
We have a healthcare website which lists doctors based on their medical speciality. We have a paginated series to list hundreds of doctors. Algorithm: A search for Dentist in Newark locality of New York gives a result filled with dentists from Newark followed by list of dentists in locations near by Newark. So all localities under a city have the same set of doctors distributed jumbled an distributed across multiple pages based on nearness to locality. When we don't have any dentists in Newark we populate results for near by localities and create a page. The issue - So when the number of dentists in New York is <11 all Localities X Dentists will have jumbled up results all pointing to the same 10 doctors. The issue is even severe when we see that we have only 1-3 dentists in the city. Every locality page will be exactly the same as a city level page. We have about 2.5 Million pages with the above scenario. **City level page - **https://www.example.com/new-york/dentist - 5 dentists **Locality Level Page - **https://www.example.com/new-york/dentist/clifton, https://www.example.com/new-york/dentist/newark - Page contains the same 5 dentists as in New York city level page in jumbled up or same order. What do you think we must do in such a case? We had discussions on putting a noindex on locality level pages or to apply canonical pointing from locality level to city level. But we are still not 100% sure.
Technical SEO | | ozil0 -
"INDEX,FOLLOW" then later in the code "NOINDEX,NOFOLLOW" which does google follow?
background info: we have an established closed E-commerce system which the company has been using for years. I have only just started and reviewing the system, I don't have direct access to the code, but can request changes, but it could take months before the changes are in effect (or done at all), and we won't can't change to a new E-commerce system for the short to mid term. While reviewing the site (with help of seomoz crawl diagnostics) I noticed that some of the existing "landing pages" have in the code: <meta name="<a class="attribute-value">robots</a>" content="<a class="attribute-value">INDEX,FOLLOW</a>" /> then a few lines later <meta name="<a class="attribute-value">robots</a>" content="<a class="attribute-value">NOINDEX,NOFOLLOW</a>" /> Which the crawl diagnostics flagged up, but in the webmaster tools says
Technical SEO | | PaddyDisplays
"We didn't detect any issues with non-indexable content on your site." so the question is which instructions does google follow? the first or 2nd? note: clearly this is need fixed, but I have a big list of changes for the system so I need to know how important this is tthanks0 -
How do I add "noindex" or "nofollow" to a link in Wordpress
It's been a while since I've SEOed a Wordpress site. How do I add "nofollow" or "noindex" to specific links? I highlight the anchor text in the text editor, I click the "link" button. I could have sworn that there used to be an option in the dialogue box that pops up.
Technical SEO | | CsmBill0 -
How to protect against duplicate content?
I just discovered that my company's 'dev website' (which mirrors our actual website, but which is where we add content before we put new content to our actual website) is being indexed by Google. My first thought is that I should add a rel=canonical tag to the actual website, so that Google knows that this duplicate content from the dev site is to be ignored. Is that the right move? Are there other things I should do? Thanks!
Technical SEO | | williammarlow0 -
Sitemaps and "noindex" pages
Experimenting a little bit to recover from Panda and added "noindex" tag for quite a few pages. Obviously now we need Google to re-crawl them ASAP and de-index. Should we leave these pages in sitemaps (with updated "lastmod") for that? Or just patiently wait? 🙂 What's the common/best way?
Technical SEO | | LocalLocal0 -
Page crawling is only seeing a portion of the pages. Any Advice?
last couple of page crawls have returned 14 out of 35 pages. Is there any suggestions I can take.
Technical SEO | | cubetech0 -
Duplicate Page Content
Hi within my campaigns i get an error "crawl errors found" that says duplicate page content found, it finds the same content on the home pages below. Are these seen as two different pages? And how can i correct these errors as they are just one page? http://poolstar.net/ http://poolstar.net/Home_Page.php
Technical SEO | | RouteAccounts0 -
What's the difference between a category page and a content page
Hello, Little confused on this matter. From a website architectural and content stand point, what is the difference between a category page and a content page? So lets say I was going to build a website around tea. My home page would be about tea. My category pages would be: White Tea, Black Tea, Oolong Team and British Tea correct? ( I Would write content for each of these topics on their respective category pages correct?) Then suppose I wrote articles on organic white tea, white tea recipes, how to brew white team etc...( Are these content pages?) Do I think link FROM my category page ( White Tea) to my ( Content pages ie; Organic White Tea, white tea receipes etc) or do I link from my content page to my category page? I hope this makes sense. Thanks, Bill
Technical SEO | | wparlaman0