Abnormal crawl issues appearing in my Moz results
-
I have been asked to look at a site for a friend and was more than surprised to see 16.9k crawl issues appear in the dashboard. Of these, 6,238 are duplicate page content and 5,878 are duplicate page titles.
What on earth is going on? I have spoken to the web developer, as it appears there is a dev site somewhere, and this is his response:
[Can I stress that Google determines which site was in the index first and then removes other sites it sees as having duplicate content. Our dev sites appearing in the search index would not affect your ranking due to duplicate content as Google would see your site as the first site with the content]
As I cannot make contact with him, I am scratching my head. Surely a dev site should be noindexed? It sounds as though he is saying that it's OK because Google will treat the main site as the first site with the content...
Very confused! Help needed, Moz community.
Many thanks,
Sarah
-
Thanks again, Dirk. I like your direct and knowledgeable responses. I have sent you a LinkedIn connection request!
Many thanks,
Sarah
-
Hi Sarah,
Googlebot will follow these links as well and discover these "useless" pages (they are of course not useless from a human perspective, but they don't add value for bots, and they will be considered duplicates). Duplicates are no reason for a "punishment", so you could just leave them be. Personally, I would put a nofollow on these links or add a "noindex" tag to the login page. Normally you shouldn't use nofollow on internal links, but login pages are an exception to this (see also https://searchenginewatch.com/sew/news/2298312/matt-cutts-you-dont-have-to-nofollow-internal-links: "Of course, there are always exceptions to the rule, and things like login pages can be the exception. He said it doesn’t hurt to put the nofollow link for a link pointing to a login page, or things like terms and conditions or other “useless” pages. However, it doesn’t hurt at all for those pages to be crawled by Google.")
On the practical side: if you add a follow-up question to a question that has been marked as answered, only the people who have already answered will see it. To be on the safe side, it's better to open a new question if you want other people to have a look at it.
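To illustrate the nofollow suggestion, here is a minimal Python sketch of how a simplified crawler might separate links marked rel="nofollow" from links it would follow. It is only a model, not how Googlebot or Moz's Rogerbot actually behave, and the product and `/login` URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, separating nofollow links."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical product page with a wish-list link that points at the login page
page = """
<a href="/products/blue-chair">Blue chair</a>
<a href="/login?item=123&amp;action=wishlist" rel="nofollow">Add to wish list</a>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.followed)    # ['/products/blue-chair']
print(collector.nofollowed)  # ['/login?item=123&action=wishlist']
```

The parser unescapes `&amp;` in attribute values, which is why the collected login URL contains a plain `&`.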
Hope this helps,
Dirk
-
Hello Dirk, thank you for that great answer. We have since been doing a bit more digging of our own, and before we go back to the web developer we want to check what should be happening with the links that we are finding duplicated. We are seeing that the Duplicate Page issues are coming from links to the login page, which carries information in its URL about where the user was redirected from.
For example, if a visitor is not logged in and wants to add an item to their wish list, they are redirected to the login page with the item code and intended action in the URL, so that they can continue on to the desired page once logged in.
The Moz crawler is reporting these pages as Duplicate Content, as they are all the same apart from a piece of information in the URL. Should we be blocking these duplicates? Are they a risk to us? What should we be doing?
I have also added this as a new question; I am quite new to this community thing, so I wasn't sure which was the best way to ask.
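The duplicates described here come from one login page being reachable under many URLs that differ only in their query string. The Python sketch below groups crawled URLs after dropping the redirect parameters, which shows why a crawler ends up with many copies of the same page. The parameter names `item` and `action`, the `/login` path and the `example.com` host are hypothetical; the real site's names will differ.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of the query parameters that only carry redirect state
REDIRECT_PARAMS = {"item", "action"}


def strip_redirect_params(url: str) -> str:
    """Return the URL without the redirect-state parameters (and without any fragment)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in REDIRECT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each stripped URL to the crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_redirect_params(url)].append(url)
    # Only groups with more than one URL are potential duplicate-content reports
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "https://www.example.com/login?item=123&action=wishlist",
    "https://www.example.com/login?item=456&action=wishlist",
    "https://www.example.com/login?item=123&action=basket",
    "https://www.example.com/products/blue-chair",
]
print(group_duplicates(crawled))
```

Here all three login URLs collapse onto `https://www.example.com/login`, while the product page stays on its own.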
Many thanks again,
Sarah
-
Moz only indexes pages its crawler is able to find. This implies that your production site has links to your development site.
I don't really agree with what your dev is saying. He should first correct these links and put a noindex on these pages. Alternatively, put a password on the dev site so it's only accessible with a password. If a lot of users are putting links to your dev site, it could become more important than your main site. Google will try to choose the most appropriate site, but you have no guarantee that it will choose the right version. In any case, that's not the type of risk you should be willing to take.
Once this is done, you can request removal of these pages via Search Console.
Once all pages are removed from the index, you can adapt robots.txt to block Google and other bots. Do this only after all pages are removed; otherwise Google will never find the noindex directive.
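Since a crawler can only report pages it finds, a first check is to look for links on production pages that point at the dev site. A minimal Python sketch, assuming the dev site lives on a hypothetical host `dev.example.com`; in practice you would run it over the HTML of each production page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical hostname of the development site
DEV_HOST = "dev.example.com"


class DevLinkFinder(HTMLParser):
    """Collects absolute URLs of <a href> links that point at DEV_HOST."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.dev_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL before checking the host
        absolute = urljoin(self.page_url, href)
        if urlsplit(absolute).hostname == DEV_HOST:
            self.dev_links.append(absolute)


html = """
<a href="/about">About</a>
<a href="https://dev.example.com/products/blue-chair">Blue chair</a>
<a href="https://www.example.com/contact">Contact</a>
"""
finder = DevLinkFinder("https://www.example.com/")
finder.feed(html)
print(finder.dev_links)  # ['https://dev.example.com/products/blue-chair']
```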
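To confirm the dev pages actually carry the directive, you can check a fetched page for noindex. Google documents two places for it: a robots meta tag in the HTML and an `X-Robots-Tag` HTTP header. The Python sketch below checks HTML and headers you have already fetched; the sample inputs are made up, and the check is a simple substring match rather than a full directive parser.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records the content of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values keep their original case
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives.append(attr_map.get("content", "").lower())


def is_noindexed(html: str, headers: dict) -> bool:
    """True if the page has noindex in a robots meta tag or an X-Robots-Tag header.

    Simplified: only looks for the substring "noindex" in the directive values.
    """
    # HTTP header names are case-insensitive, so compare them lowercased
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)


dev_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_noindexed(dev_page, {}))                                    # True
print(is_noindexed("<html></html>", {"X-Robots-Tag": "noindex"}))    # True
print(is_noindexed("<html></html>", {"Content-Type": "text/html"}))  # False
```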
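The ordering warning follows from how robots.txt works: a Disallow rule stops a compliant bot from fetching the page at all, so it never reads a noindex placed inside that page. Python's standard `urllib.robotparser` shows the effect; the robots.txt rules and URL below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the dev site, added only after the pages
# have dropped out of the index
rules = """
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

url = "https://dev.example.com/products/blue-chair"
# A compliant bot checks this before fetching; False means it never downloads
# the page, so it also never sees a noindex tag inside it.
print(parser.can_fetch("Googlebot", url))  # False
print(parser.can_fetch("rogerbot", url))   # False
```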
Dirk
-
Related Questions
-
I've been using Moz for just a minute now. I used it to check my website and found quite a number of errors. Unfortunately, I use a WordPress website and, even with the tips, I still don't know how to fix the issues.
I've seen quite a number of errors on my website hipmack.co (a WordPress website) and I don't know how to begin clearing the index errors, or any others for that matter. Can you help me, please?
Moz Pro | Dogara
-
Moz and HubSpot SSL - crawl error?
I'm getting an error message when Moz tries to crawl my site; however, when I check in Google Search Console, it returns no errors. Our site is hosted on HubSpot. Is Moz still having trouble crawling HubSpot sites that have enabled their SSL? I read an article saying this should have been corrected in early 2017, but I'm still getting an error.
Moz Pro | jennygriffin
-
Rel="canonical" tag is implemented on my product pages, but I'm still getting canonical errors for products in Moz. What is the problem: me or Moz?
I have included the rel="canonical" tag on all my product pages, but I have been getting canonical errors in Moz reports for more than 6 months! I would like to know whether my code is wrong or the Moz report system is not working properly. Here is an example of my canonical code, on line 84: <link rel="canonical" href="http://www.doornmore.com/slab-single-door-80-fiberglass-courtlandt-1-panel-arch-lite-glass.html" /> Thanks, Shayann
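For a canonical question like this, one quick sanity check is to parse the page and list every canonical URL it declares; finding none, or more than one, would be worth investigating. The Python sketch below collects the `href` of each `<link rel="canonical">`. The sample HTML is hypothetical, and this does not reproduce whatever rules Moz's report applies.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects href values of <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may contain several tokens and any letter case
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


page = """<html><head>
<link rel="canonical" href="http://www.example.com/slab-single-door.html" />
</head><body>Door details</body></html>"""
print(find_canonicals(page))  # ['http://www.example.com/slab-single-door.html']
```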
Moz Pro | Shayann
-
Functionality of SEOmoz crawl page reports
I am trying to find a way to ask SEOmoz staff to answer this question, because I think it is a functionality question, so I checked the SEOmoz Pro resources. I have also had no responses to it in the forum. So here it is again. Thanks very much for your consideration! Is it possible to configure the SEOmoz Rogerbot error-finding bot (which makes the crawl diagnostic reports) to obey the instructions in the individual page headers and the http://client.com/robots.txt file? For example, there is a page at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2007 that has, in the header, <meta name="robots" content="noindex">. This page is a themed Quote of the Day page and is intentionally duplicated twice, at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004 and at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, but they all have <meta name="robots" content="noindex"> in them. So Google should not see them as duplicates, right? Google does not in Webmaster Tools. So it should not be counted 3 times, should it? But it seems to be. How do we generate a report of the actual pages shown in the report as duplicates so we can check? We do not believe Google sees it as a duplicate page, but Roger appears to. Similarly, for http://truthbook.com/contemplative_prayer/, http://truthbook.com/robots.txt also tells Google to stay clear. Yet we are seeing thousands of duplicate page content errors, whereas Google Webmaster Tools has shown only a few hundred, configured as described. Anyone? Jim
Moz Pro | jimmyzig
-
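For Jim's question about seeing which pages are counted as duplicates, a rough way to audit this yourself is to fingerprint each page's text and group URLs with identical fingerprints, leaving out pages that declare noindex. This Python sketch assumes you already have each page's text and noindex flag, treats only identical text (after collapsing whitespace and case) as duplicate, whereas Moz and Google may use fuzzier similarity measures, and uses invented page data.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """SHA-256 of the page text with whitespace collapsed and case folded."""
    normalized = re.sub(r"\s+", " ", text).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_groups(pages):
    """pages: iterable of (url, text, noindex) tuples.

    Returns lists of URLs whose normalized text is identical,
    skipping pages that carry noindex.
    """
    groups = defaultdict(list)
    for url, text, noindex in pages:
        if noindex:
            continue  # noindexed pages are left out of the duplicate count
        groups[fingerprint(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


quote = "Quote of the Day text"
pages = [
    ("http://www.example.com/quotes/?year=2004", quote, True),
    ("http://www.example.com/quotes/?year=2007", quote, True),
    ("http://www.example.com/quotes/?year=2010", quote, True),
    ("http://www.example.com/a", "Same   body text", False),
    ("http://www.example.com/b", "same body text", False),
]
print(duplicate_groups(pages))
# [['http://www.example.com/a', 'http://www.example.com/b']]
```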
SEOmoz crawler not crawling my site
We set up a new campaign in SEOmoz on Friday. It is my understanding that the preliminary crawl can cover up to 250 pages, and this has been our experience in the past. However, the preliminary crawl only went through 2 pages. This is a larger eCommerce site with many pages. Any ideas why more pages weren't crawled? We set up the campaign to track at the root domain level.
Moz Pro | IMM
-
OK Crawl Test Link Question Again!
I've downloaded a crawl test, and column G (Link Count) reads 62; yep, there are a total of 62 links on the page in question. Column AM (Internal Links) reads 303, and yep, there are somewhere in the order of 303 pages pointing at this one. Root Domains is surprisingly low at 6, so maybe there are only 6 domains linking to this page. BUT... External Links reads 51. There are not 51 links pointing away from this domain on this page, no way hozay, so can anybody tell me what is meant by 'External Links'? A humble thank you in anticipation of an education. Jem
Moz Pro | JemRobinson
-
Crawl Diagnostics - Canonical Question
On one of my sites I have 61 notices for Rel Canonical. Is it bad to have these or is this just something that's informative?
Moz Pro | kadesmith
-
Why Is SEOmoz No Longer Crawling All of My Site?
Hi all, I joined SEOmoz over a month ago, and Roger has been crawling all of the pages on the site (approx. 20 pages). Throughout the last few weeks I have been working on the errors and notices identified by Roger. However, this week Roger has only re-crawled 1 page and is not picking up any of the other pages. Has anyone come across this problem? Can you recommend anything to resolve it? Many thanks in advance...
Moz Pro | Dan28