Functionality of SEOmoz crawl page reports
-
I am trying to find a way to ask SEOmoz staff to answer this question because I think it is a functionality question, so I checked the SEOmoz Pro resources. I have also had no responses to it in the forum, so here it is again. Thanks very much for your consideration!
Is it possible to configure the SEOmoz Rogerbot error-finding bot (the one that makes the crawl diagnostic reports) to obey the instructions in individual page headers and in the http://client.com/robots.txt file?
For example, there is a page at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2007 that has, in the header:
<meta name="robots" content="noindex"> </meta name="robots" content="noindex">This page is themed Quote of the Day page and is duplicated twice intentionally at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004
and also at
http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010 but they all have <meta name="robots" content="noindex"> in them. So Google should not see them as duplicates right. Google does not in Webmaster Tools.</meta name="robots" content="noindex">
So it should not be counted three times, but it seems to be? How do we generate a report of the actual pages listed as duplicates so we can check? We do not believe Google sees this as a duplicate page, but Roger appears to.
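As an aside, the kind of spot-check we have in mind is roughly the minimal Python sketch below: it just fetches each of the three quote URLs above and reports whether the noindex tag is actually present in the source. The script is only illustrative (a naive string check), not something from the SEOmoz toolset.

# Minimal, illustrative spot-check: fetch each reported "duplicate" URL and
# confirm the meta robots "noindex" tag actually appears in the source.
# Note: this is a naive string check, not a real HTML parse.
import urllib.request

urls = [
    "http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2007",
    "http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004",
    "http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010",
]

for url in urls:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore").lower()
    has_noindex = 'name="robots"' in html and "noindex" in html
    print(url, "-> noindex present" if has_noindex else "-> noindex NOT found")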
Similarly, take http://truthbook.com/contemplative_prayer/ ; here the http://truthbook.com/robots.txt file also tells Google to stay clear.
Yet we are showing thousands of duplicate page content errors, while Google Webmaster Tools has shown only a few hundred, with the pages configured as described.
Anyone?
Jim
-
Hi Jim,
Thanks for writing in with a great question.
In regard to the "noindex" meta tag, our crawler will obey that tag as soon as we find it in the code, but we also crawl any other source code up until we hit the tag, so pages with the "noindex" tag will still show up in the crawl. We just don't crawl any information past that tag. One of the notices we include is "Blocked by meta robots," and for the truthbook.com campaign we show over 2,000 pages under that notice.
For example, on the page http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, there are six lines of code, including the title, that we would crawl before hitting the "noindex" directive. Google's crawler is much more sophisticated than ours, so they are better at handling the meta robots "noindex" tag.
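To make the behavior concrete, here is a rough sketch of the general approach (only an illustration, not Rogerbot's actual code): the URL is still recorded in the crawl, and the parser simply stops processing the source once it reaches the noindex directive.

# Rough sketch of the general approach -- not Rogerbot's actual implementation.
# The URL is still recorded in the crawl; content processing simply stops
# once a meta robots "noindex" tag is encountered.
from html.parser import HTMLParser

class NoindexAwareParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.noindex:
            return  # ignore everything after the noindex directive
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = (attrs.get("content") or "").lower()
        if tag == "meta" and name == "robots" and "noindex" in content:
            self.noindex = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

def crawl_page(url, source):
    parser = NoindexAwareParser()
    parser.feed(source)
    # The page shows up in the report either way, flagged when noindex was seen.
    return {"url": url, "blocked_by_meta_robots": parser.noindex, "links": parser.links}

Running crawl_page() on one of your quote pages would come back with blocked_by_meta_robots set to True, which is exactly why those URLs still appear in the report under the "Blocked by meta robots" notice.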
As for http://truthbook.com/contemplative_prayer/, we do respect the "*" wildcard directive in the robots.txt file and we are not crawling that page. I checked your full CSV report and there is no record of us crawling any pages with /contemplative_prayer/ in the URL (http://screencast.com/t/hMFuQnc9v1S), so we are correctly respecting the disallow directives in the robots.txt file.
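If you would ever like to double-check robots.txt handling on your side, Python's standard library ships a robots.txt parser; a quick sketch like the one below (with "rogerbot" used only as a stand-in user-agent string for this example) reports whether a given URL is disallowed.

# Quick, illustrative check of what http://truthbook.com/robots.txt disallows.
# "rogerbot" is only a stand-in user-agent string for this sketch.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://truthbook.com/robots.txt")
rp.read()

for url in [
    "http://truthbook.com/contemplative_prayer/",
    "http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010",
]:
    status = "allowed" if rp.can_fetch("rogerbot", url) else "disallowed"
    print(url, "->", status)

The same check works for any other path you expect the disallow rules to cover.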
Also, if you would ever like to reach out to the Help Team directly in the future, you can email us from the Help Hub here: http://www.seomoz.org/help, but we are happy to answer questions in the Q&A forum, as well.
I hope this helps. Please let me know if you have any other questions.
Chiaryn
Related Questions
-
404 errors in crawl report - all pages are listed with index.html on a WordPress site
Hi Mozers, I have recently submitted a website using Moz, and the report has pulled up a second version of every page on the WordPress site as a 404 error, with index.html at the end of the URL. E.g. live page URL - http://www.autostemtechnology.com/applications/civil-blasting/ ; report page URL - http://www.autostemtechnology.com/applications/civil-blasting/index.html. The permalink structure is set as /%postname%/. For some reason the report has listed every page with index.html at the end of the page URL. I have tried a number of redirects in the .htaccess file, but it doesn't seem to work. Any suggestions will be greatly appreciated. Thanks
Moz Pro | AmanziDigital
-
Campaign. Only 1 page is crawled
I set up a campaign a couple of weeks ago and noticed that only one page has been crawled. Is there something I need to do to get all pages crawled?
Moz Pro | priceseo
-
Campaigns - crawled
The new crawl shows Pages Crawled: 2. I have many 404 and other errors and I wanted to start working on them tomorrow, but the new crawl only crawled two pages and doesn't show any errors. What's the problem and what can I do? Yoseph
Moz Pro | Joseph-Green-SEO
-
Crawl Report Warnings
How much notice should be paid to the warnings on the SEOmoz crawl reports? We manage a fairly large property site and a lot of the errors on the crawl reports relate to automated responses. As a matter of priority, which of the list below will have negative effects with the search engines? Temporary Redirect, Too Many On-Page Links, Overly-Dynamic URL, Title Element Too Long (> 70 Characters), Title Missing or Empty, Duplicate Page Content, Duplicate Page Title, Missing Meta Description Tag
Moz Pro | SoundinTheory
-
Reports for page titles
Is there a report I can run on SEOmoz that shows me the page titles for all pages on my website, along with the link to each page?
Moz Pro | TalarMade
-
Is There a Way to Create Branded Campaign Reports in seomoz.org Campaigns?
We have a few exciting clients we are about to start working with, and I'm looking for a full campaign management solution that allows for branded reporting. Because we are an agency, is there an option I'm missing that allows you to swap your logo with the seomoz.org logo?
Moz Pro | stevewiideman
-
"Issue: Duplicate Page Content " in Crawl Diagnostics - but these pages are noindex
Hello guys, our site is nearly perfect - according to SEOmoz campaign overview. But, it shows me 5200 Errors, more then 2500 Pages with Duplicate Content plus more then 2500 Duplicated Page Titles. All these pages are sites to edit profiles. So I set them "noindex, follow" with meta robots. It works pretty good, these pages aren't indexed in the search engines. But why the SEOmoz tools list them as errors? Is there a good reason for it? Or is this just a little bug with the toolset? The URLs which are listet as duplicated are http://www.rimondo.com/horse-edit/?id=1007 (edit the IDs to see more...) http://www.rimondo.com/movie-edit/?id=10653 (edit the IDs to see more...) The crawling picture is still running, so maybe the errors will be gone away in some time...? Kind regards
Moz Pro | mdoegel