Functionality of SEOmoz crawl page reports
-
I am trying to find a way to ask SEOmoz staff to answer this question because I think it is a functionality question so I checked SEOmoz pro resources. I also have had no responses in the Forum too it either. So here it is again. Thanks much for your consideration!
Is it possible to configure the SEOMoz Rogerbot error-finding bot (that make the crawl diagnostic reports) to obey the instructions in the individual page headers and http://client.com/robots.txt file?
For example, there is a page at http://truthbook.com/quotes/index.cfm month=5&day=14&year=2007 that has – in the header -
<meta name="robots" content="noindex"> </meta name="robots" content="noindex">This page is themed Quote of the Day page and is duplicated twice intentionally at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004
and also at
http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010 but they all have <meta name="robots" content="noindex"> in them. So Google should not see them as duplicates right. Google does not in Webmaster Tools.</meta name="robots" content="noindex">
So it should not be counted 3 times? But it seems to be? How do we gen a report of the actual pages shown in the report as dups so we can check? We do not believe Google sees it as a duplicate page but Roger appears too.
Similarly, one can use http://truthbook.com/contemplative_prayer/ , here also the http://truthbook.com/robots.txt tells Google to stay clear.
Yet we are showing thousands of dup. page content errors when Google Webmaster tools as shown only a few hundred configured as described.
Anyone?
Jim
-
Hi Jimmy,
Thanks for writing in with a great question.
In regard to the "noindex" meta tag, our crawler will obey that tag as soon as we find it in the code, but we will also crawl any other source code up until we hit the tag in the code so pages with the "noindex" tag will still show up in the crawl. We just don't crawl any information past that tag. One of the notices we include is "Blocked by meta robots" and for the truthbook.com campaign, we show over 2000 pages under that notice.
For example, on the page http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, there are six lines of code, including the title, that we would crawl before hitting the "noindex" directive. Google's crawler is much more sophisticated than ours, so they are better at handling the meta robots "noindex" tag.
As for http://truthbook.com/contemplative_prayer/, we do respect the "*" wildcard directive in the robots.txt file and we are not that page. I checked your full CSV report and there is no record of us crawling any pages with /contemplative_prayer/ in the URL (http://screencast.com/t/hMFuQnc9v1S) so we are correctly respecting the disallow directives in the robots.txt file.
Also, if you would ever like to reach out to the Help Team directly in the future, you can email us from the Help Hub here: http://www.seomoz.org/help, but we are happy to answer questions in the Q&A forum, as well.
I hope this helps. Please let me know if you have any other questions.
Chiaryn
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
"On-Page Report Card"- why is still showing " F grade" after introducing the keyword in page and title.
Hello, "On-Page Report Card"- why is still showing " F grade" after introducing the keyword in page and title. After changing the title and putting the keyword inside the title, in this section, "Exact Keyword Usage in Page Title", it shows the first title, without updating my changes. I have updated several times. In some cases worked, in this case doesn't. For example "online project management software" grades F, and "project management software" grades A, even if I've put the "online" word in title an so on. Now I have the same issue with "stock management software" which grades F. "stock management" grades A, even if i've put exactly "stock management software" thanks.
Moz Pro | | directspark0 -
Setting a campaign in SEOmoz
Hi All, when i set a new campaign i am asked to decide if to use: domin-name.com Or www.domain-name.com Can someone please explain the different in terms of the campaign If i use: domain-name.com - will the campaign run on www.domain-name.com too? Thank you SEOwise
Moz Pro | | iivgi0 -
No Crawl data in dashboard
For the second straight week, I have had no crawl data in my dashboard. It seems like the crawler erased all my results in the pro dashboard. Is there a way to manually recrawl my site, since I will have to wait another week to see if it comes back to earth? Thanks
Moz Pro | | bedwards0 -
Missing Page Titles On The Comptetive Link Comparison Page
Hello, When I do a Link Analysis using the SEOmoz tools I have noticed that most of the pages listed on the Top Pages tab show [No Data] for page title. Any idea why that could be? The page source of those pages have one and only one <title>tag.</p> <p>Thanks!</p></title>
Moz Pro | | andersvin0 -
Recent SEOMoz Crawl = Strange Results
Did anyone else get some really strange results in their weekly crawls this week with the campaign tool? Either my ranks sky rocked across three different sites or the tools is busted. Something to the tune of having 4 pages ranking in the top 30 to now having 15-16 pages ranking in the top 30. I'd love to find out it is just all the hard work paying off but i am worried it is the later. Regards - Kyle
Moz Pro | | kchandler0 -
Wild fluctuation in number of pages crawled
I am seeing huge fluctuations in the number of pages discovered the crawl each week. Some weeks the crawl discovers > 10,000 pages and other weeks I am seeing 4-500. So, this week for example I was hoping to see some changes reflected for warnings from last weeks report (which discovered > 10,000 pages). However, the entire crawl this week was 448 pages. The number of pages discovered each week seems to go back and forth between these two extremes. The more accurate count would be nearer the 10,000 mark than the 400 range. Thanks. Mark
Moz Pro | | MarkWill0 -
Why do pages with canonical urls show in my report as a "Duplicate Page Title"?
eg: Page One
Moz Pro | | DPSSeomonkey
<title>Page one</title>
No canonical url Page Two
<title>Page one</title> Page two is counted as being a page with a duplicate page title.
Shouldn't it be excluded?0 -
How do i get to know th pages crawled by SEOMOZ?
My SEOMOZ campaign says that "n" number of pages were crawled. How do i get access to the list of the pages crawled by SEOMOZ?
Moz Pro | | IM_Learner0