Functionality of SEOmoz crawl page reports
-
I have been trying to find a way to ask SEOmoz staff this question, because I think it is a functionality question, so I checked the SEOmoz PRO resources. I have also had no responses to it in the forum. So here it is again. Thanks much for your consideration!
Is it possible to configure Rogerbot, the SEOmoz error-finding bot that produces the crawl diagnostic reports, to obey the instructions in individual page headers and in the http://client.com/robots.txt file?
For example, there is a page at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2007 that has this in the header:
<meta name="robots" content="noindex">
This is a themed Quote of the Day page, and it is intentionally duplicated twice, at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004
and also at
http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, but they all have <meta name="robots" content="noindex"> in them. So Google should not see them as duplicates, right? Google does not in Webmaster Tools.
So it should not be counted 3 times? But it seems to be. How do we generate a report of the actual pages flagged as duplicates so we can check? We do not believe Google sees it as a duplicate page, but Roger appears to.
Similarly, take http://truthbook.com/contemplative_prayer/: here, too, http://truthbook.com/robots.txt tells Google to stay clear.
Yet we are showing thousands of duplicate page content errors, when Google Webmaster Tools has shown only a few hundred, with the site configured as described.
Anyone?
Jim
-
Hi Jimmy,
Thanks for writing in with a great question.
In regard to the "noindex" meta tag, our crawler will obey that tag as soon as we find it in the code, but we will also crawl any source code that comes before the tag, so pages with the "noindex" tag will still show up in the crawl. We just don't crawl any information past that tag. One of the notices we include is "Blocked by meta robots", and for the truthbook.com campaign we show over 2000 pages under that notice.
For example, on the page http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, there are six lines of code, including the title, that we would crawl before hitting the "noindex" directive. Google's crawler is much more sophisticated than ours, so they are better at handling the meta robots "noindex" tag.
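To check which pages carry the "noindex" directive before comparing them against the "Blocked by meta robots" notice, something like the following Python sketch could help. It is only an illustration using the standard library's `html.parser`; the class name `MetaRobotsParser`, the helper `has_noindex`, and the sample HTML are made up for this example and are not part of any SEOmoz tool.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), so default to "".
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a meta robots tag with 'noindex'."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source, standing in for a downloaded Quote of the Day page.
sample = (
    "<html><head><title>Quote of the Day</title>"
    '<meta name="robots" content="noindex"></head><body></body></html>'
)
print(has_noindex(sample))
```

Running `has_noindex` over the downloaded source of each URL listed as a duplicate would show whether every one of them actually carries the tag.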
As for http://truthbook.com/contemplative_prayer/, we do respect the "*" wildcard directive in the robots.txt file and we are not crawling that page. I checked your full CSV report and there is no record of us crawling any pages with /contemplative_prayer/ in the URL (http://screencast.com/t/hMFuQnc9v1S), so we are correctly respecting the disallow directives in the robots.txt file.
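A robots.txt rule can also be tested locally with Python's standard `urllib.robotparser`. The sketch below is a rough illustration only: the rules in `ROBOTS_TXT` are an assumption about what a wildcard disallow for this directory might look like (the real http://truthbook.com/robots.txt may differ), and the "rogerbot" user-agent token and the `is_allowed` helper are likewise chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; the site's real robots.txt may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contemplative_prayer/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The user-agent token "rogerbot" is assumed here.
print(is_allowed(ROBOTS_TXT, "rogerbot",
                 "http://truthbook.com/contemplative_prayer/"))
print(is_allowed(ROBOTS_TXT, "rogerbot",
                 "http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010"))
```

Under these assumed rules the first URL is disallowed and the second is allowed, which matches the behavior described above: the /contemplative_prayer/ pages are never crawled, while the quotes pages are crawled and rely on the meta tag instead.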
Also, if you would ever like to reach out to the Help Team directly in the future, you can email us from the Help Hub here: http://www.seomoz.org/help, but we are happy to answer questions in the Q&A forum, as well.
I hope this helps. Please let me know if you have any other questions.
Chiaryn