Functionality of SEOmoz crawl page reports
-
I am trying to find a way to ask SEOmoz staff this question because I think it is a functionality question, so I checked the SEOmoz Pro resources. I have also had no responses to it in the forum. So here it is again. Thanks much for your consideration!
Is it possible to configure the SEOmoz Rogerbot error-finding bot (which makes the crawl diagnostic reports) to obey the instructions in individual page headers and in the http://client.com/robots.txt file?
For example, there is a page at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2007 that has, in the header:
<meta name="robots" content="noindex">
This themed Quote of the Day page is intentionally duplicated at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004
and also at
http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, but they all have <meta name="robots" content="noindex"> in them. So Google should not see them as duplicates, right? Google does not in Webmaster Tools.
So it should not be counted three times? But it seems to be. How do we generate a report of the actual pages shown as duplicates so we can check? We do not believe Google sees these as duplicate pages, but Roger appears to.
Similarly, take http://truthbook.com/contemplative_prayer/ ; there the http://truthbook.com/robots.txt file tells Google to stay clear.
Yet we are showing thousands of duplicate page content errors, while Google Webmaster Tools shows only a few hundred, configured as described.
Anyone?
Jim
-
Hi Jimmy,
Thanks for writing in with a great question.
In regard to the "noindex" meta tag, our crawler will obey that tag as soon as we find it in the code, but we also crawl any source code up to the point where we hit the tag, so pages with the "noindex" tag will still show up in the crawl. We just don't crawl any information past that tag. One of the notices we include is "Blocked by meta robots," and for the truthbook.com campaign we show over 2,000 pages under that notice.
For example, on the page http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, there are six lines of code, including the title, that we would crawl before hitting the "noindex" directive. Google's crawler is much more sophisticated than ours, so they are better at handling the meta robots "noindex" tag.
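The stop-at-the-tag behavior described above can be sketched with Python's standard-library `html.parser`. This is only an illustrative model of a crawler that records what it sees before a meta robots "noindex" directive, not Rogerbot's actual code; the class name and sample page are made up for the example:

```python
from html.parser import HTMLParser

class NoindexAwareParser(HTMLParser):
    """Records tags until a <meta name="robots" content="noindex"> appears,
    mimicking a crawler that stops processing at the noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False       # becomes True once the directive is seen
        self.crawled_tags = []     # tags encountered before the directive

    def handle_starttag(self, tag, attrs):
        if self.noindex:
            return  # past the directive: crawl nothing further
        a = dict(attrs)
        if (tag == "meta"
                and a.get("name", "").lower() == "robots"
                and "noindex" in a.get("content", "").lower()):
            self.noindex = True
            return
        self.crawled_tags.append(tag)

# A toy page shaped like the Quote of the Day pages discussed above.
page = """<html><head>
<title>Quote of the Day</title>
<meta name="robots" content="noindex">
<link rel="stylesheet" href="style.css">
</head><body><p>Quote text</p></body></html>"""

parser = NoindexAwareParser()
parser.feed(page)
print(parser.noindex)       # True: page would be flagged "Blocked by meta robots"
print(parser.crawled_tags)  # ['html', 'head', 'title'] - seen before the directive
```

A crawler like this still "sees" the title and anything else above the directive, which is why such pages appear in the crawl report at all.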
As for http://truthbook.com/contemplative_prayer/, we do respect the "*" wildcard directive in the robots.txt file, and we are not crawling that page. I checked your full CSV report and there is no record of us crawling any pages with /contemplative_prayer/ in the URL (http://screencast.com/t/hMFuQnc9v1S), so we are correctly respecting the disallow directives in the robots.txt file.
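You can check this kind of disallow rule yourself with Python's standard-library `urllib.robotparser`. The rules string below is a hypothetical file mirroring the directive discussed here, not necessarily the live truthbook.com robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
rules = """User-agent: *
Disallow: /contemplative_prayer/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A crawler honoring these rules must skip the disallowed path...
print(rp.can_fetch("rogerbot", "http://truthbook.com/contemplative_prayer/"))  # False
# ...but may fetch other paths on the site.
print(rp.can_fetch("rogerbot", "http://truthbook.com/quotes/index.cfm"))       # True
```

Because the user-agent line is the "*" wildcard, the disallow applies to every crawler, Rogerbot and Googlebot alike.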
Also, if you would ever like to reach out to the Help Team directly in the future, you can email us from the Help Hub here: http://www.seomoz.org/help, but we are happy to answer questions in the Q&A forum, as well.
I hope this helps. Please let me know if you have any other questions.
Chiaryn