Functionality of SEOmoz crawl page reports
-
I have been trying to find a way to ask SEOmoz staff this question, because I think it is a functionality question, so I checked the SEOmoz Pro resources. I have also had no responses in the forum. So here it is again. Thanks much for your consideration!
Is it possible to configure the SEOmoz Rogerbot error-finding bot (which makes the crawl diagnostic reports) to obey the instructions in the individual page headers and the http://client.com/robots.txt file?
For example, there is a page at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2007 that has this in the header:
<meta name="robots" content="noindex">
This is a themed Quote of the Day page, and it is intentionally duplicated twice, at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004
and also at
http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, but they all have <meta name="robots" content="noindex"> in them. So Google should not see them as duplicates, right? Google does not in Webmaster Tools.
So it should not be counted three times? But it seems to be. How do we generate a report of the actual pages shown in the report as duplicates so we can check? We do not believe Google sees it as a duplicate page, but Roger appears to.
Similarly, take http://truthbook.com/contemplative_prayer/ as an example: here the http://truthbook.com/robots.txt file also tells Google to stay clear.
Yet we are showing thousands of duplicate page content errors, when Google Webmaster Tools has shown only a few hundred with the pages configured as described.
Anyone?
Jim
-
Hi Jimmy,
Thanks for writing in with a great question.
In regard to the "noindex" meta tag, our crawler will obey that tag as soon as we find it in the code. However, we will also crawl any source code that comes before the tag, so pages with the "noindex" tag will still show up in the crawl. We just don't crawl any information past that tag. One of the notices we include is "Blocked by meta robots" and for the truthbook.com campaign, we show over 2000 pages under that notice.
For example, on the page http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010, there are six lines of code, including the title, that we would crawl before hitting the "noindex" directive. Google's crawler is much more sophisticated than ours, so they are better at handling the meta robots "noindex" tag.
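If you would like to see how far into a page's source the "noindex" directive sits, here is a minimal Python sketch using only the standard library. It is an illustration, not how our crawler is implemented: the class name `RobotsMetaFinder`, the function `find_noindex`, and the sample HTML are all made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Hypothetical helper: records the first robots meta tag and its line."""

    def __init__(self):
        super().__init__()
        self.directives = []
        self.line = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.line is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives = [d.strip().lower() for d in content.split(",")]
            # getpos() returns (line, column) of the tag being handled.
            self.line = self.getpos()[0]


def find_noindex(html):
    """Return (has_noindex, line_number_of_robots_meta_tag_or_None)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives, finder.line


# Made-up sample page, not the real truthbook.com source.
sample = """<html>
<head>
<title>Quote of the Day</title>
<meta name="robots" content="noindex">
</head>
<body>Example</body>
</html>"""

print(find_noindex(sample))
```

Running this against the saved source of a page shows whether the tag is present and how many lines come before it, which is roughly the portion a crawler that stops at the tag would still have read.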
As for http://truthbook.com/contemplative_prayer/, we do respect the "*" wildcard directive in the robots.txt file and we are not crawling that page. I checked your full CSV report and there is no record of us crawling any pages with /contemplative_prayer/ in the URL (http://screencast.com/t/hMFuQnc9v1S) so we are correctly respecting the disallow directives in the robots.txt file.
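To double-check a disallow rule yourself, a short Python sketch with the standard library's `urllib.robotparser` can help. The robots.txt rules below are an assumption made for illustration (I am not quoting the actual truthbook.com file), and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration only; substitute the real robots.txt contents.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /contemplative_prayer/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(
    ASSUMED_ROBOTS_TXT, "rogerbot", "http://truthbook.com/contemplative_prayer/"
)
open_page = is_allowed(
    ASSUMED_ROBOTS_TXT,
    "rogerbot",
    "http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010",
)
print(blocked, open_page)
```

Under these assumed rules the first URL is disallowed for every user agent, while the quotes page is not, so it can only be kept out of an index by its meta robots tag.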
Also, if you would ever like to reach out to the Help Team directly in the future, you can email us from the Help Hub here: http://www.seomoz.org/help, but we are happy to answer questions in the Q&A forum, as well.
I hope this helps. Please let me know if you have any other questions.
Chiaryn