Issue in number of pages crawled
-
i wanted to figure out how our friend Roger Bot works.
On the first crawl of one of my large sites, the number of pages crawled stopped at 10000 (due to the restriction on the pro account). However after a few weeks, the number of pages crawled went down to about 5500. This number seemed to be a more accurate count of the pages on our site.
Today, it seems that Roger Bot has completed another crawl and the number is up to 10000 again.
I know there has been no downtime on our site, and the items that we fixed on our site did not reduce or increase the number of pages we had.
Just making sure there are no known issues with Roger Bot before I look deeper into our site to see if there is an issue.
Thanks!
-
Hey Chirag
That is the point, if the crawler is seeing multiple versions of the same page, you will get a false page count.
If a single page resolves on multiple versions of the URL like...
/pagename
/pagename/
/pagename.html
Then one single page could get reported as three pieces of content.
So, if you have 100 pages, but all pages resolve on say two page names then it would show 200 pages BUT the duplicate content report should allow you to see if this is the case.
Hope that helps.
Marcus -
Hi Marcus,
Thanks for the reply.
Yes the duplicate content report is quite large, but I am not certain why the number of pages crawled fluctuated by over 4000.
the Duplicate content number went down by over 2000 last week, and then went straight back up again. So I am not sure if the crawler missed something, or if there was some other issue going on.
Cheers
-
Hey Chirag
As a first suggestion, I would take a look at the duplicate content report and you may see some pages with multiple page names / urls giving a falsely inflated page count.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate pages coming from links from the login page - what should we do about them?
This is a follow on to an earlier question which was well answered by Dirk Ceuppens regarding abnormal crawl issues. We are seeing that the issues relating to Duplicate Pages are coming from links from the login page which shows information about where the user was redirected from. For example, if the visitor is not logged on and wishes to wish-list an item, they will be redirected to the login page, with the item code and intended action in the url; which can then continue on to the desired page once logged on. The MOZ crawler is seeing these pages as having Duplicated Content whilst they are all the same apart from a piece of information in the URL. Should we be blocking these duplications? Are they a risk to us? What should we be doing? Many thanks, Sarah
Moz Pro | | Mutatio_Digital0 -
Crawl diagnostics up to date after Magento ecommerce site crawl?
Howdy Mozzers, I have a Magento ecommerce website and I was wondering if the data (errors/warnings) from the Crawl diagnostics are up to date. My Magento website has 2.439 errors, mainly 1.325 duplicate page content and 1.111 duplicate page title issues. I already implemented the Yoast meta data plugin that should fix these issues, however I still see there errors appearing in the crawl diagnostics, but when going to the mentioned URL in the crawl diagnostics for e.g.: http://domain.com/babyroom/productname.html?dir=desc&targetaudience=64&order=name and checking the source code and searching for 'canonical' I do see: http://domain.com/babyroom/productname.html" />. Even I checked the google serp for url: http://domain.com/babyroom/productname.html?dir=desc&targetaudience=64&order=name and I couldn't find the url indexed in Google. So it basically means the Yoast meta plugin actually worked. So what I was wondering is why I still see the error counted in the crawl diagnostics? My goal is to remove all the errors and bring it all to zero in the crawl diagnostics. And now I am still struggling with the "overly-dynamic URL" (1.025) and "too many on-page links" (9.000+) I want to measure whether I can bring the warnings down after implementing an AJAX-based layered navigation. But if it's not updating it here crawl diagnostics I have no idea how to measure the success of eliminating the warnings. Thanks for reading and hopefully you all can give me some feedback.
Moz Pro | | videomarketingboys0 -
Duplicate Page Title - although there are differences
Hello, I get duplicate page titles errors on pages in which there are little differences. For example: C++ Online Test for Seniors C# Online Test for Seniors I assume that from some reason the ++ and the # are removed when SEOMoz crawler checks for duplicate page titles. As you may know C# and C++ means two different programming languages. Should I do something about it or is it a bug in the crawler?
Moz Pro | | ulukach0 -
Functionality of SEOmoz crawl page reports
I am trying to find a way to ask SEOmoz staff to answer this question because I think it is a functionality question so I checked SEOmoz pro resources. I also have had no responses in the Forum too it either. So here it is again. Thanks much for your consideration! Is it possible to configure the SEOMoz Rogerbot error-finding bot (that make the crawl diagnostic reports) to obey the instructions in the individual page headers and http://client.com/robots.txt file? For example, there is a page at http://truthbook.com/quotes/index.cfm month=5&day=14&year=2007 that has – in the header -
Moz Pro | | jimmyzig
<meta name="robots" content="noindex"> </meta name="robots" content="noindex"> This page is themed Quote of the Day page and is duplicated twice intentionally at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2004 and also at http://truthbook.com/quotes/index.cfm?month=5&day=14&year=2010 but they all have <meta name="robots" content="noindex"> in them. So Google should not see them as duplicates right. Google does not in Webmaster Tools.</meta name="robots" content="noindex"> So it should not be counted 3 times? But it seems to be? How do we gen a report of the actual pages shown in the report as dups so we can check? We do not believe Google sees it as a duplicate page but Roger appears too. Similarly, one can use http://truthbook.com/contemplative_prayer/ , here also the http://truthbook.com/robots.txt tells Google to stay clear. Yet we are showing thousands of dup. page content errors when Google Webmaster tools as shown only a few hundred configured as described. Anyone? Jim0 -
Crawl Diagnostics: Next crawl date is in the past
Hi - I have quite a few crawl diagnostic errors and warnings. I have attempted to fix many of them but noticed this note at the bottom of the crawl diagnostics chart: "Last Crawl Completed: Mar. 22nd, 2013 Next Crawl Starts: Mar. 29th, 2013" It looks like SEOMoz thinks the next crawl date is Mar 29th, 2013, which is two weeks ago. Is there any way to "force" the crawl and get it back on regular schedule? This may have happened when my account was disabled because my credit card expired...Thoughts?
Moz Pro | | 6thirty0 -
Issue with Data Updates on Campaign
I'm reviewing a campaign I am running for a client, the Keywords data has not updated yet even though it's supposed to update every monday. The competitive analysis shows a date of August 14, which is odd becuase I'm on a trial account?
Moz Pro | | amerihope0 -
Keywords Best Practices for On-Page Optimization
Hi guys, we've successfully optimized our home page such that it receives a Grade A for 3 completely different, high traffic keywords. Looking forward to seeing the results! The keywords in question were identified by using the monthly searches reported from the Google Keyword Tool. For one of the keywords, the Google Keyword Tool differentiates between what I thought would be seen as being the same. For example, let's say Google reports these three keywords as high traffic keywords: tea cup
Moz Pro | | yacpro13
tea cups
the tea cup Using the On-Page Report Card, we get a Grade A for 'tea cup', but we get an F for the other 2 terms! I thought Google searches didn't really care about the plural form or adding the word 'the' in front. How should we interpret the result from the On-Page Report Card for the plural form of the keyword and with the word 'the' added in front? Would you track all 3 instances of the keyword independtly in your campaign, or would you just track 'tea cup'? Thanks!0 -
Question about when new crawls start
Hi everyone, I'm currently using the trial of seomoz and I absolutely love what I'm seeing. However, I have 2 different websites (one has over 10,000 pages and one has about 40 pages). I've noticed that the smaller website is crawled every few days. However, the larger site hasn't been crawled in a few days. Although both campaigns state that the sites won't be crawled until next Monday, is there any way to get the crawl to start sooner on the large site? The reason that I've asked is that I've implemented some changes that will likely decrease the amount of pages that are crawled simply based upon the recommendations on this site. So, I'm excited to see the potential changes. Thanks, Brian
Moz Pro | | beeneeb0