Issue in number of pages crawled
-
i wanted to figure out how our friend Roger Bot works.
On the first crawl of one of my large sites, the number of pages crawled stopped at 10000 (due to the restriction on the pro account). However after a few weeks, the number of pages crawled went down to about 5500. This number seemed to be a more accurate count of the pages on our site.
Today, it seems that Roger Bot has completed another crawl and the number is up to 10000 again.
I know there has been no downtime on our site, and the items that we fixed on our site did not reduce or increase the number of pages we had.
Just making sure there are no known issues with Roger Bot before I look deeper into our site to see if there is an issue.
Thanks!
-
Hey Chirag
That is the point, if the crawler is seeing multiple versions of the same page, you will get a false page count.
If a single page resolves on multiple versions of the URL like...
/pagename
/pagename/
/pagename.html
Then one single page could get reported as three pieces of content.
So, if you have 100 pages, but all pages resolve on say two page names then it would show 200 pages BUT the duplicate content report should allow you to see if this is the case.
Hope that helps.
Marcus -
Hi Marcus,
Thanks for the reply.
Yes the duplicate content report is quite large, but I am not certain why the number of pages crawled fluctuated by over 4000.
the Duplicate content number went down by over 2000 last week, and then went straight back up again. So I am not sure if the crawler missed something, or if there was some other issue going on.
Cheers
-
Hey Chirag
As a first suggestion, I would take a look at the duplicate content report and you may see some pages with multiple page names / urls giving a falsely inflated page count.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why my site not crawl?
hi all me allow rogerbot in robots.txt but rogerbot can't crawl my site my site: toska-co.ir
Moz Pro | | jahanidawodi0 -
Moz crawl duplicate pages issues
Hi According to the moz crawl on my website I have in the region of 800 pages which are considered internal duplicates. I'm a little puzzled by this, even more so as some of the pages it lists as being duplicate of another are not. For example, the moz crawler considers page B to be a duplicate of page A in the urls below: Not sure on the live link policy so ive put a space in the urls to 'unlive' them. Page A http:// nuchic.co.uk/index.php/jeans/straight-jeans.html?manufacturer=3751 Page B http:// nuchic.co.uk/index.php/catalog/category/view/s/accessories/id/92/?cat=97&manufacturer=3603 One is a filter page for Curvety Jeans and the other a filter page for Charles Clinkard Accessories. The page titles are different, the page content is different so Ive no idea why these would be considered duplicate. Thin maybe, but not duplicate. Like wise, pages B and C are considered a duplicate of page A in the following Page A http:// nuchic.co.uk/index.php/bags.html?dir=desc&manufacturer=4050&order=price Page B http:// nuchic.co.uk/index.php/catalog/category/view/s/purses/id/98/?manufacturer=4001 Page C http:// nuchic.co.uk/index.php/coats/waistcoats.html?manufacturer=4053 Again, these are product filter pages which the crawler would have found using the site filtering system, but, again, I cannot find what makes pages B and C a duplicate of A. Page A is a filtered result for Great Plains Bags (filtered from the general bags collection). Page B is the filtered results for Chic Look Purses from the Purses section and Page C is the filtered results for Apricot Waistcoats from the Waistcoat section. I'm keen to fix the duplicate content errors on the site before it goes properly live at the end of this month - that's why anyone kind enough to check the links will see a few design issues with the site - however in order to fix the problem I first need to work out what it is and I can't in this case. Can anyone else see how these pages could be considered a duplicate of each other please? Checking ive not gone mad!! Thanks, Carl
Moz Pro | | daedriccarl0 -
Aren't domain.com/page and domain.com/page/ the same thing?
Hi All, A recent Moz scan has turned up quite a few duplicate content notifications, all of which have the same issue. For instance: domain.com/page and domain.com/page/ are listed as duplicates, but I was under the impression that these pages would, in fact, be the same page. Is this even something to bother fixing or a fluke scan? If I should fix it does anyone know of an .htaccess modification that might be used? Thanks!
Moz Pro | | G2W0 -
Crawl Disgnosis only crawling 250 pages not 10,000
My crawl diagnosis has suddenly dropped from 10,000 pages to just 250. I've been tracking and working on an ecommerce website with 102,000 pages (www.heatingreplacementparts.co.uk) and the history for this was showing some great improvements. Suddenly the CD report today is showing only 250 pages! What has happened? Not only is this frustrating to work with as I was chipping away at the errors and warnings, but also my graphs for reporting to my client are now all screwed up. I have a pro plan and nothing has (or should have!) changed.
Moz Pro | | eseyo0 -
Truncate page URLs
We have some pages (for example a contact us form) for which the URL is modified by the CMS depending on the referring page (this helps to put the form submission in context for the sales reps who get the contact submission). The SEOmoz crawler considers each URL a new page -- and so numbers like in diagnostics are all inflated as the same page is listed multiple times (e.g. for too many links) Is there a setting to change what the crawler considers to be the same page? Here are two URLs for the same page that the reports treat as separate pages: http://www.spirent.com/About-Us/Contact_us.aspx?referurl=0F528F4D703D8BB3523738D6373AA8AD http://www.spirent.com/About-Us/Contact_us.aspx?referurl=10ACDA6055244E369395223437FDCF30 The page is actually: http://www.spirent.com/About-Us/Contact_us.aspx Thanks Ken
Moz Pro | | spirent.marcom0 -
On Page Analysis and Grading
I am new here and happy to be! My site is an ecommerce site with hundreds of products. I have set up campaigns to track specific products. For the on page analysis where SEOMOZ gives you a grade I have 2 urls showing. But 1 of the urls is getting an A, and 1 is getting a F. But they are the same url and obviously go to the same page. Any help would be appreciated!
Moz Pro | | Confections0 -
HTTP 404 for 404-page?
Hi Mozzers! SEOmoz just finished crawling one of my websites and this crawl found 3 errors. One of these errors was the (custom) 404-page, because of the http-status 404. What's you suggestion about this? Should a 404-page have a status 404? Thanks in advance for your suggestions!
Moz Pro | | Partouter0 -
Crawl Rate for Lower Page Authority Websites
Hi,At thumbtack.com we get tons of links from low (or no) page authority websites, and I'm wondering what the crawl rate of those links looks like. I know Google pulls in the web at an astonishing rate, but I'd imagine they aren't re-crawling lower PA very frequently.Are they discovering these links a week after they're posted? A month? More? I spent a while looking around for histograms of actual crawl rates and found surprisingly little. I'd love to see average crawl rate by Domain or Page Authority if that exists anywhere.
Moz Pro | | Thumbtack
Thanks!-MichaelP.S. Here are some random examples of the types of pages with inbound links I'm talking about. Normally we wouldn't spend too much time thinking about these, but there's just so many of them we can't ignore it!- http://www.majestic-cleaners.webs.com/- http://domchieraphotography.blogspot.com/- http://charlottepiano.musicteachershelper.com/- http://pin-upgirlphotography.vpweb.com/default.html- http://jfaithful.weebly.com/0