Issue in number of pages crawled
-
i wanted to figure out how our friend Roger Bot works.
On the first crawl of one of my large sites, the number of pages crawled stopped at 10000 (due to the restriction on the pro account). However after a few weeks, the number of pages crawled went down to about 5500. This number seemed to be a more accurate count of the pages on our site.
Today, it seems that Roger Bot has completed another crawl and the number is up to 10000 again.
I know there has been no downtime on our site, and the items that we fixed on our site did not reduce or increase the number of pages we had.
Just making sure there are no known issues with Roger Bot before I look deeper into our site to see if there is an issue.
Thanks!
-
Hey Chirag
That is the point, if the crawler is seeing multiple versions of the same page, you will get a false page count.
If a single page resolves on multiple versions of the URL like...
/pagename
/pagename/
/pagename.html
Then one single page could get reported as three pieces of content.
So, if you have 100 pages, but all pages resolve on say two page names then it would show 200 pages BUT the duplicate content report should allow you to see if this is the case.
Hope that helps.
Marcus -
Hi Marcus,
Thanks for the reply.
Yes the duplicate content report is quite large, but I am not certain why the number of pages crawled fluctuated by over 4000.
the Duplicate content number went down by over 2000 last week, and then went straight back up again. So I am not sure if the crawler missed something, or if there was some other issue going on.
Cheers
-
Hey Chirag
As a first suggestion, I would take a look at the duplicate content report and you may see some pages with multiple page names / urls giving a falsely inflated page count.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site explorer Issue
Hello, I'm looking to see in the Site Explorer the links coming from directories such as BOW, yahoo etc. I'm listed there from almost 1 year and these links are not listed, the same with my competitors. I'm missing something? Thank you Claudio
Moz Pro | | SharewarePros0 -
How to make a Crawl Report readable?
Hi! I am trying to find out how I make my CSV report neat, so I can interprete the report. I know have a CSV report with just numbers and text all in one column. I tried the button text to columns but that doesn't work because when I do that at column A it overwrites column B in which I have the same problem! Thanks
Moz Pro | | HetCommunicatielokaal0 -
Too many on-page links
I received a warning in my most recent report for too many on-page links for the following page: http://www.fateyes.com/blog/. I can't figure out why this would be. I am counting between 60-70 including all pull downs, "read more's", archive, category and a few additional misc. links. Any ideas or suggestions on this? Or what I might do to rectify? Perhaps it's just an SEOmoz report blip... We currently don't have the post list rolling to additional pages so it's kind of passively set up to be endless, but it's in the works.
Moz Pro | | gfiedel0 -
Crawl Errors from URL Parameter
Hello, I am having this issue within SEOmoz's Crawl Diagnosis report. There are a lot of crawl errors happening with pages associated with /login. I will see site.com/login?r=http://.... and have several duplicate content issues associated with those urls. Seeing this, I checked WMT to see if the Google crawler was showing this error as well. It wasn't. So what I ended doing was going to the robots.txt and disallowing rogerbot. It looks like this: User-agent: rogerbot Disallow:/login However, SEOmoz has crawled again and it still picking up on those URLs. Any ideas on how to fix? Thanks!
Moz Pro | | WrightIMC0 -
Issue with Opesiteexplorer report
Hi, When I pulled backlinks report for my website from opensite explorer. I see the backlinks listing for my homepage is duplicated with index page. Ex. For a external website my homepage www.website.com as well as www.website.com/index.php shows as backlinks. Note: I had given rel=canonical tag to my homepage and index.php is 301 redirection to the homepage. I find this error only to my homepage and I don't find this issue to my inner pages. I would like to know whether this is an issue with opensiteexplorer or a problem with my homepage url. Your answers are appreciated.
Moz Pro | | massimobrogi0 -
How do I scan down to 10000 pages?
Hi very new here I have set up 5 campaigns, all of fairly large sites. It appears seomoz has scanned 4 of them down to 250 and 1 down to 10000. the one a really want to see down to 10000, my own site is the one I started scanning first well over a week ago. How do I get seomoz to scan further? Thanks
Moz Pro | | First-VehicleLeasing0 -
Crawl Test produced only 1 page
Hi, I recently submitted a crawl for www.cirrato.com using SEOMoz Crawl Test Tool. I have a lot of pages, but the crawl result shows only 1 page, which is the front page and nothing else... Does anyone know what this could mean or what the problem is?
Moz Pro | | yusufcirrato0 -
On the Crawl Diagnostics Summary, its reporting over 100 "Title Missing or Empty" issues, but they all check out fine?
Wondering if there Is a bug with the crawler or known timeout issues? Site speed is fast, but we do run a couple of large cron jobs out of hours, which may be the cause of any timeouts, but shouldn't the crawler report that, rather saying no title tags on 100 pages, when there are? SEOmoz newbie, so still finding my feet 🙂
Moz Pro | | sjr4x40