Increase in pages crawled per day
-
What does it mean when the pages crawled per day reported in GWT abruptly jump from 15k to 30k?
I am used to seeing spikes, such as a 10k average with 50k pages crawled a couple of times per month.
But in this case the number moved from 15k to 30k per day 10 days ago and it's staying there. I know it's a good sign: the crawler is crawling more pages per day, so it's picking up changes more often. But I have no idea why it is doing it. What good signals usually drive the Google crawler to increase the number of pages crawled per day?
Does anyone know?
-
Nice find Ryan.
-
Agreed. Especially since Google's own Gary Illyes responded to the following question:
How long is the delay between making it mobile friendly and it being reflected in the search results?
Illyes says “As soon as we discover it is mobile friendly, on a URL by URL basis, it will be updated.”
Sounds like when you went responsive they double checked each URL to confirm. From: http://www.thesempost.com/googles-gary-illyes-qa-upcoming-mobile-ranking-signal-change/. Cheers!
-
I usually analyze backlinks with both GWT and Ahrefs, and Ahrefs doesn't show any abnormally high DA backlinks either.
I agree the responsive change is the most probable candidate. I have a couple of other websites I want to turn responsive before April 21st, so that's an opportunity to test whether that is the reason.
-
Ah, the responsive change could be a big part of it. You're probably getting crawls from the mobile crawler. GWT wouldn't be the best source for recent backlinks, since it isn't always that responsive when reporting links. I'd actually look for spikes via referrers in Analytics. Still, it looks like the responsive redesign is a likely candidate for this, especially with Google's looming April 21st deadline.
-
Two things I forgot to mention are:
- About 2 weeks ago we turned the website responsive. Could it be that Google's mobile crawler is increasing the number of crawled pages? I have to analyze the logs to see if the requests are coming from the mobile crawler.
- The total number of indexed pages didn't change, which makes me wonder if a rise in the number of crawled pages per day is all that relevant.
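For the log check mentioned in the first point, something like the following might be a starting point. It is only a sketch: it assumes the server writes Combined Log Format, and it assumes the smartphone crawler's user agent contains both "Googlebot" and "Mobile" while the desktop one contains only "Googlebot". The function names are made up for illustration, and user agents can be spoofed, so the counts are indicative at best.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# ip - - [day/Mon/year:time zone] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify_agent(agent):
    """Return 'mobile', 'desktop', or None for non-Googlebot agents."""
    if "Googlebot" not in agent:
        return None
    # Assumption: the smartphone crawler also advertises "Mobile".
    return "mobile" if "Mobile" in agent else "desktop"


def count_googlebot_hits(lines):
    """Count Googlebot requests per (day, crawler type)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        kind = classify_agent(match.group("agent"))
        if kind:
            counts[(match.group("day"), kind)] += 1
    return counts
```

Feeding it the lines of an access log from before and after the responsive switch should show whether the extra crawling comes from the mobile crawler or from both.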
-
Hi Ryan,
- GWT (Search Traffic->Search Queries) shows a drop of 6% in impressions for brand based searches (Google Trends shows a similar pattern).
- GWT is not showing any recent backlink with an abnormally high DA.
- We actually had a couple of unusually high traffic spikes from Facebook thanks to a couple of particularly successful posts, but we are talking about spikes of just 5k visits, and they both started after the rise in pages crawled per day.
If you have any other ideas, they are more than welcome. I wish I could understand the source of that change so I could replicate it on other websites.
-
I am not sure I understand what you mean. That website has a total of 35k pages submitted to GWT through the sitemap, of which only 8k are indexed. The total number of indexed pages has always been slowly increasing over time: it moved from 6k to 8k in the last couple of months, slowly and with no spikes.
That's not the total number of pages served by the site, since dynamic search results pages amount to around 150k pages in total. We deliberately do not submit all of them in the sitemap, and GWT shows 70k pages as the total number of indexed pages.
I analyzed Google crawler activity through server logs in the past. It picks a set of (apparently) random pages every night and crawls them. I never actually analyzed what percentage of those pages are in the sitemap.
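The sitemap percentage could be worked out roughly as below. This is a sketch under assumptions: the sitemap is a single standard `urlset` file (not a sitemap index), the crawled URLs have already been extracted from the logs as absolute URLs, and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def sitemap_share(crawled_urls, in_sitemap):
    """Fraction of distinct crawled URLs that also appear in the sitemap."""
    crawled = set(crawled_urls)
    if not crawled:
        return 0.0
    return len(crawled & in_sitemap) / len(crawled)
```

A low share would suggest the crawler spends most of its nightly visits on pages outside the sitemap, such as the dynamic search results pages.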
The internal link structure was built on purpose to favor the ranking of the pages we consider more important.
The point is that we didn't change anything in the website structure recently. User generated content has been lowering the duplicate page count slowly over time, without any recent spike. We have a PR campaign which is increasing backlinks at an average rate of around 3 links per week, and we didn't have any high DA backlinks appearing in the last few weeks.
So I am wondering what made the Google crawler start crawling many more pages per day.
-
Yes, I updated it to say parameters just before you posted.
-
When you say URL variables, do you mean query string variables like ?key=value?
That is really good advice. You can check in your GWT. If you let Google crawl and it runs into a loop, it will not index that section of your site, since that would be costly for them.
-
I would also check that you have not got a spike of URL parameters becoming available. I recently had a similar issue, and although I had these set up in GWT, the crawler was actively wasting its time on them. Once I added them to robots.txt, the crawl level went back to 'normal'.
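One way to check for this in the logs is to count how many crawled URLs carry each query string parameter. This is only a sketch: it assumes the request paths hit by the crawler have already been pulled out of the server logs, and `parameter_counts` is a made-up helper name.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_counts(paths):
    """Count crawled URLs per query parameter name (each URL counts once per name)."""
    counts = Counter()
    for path in paths:
        query = urlsplit(path).query
        # keep_blank_values so that "?sort=" still registers the "sort" parameter
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts
```

Comparing the counts for a week before and a week after the jump would show whether a newly exposed parameter accounts for the extra crawling.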
-
There could be several factors... maybe your brand based search is prompting Google to capture more of your site. Maybe you got a link from a very high authority site that prompts higher crawl volumes. Queries that prompt freshness related to your site could also spur on Google. It is a lot of guesswork, but it can be whittled down some by a close look at Analytics and perhaps tomorrow's OSE update (Fresh Web Explorer might provide some clues in the meantime). At least you're moving in the right direction. Cheers!
-
There are two variables in play and you are picking up on one.
If there are 1,000 pages on your website, then Google may index all 1,000 if it is aware of all the pages. As you indicated, it is also Google's decision how many of your pages to index.
The second factor, which is most likely the case in your situation, is that Google only has two ways to find and index your pages. One is a sitemap submitted in GWT that lists all of your known pages. Google would then have the choice to index all 1,000, as it would be aware of their existence. The other is links, and it sounds like your website is relying on them. If you have 1,000 pages and a home page with one link leading to an about us page, then Google is only aware of two pages on your entire website. Your website has to have an internal link structure that Google can crawl.
Imagine your website as a tree root structure. For Google to get to every page and index it, it has to have clear, defined, and easy access. A website with a home page that links to page A, which links to page B, which links to page C, which links to page D, which then links to 500 pages, can easily lose those 500 pages if there is an obstruction between any of the pages that lead to page D, because Google can't crawl to page D to see all the pages linked from it.
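The chain described above can be illustrated with a small breadth-first walk over a link graph. This is a toy sketch, not how Google's crawler actually works: the `links` dictionary (page to list of linked pages) and the function name are made up for illustration.

```python
from collections import deque


def click_depths(links, start):
    """Return {page: clicks from start} for every page reachable by following links."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Toy site: home -> A -> B -> C -> D -> three deep pages
links = {
    "home": ["A"],
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": ["deep-1", "deep-2", "deep-3"],
}
reachable = click_depths(links, "home")

# Simulate an obstruction by dropping the link from C to D.
broken = dict(links, C=[])
still_reachable = click_depths(broken, "home")
lost = set(reachable) - set(still_reachable)
```

In the toy site the deep pages sit five clicks from the home page, and removing the single link from C to D leaves D and everything behind it undiscoverable through links alone.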