How come Google indexes 2,500 pages one day, only 150 the next, then 2,000 again, etc.?
-
This is about a big affiliate website belonging to one of our customers, running on datafeeds...
Bad things about datafeeds:
- Duplicate content (product descriptions)
- A very large number of (thin) product pages (sometimes it's better to noindex them, I know, but this customer doesn't want to do that)
-
Hi Dana,
Thanks for your detailed explanation, I appreciate it. Of course I understand that site speed is a factor for crawling (and ranking) and that the Google bots only want to spend a certain amount of time on a website. It's more that, when the servers perform almost identically every day and page load times are therefore roughly equal too, what else could it be?
I agree with your two points to consider, but I'm the type of guy who always wants to know why something is happening.
@Nakul: Thanks for your response!
The pages that drop in and out of the index are mostly product pages, so the point about frequent updates could be part of it. The website is pretty young, so it hasn't yet built the authority it should have for a site this size. That can also be a factor, since the more authority a site has, the more time Google will spend indexing it, right? Anyway, many thanks to both of you for your answers!
Regards, Wesley
-
I agree with everything Nakul has said. Just to piggyback on that with some additional information, try to think about it this way. Remember when someone gave you $1.00 when you were little and said, "Don't spend it all in one place"? Well, someone at Google must have grown up with the same grandparents I did.
Okay, now for the analogy-free explanation:
Google has a "crawl budget" every day. Every day that budget is allocated across millions of different sites. Now, by "sites" I mean "pages." Some pages change really frequently (e.g. the Yahoo News homepage). Some pages hardly ever change (e.g. an archived blog post). Also, some pages have very high PR and others, not so much. And some pages load extremely fast (consuming less of Google's bandwidth when the page is crawled), which leaves more of Google's resources available to crawl more pages. Google likes that, and so should we all, because people with fast sites make it possible for everyone to get crawled more often (in essence, making them very considerate, well-behaved members of the Internet community).
So, based on all of this, Google is going to apportion a part of its crawl budget to your site on any given day. Some days it may have more room in its budget for you than others. Part of this might be affected by how fast pages load from your site on any given day. A ton of parameters can come into play here, including whether the pages crawled that day are heavier, or whether your servers are performing faster on one day than on another.
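If it helps to make the idea concrete, here's a toy sketch of how a fixed daily budget might be split across pages that differ in authority, change frequency, and load time. This is purely illustrative: the page names, numbers, and scoring formula are made up, and it is emphatically not Google's actual algorithm.

```python
# Toy illustration of the crawl-budget idea above. NOT Google's actual
# algorithm -- the page names, numbers, and scoring formula are invented
# purely to show how authority, freshness, and load time could interact.

def crawl_score(pagerank, changes_per_week, load_seconds):
    """Higher authority and fresher content raise the score; slow pages
    cost more bandwidth per fetch, so they lower it."""
    return pagerank * (1 + changes_per_week) / max(load_seconds, 0.1)

pages = {
    "news_homepage": crawl_score(pagerank=8, changes_per_week=50, load_seconds=0.4),
    "archived_blog_post": crawl_score(pagerank=3, changes_per_week=0, load_seconds=0.6),
    "thin_product_page": crawl_score(pagerank=1, changes_per_week=1, load_seconds=1.8),
}

daily_budget = 1000  # hypothetical number of fetches spent on "your" pages today
total = sum(pages.values())
for name, score in pages.items():
    print(f"{name}: ~{daily_budget * score / total:.0f} fetches")
```

The takeaway is just that a slow, rarely updated, low-authority page ends up with a sliver of the budget, so whether it gets crawled (and re-indexed) on a particular day can easily swing around.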
I'd say that, after considering all of this, the two things to really be concerned about are:
- Is Google indexing all of the pages you want indexed?
- Is Google's cache date of your important pages recent enough? (i.e. 3 weeks or less)
If the answer is "no" to either one of those, then it's time to do some investigating to find out whether there are technical issues or penalties in place that are hurting Google's ability or desire (not the right word to use about a bot, but I'm using it anyway) to crawl your pages.
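If you want to check the first question at scale, a rough sketch like the one below can help: it compares the URLs in your XML sitemap against a list of URLs you already believe are indexed, however you've compiled that list (site: spot checks, your own records, etc.). The file names here are hypothetical.

```python
# Rough sketch for checking point 1 at scale: which URLs in your XML sitemap
# don't appear in a list of URLs you believe are indexed? The file names and
# the indexed-URL list are assumptions -- build that list however you trust.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(path):
    """Collect every <loc> URL from a standard sitemap.xml file."""
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.findall(".//sm:loc", NS)}

def known_indexed_urls(path):
    """One URL per line in a plain text file."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

missing = sitemap_urls("sitemap.xml") - known_indexed_urls("indexed_urls.txt")
print(f"{len(missing)} sitemap URLs not found in the indexed list")
for url in sorted(missing)[:25]:
    print(url)
```

Anything that shows up in the "missing" set is worth a manual spot check before assuming it's a real problem.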
Does that help?
-
Domain authority / PageRank is what Google looks at to decide how deep and how frequently it will crawl a particular website. It also typically looks at how frequently the content is being updated.
Think about it from Google's perspective. Why should it index 2,500 pages of that website every day? What's changing? Does the site have enough domain authority to warrant that kind of indexing?
In my opinion, this is not a concern. Just submit XML Sitemaps and see what percentage of your submitted pages are indexed.
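If it's useful, here's a minimal sketch of that approach: splitting product URLs into several sitemap files plus a sitemap index, so the Sitemaps report can show submitted versus indexed per segment. The domain, file names, and chunk size are placeholder assumptions, not anything this site actually uses.

```python
# Minimal sketch of segmented sitemaps: split product URLs into chunks, write
# one sitemap file per chunk plus a sitemap index. Domain, file names, and
# chunk size are placeholders.
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def write_sitemaps(urls, chunk_size=5000, base="http://www.example.com"):
    today = date.today().isoformat()
    index_entries = []
    for i in range(0, len(urls), chunk_size):
        name = f"sitemap-products-{i // chunk_size + 1}.xml"
        with open(name, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
            for url in urls[i:i + chunk_size]:
                f.write(f"  <url><loc>{url}</loc><lastmod>{today}</lastmod></url>\n")
            f.write("</urlset>\n")
        index_entries.append(f"{base}/{name}")
    # The index file is what you submit in Webmaster Tools.
    with open("sitemap-index.xml", "w") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        for entry in index_entries:
            f.write(f"  <sitemap><loc>{entry}</loc></sitemap>\n")
        f.write("</sitemapindex>\n")

write_sitemaps([f"http://www.example.com/product/{n}" for n in range(1, 12001)])
```

Comparing submitted versus indexed per sitemap file also makes it easier to see which slices of the product catalogue are the ones dropping in and out.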
-
Related Questions
-
Google index
Hello, I removed my site from the Google index via GWT ("Temporarily remove URLs that you own from search results", status: Removed). The site has not been ranking well in Google for the last two months. My question is: what will happen if I re-include the site URL after one or two weeks? Is there any chance it will rank well when Google re-indexes the site?
Intermediate & Advanced SEO | Getmp3songspk
-
How does Google treat dynamically generated content on a page?
I'm trying to find information on how Google treats dynamically generated content within a web page (not dynamic URLs). For example, I have a list of our top 10 products with short product descriptions and links on our homepage, to flow some of the PageRank to those individual product pages. My developer wants to make these top products dynamic so that they switch around daily. Won't this negatively affect my SEO and my ability to rank for those keywords if they keep switching around, or would it help, since the content would be updated so frequently?
Intermediate & Advanced SEO | ntsupply
-
Can I tell Google to Ignore Parts of a Page?
Hi all, I was wondering if there is some sort of HTML trick I could use to selectively tell a search engine to ignore the text on certain parts of a page. Thanks!
Intermediate & Advanced SEO | Charles_Murdock
-
Why is this site not indexed by Google?
Hi all, and thanks for your help in advance. I've been asked to take a look at a site, http://www.yourdairygold.ie, as it currently does not appear for its brand name, Your Dairygold, on Google Ireland, even though it has been live for a few months now. I've checked all the usual issues such as robots.txt (it doesn't have one) and the robots meta tag (it doesn't have them). The even stranger thing is that the site does rank on Yahoo! and Bing. Google Webmaster Tools shows that Googlebot is crawling around 150 pages a day, but the total number of pages indexed is zero. It does appear if you carry out a site: search on Google, however. The site is very poorly optimised in terms of title tags, unnecessary redirects etc., which I'm working on now, but I wondered if you guys had any further insights. Thanks again for your help.
Intermediate & Advanced SEO | iProspect-Ireland
-
Google authorship: page versus post
I have a wordpress.com blog and have recently set up Google authorship (at least I think I have). If I add a new post, what happens to the old post in terms of authorship? Is the solution to open a new page for each article? If so, does the contributor link in Google+ pick up all pages if you only have the home link? Many thanks.
Intermediate & Advanced SEO | harddaysgrind
-
Language Subdirectory homepage not indexed by Google
Hi mozzers! Our Spanish homepage doesn't seem to be indexed or cached in Google, despite being online for a month or two now. All Spanish subpages are indexed and have started to rank, but not the homepage. I have submitted the XML sitemap to GWTools and have checked that there's no noindex on the page - it seems to be in order. And when I run a site: command in Google it shows all pages except the homepage. What could be the problem? Here's the page: http://www.bosphorusyacht.com/es/
Intermediate & Advanced SEO | emerald
-
What sources should we use to compile an as-comprehensive-as-possible list of pages indexed in Google?
As part of a Panda recovery initiative, we are trying to get as comprehensive a list as possible of the URLs currently indexed by Google. Using the site:domain.com operator, Google displays that approximately 21k pages are indexed. Scraping the results, however, ends after 240 links have been listed. Are there any other sources we could use to make the list more comprehensive? To be clear, we are not looking for external crawlers like the SEOmoz crawl tool, but for sources that would confidently allow us to determine a list of the URLs currently held in the Google index. Thank you /Thomas
Intermediate & Advanced SEO | sp80
-
Adding Orphaned Pages to the Google Index
Hey folks, how do you think Google will treat adding 300k orphaned pages to a 4.5-million-page site? The URLs would resolve, but there would be no on-site navigation to those pages; Google would only know about them through sitemap.xml files. These pages are super low competition. The plot thickens: what we are really after is to get 150k real pages back on the site. Those pages do have crawlable paths on the site, but in order to do that (for technical reasons) we need to push these other 300k orphaned pages live (it's an all-or-nothing deal). a) Do you think Google will have a problem with this, or will it just decide not to index some or most of these pages since they are orphaned? b) If these pages will just fall out of the index or not get included, and have no chance of ever accumulating PR anyway since they are not linked to, would it make sense to just noindex them? c) Should we not submit sitemap.xml files at all, keep our 150k, just ignore these 300k, and hope Google ignores them as well since they are orphaned? d) If Google is OK with this, maybe we should submit the sitemap.xml files and keep an eye on the pages; maybe they will rank and bring us a bit of traffic, but we don't want to do that if it could be an issue with Google. Thanks for your opinions, and if you have any hard evidence either way, especially thanks for that info. 😉
Intermediate & Advanced SEO | irvingw