How can I tell Google that a page has not changed?
-
Hello,
We have a website with many thousands of pages. Some of them change frequently, some never. Our problem is that Googlebot is generating way too much traffic: half of our page views are generated by Googlebot.
We would like to tell Googlebot to stop crawling pages that never change. This one, for instance:
http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html
As you can see, there is almost no content on the page, and the picture will never change. So I am wondering if it makes sense to tell Google that there is no need to come back.
The following header fields might be relevant. Currently, our web server answers with these headers:
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0, public
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Does Google honor these fields? Should we remove no-cache, must-revalidate, and Pragma: no-cache, and set Expires to, say, 30 days in the future?
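In other words, we could answer with something like this instead (just a sketch; the exact lifetime is still open):
Cache-Control: public, max-age=2592000
Expires: (a date roughly 30 days in the future)
and drop the Pragma: no-cache header entirely.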
I also read that a web page that has not changed should answer with 304 instead of 200. Does it make sense to implement that? Unfortunately, that would be quite hard for us.
Maybe Google would then also spend more time on pages that actually changed, instead of wasting it on unchanged pages.
Do you have any other suggestions for how we can reduce Googlebot's traffic on irrelevant pages?
Thanks for your help
Cord
-
Unfortunately, I don't think there are many reliable options, in the sense that Google will always honor them. I don't think they gauge crawl frequency by the "expires" field - or, at least, it carries very little weight. As John and Rob mentioned, you can set the "changefreq" in the XML sitemap, but again, that's just a hint to Google. They seem to frequently ignore it.
If it's really critical, a 304 probably is a stronger signal, but I suspect even that's hit or miss. I've never seen a site implement it on a large scale (100s or 1000s of pages), so I can't speak to that.
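Just to illustrate the mechanics, though: the server keeps a Last-Modified date per page and answers 304 whenever the bot's If-Modified-Since header shows it already has the current version. A minimal sketch (Python/Flask here purely as an example - the route, lookup, and rendering functions are made-up placeholders, not your actual stack):

    # Sketch of conditional-GET handling: send Last-Modified / Cache-Control,
    # and return an empty 304 when the crawler already has the current version.
    from datetime import datetime, timezone
    from flask import Flask, request, make_response

    app = Flask(__name__)

    def page_last_modified(page_id):
        # Placeholder: in practice this date would come from the CMS or database.
        return datetime(2011, 1, 1, tzinfo=timezone.utc)

    def render_page(page_id):
        # Placeholder for the real template rendering.
        return "<html><body>Party pics %s</body></html>" % page_id

    @app.route("/partybilder/<page_id>")
    def party_pics(page_id):
        resp = make_response(render_page(page_id))
        resp.last_modified = page_last_modified(page_id)  # sends Last-Modified
        resp.cache_control.public = True
        resp.cache_control.max_age = 30 * 24 * 3600       # hint: stable for ~30 days
        # Rewrites the response to an empty 304 when If-Modified-Since shows
        # the requester already has the current version of the page.
        resp.make_conditional(request)
        return resp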
A few broader questions/comments:
(1) If you currently list all of these pages in your XML sitemap, consider taking them out. The XML sitemap doesn't have to contain every page on your site, and in many cases, I think it shouldn't. If you list these pages, you're basically telling Google to re-crawl them (regardless of the changefreq setting).
(2) You may have overly complex crawl paths. In other words, it may not be the quantity of pages that's at issue, but how Google accesses those pages. They could be getting stuck in a loop, etc. It's going to take some research on a large site, but it'd be worth running a desktop crawler like Xenu or Screaming Frog. This could represent a site architecture problem (from an SEO standpoint).
(3) Should all of these pages be indexed at all, especially as time passes? Increasingly (especially post-Panda), more indexed pages often means worse performance. If Googlebot is really hitting you that hard, it might be time to canonicalize some older content or 301-redirect it to newer, more relevant content. If a page isn't active at all, you could even NOINDEX or 404 it.
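(For the NOINDEX option, that's just a matter of adding <meta name="robots" content="noindex"> to the <head> of the pages you want dropped from the index.)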
-
Thanks for the answers so far. The tips are not really solving my problem yet, though: I don't want to turn down the general crawl rate in Webmaster Tools, because pages that change frequently should also be crawled frequently. We do have XML sitemaps, although we did not include these picture pages, as in our example. There are tens, maybe hundreds, of thousands of these pages. If everyone agrees on this, we can of course include these pages in our XML sitemaps. Using "meta refresh" to indicate that the page never changes seems a bit odd to me, but I'll look into it.
But what about the HTTP headers I asked about? Does anyone have any ideas on that?
-
Your best bet is to build an Excel report using a crawl tool (like Xenu, Screaming Frog, Moz, etc.) and export that data. Then map out the pages you want to log and mark as 'not changing'.
Make sure to build (or already have) a functioning XML sitemap file for the site and, as John said, state which URLs NEVER change. Over time, this will tell Googlebot that it isn't necessary to crawl those URLs, as they never change.
You could also place a META REFRESH tag on those individual pages, and set that to never as well.
Hope some of this helps! Cheers
-
If you have Google Webmaster Tools set up, go to Site configuration > Settings, and you can set a custom crawl rate for your site. That will change it site-wide, so if you have other pages that change frequently, that might not be so great for you.
Another thing you could try is generating a sitemap and setting a change frequency of never (or yearly) for all of the pages you don't expect to change. That also might slow down Google's crawl rate for those pages.
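For example, an entry for one of the picture pages from the question could look something like this (just a sketch; as noted above, changefreq is only treated as a hint):

    <url>
      <loc>http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html</loc>
      <changefreq>never</changefreq>
    </url>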