How can I tell Google that a page has not changed?
-
Hello,
We have a website with many thousands of pages. Some of them change frequently, some never. Our problem is that Googlebot is generating way too much traffic: half of our page views come from Googlebot.
We would like to tell Googlebot to stop crawling pages that never change. This one, for instance:
http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html
As you can see, there is almost no content on the page, and the picture will never change. So I am wondering if it makes sense to tell Google that there is no need to come back.
The following header fields might be relevant. Currently, our web server answers with these headers:
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0, public
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Does Google honor these fields? Should we remove no-cache, must-revalidate, and Pragma: no-cache, and set Expires to, e.g., 30 days in the future?
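For reference, here is a minimal sketch of what long-lived caching headers could look like for a never-changing page, using Python/Flask purely as an illustration (the route and the 30-day lifetime are invented; whether Googlebot actually honors them is exactly the open question):

```python
from datetime import datetime, timedelta, timezone

from flask import Flask, make_response

app = Flask(__name__)

# Hypothetical route for a picture page that never changes.
@app.route("/party/partybilder/<path:slug>")
def party_pic(slug):
    resp = make_response(f"<html><body>Picture page {slug}</body></html>")
    # Let caches (and crawlers) keep the page for 30 days;
    # note: no "no-cache", "must-revalidate", or "Pragma: no-cache" here.
    resp.headers["Cache-Control"] = "public, max-age=2592000"
    expires = datetime.now(timezone.utc) + timedelta(days=30)
    resp.headers["Expires"] = expires.strftime("%a, %d %b %Y %H:%M:%S GMT")
    return resp
```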
I also read that a web page that has not changed should answer with a 304 instead of a 200. Does it make sense to implement that? Unfortunately, that would be quite hard for us.
Maybe Google would then spend more time on pages that actually changed, instead of wasting it on unchanged ones.
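To make the 304 idea concrete, a conditional-GET handler might look roughly like this (again Python/Flask as a sketch; the fixed last-modified date is hypothetical):

```python
from datetime import datetime, timezone

from flask import Flask, make_response, request

app = Flask(__name__)

# Hypothetical: the picture was uploaded once and never touched again.
LAST_MODIFIED = datetime(2011, 6, 1, tzinfo=timezone.utc)

@app.route("/pics/<int:pic_id>")
def picture_page(pic_id):
    ims = request.if_modified_since  # parsed If-Modified-Since header, or None
    if ims is not None and ims >= LAST_MODIFIED:
        # The crawler already has the current version: answer 304 with no body.
        return make_response("", 304)
    resp = make_response(f"<html><body>Picture {pic_id}</body></html>")
    resp.last_modified = LAST_MODIFIED  # sets the Last-Modified header
    return resp
```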
Do you have any other suggestions for how we can reduce Googlebot traffic on irrelevant pages?
Thanks for your help
Cord
-
Unfortunately, I don't think there are many reliable options, in the sense of signals Google will always honor. I don't think they gauge crawl frequency by the Expires field - or, at least, it carries very little weight. As John and Rob mentioned, you can set the changefreq in the XML sitemap, but again, that's just a hint to Google. They seem to ignore it frequently.
If it's really critical, a 304 is probably a stronger signal, but I suspect even that's hit or miss. I've never seen a site implement it on a large scale (hundreds or thousands of pages), so I can't speak to that.
A few broader questions/comments:
(1) If you currently list all of these pages in your XML sitemap, consider taking them out. The XML sitemap doesn't have to contain every page on your site, and in many cases, I think it shouldn't. If you list these pages, you're basically telling Google to re-crawl them (regardless of the changefreq setting).
(2) You may have overly complex crawl paths. In other words, it may not be the quantity of pages that's at issue, but how Google accesses those pages. Googlebot could be getting stuck in a loop, etc. It's going to take some research on a large site, but it'd be worth running a desktop crawler like Xenu or Screaming Frog (or a quick script like the first sketch after this list). This could point to a site-architecture problem (from an SEO standpoint).
(3) Should all of these pages even be indexed, especially as time passes? Increasingly (especially post-Panda), more indexed pages often means worse performance. If Googlebot is really hitting you that hard, it might be time to canonicalize some older content or 301-redirect it to newer, more relevant content. If a page isn't active at all, you could even NOINDEX or 404 it (see the second sketch after this list).
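If you want a quick sanity check on crawl paths before reaching for a full crawler, a rough script along these lines can show which URLs are reachable through unusually many links (Python with requests and BeautifulSoup; the same-site filter and the "many in-links" heuristic are just my own rough proxies for loops):

```python
from collections import defaultdict
from urllib.parse import urldefrag, urljoin

import requests
from bs4 import BeautifulSoup

def crawl(start_url, max_pages=500):
    """Breadth-first crawl that counts in-links per URL (rough loop proxy)."""
    seen = set()
    queue = [start_url]
    inlinks = defaultdict(int)  # how many pages link to each URL
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = requests.get(url, timeout=10).text
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link, _ = urldefrag(urljoin(url, a["href"]))
            if link.startswith(start_url):  # crude same-site filter
                inlinks[link] += 1
                queue.append(link)
    # URLs with unusually many in-links are worth a closer look for loops.
    return sorted(inlinks.items(), key=lambda kv: -kv[1])[:20]

print(crawl("http://www.example.com/"))
```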
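And to make point (3) concrete, here is a minimal sketch of the 301/NOINDEX options, again in Python/Flask with an invented URL mapping:

```python
from flask import Flask, make_response, redirect

app = Flask(__name__)

# Hypothetical mapping from retired gallery pages to newer replacements.
REPLACEMENTS = {412598: "/party/partybilder/newer-gallery.html"}

@app.route("/old-gallery/<int:pic_id>")
def retired_picture(pic_id):
    newer = REPLACEMENTS.get(pic_id)
    if newer:
        # Consolidated content: permanently redirect to its replacement.
        return redirect(newer, code=301)
    # Still reachable, but ask search engines to drop it from the index.
    resp = make_response(f"<html><body>Archived picture {pic_id}</body></html>")
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp
```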
-
Thanks for the answers so far. The tips don't really solve my problem yet, though: I don't want to turn down the general crawl rate in Webmaster Tools, because pages that change frequently should also be crawled frequently. We do have XML sitemaps, although we did not include these picture pages, as in our example. There are tens, maybe hundreds, of thousands of these pages. If everyone agrees on this, we can of course include them in our XML sitemaps. Using "meta refresh" to indicate that the page never changed seems a bit odd to me, but I'll look into it.
But what about the HTTP headers I asked about? Does anyone have any ideas on those?
-
Your best bet is to build an Excel report using a crawl tool (like Xenu, Screaming Frog, Moz, etc.) and export that data. Then map out the pages you want to log and mark as "not changing".
Make sure to build (or already have) a functioning XML sitemap file for the site and, as John said, state which URLs never change. Over time, this will tell Googlebot that it isn't necessary to crawl those URLs, since they never change.
You could also place a META REFRESH tag on those individual pages, and set that to never as well.
Hope some of this helps! Cheers
-
If you have Google Webmaster Tools set up, go to Site configuration > Settings, where you can set a custom crawl rate for your site. That will change it site-wide, so if you have other pages that change frequently, that might not be so great for you.
Another thing you could try is generating a sitemap and setting a change frequency of "never" (or "yearly") for all of the pages you don't expect to change. That also might slow down Google's crawl rate on those pages.
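For what it's worth, a minimal sitemap generator along those lines could look like this (plain Python; the entry list is invented for illustration, and "never" is one of the valid changefreq values in the sitemap protocol):

```python
from xml.sax.saxutils import escape

def sitemap_xml(entries):
    # entries: iterable of (url, changefreq) pairs, e.g. ("https://...", "never")
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, freq in entries:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(url)}</loc>")
        lines.append(f"    <changefreq>{freq}</changefreq>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

print(sitemap_xml([
    ("http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html",
     "never"),
]))
```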