Does using robots.txt to block pages decrease search traffic?
-
I know you can use robots.txt to tell search engines not to spend their resources crawling certain pages.
So, if a section of your website has good content that is never updated, and you want the search engines to index new content faster, would it work to block the good, unchanged content with robots.txt? Would this content lose any search traffic if it were blocked by robots.txt? Does anyone have any case studies?
-
If you block the pages from being crawled, the search engines can no longer read their content, and they don't want to rank something they haven't looked at. A blocked URL can still show up in results if other pages link to it, but usually without a description and with much weaker rankings. So yes, the traffic numbers from organic search will change if you block the pages in robots.txt.
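Before deploying a rule, it can help to check exactly which URLs it would block. Here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `/archive/` path, the example.com URLs, and the `is_crawlable` helper are all made up for illustration. Note that this parser follows the original robots exclusion standard, so its answers may differ from how a given search engine treats wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks an old, unchanged section of a site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/blog/new-post",
        "https://www.example.com/archive/old-post",
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

Any URL this reports as blocked is one whose content crawlers that honor the file would stop reading, so those are the pages whose traffic is at risk.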
-
Agreed, that is a better solution, but I am still wondering: if you block something with robots.txt, will that lead to a decrease in traffic? What if we have some duplicate content that is highly trafficked? If we block it with robots.txt, will the traffic numbers change?
-
You certainly don't want to block this content!
One thing I'd consider is the If-Modified-Since header, or other caching headers. Here are two articles that explain more about the concept of using headers to tell the search engines "this hasn't changed, don't bother crawling it". I haven't personally used this, but I have read about it in many places.
http://www.feedthebot.com/ifmodified.html
http://searchengineland.com/how-to-improve-crawl-efficiency-with-cache-control-headers-88824
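To make the header idea concrete: the crawler sends an If-Modified-Since date, and the server replies 304 Not Modified (with no body) if the page has not changed since then. Here is a rough sketch of that decision in Python, using only the standard library. The `conditional_response` function is a hypothetical helper, not part of any framework, and it assumes you already know each page's last-modified time as a UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_response(last_modified: datetime,
                         if_modified_since: Optional[str]):
    """Return (status, headers) for a page last changed at last_modified.

    last_modified must be a timezone-aware UTC datetime (assumption of
    this sketch). if_modified_since is the raw request header, or None.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable header: ignore it and send the page.
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without a zone as UTC.
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Unchanged since the crawler's copy: no body needed.
                return 304, headers

    # Changed, or no usable header: send the full page as usual.
    return 200, headers


if __name__ == "__main__":
    changed_at = datetime(2020, 1, 1, tzinfo=timezone.utc)
    print(conditional_response(changed_at, "Wed, 01 Jan 2020 00:00:00 GMT"))
    print(conditional_response(changed_at, "Tue, 31 Dec 2019 00:00:00 GMT"))
```

Most web servers and frameworks already handle this for static files, so check what yours does before writing anything custom.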