When you add a robots.txt file to a website to block certain URLs, do they disappear from Google's index?
-
I have seen several websites recently that have far too many webpages indexed by Google, because for each blog post they publish, Google might index the following:
- www.mywebsite.com/blog/title-of-post
- www.mywebsite.com/blog/tag/tag1
- www.mywebsite.com/blog/tag/tag2
- www.mywebsite.com/blog/category/categoryA
- etc
My question is: if you add a robots.txt file that tells Google NOT to index pages in the "tag" and "category" folders, does that mean that the previously indexed pages will eventually disappear from Google's index? Or does it just mean that newly created pages won't get added to the index? Or does it mean nothing at all? Thanks for any insight!
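To make the setup concrete, here is a minimal sketch of the kind of rules being described, checked with Python's standard `urllib.robotparser`. The rules themselves, the `https://` scheme, and the helper name `is_crawl_allowed` are assumptions for illustration, not taken from any real site. Note that this only answers "may a crawler fetch this URL?"; it says nothing about whether the URL is in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example site (an assumption, not a real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/tag/
Disallow: /blog/category/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.mywebsite.com/blog/title-of-post",
        "https://www.mywebsite.com/blog/tag/tag1",
        "https://www.mywebsite.com/blog/category/categoryA",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```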
-
Hi William
If the pages in question are
1) already indexed by Google, then if you block them via robots.txt they will still show up in search results, but the description will say something along the lines of
A description for this result is not available because of this site's robots.txt – learn more.
2) not indexed by Google, for example on a new site, then Google will not crawl them and the pages will not come up in search directly, BUT if some external sites link to the pages then they can still come up in the SERPs some time down the track.
Your best bet to keep the pages out of the public SERP index is the meta robots tag: http://www.robotstxt.org/meta.html
-
William, if the pages in question are linked to from external resources, the robots.txt file will not prevent the pages from appearing in the index. Per Moz's Robots.txt and Meta Robots best practices, "the robots.txt tells the engines not to crawl the given URL, but that they may keep the page in the index and display it in results."
To prevent all robots from indexing a page on your site, place the following meta tag into the <head> section of your page:
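The tag itself seems to have been lost from the post above. The commonly used form is `<meta name="robots" content="noindex">`, so treat that as an assumption about what was meant. Also keep in mind that a crawler has to be able to fetch the page to see the tag, so the page should not be blocked in robots.txt at the same time. Below is a rough Python sketch, using only the standard library, for checking whether a page's HTML carries the directive; the sample page and the names `RobotsMetaParser` and `has_noindex` are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical tag archive page with the noindex meta tag in its <head>.
PAGE = """<html><head>
<title>Tag: tag1</title>
<meta name="robots" content="noindex">
</head><body><p>Posts tagged tag1</p></body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    print(has_noindex(PAGE))
```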