Clarification regarding robots.txt protocol
-
Hi,
I have a website with more than 1,000 URLs, and all of them are already indexed in Google. I am now going to stop all the services offered on the website and have removed all the landing pages, so only the home page is left. I therefore need to remove all the indexed URLs from Google. I have already used the robots.txt protocol for removing URLs, but I guess adding such a large number of URLs (nearly 1,000) to robots.txt is not a good method. So I just wanted to know: is there any other method for removing indexed URLs?
Please advise.
-
If the pages are already indexed and you want them to be completely removed, you need to allow the crawlers in robots.txt and noindex the individual pages.
So if you just block the site with robots.txt (and I recommend blocking via folders or URL variables, not individual pages) while the pages are indexed, they will continue to appear in search results, but with a note that the page is blocked by robots.txt in place of the meta description. They will also continue to rank and appear because of the cached data.
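If you do go the robots.txt route, here is a rough sketch of the "block folders, not individual pages" idea: it collapses a list of removed URLs into one `Disallow` line per top-level folder. The function name `folder_disallow_rules` and the example URLs are hypothetical, it only handles folders (not URL variables/query parameters), and it will block everything under a folder, including any page there you meant to keep. Also keep in mind the point above: a URL blocked this way can't be crawled, so a noindex tag on it won't be seen.

```python
from urllib.parse import urlsplit


def folder_disallow_rules(urls, depth=1):
    """Build a robots.txt body with one Disallow line per folder.

    Hypothetical helper: URLs nested under a folder collapse to that
    folder's prefix, root-level pages get their own line, and the home
    page ("/") is never blocked.
    """
    rules = set()
    for url in urls:
        path = urlsplit(url).path  # drops the domain and any ?query string
        if path in ("", "/"):
            continue  # keep the home page crawlable
        segments = [s for s in path.split("/") if s]
        if len(segments) > depth:
            # e.g. /services/seo/ -> /services/ when depth=1
            rules.add("/" + "/".join(segments[:depth]) + "/")
        else:
            # a page sitting directly at this level, e.g. /contact.html
            rules.add(path)
    lines = ["User-agent: *"] + [f"Disallow: {r}" for r in sorted(rules)]
    return "\n".join(lines) + "\n"
```

For example, feeding it `/services/seo/`, `/services/ppc/`, `/blog/post-1` and `/contact.html` on `www.example.com` yields three rules (`/blog/`, `/contact.html`, `/services/`) instead of four individual URLs; with ~1,000 pages spread over a few folders the saving is much larger.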
If you add the noindex tags to your pages instead, the next time crawlers visit the pages they will see the new tag and remove the page from the search index (meaning it won't show up at all). However, make sure your robots.txt isn't blocking the crawlers from seeing this updated code.
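One way to double-check that robots.txt isn't hiding the noindex tags is Python's standard `urllib.robotparser`. This is a minimal sketch: the function name `blocked_urls`, the `Googlebot` default and the example rules are assumptions for illustration. It lists which URLs a given robots.txt would stop the crawler from fetching; any page you've tagged noindex should not show up in that list. Note that `urllib.robotparser` does plain prefix matching, so it may not interpret wildcard rules the same way Google does.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt would stop user_agent from crawling.

    Pages carrying a noindex tag should NOT be in the result; otherwise the
    crawler never fetches them and never sees the tag.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse a local copy, no network
    return [u for u in urls if not parser.can_fetch(user_agent, u)]
```

With `User-agent: *` / `Disallow: /services/`, a noindexed page like `https://www.example.com/services/seo/` would be reported as blocked, which is exactly the situation to avoid.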
-
There are a few ways to do this.
First, I would use the Google Removal Tool to remove those URLs. More information here: https://support.google.com/webmasters/answer/1663419?hl=en
Next, using the robots.txt file is fine, but you need to make sure that you're listing the correct URLs or URL paths there.
I would also make sure that the server returns a "410 Gone" status in the response header, not a 404 error. A 410 Gone will get those URLs removed faster.
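On a real site you would normally set this up in your web server or CMS configuration; the sketch below just shows the behaviour using Python's standard `http.server`. The `KEEP_PATHS` set, the `status_for`/`GoneHandler`/`run` names and port 8000 are all hypothetical: the home page answers 200 and every other (removed) URL answers 410 Gone, ignoring query strings such as `?utm_source=...`.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical: the only page still served is the home page.
KEEP_PATHS = {"/", "/index.html"}


def status_for(request_path):
    """200 for kept pages, 410 Gone for every removed landing page."""
    path = urlsplit(request_path).path or "/"  # drop ?query strings
    return 200 if path in KEEP_PATHS else 410


class GoneHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        body = b"<h1>Home</h1>" if status == 200 else b"<h1>410 Gone</h1>"
        self.send_response(status)  # status line, e.g. "HTTP/1.0 410 Gone"
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(host="127.0.0.1", port=8000):
    """Serve until interrupted (Ctrl+C)."""
    HTTPServer((host, port), GoneHandler).serve_forever()
```

Calling `run()` and requesting any removed URL should then return a 410 status line that you can confirm in the browser's network tab or with any HTTP header checker.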
-
If the goal is to get the URLs out of the search engine index, then there are a few solutions that can work for you:
- The one you mentioned, although I think adding 1000+ URLs to the robots.txt file is a bad idea; it depends on whether it makes sense for your business.
- Adding a meta noindex tag to the pages (if the pages physically exist).
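The tag itself goes in each page's `<head>` as `<meta name="robots" content="noindex">`. Before waiting for a recrawl, it can help to confirm the tag is really in the HTML being served. Here is a small sketch using Python's standard `html.parser`; the class and function names (`RobotsMetaFinder`, `has_noindex`) are made up for illustration, and it only checks the HTML you pass in, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """True if the page's robots meta tag contains a noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

For instance, `has_noindex('<meta name="robots" content="noindex, follow">')` is `True`, while a page still carrying `index, follow` returns `False`.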
Also, to remove them from the index quickly, you can update the robots.txt file and then use the Remove URLs feature in Google Webmaster Tools.
Just a thought!