Clarification regarding robots.txt protocol
-
Hi,
I have a website with more than 1,000 URLs, and all of them are already indexed in Google. I am now going to stop all the services available on the site and have removed all the landing pages, so only the home page remains. I need to remove all of the indexed URLs from Google. I have already used the robots.txt protocol to remove URLs, but I guess it is not a good method when it means adding a bulk amount of URLs (nearly 1,000) to robots.txt. So I just wanted to know: is there any other method for removing indexed URLs?
Please advise.
-
If the pages are already indexed and you want them to be completely removed, you need to allow the crawlers in robots.txt and noindex the individual pages.
So if you just block the site with robots.txt (and if you do use robots.txt, I recommend blocking by folder or URL pattern, not individual pages) while the pages are still indexed, they will continue to appear in search results, just with a snippet saying the page is blocked by robots.txt. They will also keep ranking and appearing because of the cached data.
If you add the noindex tags to your pages instead, the next time crawlers visit the pages they will see the new tag and remove the page from the search index (meaning it won't show up at all). However, make sure your robots.txt isn't blocking the crawlers from seeing this updated code.
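For example - just a sketch, with the paths left for you to fill in - the tag needs to sit in the <head> of every page you want removed, and robots.txt has to leave those pages crawlable so Googlebot can actually see it:

<!-- in the <head> of each page that should drop out of the index -->
<meta name="robots" content="noindex">

# robots.txt - no Disallow rule blocking those pages, otherwise
# Googlebot never gets to see the noindex tag
User-agent: *
Disallow: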
-
There are a few ways to do this.
First, I would use the Google Removal Tool to remove those URLs. More information here: https://support.google.com/webmasters/answer/1663419?hl=en
Then, using the robots.txt file is fine, but you need to make sure that you're listing the correct URLs or URL paths there.
I would also make sure the removed pages return a "410 Gone" status in the server header, and not a 404 error. The 410 will get those URLs dropped from the index faster.
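If the old site runs on Apache, one way to send the 410 is a couple of lines in .htaccess. This is only a sketch, and the /services/ path is a placeholder for wherever the removed landing pages used to live:

# mod_alias: answer 410 Gone for anything under the old landing-page folder
RedirectMatch gone ^/services/.*

With mod_rewrite enabled, RewriteRule ^services/ - [G,L] does the same job.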
-
If the goal is to get the URLs out of the search engine index, there are a few solutions that can work for you:
- The one you mentioned: I think adding 1,000+ URLs to the robots.txt file is a bad idea and doesn't make sense for your business.
- Adding a meta noindex tag to the pages (if the pages still physically exist).
Also, to remove them from the index quickly, you can update the robots.txt file and then go to Google Webmaster Tools and use the Remove URLs feature.
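If you do update robots.txt, blocking by folder keeps it to a couple of lines instead of 1,000+ entries. Something like this (the /services/ folder is just an example - swap in whatever path the old pages shared):

User-agent: *
Disallow: /services/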
Just a thought!
Related Questions
-
Robots.txt Tester - syntax not understood
I've looked in the robots.txt Tester and I can see 3 warnings. There is a 'syntax not understood' warning for each of these XML sitemaps:
https://www.pkeducation.co.uk/post-sitemap.xml
https://www.pkeducation.co.uk/sitemap_index.xml
How do I fix or reformat these to remove the warnings? Many thanks in advance.
Jim
Technical SEO | JamesHancocks1
-
Duplicate content: using the robots meta tag in conjunction with the canonical tag?
We have a WordPress instance on an Apache subdomain (let's say it's blog.website.com) alongside our main website, which is built in Angular. The tech team is using Akamai to do URL rewrites so that the blog posts appear under the main domain (website.com/more-keywords/here). However, due to the way they configured the WordPress install, they can't do a wildcard redirect under htaccess to force all the subdomain URLs to appear as subdirectories, so as you might have guessed, we're dealing with duplicate content issues. They could in theory do manual 301s for each blog post, but that's laborious and a real hassle given our IT structure (we're a financial services firm, so lots of bureaucracy and regulation). In addition, due to internal limitations (they seem mostly political in nature), a robots.txt file is out of the question. I'm thinking the next best alternative is the combined use of the robots meta tag (no index, follow) alongside the canonical tag to try to point the bot to the subdirectory URLs. I don't think this would be unethical use of either feature, but I'm trying to figure out if the two would conflict in some way? Or maybe there's a better approach with which we're unfamiliar or that we haven't considered?
Technical SEO | prasadpathapati
-
My Last Question Regarding URLs - I Promise...
Hello, I've recently asked the community which URLs would be best for a company with a variety of wood flooring products. This question relates to "keywords" within the URL for each and every product. Which would you choose, 1a or 1b, and 2a or 2b?
1. Product: CIRO
a. www.thewoodgalleries.co.uk/engineered-flooring/rustic-oak-ciro - keyword match: YES ("Rustic Oak Flooring")
b. www.thewoodgalleries.co.uk/engineered-flooring/ciro - keyword match: NO ("Rustic Oak Flooring")
2. Product: VOGUE
a. www.thewoodgalleries.co.uk/engineered-flooring/prefinished-oak-vogue - keyword match: YES ("Prefinished Oak Flooring")
b. www.thewoodgalleries.co.uk/engineered-flooring/vogue - keyword match: NO ("Prefinished Oak Flooring")
Although seemingly a basic part of SEO, I find myself revisiting this question time and time again: what is really better for SEO, shorter URLs or slightly longer ones that achieve a keyword match? After researching the keywords we have chosen for this project, it seems that to have any chance of ranking on the first page, the keyword (or part of it) must appear within the URL. I would like to get some extra clarification. Thanks for your help!
Technical SEO | GaryVictory
-
Why Google ranks a page with Meta Robots: NO INDEX, NO FOLLOW?
Hi guys, I was playing with the new OSE when I found out a weird thing: if you Google "performing arts school london" you will see www.mountview.org.uk in the 3rd position. The point is that the page has "Meta Robots: NOINDEX, NOFOLLOW", so why has Google indexed it? Here you can see the robots.txt allows Google to index the URL but not the content, and in the article they also say the meta robots tag should properly stop Google from indexing the URL as well. Apparently, in my case that page is the only one with the "NOINDEX, NOFOLLOW" tag, but it's the home page. So I said to myself: OK, perhaps they have just changed that tag, so Google needs time to re-crawl the page and de-index it following the noindex tag. How long do you think it will take for that page to drop out of the index? And do you think it will affect the whole website? I suppose if you have that tag on your home page (the root domain) you will lose a lot of link juice - a backlink profile without links to the root domain is totally unnatural. Cheers, Pierpaolo
Technical SEO | madcow78
-
How can I make Google Webmaster Tools see the robots.txt file when I am doing a .htaccess redirect?
We are moving a site to a new domain. I have set up an .htaccess file and it is working fine. My problem is that Google Webmaster Tools now says it cannot access the robots.txt file on the old site. How can I make it still see the robots.txt file when the .htaccess is doing a full-site redirect? .htaccess currently has:
Options +FollowSymLinks -MultiViews
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?michaelswilderhr.com$ [NC]
RewriteRule ^ http://www.s2esolutions.com/ [R=301,L]
Google Webmaster Tools is reporting: "Over the last 24 hours, Googlebot encountered 1 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%."
Technical SEO | RalphinAZ
-
Mobile site: robots.txt best practices
If there are canonical tags pointing to the web version of each mobile page, what should a robots.txt file for a mobile site have?
Technical SEO | bonnierSEO
-
Quick robots.txt check
We're working on an SEO update for http://www.gear-zone.co.uk at the moment, and I was wondering if someone could take a quick look at the new robots file (http://gearzone.affinitynewmedia.com/robots.txt) to make sure we haven't missed anything? Thanks
Technical SEO | neooptic
-
Need Help With Robots.txt on Magento eCommerce Site
Hello, I am having difficulty getting my robots.txt file configured properly. I am getting error emails from Google products stating they can't view our products because they are being blocked, and this past week, in my SEO dashboard, the URLs receiving search traffic dropped by almost 40%. Can anyone offer a good template robots.txt file for a Magento eCommerce website? The one I am currently using was found at this site: e-commercewebdesign.co.uk/blog/magento-seo/magento-robots-txt-seo.php - however, I am now getting problems from Google because of it. I searched and found this thread: http://www.magentocommerce.com/wiki/multi-store_set_up/multiple_website_setup_with_different_document_roots#the_root_folder_robots.txt_file - but I felt like I should get some additional help on properly configuring a robots.txt for a Magento site. Thanks in advance for any help. Please let me know if you need more info to provide assistance.
Technical SEO | JerDoggMckoy