Should I use noindex or robots to remove pages from the Google index?
-
I have a Magento site and just realized we have about 800 review pages indexed. The /review directory is disallowed in robots.txt but the pages are still indexed.
From my understanding robots means it will not crawl the pages BUT if the pages are still indexed if they are linked from somewhere else.
I can add the noindex tag to the review pages but they wont be crawled.
https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
Should I remove the robots.txt and add the noindex? Or just add the noindex to what I already have?
-
Thanks, Logan!
-
Rhys,
Your web dev team is confused. You cannot de-index by simply disallowing them in your robots.txt file. Google will still index anything they find (that doesn't have a noindex tag) from a link, this is the reason you often see search results that say "A description for this result is not available because of this site's robots.txt" as the description.
Here's a quote from Google regarding the subject: "You should not use robots.txt as a means to hide your web pages from Google Search results." - https://support.google.com/webmasters/answer/6062608?hl=en
-
Hi all,
Sorry to jump in here but I've been told the opposite by our web dev team. We're removing indexed 404s at the moment, and our web dev team said we simply need to add robots.txt to the pages and they'll be de-indexed. If this incorrect? I thought I'd need to add a noindex tag but was argued down...
Cheers,
Rhys
-
Hi there. Good question and one that comes up a lot.
You need to do the following:
- Put the noindex on those pages
- Remove the block in robots.txt
- Monitor these pages falling out of the index
- Once they are all out, then put the block back in place
You both want them to a) drop out and b) then not be crawled, so the above will take care of that for you.
Hope that helps!
John
-
Thanks.
That is what I figured just wanted to double check.
-
Hi Tyler,
Yes, remove the robots.txt disallow for that section and add a noindex tag. Noindex is the only sure-fire way to de-index URLs, but the crawlers need to be allowed to crawl those pages to see the tag.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site moved. Unable to index page : Noindex detected in robots meta tag?!
Hope someone can shed some light on this: We moved our smaller site (into the main site ( different domains) . The smaller site that was moved ( https://www.bluegreenrentals.com)
Intermediate & Advanced SEO | | bgvsiteadmin
Directory where the site was moved (https://www.bluegreenvacations.com/rentals) Each page from the old site was 301 redirected to the appropriate page under .com/rentals. But we are seeing a significant drop in rankings and traffic., as I am unable to request a change of address in Google search console (a separate issue that I can elaborate on). Lots of (301 redirect) new destination pages are not indexed. When Inspected, I got a message : Indexing allowed? No: 'index' detected in 'robots' meta tagAll pages are set as Index/follow and there are no restrictions in robots.txtHere is an example URL :https://www.bluegreenvacations.com/rentals/resorts/colorado/innsbruck-aspen/Can someone take a look and share an opinion on this issue?Thank you!0 -
Google not Indexing images on CDN.
My URL is: http://bit.ly/1H2TArH We have set up a CDN on our own domain: http://bit.ly/292GkZC We have an image sitemap: http://bit.ly/29ca5s3 The image sitemap uses the CDN URLs. We verified the CDN subdomain in GWT. The robots.txt does not restrict any of the photos: http://bit.ly/29eNSXv. We used to have a disallow to /thumb/ which had a 301 redirect to our CDN but we removed both the disallow in the robots.txt as well as the 301. Yet, GWT still reports none of our images on the CDN are indexed. The above screenshot is from the GWT of our main domain.The GWT from the CDN subdomain just shows 0. We did not submit a sitemap to the verified subdomain property because we already have a sitemap submitted to the property on the main domain name. While making a search of images indexed from our CDN, nothing comes up: http://bit.ly/293ZbC1While checking the GWT of the CDN subdomain, I have been getting crawling errors, mainly 500 level errors. Not that many in comparison to the number of images and traffic that we get on our website. Google is crawling, but it seems like it just doesn't index the pictures!? Can anyone help? I have followed all the information that I was able to find on the web but yet, our images on the CDN still can't seem to get indexed.
Intermediate & Advanced SEO | | alphonseha0 -
Canonicle & rel=NOINDEX used on the same page?
I have a real estate company: www.company.com with approximately 400 agents. When an agent gets hired we allow them to pick a URL which we then register and manage. For example: www.AGENT1.com We then take this agent domain and 301 redirect it to a subdomain of our main site. For example
Intermediate & Advanced SEO | | EasyStreet
Agent1.com 301’s to agent1.company.com We have each page on the agent subdomain canonicled back to the corresponding page on www.company.com
For example: agent1.company.com canonicles to www.company.com What happened is that google indexed many URLS on the subdomains, and it seemed like Google ignored the canonical in many cases. Although these URLS were being crawled and indexed by google, I never noticed any of them rank in the results. My theory is that Google crawled the subdomain first, indexed the page, and then later Google crawled the main URL. At that point in time, the two pages actually looked quite different from one another so Google did not recognize/honor the canonical. For example:
Agent1.company.com/category1 gets crawled on day 1
Company.com/category1 gets crawled 5 days later The content (recently listed properties for sale) on these category pages changes every day. If Google crawled the pages (both the subdomain and the main domain) on the same day, the content on the subdomain and the main domain would look identical. If the urls are crawled on different days, the content will not match. We had some major issues (duplicate content and site speed) on our www.company.com site that needed immediate attention. We knew we had an issue with the agent subdomains and decided to block the crawling of the subdomains in the robot.txt file until we got the main site “fixed”. We have seen a small decrease in organic traffic from google to our main site since blocking the crawling of the subdomains. Whereas with Bing our traffic has dropped almost 80%. After a couple months, we have now got our main site mostly “fixed” and I want to figure out how to handle the subdomains in order to regain the lost organic traffic. My theory is that these subdomains have a some link juice that is basically being wasted with the implementation of the robots.txt file on the subdomains. Here is my question
If we put a ROBOTS rel=NOINDEX on all pages of the subdomains and leave the canonical (to the corresponding page of the company site) in place on each of those pages, will link juice flow to the canonical version? Basically I want the link juice from the subdomains to pass to our main site but do not want the pages to be competing for a spot in the search results with our main site. Another thought I had was to place the NOIndex tag only on the category pages (the ones that seem to change every day) and leave it off the product (property detail pages, pages that rarely ever change). Thank you in advance for any insight.0 -
Using Canonical on home page
Our home page has the canonical tag pointing to itself (something from wordpress i understand). Is there any positive or negative affect that anyone is aware of from having pages canonical'ed to themselves?
Intermediate & Advanced SEO | | halloranc0 -
Howcome Google is indexing one day 2500 pages and the other day only 150 then 2000 again ect?
This is about an big affiliate website of an customer of us, running with datafeeds... Bad things about datafeeds: Duplicate Content (product descriptions) Verrryyyy Much (thin) product pages (sometimes better to noindex, i know, but this customer doesn't want to do that)
Intermediate & Advanced SEO | | Zanox0 -
Google Indexing Feedburner Links???
I just noticed that for lots of the articles on my website, there are two results in Google's index. For instance: http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html and http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+thewebhostinghero+(TheWebHostingHero.com) Now my Feedburner feed is set to "noindex" and it's always been that way. The canonical tag on the webpage is set to: rel='canonical' href='http://www.thewebhostinghero.com/articles/tools-for-creating-wordpress-plugins.html' /> The robots tag is set to: name="robots" content="index,follow,noodp" /> I found out that there are scrapper sites that are linking to my content using the Feedburner link. So should the robots tag be set to "noindex" when the requested URL is different from the canonical URL? If so, is there an easy way to do this in Wordpress?
Intermediate & Advanced SEO | | sbrault740 -
How to remove an entire site from Google?
Hi people, I have a site with around 2.000 urls indexed in google, and 10 subdomains indexed too, which I want to remove entirely, to set up a new web. Which is the best way to do it? Regards!
Intermediate & Advanced SEO | | SeoExpertos0 -
Volusion store product pages will not index
Hello, I have moved over to Volusion and was wondering if you guys know of any SEO practices that are Volusion specific. i have been working on this site now for 2 months and my impressions and rankings have dropped substantially My 301 redirects where in place before I flipped over and my keywords / titles/ tags etc.. are in place. However i am still not making any progress in the engines. I have noticed that my products are not being indexed per Webmaster tools. I have heard that volusion has something set up to where you must purchase their SEO package in order to rank. I am really at my wits end and currently I thinking about taking a loss and reverting back to my old Shoppe Pro site. Any help would be very appreciated
Intermediate & Advanced SEO | | kerry0217
.0