How can I get unimportant pages out of Google?
-
Hi Guys,
I have a (newbie) question. Until recently I didn't have my robots.txt written properly, so Google indexed around 1,900 pages of my site, but only 380 of them are real pages; the rest are all /tag/ or /comment/ pages from my blog. I have now set up the sitemap and the robots.txt properly, but how can I get the other pages out of Google? Is there a trick, or will it just take a little while for Google to drop those pages?
Thanks!
Ramon
-
If you want to remove an entire directory, you can exclude that directory in robots.txt, then go to Google Webmaster Tools and request a URL removal. You'll have an option to remove an entire directory there.
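For example, a minimal robots.txt sketch, assuming the unwanted pages all live under /tag/ and /comment/ (adjust the paths to match your blog's actual URL structure):

User-agent: *
Disallow: /tag/
Disallow: /comment/

With those rules in place, the removal request in Webmaster Tools can target each directory rather than individual URLs.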
-
No, sorry. What I meant is: if you mark the folder as disallowed in robots.txt, that alone will not remove the pages that are already indexed.
With the meta tag, on the other hand, when the spiders crawl a page again and see that it carries the noindex tag, they will drop it from the index.
So don't add the directory to robots.txt until the pages have actually been removed from the search engines.
First put the noindex tag on all the pages you want removed. Once they have been dropped (it can take anywhere from a week to a month), add the folders you don't want indexed to your robots.txt. After that, you don't need to worry about the tags any more.
I say this because if you add the block to robots.txt first, the search engines stop reading those pages, so they would never see the meta noindex tag. That's why you must first get the pages removed with the noindex tag and only then add the disallow to robots.txt.
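As a minimal sketch, the tag would go in the <head> of each /tag/ and /comment/ page (this assumes your blog templates let you add it to just those page types):

<meta name="robots" content="noindex">

Only after Google has dropped those URLs would you add the matching Disallow rules to robots.txt.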
Hope this has helped.
João Vargas
-
Thanks Vargas. If I choose noindex, I should remove those folders from the robots.txt, right?
I understood that if you have a noindex tag on the page and also a disallow in robots.txt, the SE will still index it. Is that true?
-
To remove the pages you want, you need to add this tag to them:
<meta name="robots" content="noindex">
If you want internal and external link relevance to keep passing through those pages, use this instead:
<meta name="robots" content="noindex, follow">
If you then block the folders in robots.txt, you only need the tag on the currently indexed URLs; the search engines won't index new ones.
Personally, I don't like using the Google URL remover, because if someday you want those folders indexed again you can't easily get them back; at least that's what happened to me.
The noindex tag works very well for removing unwanted content; within a month or so the pages will be gone.
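As a rough sketch of where that tag sits, here is an assumed example page (the markup around it is placeholder, not your actual template):

<!DOCTYPE html>
<html>
<head>
  <title>Tag archive (example)</title>
  <!-- keep the page out of the index but still let crawlers follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
<body>
  ...
</body>
</html>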
-
Yes. It's only a secondary aid, and not guaranteed, but it could help speed up the process of devaluing those pages in Google's internal systems. If Google sees those tags and cross-references them against the robots.txt file, it can help things along.
-
Thanks guys for your answers....
Alan, do you mean that I should place the tag below on all the pages that I want out of Google?
-
I agree with Alan's reply. Try canonical 1st. If you don't see any change, remove the URLs in GWT.
-
There's no bulk removal request form, so you'd need to submit every URL one at a time, and even then it's not guaranteed to work. You could consider getting a canonical tag on those specific pages that points to a different URL on your blog, such as an appropriate category page or the blog home page. That could help speed things up, but canonical tags themselves are only "hints" to Google.
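As a sketch, on a tag page such as /tag/example/ the canonical would sit in the <head> and point at whichever page you choose as the target (the domain and path here are placeholders):

<link rel="canonical" href="http://www.example.com/blog/">

It's only a hint, as noted above, so Google may or may not honour it.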
Ultimately it's a time and patience thing.
-
It will take time, but you can help it along by using the URL removal tool in Google Webmaster Tools: https://www.google.com/webmasters/tools/removals