Removing indexed pages
-
Hi all, this is my first post so be kind. I have a one-page WordPress site that has the Yoast plugin installed. Unfortunately, when I first submitted the site's XML sitemap to Google Search Console, I didn't check the Yoast settings and it submitted some example files from a theme demo I was using. These got indexed, which is a pain, so now I am trying to remove them. Originally I did a bunch of 301s, but that didn't remove them from the index (at least not after about a month), so now I have set up 410s. These also seem not to be working, and I am wondering if it is because I re-submitted the sitemap with only the index page on it (as it is just a single-page site). Could that have stopped Google crawling the original pages to actually see the 410s?
Thanks in advance for any suggestions.
-
Thanks for all the responses!
At the moment I am serving the 410s using the .htaccess file, as I removed the actual pages a while ago. The pages don't show in most searches; however, two of them do show up in some instances under the sitelinks, which is the main pain. I manually asked for them to be removed using 'remove urls', but that only lasts a couple of months and they are now back.
So I guess the best way is to recreate the pages and insert a noindex?
Thanks again for everyone's time, it's much appreciated.
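Before recreating the pages, it may be worth confirming what the removed URLs actually return. The following is only a minimal sketch in Python using the standard library; `fetch_status` and `not_gone` are hypothetical helper names, not part of any tool mentioned in this thread, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the final HTTP status code for url.

    Note: urlopen follows redirects, so a 301 that ends at a live page
    is reported as that page's status, not as 301.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is what we want.
        return error.code


def not_gone(statuses, expected=410):
    """Return, sorted, the URLs whose status differs from the expected one."""
    return sorted(url for url, code in statuses.items() if code != expected)
```

A possible use is to build a dictionary such as `{url: fetch_status(url) for url in urls}` from the list of demo-page URLs and pass it to `not_gone`; anything it returns is not being served as a 410.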
-
I agree with ViviCa1's methods, so go with that.
One thing I just wanted to bring up, though, is that unless people are actually visiting the pages you don't want indexed, or those pages do some type of brand damage, you don't really need to make this a priority.
Just because they're indexed doesn't mean they're showing up for any searches - and most likely they aren't - so people will realistically never see them. And if you only have a one-page site, you're not wasting much crawl budget on those.
I just bring this up since sometimes we (I'm guilty of it too) can get bogged down by small distractions in SEO that don't really help much, when we should be creating and producing new things!
"These also seem to not be working and I am wondering if it is because I re-submitted the sitemap with only the index page on it (as it is just a single page site) could that have now stopped Google indexing the original pages to actually see the 410's?"
There was a good related response from Google employee Susan Moskwa:
“The best way to stop Googlebot from crawling URLs that it has discovered in the past is to make those URLs (such as your old Sitemaps) 404. After seeing that a URL repeatedly 404s, we stop crawling it. And after we stop crawling a Sitemap, it should drop out of your "All Sitemaps" tab.”
A bit older, but shows how Google discovers URLs through the sitemap. Take a look at the rest of that thread as well.
-
I'd suggest adding a noindex robots meta tag to the affected pages (see how to do this here: https://support.google.com/webmasters/answer/93710?hl=en) and, until Google recrawls them, using the remove URLs tool (see how to use this here: https://support.google.com/webmasters/answer/1663419?hl=en).
If you use the noindex robots meta tag, don't disallow the pages through your robots.txt, or Google won't even see the tag. Disallowing a page in robots.txt doesn't keep it out of the index (or get it removed from the index); it only means Google won't crawl the page.
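To check that a recreated page really carries the tag, something like the sketch below could be run against the page's HTML. It is a rough illustration in Python using only the standard library; `RobotsMetaParser` and `has_noindex` are hypothetical names, and it only looks at meta tags named `robots` or `googlebot`, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

For example, `has_noindex('<meta name="robots" content="noindex, follow">')` evaluates to `True`, while a page without any robots meta tag gives `False`.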
-
Couple of ideas spring to mind
- Use the robots.txt file
- Demote the sitelink in Google Search Console (see https://support.google.com/webmasters/answer/47334)
Example of a robots.txt file:
User-agent: *
Disallow: /the-link/you-dont/want-to-show.html
Disallow: /the-link/you-dont/want-to-show2.html
Don't include the domain, just the path to the page. There are plenty of tutorials out there; http://www.robotstxt.org is worth a look.
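To see what rules like these do from a crawler's side, they can be fed to Python's standard `urllib.robotparser`. This is only a sketch: the `User-agent: *` line is an assumption added so the rules belong to a group, `blocked_paths` is a hypothetical helper, and Python's parser is not guaranteed to match Googlebot's behaviour in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Example rules; the User-agent line is assumed, the paths are the
# placeholder paths from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /the-link/you-dont/want-to-show.html
Disallow: /the-link/you-dont/want-to-show2.html
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given user agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    candidates = [
        "/",
        "/the-link/you-dont/want-to-show.html",
        "/the-link/you-dont/want-to-show2.html",
    ]
    for path in blocked_paths(ROBOTS_TXT, candidates):
        print("blocked:", path)
```

A path reported as blocked here is one the crawler is asked not to fetch at all, so any noindex tag or 410 on it would go unseen, which is the caveat raised in the reply above.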