Removing indexed pages

Jettynz

Hi all, this is my first post so be kind - I have a one page Wordpress site that has the Yoast plugin installed. Unfortunately, when I first submitted the site's XML sitemap to the Google Search Console, I didn't check the Yoast settings and it submitted some example files from a theme demo I was using. These got indexed, which is a pain, so now I am trying to remove them. Originally I did a bunch of 301's but that didn't remove them from (at least not after about a month) - so now I have set up 410's - These also seem to not be working and I am wondering if it is because I re-submitted the sitemap with only the index page on it (as it is just a single page site) could that have now stopped Google indexing the original pages to actually see the 410's?
Thanks in advance for any suggestions.

Jettynz

Thanks for all the responses!

At the moment I am serving the 410's using the .htaaccess file as I removed the actual pages a while ago. The pages don't show in most searches, however, two of them do show up in some instances under the sitelinks which is the main pain. I manually asked for them to be removed using 'remove urls' however that only last a couple of months and they are now back.

So I guess the best way is to recreate the pages and insert a noindex?

Thanks again for everyone time, it's much appreciated.

Joe.Robison

I agree with ViviCa1's methods, so go with that.

One thing I just wanted to bring up though, is that unless people are actually visiting those pages you don't want indexed, or it does some type of brand damage, then you don't really need to make it a priority.

Just because they're indexed doesn't mean they're showing up for any searches - and most likely they aren't - so people will realistically never see them. And if you only have a one-page site, you're not wasting much crawl budget on those.

I just bring this up since sometimes we (I'm guilty of it too) can get bogged down by small distractions in SEO that don't really help much, when we should be creating and producing new things!

"These also seem to not be working and I am wondering if it is because I re-submitted the sitemap with only the index page on it (as it is just a single page site) could that have now stopped Google indexing the original pages to actually see the 410's?"

There was a good related response from Google employee Susan Moskwa:

“The best way to stop Googlebot from crawling URLs that it has discovered in the past is to make those URLs (such as your old Sitemaps) 404. After seeing that a URL repeatedly 404s, we stop crawling it. And after we stop crawling a Sitemap, it should drop out of your "All Sitemaps" tab.”

A bit older, but shows how Google discovers URLs through the sitemap. Take a look at the rest of that thread as well.

ViviCa1

I'd suggest adding a noindex robots meta tag to the affected pages (see how to do this here: https://support.google.com/webmasters/answer/93710?hl=en) and until Google recrawls use the remove URLs tool (see how to use this here: https://support.google.com/webmasters/answer/1663419?hl=en).

If you use the noindex robots meta tag, don't disallow the pages through your robots.txt or Google won't even see the tag. Disallowing Google from crawling a page doesn't mean it won't be indexed (or removed from the index), it just means Google won't crawl the page.

seoman10

Couple of ideas spring to mind

Use the robots.txt file
Demote the site link in Google search console (see https://support.google.com/webmasters/answer/47334)

Example of robots.txt file...

Disallow: /the-link/you-dont/want-to-show.html
Disallow: /the-link/you-dont/want-to-show2.html

Don't include the domain just the link to the page, Plenty of tutorials out there worthwhile having a look at http://www.robotstxt.org

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Removing indexed pages

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

URLs dropping from index (Crawled, currently not indexed)

Use Internal Search pages as Landing Pages?

My beta site (beta.website.com) has been inadvertently indexed. Its cached pages are taking traffic away from our real website (website.com). Should I just "NO INDEX" the entire beta site and if so, what's the best way to do this? Please advise.

According to 1 of my PRO campaigns - I have 250+ pages with Duplicate Content - Could my empty 'tag' pages be to blame?

Top pages give " page not found"

Why this page doesn't get indexed?

Does page speed affect what pages are in the index?

What's the difference between a category page and a content page