How to get a large number of urls out of Google's Index when there are no pages to noindex tag?
-
Hi,
I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down.
The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. GoogleBot would crawl a link that looks like it's to the client's same domain and wind up on Amazon or somewhere else with some affiiiate code. GoogleBot would then grab the original link on the clients domain and index it... even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag.
I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler.
Any ideas? Thanks! Best... Michael
P.S.,
All 150K urls seem to share the same url pattern... exmpledomain.com/item/... so /item/ is common to all of them, if that helps.
-
If no pages which support web coding actually exist for the URLs you want to remove from Google's index, I'd probably use the HTTP header instead. Maybe use the X-Robots directives:
- https://yoast.com/x-robots-tag-play/
- https://www.searchenginejournal.com/x-robots-tag-simple-alternate-robots-txt-meta-tag/67138/
Even if you have no page with web-code, you can always have a HTTP Header. A HTTP header simply allows a client and / or server to fire additional information through 'requests' (post / get etc).
This is the only thing I can think of which would really help. Some people might suggest robots.txt wildcards, but robots.txt handles crawling and not indexation (so those answers wouldn't really be worth anything to you)
The other thing you could do (maybe combine this with the X-Robots stuff) is to get all of those URLs to serve status code 410 (gone) instead of 404 (temporarily gone, but coming back)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How long will old pages stay in Google's cache index. We have a new site that is two months old but we are seeing old pages even though we used 301 redirects.
Two months ago we launched a new website (same domain) and implemented 301 re-directs for all of the pages. Two months later we are still seeing old pages in Google's cache index. So how long should I tell the client this should take for them all to be removed in search?
Intermediate & Advanced SEO | | Liamis0 -
Canonical URL's For Two Domains
We have two websites, one we use for Google PPC (website 1) and one (website 2) we use for everything else. The reason is we are in an industry that Google Adwords doesn't like, so we built a whole other website that removes the product descriptions as Google Adwords doesn't approve of many of them (nutrition). Right now we have that Google Adwords approved website (website 1) no-index/no-follow because we didn't want to run into potential duplicate content issues in free search, but the issue is we can't submit it to Google Shopping...as they require it to be indexable. Do you think removing the no-index/no-follow from that website 1 and adding canonical URL's pointing to website 2 would resolve this issue (being able to submit it to Google Shopping) and not cause any problems with duplicate content? I was thinking of adding the canonical tag to all pages of website 1 and point it to website 2. Does that make sense? Do you think that would work?
Intermediate & Advanced SEO | | vetofunk0 -
Why isn't Google caching our pages?
Hi everyone, We have a new content marketing site that allows anyone to publish checklists. Each checklist is being indexed by Google, but Google is not storing a cached version of any of our checklists. Here's an example:
Intermediate & Advanced SEO | | Checkli
https://www.checkli.com/checklists/ggc/a-girls-guide-to-a-weekend-in-south-beach Missing Cache:
https://webcache.googleusercontent.com/search?q=cache:DfFNPP6WBhsJ:https://www.checkli.com/checklists/ggc/a-girls-guide-to-a-weekend-in-south-beach+&cd=1&hl=en&ct=clnk&gl=us Why is this happening? How do we fix it? Is this hurting the SEO of our website.0 -
Sitemap Indexed Pages, Google Glitch or Problem With Site?
Hello, I have a quick question about our Sitemap Web Pages Indexed status in Google Search Console. Because of the drastic drop I can't tell if this is a glitch or a serious issue. When you look at the attached image you can see that under Sitemaps Web Pages Indexed has dropped suddenly on 3/12/17 from 6029 to 540. Our Index status shows 7K+ indexed. Other than product updates/additions and homepage layout updates there have been no significant changes to this website. If it helps we are operating on the Volusion platform. Thanks for your help! -Ryan rou1zMs
Intermediate & Advanced SEO | | rrhansen0 -
Dfferent url of some other site is shown by Google in cace copy of our site's page
Hi, When i check cached copy of url of my site http://goo.gl/BZw2Zz , the url in cache copy shown by Google is of some other third party site. Why is Google showing third party url in our site's cached url. Did any of you guys faced any such issue. Regards,
Intermediate & Advanced SEO | | vivekrathore0 -
Two Pages with the Same Name Different URL's
I was hoping someone could give me some insight into a perplexing issue that I am having with my website. I run an 20K product ecommerce website and I am finding it necessary to have two pages for my content: 1 for content category pages about wigets one for shop pages for wigets 1st page would be .com/shop/wiget/ 2nd page would be .com/content/wiget/ The 1st page would be a catalogue of all the products with filters for the customer to narrow down wigets. So ultimately the URL for the shop page could look like this when the customer filters down... .com/shop/wiget/color/shape/ The second page would be content all about the Wigets. This would be types of wigets colors of wigets, how wigets are used, links to articles about wigets etc. Here are my questions. 1. Is it bad to have two pages about wigets on the site, one for shopping and one for information. The issue here is when I combine my content wiget with my shop wiget page, no one buys anything. But I want to be able to provide Google the best experience for rankings. What is the best approach for Google and the customer? 2. Should I rel canonical all of my .com/shop/wiget/ + .com/wiget/color/ etc. pages to the .com/content/wiget/ page? Or, Should I be canonicalizing all of my .com/shop/wiget/color/etc pages to .com/shop/wiget/ page? 3. Ranking issues. As it is right now, I rank #1 for wiget color. This page on my site would be .com/shop/wiget/color/ . If I rel canonicalize all of my pages to .com/content/wiget/ I am going to loose my rankings because all of my shop/wiget/xxx/xxx/ pages will then point to .com/content/wiget/ page. I am just finding with these massive ecommerce sites that there is WAY to much potential for duplicate content, not enough room to allow Google the ability to rank long tail phrases all the while making it completely complicated to offer people pages that promote buying. As I said before, when I combine my content + shop pages together into one page, my sales hit the floor (like 0 - 15 dollars a day), when i just make a shop page my sales are like (1k+ a day). But I have noticed that ever since Penguin and Panda my rankings have fallen from #1 across the board to #15 and lower for a lot of my phrase with the exception of the one mentioned above. This is why I want to make an information page about wigets and a shop page for people to buy wigets. Please advise if you would. Thanks so much for any insight you can give me!
Intermediate & Advanced SEO | | SKP0 -
Pro's & Con's of registering your customers?
I know that making a user register will drop the the conversion rate. However, there are a lot of sites that still stand by making users register before you can purchase. I was wondering if they know something that I don't that would outweigh the loss of those conversions. What exactly are the Pro's & Con's of making your customers register before being able to purchase an item?
Intermediate & Advanced SEO | | HCGDiet0 -
We are changing ?page= dynamic url's to /page/ static urls. Will this hurt the progress we have made with the pages using dynamic addresses?
Question about changing url from dynamic to static to improve SEO but concern about hurting progress made so far.
Intermediate & Advanced SEO | | h3counsel0