Tens of duplicate homepages indexed and blocked later: How to remove from Google cache?
-
Hi community,
Due to some WP plugin issue, many homepages indexed in Google with anonymous URLs. We blocked them later. Still they are in SERP. I wonder whether these are causing some trouble to our website, especially as our exact homepages indexed. How to remove these pages from Google cache? Is that the right approach?
Thanks
-
Hi Nigel,
Thanks for the suggestion. I'm going to use "Remove URLs" tool from GSC. They have been created due to a bug in the Yoast SEO plugin. Very unfortunate and we paid for no mistake from our end.
Removing from SERP means removing from Google index also? Or Google will still consider them and just stops showing us? My intention is: Anyway we blocked them, but whether they will cause some distraction to our ranking efforts being there in results being cached.
Thanks
-
Thanks!
A agree - I have just done a similar clean up by:
1. Don't let them be created
2. Redirect all previous versions!One site I just worked on had 8 versions of the home page! lol
http
https
/index.php
/index.php/A mess!
We stopped them all being created and 301'd all versions just in case they were indexed anywhere or linked externally.
Cheers
-
It is assuredly true that, just like in any number of fields (medicine) - in SEO, prevention is better than cleanup based methodology. If your website doesn't take its medicine, you get problems like this one
I think your advice here was really good
-
Good solid advice
They can be created in any number of ways but it's normally simple enough to specify the preferred URL on the server then move any variations in htaccess, such as those with www (if the none www is preferred), those with a trailing slash at the end etc.
The self canonical on all will sort out any other duplicates.
As for getting rid of them - the search console way is the quickest. If they don't exist after that then the won't be reindexed unless they are linked from somewhere else. In such cases, they will 301 from htaccess so it shouldn't be a problem.
if you 410 you will lose any benefit from those links going to the pages and it's a bad experience for a visitor. Always 301 do not 410 if it is a version.
410s are fine for old pages you never want to see in the index again but not for a home page version.
Regards
Nigel
-
It's likely that you don't have access to edit the coding on these weird plugin URLs. As such, normal techniques like using a Meta no-index tag in the HTML may be non-viable.
You could use the HTTP header (server level stuff) to help you out. I'd advise adding two strong directives to the afflicted URLs through the HTTP header so that Google gets the message:
-
Use the X-Robots deployment of the no-index directive on the affected URLs, at the HTTP header (not the HTML) level. That linked pages tells you about the normal HTML implementation, but also about the X-Robots implementation which is the one you need (scroll down a bit)
-
Serve status code 410 (gone) on the affected URLs
That should prompt Google to de-index those pages. Once they are de-indexed, you can use robots.txt to block Google from crawling such URLs in the future (which will stop the problem happening again!)
It's important to de-index the URLs before you do any robots.txt stuff. If Google can't crawl the affected URLs, it can't find the info (in the HTTP header) to know that it should de-index those pages
Once Google is blocked from both indexing and crawling these pages, they should begin to stop caching them too
Hope that helps
-
-
+1 for "Make sure that they are not created in the first place" haha
-
Hi again vtmoz!
1. Make sure that they are not created in the first place
2. Make sure that they are not in the sitemap
3. Go to search console and remove any you do not want - it will say temporary removal but they will not come back if they are not in the structure or the sitemap.More:
https://support.google.com/webmasters/answer/1663419?hl=en
Note: Always self canonicalize the home page to stop versions with UTM codes (created by Facebook, Twitter etc) appearing in SERPS
Regards
Nigel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
404s in Google Search Console and javascript
The end of April, we made the switch from http to https and I was prepared for a surge in crawl errors while Google sorted out our site. However, I wasn't prepared for the surge in impossibly incorrect URLs and partial URLs that I've seen since then. I have learned that as Googlebot grows up, he'she's now attempting to read more javascript and will occasionally try to parse out and "read" a URL in a string of javascript code where no URL is actually present. So, I've "marked as fixed" hundreds of bits like /TRo39,
Algorithm Updates | | LizMicik
category/cig
etc., etc.... But they are also returning hundreds of otherwise correct URLs with a .html extension when our CMS system generates URLs with a .uts extension like this: https://www.thompsoncigar.com/thumbnail/CIGARS/90-RATED-CIGARS/FULL-CIGARS/9012/c/9007/pc/8335.html
when it should be:
https://www.thompsoncigar.com/thumbnail/CIGARS/90-RATED-CIGARS/FULL-CIGARS/9012/c/9007/pc/8335.uts Worst of all, when I look at them in GSC and check the "linked from" tab it shows they are linked from themselves, so I can't backtrack and find a common source of the error. Is anyone else experiencing this? Got any suggestions on how to stop it from happening in the future? Last month it was 50 URLs, this month 150, so I can't keep creating redirects and hoping it goes away. Thanks for any and all suggestions!
Liz Micik0 -
Does Google's Information Box Seem Shady to you?
So I just had this thought, Google returns information boxes for certain search terms. Recently I noticed one word searches usually return a definition. For example if you type in the word "occur" or "happenstance" or "frustration" you get a definition information box. But what I didn't see is a reference to where they are getting or have gotten this information. Now it could very well be they built their own database of definitions, and if they did great, but here is where it seems a bit grey to me... Did Google hire a team of people to populate the database, or did they just write an algorithm to comb a dictionary website and stick the information in their database. The latter seems more likely. If that is what happened then Google basically stole the information from somebody to claim it as their own, which makes me worry, if you coin a term, lets say "lumpy stumpy" and it goes mainstream which would entail a lot of marketing, and luck. Would Google just add it to its database and forgo giving you credit for its creation? From a user perspective I love these information boxes, but just like Google expects us webmasters to do, they should be giving credit where credit is due... don't you think? I'm not plugged in to the happenings of Google so maybe they bought the rights, or maybe they bought or hold a majority of shares in some definition type company (they have the cash) but it just struck me as odd not seeing a reference to a site. What are your thoughts?
Algorithm Updates | | donford1 -
Missing Keywords in Google SERP
We just got this attached image from one of our partners - has anyone seen Google putting 'missing' keywords in SERPs like this before? They said that it was not a plugin or anything and this is a screenshot of their organic search results. google%20screenshot_zpsgmwaf9e2.png
Algorithm Updates | | ReunionMarketing0 -
Google Reconsideration - To do or not to do?
We haven't been manually penalized by Google yet but we have had our fair share of things needing to be fixed; malware, bad links, lack/if no content, lack-luster UX, and issues with sitemaps & redirects. Should we still submit a reconsideration even though we haven't had a direct penalty? Does hurt us to send it?
Algorithm Updates | | GoAbroadKP0 -
Google places - are you still registering your companies?
Is Google plus taking over from Google Places? I have our company and website registered on Google places and Google Plus. Now we have a new website ( same company different product) Should I register a new listing on Google places? And what about Google Plus?
Algorithm Updates | | Realtor1010 -
How to Recover Ranking from Latest Google Panda and EMD Update?
We have 7 websites with the exact match domains. Our Website has been affected by Google recently updated (Panda and EMD Update). we don't want to do any changes in our existing domains. What should we do? So, What is the exact solution for that. Help Us out! Website Names: http://www.hamptoninndenverairport.com http://www.wingatehotelcolumbia.com http://www.fairfieldinnhotelaurora.com http://www.hamptoninnhotelcrestwood.com http://www.laquintahoteldavenport.com http://www.redroofinnhotelcedarrapids.com http://www.hamptoninnhotelcarolstream.com Thanks in advance.
Algorithm Updates | | CommercePundit1 -
Google SERPS problem - "block all results from this domain - click here".
Anyone know what can be done about this when it happens to one of your own domains? On the Google SERPS page, underneath the Title, next to the Description, Google has added "Block all results from this domain?". I understand that this is a new "feature", aimed at allowing users to filter out results from low quality, pornograhphic or offensive sites. But the site in question is none of the above - any ideas how to tackle? Couldn't find anything yet by searching.
Algorithm Updates | | Understudy0 -
Did google change their algorithm over the past week?
I did some home page optimization with the seo moz on page key word optimization tool and we are now back in the top three in the past week (after dropping to page 3 a month or so ago). It seems that google has gone back to combining google places with organic searches. Has anyone else noticed this type of change? I did read some posts about panda 2.2, which seems to explain some of these findings. I am wondering if things are in flux or they may be more stable this way? Thanks for the insights.
Algorithm Updates | | fertilityhealth0