Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
After hack and remediation, thousands of URL's still appearing as 'Valid' in google search console. How to remedy?
-
I'm working on a site that was hacked in March 2019 and in the process, nearly 900,000 spam links were generated and indexed. After remediation of the hack in April 2019, the spammy URLs began dropping out of the index until last week, when Search Console showed around 8,000 as "Indexed, not submitted in sitemap" but listed as "Valid" in the coverage report and many of them are still hack-related URLs that are listed as being indexed in March 2019, despite the fact that clicking on them leads to a 404. As of this Saturday, the number jumped up to 18,000, but I have no way of finding out using the search console reports why the jump happened or what are the new URLs that were added, the only sort mechanism is last crawled and they don't show up there.
How long can I expect it to take for these remaining urls to also be removed from the index? Is there any way to expedite the process? I've submitted a 'new' sitemap several times, which (so far) has not helped.
Is there any way to see inside the new GSC view why/how the number of valid URLs in the indexed doubled over one weekend?
-
Google Search Console actually has a URL removal tool built into it, unfortunately it's not really scaleable (mostly it's one at a time submissions) and in addition to that the effect of using the tool is only temporary (the URLs come back again)
In your case I reckon' that changing the status code of the 'gone' URLs from 404 ("temporarily not found, but will be returning soon") to 410 ("GONE!") might be a good idea. Google might digest that better as it's a harder indexation directive and a very strong crawl directive ("go away, don't come back!")
You could also serve the Meta no-index directive on those URLs. Obviously you're unlikely to have access to the HTML of non-existent pages, but did you know Meta no-index can also be fired through x-robots, through the HTTP header? So it's not impossible
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
(Ctrl+F for "X-Robots-Tag HTTP header")
Another option is this form to let Google know outdated content is gone, has been removed, and isn't coming back:
https://www.google.com/webmasters/tools/removals
... but again, URLs one at a time is going to be mega-slow. It does work pretty well though (at least in my experience)
In any eventuality I think you're looking at, a week or two for Google to start noticing in a way that you can see visually - and then maybe a month or two until it rights itself (caveat: it's different for all sites and URLs, it's variable)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
Intermediate & Advanced SEO | | Gabriele_Layoutweb0 -
How to maximize CTR from Google image search?
I'm getting good, solid growth in my Google SERPs and Google search traffic now, but I do notice that 70% of my high ranking search results are images and the CTR on those is only 3-4%. All my images are illustrative and highly relevant to my travel blog, but I guess that hardly matters unless they get CTR so people see them in context. Has anyone seen or done any good research on what makes people click through on Google Image Search results? What are the key factors? How do you optimize for click-through? Is it better to watermark your images or overlay label them to increase likelihood of click-through? Thanks, Tony FYI the travel blog in question is www.asiantraveltips.com and a relevant Google search where I rank highly is "songkran 2016 phuket".
Intermediate & Advanced SEO | | Gavin.Atkinson0 -
URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
A website was hacked (URL injection) but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index but new URLs keep appearing every day. We looked at the cache dates on these URLs and they are vary in dates but none are recent and most are from a month ago when the initial hack occurred. My question is...should we continue to check the index every day and keep submitting these URLs to be removed manually? Or since they all lead to a 404 page will Google eventually remove these spammy URLs from the index automatically? Thanks in advance Moz community for your feedback.
Intermediate & Advanced SEO | | peteboyd0 -
Why is /home used in this company's home URL?
Just working with a company that has chosen a home URL with /home latched on - very strange indeed - has anybody else comes across this kind of homepage URL "decision" in the past? I can't see why on earth anybody would do this! Perhaps simply a logic-defying decision?
Intermediate & Advanced SEO | | McTaggart0 -
After reading of Google's so called "over-optimization" penalty, is there a penalty for changing title tags too frequently?
In other words, does title tag change frequency hurt SEO ? After changing my title tags, I have noticed a steep decline in impressions, but an increase in CTR and rankings. I'd like to once again change the title tags to try and regain impressions. Is there any penalty for changing title tags too often? From SEO forums online, there seems to be a bit of confusion on this subject...
Intermediate & Advanced SEO | | Felix_LLC0 -
My website (non-adult) is not appearing in Google search results when i have safe search settings on. How can i fix this?
Hi, I have this issue where my website does not appear in Google search results when i have the safe search settings on. If i turn the safe search settings off, my site appears no problem. I'm guessing Google is categorizing my website as adult, which it definitely is not. Has anyone had this issue before? Or does anyone know how to resolve this issue? Any help would be much appreciated. Thanks
Intermediate & Advanced SEO | | CupidTeam0 -
Include Cross Domain Canonical URL's in Sitemap - Yes or No?
I have several sites that have cross domain canonical tags setup on similar pages. I am unsure if these pages that are canonicalized to a different domain should be included in the sitemap. My first thought is no, because I should only include pages in the sitemap that I want indexed. On the other hand, if I include ALL pages on my site in the sitemap, once Google gets to a page that has a cross domain canonical tag, I'm assuming it will just note that and determine if the canonicalized page is the better version. I have yet to see any errors in GWT about this. I have seen errors where I included a 301 redirect in my sitemap file. I suspect its ok, but to me, it seems that Google would rather not find these URL's in a sitemap, have to crawl them time and time again to determine if they are the best page, even though I'm indicating that this page has a similar page that I'd rather have indexed.
Intermediate & Advanced SEO | | WEB-IRS0 -
Disallowed Pages Still Showing Up in Google Index. What do we do?
We recently disallowed a wide variety of pages for www.udemy.com which we do not want google indexing (e.g., /tags or /lectures). Basically we don't want to spread our link juice around to all these pages that are never going to rank. We want to keep it focused on our core pages which are for our courses. We've added them as disallows in robots.txt, but after 2-3 weeks google is still showing them in it's index. When we lookup "site: udemy.com", for example, Google currently shows ~650,000 pages indexed... when really it should only be showing ~5,000 pages indexed. As another example, if you search for "site:udemy.com/tag", google shows 129,000 results. We've definitely added "/tag" into our robots.txt properly, so this should not be happening... Google showed be showing 0 results. Any ideas re: how we get Google to pay attention and re-index our site properly?
Intermediate & Advanced SEO | | udemy0