Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
-
A website was hacked (URL injection) but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index but new URLs keep appearing every day. We looked at the cache dates on these URLs and they are vary in dates but none are recent and most are from a month ago when the initial hack occurred.
My question is...should we continue to check the index every day and keep submitting these URLs to be removed manually? Or since they all lead to a 404 page will Google eventually remove these spammy URLs from the index automatically?
Thanks in advance Moz community for your feedback.
-
If the urls follow any particular pattern then you can use a htaccess redirect and return the header code 410 / 403 / 404 to Google. (I suggest 410) They will soon drop out of the index.
I don't know the exact .htaccess syntax off the top of my head but it will be something like this:
If they all come from the same folder then it would look something like this:
RedirectMatch 410 ^/folder/.*$If they have a common character string after the forward slash (such as xyz) then it would look something like this:
RedirectMatch 410 ^/xyz.*$If they have any common character string footprints at all (such as xyz) then it would look something like this (now I'm guessing):
RedirectMatch 410 ^/()xyz.$This would be a pretty easy fix if all of those spammy urls have any common characters after the forward slash or they all originate from a certain folder.
-
You might get a little quicker removal if you send them with a 410 status code. That will let Google know that the page is gone for good. http://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes
-
No problem at all! These new URLs do not actually exist on the website. Since we cleaned up the malicious code all of these URLs redirect to our 404 page.
-
Sorry to misunderstand the problem. Do those new urls actually exist on your site or just in search?
-
Hi 94501,
Thanks for taking the time to respond. Just to be clear, we are not submitting multiple removals for the same URL and I don't think Google WMT even allows you to do that. Completely new URLs are appearing each day after removing the older ones.
My main concern is having spammy URLs indexed and associated with my website and the negative effects it can have from an SEO perspective.
-
Hi Pete,
It sounds like you've done what you can. I wouldn't submit multiple removals for the same url.
I assume it's out of your site map and you're not still being hacked and have figured out how it happened and taken steps to fix it.
Google will eventually figure it out. I'd try to move on to new stuff.
Best... Mike
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
After hack and remediation, thousands of URL's still appearing as 'Valid' in google search console. How to remedy?
I'm working on a site that was hacked in March 2019 and in the process, nearly 900,000 spam links were generated and indexed. After remediation of the hack in April 2019, the spammy URLs began dropping out of the index until last week, when Search Console showed around 8,000 as "Indexed, not submitted in sitemap" but listed as "Valid" in the coverage report and many of them are still hack-related URLs that are listed as being indexed in March 2019, despite the fact that clicking on them leads to a 404. As of this Saturday, the number jumped up to 18,000, but I have no way of finding out using the search console reports why the jump happened or what are the new URLs that were added, the only sort mechanism is last crawled and they don't show up there. How long can I expect it to take for these remaining urls to also be removed from the index? Is there any way to expedite the process? I've submitted a 'new' sitemap several times, which (so far) has not helped. Is there any way to see inside the new GSC view why/how the number of valid URLs in the indexed doubled over one weekend?
Intermediate & Advanced SEO | | rickyporco0 -
Password Protected Page(s) Indexed
Hi, I am wondering if my website can get a penalty if some password protected pages are showing up when I search on google: site:www.example.com/sub-group/pass-word-protected-page That shows that my password protected page was indexed either before or after adding the password protection. I've seen people suggest no indexing the page. Is that the best method to take care of this? What if we are planning on pushing the page live later on? All of these pages have no title tag, meta description, image alt text, etc. Should I add them for each page? I am wondering what is the best step, especially if we are planning on pushing the page(s) live. Thanks for any help!
Intermediate & Advanced SEO | | aua0 -
How to NOT appear in Google results in other countries?
I have ecommerce sites the only serve US and Canada. Is there a way to prevent a site from appearing in the Google results in foreign countries? The reason I ask is that we also have a lot of informational pages that folks in other countries are visiting, then leaving right after reading. This is making our overall Bounce Rate very high (64%). When we segment the GA data to look at just our US visitors, then the Bounce Rate drops a lot. (to 48%) Thanks!
Intermediate & Advanced SEO | | GregB1230 -
Our login pages are being indexed by Google - How do you remove them?
Each of our login pages show up under different subdomains of our website. Currently these are accessible by Google which is a huge competitive advantage for our competitors looking for our client list. We've done a few things to try to rectify the problem: - No index/archive to each login page Robot.txt to all subdomains to block search engines gone into webmaster tools and added the subdomain of one of our bigger clients then requested to remove it from Google (This would be great to do for every subdomain but we have a LOT of clients and it would require tons of backend work to make this happen.) Other than the last option, is there something we can do that will remove subdomains from being viewed from search engines? We know the robots.txt are working since the message on search results say: "A description for this result is not available because of this site's robots.txt – learn more." But we'd like the whole link to disappear.. Any suggestions?
Intermediate & Advanced SEO | | desmond.liang1 -
Google News URL Structure
Hi there folks I am looking for some guidance on Google News URLs. We are restructuring the site. A main traffic driver will be the traffic we get from Google News. Most large publishers use: www.site.com/news/12345/this-is-the-title/ Others use www.example.com/news/celebrity/12345/this-is-the-title/ etc. www.example.com/news/celebrity-news/12345/this-is-the-title/ www.example.com/celebrity-news/12345/this-is-the-title/ (Celebrity is a channel on Google News so should we try and follow that format?) www.example.com/news/celebrity-news/this-is-the-title/12345/ www.example.com/news/celebrity-news/this-is-the-title-12345/ (unique ID no at the end and part of the title URL) www.example.com/news/celebrity-news/celebrity-name/this-is-the-title-12345/ Others include the date. So as you can see there are so many combinations and there doesnt seem to be any unity across news sites for this format. Have you any advice on how to structure these URLs? Particularly if we want to been seen as an authority on the following topics: fashion, hair, beauty, and celebrity news - in particular "celebrity name" So should the celebrity news section be www.example.com/news/celebrity-news/celebrity-name/this-is-the-title-12345/ or what? This is for a completely new site build. Thanks Barry
Intermediate & Advanced SEO | | Deepti_C0 -
How to prevent 404's from a job board ?
I have a new client with a job listing board on their site. I am getting a bunch of 404 errors as they delete the filled jobs. Question: Should we leave the the jobs pages up for extra content and entry points to the site and put a notice like this job has been filled, please search our other job listings ? Or should I no index - no follow these pages ? Or any other suggestions - it is an employment agency site. Overall what would be the best practice going forward - we are looking at probably 20 jobs / pages per month.
Intermediate & Advanced SEO | | jlane90 -
Does Google index url with hashtags?
We are setting up some Jquery tabs in a page that will produce the same url with hashtags. For example: index.php#aboutus, index.php#ourguarantee, etc. We don't want that content to be crawled as we'd like to prevent duplicate content. Does Google normally crawl such urls or does it just ignore them? Thanks in advance.
Intermediate & Advanced SEO | | seoppc20120 -
How to check a website's architecture?
Hello everyone, I am an SEO analyst - a good one - but I am weak in technical aspects. I do not know any programming and only a little HTML. I know this is a major weakness for an SEO so my first request to you all is to guide me how to learn HTML and some basic PHP programming. Secondly... about the topic of this particular question - I know that a website should have a flat architecture... but I do not know how to find out if a website's architecture is flat or not, good or bad. Please help me out on this... I would be obliged. Eagerly awaiting your responses, BEst Regards, Talha
Intermediate & Advanced SEO | | MTalhaImtiaz0