URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
-
A website was hacked (URL injection), but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com search in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index, but new URLs keep appearing every day. We looked at the cache dates on these URLs and they vary, but none are recent and most are from a month ago, when the initial hack occurred.
My question is: should we continue to check the index every day and keep submitting these URLs to be removed manually? Or, since they all lead to a 404 page, will Google eventually remove these spammy URLs from the index automatically?
Thanks in advance Moz community for your feedback.
-
If the URLs follow any particular pattern, then you can use an .htaccess rule to return a 410, 403, or 404 status code to Google (I suggest 410). They will soon drop out of the index.
I don't know the exact .htaccess syntax off the top of my head, but it will be something like the following.
If they all come from the same folder:
    RedirectMatch 410 ^/folder/.*$
If they have a common character string right after the forward slash (such as xyz):
    RedirectMatch 410 ^/xyz.*$
If they share a common character string footprint anywhere in the path (such as xyz), then something like this (now I'm guessing):
    RedirectMatch 410 ^/.*xyz.*$
This would be a pretty easy fix if all of those spammy URLs have any common characters after the forward slash or they all originate from a certain folder.
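Before putting rules like these live, it may be worth checking that a pattern catches the spammy paths without also catching real pages. Here is a minimal Python sketch of that check. The pattern names and sample paths are made up for illustration, and Python's regex engine is only an approximation of the one Apache uses, although simple patterns like these should behave the same way.

```python
import re

# Hypothetical patterns mirroring the RedirectMatch examples above.
PATTERNS = {
    "folder": r"^/folder/.*$",    # everything under /folder/
    "prefix": r"^/xyz.*$",        # paths that start with /xyz
    "anywhere": r"^/.*xyz.*$",    # paths containing xyz anywhere
}


def matching_rules(path):
    """Return the names of the patterns that match a URL path."""
    return [name for name, pattern in PATTERNS.items() if re.search(pattern, path)]


# Made-up sample paths: three spam-like and one legitimate page.
samples = ["/folder/cheap-pills.html", "/xyz-handbags", "/blog/abcxyz123", "/about-us"]
for sample in samples:
    print(sample, matching_rules(sample))
```

If a legitimate page such as /about-us shows up with any matching rule, the pattern is too broad and would send a 410 for a real page.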
-
You might get a little quicker removal if you send them with a 410 status code. That will let Google know that the page is gone for good. http://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes
-
No problem at all! These new URLs do not actually exist on the website. Since we cleaned up the malicious code, all of these URLs lead to our 404 page.
-
Sorry, I misunderstood the problem. Do those new URLs actually exist on your site or just in search?
-
Hi 94501,
Thanks for taking the time to respond. Just to be clear, we are not submitting multiple removals for the same URL and I don't think Google WMT even allows you to do that. Completely new URLs are appearing each day after removing the older ones.
My main concern is having spammy URLs indexed and associated with my website and the negative effects it can have from an SEO perspective.
-
Hi Pete,
It sounds like you've done what you can. I wouldn't submit multiple removals for the same URL.
I assume the spammy URLs are out of your sitemap, that you're not still being hacked, and that you have figured out how it happened and taken steps to fix it.
Google will eventually figure it out. I'd try to move on to new stuff.
Best... Mike