Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
-
A website was hacked (URL injection) but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index but new URLs keep appearing every day. We looked at the cache dates on these URLs and they are vary in dates but none are recent and most are from a month ago when the initial hack occurred.
My question is...should we continue to check the index every day and keep submitting these URLs to be removed manually? Or since they all lead to a 404 page will Google eventually remove these spammy URLs from the index automatically?
Thanks in advance Moz community for your feedback.
-
If the urls follow any particular pattern then you can use a htaccess redirect and return the header code 410 / 403 / 404 to Google. (I suggest 410) They will soon drop out of the index.
I don't know the exact .htaccess syntax off the top of my head but it will be something like this:
If they all come from the same folder then it would look something like this:
RedirectMatch 410 ^/folder/.*$If they have a common character string after the forward slash (such as xyz) then it would look something like this:
RedirectMatch 410 ^/xyz.*$If they have any common character string footprints at all (such as xyz) then it would look something like this (now I'm guessing):
RedirectMatch 410 ^/()xyz.$This would be a pretty easy fix if all of those spammy urls have any common characters after the forward slash or they all originate from a certain folder.
-
You might get a little quicker removal if you send them with a 410 status code. That will let Google know that the page is gone for good. http://searchenginewatch.com/sew/how-to/2340728/matt-cutts-on-how-google-handles-404-410-status-codes
-
No problem at all! These new URLs do not actually exist on the website. Since we cleaned up the malicious code all of these URLs redirect to our 404 page.
-
Sorry to misunderstand the problem. Do those new urls actually exist on your site or just in search?
-
Hi 94501,
Thanks for taking the time to respond. Just to be clear, we are not submitting multiple removals for the same URL and I don't think Google WMT even allows you to do that. Completely new URLs are appearing each day after removing the older ones.
My main concern is having spammy URLs indexed and associated with my website and the negative effects it can have from an SEO perspective.
-
Hi Pete,
It sounds like you've done what you can. I wouldn't submit multiple removals for the same url.
I assume it's out of your site map and you're not still being hacked and have figured out how it happened and taken steps to fix it.
Google will eventually figure it out. I'd try to move on to new stuff.
Best... Mike
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing Of Pages As HTTPS vs HTTP
We recently updated our site to be mobile optimized. As part of the update, we had also planned on adding SSL security to the site. However, we use an iframe on a lot of our site pages from a third party vendor for real estate listings and that iframe was not SSL friendly and the vendor does not have that solution yet. So, those iframes weren't displaying the content. As a result, we had to shift gears and go back to just being http and not the new https that we were hoping for. However, google seems to have indexed a lot of our pages as https and gives a security error to any visitors. The new site was launched about a week ago and there was code in the htaccess file that was pushing to www and https. I have fixed the htaccess file to no longer have https. My questions is will google "reindex" the site once it recognizes the new htaccess commands in the next couple weeks?
Intermediate & Advanced SEO | | vikasnwu1 -
Forwarded vanity domains, suddenly resolving to 404 with appended URL's ending in random 5 characters
We have several vanity domains that forward to various pages on our primary domain.
Intermediate & Advanced SEO | | SS.Digital
e.g. www.vanity.com (301)--> www.mydomain.com/sub-page (200) These forwards have been in place for months or even years and have worked fine. As of yesterday, we have seen the following problem. We have made no changes in the forwarding settings. Now, inconsistently, they sometimes resolve and sometimes they do not. When we load the vanity URL with Chrome Dev Tools (Network Pane) open, it shows the following redirect chains, where xxxxx represents a random 5 character string of lower and upper case letters. (e.g. VGuTD) EXAMPLE:
www.vanity.com (302, Found) -->
www.vanity.com/xxxxx (302, Found) -->
www.vanity.com/xxxxx (302, Found) -->
www.vanity.com/xxxxx/xxxxx (302, Found) -->
www.mydomain.com/sub-page/xxxxx (404, Not Found) This is just one example, the amount of redirects, vary wildly. Sometimes there is only 1 redirect, sometimes there are as many as 5. Sometimes the request will ultimately resolve on the correct mydomain.com/sub-page, but usually it does not (as in the example above). We have cross-checked across every browser, device, private/non-private, cookies cleared, on and off of our network etc... This leads us to believe that it is not at the device or host level. Our Registrar is Godaddy. They have not encountered this issue before, and have no idea what this 5 character string is from. I tend to believe them because per our analytics, we have determined that this problem only started yesterday. Our primary question is, has anybody else encountered this problem either in the last couple days, or at any time in the past? We have come up with a solution that works to alleviate the problem, but to implement it across hundreds of vanity domains will take us an inordinate amount of time. Really hoping to fix the cause of the problem instead of just treating the symptom.0 -
Does google ignore ? in url?
Hi Guys, Have a site which ends ?v=6cc98ba2045f for all its URLs. Example: https://domain.com/products/cashmere/robes/?v=6cc98ba2045f Just wondering does Google ignore what is after the ?. Also any ideas what that is? Cheers.
Intermediate & Advanced SEO | | CarolynSC0 -
Should I Add Location to ALL of My Client's URLs?
Hi Mozzers, My first Moz post! Yay! I'm excited to join the squad 🙂 My client is a full service entertainment company serving the Washington DC Metro area (DC, MD & VA) and offers a host of services for those wishing to throw events/parties. Think DJs for weddings, cool photo booths, ballroom lighting etc. I'm wondering what the right URL structure should be. I've noticed that some of our competitors do put DC area keywords in their URLs, but with the moves of SERPs to focus a lot more on quality over keyword density, I'm wondering if we should focus on location based keywords in traditional areas on page (e.g. title tags, headers, metas, content etc) instead of having keywords in the URLs alongside the traditional areas I just mentioned. So, on every product related page should we do something like: example.com/weddings/planners-washington-dc-md-va
Intermediate & Advanced SEO | | pdrama231
example.com/weddings/djs-washington-dc-md-va
example.com/weddings/ballroom-lighting-washington-dc-md-va OR example.com/weddings/planners
example.com/weddings/djs
example.com/weddings/ballroom-lighting In both cases, we'd put the necessary location based keywords in the proper places on-page. If we follow the location-in-URL tactic, we'd use DC area terms in all subsequent product page URLs as well. Essentially, every page outside of the home page would have a location in it. Thoughts? Thank you!!0 -
Why Google isn't indexing my images?
Hello, on my fairly new website Worthminer.com I am noticing that Google is not indexing images from my sitemap. Already 560 images submitted and Google indexed only 3 of them. Altough there is more images indexed they are not indexing any new images, and I have no idea why. Posts, categories and other urls are indexing just fine, but images not. I am using Wordpress and for sitemaps Wordpress SEO by yoast. Am I missing something here? Why Google won't index my images? Thanks, I appreciate any help, David xv1GtwK.jpg
Intermediate & Advanced SEO | | Worthminer1 -
Pages are Indexed but not Cached by Google. Why?
Here's an example: I get a 404 error for this: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all But a search for qjamba restaurant coupons gives a clear result as does this: site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all What is going on? How can this page be indexed but not in the Google cache? I should make clear that the page is not showing up with any kind of error in webmaster tools, and Google has been crawling pages just fine. This particular page was fetched by Google yesterday with no problems, and even crawled again twice today by Google Yet, no cache.
Intermediate & Advanced SEO | | friendoffood2 -
Google Not Indexing XML Sitemap Images
Hi Mozzers, We are having an issue with our XML sitemap images not being indexed. The site has over 39,000 pages and 17,500 images submitted in GWT. If you take a look at the attached screenshot, 'GWT Images - Not Indexed', you can see that the majority of the pages are being indexed - but none of the images are. The first thing you should know about the images is that they are hosted on a content delivery network (CDN), rather than on the site itself. However, Google advice suggests hosting on a CDN is fine - see second screenshot, 'Google CDN Advice'. That advice says to either (i) ensure the hosting site is verified in GWT or (ii) submit in robots.txt. As we can't verify the hosting site in GWT, we had opted to submit via robots.txt. There are 3 sitemap indexes: 1) http://www.greenplantswap.co.uk/sitemap_index.xml, 2) http://www.greenplantswap.co.uk/sitemap/plant_genera/listings.xml and 3) http://www.greenplantswap.co.uk/sitemap/plant_genera/plants.xml. Each sitemap index is split up into often hundreds or thousands of smaller XML sitemaps. This is necessary due to the size of the site and how we have decided to pull URLs in. Essentially, if we did it another way, it may have involved some of the sitemaps being massive and thus taking upwards of a minute to load. To give you an idea of what is being submitted to Google in one of the sitemaps, please see view-source:http://www.greenplantswap.co.uk/sitemap/plant_genera/4/listings.xml?page=1. Originally, the images were SSL, so we decided to reverted to non-SSL URLs as that was an easy change. But over a week later, that seems to have had no impact. The image URLs are ugly... but should this prevent them from being indexed? The strange thing is that a very small number of images have been indexed - see http://goo.gl/P8GMn. I don't know if this is an anomaly or whether it suggests no issue with how the images have been set up - thus, there may be another issue. Sorry for the long message but I would be extremely grateful for any insight into this. I have tried to offer as much information as I can, however please do let me know if this is not enough. Thank you for taking the time to read and help. Regards, Mark Oz6HzKO rYD3ICZ
Intermediate & Advanced SEO | | edlondon0 -
How to prevent 404's from a job board ?
I have a new client with a job listing board on their site. I am getting a bunch of 404 errors as they delete the filled jobs. Question: Should we leave the the jobs pages up for extra content and entry points to the site and put a notice like this job has been filled, please search our other job listings ? Or should I no index - no follow these pages ? Or any other suggestions - it is an employment agency site. Overall what would be the best practice going forward - we are looking at probably 20 jobs / pages per month.
Intermediate & Advanced SEO | | jlane90