Not found errors (404) due to being hacked
-
Hi Moz Gurus,
Our website was hacked a few months ago. Since then we have taken various measures, the latest being to redesign the website altogether and move it off the WordPress platform. So far all is going well, except that 404 (not found) errors keep coming up in Google Webmaster Tools. The URLs are spam pages that were created by the virus. These spam pages have been indexed by Google, and now we are struggling to get rid of them.
Is there any way we can deal with these 404 spam page links? Is marking all of them as fixed in Webmaster Tools (Search Console > Crawl Errors) helpful in any way? Can this have a negative impact on SEO?
Looking forward to your answers.
Many thanks.
-
I have a new client and just discovered hundreds of links to ghost pages in Open Site Explorer. The anchor text is stuff like "Criminal Background Checks Las Vegas" or "Find Missing Persons".
I am not the webmaster. What advice should I give him?
Julie
-
Green Stone,
Thank you for your reply. At the moment we are manually removing the links with the "Remove outdated content" tool, while also compiling a list of spammy links that may point to the spam pages created by the virus.
Thank you.
-
Monica,
Sounds like you guys have taken the necessary steps to clean up the website and prevent it from happening again. 404 spam links are a pain and can often take some time to be removed from Google's index altogether.
- A way to speed up the process is to have these pages return a 410 (Gone) status instead of a 404. This tells Google the page has been removed permanently, so it tends to fall out of the index more quickly than with a regular 404.
- In the meantime, if the number of 404 errors isn't overwhelming, you could try the "Remove URLs" tool in Search Console for these pages, which will temporarily remove them from the index altogether (emphasis on temporarily).
- Marking them as fixed wouldn't be helpful, as the errors still exist and would return to your Search Console not long after. (It certainly wouldn't harm your SEO; it just wouldn't be very helpful in this specific instance.)
Hope that helps.
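For anyone wondering what the 410 suggestion above might look like in practice, here is a rough sketch in Python, not a drop-in fix. It assumes the rebuilt site can be fronted by a small WSGI app; the URL patterns in `SPAM_PATTERNS` and the page list in `LIVE_PAGES` are made up for illustration and would need to be replaced with patterns taken from your own crawl-error export.

```python
import re

# Hypothetical patterns for the spam URLs the hack created; replace them
# with patterns derived from your own Search Console crawl-error list.
SPAM_PATTERNS = [
    re.compile(r"^/cheap-[a-z0-9-]+\.html$"),
    re.compile(r"^/wp-content/.*\.php$"),
]

# Hypothetical set of real pages on the rebuilt site.
LIVE_PAGES = {"/", "/about", "/contact"}

REASONS = {200: "OK", 404: "Not Found", 410: "Gone"}


def status_for(path):
    """Return the HTTP status code to send for a request path."""
    if path in LIVE_PAGES:
        return 200
    if any(pattern.search(path) for pattern in SPAM_PATTERNS):
        return 410  # Gone: the page was removed on purpose
    return 404  # Not Found: an ordinary missing page


def app(environ, start_response):
    """Minimal WSGI app that answers with the status chosen above."""
    code = status_for(environ.get("PATH_INFO", "/"))
    status = f"{code} {REASONS[code]}"
    body = status.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally you could serve `app` with the standard library's `wsgiref.simple_server.make_server` and request one of the spam URLs. Most sites would do the same thing with a rewrite rule in the web server config instead; the point is only that known spam paths get a 410 while everything else missing stays a 404.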