Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Fake Links indexing in google
-
Hello everyone,
I have an interesting situation occurring here, and hoping maybe someone here has seen something of this nature or be able to offer some sort of advice.
So, we recently installed a wordpress to a subdomain for our business and have been blogging through it. We added the google webmaster tools meta tag and I've noticed an increase in 404 links. I brought this up to or server admin, and he verified that there were a lot of ip's pinging our server looking for these links that don't exist. We've combed through our server files and nothing seems to be compromised. Today, we noticed that when you do site:ourdomain.com into google the subdomain with wordpress shows hundreds of these fake links, that when you visit them, return a 404 page.
Just curious if anyone has seen anything like this, what it may be, how we can stop it, could it negatively impact us in anyway? Should we even worry about it? Here's the link to the google results.
https://www.google.com/search?q=site%3Amshowells.com&oq=site%3A&aqs=chrome.0.69i59j69i57j69i58.1905j0j1&sourceid=chrome&es_sm=91&ie=UTF-8 (odd links show up on pages 2-3+)
-
Thank you everyone for your responses! The link you sent of the cached pages LynnP was also helpful. As soon as my co-worker who administers the server gets in I'm going to mention to him that we check the subfolders for anything fishy. I know for a fact he looked for subfolders that were suspicious but I'm not sure he may have thought to check the existing folders for sneaky things. Most passwords have been changed... but I will double check.
Again, thanks everyone for your help, very useful!
-
My 2 cents: This does look like a wp hack - been having a nightmare with a recent Pharma hack like JV mentions and honestly I still cannot figure out how exactly they got into the site but suspect through an outdated plugin.
A couple of things to keep in mind are to check your htaccess file for weird lines and have a look for non standard wp files in various folders (things like cache.php or ms-writer.php if I recall right). These files were not showing recent change dates however so it was not as simple as just ftping in and seeing which files had been recently changed (still no idea how they pulled that off). It can also be that all these pages are being spun out of a handful of php files (or the database!) so not 100% the case that you would actually see the subfolders (although in some cases you might). Also seen dev versions of wp on the same server that have not been kept so up to date be used to get into the main production version (pretty sure they were indexed through links sent via gmail emails, thanks google!).
You can check the google cache for any of these pages to see what they looked like and when they were last cached for example: http://webcache.googleusercontent.com/search?q=cache:Y0U-2Yyk3y4J:news.mshowells.com/CI/Ugg-Hazelwood-1437.shtml+
Most of them show late August cache dates so that should help narrow the timeframe. Interesting to note that all pages have a bunch of links at the bottom, some to your site some to other (probably infected) sites. All of the links are now 404s so maybe the hack got taken down by the originator (no idea why just a thought since its a bit odd that all of the links on the external sites also seem to be 404ing now). Needless to say, change all wpadmin, ftp etc passwords to be safe!
-
Hmm...never seen this exactly before - but a few years back we discovered for a client that their reality tv series show (Deadliest Catch) member site had been severely infected by Canadian Pharma phony sites....
Seems the hacker had 'broken' in via a MS update that was not done on their hosting platform site - and it took the tv company almost 4 months to disavow, rebuild and then index and begin to rank again as I remember....i.e. this was NOT a WP issue but a hosting server hack...
But with 20+ pages of Uggs and Nude Men rolling Christians (love that one, eh!) infections, you need to get that totally fixed asap so I'd start with querying the hosting vendor logs...
How comes to mind...if you can not determine where the hack came from - you could kill the subdomain after saving all your articles - recreate it say as "info.mshowells.com" or "advice.mshowells.com" or "counsel.mshowells.com" and reload in the same artices....have had to do that too for another client....
-
Yeah, only 2 of us, server admin guy. We're talking right now and the site is on a brand new VPS that has never been compromised, no strange folder structure, brand new install of Wordpress.. you can see lots of server errors in the error log on the server but the files NEVER existed, and neither of us removed the files. I, personally, do not even have access to the VPS. Only he does, and he is well aware what he's doing and most definitely would have noticed an odd set of folders and would have remembered deleting them. Almost as soon as we made the wordpress install live is when the 404 crawl errors showed up in google, and on the server. We both have seen many instances of wordpress sites being compromised and know what to look for and how to clean it up. This is why this is baffling. Because we're not exactly sure how or in what way they would benefit from this. My server admin thinks these hackers are somehow tricking google somehow... we just both have never seen this and not sure what to expect... very bizarre!
-
That's pretty strange. There isn't another web person there who might have cleaned things up without telling you? Or maybe your server company?
I don't see how these URLs could be indexed if they never existed, so at some point, someone created those pages and they were around long enough to get indexed. Are there any weird spikes in crawl rates or search queries since the launch of the subdomain?
I've seen this kind of hack before. The hacker just drops some folders full of HTML files into the roots. That's why all those links have a two characters sub directory. That was the folder the HTML files were in before someone likely just saw those folders in the root and deleted them. Maybe they didn't realize what they were doing and thought they were just doing the house cleaning?
Doing a "site:mshowells.com/ci/" or "site:mshowells.com/sp/" can show you what I'm talking about.
-
Well, the interesting thing is the links are only showing up on the subdomain news.mshowells.com - which has only existed on the server for maybe 2 - 3 months? Also, when we first noticed them, we checked the server and wordpress and there were no files and nothing was out of order or anything fishy. Everything was and is just fine. We haven't done any cleanup of any sort. And Wordpress & plugins have been kept up to date.
That's why it's weird because at no point were there hacked files or content or anything... so it's a little confusing...
-
Looks like a hack. A hacker somehow got in at some point, dropped a bunch of Ugg Boot affiliate marketing pages and left. Not sure why they are 404ing unless someone already discovered these when they happened and cleaned them up. That could've happened months and months ago.
The 404s shouldn't effect your SEO, but the hack has potential to if it hasn't been cleaned up properly. Do you see a spike in search queries if you look back over the last year or two? That may indicate when the hack occurred and was cleaned up. It's important to know how the hack was cleaned up, so you can ensure that the vulnerabilities have been resolved. If they haven't been, your site is still open to additional attacks, and spam like that can hurt your SEO.
For Wordpress, it's important to keep not only Wordpress itself up to date, but also your plugins (and only use well established plugins, and do a little research on them to make sure people aren't screaming about hacking issues). Hackers search for vulnerabilities in all sorts of places.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can I get a photo album indexed by Google?
We have a lot of photos on our website. Unfortunately most of them don't seem to be indexed by Google. We run a party website. One of the things we do, is take pictures at events and put them on the site. An event page with a photo album, can have anywhere between 100 and 750 photo's. For each foto's there is a thumbnail on the page. The thumbnails are lazy loaded by showing a placeholder and loading the picture right before it comes onscreen. There is no pagination of infinite scrolling. Thumbnails don't have an alt text. Each thumbnail links to a picture page. This page only shows the base HTML structure (menu, etc), the image and a close button. The image has a src attribute with full size image, a srcset with several sizes for responsive design and an alt text. There is no real textual content on an image page. (Note that when a user clicks on the thumbnail, the large image is loaded using JavaScript and we mimic the page change. I think it doesn't matter, but am unsure.) I'd like that full size images should be indexed by Google and found with Google image search. Thumbnails should not be indexed (or ignored). Unfortunately most pictures aren't found or their thumbnail is shown. Moz is giving telling me that all the picture pages are duplicate content (19,521 issues), as they are all the same with the exception of the image. The page title isn't the same but similar for all images of an album. Example: On the "A day at the park" event page, we have 136 pictures. A site search on "a day at the park" foto, only reveals two photo's of the albums. 3QolbbI.png QTQVxqY.jpg mwEG90S.jpg
Technical SEO | | jasny0 -
Removed Subdomain Sites Still in Google Index
Hey guys, I've got kind of a strange situation going on and I can't seem to find it addressed anywhere. I have a site that at one point had several development sites set up at subdomains. Those sites have since launched on their own domains, but the subdomain sites are still showing up in the Google index. However, if you look at the cached version of pages on these non-existent subdomains, it lists the NEW url, not the dev one in the little blurb that says "This is Google's cached version of www.correcturl.com." Clearly Google recognizes that the content resides at the new location, so how come the old pages are still in the index? Attempting to visit one of them gives a "Server Not Found" error, so they are definitely gone. This is happening to a couple of sites, one that was launched over a year ago so it doesn't appear to be a "wait and see" solution. Any suggestions would be a huge help. Thanks!!
Technical SEO | | SarahLK0 -
How To Cleanup the Google Index After a Website Has Been HACKED
We have a client whose website was hacked, and some troll created thousands of viagra pages, which were all indexed by Google. See the screenshot for an example. The site has been cleaned up completely, but I wanted to know if anyone can weigh in on how we can cleanup the Google index. Are there extra steps we should take? So far we have gone into webmaster tools and submitted a new site map. ^802D799E5372F02797BE19290D8987F3E248DCA6656F8D9BF6^pimgpsh_fullsize_distr.png
Technical SEO | | yoursearchteam0 -
Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?
We accidentally introduced Google to our incomplete site. The end result: thousands of pages indexed which return nothing but a "Sorry, no results" page. I know there are many ways to go about this, but the sheer number of pages makes it frustrating. Ideally, in the interim, I'd love to 404 the offending pages and allow Google to recrawl them, realize they're dead, and begin removing them from the index. Unfortunately, we've removed the initial internal links that lead to this premature indexation from our site. So my question is, will Google revisit these pages based on their own records (as in, this page is indexed, let's go check it out again!), or will they only revisit them by following along a current site structure? We are signed up with WMT if that helps.
Technical SEO | | kirmeliux0 -
Blocked URL parameters can still be crawled and indexed by google?
Hy guys, I have two questions and one might be a dumb question but there it goes. I just want to be sure that I understand: IF I tell webmaster tools to ignore an URL Parameter, will google still index and rank my url? IS it ok if I don't append in the url structure the brand filter?, will I still rank for that brand? Thanks, PS: ok 3 questions :)...
Technical SEO | | catalinmoraru0 -
Unnecessary pages getting indexed in Google for my blog
I have a blog dapazze.com and I am suffering from a problem for a long time. I found out that Google have indexed hundreds of replytocom links and images attachment pages for my blog. I had to remove these pages manually using the URL removal tool. I had used "Disallow: ?replytocom" in my robots.txt, but Google disobeyed it. After that, I removed the parameter from my blog completely using the SEO by Yoast plugin. But now I see that Google has again started indexing these links even after they are not present in my blog (I use #comment). Google have also indexed many of my admin and plugin pages, whereas they are disallowed in my robots.txt file. Have a look at my robots.txt file here: http://dapazze.com/robots.txt Please help me out to solve this problem permanently?
Technical SEO | | rahulchowdhury0 -
Is Google caching date same as crawling/indexing date?
If a site is cached on say 9 oct 2012 doesn't that also mean that Google crawled it on same date ? And indexed it on same date?
Technical SEO | | Personnel_Concept0 -
Can Google read onClick links?
Can Google read and pass link juice in a link like this? <a <span="">href</a><a <span="">="#Link123" onClick="window.open('http://www.mycompany.com/example','Link123')">src="../../img/example.gif"/></a> Thanks!
Technical SEO | | jorgediaz0