Google Indexing Pages with Made Up URL
-
Hi all,
Google is indexing a URL on my site that doesn't exist and has never existed in the past. The URL is completely made up. Does anyone know why this is happening and, more importantly, how to get rid of it?
Thanks
-
Hi Brian
Dan (Moz Associate) here. Bernadette and Excal pretty much nailed it. Just wanted to add that OSE, Search Console and other link tools may not always display every single link that exists out there on the web (especially OSE, which is the most 'filtered' index: it shows mostly quality/relevant links and filters out most spam, etc.).
Regardless, the best course of action is indeed to be sure your broken pages return a proper 404 status code, and Google will handle the rest.
-
Agree with Bernadette that this is most likely a hacker/spammer taking advantage of a configuration issue with your website. If you're using a CMS (WordPress/Joomla/Drupal etc.), make sure that it has been properly configured (or have your website developer do it).
I had a similar instance with a website I inherited a few years back. A configuration issue on the CMS allowed individuals to set themselves up as users, and a blogging extension had an out-of-the-box configuration issue that allowed anyone to create blog posts. Whilst the blogging tool was set to require admin approval before an article went live and visible on the site, once the article was created it was still somehow able to be indexed by Google, which created one hell of a mess.
Fixing the issue in the CMS/blogging extension was quite simple, but the cleanup took a long while. Over a period of months I had to disavow a continuing stream of junk links, and I spent a lot of time writing to other webmasters advising them of the issue with their sites so they could remove the links. Nearly 3 years down the line I still get a few of these popping up from time to time, as there are obviously other sites that have not plugged the gap and updated their blogging tool, and as such contain this massive list of dodgy links from link spammers.
If you are using a CMS, I would recommend that you, or your webmaster, check the list of authorised users and block any that you do not recognise or did not create. Then immediately take a look at your CMS security settings to ensure that all new users require an admin to approve/activate them before they can do anything.
Unfortunately with this stuff, once the exploits are discovered they are quickly disseminated across the internet and every link spammer (and his dog) tends to jump on board, so the quicker you can plug the leak and commence remediation the better. Good luck.
-
Brian, that's definitely an issue. If it's not delivering a 404 error when you go to a non-existent page on your site, that's the problem. I could theoretically go to yourdomain.com/aslksjdltkjlkjalskdj.html, make a link to it, and Google would index the page.
Check with your web developer to see how you can make sure that your "page not found" pages deliver a 404 status code in the server header.
There are lots of ways that Google will discover new URLs (even someone browsing with Google Chrome might allow Google to discover a new URL and then crawl it). So, you'll want to make sure that you have this fixed on your site.
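To illustrate what "deliver a 404 in the server header" means in practice, here is a minimal sketch using only Python's standard library. It is not how any particular CMS does it; the `PAGES` table, the `resolve` helper and the `SiteHandler` class are all made-up names for illustration. The point is simply that the friendly "page not found" HTML and the status code are set separately, and the status code is the part Google looks at.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical site content: only these paths really exist.
PAGES = {
    "/": "<h1>Home</h1>",
    "/about.html": "<h1>About</h1>",
}

NOT_FOUND_BODY = "<h1>Page not found</h1>"


def resolve(path: str) -> tuple[int, str]:
    """Map a request path to (status code, HTML body)."""
    clean_path = urlsplit(path).path  # drop any query string
    if clean_path in PAGES:
        return 200, PAGES[clean_path]
    # Unknown URL: still show a friendly page, but with a real 404 status.
    return 404, NOT_FOUND_BODY


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = resolve(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)  # this is the status line crawlers read
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main() -> None:
    # Assumed local address and port, for testing only.
    HTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()
```

Calling `main()` and requesting a made-up path such as `/aslksjdltkjlkjalskdj.html` should show the "Page not found" body with a 404 status rather than a 200. On a real site the equivalent setting lives in the CMS or web server configuration, which is what your developer would need to check.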
-
Hi Bernadette,
Thanks for your response. I checked OSE and Search Console and can't find any links pointing to the URL. I did the server header check and it's delivering a 200 OK response.
-
Brian, when this happens, there is typically one reason: somewhere there is a link with that URL in it. What we've seen before is that oftentimes those links are created by hackers or spammers that then try to create content on your site with that URL. For example, when a site is hacked, they will create a page on your site and then link to it.
Without the URL (or the page name without your domain name), it's tough for me to see what might be causing this. But, there has to be a link somewhere to it in order for Google to want to index it.
What I would do is use a server header check tool (such as http://www.rexswain.com/httpview.html) to see if the page has a "200 OK" server response or a 404 error. Google typically doesn't index pages that deliver 404 errors. It could be that the server is set up to deliver a "page not found" on your site but it comes up with a "200 OK" in the server header, so Google indexes the page.
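If you would rather script the header check than use an online tool, something along these lines should work. This is a rough sketch using only Python's standard library; the function names (`fetch_status`, `random_missing_url`, `describe`) and the `https://www.example.com/` address are placeholders I made up, not part of any SEO tool. The idea is to request a URL that certainly does not exist and see what status code comes back.

```python
import urllib.error
import urllib.request
import uuid
from urllib.parse import urljoin


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for `url`.

    Note: urlopen follows redirects, so a missing page that redirects
    to the homepage will be reported as 200.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "status-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still the answer we want.
        return err.code


def random_missing_url(base_url: str) -> str:
    """Build a URL on the site that almost certainly does not exist."""
    return urljoin(base_url, f"/{uuid.uuid4().hex}.html")


def describe(status: int) -> str:
    """Interpret the status code returned for a non-existent page."""
    if status == 200:
        return "problem: missing pages return 200 OK, so they can be indexed"
    if status in (404, 410):
        return "ok: missing pages return a proper not-found status"
    return f"unexpected status {status}: worth a closer look"


def check_site(base_url: str) -> str:
    """Probe one made-up URL on the site and summarise the result."""
    return describe(fetch_status(random_missing_url(base_url)))
```

For example, `print(check_site("https://www.example.com/"))` with your own domain in place of the placeholder. A "problem" result means the server is answering made-up URLs with 200 OK, which matches the situation described above.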
Check your site to see if there is a link to the page. If the link exists, then fix it. Then, look at Majestic.com or Open Site Explorer to see if they show any links from other sites to the page. If those links exist, see if you can get rid of those links.