Google Indexing Pages with Made Up URL
-
Hi all,
Google is indexing a URL on my site that doesn't exist, and never existed in the past. The URL is completely made up. Anyone know why this is happening and more importantly how to get rid of it.
Thanks
-
Hi Brian
Dan (Moz Associate) here. Bernadette and Excal pretty much nailed it. Just wanted to add that OSE, Search Console and other links tools may not always display every single link that exists out there on the web (especially OSE - OSE is the most 'filtered' index, showing mostly quality/relevant links and filtering out the most spam etc).
Regardless, the best course of action is indeed to be sure your broken pages return a proper 404 status code, and Google will handle the rest
-
Agree with Bernadette that this is most likely a hacker / spammer taking advantage of a configuration issue with your website. If you're using a CMS (Wordpress/Joomla/Drupal etc.) make sure that it has been properly configured (or have your website developer do it).
I had a similar instance with a website I inherited a few years back where there was a configuration issue on the CMS that enabled individuals to set themselves up as users and a blogging extension, which had an out of the box configuration issue enabling anyone to create blog posts. Whilst the blogging tool was set to require admin approval to make the article live and visible on the site, once the article was created, it was still somehow able to be indexed by Google which created one hell of a mess.
Fixing the issue in the CMS/Blogging extension was quite simple but the cleanup took a long while and over a period of months I had to disavow a continuing stream of junk links and spent a lot of time writing to other webmasters advising them of the issue with their site so they could remove. Nearly 3 years down the line I still get a few of these pop up from time to time, as there are obviously other sites that have not plugged the gap and updated their blogging tool and as such contain this massive list of dodgy links from link spammers.
If you are using a CMS I would recommend that you, or your webmaster, check the list of authorised users and, if there are any that you do not recognise or you did not create then block them; and immediately take a look at your CMS security settings to ensure that all new users require Admins to approve/activate them before they can do anything.
Unfortunately with this stuff, once the exploits are discovered it is quickly disseminated across the internet and every link spammer (and his dog) tend to jump on-board, so the quicker you can plug the leak and commence remediation the better. Good luck
-
Brian, that's definitely an issue. If it's not delivering a 404 error when you go to a non-existent page on your site, that's the problem. I could theoretically go to yourdomain.com/aslksjdltkjlkjalskdj.html, make a link to it, and Google would index the page.
Check with your web developer to see how you can make sure that 404 error pages (page not found) delivers a 404 error in the server header.
There are lots of ways that Google will discover new URLs (even someone browsing with Google Chrome might allow Google to discover a new URL and then crawl it). So, you'll want to make sure that you have this fixed on your site.
-
Hi Bernadette,
Thanks for your response. I checked OSE and Search Console and can't find any links pointing to the URL. I did the server header check and it's delivering a 200 OK response.
-
Brian, when this happens, there is typically one reason: somewhere there is a link with that URL in it. What we've seen before is that oftentimes those links are created by hackers or spammers that then try to create content on your site with that URL. For example, when a site is hacked, they will create a page on your site and then link to it.
Without the URL (or the page name without your domain name), it's tough for me to see what might be causing this. But, there has to be a link somewhere to it in order for Google to want to index it.
What I would do is use a server header check tool (such as http://www.rexswain.com/httpview.html) to see if the page has a "200 OK" server response or a 404 error. Google typically doesn't index pages that deliver 404 errors. It could be that the server is set up to deliver a "page not found" on your site but it comes up with a "200 OK" in the server header, so Google indexes the page.
Check your site to see if there is a link to the page. If the link exists, then fix it. Then, look at Majestic.com or Open Site Explorer to see if they show any links from other sites to the page. If those links exist, see if you can get rid of those links.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Over 40+ pages have been removed from the indexed and this page has been selected as the google preferred canonical.
Over 40+ pages have been removed from the indexed and this page has been selected as the google preferred canonical. https://studyplaces.com/about-us/ The pages affected by this include: https://studyplaces.com/50-best-college-party-songs-of-all-time-and-why-we-love-them/ https://studyplaces.com/15-best-minors-for-business-majors/ As you can see the content on these pages is totally unrelated to the content on the about-us page. Any ideas why this is happening and how to resolve.
Technical SEO | | pnoddy0 -
Over 500 thin URLs indexed from dynamically created pages (for lightboxes)
I have a client who has a resources section. This section is primarily devoted to definitions of terms in the industry. These definitions appear in colored boxes that, when you click on them, turn into a lightbox with their own unique URL. Example URL: /resources/?resource=dlna The information for these lightboxes is pulled from a standard page: /resources/dlna. Both are indexed, resulting in over 500 indexed pages that are either a simple lightbox or a full page with very minimal content. My question is this: Should they be de-indexed? Another option I'm knocking around is working with the client to create Skyscraper pages, but this is obviously a massive undertaking given how many they have. Would appreciate your thoughts. Thanks.
Technical SEO | | Alces0 -
A few pages deindexed from Google .. PLEASE HELP!
My client has a fairly new site and we were agressively building content to the website. It is an ecommerce store and we have got a blog as well. We guest blogged in a few places and wrote 3-5 articles a day. Last few days, i noticed 3-4 pages that we were building links to got deindexed. What could be the reason? We weren't using any bots to build links, only a couple of it around 5-10 links to a page. Google WMT is not showing any messages and no manual action is seen. What could be the reason? I've submitted those URL for reindex and so far nothing seems to work. Any idea? Please help.
Technical SEO | | WayneRooney0 -
Why is Google not indexing my site?
I'm a bit confused as to why my site just isn't indexing on Google. Even if I type in my brand name, my social channels rank and there's no evidence of my website. I've followed all of the advice I've read and gone into webmaster tools and got the Wordpress yoast plug-in but nothing seems to be making a difference!One thing I've noticed, in Google Webmaster Tools it says "Couldn’t communicate with the DNS server." in site errors. I've called GoDaddy and they said that everything is fine. A bit frustrating. Trying to work out what my next steps should be but feeling a bit lost to be honest! Any help GREATLY appreciated!
Technical SEO | | j1066s0 -
Using the Google Remove URL Tool to remove https pages
I have found a way to get a list of 'some' of my 180,000+ garbage URLs now, and I'm going through the tedious task of using the URL removal tool to put them in one at a time. Between that and my robots.txt file and the URL Parameters, I'm hoping to see some change each week. I have noticed when I put URL's starting with https:// in to the removal tool, it adds the http:// main URL at the front. For example, I add to the removal tool:- https://www.mydomain.com/blah.html?search_garbage_url_addition On the confirmation page, the URL actually shows as:- http://www.mydomain.com/https://www.mydomain.com/blah.html?search_garbage_url_addition I don't want to accidentally remove my main URL or cause problems. Is this the right way this should look? AND PART 2 OF MY QUESTION If you see the search description in Google for a page you want removed that says the following in the SERP results, should I still go to the trouble of putting in the removal request? www.domain.com/url.html?xsearch_... A description for this result is not available because of this site's robots.txt – learn more.
Technical SEO | | sparrowdog1 -
Google Indexed Only 1 Page
Hi, I'm new and hope this forum can help me. I have recently resubmit my sitemap and Google only Indexed 1 Page. I can still see many of my old indexed pages in the SERP's? I have upgraded my template and graded all my pages to A's on SEOmoz, I have solid backlinks and have been building them over time. I have redirected all my 404 errors in .htaccess and removed /index.php from my url's. I have never done this before but my website runs perfect and all my pages redirect as I hoped. My site: www.FunerallCoverFinder.co.za How do I figure out what the problem is? Thanks in Advance!
Technical SEO | | Klement690 -
Google Cache is not showing in my page
Hello Everyone, I have issue in my Page, My category page (http://www.bannerbuzz.com/custom-vinyl-banners.html) is regular cached in past, but before sometime it can't show the cached result in SERP and not show in cached result , I have also fetch this link in google web master, but can't get the result, it is showing following message. 404. That’s an error. The requested URL /search?q=cache%3A http%3A//www.bannerbuzz.com/custom-vinyl-banners.html was not found on this server. That’s all we know. My category page rank is 2 and its keyword is on first in google.com, so i am little bit worried about this page cache issue, Can someone please tell me why is this happening? Is this a temporary issue? Help me to solve out this cache issue and once again my page will regularly cache in future. Thanks
Technical SEO | | CommercePundit0 -
Speed up the process of removing URLs from Google Index
Hi guys, We have done some work to try to remove pages from Google index. We have done the following: 1. Noindex tag 2. Make pages returning a 404 response. Is there anyway to notify Google about these changes so we can speed up the process of removing these pages from Google index? Also regarding the URL removal tool, Google says that it's used to remove URLs from search results, does it mean the URLs are removed from their index too? Many thanks guys David
Technical SEO | | sssrpm0