Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Google tries to index non existing language URLs. Why?
-
Hi,
I am working for a SAAS client. He uses two different language versions by using two different subdomains.
de.domain.com/company for german and en.domain.com for english. Many thousands URLs has been indexed correctly.But Google Search Console tries to index URLs which were never existing before and are still not existing.
de.domain.com**/en/company
en.domain.com/de/**company... and an thousand more using the /en/ or /de/ in between. We never use this variant and calling these URLs will throw up a 404 Page correctly (but with wrong respond code - we`re fixing that
). But Google tries to index these kind of URLs again and again. And, I couldnt find any source of these URLs. No Website is using this as an out going link, etc.
We do see in our logfiles, that a Screaming Frog Installation and moz.com w opensiteexplorer were trying to access this earlier.My Question: How does Google comes up with that? From where did they get these URLs, that (to our knowledge) never existed?
Any ideas? Thanks

-
Hi Hecksler,
Did you ever resolve this?
Quick idea from me is to double check ALL version of your website within Google Search Console. You can now register the entire domain property using DNS: https://searchengineland.com/how-to-set-up-google-search-console-domain-verification-for-site-wide-reporting-data-313256
I found that Google was trying to crawl a very old HTTP sitemap from about five years ago for one of my sites, and thus I was able to delete it.
There's some mixed comments/feeling within the Search Community about whether or not GoogleBot really "guesses" URLs, so it's probably more than likely they are getting the links from somewhere....https://stackoverflow.com/questions/20855082/googlebot-guesses-urls-how-to-avoid-handle-this-crawling
Look forward to hearing from you,
Nick
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My brand name has 2 words but Google only indexing as 1 word. Is there a fix?
Hi all...I'm at a loss. I've never had this happen. Google only shows pages of my site when I search the brand name as one word. When I Google the site as one word BrandBrand- it only shows my blog page and about us page plus Twitter and Facebook on page 1. The homepage does not show up at all. When I Google the site as two words Brand Brand - My Facebook page is on page 1 but nothing else. The homepage isn't showing up at all. When I search both words on Bing and Yahoo both are indexing it as two words and shows on page 1. Any ideas?
Technical SEO | | TexasBlogger0 -
Why images are not getting indexed and showing in Google webmaster
Hi, I would like to ask why our website images not indexing in Google. I have shared the following screenshot of the search console. https://www.screencast.com/t/yKoCBT6Q8Upw Last week (Friday 14 Sept 2018) it was showing 23.5K out 31K were submitted and indexed by Google. But now, it is showing only 1K 😞 Can you please let me know why might this happen, why images are not getting indexed and showing in Google webmaster.
Technical SEO | | 21centuryweb0 -
How To Cleanup the Google Index After a Website Has Been HACKED
We have a client whose website was hacked, and some troll created thousands of viagra pages, which were all indexed by Google. See the screenshot for an example. The site has been cleaned up completely, but I wanted to know if anyone can weigh in on how we can cleanup the Google index. Are there extra steps we should take? So far we have gone into webmaster tools and submitted a new site map. ^802D799E5372F02797BE19290D8987F3E248DCA6656F8D9BF6^pimgpsh_fullsize_distr.png
Technical SEO | | yoursearchteam0 -
Will blocking the Wayback Machine (archive.org) have any impact on Google crawl and indexing/SEO?
Will blocking the Wayback Machine (archive.org) by adding the code they give have any impact on Google crawl and indexing/SEO? Anyone know? Thanks! ~Brett
Technical SEO | | BBuck0 -
How Does Google's "index" find the location of pages in the "page directory" to return?
This is my understanding of how Google's search works, and I am unsure about one thing in specific: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" knows the location of relevant pages in the "page directory". The keyword entries in the "index" point to the "page directory" somehow. I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website (and would the keywords in the "index" point to these urls)? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I want to discuss this is to know the effects of changing a pages url by understanding how the search process works better.
Technical SEO | | reidsteven750 -
Having www. and non www. links indexed
Hey guys, As the title states, the two versions of the website are indexed in Google. How should I proceed? Please also note that the links on the website are without the www. How should I proceed knowing that the client prefers to have the www. version indexed. Here are the steps that I have in mind right now: I set the preferred domain on GWMT as the one with www. I 301 redirect any non www. URL to the www. version. What are your thoughts? Should I 301 redirect the URL's? or is setting the preference on GWMT enough? Thanks.
Technical SEO | | BruLee0 -
Google News URL Format
Hi, We are currently redesigning our gaming website (www.totallygn.com) and one of our main goals is to get listed by Google News in future. Looking at the Google News URL requirements "The URL for each article must contain a unique number consisting of at least three digits." How does the above affect SEO structure? I was planning on using a format such as www.totallygn.com/xbox-360/360-reviews/fifa-12-review how would this compare to something like? www.totallygn.com/xbox-360/360-reviews/fifa-12-review234 Thanks in advance for your help
Technical SEO | | WalesDragon0 -
Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
Hi, I come across a tough problem. I am working on an online-store website which contains the functionlaity of viewing products details in .PDF format (by the way, the website is built on Joomla CMS), now when I search my site's name in Google, the SERP simply displays my .PDF files in the first couple positions (shown in normal .PDF files format: [PDF]...)and I cannot find the normal pages there on SERP #1 unless I search the full site domain in Google. I really don't want this! Would you please tell me how to figure the problem out and solve it. I can actually remove the corresponding component (Virtuemart) that are in charge of generating the .PDF files. Now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality, I plan to regenerate a sitemap of my site and submit it to Google, will it be working for me? I really appreciate that if you could help solve this problem. Thanks very much. Sincerely SEOmoz Pro Member
Technical SEO | | fugu0