URLs appear in Google Webmaster Tools that I can't find on my own site?
-
Hi,
I have a Magento e-commerce site (clothing) and when I had a look through some of the sections in Google Webmaster Tools I found URLs that I can't find on my site.
For example, a product URL may be http://www.example.co.uk/product-url/, which is fine. That product may come in three sizes (Small, Medium, Large), and for some reason Googlebot is sometimes finding a URL like:
http://www.example.co.uk/product-url/1202/ which, when clicked, is a live URL (status code: 200) for one of the sizes (Medium). However, I have run a site crawl in Screaming Frog, plus other crawl tests, and can't find where Googlebot is picking up these URLs.
I think I need to:
1. Find out how Googlebot is discovering these URLs.
2. Find out how to keep them out of the index (e.g. robots.txt, canonical, etc.).
Any help would be much appreciated. I'm happy to share the site with members who think they can take a look and help, and I can share specific URLs if that would make the issue clearer. Just let me know.
Thanks,
Darrell
-
No problem, glad it resolved the problem.
There are a number of possibilities. Most likely Googlebot found them through one of the following:
- XML sitemap
- Faceted navigation
- Magento pinged Google when the page was created
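The XML sitemap is the easiest of these to rule in or out. Below is a minimal Python sketch, assuming the sitemap follows the standard sitemaps.org format and that the unwanted URLs all end in a numeric segment like /1202/ (as in the example in the question). The function names and the sample sitemap are made up for illustration; paste in your own sitemap contents instead.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: the stray URLs end in a purely numeric path segment, e.g. /product-url/1202/
NUMERIC_TAIL = re.compile(r"/\d+/?$")


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def numeric_tail_urls(urls):
    """Keep only the URLs whose path ends in a numeric segment."""
    return [u for u in urls if NUMERIC_TAIL.search(urlparse(u).path)]


# Hypothetical sample sitemap, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.co.uk/product-url/</loc></url>
  <url><loc>http://www.example.co.uk/product-url/1202/</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in numeric_tail_urls(sitemap_urls(SAMPLE_SITEMAP)):
        print(url)
```

If this prints any URLs for your real sitemap, the sitemap is at least one of the places Googlebot is picking them up from.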
-
Cheers John, sorted the issue! Appreciate your expertise.
-
Thanks John, your reply was really helpful. I've now done that for the 4000 simple products and those URLs are returning 404 pages, which is great. Next I'm going to see if I can find a mass-import 301 redirect extension for Magento, so I can 301 redirect these URLs to the homepage rather than leave them as 404 pages.
How do you think Googlebot found those pages, given there are no links to them? Maybe through a link when the simple products were loaded to the cart?
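For the mass 301 import mentioned above, the redirect list itself can be generated rather than typed by hand. This is only a rough Python sketch: the two-column CSV layout (request_path, target_path) is an assumption, not the format of any particular Magento extension, so check what your import tool actually expects. The URL list would come from a Webmaster Tools export.

```python
import csv
import io
import re
from urllib.parse import urlparse

# Assumption: the URLs to redirect end in a numeric segment, e.g. /product-url/1202/
NUMERIC_TAIL = re.compile(r"/\d+/?$")


def redirect_rows(old_urls, target="/"):
    """Build (old path, target path) pairs for URLs that end in a numeric segment."""
    rows = []
    for url in old_urls:
        path = urlparse(url).path
        if NUMERIC_TAIL.search(path):
            rows.append((path, target))
    return rows


def to_csv(rows):
    """Serialise the pairs as CSV text; the header names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["request_path", "target_path"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    found_in_gwt = [
        "http://www.example.co.uk/product-url/",
        "http://www.example.co.uk/product-url/1202/",
    ]
    print(to_csv(redirect_rows(found_in_gwt)), end="")
```

The target defaults to the homepage ("/") to match the plan above; pass a different target if you decide to send visitors somewhere else.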
-
What is the visibility set to on the simple products for different sizes? If it's set to "Catalog" it will still be crawlable but not appear in your website's internal search results.
Setting the visibility to "Not Visible Individually" should resolve this issue.
-
I had a similar issue (not Magento). It turned out the URLs were in the sitemap that had been submitted to Webmaster Tools. Did you check there?
Check the URL in Open Site Explorer too; it might tell you whether any pages are linking to it.