Reason for robots.txt file blocking products on category pages?
-
Hi
I have a website with thousands of products. On the category pages, every product is linked to with a "?cgid" parameter in the URL. But "?cgid" is also blocked in the robots.txt file for some reason, so I'm thinking it's stopping all my products getting crawled by Google.
Am I right here? Is there any reason why a website would want to block so many URLs? I'm only here a week and the site's getting great traffic, so I don't want to go breaking it!
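For reference, the rule in our robots.txt looks something like this (simplified - I'm paraphrasing rather than pasting the real file):

    User-agent: *
    Disallow: /*?cgid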
Thanks
-
Thanks again AL123al!
I would be concerned about my internal linking because of this problem. I've always wanted to keep important pages within 3 clicks of the Homepage. My worry here is that while a user can reach these products within 3 clicks of the Homepage, they're blocked to Googlebot.
So the product URLs are only getting discovered via the sitemap, which would be hugely inefficient? I think I have to decide whether opening up these pages will improve my linking structure enough for Google to crawl the product pages - but is that more important than the extra URLs it would then have to crawl, and the crawl budget that wastes?
-
Hello,
The canonical product URLs will be getting crawled just fine, as they are not blocked in the robots.txt. Without understanding your problem completely, I think the people before you were trying to stop all the duplicate parameter URLs from being crawled, leaving Google to crawl just the canonicals - which is what you want.
If you remove the parameter rule from robots.txt then Google will crawl everything, including the parameter URLs. This will waste crawl budget, so it's better that Google is only crawling the canonicals.
Regarding the sitemap: being present in the sitemap will help Googlebot decide what to prioritise crawling, but it won't stop it finding other URLs if there is good internal linking.
-
Thanks AL123al! The base URLs (www.example.com/product-category/ladies-shoes) do seem to be getting crawled here and there, and some are ranking, which is great. But I think the only place they can get discovered is the sitemap, which has over 28,000 URLs on one page (another thing I need to fix)!
So if Googlebot gets to the parameter URL through category pages (www.example.com/product-category/ladies-shoes?cgid...) and sees it's blocked, I'm guessing it can't see that the page is important to us (from the website hierarchy), and it can't see the canonical tag either - so I'm presuming it's seriously damaging our power to get products ranked.
In Screaming Frog, 112,000 URLs get crawled and 68% are blocked by robots.txt. 17,000 are URLs which contain "?cgid", which I don't think is too big for Googlebot to crawl; the website has pretty good authority, so I think we get a pretty deep crawl.
So I suppose what I really want to know is: will removing "?cgid" from the robots.txt file really damage the site? In my opinion, I think it'll really help.
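On the sitemap itself, my plan is to split that one huge file into smaller ones behind a sitemap index - a rough sketch, with placeholder file names:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap-products-1.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-products-2.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-categories.xml</loc>
      </sitemap>
    </sitemapindex>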
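To get those numbers I ran a quick script over the Screaming Frog internal export - something like this, though the file name and column headings are from memory, so check them against your own export:

    import csv

    # Screaming Frog "Internal: All" export - file name is a placeholder
    with open("internal_all.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    cgid = [r for r in rows if "?cgid" in r["Address"]]
    blocked = [r for r in rows if "Blocked by Robots.txt" in r.get("Status", "")]

    print(f"{len(rows)} URLs crawled")
    print(f"{len(cgid)} URLs contain ?cgid")
    print(f"{len(blocked)} URLs blocked by robots.txt")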
-
It looks like a ?cgid parameter is being appended to the product URLs - and there may be other stuff attached to the end of each URL, like this:
e.g. www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product etc
but the canonical URL is www.example.com/product-category/ladies-shoes
These product pages probably have a canonical pointing to the base URL, which means there won't be any problem with duplicates being indexed. So all well and good.
Except... Google has to crawl each of these parameter URLs to find the canonical. On a huge website, this means crawl budget is being consumed by unnecessary crawling of these parameterised URLs.
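In other words, each parameterised version would be carrying a tag like this in its <head> (illustrative URL):

    <link rel="canonical" href="https://www.example.com/product-category/ladies-shoes" />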
You can tell Google not to crawl the parameter URLs in Search Console (at least in the old version you can). But you can also stop Google crawling these URLs unnecessarily by blocking them in robots.txt, if you are sure that the parameters don't change how the page appears.
So, long story short, that is why you may see the URLs with parameters being blocked in robots.txt. The canonical URLs will be getting crawled just fine, since they don't have any parameters and hence aren't blocked.
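For what it's worth, Googlebot treats * in a Disallow path as a wildcard matching any run of characters, which is why a single rule like Disallow: /*?cgid catches all of these. A rough sketch of that matching logic in Python (simplified - Google's real spec also handles $ end-anchors and longest-match precedence):

    import re

    def blocked_by_rule(path: str, rule: str) -> bool:
        # Escape the rule, then turn the escaped '*' back into a regex wildcard.
        regex = re.escape(rule).replace(r"\*", ".*")
        return re.match(regex, path) is not None

    print(blocked_by_rule("/product-category/ladies-shoes?cgid=19", "/*?cgid"))  # True
    print(blocked_by_rule("/product-category/ladies-shoes", "/*?cgid"))          # False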
Hope that makes sense?
-
Yes, it's in the robots.txt, that's the problem. Someone had to have put it in there manually, but I've no idea why they would.
-
Did you check your robots.txt file? Or check whether any plugin is creating this problem?