Reason for robots.txt file blocking products on category pages?
-
Hi
I have a website with thousands of products. On the category pages, all the products are linked to with a "?cgid" parameter in the URL. But "?cgid" is also blocked in the robots.txt file for some reason, so I'm thinking it's stopping all my products getting crawled by Google.
Am I right here? Is there any reason why a website would want to block so many URLs? I'm only here a week and the site's getting great traffic, so I don't want to go breaking it!
Thanks
-
Thanks again AL123al!
I would be concerned about my internal linking because of this problem. I've always wanted to keep important pages within 3 clicks of the homepage. My worry here is that while a user can reach these products within 3 clicks of the homepage, they're blocked to Googlebot.
So the product URLs are only getting discovered through the sitemap, which would be hugely inefficient? I think I have to decide whether opening up these pages would improve my linking structure enough to help Google crawl the product pages, or whether that's less important than the crawl budget wasted on all the extra pages it would then have to crawl.
-
Hello,
The canonical product URLs will be getting crawled just fine, as they are not blocked in robots.txt. Without understanding your setup completely, I think the guys before you were trying to stop all the duplicate parameter URLs being crawled and leave Google to crawl just the canonicals - which is what you want.
If you remove that rule from robots.txt then Google will crawl everything, including the parameter URLs. That will waste crawl budget, so it's better that Google only crawls the canonicals.
Regarding the sitemap: being present in the sitemap helps Googlebot decide what to prioritise crawling, but it won't stop it finding other URLs if there is good internal linking.
-
Thanks AL123al! The base URLs (www.example.com/product-category/ladies-shoes) do seem to be getting crawled here and there, and some are ranking, which is great. But I think the only place they can be discovered is the sitemap, which has over 28,000 URLs in one file (another thing I need to fix)!
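When I get around to fixing that, the plan would be to split it into smaller files behind a sitemap index, roughly like this (the file names are placeholders I've made up):

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap><loc>https://www.example.com/sitemap-products-1.xml</loc></sitemap>
      <sitemap><loc>https://www.example.com/sitemap-products-2.xml</loc></sitemap>
    </sitemapindex>

The protocol caps each file at 50,000 URLs anyway, and smaller files make it easier to see in Search Console which sections are actually getting indexed.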
So if Googlebot gets to a parameter URL through the category pages (www.example.com/product-category/ladies-shoes?cgid...) and sees it's blocked, I'm guessing it can't see that it's important to us (from the website hierarchy) or see the canonical tag, so I'm presuming it's seriously damaging our power to get products ranked.
In Screaming Frog, 112,000 URLs get crawled and 68% are blocked by robots.txt. 17,000 of those are URLs containing "?cgid", which I don't think is too many for Googlebot to crawl; the website has pretty good authority, so I think we get a pretty deep crawl.
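In case it's useful to anyone, I tallied those numbers from the Screaming Frog export with a short Python sketch - this assumes a CSV export whose URL column is named "Address", which may differ by version:

    import csv

    # Tally cgid parameter URLs in a Screaming Frog crawl export
    total = 0
    cgid = 0
    with open("internal_all.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            url = row["Address"]
            # cgid may be the first parameter (?cgid) or a later one (&cgid)
            if "?cgid" in url or "&cgid" in url:
                cgid += 1

    print(f"{cgid:,} of {total:,} crawled URLs carry the cgid parameter")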
So I suppose what I really want to know is: will removing "?cgid" from the robots.txt file really damage the site? In my opinion, I think it'll really help.
-
It looks like the product URLs are having a ?cgid parameter appended, and there may be other stuff attached to the end of each URL as well, like this:
e.g. www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product
but the canonical URL is www.example.com/product-category/ladies-shoes
These product pages may each have a canonical pointing to the base URL, which means there won't be any problem with duplicates being indexed. So all well and good.
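Concretely, each parameter URL would carry a link element in its head pointing at the base URL, something like this (assuming the site runs on https):

    <link rel="canonical" href="https://www.example.com/product-category/ladies-shoes" />

Google treats that as a strong hint (not a directive) about which version to index.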
Except... Google has to crawl each of these parameter URLs to find the canonical. On a huge website, that means crawl budget is being consumed by unnecessary crawling of these parameterised URLs.
You can tell Google not to crawl the parameter URLs in Search Console (at least in the old version you can). But you can also stop Google crawling these URLs unnecessarily by blocking them in robots.txt, if you are sure the parameters don't change how the page appears in search.
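For example, rules along these lines would do it - illustrative only, and they assume cgid only ever appears as a query parameter (Google supports the * wildcard in robots.txt):

    User-agent: *
    Disallow: /*?cgid
    Disallow: /*&cgid

The second line catches URLs where cgid isn't the first parameter.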
So, long story short, that is why you may see the URLs with parameters being blocked in robots.txt. The canonical URLs will be getting crawled just fine, since they don't have any parameters and hence aren't blocked.
Hope that makes sense?
-
Yes, it's in the robots.txt, that's the problem. Someone had to physically put it in there, but I've no idea why they would.
-
Did you check your robots.txt file? Or check whether any plugin is creating this problem?