Reason for robots.txt file blocking products on category pages?
-
Hi
I have a website with thousands of products. On the category pages, all the products are linked to with the parameter “?cgid” in the URL. But “?cgid” is also blocked in the robots.txt file for some reason, so I'm thinking it's stopping all my products getting crawled by Google.
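For reference, I assume the rule in our robots.txt looks something like the lines below (I'm paraphrasing, so the exact pattern may differ):

    User-agent: *
    Disallow: /*?cgid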
Am I right here? Is there any reason why a website would want to block so many URLs? I've only been here a week and the site's getting great traffic, so I don't want to go breaking it!
Thanks
-
Thanks again AL123al!
I would be concerned about my internal linking because of this problem. I've always wanted to keep important pages within 3 clicks of the homepage. My worry here is that while a user can reach these products within 3 clicks of the homepage, they're blocked to Googlebot.
So the product URLs are only getting discovered via the sitemap, which would be hugely inefficient? I think I have to decide whether opening these pages up would improve my linking structure enough for Google to reach the product pages, or whether that's outweighed by the extra pages it would have to crawl and the crawl budget that would waste.
-
Hello,
The canonical product URLs will be getting crawled just fine, as they are not blocked in robots.txt. Without understanding your problem completely, I think the guys before you were trying to stop all the duplicate parameter URLs being crawled and leave Google to crawl just the canonicals - which is what you want.
If you remove the parameter rule from robots.txt, Google will crawl everything, including the parameter URLs, and that will waste crawl budget. So it's better that Google is only crawling the canonicals.
Regarding the sitemap: being present in the sitemap helps Googlebot decide what to prioritise crawling, but it won't stop it finding other URLs if there is good internal linking.
-
Thanks AL123al! The base URLs (www.example.com/product-category/ladies-shoes) do seem to be getting crawled here and there, and some are ranking, which is great. But I think the only place they can get discovered is the sitemap, which has over 28,000 URLs in one file (another thing I need to fix)!
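If I do end up splitting that sitemap, my understanding is that a sitemap index pointing at smaller files is the standard approach - a rough sketch, with made-up filenames:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap-products-1.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-products-2.xml</loc>
      </sitemap>
    </sitemapindex>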
So if Googlebot reaches a parameter URL through the category pages (www.example.com/product-category/ladies-shoes?cgid...) and sees it's blocked, I'm guessing it can't see how important that page is to us (from the site hierarchy) or see the canonical tag, so I'm presuming it's seriously damaging our power to get products ranked.
In Screaming Frog, 112,000 URLs get crawled and 68% are blocked by robots.txt. 17,000 of them are URLs containing "?cgid", which I don't think is too many for Googlebot to crawl; the website has pretty good authority, so I think we get a pretty deep crawl.
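To double-check which of those URLs the file actually blocks for Googlebot, something like the sketch below should work (it uses the third-party protego parser, since the standard-library parser doesn't handle Google-style wildcards; the rule and URLs here are just examples):

    # pip install protego
    from protego import Protego

    # Hypothetical rule - substitute the real contents of /robots.txt
    robots_txt = """
    User-agent: *
    Disallow: /*?cgid
    """

    rp = Protego.parse(robots_txt)

    for url in [
        "https://www.example.com/product-category/ladies-shoes",            # canonical
        "https://www.example.com/product-category/ladies-shoes?cgid=12345", # parameter version
    ]:
        # can_fetch() applies the same Allow/Disallow matching a crawler would
        print(url, "->", "crawlable" if rp.can_fetch(url, "Googlebot") else "blocked")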
So I suppose what I really want to know is: will removing "?cgid" from the robots.txt file really damage the site? In my opinion, I think it'll really help.
-
This looks like the product URLs are being appended with a parameter, ?cgid - and there may be other stuff attached to the end of each URL too, like the example below:
e.g. www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product etc.
but the canonical URL is www.example.com/product-category/ladies-shoes
These products may have a canonical pointing to the base URL, which means there won't be any problem with duplicates being indexed. So all well and good.
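That canonical is just the standard link element on the parameterised pages pointing back at the base URL, something like this (simplified example):

    <!-- in the <head> of www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product -->
    <link rel="canonical" href="https://www.example.com/product-category/ladies-shoes" />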
Except... Google has to crawl each of these parameter URLs to find the canonical. On a huge website, that means crawl budget is being consumed by unnecessary crawling of these parameterised URLs.
You can tell Google not to crawl the parameter URLs in Search Console (at least in the old version you can). But you can also stop Google crawling these URLs unnecessarily by blocking them in robots.txt, if you are sure the parameters don't change how the page looks.
So, long story short, that is why you may see the URLs with parameters being blocked in robots.txt. The canonical URLs will be getting crawled just fine, since they don't have any parameters and so aren't being blocked.
Hope that makes sense?
-
Yes, it's in the robots.txt - that's the problem. Someone had to have put it in there deliberately, but I've no idea why they would.
-
Did you check your robots.txt file? Or check whether a plugin is creating the rule?