Reason for robots.txt file blocking products on category pages?
-
Hi
I have a website with thousands of products. On the category pages, all the products are linked to with the code “?cgid” in the URL. But “?cgid” is also blocked in the robots.txt file for some reason. So I'm thinking it's stopping all my products getting crawled by Google.
Am I right here? Is there any reason why a website would want to limit so many URLs? I've only been here a week and the site's getting great traffic, so I don't want to go breaking it!
Thanks
-
Thanks again AL123al!
I would be concerned about my internal linking because of this problem. I've always wanted to keep important pages within 3 clicks of the Homepage. My worry here is that while these products can get clicked by a user within 3 clicks of the Homepage, they're blocked to Googlebot.
So the product URLs are only getting crawled via the sitemap, which would be hugely inefficient? So I think I have to decide whether opening up these pages will improve my linking structure for Google to crawl the product pages, but is that more important than increasing the number of pages it has to crawl and wasting crawl budget?
-
Hello,
The canonical product URLs will be getting crawled just fine as they are not blocked in the robots.txt. Without understanding your problem completely, I think the guys before you were trying to stop all the duplicate URLs with parameters being crawled and just leave Google to crawl the canonicals, which is what you want.
If you remove the parameter from robots.txt then Google will crawl everything, including the parameter URLs. This will waste crawl budget, so it's better that Google is only crawling the canonicals.
Regarding the sitemap, being present in the sitemap will help Googlebot decide what to prioritise crawling but won't stop it finding other URLs if there is good internal linking.
-
Thanks AL123al! The base URLs (www.example.com/product-category/ladies-shoes) do seem to be getting crawled here & there, and some are ranking, which is great. But I think the only place they can get crawled is the sitemap, which has over 28,000 URLs on one page (another thing I need to fix)!
So if Googlebot gets to the parameter URL through category pages (www.example.com/product-category/ladies-shoes?cgid...) and sees it's blocked, I'm guessing it can't see that it's important to us (from the website hierarchy) or see the canonical tag, so I'm presuming it's seriously damaging our power in getting products ranked.
In Screaming Frog, 112,000 URLs get crawled and 68% are blocked by robots.txt. 17,000 are URLs which contain "?cgid", which I don't think is too many for Googlebot to crawl. The website has pretty good authority, so I think we get a pretty deep crawl.
So I suppose what I really want to know is: will removing "?cgid" from the robots.txt file really damage the site? In my opinion, I think it'll really help.
-
It looks like the product URLs have a parameter, ?cgid, appended to them. There may be other stuff attached to the end of each URL, like this:
e.g. www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product etc
but canonical URL is www.example.com/product-category/ladies-shoes
These parameter URLs may have a canonical pointing to the base URL, which means there won't be any problem with duplicates being indexed. So all well and good.
Except... Google has to crawl each of these parameter URLs to find the canonical. On a huge website this means that crawl budget is being consumed by unnecessary crawling of these parameterised URLs.
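To illustrate why a crawler has to fetch the page before it can see the canonical, here is a rough Python sketch (my own illustration, not anything Google publishes) that pulls the rel="canonical" link out of a page's HTML. The sample HTML and the CanonicalFinder class are made up for the example; only the example.com URL pattern comes from this thread.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Hypothetical helper: records the first <link rel="canonical"> href it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and we keep the first canonical we find.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


# Made-up HTML standing in for the body of a fetched parameter URL.
sample_html = """<html><head>
<link rel="canonical" href="https://www.example.com/product-category/ladies-shoes">
</head><body></body></html>"""

finder = CanonicalFinder()
finder.feed(sample_html)
print(finder.canonical)
```

The point is that the canonical lives inside the page itself, so it can only be read after the URL has been downloaded. If the URL is disallowed in robots.txt, that download never happens.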
You can tell Google not to crawl the parameter URLs in Search Console (at least in the old version you can). But you can also stop Google crawling these URLs unnecessarily by blocking them in robots.txt, if you are sure that the parameters are not changing how the page looks in search.
So, long story short, that is why you may see the URLs with parameters being blocked in robots.txt. The canonical versions of the URLs will be getting crawled just fine, since they don't have any parameters and hence are not being blocked.
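If you want to sanity-check this yourself, here is a minimal Python sketch of how a wildcard Disallow rule separates the two kinds of URL. The rule "/*?cgid" is an assumption (I haven't seen the actual robots.txt), and the matcher is a simplified take on the "*" and "$" wildcards that ignores Allow rules and rule precedence, so treat it as an illustration rather than a copy of Googlebot's behaviour.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Turn a robots.txt path rule into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url, disallow_rules):
    """Return True if the URL's path plus query matches any Disallow rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


# Assumed rule; the real file may be written differently.
rules = ["/*?cgid"]

parameter_url = "https://www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product"
canonical_url = "https://www.example.com/product-category/ladies-shoes"

print(is_blocked(parameter_url, rules))  # True
print(is_blocked(canonical_url, rules))  # False
```

Python's built-in urllib.robotparser is not used here because it does plain prefix matching and does not handle "*" wildcards.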
Hope that makes sense?
-
Yes, it's in the robots.txt, that's the problem. Someone had to physically put it in there, but I've no idea why they would.
-
Did you check your robots.txt file? Or check whether a plugin is creating this problem?