Reason for robots.txt file blocking products on category pages?
-
Hi
I have a website with thousands of products. On the category pages, all the products are linked to with the code “?cgid” in the URL. But “?cgid” is also blocked in the robots.txt file for some reason. So I'm thinking it's stopping all my products getting crawled by Google.
Am I right here? Is there any reason why a website would want to limit so many URLs? I've only been here a week and the site's getting great traffic, so I don't want to go breaking it!
Thanks
-
Thanks again AL123al!
I would be concerned about my internal linking because of this problem. I've always wanted to keep important pages within 3 clicks of the Homepage. My worry here is that while these products can get clicked by a user within 3 clicks of the Homepage, they're blocked to Googlebot.
So the product URLs are only getting crawled via the sitemap, which would be hugely inefficient? So I think I have to decide whether opening up these pages will improve my linking structure for Google to crawl the product pages, but is that more important than the risk of increasing the number of pages it's able to crawl and wasting crawl budget?
-
Hello,
The canonical product URLs will be getting crawled just fine as they are not blocked in the robots.txt. Without understanding your problem completely, I think the guys before you were trying to stop all the duplicate URLs with parameters being crawled and just leaving Google to crawl the canonicals, which is what you want.
If you remove the parameter from robots.txt then Google will crawl everything, including the parameter URLs. This will waste crawl budget, so it's better that Google only crawls the canonicals.
Regarding the sitemap, being present in the sitemap will help Googlebot decide what to prioritise crawling but won't stop it finding other URLs if there is good internal linking.
-
Thanks AL123al! The base URLs (www.example.com/product-category/ladies-shoes) do seem to be getting crawled here & there, and some are ranking, which is great. But I think the only place they can get crawled is the sitemap, which has over 28,000 URLs on one page (another thing I need to fix)!
So if Googlebot gets to the parameter URL through category pages (www.example.com/product-category/ladies-shoes?cgid...) and sees it's blocked, I'm guessing it can't see that it's important to us (from the website hierarchy) or see the canonical tag, so I'm presuming it's seriously damaging our power in getting products ranked.
In Screaming Frog, 112,000 URLs get crawled and 68% are blocked by robots. 17,000 are URLs which contain "?cgid", which I don't think is too many for Googlebot to crawl. The website has pretty good authority, so I think we have a pretty deep crawl.
So I suppose what I really want to know is: will removing "?cgid" from the robots file really damage the site? In my opinion, I think it'll really help.
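To put those Screaming Frog numbers side by side, here is a quick back-of-envelope sketch. It assumes (my assumption, not something the crawl report confirms) that all 17,000 "?cgid" URLs are counted among the blocked ones.

```python
# Figures quoted from the Screaming Frog crawl above.
crawled = 112_000
blocked_share = 0.68
cgid_urls = 17_000

# Assumption: every "?cgid" URL is included in the blocked total.
blocked = round(crawled * blocked_share)
other_blocked = blocked - cgid_urls

print(blocked)                        # 76160
print(other_blocked)                  # 59160
print(f"{cgid_urls / crawled:.1%}")   # 15.2%
print(f"{cgid_urls / blocked:.1%}")   # 22.3%
```

On that assumption, "?cgid" accounts for under a quarter of the blocked URLs, so other robots.txt rules are responsible for most of the 68%.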
-
It looks like the product URLs have a ?cgid parameter appended to them, and there may be other stuff attached to the end of each URL, like this:
e.g. www.example.com/product-category/ladies-shoes?cgid-product=19&controller=product etc
but canonical URL is www.example.com/product-category/ladies-shoes
These products may have a canonical pointing to the base URL, which means there won't be any problem with duplicates being indexed. So all well and good.
Except... Google has to crawl each of these parameter URLs to find the canonical. On a huge website this means that crawl budget is being consumed by unnecessary crawling of these parameterised URLs.
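The reason a crawl is needed is that the canonical lives inside the page's HTML. A minimal sketch of reading it, using a made-up page (the markup below is hypothetical, not taken from the site in question):

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup served at a parameterised product URL.
page = """
<html><head>
<link rel="canonical" href="https://www.example.com/product-category/ladies-shoes">
</head><body>Ladies shoes</body></html>
"""

print(find_canonical(page))
```

Nothing in the URL itself says what the canonical is, so a crawler only learns it after fetching and parsing the page.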
You can tell Google not to crawl the parameter URLs in Search Console (at least in the old version you can). But you can also stop Google crawling these URLs unnecessarily by blocking them in robots.txt, if you are sure that the parameters are not changing how the page looks in search.
So, long story short, that is why you may see that the URLs with parameters are being blocked in robots.txt. The canonical URLs will be getting crawled just fine, since they don't have any parameters and hence are not blocked.
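As a rough illustration, here is a small sketch of how a wildcard Disallow rule separates the two kinds of URL. The rule `/*?cgid` is a guess at what the file might contain (the thread only says "?cgid" is blocked), and the matcher is hand-written so the wildcard handling is visible. It ignores Allow rules and rule precedence.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the URL, which is the wildcard behaviour Google documents.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url: str, disallow_rules: list[str]) -> bool:
    """Return True if the URL's path plus query matches any Disallow rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


# Hypothetical rule; the real robots.txt may be written differently.
DISALLOW = ["/*?cgid"]

canonical = "https://www.example.com/product-category/ladies-shoes"
parameterised = canonical + "?cgid-product=19&controller=product"

print(is_blocked(canonical, DISALLOW))       # False
print(is_blocked(parameterised, DISALLOW))   # True
```

The canonical URL has no query string, so the rule never matches it, while anything carrying "?cgid" is caught.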
Hope that makes sense?
-
Yes, it's in the robots.txt, that's the problem. Someone had to physically put it in there, but I've no idea why they would.
-
Did you check your robots.txt file? Or check whether a plugin is creating this problem?
Related Questions
-
Adding picture and new layout on jobs-overview page
I'm running a casting site, where the jobs-overview page is the highest ranked on Google for the important words. There are a number of reasons for that: it's updated daily, the domain is old and well known, and so on. Anyway, the current design is this (yes, it's ugly and old-school :)):
Current design: http://www.onlinecasting.dk/auditions.asp
I've created a new design, which is much nicer and has added pictures. The pictures in the new design will be somewhat unique to the specific jobs, so the current ones are mostly for testing.
New design (not implemented): http://www.onlinecasting.dk/auditionsnd.asp
Question: do you think this new design could affect my site / page in a bad way for SEO?
I'm planning basically just to overwrite the old auditions.asp file with the new code. What do you guys think?
Web Design | | KasperGJ0 -
Using a query string for linked, static landing pages - is this good practice?
My company has a page with links for each of our dozen office locations as well as a clickable map. These offices are also linked in the footer of every page along with their phone number. When one of these links is clicked, the visitor is directed to a static page with a picture of the office, contact information, a short description, and some other information. The URL for these pages is displayed as something like http://example.com/offices.htm?office_id=123456, with seemingly random ID numbers at the end that depend on the office and remain static. I know first off that this is probably bad SEO practice, as the URL should be something like http://example.com/offices/springfield/. My question is, why is there a question mark in the page URL? I understand that it represents a query string, but I'm not sure why it's there to begin with. A query should not be required if they are just static landing pages, correct? Is there any reason at all why they would be queries? Is this an issue that needs to be addressed, or does it have little to no impact on SEO?
Web Design | | BD690 -
Our "home page" is behind a member wall, options?
So www.pch.com(portal) redirects to www.pch.com/unrecognized(landing page) if you are not registered with us and logged in. This means that the search engines are not logged in, so they see only our landing page. It used to be that there was no portal/home, on pch.com, that was just the landing page, but that changed about 6 months ago. We do rank for our brand terms, but my company would like to rank for terms like "sweepstakes." They DO understand why we don't, thankfully. They don't think SEO is magic voodoo. They get it. But they asked for options, as I have said that the portal on www.pch.com really is a good page to optimize for non-brand, core terms like sweepstakes....but only if the search engines can see it. I gave them these options, and they asked me to seek out more. So any thoughts would be good: 1. Best case scenario would be to abandon the landing page, just have the keyword rich portal page be the actual home page with no re-direct. (this won't happen, but I decided it needed to be first on my list). 2. Turn the portal into the home page (remove the redirect), but have the landing page overlay in a light box. This should, if I am not mistaken, be a best of both worlds situation, where the light box landing page would still have all of the value of the actual keyword rich portal page behind it. 3. If the landing page has to remain as it does now with the non-logged in redirect to it, change the URLs so that the landing page is www.pch.com and the portal becomes www.pch.com/members/ or something like that. Any other thoughts? Thanks! Kenn Gold Publishers Clearing House
Web Design | | Kenn_Gold0 -
How do I gain full SEO value from individual property pages?
A client of ours has a vacation rental business with rental locations all over the country. Their old sites were a messy assembly of black hat, broken links and htaccess files that were used over and over on each site. We are redoing everything for them, in one site, with multiple subdirectories for individual locations, like Aspen, Fort Meyers, etc. Anyhow, I'm putting together the SEO plan for the site and I have a problem. The individual rental properties have great SEO value (lots of text, indexable pictures, can create google/bing location pages), and are great for linking in social media (Look at this wonderful property, rental price just reduced!). However, I don't want individual properties, which will have very similar keywords, links, descriptions, etc, competing with each other when indexed. Truth be told, I don't really want search engines linking directly to the individual property pages at all. The intended browsing experience should allow a user to "narrow down" exactly what they're seeking using the site until the perfect rental appears. What I want is for searchers to be directed to the property listing index that most closely matches what they're seeking (Ft. Meyers Rental Condos or Breckenridge Rental Homes), and then allow them to narrow it down from there. This is ideal for the users, because it allows them to see all available properties that match what they want, and ideal for the customer, because it applies dozens of pages of SEO mojo to a single index, rather than dozens of pages. So I can't "noindex" or "nofollow", because I want all that good SEO mojo. I can't REL=CANONICAL, because the property pages aren't similar enough to the index. I can't 301 Redirect because I want the users to be able to see the property pages at some point. I'm stymied.
Web Design | | SpokeHQ0 -
Nav / Sitemap Question. Using a "services" page vs just linking directly to individual service page?
Okay, so our company offers video production, web design, and web marketing services. While we do offer these services individually, our goal is to get our clients to integrate these services together. Our nav is currently like so : home - about - video - web design - web marketing - blog - contact Now I've seen businesses and agencies also use a nav with a "services" button instead of listing out their service offerings (if they have more than 1, like us). The services button usually links to a category page or has a drop down with links to the company's individual services. I'm wondering if there is any benefit to having a main services page like this and linking to the individual pages off of it (video ,web design, marketing, etc). Or if we should just keep it the way we have it now (since we've already got some page authority on the individual service pages). I know this may not be the most important aspect of our site and we may be over-thinking it but any thoughts/ideas would be greatly appreciated, thanks!
Web Design | | RenderPerfect0 -
Joomla ( title page override not working properly ) any techy guys out there
Hey Mozzers, I am having some problems with Joomla. I have tried many support forums, and since everyone here is in the same field as me, I thought this would be a great place to ask this question. I am working with Joomla 2.5. After you turn on the search engine friendly configuration, you can override the alias of the page by providing page display options for the title tag. So I turned on SEF in the global config, turned on mod_rewrite and made sure my htaccess file was not txt. But I am having some problems with this.
On some pages the page display option for the browser page title works and on some it doesn't. On the pages where it doesn't, it is pulling the information from the alias (which is common with most sites).
Why is it doing this? You can check out the pages yourself.
Here is a page with it not working: http://tungstengem.com/mens-wedding-bands
And here is a page with it working: http://tungstengem.com/mens-wedding-...-bands-for-men
Also, for my homepage, when I didn't have Apache rewrite on it showed index.php and I was able to add an alias to it. Now the alias for the home page is not working.
Web Design | | BizDetox0 -
I've set up my own site which is still fairly new, but I'm a bit concerned that there is a blockage SEO-wise somewhere, because when I try to crawl the site on SEOmoz it only crawls one page.
I'm really baffled and none of my research has shed much light on it. My url is www.emporiumofmanliness.co.uk I'd really appreciate any help! Thanks
Web Design | | JoshED0 -
Is page speed worth it for SEO?
I've always racked my brain trying to follow all the PageSpeed guidelines. I increased my page speed significantly, but I didn't see any effect on my SEO performance. For my keywords, my competitors are bad at it (I have a score of 90 and they are at 60-70). Does Google give importance to it?
Web Design | | Naghirniac0