Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can I Disallow Faceted Nav URLs - Robots.txt
-
I have been disallowing /*? So I know that works without affecting crawling. I am wondering if I can disallow the faceted nav urls.
So disallow: /category.html/? /category2.html/? /category3.html/*?
To prevent the price faceted url from being cached:
/category.html?price=1%2C1000
and
/category.html?price=1%2C1000&product_material=88Thanks!
-
If you can no-index , follow all but the default, then you will send link juice to the pages but it will return the link juice because it is follow, but they will not index because they are no-index.
If you use robots, then it can not read the page to follow the links.
-
Hey Tyler! haven't seen you on SEOmoz in a while. Hope you are good!
Check to see if this would make sense for you. GWT > Site Configuration > URL Perameters. It says "Only use this feature if you feel confident about how parameters work for your site. Telling Googlebot to exclude URLs with certain parameters could result in large numbers of your pages disappearing from our index."
-
If I can, then I disallow hundreds of pages that are duplicate content and should not be crawled.
If I don't then I send link juice to urls that I don't want seen.
This is a good answer though, thanks. Any other thoughts?
-
You can, but then you have links passing link juice to non followed pages. it would be better if you used canonical. even better would be to add no-index, follow meta tag when non canonical page is displayed, but this requres some codeing.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Folders in url structure?
Hello, Revamping an out-of-date website and am wondering if I need to include the folders (categories) in the url structure? The proposed structure has 8 main folders. I've been reading that Google is ok if the folder is not included in the url, but is it really? The hesitation I have is that the urls are getting long and the main folder only has only a sub folder beneath it. So, /folder-name/facility-name/treatment-overview. This looks too long, doesn't it? Thanks!
Technical SEO | | lfrazer1230 -
Robots.txt in subfolders and hreflang issues
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations: UK site - https://www.clientname.com/uk/ - robots.txt location: UK site - https://www.clientname.com/uk/robots.txt
Technical SEO | | lauralou82
US site - https://www.clientname.com/us/ - robots.txt location: UK site - https://www.clientname.com/us/robots.txt We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US. They have the following hreflang tags across all pages: We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously). Search Console says there are no hreflang tags at all. Additionally, we have a robots.txt file on each site which has a link to the corresponding sitemap files, but when viewing the robots.txt tester on Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you’ll get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location. Any suggestions how we can remove UK listings from Google US and vice versa?0 -
Multiple robots.txt files on server
Hi! I have previously hired a developer to put up my site and noticed afterwards that he did not know much about SEO. This lead me to starting to learn myself and applying some changes step by step. One of the things I am currently doing is inserting sitemap reference in robots.txt file (which was not there before). But just now when I wanted to upload the file via FTP to my server I found multiple ones - in different sizes - and I dont know what to do with them? Can I remove them? I have downloaded and opened them and they seem to be 2 textfiles and 2 dupplicates. Names: robots.txt (original dupplicate)
Technical SEO | | mjukhud
robots.txt-Original (original)
robots.txt-NEW (other content)
robots.txt-Working (other content dupplicate) Would really appreciate help and expertise suggestions. Thanks!0 -
Old URLs Appearing in SERPs
Thirteen months ago we removed a large number of non-corporate URLs from our web server. We created 301 redirects and in some cases, we simply removed the content as there was no place to redirect to. Unfortunately, all these pages still appear in Google's SERPs (not Bings) for both the 301'd pages and the pages we removed without redirecting. When you click on the pages in the SERPs that have been redirected - you do get redirected - so we have ruled out any problems with the 301s. We have already resubmitted our XML sitemap and when we run a crawl using Screaming Frog we do not see any of these old pages being linked to at our domain. We have a few different approaches we're considering to get Google to remove these pages from the SERPs and would welcome your input. Remove the 301 redirect entirely so that visits to those pages return a 404 (much easier) or a 410 (would require some setup/configuration via Wordpress). This of course means that anyone visiting those URLs won't be forwarded along, but Google may not drop those redirects from the SERPs otherwise. Request that Google temporarily block those pages (done via GWMT), which lasts for 90 days. Update robots.txt to block access to the redirecting directories. Thank you. Rosemary One year ago I removed a whole lot of junk that was on my web server but it is still appearing in the SERPs.
Technical SEO | | RosemaryB3 -
Robots.txt on http vs. https
We recently changed our domain from http to https. When a user enters any URL on http, there is an global 301 redirect to the same page on https. I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http-Version with robots.txt? Strangely, I cannot find a single ressource about this...
Technical SEO | | zeepartner0 -
Will an XML sitemap override a robots.txt
I have a client that has a robots.txt file that is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an xml sitemap to get their pages indexed. I did not think this tactic would work, as the robots.txt would take precedent over the xmls sitemap. But it worked... I have no explanation as to how or why. Does anyone have an answer to this? or any experience with a website that has had a clear Disallow: / for months , that somehow has pages in the index?
Technical SEO | | KCBackofen0 -
Subdomain Removal in Robots.txt with Conditional Logic??
I would like to see if there is a way to add conditional logic to the robots.txt file so that when we push from DEV to PRODUCTION and the robots.txt file is pushed, we don't have to remember to NOT push the robots.txt file OR edit it when it goes live. My specific situation is this: I have www.website.com, dev.website.com and new.website.com and somehow google has indexed the DEV.website.com and NEW.website.com and I'd like these to be removed from google's index as they are causing duplicate content. Should I: a) add 2 new GWT entries for DEV.website.com and NEW.website.com and VERIFY ownership - if I do this, then when the files are pushed to LIVE won't the files contain the VERIFY META CODE for the DEV version even though it's now LIVE? (hope that makes sense) b) write a robots.txt file that specifies "DISALLOW: DEV.website.com/" is that possible? I have only seen examples of DISALLOW with a "/" in the beginning... Hope this makes sense, can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
Technical SEO | | ErnieB0 -
Is blocking RSS Feeds with robots.txt necessary?
Is it necessary to block an rss feed with robots.txt? It seems they are automatically not indexed (http://googlewebmastercentral.blogspot.com/2007/12/taking-feeds-out-of-our-web-search.html) And, google says here that it's important not to block RSS feeds (http://googlewebmastercentral.blogspot.com/2009/10/using-rssatom-feeds-to-discover-new.html) I'm just checking!
Technical SEO | | nicole.healthline0