Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can I Disallow Faceted Nav URLs - Robots.txt
-
I have been disallowing /*? So I know that works without affecting crawling. I am wondering if I can disallow the faceted nav urls.
So disallow: /category.html/? /category2.html/? /category3.html/*?
To prevent the price faceted url from being cached:
/category.html?price=1%2C1000
and
/category.html?price=1%2C1000&product_material=88Thanks!
-
If you can no-index , follow all but the default, then you will send link juice to the pages but it will return the link juice because it is follow, but they will not index because they are no-index.
If you use robots, then it can not read the page to follow the links.
-
Hey Tyler! haven't seen you on SEOmoz in a while. Hope you are good!
Check to see if this would make sense for you. GWT > Site Configuration > URL Perameters. It says "Only use this feature if you feel confident about how parameters work for your site. Telling Googlebot to exclude URLs with certain parameters could result in large numbers of your pages disappearing from our index."
-
If I can, then I disallow hundreds of pages that are duplicate content and should not be crawled.
If I don't then I send link juice to urls that I don't want seen.
This is a good answer though, thanks. Any other thoughts?
-
You can, but then you have links passing link juice to non followed pages. it would be better if you used canonical. even better would be to add no-index, follow meta tag when non canonical page is displayed, but this requres some codeing.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt allows wp-admin/admin-ajax.php
Hello, Mozzers!
Technical SEO | | AndyKubrin
I noticed something peculiar in the robots.txt used by one of my clients: Allow: /wp-admin/admin-ajax.php What would be the purpose of allowing a search engine to crawl this file?
Is it OK? Should I do something about it?
Everything else on /wp-admin/ is disallowed.
Thanks in advance for your help.
-AK:2 -
One robots.txt file for multiple sites?
I have 2 sites hosted with Blue Host and was told to put the robots.txt in the root folder and just use the one robots.txt for both sites. Is this right? It seems wrong. I want to block certain things on one site. Thanks for the help, Rena
Technical SEO | | renalynd270 -
Good robots txt for magento
Dear Communtiy, I am trying to improve the SEO ratings for my website www.rijwielcashencarry.nl (magento). My next step will be implementing robots txt to exclude some crawling pages.
Technical SEO | | rijwielcashencarry040
Does anybody have a good magento robots txt for me? And what need i copy exactly? Thanks everybody! Greetings, Bob0 -
Blocking Affiliate Links via robots.txt
Hi, I work with a client who has a large affiliate network pointing to their domain which is a large part of their inbound marketing strategy. All of these links point to a subdomain of affiliates.example.com, which then redirects the links through a 301 redirect to the relevant target page for the link. These links have been showing up in Webmaster Tools as top linking domains and also in the latest downloaded links reports. To follow guidelines and ensure that these links aren't counted by Google for either positive or negative impact on the site, we have added a block on the robots.txt of the affiliates.example.com subdomain, blocking search engines from crawling the full subddomain. The robots.txt file is the following code: User-agent: * Disallow: / We have authenticated the subdomain with Google Webmaster Tools and made certain that Google can reach and read the robots.txt file. We know they are being blocked from reading the affiliates subdomain. However, we added this affiliates subdomain block a few weeks ago to the robots.txt, but links are still showing up in the latest downloads report as first being discovered after we added the block. It's been a few weeks already, and we want to make sure that the block was implemented properly and that these links aren't being used to negatively impact the site. Any suggestions or clarification would be helpful - if the subdomain is being blocked for the search engines, why are the search engines following the links and reporting them in the www.example.com subdomain GWMT account as latest links. And if the block is implemented properly, will the total number of links pointing to our site as reported in the links to your site section be reduced, or does this not have an impact on that figure?From a development standpoint, it's a much easier fix for us to adjust the robots.txt file than to change the affiliate linking connection from a 301 to a 302, which is why we decided to go with this option.Any help you can offer will be greatly appreciated.Thanks,Mark
Technical SEO | | Mark_Ginsberg0 -
What can I do if my reconsideration request is rejected?
Last week I received an unnatural link warning from Google. Sad times. I followed the guidelines and reviewed all my inbound links for the last 3 months. All 5000 of them! Along with several genuine ones from trusted sites like BBC, Guardian and Telegraph there was a load of spam. About 2800 of them were junk. As we don't employ any SEO agency and don't buy links (we don't even buy adwords!) I know that all of this spam is generated by spam bots and site scrapers copying our content. As the bad links have not been created by us and there are 2800 of them I cannot hope to get them removed. There are no 'contact us' pages on these Russian spam directories and Indian scraper sites. And as for the 'adult book marking website' who have linked to us over 1000 times, well I couldn't even contact that site in company time if I wanted to! As a result i did my manual review all day, made a list of 2800 bad links and disavowed them. I followed this up with a reconsideration request to tell Google what I'd done but a week later this has been rejected "We've reviewed your site and we still see links to your site that violate our quality guidelines." As these links are beyond my control and I've tried to disavow them is there anything more to be done? Cheers Steve
Technical SEO | | SteveBrumpton0 -
No indexing url including query string with Robots txt
Dear all, how can I block url/pages with query strings like page.html?dir=asc&order=name with robots txt? Thanks!
Technical SEO | | HMK-NL0 -
Drupal URL Aliases vs 301 Redirects + Do URL Aliases create duplicates?
Hi all! I have just begun work on a Drupal site which heavily uses the URL Aliases feature. I fear that it is creating duplicate links. For example:: we have http://www.URL.com/index.php and http://www.URL.com/ In addition we are about to switch a lot of links and want to keep the search engine benefit. Am I right in thinking URL aliases change the URL, while leaving the old URL live and without creating search engine friendly redirects such as 301s? Thanks for any help! Christian
Technical SEO | | ChristianMKTG0 -
How do you disallow HTTPS?
I currently have a site (startuploans.org) that runs everything as http, recently we decided to start an online application to process loan apps. Now, for one certain section we configured ssl to work (https://www.startuploans.org/secure/). If I go to the HTTPS url for any of my other pages they show up...I was going to just 301 everything from https but because it is in a subdirectiory I can't... Also, canonical URL's won't work either because it's a totally different system and the pages are generated in an odd manor. It's really just 1 page that needs to be disallowed.. Is there any way to disallow all HTTPS requests from robots.txt while keeping all the HTTP requests working as normal?
Technical SEO | | WebsiteConsultants0