Can I Disallow Faceted Nav URLs - Robots.txt
-
I have been disallowing /*? so I know that works without affecting crawling. I am wondering if I can disallow the faceted nav URLs.
So disallow: /category.html/*? /category2.html/*? /category3.html/*?
To prevent the price faceted URLs from being cached:
/category.html?price=1%2C1000
and
/category.html?price=1%2C1000&product_material=88
Thanks!
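For anyone who wants to check patterns like these before deploying them, here is a minimal Python sketch of the wildcard matching that Google documents for robots.txt: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The function names are made up for illustration, and the sketch deliberately ignores Allow rules and rule precedence, so treat it as a rough check rather than a replacement for Google's own robots.txt tester.

```python
import re
from typing import Iterable


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the end of the URL.
    Everything else is matched literally, starting at the first character.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches the path plus query string.

    Assumption: only Disallow rules are considered (no Allow, no precedence).
    """
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)


if __name__ == "__main__":
    # Hypothetical rule sets to compare against the URLs from the question.
    candidates = {
        "slash before query": ["/category3.html/*?"],
        "any price parameter": ["/*?*price="],
    }
    urls = [
        "/category.html",
        "/category.html?price=1%2C1000",
        "/category.html?price=1%2C1000&product_material=88",
        "/category3.html?price=1%2C1000",
    ]
    for label, rules in candidates.items():
        for url in urls:
            print(label, url, is_blocked(url, rules))
```

Under these assumptions, a rule with a slash before the `?`, such as /category3.html/*?, would not match a URL like /category3.html?price=1%2C1000, because nothing in the URL supplies that extra slash. A hypothetical rule like /*?*price= would match both price URLs from the question while leaving /category.html itself crawlable.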
-
If you can noindex, follow every page but the default, you will still send link juice to those pages, but they will pass it back because their links are followed, and they will not be indexed because they are noindex.
If you use robots.txt, the crawler cannot read the page, so it cannot follow the links on it.
-
Hey Tyler! Haven't seen you on SEOmoz in a while. Hope you are good!
Check to see if this would make sense for you: GWT > Site Configuration > URL Parameters. It says "Only use this feature if you feel confident about how parameters work for your site. Telling Googlebot to exclude URLs with certain parameters could result in large numbers of your pages disappearing from our index."
-
If I can, then I disallow hundreds of pages that are duplicate content and should not be crawled.
If I don't, then I send link juice to URLs that I don't want seen.
This is a good answer though, thanks. Any other thoughts?
-
You can, but then you have links passing link juice to pages that are not followed. It would be better to use a canonical tag. Even better would be to add a noindex, follow meta tag whenever a non-canonical page is displayed, but this requires some coding.
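The noindex, follow idea could be coded roughly like this. It is only a sketch in Python: the parameter names in FACET_PARAMS come from the example URLs in the question, www.example.com is a placeholder host, and the function names are hypothetical. A real store would hook something like this into whatever template renders the page head.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these are the only faceted-navigation parameters on the site.
FACET_PARAMS = {"price", "product_material"}


def is_faceted(url: str) -> bool:
    """True if the URL carries at least one faceted-navigation parameter."""
    query = urlsplit(url).query
    return any(
        name in FACET_PARAMS
        for name, _ in parse_qsl(query, keep_blank_values=True)
    )


def canonical_url(url: str) -> str:
    """Strip facet parameters (and any fragment), keeping other parameters."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in FACET_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def robots_meta(url: str) -> str:
    """Meta robots tag: noindex, follow on faceted pages, index, follow otherwise."""
    content = "noindex, follow" if is_faceted(url) else "index, follow"
    return f'<meta name="robots" content="{content}">'


def canonical_link(url: str) -> str:
    """Canonical link tag pointing at the URL without facet parameters."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


if __name__ == "__main__":
    page = "http://www.example.com/category.html?price=1%2C1000&product_material=88"
    print(robots_meta(page))
    print(canonical_link(page))
```

Whether to emit the canonical tag, the meta robots tag, or both is a judgment call for the site; the sketch only shows how the faceted case can be detected from the URL.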
Related Questions
-
Robots.txt on refinements
In dealing with Panda, do you think it is a good idea to put all refinements for category pages in the robots.txt file? We already have a lot set as noindex, follow, but I am wondering if it would be better to address this from a crawl perspective, as the pages are probably thin duplicate content to Google.
Technical SEO | Gordian | 0
-
Adding multi-language sitemaps to robots.txt
I am working on a revamped multi-language site that has moved to Magento. Each language runs off the core coding, so there are no sub-directories per language. The developer has created sitemaps, which have been uploaded to their respective GWT accounts. They have placed the sitemaps in new directories such as:
/sitemap/uk/sitemap.xml
/sitemap/de/sitemap.xml
I want to add the sitemaps to the robots.txt but can't figure out how to do it. Also, should they have placed the sitemaps in a single location, with the file name identifying each language?
/sitemap/uk-sitemap.xml
/sitemap/de-sitemap.xml
What is the cleanest way of handling these sitemaps, and can/should I get them on robots.txt?
Technical SEO | MickEdwards | 0
-
How can I make Google Webmaster Tools see the robots.txt file when I am doing a .htaccess redirect?
We are moving a site to a new domain. I have set up an .htaccess file and it is working fine. My problem is that Google Webmaster Tools now says it cannot access the robots.txt file on the old site. How can I make it still see the robots.txt file when the .htaccess is doing a full site redirect? .htaccess currently has:
Options +FollowSymLinks -MultiViews
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?michaelswilderhr.com$ [NC]
RewriteRule ^ http://www.s2esolutions.com/ [R=301,L]
Google Webmaster Tools is reporting: Over the last 24 hours, Googlebot encountered 1 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%.
Technical SEO | RalphinAZ | 0
-
Using Robots.txt
I want to block or prevent pages being accessed or indexed by Googlebot. Please tell me if Googlebot will NOT access any URL that begins with my domain name, followed by a question mark, followed by any string, by using the robots.txt below.
Sample URL: http://mydomain.com/?example
User-agent: Googlebot
Disallow: /?
Technical SEO | semer | 0
-
Can URL rewrites fix the problem of critical content too deep in a site's structure?
Good morning from Wetherby UK 🙂 OK, imagine this scenario. You ask the developers to design a site where "offices to let" is on level two of a site's hierarchy, so the URL would look like this: http://www.sandersonweatherall.co.uk/office-to-let. But yikes, when it goes live it ends up like this: http://www.sandersonweatherall.co.uk...s/residential/office-to-let Is a URL rewrite a fix for this? Or is the only fix relocating the office to let content further up the site structure? Any insights welcome 🙂
Technical SEO | Nightwing | 0
-
Should we block URL param in Webmaster tools after URL migration?
Hi, We have just released a new version of our website that now has nice, human-readable URLs. Our old ugly URLs are still accessible and cannot be blocked or redirected. These old URLs use a URL param that has an XPath-like expression language to define the location in our catalog. We have about 2 million pages indexed with this old URL param in them, while we have approximately 70k nice URLs after the migration. This high number of old URLs is due to faceting that was done using this URL param. I wonder if we should now completely block this URL param in Google Webmaster Tools so that these ugly URLs will be removed from the Google index. Or will this harm our position in Google? Thanks, Chris
Technical SEO | eCommerceSEO | 0
-
Confused about robots.txt
There is a lot of conflicting and/or unclear information about robots.txt out there. Somehow, I can't make out what's the best way to use robots even after visiting the official robots website. For example, I have the following format for my robots.txt:
User-agent: *
Disallow: javascript.js
Disallow: /images/
Disallow: /embedconfig
Disallow: /playerconfig
Disallow: /spotlightmedia
Disallow: /EventVideos
Disallow: /playEpisode
Allow: /
Sitemap: http://www.example.tv/sitemapindex.xml
Sitemap: http://www.example.tv/sitemapindex-videos.xml
Sitemap: http://www.example.tv/news-sitemap.xml
Is this correct and/or recommended? If so, then how come I see a list of over 200 or so links blocked by robots when I'm checking out Google Webmaster Tools? Help someone, anyone! Can't seem to understand this robotic business! Regards,
Technical SEO | Netpace | 0
-
Keywords in Vanity URL
If I set up a vanity URL that just 301s to the main site, do the search engines look at the keywords in the vanity URL when determining how to rank the site? For example, if I set up a vanity URL of www.coolnewtechgear.com and redirect it to www.company.com/products/, would the search engines view the keywords cool, new, tech, and gear and associate them with the page it's getting redirected to? Or do they ignore the vanity URL and only look at the content of the page itself?
Technical SEO | ryanwats | 0