Crawl Budget and Faceted Navigation
-
Hi, we have an ecommerce website with faceted navigation for the various options available.
Google has 3.4 million webpages indexed, many of which are over 90% duplicates.
Due to the low domain authority (15/100), Google is only crawling around 4,500 webpages per day, which we would like to increase.
We know that, in order not to waste crawl budget, we should use robots.txt to disallow parameter URLs (i.e. ?option=, ?search=, etc.). This makes sense, as it would resolve many of the duplicate content issues and force Google to crawl only the main category pages, product pages, etc.
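As a rough illustration of the robots.txt idea described above, here is a minimal Python sketch that checks which URLs a pair of wildcard Disallow patterns would block. The parameter names `option` and `search` come from the post; the exact patterns, the `example.com` URLs, and the helper names are assumptions made for illustration, and the matcher only approximates the `*` and `$` pattern syntax that Google documents for robots.txt.

```python
import re
from urllib.parse import urlsplit

# Hypothetical Disallow patterns. Only the parameter names come from the post.
# Caveat: "*option=" also matches longer names such as "suboption=".
DISALLOW_PATTERNS = ["/*?*option=", "/*?*search="]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and join them with '.*' where the '*' wildcards were.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_disallowed(url: str) -> bool:
    """Return True if the URL's path plus query string matches any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(pattern_to_regex(p).match(target) for p in DISALLOW_PATTERNS)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/shoes",
        "https://www.example.com/shoes?option=red",
        "https://www.example.com/shoes?page=2",
    ):
        print(url, "->", "blocked" if is_disallowed(url) else "crawlable")
```

Running a list of the parameter URLs that currently receive organic traffic through a check like this would show, before anything is deployed, which of them such a rule would block.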
However, having looked at the Google Search Console these pages are getting a significant amount of organic traffic on a monthly basis.
Is it worth disallowing these parameter URLs in robots.txt in the hope that this solves our crawl budget issues, thus helping Google index and rank our most important webpages in less time?
Or is there a better solution?
Many thanks in advance.
Lee.
-
Hello, I have been in a similar situation. What I did was disallow the URLs with parameters in robots.txt and place (only on the pages with parameters) the following two HTML tags:
This expressly tells Google not to index these pages. I still have some errors, but I expect they will disappear in a few months.
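The two tags themselves did not survive in this post, so the following is only a sketch under an assumption: that at least one of them was a robots meta tag carrying a `noindex` directive, which is the usual way to ask a search engine not to index a page. The class and function names are hypothetical. The code uses Python's standard `html.parser` to check whether a page's HTML carries such a directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: the directive lives in a robots (or googlebot) meta tag.
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

One point to check when combining the two measures: a crawler that honors a robots.txt Disallow does not fetch the page, so it cannot read a meta tag placed on it.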
Regards