Crawl Budget and Faceted Navigation
-
Hi, we have an ecommerce website with faceted navigation for the various options available.
Google has 3.4 million webpages indexed, many of which are over 90% duplicates.
Due to our low domain authority (15/100), Google is only crawling around 4,500 webpages per day, which we would like to increase.
We know that, in order not to waste crawl budget, we should use robots.txt to disallow parameter URLs (e.g. ?option=, ?search=, etc.). This makes sense, as it would resolve many of the duplicate content issues and force Google to only crawl the main category and product pages.
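For illustration, a minimal robots.txt sketch using the example parameters above (Googlebot supports * wildcards in Disallow rules):

```
# Sketch only - the parameter names are the examples from this question
User-agent: *
# Block URLs whose query string starts with these parameters
Disallow: /*?option=
Disallow: /*?search=
# A parameter appearing later in a query string needs a broader pattern,
# e.g. Disallow: /*&option=
```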
However, Google Search Console shows that these pages get a significant amount of organic traffic every month.
Is it worth disallowing these parameter URLs in robots.txt and hoping that this solves our crawl budget issues, thus helping the most important webpages get indexed and ranked in less time?
Or is there a better solution?
Many thanks in advance.
Lee.
-
Hello, I have also been in a similar situation. What I did was to disallow the URLs with parameters using robots.txt and place (in only the pages with parameters) the following two HTML tags:
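The tags themselves did not survive in the post above; based on the description, they were presumably the standard noindex pair, along these lines:

```html
<!-- Reconstructed sketch: placed in the <head> of the parameter pages only -->
<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">
```

One caveat worth noting: Googlebot can only read these tags on pages it is still allowed to fetch, so if the robots.txt disallow goes live at the same time, the noindex may never be seen.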
This will expressly indicate to Google not to index these pages. I still have some errors, but I guess they will disappear in a few months.
Regards
Related Questions
-
Google Search Console Crawl Errors?
We are using Google Search Console to monitor crawl errors. It seems Google is listing errors that are not actual errors. For instance, it shows this as "Not found": https://tapgoods.com/products/tapgoods__8_ft_plastic_tables_11_available The page does not exist, but we cannot find any pages linking to it. The report has a "Linked From" tab, but if we look at the source of the pages it lists, the link is not there. In this case, it shows the front page (listed twice, once for http and once for https). Also, one of the pages it shows as linking to the non-existent page above is itself a non-existent page. We marked all the errors as fixed last week, and this week they came up again; two-thirds are the same pages we marked as fixed last week. Is this an issue with Google Search Console? Are we being penalized for a non-existent issue?
Intermediate & Advanced SEO | TapGoods
-
How does Tripadvisor ensure all their user reviews get crawled?
Tripadvisor has a LOT of user-generated content. Searching for a random hotel always seems to return a paginated list of 90+ pages. However, once the first page is clicked and "#REVIEWS" is appended to the URL, the URL never changes with any subsequent clicks of the paginated links. How do they ensure that all this review content gets crawled? Thanks, linklater
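A likely explanation, sketched with hypothetical markup (not Tripadvisor's actual code): pagination links can expose real, crawlable URLs in their href attributes even when JavaScript intercepts the click and updates the page in place. Google ignores everything after # in a URL, but it will follow the plain hrefs:

```html
<!-- Hypothetical sketch: loadReviews() and the URLs are invented for illustration -->
<a href="/Hotel_Review-d456-Reviews-or10-Example_Hotel.html"
   onclick="loadReviews(10); return false;">2</a>
<a href="/Hotel_Review-d456-Reviews-or20-Example_Hotel.html"
   onclick="loadReviews(20); return false;">3</a>
```

So a crawler sees one unique URL per review page, even though a user's address bar only ever shows #REVIEWS.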
Intermediate & Advanced SEO | linklater
-
Meta No INDEX and Robots - Optimizing Crawl Budget
Hi, Some time ago, a few thousand pages got into Google's index - they were "product pop-up" pages, exact duplicates of the actual product page but shown as a "quick view". So I deleted them via GWT and also put a meta noindex on these pop-up overlays to stop them being indexed and causing duplicate content issues. They are no longer within the index as far as I can see - I do a site:www.mydomain.com/ajax search and nothing appears. So can I block these off now with robots.txt to optimize my crawl budget? Thanks
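Assuming all of the pop-up pages share the /ajax path (as the site: query above suggests), the block itself is a one-line sketch:

```
# Sketch only - assumes every quick-view pop-up lives under /ajax
User-agent: *
Disallow: /ajax
```

Since the pages have already dropped out of the index, the usual sequencing problem (robots.txt blocking Googlebot before it can see the meta noindex) no longer applies here.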
Intermediate & Advanced SEO | bjs2010
-
SEOmoz crawls all my pages
SEOmoz crawls all my pages, including the ".do" URLs (all web pages reached after sign-up). Because of this, it uses up our entire 10,000-page crawl quota and flags duplicate pages. Google does not crawl the pages a user reaches after signing up - because these are private pages for customers, I guess. The main question is how we can limit the SEOmoz crawl bot. If the bot could stay out of the ".do" Java extensions, that would be perfect for starting the SEO analysis. What do you think? Cheers Example: .do Java extension (after sign-up page) (Google can't crawl) http://magaza.turkcell.com.tr/showProductDetail.do?psi=1001694&shopCategoryId=1000021&model=Apple-iPhone-3GS-8GB Normal page (Google can crawl) http://magaza.turkcell.com.tr/telefon/Apple-iPhone-3GS-8GB/1001694/.html
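Moz's crawler identifies itself as rogerbot, so one option is a robots.txt rule targeting it alone, leaving Googlebot unaffected; a sketch, assuming the crawler honors Googlebot-style wildcards:

```
# Sketch only - keeps Moz's crawler (rogerbot) out of the .do URLs
User-agent: rogerbot
# Matches any URL containing ".do"; adding $ (Disallow: /*.do$) would match
# only URLs that end in .do, but would miss URLs with query strings
Disallow: /*.do
```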
Intermediate & Advanced SEO | hcetinsoy
-
Correlation Between Domain Authority and Crawl Penetration?
A. Is there a correlation between domain authority and crawl penetration?
B. Is there a correlation between domain authority and juice distribution?
Intermediate & Advanced SEO | AWCthreads
-
Navigation
An e-commerce site I am working on currently displays 6 super-categories with a drop-down that contains about 100 categories, which filter down to sub-categories and then the actual products. The issue is that every page starts off with these 100+ links in the navigation alone. I can only assume this is crippling our ability to spread link juice efficiently. I have looked at larger sites that have moved towards side navigation, for example amazon.com, walmart.com, and newegg.com. My issue is that we would like fewer links on the homepage to funnel our incoming links more efficiently, but I cannot figure out how large sites cope with this. As far as I can tell, they use side navigation that disappears after a category is selected, at which point the navigation is replaced with filtering tools (see the sites above). Is this the best way to handle this issue? Also, is there a way to find out exactly what they are doing? I am trying to explain this to our IT person, and I just get the response that our site is fine as it is and these navigation links don't affect anything - even though each page starts off with the same 100 followed navigation links. Thanks
Intermediate & Advanced SEO | MichealGooden
-
Navigation
I've been wrestling with this one for a while. Take a standard small website navigation with nav links for: Products, Solutions, Support, Learning Center. I believe having drop-downs to show the sub-pages of each category provides a better user experience, but it also bloats the links per page in the navigation from 4 to 24. Most of the additional links are useful for user experience, but not for search purposes. So, two years after Google changed how it treats nofollows (which used to be the easy answer to this question), what is considered best practice?
A) Go ahead and add the full 24 nav links on each page. The user experience outweighs the SEO benefit of fewer links, and Google doesn't worry too much about nav links relative to main body links.
B) Stick to only 4 nav options. Having 20 additional links on every page is a big deal, and removing them is worth the user-experience hit. I can still get to all levels of this small site within 2-3 clicks and do cross-category linking to mitigate silos.
C) Use some technical voodoo with JS links or iframes to hide the nav links from Google and get the best of both worlds.
D) Do something that is not one of the first three choices.
Does anyone feel strongly about any of the above options, or is this a user-preference type of situation where it doesn't make much difference which option you choose on a small 100-200 page site? I'm really looking forward to everyone's thoughts on this. -DV
Intermediate & Advanced SEO | dvansant
-
What would cause a drastic drop in pages crawled per day?
The site didn't go down, and there was no drop in rankings or traffic, but we went from averaging 150,000 pages crawled per day to ~1,000 pages crawled per day. We're now back up to ~100,000 crawled per day, but we went more than a week with only 1,000 pages being crawled daily. The question is: what could cause this drastic (but temporary) reduction in pages crawled?
Intermediate & Advanced SEO | Fatwallet