Robots.txt: Link Juice vs. Crawl Budget vs. Content 'Depth'
-
I run a quality vertical search engine. About 6 months ago we had a problem with our sitemaps, which resulted in most of our pages getting tossed out of Google's index. As part of the response, we put a bunch of robots.txt restrictions in place on our search results to prevent Google from crawling through pagination links and other parameter-based variants of our results (sort order, etc.). The idea was to 'preserve crawl budget' and speed up the rate at which Google could get our millions of pages back in the index by focusing attention/resources on the right pages.
The pages are back in the index now (and have been for a while), and the restrictions have stayed in place since that time. But, in doing a little SEOMoz reading this morning, I came to wonder whether that approach may now be harming us...
http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo
http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
Specifically, I'm concerned that a) we're blocking the flow of link juice and that b) by preventing Google from crawling the full depth of our search results (i.e. pages >1), we may be making our site wrongfully look 'thin'. With respect to b), we've been hit by Panda and have been implementing plenty of changes to improve engagement, eliminate inadvertently low quality pages, etc., but we have yet to find 'the fix'...
Thoughts?
Kurus
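For anyone who wants to check what a set of restrictions like the ones described above actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and URLs are hypothetical (the real robots.txt is not shown in this thread), and they use plain path prefixes because, as far as I know, this parser does simple prefix matching and does not implement Google's wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the thread does not show the site's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /results/page/
Disallow: /results/sort/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def blocked_urls(parser, urls, agent="Googlebot"):
    """Return the URLs that the parsed rules forbid the given agent to crawl."""
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://example.com/results/widgets",
        "https://example.com/results/page/2",
        "https://example.com/results/sort/price",
    ]
    for url in blocked_urls(build_parser(ROBOTS_TXT), candidates):
        print("blocked:", url)
```

Running a list of known-good URLs through a check like this before deploying a rule change is a cheap way to confirm that only the intended variants are excluded.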
-
I always advise people NOT to use robots.txt to block off pages; it isn't the best way to handle things. In your case, there are two options you can consider:
1. For variant pages (multiple parameter versions of the same page), use rel canonical to strengthen the original page and keep the variants out of the index.
2. This one is controversial, and many may disagree, but it depends on the situation: allow crawling of the page, but don't allow indexing. Using "noindex, follow" still passes any juice but keeps pages you don't want in the SERPs out of the index. I normally do this for search result pages that get indexed...
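The two options above can be sketched roughly as follows. This is only an illustration under stated assumptions: the parameter names (`sort`, `order`, `view`, `page`) and the example URLs are made up, and applying "noindex, follow" to pages beyond the first is one possible policy, not something the site is known to do.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; adjust to whatever the site really uses.
VARIANT_PARAMS = {"sort", "order", "view"}
PAGE_PARAM = "page"


def canonical_url(url):
    """Option 1: drop presentation-only parameters to get the canonical URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in VARIANT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the rel canonical link element for the page at `url`."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)


def robots_meta(url):
    """Option 2: return a noindex, follow tag for pages beyond the first."""
    params = dict(parse_qsl(urlsplit(url).query))
    if params.get(PAGE_PARAM, "1") != "1":
        return '<meta name="robots" content="noindex, follow">'
    return None  # First page: no robots meta tag, so default indexing applies.


if __name__ == "__main__":
    page = "https://example.com/search?q=shoes&sort=price&page=2"
    print(canonical_tag(page))
    print(robots_meta(page))
```

Note that either tag only works if the page can be crawled: a URL blocked in robots.txt is never fetched, so the crawler never sees its canonical or noindex tag.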
-
I got disconnected by SEOmoz as I posted, so here is the short answer:
You were affected by Panda, so you may have pages with almost no content. These pages may be the ones using up crawl budget, much more than the paginated results. Worry about these low-value pages and let Google handle the paginated results.
-
Baptiste,
Thanks for the feedback. Can you clarify what you mean by the following?
"On a side note, if you were impacted by Panda, I would strongly suggest to remove / disallow the empty pages on your site. This will give you more crawl budget for interesting content."
-
I would not dig too deep into the crawl budget and pagination problem: Google knows what pagination is and will increase the crawl budget when necessary. As for your site looking 'thin', I think you're right, and I would immediately allow pages > 1 to be indexed.
Be aware that this may or may not have a big impact on your site; it depends on the navigation system (you may have a lot of paginated subsets).
What do site: queries show? Are all of your items submitted in your sitemaps and indexed (see WMT)?
On a side note, if you were impacted by Panda, I would strongly suggest to remove / disallow the empty pages on your site. This will give you more crawl budget for interesting content.