Welcome to the Q&A Forum

YairSpolter

Looks like it caught up with them: http://searchengineland.com/confirmed-google-venture-backed-thumbtack-hit-with-manual-action-for-unnatural-links-222664

YairSpolter

Thanks Robert.

The pages that I'm talking about disallowing do not have rank or links. They are sub-pages of a profile page. If anything, the main page will be linked to, not the sub-pages.

Maybe I should have explained that I'm talking about a large site - around 400K pages. More than 1,000 new pages are created per week. That's why I am concerned about managing crawl budget. The pages that I'm referring to are not linked to anywhere on the site. Sure, Google can potentially get to them if someone decides to link to them on their own site, but this is unlikely and certainly won't happen on a large scale. So I'm not really concerned about about losing pagerank on the main profile page if I disallow them. To be clear: we have many thousands of pages with content that we want to rank. The pages I'm talking about are not important in those terms.

So it's really a question of balance... if these pages (there are MANY of them) are included in the crawl (and in our sitemap), potentially it's a real waste of crawl budget. Doesn't this outweigh the minuscule, far-fetched potential loss?

I understand that Google designed rel=canonical for this scenario, but that does not mean that it's necessarily the best way to go considering the other options.

YairSpolter

Thanks Takeshi.

Maybe I should have explained that I'm talking about a large site - around 400K pages. More than 1,000 new pages are created per week. That's why I am concerned about managing crawl budget. The pages that I'm referring to are not linked to anywhere on the site. Sure, Google can potentially get to them if someone decides to link to them on their own site, but this is unlikely (since it's a sub-page of the main profile page, which is where people would naturally link to) and certainly won't happen on a large scale. So I'm not really concerned about about link-juice evaporation. According to AJ Kohn here, it's not enough to see in Webmaster Tools that Google has indexed all pages on our site. There is also the issue of how often pages are being crawled, which is what we are trying to optimize for.

So it's really a question of balance... if these pages (there are MANY of them) are included in the crawl (and in our sitemap), potentially it's a real waste of crawl budget. Doesn't this outweigh the minuscule, far-fetched potential loss?

Would love to hear your thoughts...

YairSpolter

Thanks for the response, Robert.

I have read lots of SEO advice on maximizing your "crawl budget" - making sure your internal link system is built well to send the bots to the right pages. According to my research, since bots only spend a certain amount of time on your site when they are crawling, it is important to do whatever you can to ensure that they don't "waste time" on pages that are not important for SEO. Just as one example, see this post from AJ Kohn.

Do you disagree with this whole approach?

YairSpolter

When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed?

In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?

YairSpolter

Thanks for the quick response.

If the pages are presently not indexed, is there any advantage to follow/noindex over blocking via robots.php?

I guess my question is whether it's better or worse to have those pages spidered (by definition, any content that appears on these pages exists somewhere else on the site, since it is a search page)... what do you think?

YairSpolter

Hi Mark,

Can you explain why this is better than excluding the pages via robots.txt?

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Moz Q&A is closed.

YairSpolter

@YairSpolter

Posts made by YairSpolter

Products

Moz Solutions

Free SEO Tools

Resources

About Moz

Why Moz

Get Involved