Seomoz bar: No Follow and Robots.txt
-
Should the Mozbar pick up "nofollow" links that are handled in robots.txt?
The robots.txt blocks categories, but they still show as followed (green) links when using the Mozbar.
Thanks!
Holly
ETA: I'm assuming that - disallow: myblog.com/category/ - is comparable to the nofollow tag on category?
-
Thank you, Cyrus, for that great article link. As that article states near the end, it touches on a common problem for those of us who assume all the info at SEOmoz is accurate even though it may not be current (not only SEOmoz, to be fair). I've found several instances where even authorities change their minds, or Google changes it for them.
Anyway, it appears that using canonical or meta tags would be the better solution. Unfortunately, neither is possible in Squarespace. I had just about decided to change the robots.txt, get rid of the disallow: /category/, and call it a day. But then I found an example where noindex was used in the robots.txt file of a Squarespace website (specializing in SEM, among other things). Probably the "longest" robots list I've ever seen!
http://www.hunchfree.com/robots.txt
Would it be a good idea to use noindex, FOLLOW in the robots.txt for /category/ (if that's even possible), or just keep with my "call it a day" solution, at least where robots.txt is concerned?
BTW, I posted a similar question about the reasoning behind the robots.txt for Squarespace websites at the developers forum: nothing but crickets. Unless it's about design, things pretty much drop like a rock. Oh well.
-
As Phil pointed out, blocking a URL with robots.txt may keep search engines from crawling your pages, but that doesn't mean they won't index those pages. The meta robots NOINDEX, FOLLOW tag is a much better choice.
Highly recommend the following article that explains this in more detail:
http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
Unfortunately, Squarespace isn't all that flexible when it comes to meta tags. For the most part, Google is getting better at figuring out this kind of duplicate content, but it's best to address it when you can.
-
Thank you so much for the detailed reply. It's REALLY appreciated. The blog you are referring to is the Squarespace company's blog. This disallow on categories IS, however, on any site that uses their service. But I've done a similar search with my personal blog on Squarespace, and a couple of categories still show up in the SERPs anyway. You can edit the robots.txt file if you want, but you have to do a redirect, as you don't have root access.
Unfortunately, I don't think we can include noindex meta tags on a page-by-page basis. You can use it in robots.txt.
It seems there would be a lot more duplicate content issues with tags than with categories, as tags are more granular.
The point of all this is I'm creating new websites for some of our homeschool students and want to get it right from the start with the site architecture and how we use tags and categories with a balanced focus on usability as well as optimizing for search. These kids are super interested in all the reasoning behind things and their questions are tougher than any client! Ha!
Again, Thanks so much and take care,
Holly
-
Thanks for providing some more detail Holly. I definitely think it's applicable to leave here and I'm happy to help.
Some people like to prevent search engines from crawling category pages out of a fear of duplicate content. For example, say you have a post that's at this URL:
site.com/blog/chocolate-milk-is-great.html
and it's also the only post in the category "milk" with this url:
then search engines see the same exact content (your blog post) on two different URLs. Since duplicate content is a big no-no, many people choose to prevent the engines from crawling category pages. Although, in my experience, it's really up to you. Do you feel like your category pages will provide value to users? Would you like them to show up in search results? If so, then make sure you let Google crawl them.
If you DON'T want category pages to be indexed by Google, then I think there's a better choice than using robots.txt. Your best bet is applying the noindex, follow tag to these pages. This tag tells the engines NOT to index this page, but to follow all of the links on it. This is better than robots.txt because robots.txt won't always prevent your site from showing up in search results (that's another long story), but the noindex tag will.
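To make the tag a bit more concrete, here is a minimal Python sketch that reads the meta robots directives out of a page. The sample HTML, the class name `MetaRobotsParser`, and the helper `robots_directives` are made up for illustration; this is only a way to check what a page declares, not how any search engine actually processes it.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of meta robots directives declared in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical category page carrying the tag described above.
SAMPLE_PAGE = """
<html><head>
<title>Category: milk</title>
<meta name="robots" content="noindex, follow">
</head><body>
<a href="/blog/chocolate-milk-is-great.html">Chocolate milk is great</a>
</body></html>
"""

print(sorted(robots_directives(SAMPLE_PAGE)))
```

A page with no meta robots tag returns an empty set, which search engines generally treat as "index, follow" by default. Keep in mind that a crawler has to be able to fetch the page to see the tag at all, so a page that is also blocked in robots.txt never gets its noindex read.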
If I'm not making sense at all then please just let me know :).
Lastly, from what I can see on your site and blog, it doesn't look like the category pages for your blog are actually in your robots.txt file. Have someone do a double check.
To check this myself, I just did a Google search for this URL:
http://blog.squarespace.com/blog/?category=Roadmap
And it showed up in Google right away. Looks like something isn't going according to plan. Don't worry though, that happens all of the time and it should be an easy fix.
-
I know I may wake up one morning and this will all click, but for now perhaps an example will help me get past this initial hurdle.
Squarespace disallows categories in the robots.txt, but using the mozbar I see the category links are green.
So if I understand (partly, anyway), the disallow in robots.txt keeps the bots from crawling those pages when they come knocking at my site. However, are the category links in a blog post still being crawled? Or what's the point?
I'm just trying to understand the reasoning behind disallowing categories and how that should impact the tagging and categorizing of blog posts.
Perhaps I should have started a new question? Or is it applicable to leave it here?
-
The nofollow attribute and robots.txt file serve different purposes.
Nofollow Attribute
This attribute is used to tell search engines, "Don't follow this link", or even "Don't follow any links on this page." It doesn't prevent pages from being indexed, just prevents the search engines from following that link from that particular page.
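As a rough illustration, the sketch below lists the links on a page and flags which ones carry rel="nofollow". The sample HTML and the names `LinkRelParser` and `classify_links` are invented for this example. Presumably a toolbar that colors links as followed or nofollowed is looking at this same attribute on the page, which would explain why a category link without rel="nofollow" shows as followed no matter what robots.txt says.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Records each link's href and whether its rel attribute includes nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, is_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        href = attr_map.get("href")
        if href is None:
            return  # anchors without an href are not links
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = attr_map.get("rel", "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def classify_links(html):
    """Return (href, is_nofollow) for every link in an HTML string."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links


# Hypothetical blog post markup: one plain category link, one nofollowed link.
SAMPLE_POST = """
<p>Filed under <a href="/category/milk">milk</a>.</p>
<p>Source: <a href="http://example.com/" rel="nofollow">example</a></p>
"""

for href, is_nofollow in classify_links(SAMPLE_POST):
    print(href, "nofollow" if is_nofollow else "followed")
```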
Robots.txt
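This file lists the paths on your site that search engine crawlers should not access. It controls crawling rather than indexing, so a blocked URL can still appear in search results if other pages link to it.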
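Here is a small sketch of how a Disallow rule is evaluated, using Python's standard `urllib.robotparser`. The rules and the myblog.com URLs are hypothetical, modeled on the /category/ rule discussed in this thread, and the helper `may_crawl` is just a wrapper added for readability. Note that Disallow values are paths relative to the site root (such as /category/), not full hostnames.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real file lives at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="*"):
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Category pages fall under the Disallow rule; individual posts do not.
print(may_crawl("http://myblog.com/category/milk"))
print(may_crawl("http://myblog.com/blog/chocolate-milk-is-great.html"))
```

The check only answers whether a well-behaved crawler may request a URL. It says nothing about whether that URL can end up in search results, and it has no connection to the rel="nofollow" attribute on links pointing to it.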
To read more about robots.txt check out this page: http://googleblog.blogspot.com/2007/01/controlling-how-search-engines-access.html
For more on Nofollow, check out this page: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569
Hope this helps!