To block with robots.txt or canonicalize?
-
I'm working with an apartment community group that has a large number of communities across the US. I'm running into duplicate content issues where each community has a page such as "amenities" or "community-programs", etc., that is nearly identical (if not exactly identical) across all communities.
I'm wondering if there are any thoughts on the best way to tackle this. The two scenarios I've come up with so far are:
Is it better for me to select the community page with the most authority and put a canonical on all other community pages pointing to that authoritative page?
or
Should I just remove the directory altogether via robots.txt to help keep the site lean and keep low-quality content from impacting the site from a Panda perspective?
Is there an alternative I'm missing?
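To make the two scenarios concrete, here is roughly what each would look like (the URLs and paths below are placeholders, not our real structure):

Scenario 1 (canonical): every community's version of the page would carry a canonical tag pointing at whichever page we pick as authoritative, e.g.
<link rel="canonical" href="https://www.example.com/communities/main-community/amenities/" />

Scenario 2 (robots.txt): the duplicated directories would be blocked from crawling site-wide with something like
User-agent: *
Disallow: /*/amenities/
Disallow: /*/community-programs/
(the exact Disallow patterns would depend on how the community URLs are actually structured)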
-
I think the canonical idea is better than blocking the pages altogether. Depending on how the site is laid out, you could also try to make each page more specific to the location being talked about: add header tags with the location information, and work that info into the page title and meta description as well. If it is not too time-consuming, I'd try to make those pages more unique, especially since you might be getting searches based on a location; location-specific pages may help in that regard.
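As a rough sketch of the location-specific idea (the community name and city below are made up, just to show the pattern):

<title>Amenities at Example Community | Apartments in Austin, TX</title>
<meta name="description" content="Pool, fitness center, and resident programs at Example Community apartments in Austin, TX.">
<h1>Example Community Amenities in Austin, TX</h1>

Even a handful of location-specific details like these can differentiate pages that would otherwise be word-for-word identical.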
Related Questions
-
Robots.txt file issues on Shopify server
We have repeated issues with one of our ecommerce sites not being crawled. We receive the following message: "Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster. Read our troubleshooting guide." Are you aware of an issue with robots.txt on the Shopify servers? It is happening at least twice a month, so it is quite an issue.
Can't work out robots.txt issue.
Hi, I'm getting crawl errors saying Moz isn't able to access my robots.txt file, but it seems completely fine to me. Any chance anyone can help me understand what the issue might be? www.equip4gyms.co
What are Linking C-Blocks?
Currently I am using the Moz Pro tool; under Moz Analytics >> Moz Competitive Link Metrics >> History there is a graph called "Linking C-Blocks". Please help me understand Linking C-Blocks: what they are, how to build them, and how they are defined ...
Will Moz crawl pages blocked by robots.txt and nofollow links?
I have over 2,000 temporary redirects in my campaign report. The redirects are mostly events like being redirected to a login page before showing the actual data. I'm thinking of adding nofollow to the links so Moz won't crawl the redirects, to reduce the notifications. Will this solve my problem?
Robots.txt
I have a page used for reference that lists 150 links to blog articles. I use it in a training area of my website. I now get warnings from Moz that it has too many links, so I decided to disallow this page in robots.txt. Below is what appears in the file:

Robots.txt file for http://www.boxtheorygold.com
User-agent: *
Disallow: /blog-links/

My understanding is that this simply has Google bypass the page and not crawl it. However, in Webmaster Tools, I used the Fetch tool to check a couple of my blog articles. One returned an expected result. The other returned a result of "access denied" due to robots.txt. Both blog article links are listed on the /blog-links/ reference page. Question: Why does Google refuse to crawl the one article (using the Fetch tool) when it is not referenced at all in the robots.txt file? Why is access denied? Should I have used a noindex on this page instead of robots.txt? I am fearful that robots.txt may be blocking many of my blog articles. Please advise. Thanks,
Ron
Meta-Robots noFollow and Blocked by Meta-Robots
On my most recent campaign report, I have 2 Notices that we can't find any cause for:

Meta-Robots nofollow -
http://www.fateyes.com/the-effect-of-social-media-on-the-serps-social-signals-seo/?replytocom=92
"noindex nofollow" for the page: http://www.fateyes.com/the-effect-of-social-media-on-the-serps-social-signals-seo/

Blocked by Meta-Robots -
http://www.fateyes.com/the-effect-of-social-media-on-the-serps-social-signals-seo/?replytocom=92
"noindex nofollow" for the page: http://www.fateyes.com/the-effect-of-social-media-on-the-serps-social-signals-seo/

We are unable to locate any code whatsoever that may explain this. Any ideas, anyone?
Rogerbot Ignoring Robots.txt?
Hi guys, we're trying to block Rogerbot from spending 8000-9000 of our 10000 pages per week for our site crawl on our zillions of PhotoGallery.asp pages. Unfortunately, our e-commerce CMS isn't tremendously flexible, so the only way we believe we can block Rogerbot is in our robots.txt file. Rogerbot keeps crawling all these PhotoGallery.asp pages, so it's making our crawl diagnostics really useless. I've contacted the SEOMoz support staff and they claim the problem is on our side. This is the robots.txt we are using:

User-agent: rogerbot
Disallow:/PhotoGallery.asp
Disallow:/pindex.asp
Disallow:/help.asp
Disallow:/kb.asp
Disallow:/ReviewNew.asp

User-agent: *
Disallow:/cgi-bin/
Disallow:/myaccount.asp
Disallow:/WishList.asp
Disallow:/CFreeDiamondSearch.asp
Disallow:/DiamondDetails.asp
Disallow:/ShoppingCart.asp
Disallow:/one-page-checkout.asp

Sitemap: http://store.jrdunn.com/sitemap.xml

For some reason the WYSIWYG editor is entering extra spaces, but those are all single-spaced. Any suggestions? The only other thing I thought of trying is something like "Disallow:/PhotoGallery.asp*" with a wildcard.
What are the names of the SEOmoz and Open Site Explorer robots?!
I would like to exclude the SEOmoz and Open Site Explorer bots in robots.txt so they don't index my sites… what are their names?