I have a real estate company (www.company.com) with approximately 400 agents.
When an agent is hired, we allow them to pick a domain, which we then register and manage. For example: www.agent1.com
We then take each agent domain and 301-redirect it to a subdomain of our main site. For example:
agent1.com 301s to agent1.company.com
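For reference, this is an ordinary permanent (301) redirect at the web-server level. A minimal sketch of what it could look like in an Apache virtual host (the domain names are just the placeholders from above; our actual setup may differ):

    <VirtualHost *:80>
        ServerName agent1.com
        ServerAlias www.agent1.com
        # Permanently (301) redirect every request on the vanity domain
        # to the matching path on the agent subdomain
        Redirect permanent / https://agent1.company.com/
    </VirtualHost>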
We have each page on the agent subdomain canonicalized back to the corresponding page on www.company.com.
For example: agent1.company.com canonicalizes to www.company.com
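To make that concrete, each subdomain page carries a canonical link element in its head pointing at the matching page on the main site, along these lines (the URLs here are placeholder examples):

    <!-- In the <head> of https://agent1.company.com/category1 -->
    <link rel="canonical" href="https://www.company.com/category1" />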
What happened is that Google indexed many URLs on the subdomains, and it seemed like Google ignored the canonical in many cases. Although these URLs were being crawled and indexed by Google, I never noticed any of them rank in the results.
My theory is that Google crawled the subdomain first, indexed the page, and then later crawled the main URL. At that point in time, the two pages actually looked quite different from one another, so Google did not recognize/honor the canonical. For example:
agent1.company.com/category1 gets crawled on day 1
www.company.com/category1 gets crawled 5 days later
The content (recently listed properties for sale) on these category pages changes every day. If Google crawled both the subdomain page and the main-domain page on the same day, the content would look identical. If the URLs are crawled on different days, the content will not match.
We had some major issues (duplicate content and site speed) on our www.company.com site that needed immediate attention. We knew we had an issue with the agent subdomains and decided to block crawling of the subdomains in the robots.txt file until we got the main site “fixed”.
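For clarity, the block is a blanket disallow in the robots.txt served on each agent subdomain, roughly like this (a sketch of the idea, not the exact file we deployed):

    # robots.txt on agent1.company.com (and every other agent subdomain)
    User-agent: *
    Disallow: /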
We have seen a small decrease in organic traffic from Google to our main site since blocking crawling of the subdomains. With Bing, however, our traffic has dropped almost 80%.
After a couple of months, we now have our main site mostly “fixed”, and I want to figure out how to handle the subdomains in order to regain the lost organic traffic. My theory is that these subdomains have some link juice that is basically being wasted as long as the robots.txt block on the subdomains stays in place.
Here is my question:
If we put a robots noindex meta tag on all pages of the subdomains and leave the canonical (pointing to the corresponding page on the main site) in place on each of those pages, will link juice flow to the canonical version?
Basically, I want the link juice from the subdomains to pass to our main site, but I do not want those pages competing with our main site for a spot in the search results.
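Concretely, each subdomain page would then carry both tags in its head, something along these lines (again, the URLs are placeholders, and I am assuming the meta robots form of noindex):

    <!-- In the <head> of https://agent1.company.com/category1 -->
    <meta name="robots" content="noindex" />
    <link rel="canonical" href="https://www.company.com/category1" />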
Another thought I had was to place the noindex tag only on the category pages (the ones that change every day) and leave it off the product pages (the property detail pages, which rarely ever change).
Thank you in advance for any insight.