CDN Being Crawled and Indexed by Google
-
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases.
It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt?
Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL.
Thanks!
Scott
-
It sounds like you got a hold of the problem.
Verify the subdomains in WMT
Block the CDN subdomains with robots.txt
Request site removal in WMT for the subdomains
make the canonicals absolute
Keep the blocked subdomains in WMT, when you log in you will see a message by the subdomains saying "Critical issue with your site" which is just telling you that the site is blocked.. I like to keep it in there so I can see it's still blocked.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Index problems
“The website http://www.vaneyckshutters.com/nl/ does not show in the index of Google (site:vaneyckshutters.com/nl/). This must be the homepage in the Netherlands. Previously, the page www.vaneyckshutters.com was redirected to /nl/. This page is accessible now with a canonical tag to http://www.vaneyckshutters.com/nl/ in the hope to let /nl/ be indexed. When we look at the SERPS for keyword ‘shutters’, the page http://www.vaneyckshutters.com/ is shown in Google.nl on #32 and in Belgium #3. Problem & question: Why is it that /nl/ has not been indexed properly and why is it that we rank with http://www.vaneyckshutters.com on ‘shutters’ instead the/nl/ page?”
Technical SEO | | Happy-SEO1 -
Crawl issues
Hello there, I have found that when crawling my site I have errors regarding the meta description and it says it is missing from few pages. I checked these pages but there is a meta description. I also ran the same report with other tools and it comes up the same issues. What should I do?
Technical SEO | | PremioOscar0 -
How do I get my pages to go from "Submitted" to "Indexed" in Google Webmaster Tools?
Background: I recently launched a new site and it's performing much better than the old site in terms of bounce rate, page view, pages per session, session duration, and conversions. As suspected, sessions, users, and % new sessions are all down. Which I'm okay with because the the old site had a lot of low quality traffic going to it. The traffic we have now is much more engaged and targeted. Lastly, the site was built using Squarespace and was launched the middle of August. **Question: **When reviewing Google Webmaster Tools' Sitemaps section, I noticed it says 57 web pages Submitted, but only 5 Indexed! The sitemap that's submitted seems to be all there. I'm not sure if this is a Squarespace thing or what. Anyone have any ideas? Thanks!!
Technical SEO | | Nate_D0 -
Sitemap indexation
3 days ago I sent in a new sitemap for a new platform. Its 23.412 pages but until now its only 4 pages (!!) that are indexed according to the Webmaster Tools. Why so few? Our stage-enviroment got indexed (more than 50K pages) in a few days by a mistake.
Technical SEO | | Morten_Hjort0 -
About how google works
Hey guyz,
Technical SEO | | atakala
I want to ask a basic question. If I search for Larry Page lets say.
I think google look for it's index for word larry and page distinct.
And mix it up. But the question ;
Can google show a result which only Larry exist on the page but any of the synonym or the stem of the Page not exist .
If it can happen how this page can be showned in larry page query. Thank you.0 -
Website is not indexed in Google
Hi Guys, I have a problem with a website from a customer. His website is not indexed in Google (except for the homepage). I could not find anything that can possibly be the cause. I already checked the robots.txt, sitemap, and plugins on the website. In the HTML code i also couldn't find anything which makes indexing harder than usual. This is the website i am talking about: http://www.xxxx.nl/ (Dutch) The only thing that i am guessing now is the Google sandbox, but even that is quite unlikely. I hope you guys discover something i could not find! Thanks in advance 🙂
Technical SEO | | B.Great0 -
Google Index Speed Opinions
Hello Everyone, Under normal circumstances, new posts to my site are indexed almost instantly by Google. I know this because an occasional search with quotation marks surrounding the 1st paragraph of text displays my newly published page. I use this tactic from time to time to ensure contributors aren't syndicating content. My question is this: I've noticed over the last day or so that my newly published articles are not yet indexed. For example, an article that was published over 24 hours ago does not appear to be indexed yet. Is this cause for concern? Is there an average wait time for indexation? XML issue? Thanks in advance for the help/insight.
Technical SEO | | JSOC0 -
Google crawl index issue with our website...
Hey there. We've run into a mystifying issue with Google's crawl index of one of our sites. When we do a "site:www.burlingtonmortgage.biz" search in Google, we're seeing lots of 404 Errors on pages that don't exist on our site or seemingly on the remote server. In the search results, Google is showing nonsensical folders off the root domain and then the actual page is within that non-existent folder. An example: Google shows this in its index of the site (as a 404 Error page): www.burlingtonmortgage.biz/MQnjO/idaho-mortgage-rates.asp The actual page on the site is: www.burlingtonmortgage.biz/idaho-mortgage-rates.asp Google is showing the folder MQnjO that doesn't exist anywhere on the remote. Other pages they are showing have different folder names that are just as wacky. We called our hosting company who said the problem isn't coming from them... Has anyone had something like this happen to them? Thanks so much for your insight!
Technical SEO | | ILM_Marketing
Megan0