CDN Being Crawled and Indexed by Google
-
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases.
It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt?
Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL.
Thanks!
Scott
-
It sounds like you got a hold of the problem.
Verify the subdomains in WMT
Block the CDN subdomains with robots.txt
Request site removal in WMT for the subdomains
make the canonicals absolute
Keep the blocked subdomains in WMT, when you log in you will see a message by the subdomains saying "Critical issue with your site" which is just telling you that the site is blocked.. I like to keep it in there so I can see it's still blocked.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google is indexing bad URLS
Hi All, The site I am working on is built on Wordpress. The plugin Revolution Slider was downloaded. While no longer utilized, it still remained on the site for some time. This plugin began creating hundreds of URLs containing nothing but code on the page. I noticed these URLs were being indexed by Google. The URLs follow the structure: www.mysite.com/wp-content/uploads/revslider/templates/this-part-changes/ I have done the following to prevent these URLs from being created & indexed: 1. Added a directive in my Htaccess to 404 all of these URLs 2. Blocked /wp-content/uploads/revslider/ in my robots.txt 3. Manually de-inedex each URL using the GSC tool 4. Deleted the plugin However, new URLs still appear in Google's index, despite being blocked by robots.txt and resolving to a 404. Can anyone suggest any next steps? I Thanks!
Technical SEO | | Tom3_150 -
Sudden Indexation of "Index of /wp-content/uploads/"
Hi all, I have suddenly noticed a massive jump in indexed pages. After performing a "site:" search, it was revealed that the sudden jump was due to the indexation of many pages beginning with the serp title "Index of /wp-content/uploads/" for many uploaded pieces of content & plugins. This has appeared approximately one month after switching to https. I have also noticed a decline in Bing rankings. Does anyone know what is causing/how to fix this? To be clear, these pages are **not **normal /wp-content/uploads/ but rather "index of" pages, being included in Google. Thank you.
Technical SEO | | Tom3_150 -
Google crawling but not indexing for no apparent reason
Client's site went secure about two months ago and chose root domain as rel canonical (so site redirects to https://rootdomain.com (no "www"). Client is seeing the site recognized and indexed by Google about every 3-5 days and then not indexed until they request a "Fetch". They've been going through this annoying process for about 3 weeks now. Not sure if it's a server issue or a domain issue. They've done work to enhance .htaccess (i.e., the redirects) and robots.txt. If you've encountered this issue and have a recommendation or have a tech site or person resource to recommend, please let me know. Google search engine results are respectable. One option would be to do nothing but then would SERPs start to fall without requesting a new Fetch? Thanks in advance, Alan
Technical SEO | | alankoen1230 -
How to check if an individual page is indexed by Google?
So my understanding is that you can use site: [page url without http] to check if a page is indexed by Google, is this 100% reliable though? Just recently Ive worked on a few pages that have not shown up when Ive checked them using site: but they do show up when using info: and also show their cached versions, also the rest of the site and pages above it (the url I was checking was quite deep) are indexed just fine. What does this mean? thank you p.s I do not have WMT or GA access for these sites
Technical SEO | | linklander0 -
Sitemap indexation
3 days ago I sent in a new sitemap for a new platform. Its 23.412 pages but until now its only 4 pages (!!) that are indexed according to the Webmaster Tools. Why so few? Our stage-enviroment got indexed (more than 50K pages) in a few days by a mistake.
Technical SEO | | Morten_Hjort0 -
Indexing Problem
My URL is: www.memovalley.comWe have submitted our sitemap last month and we are having issues seeing our URLs listed in the search results. Even though our sitemaps contain over 200 URLs, we only currently only have 7 listed (excluding blog.memovalley.com).Can someone help us with this? | |
Technical SEO | | Memovalley
| | | | It looks like Googlebot has timed out, at least once, for one of our URLs. Why is Googlebot timing out? My server is located at Amazon WS, in North Carolina and it is a small instance. Could Google be querying multiple URLs at the same time and jamming my servers? Could it be becauseThanks for your help!0 -
Odd Google Indexing Issue
I have encountered something odd with Google indexing. According to the Google cache my site was last updated on April 6. I had been making a series of changes on April 7th and none of them show up in the cached version of the site (naturally). Then, on the 8th, my rankings seem to have dropped about 6 places and the main SERP is showing a text that isn't even on the Web site. The cached version has the correct page title from the page that was indexed on the 6th. How do I learn where Google is picking this up from? There is a clean page title tag on my Web site. I've checked the server, etc to see what's going on. The text isn't completely unrelated, but it definitely impacted my ranking. Does Google ever have these hiccups when indexing?
Technical SEO | | VERBInteractive0 -
Google Indexed URLs for Terms Have Changed Causing Huge SERP Drop
We haven't made any significant changes to our website, however the pages that google has indexed for our critical keywords have changed to pages that have caused our SERP to drop dramatically for those pages. In some cases, the changes make no sense at all. For example, one of our terms that used to be indexed to our homepage is now indexed to a dead category page that has nothing on it. One of our biggest terms, where we were 9th, changed and is now indexed to our FAQ. As a result, we now rank 44th. This is having a MAJOR impact on our business so any help on why this sudden change happened and what we can do to combat it is greatly appreciated.
Technical SEO | | EvergladesDirect0