Stop Google indexing CDN pages
-
Just when I thought I'd seen it all, Google hits me with another nasty surprise!
I have a CDN to deliver images, JS and CSS to visitors around the world. As far as I can tell, I have no links to the static HTML pages on the CDN, but someone else may have. Perhaps a scraper site?
Google has decided the static pages they were able to access through the CDN have more value than my real pages, and they seem to be slowly replacing my pages in the index with the static pages.
Anyone got an idea on how to stop that?
Obviously, I have no access to the static area, because it is in the CDN, so there is no way I know of that I can have a robots file there.
It could be that I have to trash the CDN and change it to only allow the image directory, and maybe set up a separate CDN subdomain for content that only contains the JS and CSS?
Have you seen this problem and beat it?
(Of course the next thing is Roger might look at Google results and start crawling them too, LOL)
P.S. The reason I am not asking this question in the Google forums is that others have asked it many times over the past 5 months, nobody at Google has bothered to answer, and nobody who did try gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet because you guys are always willing to try.
-
Thank you Edward.
I don't have quite that problem, but I think you are right too.
My CDN is set up to be Origin Pull.
That means there is no need to FTP - the system just fetches content as requested.
You should check that out if you have to FTP everything.
But what you said that helped me is this: I should have had one CNAME for images and another CNAME for content, and the content should be limited to a folder called content, so I can put the CSS files and the JS files in it. That way, the plain HTML pages at the root level will never be affected.
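To make that split concrete, here is a minimal sketch of how a template helper might choose the hostname for each file. The hostnames, the extension lists and the `public_url` name are all made up for illustration; the only part taken from the plan above is that CSS and JS go to the CDN only when they live under the content folder, and HTML never does.

```python
from posixpath import splitext

# Hypothetical hostnames: each would be a CNAME pointing at the CDN.
IMAGE_HOST = "images.example.com"
CONTENT_HOST = "content.example.com"
SITE_HOST = "www.example.com"

# Assumed extension lists; adjust to whatever the site actually serves.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif"}
CONTENT_EXTS = {".css", ".js"}


def public_url(path: str) -> str:
    """Return the URL a template should link to for a site-relative path."""
    ext = splitext(path)[1].lower()
    if ext in IMAGE_EXTS:
        return f"http://{IMAGE_HOST}{path}"
    if ext in CONTENT_EXTS and path.startswith("/content/"):
        return f"http://{CONTENT_HOST}{path}"
    # HTML pages and anything else stay on the main site, never on the CDN.
    return f"http://{SITE_HOST}{path}"
```

With this, `public_url("/content/site.css")` points at the content CNAME, while `public_url("/index.html")` stays on the main hostname.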
I also realized, while checking the system, that I wasn't using a canonical tag in the intermediate pages, as I was in the story pages. So I just added code to add canonical tags for all the intermediate pages and the front page.
I do have a few other types of pages, so I will handle the code for them next.
I think adding the canonical tag might fix the problem, but I will also work on reconfiguring the CDN and change over when the action is not too busy, in case it takes a while to propagate.
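For anyone wanting to do the same, this is roughly the idea behind the canonical tag code, as a sketch rather than what I actually run. The hostname and the `canonical_tag` name are placeholders, and dropping the query string is an assumption that only suits pages not keyed by query parameters.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host; use the site's real hostname.
CANONICAL_HOST = "www.example.com"


def canonical_tag(requested_url: str) -> str:
    """Build a canonical link tag that always points at the main site."""
    parts = urlsplit(requested_url)
    # Keep the path, drop query string and fragment, force the main host.
    canonical = urlunsplit(("http", CANONICAL_HOST, parts.path or "/", "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
```

Because the host is forced, a copy of the page fetched through the CDN hostname still names the original page as canonical.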
-
It sounds like you have set up your CDN slightly wrong.
After setting up a few like you have, I realised that I was actually making a complete duplicate of the site rather than just the images or assets.
I imagine you have your origin directory for the CDN in the public html folder.
Create a subdomain, set that as the origin.
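A toy model may help show why the origin matters. This is not any real CDN's API, just a sketch of pull-through caching with made-up file lists: whatever the origin can serve, the CDN hostname can serve too.

```python
from typing import Callable, Dict, Optional


class PullThroughCache:
    """Toy model of an origin-pull CDN edge (not any real CDN's API)."""

    def __init__(self, fetch_from_origin: Callable[[str], Optional[bytes]]):
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}

    def get(self, path: str) -> Optional[bytes]:
        # Serve from cache if present; otherwise pull from the origin once.
        if path not in self._cache:
            body = self._fetch(path)
            if body is None:
                return None  # origin has no such file
            self._cache[path] = body
        return self._cache[path]


# Hypothetical origins: the whole site versus an assets-only subdomain.
whole_site = {"/index.html": b"<html>...</html>", "/img/pike.jpg": b"JPEG"}
assets_only = {"/img/pike.jpg": b"JPEG"}

cdn_on_site_root = PullThroughCache(whole_site.get)
cdn_on_assets = PullThroughCache(assets_only.get)
```

Here `cdn_on_site_root.get("/index.html")` returns the page, which is the duplicate a crawler can find, while `cdn_on_assets.get("/index.html")` returns `None` because the assets origin never had it.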
E.g. I'm working on this site at the moment: http://looksfishy.co.uk/
I have a subdomain called assets: http://assets.looksfishy.co.uk/
The CDN content: http://cdn.looksfishy.co.uk/
Files uploaded here:
http://assets.looksfishy.co.uk/species/holder/pike.jpg
Displayed here:
http://cdn.looksfishy.co.uk/species/holder/pike.jpg
Check the IP addresses on them.
It does make uploading images by FTP a bit of a faff, but it does make your site better.
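If you would rather script that check than look the addresses up by hand, a small sketch along these lines should do. The `resolve_all` helper is made up for illustration; it uses the standard library's `socket.gethostbyname` by default and accepts a different resolver so it can be tried without network access.

```python
import socket
from typing import Callable, Dict, Iterable


def resolve_all(
    hosts: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, str]:
    """Map each hostname to one IPv4 address, or to an error note."""
    results: Dict[str, str] = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[host] = f"lookup failed: {exc}"
    return results
```

Calling `resolve_all(["assets.looksfishy.co.uk", "cdn.looksfishy.co.uk"])` returns a dict you can print. If the CDN is doing its job, the two hostnames should resolve to different addresses.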