Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Stop google indexing CDN pages
-
Just when I thought I'd seen it all, google hits me with another nasty surprise!
I have a CDN to deliver images, js and css to visitors around the world. I have no links to static HTML pages on the site, as far as I can tell, but someone else may have - perhaps a scraper site?
Google has decided the static pages they were able to access through the CDN have more value than my real pages, and they seem to be slowly replacing my pages in the index with the static pages.
Anyone got an idea on how to stop that?
Obviously, I have no access to the static area, because it is in the CDN, so there is no way I know of that I can have a robots file there.
It could be that I have to trash the CDN and change it to only allow the image directory, and maybe set up a separate CDN subdomain for content that only contains the JS and CSS?
Have you seen this problem and beat it?
(Of course the next thing is Roger might look at google results and start crawling them too, LOL)
P.S. The reason I am not asking this question in the google forums is that others have asked this question many times and nobody at google has bothered to answer, over the past 5 months, and nobody who did try, gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet because you guys are always willing to try.
-
Thank you Edward.
I don't have quite that problem, but I think you are right too.
My CDN is set up to be Origin Pull.
That means there is no need to FTP - the system just fetches content as requested.
- you should check that out if you have to ftp everything.
But what you said that helped me is this - that I should have had one CNAME for images and anotehr CNAME for content and the content should be limited to a folder called content, so I can put the CSS files and the JS files in it and that way, the plain HTML pages at teh root level will never be affected.
I also realized, while checking the system, that I wasn't using a canonical tag in the intermediate pages, as I was in the story pages. So I just added code to add canonical tags for all the intermediate pages and the front page.
I do have a few other types of pages, so I will handle the code for them next.
I think adding the canonical tag might fix the problem, but I will also work on reconfiguring the CDN and change over when the action is not too busy, in case it takes a while to propagate.
-
It sounds like you have set up your CDN slightly wrong.
After setting up a few like you have I realised that I was actually making a complete duplicate of the site rather than just the images or assets
I imagine you have your origin directory for the CDN in the public html folder.
Create a subdomain, set that as the origin.
Eg.. I'm working on this site at the moment: http://looksfishy.co.uk/
I have a subdomain called assets: http://assets.looksfishy.co.uk/
The cdn content: http://cdn.looksfishy.co.uk/
Files uploaded here:
http://assets.looksfishy.co.uk/species/holder/pike.jpg
Displayed here:
http://cdn.looksfishy.co.uk/species/holder/pike.jpg
Check the ip address on them.
It does make uploading images by ftp a bit of a faff, but does make your site better
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Cache
So, when I gain a link I always check to see if the page that is linking is in the Google cache. I've noticed recently that more and more pages are actually not showing up in Google's cache, yet still appear in search results. I did read an article from someone whoo works at Google a few weeks back that there is sometimes an error with the cache and occasionally the cache will not display. This week, my own website isn't showing up in the cache yet I'm still ranking in SERP's. I'm not worried about it, mostly whitehat, but has there been any indication that Google are phasing out the ability to check cache's of websites?
Algorithm Updates | | ThorUK0 -
More pages or less pages for best SEO practices?
Hi all, I would like to know the community's opinion on this. A website with more pages or less pages will rank better? Websites with more pages have an advantage of more landing pages for targeted keywords. Less pages will have advantage of holding up page rank with limited pages which might impact in better ranking of pages. I know this is highly dependent. I mean to get answers for an ideal website. Thanks,
Algorithm Updates | | vtmoz1 -
Sitemaps for landing pages
Good morning MOZ Community, We've been doing some re-vamping recently on our primary sitemap, and it's currently being reindexed by the search engines. We have also been developing landing pages, both for SEO and SEM. Specifically for SEO, the pages are focused on specific, long-tail search terms for a number of our niche areas of focus. Should I, or do I need to be considering a separate sitemap for these? Everything I have read about sitemaps simply indicates that if a site has over 50 thousand pages or so, then you need to split a sitemap. Do I need to worry about a sitemap for landing pages? Or simply add them to our primary sitemap? Thanks in advance for your insights and advice.
Algorithm Updates | | bwaller0 -
US domain pages showing up in Google UK SERP
Hi, Our website which was predominantly for UK market was setup with a .com extension and only two years ago other domains were added - US (.us) , IE (.ie), EU (.eu) & AU (.com.au) Last year in July, we noticed that few .us domain urls were showing up in UK SERPs and we realized the sitemap for .us site was incorrectly referring to UK (.com) so we corrected that and the .us domain urls stopped appearing in the SERP. Not sure if this actually fixed the issue or was such coincidental. However in last couple of weeks more than 3 .us domain urls are showing for each brand search made on Google UK and sometimes it replaces the .com results all together. I have double checked the PA for US pages, they are far below the UK ones. Has anyone noticed similar behaviour &/or could anyone please help me troubleshoot this issue? Thanks in advance, R
Algorithm Updates | | RaksG0 -
Google Index
Hi all, I just submit my url and linked pages along with xml map to index. How long does it take google to index my new pages?
Algorithm Updates | | businessowner0 -
Homepage Index vs Home vs Default?
Should your home page be www.yoursite.com/index.htm or home.htm or default.htm on an apache server? Someone asked me this, and I have no idea. On our wordpress site, I have never even seen this come up, but according to my friend, every homepage HAS to be one of those three. So my question is which one is best for an apache server site AND does it actually have to be one of those three? Thanks, Ruben
Algorithm Updates | | KempRugeLawGroup0 -
Does a KML file have to be indexed by Google?
I'm currently using the Yoast Local SEO plugin for WordPress to generate my KML file which is linked to from the GeoSitemap. Check it out http://www.holycitycatering.com/sitemap_index.xml. A competitor of mine just told me that this isn't correct and that the link to the KML should be a downloadable file that's indexed in Google. This is the opposite of what Yoast is saying... "He's wrong. 🙂 And the KML isn't a file, it's being rendered. You wouldn't want it to be indexed anyway, you just want Google to find the information in there. What is the best way to create a KML? Should it be indexed?
Algorithm Updates | | projectassistant1 -
Using Brand Name in Page titles
Is it a good practice to append our brand name at the end of every page title? We have a very strong brand name but it is also long. Right now what we are doing is saying: Product Name | Long brand name here Product Category | Long brand name here Is this the right way to do it or should we just be going with ONLY the product and category names in our page titles? Right now we often exceed the 70 character recommendation limit.
Algorithm Updates | | mlentner1