Stop Google indexing CDN pages
-
Just when I thought I'd seen it all, Google hits me with another nasty surprise!
I have a CDN to deliver images, JS and CSS to visitors around the world. As far as I can tell, there are no links on my site to the static HTML pages, but someone else may have linked to them - perhaps a scraper site?
Google has decided that the static copies of my pages it was able to reach through the CDN have more value than my real pages, and it seems to be slowly replacing my pages in the index with those CDN copies.
Anyone got an idea on how to stop that?
Obviously, I have no access to the static area, because it lives on the CDN, so there is no way I know of to put a robots.txt file there.
It may be that I have to scrap the current CDN setup and restrict it to only the image directory, and maybe set up a separate CDN subdomain for content that contains just the JS and CSS?
Have you seen this problem and beat it?
(Of course the next thing is Roger might look at the Google results and start crawling them too, LOL)
P.S. The reason I am not asking this question in the Google forums is that others have asked it many times over the past five months, nobody at Google has bothered to answer, and nobody who did try gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet, because you guys are always willing to try.
-
Thank you, Edward.
I don't have quite that problem, but I think you are right too.
My CDN is set up as origin pull.
That means there is no need to FTP anything - the system just fetches content from my server as it is requested.
You should check that out if you are having to FTP everything.
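In case it helps, here is a rough sketch of how origin pull behaves (the hostnames are just placeholders, not my real setup):
1. A visitor's browser requests http://cdn.example.com/css/site.css
2. The CDN edge server has no copy yet, so it fetches http://www.example.com/css/site.css from the origin (my server)
3. The edge caches the file and answers later requests itself, without touching the origin again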
But the thing you said that really helped me is this: I should have had one CNAME for images and another CNAME for content, with the content CNAME limited to a folder called "content". I can put the CSS and JS files in that folder, and that way the plain HTML pages at the root level will never be affected.
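Something like this is the layout I have in mind (hypothetical hostnames, not my real CNAMEs):
    images.example-cdn.com   ->  pulls from  www.example.com/images/    (images only)
    content.example-cdn.com  ->  pulls from  www.example.com/content/   (CSS and JS only)
Since neither folder contains any HTML, there should be nothing on the CDN hostnames for Google to index as duplicate pages.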
I also realized, while checking the system, that I wasn't using a canonical tag on the intermediate pages as I was on the story pages. So I just added code to write canonical tags into all the intermediate pages and the front page (a sketch of the output is below).
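For anyone curious, the code just writes a standard canonical link into the head of each page, along these lines (example.com standing in for my real domain):
    <link rel="canonical" href="http://www.example.com/some-intermediate-page/" />
That way, even if Google reaches a copy of a page through the CDN hostname, the tag points it back to the www version as the one to index.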
I do have a few other types of pages, so I will handle the code for them next.
I think adding the canonical tag might fix the problem, but I will also work on reconfiguring the CDN and switch over when traffic is quiet, in case it takes a while to propagate.
-
It sounds like you have set up your CDN slightly wrong.
After setting up a few the same way, I realised that I was actually making a complete duplicate of the whole site rather than just the images and other assets.
I imagine you have the origin directory for the CDN set to your public HTML folder, i.e. the whole site root.
Create a subdomain, set that as the origin.
E.g. I'm working on this site at the moment: http://looksfishy.co.uk/
I have a subdomain called assets: http://assets.looksfishy.co.uk/
The CDN content: http://cdn.looksfishy.co.uk/
Files uploaded here:
http://assets.looksfishy.co.uk/species/holder/pike.jpg
Displayed here:
http://cdn.looksfishy.co.uk/species/holder/pike.jpg
Check the IP address on each of them (rough example below).
It does make uploading images by FTP a bit of a faff, but it does make your site better.
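To illustrate with made-up values (a documentation IP and a generic CDN hostname, not the real records):
    assets.looksfishy.co.uk.   IN  A      203.0.113.10                (your own web server - the origin)
    cdn.looksfishy.co.uk.      IN  CNAME  looksfishy.examplecdn.net.  (the CDN provider's edge network)
The CDN only mirrors what sits under the assets subdomain, so the HTML pages on the main domain never get a duplicate copy on the CDN hostname.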