Stop Google indexing CDN pages
-
Just when I thought I'd seen it all, Google hits me with another nasty surprise!
I have a CDN to deliver images, JS and CSS to visitors around the world. As far as I can tell, I have no links to static HTML pages on the site, but someone else may have - perhaps a scraper site?
Google has decided that the static pages it was able to access through the CDN have more value than my real pages, and it seems to be slowly replacing my pages in the index with the static copies.
Anyone got an idea on how to stop that?
Obviously, I have no access to the static area, because it is in the CDN, so there is no way I know of to put a robots.txt file there.
It could be that I have to trash the CDN and change it to only allow the image directory, and maybe set up a separate CDN subdomain for content that only contains the JS and CSS?
Have you seen this problem and beaten it?
(Of course the next thing is Roger might look at Google results and start crawling them too, LOL)
P.S. The reason I am not asking this in the Google forums is that others have asked the same question there many times over the past 5 months. Nobody at Google has bothered to answer, and nobody who did try gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet because you guys are always willing to try.
-
Thank you Edward.
I don't have quite that problem, but I think you are right too.
My CDN is set up to be Origin Pull.
That means there is no need to FTP - the system just fetches content as requested.
You should check that out if you have to FTP everything.
But what you said that helped me is this: I should have had one CNAME for images and another CNAME for content, with the content one limited to a folder called content. I can put the CSS files and the JS files in that folder, and that way the plain HTML pages at the root level will never be affected.
I also realized, while checking the system, that I wasn't using a canonical tag on the intermediate pages, as I was on the story pages. So I just added code to put canonical tags on all the intermediate pages and the front page.
I do have a few other types of pages, so I will handle the code for them next.
I think adding the canonical tag might fix the problem, but I will also work on reconfiguring the CDN and switch over when the site is not too busy, in case the change takes a while to propagate.
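A rough sketch of the canonical tag idea in Python, for anyone wanting to try the same thing. This is not the code used on the site in this thread. The hostnames `www.example.com` and `cdn.example.com`, the function name `canonical_link_tag`, and the default `https` scheme are all assumptions made for illustration. The point is that whichever hostname a page is fetched through, the tag it carries always names the real site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostname of the real site; replace with your own.
CANONICAL_HOST = "www.example.com"


def canonical_link_tag(request_url: str,
                       canonical_host: str = CANONICAL_HOST,
                       scheme: str = "https") -> str:
    """Build a <link rel="canonical"> tag that points at the main site.

    The path and query string of the requested URL are kept, while the
    scheme and hostname are replaced, so a copy of the page fetched
    through a CDN hostname still declares the main site as canonical.
    """
    parts = urlsplit(request_url)
    canonical_url = urlunsplit(
        (scheme, canonical_host, parts.path or "/", parts.query, "")
    )
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


if __name__ == "__main__":
    # The same page requested through a hypothetical CDN hostname.
    print(canonical_link_tag("http://cdn.example.com/stories/pike.html"))
```

The tag belongs in the `<head>` of every HTML page type (story pages, intermediate pages, front page), generated from the URL of the request being served.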
-
It sounds like you have set up your CDN slightly wrong.
After setting up a few the way you have, I realised that I was actually making a complete duplicate of the site rather than just the images or assets.
I imagine you have your origin directory for the CDN in the public html folder.
Create a subdomain, set that as the origin.
E.g. I'm working on this site at the moment: http://looksfishy.co.uk/
I have a subdomain called assets: http://assets.looksfishy.co.uk/
The cdn content: http://cdn.looksfishy.co.uk/
Files uploaded here:
http://assets.looksfishy.co.uk/species/holder/pike.jpg
Displayed here:
http://cdn.looksfishy.co.uk/species/holder/pike.jpg
Check the IP addresses on them.
It does make uploading images by FTP a bit of a faff, but it does make your site better.
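One way to do the IP address check is a few lines of Python using only the standard library. This is a minimal sketch, not a definitive test: the function names `resolve_ips` and `served_from_different_hosts` are made up for illustration, and the `resolver` parameter exists only so the lookup can be swapped out when trying the code without network access.

```python
import socket


def resolve_ips(hostname, resolver=socket.getaddrinfo):
    """Return the set of IP addresses a hostname resolves to."""
    infos = resolver(hostname, 80, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return {info[4][0] for info in infos}


def served_from_different_hosts(origin_host, cdn_host,
                                resolver=socket.getaddrinfo):
    """True when the two hostnames share no IP address."""
    origin_ips = resolve_ips(origin_host, resolver)
    cdn_ips = resolve_ips(cdn_host, resolver)
    return origin_ips.isdisjoint(cdn_ips)


if __name__ == "__main__":
    # Hostnames taken from the example above; this needs network access.
    for host in ("assets.looksfishy.co.uk", "cdn.looksfishy.co.uk"):
        print(host, sorted(resolve_ips(host)))
```

If the CDN hostname resolves to the same addresses as the origin subdomain, the files are probably not going through the CDN at all.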