Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How can I prevent duplicate pages being indexed because of load balancer (hosting)?
The site that I am optimising has a problem with duplicate pages being indexed as a result of the load balancer (which is required and set up by the hosting company).
The load balancer passes the site through to 2 different URLs:
Somehow, Google has indexed each URL twice (which I was obviously hoping it wouldn't): once on www and once on www2.
The two hosts (www and www2) are mirror images of each other, meaning I can't upload a robots.txt to the root of www2.domain.com disallowing all. Nor can I add a canonical tag to the header of www2.domain.com pointing the individual URLs through to www.domain.com.
Any suggestions as to how I can resolve this issue would be greatly appreciated!
There are two ways to handle load balancing, and it appears that your hosting company / server company chose to use the DNS round-robin routing option.
According to the Wikipedia page on load balancing (http://en.wikipedia.org/wiki/Load_balancing_(computing)):
"Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process."
Round Robin DNS Load Balancing: Basically, you use the DNS system to distribute requests. When someone visits your site, roughly 50% of visitors are routed to www.domain.com and 50% are routed to ww1.domain.com. Both sites contain identical content; only the URLs differ slightly. Sometimes the hostname is the same, but www.domain.com resolves to different IP addresses.
Advantages: you don't need a dedicated load balancing piece of software or hardware, so it's less expensive.
Disadvantages: this technique exposes the individual web servers to the end user. You can also suffer from duplicate content penalties. Finally, if you are relying on round robin DNS for load balancing and a DNS server or one of the web servers goes down, there's no easy fail-over (as many DNS records are cached).
More about Round Robin DNS: http://en.wikipedia.org/wiki/Round-robin_DNS
Hardware / Software Load Balancer:
In this case, your DNS zone file tells the end user to go to one IP address when they type in www.domain.com. The hardware or software load balancer then sees the request and hands it off to one of the web servers in a cluster.
Advantages: No duplicate content penalty; the end user just sees one web server, not individual sub-domains (www.domain.com and ww1.domain.com). A load balancer can also cache specific items like a CSS file, so the load on the web server is even lower.
Disadvantages: You're introducing another piece of hardware or software (i.e. more cost), and this piece could itself become a single point of failure. You also need someone to figure out how to set it up and make sure it all works.
More on this type of Load Balancing: http://en.wikipedia.org/wiki/Load_balancing_(computing)#Internet-based_services
Load balancing can get complicated as soon as you have databases involved, but with a good design, multiple front-end web servers can talk to one single back-end database server. The goal would be to cache as much content as possible as "static" elements, using caching systems like Varnish, which essentially turn database-driven pages into static, old-school HTML pages. Then, only when someone needs to save something to the database (i.e. making a purchase on an eCommerce site) does the system interact with it.
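To make the caching idea concrete, here is a minimal sketch in Python. It is not how Varnish works internally; it only illustrates the principle that a rendered page is stored and reused so the database is touched only on a cache miss. The names `PageCache` and `render_from_database`, and the 60-second lifetime, are hypothetical and chosen for illustration.

```python
import time


class PageCache:
    """Toy read-through cache: stores rendered HTML per path for a fixed time."""

    def __init__(self, render, ttl_seconds=60.0, clock=time.monotonic):
        self._render = render      # function that builds a page (hits the database)
        self._ttl = ttl_seconds    # how long a cached page stays fresh (assumed value)
        self._clock = clock
        self._store = {}           # path -> (expires_at, html)

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: no database work
        html = self._render(path)  # cache miss: render once, then remember it
        self._store[path] = (now + self._ttl, html)
        return html


db_hits = []  # records every time the (pretend) database is queried


def render_from_database(path):
    """Hypothetical stand-in for a database-driven page render."""
    db_hits.append(path)
    return f"<html><body>Page for {path}</body></html>"


cache = PageCache(render_from_database, ttl_seconds=60.0)
cache.get("/products")   # miss: rendered from the database
cache.get("/products")   # hit: served from the cache
cache.get("/about")      # miss: a different page
print(db_hits)
```

Three requests come in, but the pretend database is only queried once per distinct page while the cached copy is still fresh.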
My recommendation:
(1) Move from Round Robin DNS to a hardware or software load balancer.
(2) If that isn't an easy solution, configure the Round Robin DNS setup to use identical A records for each server.
For example, you might have identical entries in your DNS zone files for both DNS servers:
www.domain.com A 69.94.15.10
NS2.domain.com:
www.domain.com A 75.64.18.12
This should at least eliminate your duplicate content issue, but you still have a few of the disadvantages described above. It could also lead to server issues, as the servers might be confused about which of them is the authoritative one.
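As a rough illustration of why this removes the duplicate URLs, the sketch below models a zone in which one hostname carries both A records and the order of the answers rotates on each query. It is a toy model in Python, not a real DNS server; `RoundRobinZone` is a hypothetical name, and reading the records above as two A records under the single name www.domain.com is my assumption.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of a DNS zone that rotates its A records on every query."""

    def __init__(self):
        self._records = {}  # hostname -> deque of IP addresses

    def add_a_record(self, name, ip):
        self._records.setdefault(name, deque()).append(ip)

    def resolve(self, name):
        records = self._records[name]
        answer = list(records)   # current order is what this client receives
        records.rotate(-1)       # next client sees the next IP first
        return answer


zone = RoundRobinZone()
zone.add_a_record("www.domain.com", "69.94.15.10")
zone.add_a_record("www.domain.com", "75.64.18.12")

first = zone.resolve("www.domain.com")
second = zone.resolve("www.domain.com")
print(first)
print(second)
```

Every visitor (and every crawler) asks for the same hostname, so only one set of URLs exists to be indexed; only the IP address handed back first changes between queries.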
And if both servers are sending email, pay special attention to your SPF record, to make sure that you are allowing both IP addresses to be able to send email. (This is often overlooked.)
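For the SPF point, a small Python sketch of what such a record could look like when built from the two example IP addresses above. The helper name `build_spf_record` is hypothetical, and the closing `-all` (reject everything else) is an assumption; some setups prefer the softer `~all`, and a real record may need further mechanisms for other mail sources.

```python
import ipaddress


def build_spf_record(sending_ips, policy="-all"):
    """Build an SPF TXT value that authorises each listed IP address to send mail."""
    mechanisms = []
    for raw in sending_ips:
        ip = ipaddress.ip_address(raw)  # raises ValueError on a malformed address
        prefix = "ip4" if ip.version == 4 else "ip6"
        mechanisms.append(f"{prefix}:{ip}")
    return " ".join(["v=spf1", *mechanisms, policy])


# Both web servers from the example zone entries are allowed to send email.
spf = build_spf_record(["69.94.15.10", "75.64.18.12"])
print(spf)
```

The resulting string would be published as a TXT record on the sending domain, so that mail from either server passes the SPF check.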
Hope this is helpful!
-- Jeff