What is best practice to eliminate my IP addr content from showing in SERPs?
-
Our eCommerce platform provider has our site load balanced in a few data centers. Our site has two of our own exclusive IP addresses associated with it (one in each data center).
Problem is Google is showing our IP addresses in the SERPs with what I would assume is bad duplicate content (our own at that).
I brought this to the attention of our provider and they say they must keep the IP addresses open to allow their site monitoring software to work. Their solution was to add robots.txt files for both IP addresses with site wide/root disallows.
As a side note, we just added canonical tags so the pages indexed within the IP addresses ultimately show the correct URL (non IP address) via the canonical.
So here are my questions.
-
Is there a better way?
-
If not, is there anything else we need to do get Google to drop the several hundred thousand indexed pages at the IP address level? Or do we sit back and wait now?
-
-
I would allow Google to crawl those pages for a little while longer just to ensure that they see the rel canonical tags. Then once you feel that they have recrawled the IP address pages you can disallow them again if you want, thought that isn't entirely necessary if you have the rel canonical tag set up properly.
Another option would be to 301 redirect the IP version of the page to the corresponding www. version.
If they still don't drop from the index you can use the URL Removal Tool in GWT, but you will have to set up a GWT account for each of the IP domains.
-
Thanks. Any suggestions on how to get Google to drop these pages (make them inactive)?
-
Hi,
Since doing the disallow on the IP address sites, they are no longer getting crawled.
** The disavow list won't stop google crawl those domain / pages. Google will just treat those links as no follow - so they won't pass Page Rank.
You will still see those in Web master tools, the links will still be active.
-
Sorry - I just thought of something that could pose a problem and was hoping to get your advice.
Since doing the disallow on the IP address sites, they are no longer getting crawled. Does that mean that the canonical tags within those IP address sites wont be able to do their work?
Or
Will the canonicals picked up from the proper domain help the search engines know they should consolidate the indexed pages from the now disallowed IP addresses?
I am seeing that the IP addresses are no longer being crawled, and the pages in their indexes about the same (not going down).
Thoughts?
-
Sorry - I just thought of something that could pose a problem and was hoping to get your advice.
Since doing the disallow on the IP address sites, they are no longer getting crawled. Does that mean that the canonical tags within those IP address sites wont be able to do their work?
Or
Will the canonicals picked up from the proper domain help the search engines know they should consolidate the indexed pages from the now disallowed IP addresses?
I am seeing that the IP addresses are no longer being crawled, and the pages in their indexes about the same (not going down).
Thoughts?
-
Thanks!
-
Thanks. We are getting large daily crawls (nearly 100k a day) so fingers crossed this will sort it out soon.
-
Hi,
The canonical solution should be enough however I would still build some xml sitemaps and submit those via Web master Tools to speed the process. You can also build some html sitemaps with a clear structure and add those in the footer - again, to speed up the proces a little bit.
If you split the content into multiple xml sitemaps you can also track the crawling process.
You should also check your crawling speed in Web Master Tools to see how many pages in avarage the google bot is hitting each day - based on those numbers you can run some prediction on how long it will take more or less for google to re crawl your pages.
If your numbers is "bad" you will need to improve it some how to help with process - it can do wonders...
Hope it helps.
-
The canonical solution you have implemented is perfect. If you have decent authority and get deep crawls every couple days, you should be fine and pages from your IP should start to disappear shortly.
I would not worry about it anymore. You are on the right track. Sit back, relax and enjoy your flight
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google shows date in the SERPS for the homepage
Hi SEO's we've build a site and our now trying to rank it but it won't go up despite of regular new unique content and a higher DA than most competitors. All tools like MOZ en Yoast SEO show green lights so we're kinda out of ideas right now. Till we saw the date in the SERPS for the meta description. This gives us the idea that Google sees it as a post and not as a page which might explain the low ranking. However there are no technical causes we can think off for Google to show the date. Any ideas on this matter? Could it be that Yoast SEO is causing this even although we tell it not to show dates? Love to hear from you!
Intermediate & Advanced SEO | | Heers0 -
Best practice for disallowing URLS with Robots.txt
Hi Everybody, We are currently trying to tidy up the crawling errors which are appearing when we crawl the site. On first viewing, we were very worried to say the least:17000+. But after looking closer at the report, we found the majority of these errors were being caused by bad URLs featuring: Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/" Color - For example: ?color=91 Price - For example: "?price=650-700" Order - For example: ?dir=desc&order=most_popular Page - For example: "?p=1&standards=704" Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/" My question now is as a novice of working with Robots.txt, what would be the best practice for disallowing URLs featuring these from being crawled? Any advice would be appreciated!
Intermediate & Advanced SEO | | centurysafety0 -
Content Internal Linking ?
Should we internally link new content to old content using anchor tags (keywords) related to pages from all new blogposts or should be keep rotating the blogposts like link from some blog posts & not from others. What ratio should we maintain. Right now i keep 2 links maximum from a 300 words posts or 3 in 500 words posts maximum. But linking from each new blog posts will be good?
Intermediate & Advanced SEO | | welcomecure0 -
Control Over SERPs
I launched a content branch of over 1200 guides for my company's website one year ago. This content branch now receives 180,000 visits per month. Unfortunately, my company has needed to cut back on several of the vertical services that it offers. As a result, many of the top article search results accounting for the majority of the traffic to this branch are now obsolete. Are there any feasible ways to shift the authority of these obsolete top rated pages (located in the top 20 results) to other articles that cover services that we still do offer? For reference, please see the attached image containing an overview of the top 20x results accounting for 40% of all traffic to this content branch. Green are still valid content and but all other colors are not. Does anyone have suggestions for moving more green articles up and into the top 20x results while pushing the other ones down without loosing out on overall site traffic? Appreciate any and all feedback! Thanks!!! 1qMyYEJ.jpg
Intermediate & Advanced SEO | | TQContent0 -
I have search result pages that are completely different showing up as duplicate content.
I have numerous instances of this same issue in our Crawl Report. We have pages showing up on the report as duplicate content - they are product search result pages for completely different cruise products showing up as duplicate content. Here's an example of 2 pages that appear as duplicate : http://www.shopforcruises.com/carnival+cruise+lines/carnival+glory/2013-09-01/2013-09-30 http://www.shopforcruises.com/royal+caribbean+international/liberty+of+the+seas We've used Html 5 semantic markup to properly identify our Navigation <nav>, our search widget as an <aside>(it has a large amount of page code associated with it). We're using different meta descriptions, different title tags, even microformatting is done on these pages so our rich data shows up in google search. (rich snippet example - http://www.google.com/#hl=en&output=search&sclient=psy-ab&q=http:%2F%2Fwww.shopforcruises.com%2Froyal%2Bcaribbean%2Binternational%2Fliberty%2Bof%2Bthe%2Bseas&oq=http:%2F%2Fwww.shopforcruises.com%2Froyal%2Bcaribbean%2Binternational%2Fliberty%2Bof%2Bthe%2Bseas&gs_l=hp.3...1102.1102.0.1601.1.1.0.0.0.0.142.142.0j1.1.0...0.0...1c.1.7.psy-ab.gvI6vhnx8fk&pbx=1&bav=on.2,or.r_qf.&bvm=bv.44442042,d.eWU&fp=a03ba540ff93b9f5&biw=1680&bih=925 ) How is this distinctly different content showing as duplicate? Is SeoMoz's site crawl flawed (or just limited) and it's not understanding that my pages are not dupe? Copyscape does not identify these pages as dupe. Should we take these crawl results more seriously than copyscape? What action do you suggest we take? </aside> </nav>
Intermediate & Advanced SEO | | JMFieldMarketing0 -
Duplicate Content Question
My client's website is for an organization that is part of a larger organization - which has it's own website. We were given permission to use content from the larger organization's site on my client's redesigned site. The SEs will deem this as duplicate content, right? I can "re-write" the content for the new site, but it will still be closely based on the original content from the larger organization's site, due to the scientific/medical nature of the subject material. Is there a way around this dilemma so I do not get penalized? Thanks!
Intermediate & Advanced SEO | | Mills1 -
NOINDEX content still showing in SERPS after 2 months
I have a website that was likely hit by Panda or some other algorithm change. The hit finally occurred in September of 2011. In December my developer set the following meta tag on all pages that do not have unique content: name="robots" content="NOINDEX" /> It's been 2 months now and I feel I've been patient, but Google is still showing 10,000+ pages when I do a search for site:http://www.mydomain.com I am looking for a quicker solution. Adding this many pages to the robots.txt does not seem like a sound option. The pages have been removed from the sitemap (for about a month now). I am trying to determine the best of the following options or find better options. 301 all the pages I want out of the index to a single URL based on the page type (location and product). The 301 worries me a bit because I'd have about 10,000 or so pages all 301ing to one or two URLs. However, I'd get some link juice to that page, right? Issue a HTTP 404 code on all the pages I want out of the index. The 404 code seems like the safest bet, but I am wondering if that will have a negative impact on my site with Google seeing 10,000+ 404 errors all of the sudden. Issue a HTTP 410 code on all pages I want out of the index. I've never used the 410 code and while most of those pages are never coming back, eventually I will bring a small percentage back online as I add fresh new content. This one scares me the most, but am interested if anyone has ever used a 410 code. Please advise and thanks for reading.
Intermediate & Advanced SEO | | NormanNewsome0 -
Content Focus
I have a particular Page which shows primary contact details as well as "additional" contact details for the client. GIven I do not believe I want Google to misinterpret the focus of the page from the primary contact details which of the following three options would be best? Place the "additional" contact details (w/maps) in Javascript, Ajax or similar to suppress them from being crawled. Leave "additional" contact details alone but emphasize the Primary contact details by placing the Primary contact details in Rich Snippets/Microformats. Do nothing and allow Google to Crawl the pages with all contact details Thanks, Phil
Intermediate & Advanced SEO | | AU-SEO0