Removing Parameterized URLs from Google Index
-
We have duplicate eCommerce websites, and we are in the process of implementing cross-domain canonicals. (We can't 301 - both sites are major brands). So far, this is working well - rankings are improving dramatically in most cases.
However, what we are seeing in some cases is that Google has indexed a parameterized page for the site being canonicaled (this is the site that is getting the canonical tag - the "from" page). When this happens, both sites are being ranked, and the parameterized page appears to be blocking the canonical.
The question is, how do I remove canonicaled pages from Google's index? If Google doesn't crawl the page in question, it never sees the canonical tag, and we still have duplicate content. Example:
A. www.domain2.com/productname.cfm%3FclickSource%3DXSELL_PR is ranked at #35, and
B. www.domain1.com/productname.cfm is ranked at #12.
(yes, I know that upper case is bad. We fixed that too.)
Page A has the canonical tag, but page B's rank didn't improve. I know that there are no guarantees that it will improve, but I am seeing a pattern.
Page A appears to be preventing Google from passing link juice via canonical. If Google doesn't crawl Page A, it can't see the rel=canonical tag. We likely have thousands of pages like this. Any ideas? Does it make sense to block the "clicksource" parameter in GWT? That kind of scares me.
-
The reason that duplicate content is an issue is because Google will eventually drop one of the duplicates from its index, and if you have both page A and page B canonicalized to page B, Google will eventually drop page A in favor of page B.
If you are concerned that page A is not being crawled, you can use Fetch as Google to ask Google to recrawl it. But if A is not being crawled and B is, again B will be chosen over A in the results pages.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My url disappeared from Google but Search Console shows indexed. This url has been indexed for more than a year. Please help!
Super weird problem that I can't solve for last 5 hours. One of my urls: https://www.dcacar.com/lax-car-service.html Has been indexed for more than a year and also has an AMP version, few hours ago I realized that it had disappeared from serps. We were ranking on page 1 for several key terms. When I perform a search "site:dcacar.com " the url is no where to be found on all 5 pages. But when I check my Google Console it shows as indexed I requested to index again but nothing changed. All other 50 or so urls are not effected at all, this is the only url that has gone missing can someone solve this mystery for me please. Thanks a lot in advance.
Intermediate & Advanced SEO | | Davit19850 -
Javascript content not being indexed by Google
I thought Google has gotten better at picking up unique content from javascript. I'm not seeing it with our site. We rate beauty and skincare products using our algorithms. Here is an example of a product -- https://www.skinsafeproducts.com/tide-free-gentle-he-liquid-laundry-detergent-100-fl-oz When you look at the cache page (text) from google none of the core ratings (badges like fragrance free, top free and so forth) are being picked up for ranking. Any idea what we could do to have the rating incorporated in the indexation.
Intermediate & Advanced SEO | | akih0 -
Google News error in Google Search Console
My google search console states some errors as below: 1. Article fragmented Some of the urls in this error are the category urls. How to make google bot understand it is a category not an article? 2. Article too short In fact the article is quite long. I do not know why this is happen... 3. No sentence found In fact, there are a lot of sentences Please help!
Intermediate & Advanced SEO | | binhlai0 -
Google does not want to index my page
I have a site that is hundreds of page indexed on Google. But there is a page that I put in the footer section that Google seems does not like and are not indexing that page. I've tried submitting it to their index through google webmaster and it will appear on Google index but then after a few days it's gone again. Before that page had canonical meta to another page, but it is removed now.
Intermediate & Advanced SEO | | odihost0 -
Google Indexing & Caching Some Other Domain In Place of Mine-Lost Ranking -Sucuri.net Found nothing
Again I am facing same Problem with another wordpress blog. Google has suddenly started to Cache a different domain in place of mine & caching my domain in place of that domain. Here is an example page of my site which is wrongly cached on google, same thing happening with many other pages as well - http://goo.gl/57uluq That duplicate site ( protestage.xyz) is showing fully copied from my client's site but showing all pages as 404 now but on google cache its showing my sites. site:protestage.xyz showing all pages of my site only but when we try to open any page its showing 404 error My site has been scanned by sucuri.net Senior Support for any malware & there is none, they scanned all files, database etc & there is no malware found on my site. As per Sucuri.net Senior Support It's a known Google bug. Sometimes they incorrectly identify the original and the duplicate URLs, which results in messed ranking and query results. As you can see, the "protestage.xyz" site was hacked, not yours. And the hackers created "copies" of your pages on that hacked site. And this is why they do it - the "copy" (doorway) redirects websearchers to a third-party site [http://www.unmaskparasites.com/security-report/?page=protestage.xyz](http://www.unmaskparasites.com/security-report/?page=protestage.xyz) It was not the only site they hacked, so they placed many links to that "copy" from other sites. As a result Google desided that that copy might actually be the original, not the duplicate. So they basically hijacked some of your pages in search results for some queries that don't include your site domain. Nonetheless your site still does quite well and outperform the spammers. For example in this query: [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 But overall, I think both the Google bug and the spammy duplicates have the negative effect on your site. We see such hacks every now and then (both sides: the hacked sites and the copied sites) and here's what you can do in this situation: It's not a hack of your site, so you should focus on preventing copying the pages: 1\. Contact the protestage.xyz site and tell them that their site is hacked and that and show the hacked pages. [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 Hopefully they clean their site up and your site will have the unique content again. Here's their email flang.juliette@yandex.com 2\. You might want to send one more complain to their hosting provider (OVH.NET) abuse@ovh.net, and explain that the site they host stole content from your site (show the evidence) and that you suspect the the site is hacked. 3\. Try blocking IPs of the Aruba hosting (real visitors don't use server IPs) on your site. This well prevent that site from copying your site content (if they do it via a script on the same server). I currently see that sites using these two IP address: 149.202.120.102\. I think it would be safe to block anything that begins with 149.202 This .htaccess snippet should help (you might want to test it) #-------------- Order Deny,Allow Deny from 149.202.120.102 #-------------- 4\. Use rel=canonical to tell Google that your pages are the original ones. [https://support.google.com/webmasters/answer/139066?hl=en](https://support.google.com/webmasters/answer/139066?hl=en) It won't help much if the hackers still copy your pages because they usually replace your rel=canonical with their, so Google can' decide which one is real. But without the rel=canonical, hackers have more chances to hijack your search results especially if they use rel=canonical and you don't. I should admit that this process may be quite long. Google will not return your previous ranking overnight even if you manage to shut down the malicious copies of your pages on the hacked site. Their indexes would still have some mixed signals (side effects of the black hat SEO campaign) and it may take weeks before things normalize. The same thing is correct for the opposite situation. The traffic wasn't lost right after hackers created the duplicates on other sites. The effect build up with time as Google collects more and more signals. Plus sometimes they run scheduled spam/duplicate cleanups of their index. It's really hard to tell what was the last drop since we don't have access to Google internals. However, in practice, if you see some significant changes in Google search results, it's not because of something you just did. In most cases, it's because of something that Google observed for some period of time. Kindly help me if we can actually do anything to get the site indexed properly again, PS it happened with this site earlier as well & that time I had to change Domain to get rid of this problem after I could not find any solution after months & now it happened again. Looking forward for possible solution Ankit
Intermediate & Advanced SEO | | killthebillion0 -
404 broken URLs coming up in Google
When we do a search for our brand, we are get the following results in google.com.au (see image attachment). As outlined in red, there are listings in Google that result in 404 Page Not Found URLs. What can we do to enable google to do a recrawl or to ensure that these broken URLs are no longer listed in Google? Thanks for your help here! sBqpvtj
Intermediate & Advanced SEO | | Gavo0 -
Will Google View Using Google Translate As Duplicate?
If I have a page in English, which exist on 100 other websites, we have a case where my website has duplicate content. What if I use Google Translate to translate the page from English to Japanese, as the only website doing this translation will my page get credit for producing original content? Or, will Google view my page as duplicate content, because Google can tell it is translated from an original English page, which runs on 100+ different websites, since Google Translate is Google's own software?
Intermediate & Advanced SEO | | khi50 -
Google Sitemap only indexing 50% Is that a problem?
We have about 18,000 pages submitted on our Google Sitemap and only about 9000 of them are indexed. Is this a problem? We have a script that creates a sitemap on a daily basis and it is submitted on a daily basis. Am I better off only doing it once a week? Is this why I never get to the full 18,000 indexed?
Intermediate & Advanced SEO | | EcommerceSite0