Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Removing UpperCase URLs from Indexing
-
This search - site:www.qjamba.com/online-savings/automotix
gives me this result from Google:
Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products.
And Google tells me there is another one, which is 'very similar'. When I click to see it, I get:
Automotix online coupons and shopping - Qjamba
https://www.qjamba.com/online-savings/Automotix
Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products.
This is because I recently changed my program to redirect all URLs with uppercase letters in them to lowercase, as all-lowercase URLs appear to be strongly recommended.
I assume that having two indexed URLs for the same content dilutes link juice. Can I safely remove all of my uppercase indexed pages from Google without affecting the indexing of the lowercase URLs? And if so, what is the best way to do it? There are thousands.
-
Hi AMHC,
It makes sense that, with hardly any backlinks built up, Google won't find my uppercase URLs, since all the page links have been changed. However, I am sending myself every redirected URL by email, and from that I can tell that Google is finding them. I guess Google keeps a list of URLs from prior indexing that it crawls independently of what its crawler discovers.
I'll keep looking to see what they have indexed, and if it turns out they just aren't crawling certain pages, I'll put those pages in a sitemap to be crawled. It's a good idea for taking care of the problem quickly, so if things progress too slowly I'll do that.
Thanks very much for your answers!
-
Google needs to crawl the bad pages that you 301'd. If there are no live links to those pages, Google can't find them and so never sees the 301. In short, if you created new lowercase URLs, you just increased your duplicate content problem.
To solve this problem, build an HTML sitemap with all of the bad URLs. Have Google fetch and submit the page and all of the pages it links to. Google will crawl all of your old pages and apply the 301s.
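A minimal sketch of building that throwaway HTML sitemap, assuming you already have a plain list of the old mixed-case URLs (for example from your redirect log). The function name `build_html_sitemap` is hypothetical; the page only needs to contain ordinary links so a crawler can reach each old URL and see its 301.

```python
from html import escape


def build_html_sitemap(old_urls, title="Old URLs"):
    """Return an HTML page linking to every old (mixed-case) URL.

    The page exists only so a crawler can reach the old URLs and see their 301s.
    """
    # Drop exact duplicates while keeping first-seen order, so each URL is linked once.
    unique = list(dict.fromkeys(old_urls))
    # Escape URLs for use inside an attribute and as link text (e.g. & -> &amp;).
    items = "\n".join(
        f'    <li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>' for u in unique
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body></html>\n"
    )
```

Usage: write `build_html_sitemap(urls)` to a file on the site, have Google fetch it along with the pages it links to, and take it down again once the old URLs have been recrawled. With thousands of URLs you may prefer to split the list across several such pages.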
-
Thanks AMHC. In my case, I just don't have many backlinks, so I don't have the urgency you faced in getting Google to see all the redirects. But I'm still not understanding: it sounds like you believe that once Google sees the redirect, it removes the old uppercase URL from its index. That doesn't look like what happened in my case, because Google is currently indexing BOTH. That means it has crawled my new lowercase URLs, and I know it isn't crawling any uppercase ones anymore (it can't; all are redirected). So that's why I wonder whether I have to remove those uppercase URLs. Does that make sense, or am I just still not understanding it?
EDIT: I just discovered I wasn't doing a 301 redirect, so it wasn't considered a permanent move. If I understand it right, a 301 will remove the uppercase URLs from Google's index permanently.
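Whether a redirect is permanent depends only on the status code the server sends. A minimal sketch for checking what an old URL actually returns, using the standard library's `http.client` (which, unlike `urllib.request`, does not follow redirects). The helper names are hypothetical, and it assumes the server answers `HEAD` requests the same way it answers `GET`.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit

# In HTTP, 301 and 308 are the permanent redirect codes; 302, 303 and 307 are temporary.
PERMANENT_REDIRECTS = {301, 308}


def check_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a HEAD request for url WITHOUT following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)  # netloc may include ":port"
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_permanent(status: int) -> bool:
    """True if the status code is a permanent redirect."""
    return status in PERMANENT_REDIRECTS
```

Usage: `status, location = check_redirect("https://www.qjamba.com/online-savings/Automotix")` should give `301` and the lowercase URL once the redirect is permanent; a `302` means it is still being treated as temporary.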
-
Canonicals still drain link juice. Canonicals aren't like a 301: the link juice stays on the duplicate page. All a canonical does is tell Google, in the case of duplicate content, which page is primary. Canonicals handle the duplicate content issue; they do not handle the link juice issue. If I have two pages, /product-name/ and /product-name=?khdfpohfo/, that are duplicates, I can tell Google via a canonical to ignore the page with the variable string and rank the page without it. If the page with the variable string has links, the link juice stays on that page.
The HTML sitemap is there to tell Google about the 301s: it is just a page that links to every old uppercase URL.
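To make the canonical case concrete: both URLs keep returning the page normally, and each one simply declares the parameter-free URL as its canonical. A minimal sketch of deriving that canonical URL, assuming the "variable string" is an ordinary query string such as `?khdfpohfo`; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Strip the query string and fragment, so every parameter variant of a
    page maps to the same parameter-free URL to declare as its canonical."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Unlike a 301, nothing here moves the visitor or the crawler; the duplicate URL is still served, which is why the poster treats canonicals and redirects as solving different problems.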
After you do the 301 redirect, as well as set up parameters in the .htaccess file (I think; I'm not the developer on this), everything should redirect to the lowercase URL. The problem is that if you do a 301 redirect for your entire site, Google may not figure it out very quickly. When it crawls down from your home page, it's only going to see the new URLs, and it can't crawl the old 301'd URLs because there aren't any internal links pointing at them. The only way Google will see the 301 is via an external backlink. The way we solved this was to create an HTML sitemap of all of the old uppercase URLs. We then had Google fetch and index/crawl the sitemap. As it crawls the sitemap, where all of the URLs are 301 redirects, it will likewise point all of the link juice at the new URLs.
-
I gotcha. Yeah, there's a different thing going on here. These URLs can be really difficult! I have uppercase and lowercase, HTTPS and HTTP, URLs that have different content (not just formatting) on mobile than on desktop and vice versa, mobile URLs that don't even exist for desktop, and desktop URLs that don't exist for mobile, all under the same domain, with thousands of internal pages. In trying to create a good website for users, I've created an SEO monster, because I didn't realize the many consequences for search indexing.
If you know a true expert in these areas, I need him or her. I've spent four years on this site, it's finally live (for two months), and now I'm discovering all of these things that have to be fixed, but I can't afford thousands of dollars. I'll do the work; I just need the knowledge!
-
I see where you are coming from, and I don't have a good answer, then. When I did a lowercase redirect, I started by creating the new lowercase pages and setting the canonical to them. After a few months, I removed the uppercase versions and redirected them to the new lowercase ones.
-
Hutch, thanks.
The site is dynamic with thousands of pages that are now being redirected to lower case, so I'm not seeing how using canonical would work because the upper case urls aren't on the site anymore. I guess I think of canonical as being useful when you have ongoing content on the site that duplicates one or more other pages on the same site. In my case none of the upper case urls exist anymore so they don't have 'ongoing' content. I'm still new to this so if it sounds like I have it wrong, please correct me.
-
Another quick fix would be to use a canonical tag on all of your pages pointing to the full lowercase versions.
So for the URLs example.com/UPPER, example.com/Upper, and example.com/upper, you would place a canonical link tag pointing to example.com/upper into the head of each, so Google knows that these are just variations of the same page and will point search to the desired page, example.com/upper.
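A minimal sketch of generating that tag for whichever case variant is being served, assuming pages know their own absolute URL and that `https://` is the scheme in use (the example above omits it). The function name is hypothetical; the tag itself is the standard `<link rel="canonical" href="...">` form.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url: str) -> str:
    """Return a <link rel="canonical"> tag pointing at the all-lowercase path of url."""
    parts = urlsplit(url)
    # Hostnames are case-insensitive anyway; paths are case-sensitive, which is
    # why /UPPER, /Upper and /upper count as different URLs.
    target = urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.lower(), parts.query, "")
    )
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'
```

All three variants produce the same tag, `<link rel="canonical" href="https://example.com/upper">`, which goes inside the page's `<head>`.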
-
AMHC, thank you for your response. I'm in the middle of quite a mess, as this is one of several issues, so really appreciate your help. I must confess to not following everything you wrote exactly:
In your situation, I think I understand the redirect. It is the same reason I am doing a redirect: so that anyone coming to this site with uppercase in the URL will end up on the lowercase page, and Google will then index the page as a lowercase page. BTW, for me that has been easy, as I am doing it via PHP: if the URL doesn't equal strtolower() of the URL, then I redirect to the strtolower() version.
I think I get what you are saying about the sitemap: it speeds up Google crawling the site and seeing, from your redirect, that all those uppercase URLs should be lowercase. In my case, I don't have the concern you had about Google discovering them, because my site is only a couple of months old. And I have never given Google a sitemap, so many of my pages aren't crawled yet. (I am trying to clean up my entire URL structure before I submit a sitemap to them. However, they have already crawled perhaps 20% of the site, so I'm now trying to examine what Google has crawled and how it has been indexed to figure out what needs to be done.)
What I'm not understanding is this: it seems to me that what you described should succeed, going forward, in getting both Google and your users to the right page, but I don't see how it removes the prior uppercase URLs from Google's index. What tells Google that your prior uppercase URLs should no longer be in its index? Is it the fact that they aren't in the sitemap you provide now? Or do they literally have to be removed using some kind of removal or disavow tool? I discovered this (as you see in the OP) because Google appears never to have removed the uppercase ones, even though it is indexing the lowercase ones now.
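The same check can be sketched outside PHP. This hypothetical helper returns the redirect the server should send, with two deliberate differences from the description above: it lowercases only the path (query values such as coupon codes may be case-sensitive), and it always pairs the redirect with an explicit 301. That second point matters because in PHP, `header("Location: ...")` on its own sends a 302 unless a 3xx status is supplied, for example via `header("Location: ...", true, 301)`.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, lowercase_url) if the path contains uppercase letters,
    or None if the request should be served normally."""
    parts = urlsplit(url)
    lower_path = parts.path.lower()
    if parts.path == lower_path:
        return None  # already canonical: no redirect
    # Only the path is lowercased; the query string is passed through unchanged.
    new_url = urlunsplit(
        (parts.scheme, parts.netloc, lower_path, parts.query, parts.fragment)
    )
    return 301, new_url
```

In a web framework you would call this at the start of request handling and, when it returns a tuple, respond with that status and a `Location` header set to the new URL.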
Ted
-
We had the same issue. Boy, was it an education. I had no idea that URLs were case sensitive for Google, and neither did my SEO buddies. I bet if you asked 100 SEOs if URLs were case sensitive for Google, 95 would answer "No". We discovered the problem in GWT and GA when they had different statistics for the mixed case and all lower case versions of the URL. We believed that we had both a duplicate content issue as well as a link juice splitting issue, with backlinks being pointed at both URLs.
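Spotting the problem the way this poster did, by seeing separate statistics for mixed-case and lowercase versions of one URL, can be automated once you have the URLs as a plain list (an assumption: e.g. exported from a Search Console or analytics report). A minimal sketch with a hypothetical function name that groups URLs differing only by letter case:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def case_variant_groups(urls):
    """Group URLs whose host and path differ only by letter case.

    Returns {lowercased_url: [variants in first-seen order]}, keeping only
    groups with more than one distinct variant.
    """
    groups = defaultdict(list)
    for url in dict.fromkeys(urls):  # drop exact duplicates, keep order
        parts = urlsplit(url)
        key = parts._replace(
            netloc=parts.netloc.lower(), path=parts.path.lower()
        ).geturl()
        groups[key].append(url)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Every variant in a returned group other than the lowercase key is a candidate for a 301 and for inclusion in the HTML sitemap described below.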
We solved the problem by doing a 301 redirect, but as we are an ecommerce site with thousands of products, it was a messy process. We had to redirect pretty much every page on the site since the mixed case categories contaminated subcategories and products.
The 301 went pretty smoothly, and we saw a minor bump up in some of our rankings. I would strongly suggest that you create an HTML sitemap for every uppercase URL that you are going to 301. Here were our thoughts (we could be wrong on this): if we just 301 a page and don't tell Google, then Google won't know about it unless it tries to crawl the page. We felt we needed to show Google that all of the pages were being redirected as soon as possible. Create an HTML sitemap with all of your uppercase URLs. After you do the 301, have Google fetch and index the sitemap page and all of the pages that it links to. Leave the map up for a few days, and then you can take it down. This will expedite moving the link juice to the correct pages, as Google will index the 301 for every page in the sitemap.