Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Google is forcing a 301 by truncating our URLs
-
Just recently we noticed that Google has indexed truncated URLs for many of our pages. These truncated URLs get 301'd to the correct page.
For example, we have:
http://www.eventective.com/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html
as the URL linked everywhere, and that's the only version of that page we use.
Google somehow figured out that the page would still resolve via 301 if the .html filename were removed from the end, so it indexed just:
http://www.eventective.com/USA/Massachusetts/Bedford/107/
The 301 is not new. The truncated URL used to return a 404, but (probably 5 years ago) we saw a few links come in with the .html filename missing on similar URLs, so we decided to 301 them instead, thinking it would be helpful. We've preferred the longer version because it has the business name in it, and users who pay attention to the URL can feel more confident they are going to the right place.
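A minimal sketch of the redirect rule described above, assuming the truncated form is always the full URL with its final filename removed. `FULL_PATHS`, `REDIRECTS` and `resolve()` are hypothetical names for illustration, not the site's actual code.

```python
from typing import Optional, Tuple

# Hypothetical set of full listing paths; on the real site these would come
# from the database. The single entry is the example listing from the question.
FULL_PATHS = {
    "/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html",
}

# Map each truncated form (the full path minus its final filename) to the
# full path, so a request for the "directory" can be redirected to the page.
REDIRECTS = {full.rsplit("/", 1)[0] + "/": full for full in FULL_PATHS}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header) for a requested path."""
    if path in FULL_PATHS:
        return 200, None  # the one URL that should be linked and indexed
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # permanent redirect to the full URL
    return 404, None  # unknown paths still 404
```

With this mapping, `resolve("/USA/Massachusetts/Bedford/107/")` returns `(301, "/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html")`, while the full path itself is served with a 200.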
We've always used the full (longer) URL, and Google used to index them all that way, but just recently we noticed that about half of our URLs have been replaced by the shorter version in the SERPs. These shortened URLs take the user to the right page via 301, so it isn't a case of the user landing in the wrong place, but over 100,000 301s may not be so good.
You can look at site:www.eventective.com/usa/massachusetts/bedford/ and you'll notice that the business listings near the top of the results all use the truncated URL, while those toward the bottom have the full URL.
Can you explain to me why Google would index a URL that is 301'd to the right page and has been for years?
I have a lot of thoughts on why they would do this and even more ideas on how we could build our URLs better, but I'd really like to hear from some people who aren't quite as close to it as I am.
One small detail that shouldn't affect this, but I'll mention anyway, is that we have a mobile site with the same URL pattern:
http://m.eventective.com/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html
We did not have the proper 301 in place on the m. site until the end of last week. Since I'm pretty sure it will be asked, I'll also mention that we have rel=alternate/canonical set up between the www and m sites.
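For reference, Google's documented setup for separate mobile URLs puts a `rel="alternate"` link with a media query on the desktop page and a `rel="canonical"` link on the mobile page pointing back to the desktop page. The sketch below generates both tags for one listing, assuming the same path is used on both hosts and that the full path is passed in; `annotations_for()` is a hypothetical helper, and the 640px breakpoint is the value used in Google's own example.

```python
from html import escape
from typing import Tuple

# Hostnames from the question; paths are assumed identical on both sites.
DESKTOP = "http://www.eventective.com"
MOBILE = "http://m.eventective.com"


def annotations_for(full_path: str) -> Tuple[str, str]:
    """Return (tag for the desktop page, tag for the mobile page).

    full_path is expected to be the full listing path (with the .html
    filename), never the truncated form.
    """
    desktop_tag = (
        '<link rel="alternate" media="only screen and (max-width: 640px)" '
        f'href="{escape(MOBILE + full_path)}">'
    )
    mobile_tag = f'<link rel="canonical" href="{escape(DESKTOP + full_path)}">'
    return desktop_tag, mobile_tag
```

The desktop tag goes in the `<head>` of the www page and the mobile tag in the `<head>` of the m. page, so each version points at the other.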
I'm also interested in any thoughts on how this may affect rankings, since we seem to have been hit by something toward the end of last week. Don't hesitate to mention anything else you see that may have triggered whatever may have hit us.
Thank you,
Michael
-
Lynn,
We had a few "site:" queries that we were watching as the full URLs came back and replaced the truncated ones, for example: site:eventective.com/usa/Georgia/Atlanta. When we discovered the original problem, almost every listing page in those SERPs had a truncated URL, but by the start of last week it had gradually cleared up to only 6 or 7 listings with truncated URLs, while all others had the full URL. Then suddenly we had 5 pages (50 listings) of truncated URLs, and now almost 300 listings for that one query have the truncated version indexed. It appears to be continuing.
Another detail I noticed was in Webmaster Tools. All of our listings are in our sitemap with the full URL. When we had this problem before, only about 50% of the pages listed in our sitemap were indexed; I assume that was because the truncated URLs were in the index instead of the full URLs listed in the sitemap. As the truncated URL problem cleared up, that ratio improved until it held pretty steady at about 96-97% of sitemap pages indexed. Once this problem started to reappear, that number dropped to 90% and kept falling; it is at 77% now.
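One way to track the sitemap ratio described above outside of Webmaster Tools is to compare the sitemap's `<loc>` entries against a list of URLs known to be indexed, for example copied from `site:` results. The sketch below assumes you already have such a list (it does not query Google) and also flags sitemap URLs whose truncated form appears in that list instead. `sitemap_urls()` and `coverage()` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Tuple

# Standard namespace used by sitemaps.org <urlset> files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Extract every <loc> value from a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def coverage(sitemap: List[str], indexed: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (share of sitemap URLs indexed, sitemap URLs indexed only in truncated form).

    `indexed` is assumed to be a list collected by hand; nothing here contacts Google.
    """
    indexed_set = set(indexed)
    found = [u for u in sitemap if u in indexed_set]
    # Truncated form = the full URL with its final filename removed.
    truncated = [
        u for u in sitemap
        if u not in indexed_set and u.rsplit("/", 1)[0] + "/" in indexed_set
    ]
    share = len(found) / len(sitemap) if sitemap else 0.0
    return share, truncated
```

A falling share together with a growing `truncated` list would match the pattern described in this thread.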
The only real change we made was an upgrade to our server hardware at our hosting company.
I've considered disallowing the truncated URL pattern in robots.txt, but I really shouldn't have to do that with the 301 in place.
I'm starting to wonder whether Google is sending us a signal that it prefers the shorter version of the URL.
Thanks for taking the time to take a look at it.
Michael
-
Hi Michael,
When you say you started noticing it again, is that through Webmaster Tools or through your own monitoring? I ask because, at first glance, I can see no technical reason on the site why those truncated URLs would be getting indexed again. Maybe it is just a matter of waiting a bit longer for the last of them to be removed? If they have suddenly started creeping up again, that suggests some variable in the mix has changed again, but I cannot see anything that stands out.
-
Lynn,
Thanks again for helping us out with this back in May. After we made the corrections you pointed out, it cleared up over the course of a few months. There were just a few truncated URLs left until this week, when we suddenly noticed it starting again. I've checked our 301s and our canonical/alternate tags, and made sure we are not linking to the truncated version anywhere, yet Google continues to index the truncated version. I'm tempted to disallow the truncated version in my robots.txt file, but I hesitate to do that because of the possibility of unexpected side effects.
Do you or anyone else reading this have any idea why Google would index:
http://www.eventective.com/USA/Massachusetts/Bedford/107/
rather than:
http://www.eventective.com/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html
when all links point to the latter and the former is even 301'd to the latter?
Any and all help is appreciated.
Thank you,
Michael
-
Lynn,
You nailed it. That's exactly what the problem was. Since we were using the same URL pattern for m. and www., we had created the canonical by swapping "m" for "www" in the current URL. Because the truncated mobile URLs were in the index, their canonicals all pointed to the truncated desktop version.
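The bug described above can be reproduced in a few lines. The sketch below contrasts building the canonical from whatever URL was requested (swap the host, keep the path) with resolving the path to its full form first. `REDIRECTS` is a hypothetical stand-in for the same truncated-to-full mapping the 301s use, and query strings and fragments are simply dropped for brevity.

```python
from urllib.parse import urlsplit, urlunsplit

DESKTOP_HOST = "www.eventective.com"

# Hypothetical truncated -> full path mapping (the one the 301s already use).
REDIRECTS = {
    "/USA/Massachusetts/Bedford/107/":
        "/USA/Massachusetts/Bedford/107/Doubletree-Hotel-Boston-Bedford-Glen.html",
}


def canonical_from_request(url: str) -> str:
    """Swap the host but keep the requested path, truncated or not."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, DESKTOP_HOST, parts.path, "", ""))


def canonical_from_full_path(url: str) -> str:
    """Resolve the path through the redirect map first, then swap the host."""
    parts = urlsplit(url)
    path = REDIRECTS.get(parts.path, parts.path)
    return urlunsplit((parts.scheme, DESKTOP_HOST, path, "", ""))
```

For `http://m.eventective.com/USA/Massachusetts/Bedford/107/`, the first function yields the truncated desktop URL, while the second yields the full `...Doubletree-Hotel-Boston-Bedford-Glen.html` URL.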
As you pointed out, this should resolve itself over time. Now I can focus on just the ranking issue.
Thank you both Lynn and Jesse for your help.
Michael
-
Hi Michael,
I suspect the mobile site might be responsible for the indexed URLs issue. Your mobile site has loads of indexed pages with the shorter URLs: https://www.google.com/#output=search&sclient=psy-ab&q=site:m.eventective.com&oq=site:m.eventective.com&fp=9861fb8dc6b3e7c
Before the 301 redirects on the mobile site were created, were the rel canonical links pointing to the truncated URLs on the main site? That seems to be the case on this random page I grabbed:
So there is an odd mixture of 301s on the main site and a well-indexed mobile site saying that the rel canonical on the main site is the shorter URL. Maybe the rel canonical won! Are you sure this is a recent issue? Maybe it has been like this for a while and just not noticed much?
I would think that with the 301s and rel canonicals now properly implemented on the mobile site, the index will slowly sort itself out. You could also put a rel canonical on the main site page referencing itself; that might speed up the process a bit.
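To spot-check a self-referencing canonical like the one suggested above, a page's HTML can be parsed for its `<link rel="canonical">` and compared with the page's own URL. A minimal sketch using only the standard library; fetching the HTML is left out, and `find_canonical()` and `is_self_canonical()` are hypothetical helpers.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        if "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def find_canonical(html_text: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical


def is_self_canonical(page_url: str, html_text: str) -> bool:
    """True if the page's canonical points at the page's own URL."""
    return find_canonical(html_text) == page_url
```

Running `is_self_canonical()` against the full desktop URL of each listing would catch any page whose canonical still points at the truncated form.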
I agree with Jesse that it is not likely a major worry, and I wouldn't think this alone would cause a ranking issue.
-
I'm responding to this in a semi-rushed manner as something is coming up, but I just want to mention that the most likely reason for Google to index this version of your URL is the links pointing to it: the ones that caused you to put a 301 in place, the ones that were 404ing before. They are clearly signaling to Google that this is the authoritative URL.
I'm not sure why you're worried about what the customer/user sees in the URL. They are most likely looking at the title/description in the SERPs well before the URL string. Most people only read the domain portion of a URL; the rest is used more for search engines' purposes (my opinion). Also, once the user clicks your title, they are taken through the redirect and the full URL will be visible in their browser's address bar.
As for why your rankings are affected... I'd be surprised if it had anything to do with this, honestly. If anything, redirecting should help, especially if you had links pointing to a broken page. The only exception would be if those links were poison, of course.
Okay, got to run. Hope I was helpful. Good luck!