Indexed non existent pages, problem appeared after we 301d the url/index to the url.
-
I recently read that if a site has 2 pages that are live such as:
http://www.url.com/index and http://www.url.com/ will come up as duplicate if they are both live...
I read that it's best to 301 redirect the http://www.url.com/index and http://www.url.com/. I read that this helps avoid duplicate content and keep all the link juice on one page.
We did the 301 for one of our clients and we got about 20,000 errors that did not exist. The errors are of pages that are indexed but do not exist on the server.
We are assuming that these indexed (nonexistent) pages are somehow linked to the http://www.url.com/index
The links are showing 200 OK.
We took off the 301 redirect from the http://www.url.com/index page however now we still have 2 exaact pages, www.url.com/index and http://www.url.com/.
What is the best way to solve this issue?
-
What are some examples of the "non-existent" URLs that are getting indexed, Bryan?
It's going to be pretty hard to diagnose this without actually seeing the site.
Paul
-
Hi I am afraid this is not the issue. It is not an endless loop, usually an endless loop will not let the site load, it keeps redirecting and you can never land on a page... This is not the case. But thank you for your efforts you get a +1
-
Bryan, you might have created an infinite loop that might be causing the issue you are describing. More on the issue HERE
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to Canonicalise all filter pages (URL parameters) to the main category
Hi guys, I am working on an e-commerce site that's running in Shopify. I noticed that the filter pages do not have canonical tags pointing to their respective main categories. I doubt that the action needed is to canonicalise each filter pages to the main category as it would take time (there are a lot of filter URLs involved). Do you know any technical coding to do in Shopify to have all filter pages canonicalise to its main category? Keen to hear from you. Cheers
Intermediate & Advanced SEO | | brandonegroup0 -
Index, follow on a paginated page with a different rel=canonical URL
Hello, I have a question about meta robots ="index, follow" and rel=canonical on category page pagination. Should the sorted page be <meta name="robots" content="index,follow"></meta name="robots" content="index,follow"> since the rel="canonical" is pointing to a separate page that is different from the URL? Any thoughts on this topic would be awesome. Thanks. Main Category Page
Intermediate & Advanced SEO | | Choice
https://www.site.com/category/
<meta name="robots" content="index,follow"><link rel="canonical" href="https: www.site.com="" category="" "=""></link rel="canonical" href="https:></meta name="robots" content="index,follow"> Sorted Page
https://www.site.com/category/?p=2&dir=asc&order=name
<meta name="robots" content="index, follow"=""><link rel="canonical" href="https: www.site.com="" category="" ?p="2""></link rel="canonical" href="https:></meta name="robots" content="index,> As you can see, the meta robots is telling Google to index https://www.site.com/category/?p=2&dir=asc&order=name , yet saying the canonical page is https://www.site.com/category/?p=2 .0 -
Old URLs that have 301s to 404s not being de-indexed.
We have a scenario on a domain that recently moved to enforcing SSL. If a page is requested over non-ssl (http) requests, the server automatically redirects to the SSL (https) URL using a good old fashioned 301. This is great except for any page that no longer exists, in which case you get a 301 going to a 404. Here's what I mean. Case 1 - Good page: http://domain.com/goodpage -> 301 -> https://domain.com/goodpage -> 200 Case 2 - Bad page that no longer exists: http://domain.com/badpage -> 301 -> https://domain.com/badpage -> 404 Google is correctly re-indexing all the "good" pages and just displaying search results going directly to the https version. Google is stubbornly hanging on to all the "bad" pages and serving up the original URL (http://domain.com/badpage) unless we submit a removal request. But there are hundreds of these pages and this is starting to suck. Note: the load balancer does the SSL enforcement, not the CMS. So we can't detect a 404 and serve it up first. The CMS does the 404'ing. Any ideas on the best way to approach this problem? Or any idea why Google is holding on to all the old "bad" pages that no longer exist, given that we've clearly indicated with 301s that no one is home at the old address?
Intermediate & Advanced SEO | | boxclever0 -
URLs are not indexed
My website has 0.5 million pages with urls like this- **http://www.mycity4kids.com/Delhi-NCR/collage-painting-classes-%3cnear%3e-shalimar-bagh ****, **none of these urls are indexed. Question 1- What can be the possible reason for this issue? Users see this url as : http://www.mycity4kids.com/Delhi-NCR/collage-painting-classes-<near>-shalimar-bagh</near>
Intermediate & Advanced SEO | | prsntsnh
The symbol "<" and ">" get converted into "%3c" and "%3e" respectively, is this the reason for these urls not getting indexed?0 -
URL Parameter Being Improperly Crawled & Indexed by Google
Hi All, We just discovered that Google is indexing a subset of our URL’s embedded with our analytics tracking parameter. For the search “dresses” we are appearing in position 11 (page 2, rank 1) with the following URL: www.anthropologie.com/anthro/category/dresses/clothes-dresses.jsp?cm_mmc=Email--Anthro_12--070612_Dress_Anthro-_-shop You’ll note that “cm_mmc=Email” is appended. This is causing our analytics (CoreMetrics) to mis-attribute this traffic and revenue to Email vs. SEO. A few questions: 1) Why is this happening? This is an email from June 2012 and we don’t have an email specific landing page embedded with this parameter. Somehow Google found and indexed this page with these tracking parameters. Has anyone else seen something similar happening?
Intermediate & Advanced SEO | | kevin_reyes
2) What is the recommended method of “politely” telling Google to index the version without the tracking parameters? Some thoughts on this:
a. Implement a self-referencing canonical on the page.
- This is done, but we have some technical issues with the canonical due to our ecommerce platform (ATG). Even though page source code looks correct, Googlebot is seeing the canonical with a JSession ID.
b. Resubmit both URL’s in WMT Fetch feature hoping that Google recognizes the canonical.
- We did this, but given the canonical issue it won’t be effective until we can fix it.
c. URL handling change in WMT
- We made this change, but it didn’t seem to fix the problem
d. 301 or No Index the version with the email tracking parameters
- This seems drastic and I’m concerned that we’d lose ranking on this very strategic keyword Thoughts? Thanks in advance, Kevin0 -
Best possible linking on site with 100K indexed pages
Hello All, First of all I would like to thank everybody here for sharing such great knowledge with such amazing and heartfelt passion.It really is good to see. Thank you. My story / question: I recently sold a site with more than 100k pages indexed in Google. I was allowed to keep links on the site.These links being actual anchor text links on both the home page as well on the 100k news articles. On top of that, my site syndicates its rss feed (Just links and titles, no content) to this page. However, the new owner made a mess, and now the site could possibly be seen as bad linking to my site. Google tells me within webmasters that this particular site gives me more than 400K backlinks. I have NEVER received one single notice from Google that I have bad links. That first. But, I was worried that this page could have been the reason why MY site tanked as bad as it did. It's the only source linking so massive to me. Just a few days ago, I got in contact with the new site owner. And he has taken my offer to help him 'better' his site. Although getting the site up to date for him is my main purpose, since I am there, I will also put effort in to optimizing the links back to my site. My question: What would be the best to do for my 'most SEO gain' out of this? The site is a news paper type of site, catering for news within the exact niche my site is trying to rank. Difference being, his is a news site, mine is not. It is commercial. Once I fix his site, there will be regular news updates all within the niche we both are in. Regularly as in several times per day. It's news. In the niche. Should I leave my rss feed in the side bars of all the content? Should I leave an achor text link on the sidebar (on all news etc.) If so: there can be just one keyword... 407K pages linking with just 1 kw?? Should I keep it to just one link on the home page? I would love to hear what you guys think. (My domain is from 2001. Like a quality wine. However, still tanked like a submarine.) ALL SEO reports I got here are now Grade A. The site is finally fully optimized. Truly nice to have that confirmation. Now I hope someone will be able to tell me what is best to do, in order to get the most SEO gain out of this for my site. Thank you.
Intermediate & Advanced SEO | | richardo24hr0 -
Index.php canonical/dup issues
Hello my fellow SEOs! I would LOVE some additional insight/opinions on the following... I have a client who is an industry leader, big site, ranks for many competitive phrases, blah blah..you get the picture. However, they have a big dup content/canonical issue. Most pages resolve with and without the /index.php at the end of the URL. Obviously this is a dup content issue but more importantly they SEs sometimes serve an "index.php" version of the page, sometimes they don't, and it is constantly changing which version it serves and the rank goes up and down. Now, I've instructed them that we are going to need to write a sitewide redirect to attempt a uniform structure. Most people would say, redirect to the non index.php version buttttt 1. The index.php pages consistently outperforms the non index.php versions, except the homepage. 2. The client really would prefer to have the "index.php" at the end of the URL The homepage performs extremely well for a lot of competitive phrases. I'd like to redirect all pages to the "index.php" version except the homepage and I'm thinking that if I redirect all pages EXCEPT the homepage to the index.php version, it could cause some unforeseen issues. I can not use rel=canonical because they have many different versions of the their pages with different country codes in the URL..example, if I make the US version canonical, it will hurt the pages trying to rank with a fr URL, de URL, (where fr/de are country codes in the URL depending where the user is, it serves the correct version). Any advice would be GREATLY appreciated. Thanks in advance! Mike
Intermediate & Advanced SEO | | MikeCoughlin0 -
Best way to stop pages being indexed and keeping PageRank
If for example on a discussion forum, what would be the best way to stop pages such as the posting page (where a user posts a topic or message) from being indexed AND not diluting PageRank too? If we added them to the Disallow on robots.txt, would pagerank still flow through the links to those blocked pages or would it stay concentrated on the linking page? Your ideas and suggestions will be greatly appreciated.
Intermediate & Advanced SEO | | Peter2640