404'd pages still in index
-
I recently launched a site and shortly afterwards performed a URL rewrite (not the greatest idea, I know). The developer 404'd the old pages instead of setting up permanent 301 redirects, which caused a mess in the index. I have tried to use Google's removal tool to remove these URLs from the index. The pages were being removed, but now I am finding them in the index as bare URLs pointing to the 404'd pages (i.e. no title tag or meta description). Should I wait this out, or go back and 301 redirect the old URLs (which currently 404) to the new URLs? I am sure this is the reason for my lack of rankings, as the rest of my site is pretty well optimized and I have some quality links.
-
Will do. Thanks for the help.
-
I think the latter: remove the robots.txt entries and 301.
But (if you can) leave a couple without a 301 and see what (if any) difference you get. I would love to hear how it works out.
-
Is it better to remove the robots.txt entries specific to the old URLs so Google can see the 404s and drop those pages at its own pace, or to remove those entries and 301 the old URLs to the new URLs? Those seem to be my two options. Obviously, I want to do whatever is best for the site's rankings and gives the fastest turnaround. Thanks for your help on this, by the way!
-
I'm not saying remove the whole robots.txt file - just the bits relating to the old urls (if you have entries in a robots.txt that affect the old urls).
e.g. say your robots.txt blocks access to
then you should remove that line from the robots.txt, otherwise Google won't be able to crawl those pages to 'see' the 404 and realise that they're not there.
My guess is a few weeks before it all settles down, but that really is a finger in the air guess. I went through a similar scenario with moving urls and then moving them again shortly after the first move - took a month or two.
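A quick way to confirm whether a robots.txt rule is hiding the old URLs from Googlebot is to test them with Python's standard `urllib.robotparser`. This is only a sketch: the `Disallow: /index/home/` rule and the example.com URLs are assumptions made up for illustration, not the actual robots.txt in question.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text forbids user_agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rule for illustration only; not the poster's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /index/home/
"""

URLS_TO_CHECK = [
    "http://www.example.com/index/home/keyword",  # old-style URL
    "http://www.example.com/keyword",             # new-style URL
]

# Any URL listed here cannot be crawled, so its 404 or 301 will never be seen.
for url in blocked_urls(SAMPLE_ROBOTS_TXT, URLS_TO_CHECK):
    print("blocked:", url)
```

If an old URL shows up as blocked, the matching `Disallow` line is the one to remove.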
-
I am a little confused regarding removal of the robots.txt file, since blocking the URLs there is one of the steps in requesting removal from Google (per their removal tool requirements). My natural tendency is to 301 redirect the old URLs to the new ones. Will I need to remove the robots.txt file before permanently redirecting the old URLs to the new ones? How long does it take Google (roughly) to remove old URLs after a 301?
-
Ok, got that, so that sounds like an external rewrite, which is fine.
URL only, but no title or description: that sounds like what you get when you block crawling via robots.txt. If you've got that situation, I'd suggest removing the block so that Google can crawl the old URLs and find that they are 404s. It sounds like they'll fall out of the index eventually.
Another thing you could try to hurry things along is:
1. 301 the old URLs to the new ones.
2. Submit a sitemap containing the old URLs (so that they get crawled and the 301s are picked up).
3. Then update your sitemap and resubmit it with only the new URLs.
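The temporary sitemap of old URLs can be generated with a few lines of Python. This is a rough sketch using only the standard library; the `build_sitemap` helper and the example.com URL are illustrative assumptions, and only the `urlset`/`url`/`loc` elements of the sitemaps.org format are produced.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (one <url><loc> entry per URL) as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical old URL; replace with the real list of redirected URLs.
old_urls = ["http://www.example.com/index/home/keyword"]
print(build_sitemap(old_urls))
```

Once the redirects have been picked up, the same helper can be called with the list of new URLs to produce the replacement sitemap.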
-
When I say URL rewrite, I mean we restructured the URLs to be cleaner and more search friendly. For example, take a URL that was www.example.com/index/home/keyword and restructure it to be www.example.com/keyword. Also, the old URLs (i.e. www.example.com/index/home/keyword) are being shown towards the end of the site:example.com search results with just the old URL and no title or meta description. Is this a sign that they are on the way out of the index? Any insight would be helpful.
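For the URL pattern described above, the old-to-new mapping and the 301 response could look roughly like the sketch below. It assumes every old URL has the form /index/home/<keyword>; the `new_path` function and `RedirectOldUrls` handler are hypothetical names, and on a real site the same rule would normally live in the web server or CMS configuration rather than in a standalone Python handler.

```python
from http.server import BaseHTTPRequestHandler

# Assumed from the example URL in the post: /index/home/<keyword>
OLD_PREFIX = "/index/home/"


def new_path(old_path):
    """Map /index/home/<keyword> to /<keyword>; return None for any other path."""
    if old_path.startswith(OLD_PREFIX) and len(old_path) > len(OLD_PREFIX):
        # Anything after the prefix (including a query string) is kept as-is.
        return "/" + old_path[len(OLD_PREFIX):]
    return None


class RedirectOldUrls(BaseHTTPRequestHandler):
    """Answer old-style URLs with a permanent redirect, everything else with 404."""

    def do_GET(self):
        target = new_path(self.path)
        if target is None:
            self.send_error(404)
            return
        self.send_response(301)  # permanent redirect
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally, the handler can be served with `http.server.HTTPServer(("", 8000), RedirectOldUrls).serve_forever()`; a request for /index/home/keyword should then come back as a 301 pointing at /keyword.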
-
Couple of things probably need clarifying: When you say URL rewrite, I'm assuming you mean an external rewrite (in effect, a redirect)? If you do an internal rewrite, that (of itself) should make no difference at all to how any external visitors/engines see your urls/pages. If the old pages had links or traffic I would be inclined to 301 them to the new pages. If the old pages didn't have traffic/links, leave them, they'll fall out eventually - they're not in an xml sitemap by any chance are they (in which case update the sitemap). You often see a drop in rankings when restructuring a site and (in my experience), it can take a few weeks to recover. To give you an example, it took nearly two months for the non-www version of our site to disappear from the index after a similar move (and messing about with redirects).