Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Is 404'ing a page enough to remove it from Google's index?
-
We set some pages to 404 status about 7 months ago, but they are still showing in Google's index (as 404's). Is there anything else I need to do to remove these?
-
Nice information John. I hadn't thought of adding a temporary page with a noindex tag but that sounds like a way to go for faster results.
I know Google has automatically removed 404 pages in the past. I noticed the issue Michelle is sharing and the information you shared offers great details on the process.
-
Setting pages to 404 should be enough to remove them after Google indexes your page enough times. Google has to be careful about this, because when many sites crash or have site maintenance, they return 404 instead of 503, so Google wouldn't want to remove pages from their index until they're sure the page is gone.
Google talks about removing pages from there index here. The Google Webmaster Tools URL removal tool is only intended for pages that urgently need to be removed, so I wouldn't recommend that. Google recommends:
- If the page no longer exists, make sure that the server returns a 404 (Not Found) or 410 (Gone) HTTP status code. This will tell Google that the page is gone and that it should no longer appear in search results.
- If the page still exists but you don't want it to appear in search results, use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt we may still index the page if we find its URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page.
- Alternatively, you can use a noindex meta tag. When we see this tag on a page, Google will completely drop the page from our search results, even if other pages link to it. This is a good solution if you don't have direct access to the site server. (You will need to be able to edit the HTML source of the page).
Is there a reason you are 404'ing these pages rather than redirecting them? If these pages have new pages with similar content, you should do a 301 redirect to keep the link juice flowing and to take advantage of these pages being linked to. If you do continue returning 404 for these pages (or even if you don't...), make sure your 404 page is a useful one, that helps users find the page they're looking for (Google help article).
Also, Ryan, I'd be interested in hearing the results of using the 410 status code. I would imagine that status code would do the trick! I'm surprised I haven't read about this more, or why it's not mentioned in the help file linked to above.
-
I have experienced this same issue with Google.
I just began a test by making a change on my site to one of the URLs. I am bookmarking this Q&A and will try to remember to update it if I see a change. It can take Google some time to check any individual link so it could take weeks.
In case you are curious, I have added a 410 status code for one of the pages involved. 410 means the resource is gone, while 404 is simply not found. Perhaps the 410 header code will send the right message to Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
After hack and remediation, thousands of URL's still appearing as 'Valid' in google search console. How to remedy?
I'm working on a site that was hacked in March 2019 and in the process, nearly 900,000 spam links were generated and indexed. After remediation of the hack in April 2019, the spammy URLs began dropping out of the index until last week, when Search Console showed around 8,000 as "Indexed, not submitted in sitemap" but listed as "Valid" in the coverage report and many of them are still hack-related URLs that are listed as being indexed in March 2019, despite the fact that clicking on them leads to a 404. As of this Saturday, the number jumped up to 18,000, but I have no way of finding out using the search console reports why the jump happened or what are the new URLs that were added, the only sort mechanism is last crawled and they don't show up there. How long can I expect it to take for these remaining urls to also be removed from the index? Is there any way to expedite the process? I've submitted a 'new' sitemap several times, which (so far) has not helped. Is there any way to see inside the new GSC view why/how the number of valid URLs in the indexed doubled over one weekend?
Intermediate & Advanced SEO | | rickyporco0 -
Google does not want to index my page
I have a site that is hundreds of page indexed on Google. But there is a page that I put in the footer section that Google seems does not like and are not indexing that page. I've tried submitting it to their index through google webmaster and it will appear on Google index but then after a few days it's gone again. Before that page had canonical meta to another page, but it is removed now.
Intermediate & Advanced SEO | | odihost0 -
Google not Indexing images on CDN.
My URL is: http://bit.ly/1H2TArH We have set up a CDN on our own domain: http://bit.ly/292GkZC We have an image sitemap: http://bit.ly/29ca5s3 The image sitemap uses the CDN URLs. We verified the CDN subdomain in GWT. The robots.txt does not restrict any of the photos: http://bit.ly/29eNSXv. We used to have a disallow to /thumb/ which had a 301 redirect to our CDN but we removed both the disallow in the robots.txt as well as the 301. Yet, GWT still reports none of our images on the CDN are indexed.
Intermediate & Advanced SEO | | alphonsehaThe above screenshot is from the GWT of our main domain.The GWT from the CDN subdomain just shows 0. We did not submit a sitemap to the verified subdomain property because we already have a sitemap submitted to the property on the main domain name. While making a search of images indexed from our CDN, nothing comes up: http://bit.ly/293ZbC1While checking the GWT of the CDN subdomain, I have been getting crawling errors, mainly 500 level errors. Not that many in comparison to the number of images and traffic that we get on our website. Google is crawling, but it seems like it just doesn't index the pictures!?
Can anyone help? I have followed all the information that I was able to find on the web but yet, our images on the CDN still can't seem to get indexed.
0 -
Best way to remove full demo (staging server) website from Google index
I've recently taken over an in-house role at a property auction company, they have a main site on the top-level domain (TLD) and 400+ agency sub domains! company.com agency1.company.com agency2.company.com... I recently found that the web development team have a demo domain per site, which is found on a subdomain of the original domain - mirroring the site. The problem is that they have all been found and indexed by Google: demo.company.com demo.agency1.company.com demo.agency2.company.com... Obviously this is a problem as it is duplicate content and so on, so my question is... what is the best way to remove the demo domain / sub domains from Google's index? We are taking action to add a noindex tag into the header (of all pages) on the individual domains but this isn't going to get it removed any time soon! Or is it? I was also going to add a robots.txt file into the root of each domain, just as a precaution! Within this file I had intended to disallow all. The final course of action (which I'm holding off in the hope someone comes up with a better solution) is to add each demo domain / sub domain into Google Webmaster and remove the URLs individually. Or would it be better to go down the canonical route?
Intermediate & Advanced SEO | | iam-sold0 -
Can too many "noindex" pages compared to "index" pages be a problem?
Hello, I have a question for you: our website virtualsheetmusic.com includes thousands of product pages, and due to Panda penalties in the past, we have no-indexed most of the product pages hoping in a sort of recovery (not yet seen though!). So, currently we have about 4,000 "index" page compared to about 80,000 "noindex" pages. Now, we plan to add additional 100,000 new product pages from a new publisher to offer our customers more music choice, and these new pages will still be marked as "noindex, follow". At the end of the integration process, we will end up having something like 180,000 "noindex, follow" pages compared to about 4,000 "index, follow" pages. Here is my question: can this huge discrepancy between 180,000 "noindex" pages and 4,000 "index" pages be a problem? Can this kind of scenario have or cause any negative effect on our current natural SEs profile? or is this something that doesn't actually matter? Any thoughts on this issue are very welcome. Thank you! Fabrizio
Intermediate & Advanced SEO | | fablau0 -
Limit on Google Removal Tool?
I'm dealing with thousands of duplicate URL's caused by the CMS... So I am using some automation to get through them - What is the daily limit? weekly? monthly? Any ideas?? thanks, Ben
Intermediate & Advanced SEO | | bjs20100 -
Creating 100,000's of pages, good or bad idea
Hi Folks, Over the last 10 months we have focused on quality pages but have been frustrated with competition websites out ranking us because they have bigger sites. Should we focus on the long tail again? One option for us is to take every town across the UK and create pages using our activities. e.g. Stirling
Intermediate & Advanced SEO | | PottyScotty
Stirling paintball
Stirling Go Karting
Stirling Clay shooting We are not going to link to these pages directly from our main menus but from the site map. These pages would then show activities that were in a 50 mile radius of the towns. At the moment we have have focused our efforts on Regions, e.g. Paintball Scotland, Paintball Yorkshire focusing all the internal link juice to these regional pages, but we don't rank high for towns that the activity sites are close to. With 45,000 towns and 250 activities we could create over a million pages which seems very excessive! Would creating 500,000 of these types of pages damage our site? This is my main worry, or would it make our site rank even higher for the tougher keywords and also get lots of traffic from the long tail like we used to get. Is there a limit to how big a site should be? edit0 -
Should I use both Google and Bing's Webmaster Tools at the same time?
Hi All, Up till now I've been registered only to Google WMT. Do you recommend using at the same time Bing's WMT? Thanks
Intermediate & Advanced SEO | | BeytzNet0