404's being re-indexed
-
Hi All,
We are experiencing issues with pages that return a 404 still being indexed. Originally, these were /wp-content/ index pages that had been included in Google's index. Once I realized this, I added a directive to our .htaccess to 404 all of these pages, as there were hundreds. I tried to let Google crawl and drop these pages naturally, but after a few months I used the URL removal tool to remove them manually.
However, Google seems to keep re-indexing these pages, even after they have been manually submitted for removal in Search Console. Do you have suggestions? They all return 404s.
Thanks
-
Just to follow up - I have now actually 410'd the pages and the 410's are still being re-indexed.
-
I'll check this one out as well, thanks! I used a header-response extension called Web Developer, which reveals the presence of X-Robots headers.
-
First it would be helpful to know how you are detecting that it isn't working. What indexation tool are you using to see whether the blocks are being detected? I personally really like this one: https://chrome.google.com/webstore/detail/seo-indexability-check/olojclckfadnlhnlmlekdihebmjpjnoa?hl=en-GB
Or obviously at scale - Screaming Frog
-
Thank you for the quick response,
The pages are truly removed, however, because there were so many of these types of pages that leaked into the index, I added a redirect to keep users on our site - no intentions of being "shady", I just didn't want hundreds of 404's getting clicked and causing a very high bounce rate.
For the X-Robots header, could you offer some insight into why my directive isn't working? I believe it's a regex issue on the wp-content. I have tried to troubleshoot to no avail.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
I appreciate the help!
-
Well if a page has been removed and has not been moved to a new destination - you shouldn't redirect a user anyway (which kind of 'tricks' users into thinking the content was found). That's actually bad UX
If the content has been properly removed or was never supposed to be there, just leave it at a 410 (but maybe create a nice custom 410 page, in the same vein as a decent UX custom 404 page). Use the page to admit that the content is gone (without shady redirects) but to point to related posts or products. Let the user decide, but still be useful
If the content is actually still there, and that is why you are redirecting, then you shouldn't be serving 404s or 410s in the first place. You should be serving 301s, redirecting over HTTP to the content's new (or revised) destination URL.
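To make the two cases above concrete, here is a minimal .htaccess sketch. It assumes Apache with mod_alias, that the leaked URLs are directory listings under /wp-content/ (so they end in a slash), and that /gone.html, /old-guide/ and /new-guide/ are placeholder paths made up for illustration. None of it is taken from the original poster's actual config.

```apache
# Case 1: content is permanently gone -> answer 410 and show a helpful page.
# Matches only directory-style URLs (ending in "/") under /wp-content/,
# so images, CSS and JS files in that tree are still served normally.
RedirectMatch gone "^/wp-content/(.*/)?$"

# Custom page for 410 responses (hypothetical path). It is served with the
# 410 status, so it can point to related posts without any redirect.
ErrorDocument 410 /gone.html

# Case 2: content still exists at a new URL -> permanent redirect
# (hypothetical old and new paths).
Redirect 301 "/old-guide/" "/new-guide/"
```

Because the custom page is delivered with the 410 itself, there is no need for a JavaScript redirect after load; the page can simply link to wherever the visitor is likely to want to go.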
Yes, the HTTP header method is the correct replacement when the HTML implementation gets stripped out. HTTP Header X-Robots is the way for you!
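On why the FilesMatch attempt has no effect: a likely cause (my reading of Apache's behaviour, not something confirmed in this thread) is that FilesMatch tests only the file name, never the directory part of the URL, so "wp-content" is never matched. A minimal sketch that matches on the request path instead, assuming Apache 2.4 with mod_headers enabled:

```apache
# Apply the header to any URL whose path starts with /wp-content/.
# <If> evaluates the request path, unlike <FilesMatch>, which only
# looks at the file name.
<If "%{REQUEST_URI} =~ m#^/wp-content/#">
    # "always" also covers non-2xx responses such as 404 and 410;
    # a plain "Header set" generally applies to successful responses only.
    Header always set X-Robots-Tag "noindex, nofollow"
</If>
```

This would go in the site's root .htaccess. Note that as written it also marks images and other files under /wp-content/ as noindex; if uploads should stay in image search, narrow the pattern to directory URLs only, for example m#^/wp-content/(.*/)?$#. Checking the response with a header viewer or curl -I on one of the affected URLs should show whether the header is actually being sent.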
-
Thank you! I am in the process of doing so. However, with a 410 I cannot keep my JS redirect that fires after the page loads, which creates some UX issues. Do you have any suggestions to remedy this?
Additionally, after switching to the 410, the meta noindex (the non-X-Robots one) is being stripped, so the URL now resolves to a bare 410 with no noindex or redirect. I am still working on a noindex header; since the 410 is served server-side, I assume the header is the only way to do this, correct?
-
Strictly speaking, a 404 only says "not found"; it doesn't tell Google whether the page is gone for good or might come back. Because that is left open, you are effectively inviting Google to come back later and check again.
If you want to say that the page is permanently gone, use status code 410 (Gone).
Leave the noindex in the HTTP header via X-Robots-Tag; that was a good call. But combining noindex with a 404 was a bad call, as the two send mixed signals ("don't index me now, but I might be back at some point").
Use noindex and 410, which agree with each other ("don't index me now and don't bother coming back").
-
Yes, all pages have a noindex. I have also tried to noindex them via .htaccess as an extra layer of protection, but the directive seems to be incorrect. I believe it is an issue with the regex; I am attempting to match any URL containing wp-content.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
-
Back to basics: have you marked those pages/posts as 'noindex'? With many WP plugins, you can noindex them in bulk and then submit them for re-indexing.