404s being re-indexed
-
Hi All,
We are having trouble with pages that return a 404 still being indexed. Originally, these were /wp-content/ index pages that had ended up in Google's index. Once I realized this, I added a directive to our .htaccess to 404 all of these pages, as there were hundreds. I tried to let Google crawl and drop these pages naturally, but after a few months I used the URL removal tool to remove them manually.
However, Google keeps re-indexing these pages, even after I manually requested their removal in Search Console. Do you have any suggestions? They all return 404s.
Thanks
-
Just to follow up: I have now switched the pages to 410, and the 410s are still being re-indexed.
-
I'll check this one out as well, thanks! I used a header-response extension called Web Developer, which shows whether X-Robots headers are present.
-
First, it would be helpful to know how you are detecting that it isn't working. What indexation tool are you using to see whether the blocks are being detected? I personally really like this one: https://chrome.google.com/webstore/detail/seo-indexability-check/olojclckfadnlhnlmlekdihebmjpjnoa?hl=en-GB
Or, at scale, Screaming Frog.
-
Thank you for the quick response,
The pages are truly removed. However, because so many of these pages leaked into the index, I added a redirect to keep users on our site. There was no intention of being "shady"; I just didn't want hundreds of 404s getting clicked and causing a very high bounce rate.
For the X-Robots header, could you offer some insight into why my directive isn't working? I believe it's a regex issue with the wp-content match. I have tried to troubleshoot it, to no avail.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
I appreciate the help!
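A likely culprit: `<FilesMatch>` only tests its regex against the file's base name (for example `index.php` or `photo.jpg`), never the directory path, so `(wp-content)` cannot match. Below is a minimal, untested sketch of one alternative that matches on the request path instead. It assumes Apache 2.4 or later (for `<If>`), mod_headers enabled, and WordPress installed at the document root.

```apache
# Sketch only: assumes Apache 2.4+ (for <If>) and mod_headers.
# Match on the request path, not the file name (which is all <FilesMatch> sees).
<IfModule mod_headers.c>
    <If "%{REQUEST_URI} =~ m#^/wp-content/#">
        # "always" is needed for the header to reach 404/410 responses;
        # plain "Header set" only affects successful (2xx) responses.
        Header always set X-Robots-Tag "noindex, nofollow"
    </If>
</IfModule>
```

Be aware that this also applies to images, CSS and other files served from /wp-content/, so uploaded images would drop out of image search too; narrow the pattern if that matters. Checking the response headers of one affected URL with the tools mentioned in this thread should confirm whether the header is actually sent.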
-
Well, if a page has been removed and has not moved to a new destination, you shouldn't redirect users anyway, since that kind of 'tricks' them into thinking the content was found. That's actually bad UX.
If the content has been properly removed, or was never supposed to be there, just leave it at a 410 (but consider creating a nice custom 410 page, in the same vein as a good custom 404 page). Use the page to admit that the content is gone (without shady redirects), but point to related posts or products. Let the user decide, while still being useful.
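For the custom 410 page, Apache's `ErrorDocument` directive can serve a local page while keeping the 410 status. A sketch, assuming a hypothetical `/gone.html` at the document root that explains the content is gone and links to related posts or products:

```apache
# /gone.html is a hypothetical placeholder page, not an existing file.
# A local path (not a full http:// URL) keeps the 410 status code;
# a full URL would make Apache send a redirect to the client instead.
ErrorDocument 410 /gone.html
```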
If the content actually still exists, and that is why you are redirecting, then you shouldn't be serving 404s or 410s in the first place. You should serve 301s, i.e. plain HTTP redirects to the content's new (or revised) destination URL.
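If the content does live on at a new URL, the 301 can be issued from .htaccess. A minimal sketch using mod_rewrite, with hypothetical old and new paths and `www.example.com` as a placeholder domain. On a WordPress site it would need to sit above the standard `# BEGIN WordPress` block, because WordPress's catch-all rule would otherwise route the old URL to index.php first:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Placeholder paths: /old-post/ has moved permanently to /new-post/.
    RewriteRule ^old-post/?$ https://www.example.com/new-post/ [R=301,L]
</IfModule>
```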
Yes, the HTTP header method is the correct replacement when the HTML implementation gets stripped out. The X-Robots-Tag HTTP header is the way to go for you!
-
Thank you! I am in the process of doing so; however, with a 410 I can no longer keep my JS redirect that fires after the page loads, which creates some UX issues. Do you have any suggestions to remedy this?
Additionally, since switching to 410, the on-page (non-X-Robots) noindex is being stripped, so the URL resolves to a bare 410 with no noindex and no redirect. I am still working on a noindex header; since the 410 is served server-side, I assume that would be the only way, correct?
-
You know that a 404 just means "not found", without saying whether the page is gone temporarily or for good, right? Because it leaves open the possibility that the page comes back, Google keeps returning to check.
If you want to say that the page is permanently gone, use status code 410 (Gone).
Leave the noindex in the HTTP header via X-Robots-Tag; that was a good call. But combining noindex with a 404 sends mixed signals ("don't index me now, but do come back and check later, as I might be back at some point").
Use noindex with a 410 instead, since the two agree with each other ("don't index me now, and don't bother coming back").
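Serving a 410 rather than a 404 can also be done in .htaccess. A sketch, assuming mod_rewrite, WordPress at the document root, and that the unwanted URLs are the directory-style /wp-content/ index pages (paths ending in a slash). Real files such as images, CSS and JS under /wp-content/ are left alone; if the affected URLs look different, the pattern would need adjusting:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Directory URLs under /wp-content/ (ending in "/") get 410 Gone.
    # The [G] flag returns 410 immediately; no further rules are evaluated.
    RewriteRule ^wp-content/(.*/)?$ - [G]
</IfModule>
```

A quick look at one affected URL's response headers should then show `410 Gone`.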
-
Yes, all pages have a noindex. I have also tried to noindex them via .htaccess, as an extra layer of protection, but the directive seems to be wrong. I believe the problem is the regex; I'm trying to match anything containing wp-content.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
-
Back to basics: have you marked those pages/posts as noindex? With many WordPress SEO plugins, you can noindex them in bulk and then submit them for re-indexing.