Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Will Google Recrawl an Indexed URL Which is No Longer Internally Linked?
-
We accidentally introduced Google to our incomplete site. The end result: thousands of pages indexed which return nothing but a "Sorry, no results" page. I know there are many ways to go about this, but the sheer number of pages makes it frustrating.
Ideally, in the interim, I'd love to 404 the offending pages and allow Google to recrawl them, realize they're dead, and begin removing them from the index. Unfortunately, we've already removed from our site the internal links that led to this premature indexation.
So my question is, will Google revisit these pages based on their own records (as in, this page is indexed, let's go check it out again!), or will they only revisit them by following along a current site structure?
We are signed up with WMT if that helps.
-
What we often run into on larger sites is that 1) there are still internal links to those pages from old blog posts and the like. You have to really scrub your site to find those and update them manually. I only mention this because, unless you crawled the site with a tool and went through the results with a fine-toothed comb, you might be surprised by the links you missed. 2) There are still external links to those pages. That said, even if neither 1 nor 2 applies, Google will still recrawl (although not as often). Google assumes that any initial 404, or even a 301, may be a temporary error, so it checks back. I have seen Google still ping URLs that we removed over a year ago. They really hang onto stuff. I have not gone as far as 301ing to a directory that I then deindex; generally I just watch for the URLs to show up in Webmaster Tools and then fall out, and then I move on.
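For the "scrub your site" step, here is a minimal sketch of how you might flag pages that still link to the URLs you plan to kill. It assumes you already have the HTML of your pages from whatever crawler you use; `find_stale_links`, `pages`, and `dead_urls` are made-up names for illustration, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_stale_links(pages, dead_urls):
    """Return {page_url: [dead URLs that page still links to]}.

    pages: dict mapping page URL -> HTML source (hypothetical input,
           e.g. the output of a crawl you have already run).
    dead_urls: set of absolute URLs you intend to 404.
    """
    stale = {}
    for page_url, html in pages.items():
        parser = LinkCollector()
        parser.feed(html)
        hits = []
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment before comparing.
            absolute, _fragment = urldefrag(urljoin(page_url, href))
            if absolute in dead_urls and absolute not in hits:
                hits.append(absolute)
        if hits:
            stale[page_url] = hits
    return stale
```

Anything that shows up in the result is a page (an old blog post, say) that still needs its link updated by hand. It only does exact URL matching, so trailing-slash or parameter-order variants would need normalizing first.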
-
Right, but having lots of 404s that are still indexed probably isn't good for your site in general. If you wanted them de-indexed, 301ing them to a new folder and filing a single removal request for that entire directory would probably work.
Thanks for the help. I've heard from a few people that they will recrawl these pages again even if nothing is linking to them. That's reassuring. Thanks all.
-
No reason other than finding all those 404 pages and doing individual URL removals for each isn't a very productive task. 404s generally have no impact on search rankings.
-
Interesting. Any reason why you haven't simply filed a removal request? If there are too many to do manually, you could 301 them to a specific directory and then remove that whole directory with a single request.
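A rough sketch of what the "301 them to a specific directory" idea could look like as a redirect rule. This is only an illustration under assumptions: the `/removed/` directory name, the `removal_redirect` function, and the `dead_paths` set are all hypothetical, and how you wire the result into your server depends on your stack.

```python
from urllib.parse import urlsplit, quote

REMOVAL_DIR = "/removed/"  # hypothetical directory name, pick your own


def removal_redirect(url, dead_paths, removal_dir=REMOVAL_DIR):
    """Return (301, target_path) if the URL is a known dead page, else None.

    dead_paths: set of "path?query" strings for the pages to retire
                (hypothetical input).
    """
    parts = urlsplit(url)
    key = parts.path + ("?" + parts.query if parts.query else "")
    if key not in dead_paths:
        return None
    # Flatten the old path and query into one slug so every dead URL
    # gets a distinct target under the same directory.
    slug = quote(key.strip("/"), safe="")
    return 301, removal_dir + slug
```

With every dead URL landing under one directory, the removal request only has to name that directory instead of each page.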
-
Hi Martijn,
Thanks for the response. I must apologize as I left out an important detail. While our pages are "No results" and basically useless to the user, they're not actually 404'd pages. They're live, valid pages that basically offer nothing.
As I stated earlier, 404'ing them would be ideal for us if we could be sure Google would recrawl them. I am hesitant because I'm not certain Googlebot recrawls pages that no longer have internal links pointing to them. Our deeper pages like these have not been updated or recrawled yet, so I'm a bit unsure how likely they are to be revisited.
I guess I should just go ahead and 404 all of them now and see what happens, since it can't hurt. Just curious about Googlebot in general since it always helps to know more!
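For what it's worth, the change being described is small: the "Sorry, no results" pages are currently served as valid pages, and the fix is to send a 404 status when there is nothing to show. A minimal, framework-agnostic sketch follows; `results_response` is a made-up name, and the (status, body) tuple is an assumption standing in for whatever response object your framework uses.

```python
def results_response(results):
    """Return (status_code, body) for a results page."""
    if not results:
        # Empty result set: send a real 404 status so crawlers see a
        # dead page rather than a valid page with an apology on it.
        return 404, "Sorry, no results"
    body = "\n".join(str(item) for item in results)
    return 200, body
```

The visitor still sees the same "no results" message; only the status code changes, which is the signal the crawler acts on.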
-
Don't count on Google dropping those 404ing pages from the index any time soon. We have pages that have 404d for over a year and they're still in the index.
-
They'll eventually drop these pages: they already know where to find them, and once the pages return a proper 404 header they know that's a sign to drop them. In most cases pages that 404 are no longer linked from any other pages, which is another sign to search engines that those pages aren't important anymore.