Indexing non-indexed content and Google crawlers
-
On a news website we have a system where articles are given a publish date which is often in the future. The articles were showing up in Google before the publish date despite us not being able to find them linked from anywhere on the website.
I've added a 'noindex' meta tag to articles that shouldn't be live until a future date.
When the date comes for them to appear on the website, the noindex disappears. Is anyone aware of any issues doing this - say Google crawls a page that is noindex, then 2 hours later it finds out it should now be indexed? Should it still appear in Google search, News etc. as normal, as a new page?
Thanks.
-
Wow! Nice detective work! I could see how that one would slip under the radar.
Congrats on finding a needle in a haystack!
You should buy yourself the adult beverage of your choice and have a little toast!
Cheers!
-
-
I think Screaming Frog has a trial version, I forget if it limits total number of pages etc. as we bought it a while ago. At least you can try out and see. May be others who have more tools as well.
-
Thanks. I agree I need to get rid of that noindex. The site is new and doesn't have much in the way of tag clouds etc. yet, so it's not like we have a lot of pages to check.
I've used the link: attribute to try and find the offending links each time, but nothing showed up. I use Xenu Link Sleuth rather than Screaming Frog, and I can't find a way to find backlinks with Xenu. Do you know if you can with the free version of Screaming Frog? I've seen the free version described as "almost fully functional" - the number of crawlable links seems to be the main restriction.
-
I like the automated sitemap answer for the cause (as this has bitten me before), but you mentioned you do not have that. I would still bet that somewhere on your web site you are linking to the page that you do not want indexed. It could be a tag cloud page or some other index page. We had a site that it would accidentally publish out articles on our home page ahead of schedule. Point here is that when you have a dynamic site with a CMS, you really have to be on your toes with stuff like this as the automation can get you into situations like this.
I would not use the noindex tag and remove it later. My concern would be that you are sending conflicting signals to Google. noindex tells good to remove this page from the index.
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it." from GWT
When I read that - it sounds like this is not what you want for this page.
You could also setup your system to show a 404 on the URL until the content is live and then let it 200, but you run into the same issue of Google getting 2 opposite signals on the same page. Either way, if you first give the signal to Google that you do not want something indexed, you are at the mercy of the next crawl to see if Google looks at it again.
Regardless, you need to get to the crux of the issue, how is Google finding this URL?
I would use a 3rd party spider tool. We have used Screaming Frog SEO Spider. There are others out there. You would be amazed what they find. The key to this tool is that when it finds something, it also tells you on what page it found it. We have big sites with thousands of pages and we have used it to find broken links to images and links to pages on our site that now 404. Really handy to clean things up. I bet it would find where there is a link on your site that contains the page (or pages) that link to the content. You can then update that page and not have to worry about using noindex etc. Also not that the spiders are much better than humans at finding this stuff. Even if you have looked, the spider looks at things differently.
It also may be as simple as searching for the URL on the web with the link: attribute. Google may show you where it is finding the link.
Good luck and please post back what you find. This is kind of like one of those "who dun it?" mystery shows!
-
There is no automated sitemap. We checked every page we could, including feeds.
-
Do you have an automated sitemap? On at least one occasion, I've found that to be a culprit.
Noindex means it won't be kept in the index. It doesn't mean it won't be crawled. I'm not sure how it would affect crawl timing , tho. I would assume that Google would assume that you would want things not indexed crawled less frequently. Something to potentially try is to use the GWT Fetch as Googlebot tool to force a new crawl of the page and see if that gets it in the index any faster.
http://googlewebmastercentral.blogspot.com/2011/08/submit-urls-to-google-with-fetch-as.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google News error in Google Search Console
My google search console states some errors as below: 1. Article fragmented Some of the urls in this error are the category urls. How to make google bot understand it is a category not an article? 2. Article too short In fact the article is quite long. I do not know why this is happen... 3. No sentence found In fact, there are a lot of sentences Please help!
Intermediate & Advanced SEO | | binhlai0 -
Product Pages not indexed by Google
We built a website for a jewelry company some years ago, and they've recently asked for a meeting and one of the points on the agenda will be why their products pages have not been indexed. Example: http://rocks.ie/details/Infinity-Ring/7170/ I've taken a look but I can't see anything obvious that is stopping pages like the above from being indexed. It has a an 'index, follow all' tag along with a canonical tag. Am I missing something obvious here or is there any clear reason why product pages are not being indexed at all by Google? Any advice would be greatly appreciated. Update I was told 'that each of the product pages on the full site have corresponding page on mobile. They are referred to each other via cannonical / alternate tags...could be an angle as to why product pages are not being indexed.'
Intermediate & Advanced SEO | | RobbieD910 -
Thin Content to Quality Content
How should i modify content from thin to high quality content. Somehow i realized that my pages where targetted keywords didn't had the keyword density lost a massive ranking after the last update whereas all pages which had the keyword density are ranking good. But my concern is all pages which are ranking good had all the keyword in a single statement like. Get ABC pens, ABC pencils, ABC colors, etc. at the end of a 300 word content describing ABC. Whereas the pages which dropped the rankings had a single keyword repeated just twice in a 500 word article. Can this be the reason for a massive drop. Should i add the single statement like the one which is there on pages ranking good? Is it good to add just a single line once the page is indexed or do i need to get a fresh content once again along with a sentence of keyword i mentioned above?
Intermediate & Advanced SEO | | welcomecure1 -
Is there a way to get a list of Total Indexed pages from Google Webmaster Tools?
I'm doing a detailed analysis of how Google sees and indexes our website and we have found that there are 240,256 pages in the index which is way too many. It's an e-commerce site that needs some tidying up. I'm working with an SEO specialist to set up URL parameters and put information in to the robots.txt file so the excess pages aren't indexed (we shouldn't have any more than around 3,00 - 4,000 pages) but we're struggling to find a way to get a list of these 240,256 pages as it would be helpful information in deciding what to put in the robots.txt file and which URL's we should ask Google to remove. Is there a way to get a list of the URL's indexed? We can't find it in the Google Webmaster Tools.
Intermediate & Advanced SEO | | sparrowdog0 -
Does anyone know how to appear with snippet that says something like: Jobs 1-10 of 80 in the beginning of the description on Google? e.g. like on: https://www.google.co.za/#q=pickers+and+packers
Does anyone know how to appear with snippet that says something like: Jobs 1-10 of 80 in the beginning of the description on Google? e.g. like on: https://www.google.co.za/#q=pickers+and+packers Any markup that could be used to be listed like this. Why is some sites listed like this and some not. Why is the adzuna.co.za page listed with Results 1-10 while some other with Jobs 1-10 ?
Intermediate & Advanced SEO | | classifiedtech0 -
Google Processing but Not Indexing XML Sitemap
Like it says above, Google is processing but not indexing our latest XML sitemap. I noticed this Monday afternoon - Indexed status was still Pending - and didn't think anything of it. But when it still said Pending on Tuesday, it seemed strange. I deleted and resubmitted our XML sitemap on Tuesday. It now shows that it was processed on Tuesday, but the Indexed status is still Pending. I've never seen this much of a lag, hence the concern. Our site IS indexed in Google - it shows up with a site:xxxx.com search with the same number of pages as it always has. The only thing I can see that triggered this is Sunday the site failed verification via Google, but we quickly fixed that and re-verified via WMT Monday morning. Anyone know what's going on?
Intermediate & Advanced SEO | | Kingof50 -
Wordpress blog in a subdirectory not being indexed by Google
HI MozzersIn my websites sitemap.xml, pages are listed, such as /blog/ and /blog/textile-fact-or-fiction-egyptian-cotton-explained/These pages are visible when you visit them in a browser and when you use the Google Webmaster tool - Fetch as Google to view them (see attachment), however they aren't being indexed in Google, not even the root directory for the blog (/blog/) is being indexed, and when we query:site: www.hilden.co.uk/blog/ It returns 0 results in Google.Also note that:The Wordpress installation is located at /blog/ which is a subdirectory of the main root directory which is managed by Magento. I'm wondering if this causing the problem.Any help on this would be greatly appreciated!AnthonyToTOHuj.png?1
Intermediate & Advanced SEO | | Tone_Agency0 -
Indexation of content from internal pages (registration) by Google
Hello, we are having quite a big amount of content on internal pages which can only be accessed as a registered member. What are the different options the get this content indexed by Google? In certain cases we might be able to show a preview to visitors. In other cases this is not possible for legal reasons. Somebody told me that there is an option to send the content of pages directly to google for indexation. Unfortunately he couldn't give me more details. I only know that this possible for URLs (sitemap). Is there really a possibility to do this for the entire content of a page without giving google access to crawl this page? Thanks Ben
Intermediate & Advanced SEO | | guitarslinger0