Indexing non-indexed content and Google crawlers
-
On a news website we have a system where articles are given a publish date which is often in the future. The articles were showing up in Google before the publish date despite us not being able to find them linked from anywhere on the website.
I've added a 'noindex' meta tag to articles that shouldn't be live until a future date.
When the date comes for them to appear on the website, the noindex disappears. Is anyone aware of any issues doing this - say Google crawls a page that is noindex, then 2 hours later it finds out it should now be indexed? Should it still appear in Google search, News etc. as normal, as a new page?
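A minimal sketch of the scheme described above, in Python. It is not the site's actual CMS code: the function name `robots_meta_tag` and the `publish_at` parameter are hypothetical, and it assumes publish dates are stored as timezone-aware datetimes.

```python
from datetime import datetime, timezone
from typing import Optional


def robots_meta_tag(publish_at: datetime, now: Optional[datetime] = None) -> str:
    """Return a noindex meta tag while the article is still scheduled.

    Hypothetical helper: `publish_at` is the article's scheduled publish
    time, `now` can be passed in to make the function easy to test.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if now < publish_at:
        # Not live yet: ask crawlers not to keep the page in the index.
        return '<meta name="robots" content="noindex">'
    # Live: emit nothing, so the page is indexable by default.
    return ""
```

The template would drop the returned string into the page `<head>`. Once the publish time passes the tag simply stops being emitted, which is the "noindex disappears" behaviour the question is about.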
Thanks.
-
Wow! Nice detective work! I could see how that one would slip under the radar.
Congrats on finding a needle in a haystack!
You should buy yourself the adult beverage of your choice and have a little toast!
Cheers!
-
I think Screaming Frog has a trial version; I forget whether it limits the total number of pages etc., as we bought it a while ago. At least you can try it out and see. There may be others here who know of more tools as well.
-
Thanks. I agree I need to get rid of that noindex. The site is new and doesn't have much in the way of tag clouds etc. yet, so it's not like we have a lot of pages to check.
I've used the link: search operator to try to find the offending links each time, but nothing showed up. I use Xenu Link Sleuth rather than Screaming Frog, and I can't find a way to find backlinks with Xenu. Do you know if you can with the free version of Screaming Frog? I've seen the free version described as "almost fully functional" - the number of crawlable links seems to be the main restriction.
-
I like the automated sitemap answer for the cause (as this has bitten me before), but you mentioned you do not have one. I would still bet that somewhere on your website you are linking to the page that you do not want indexed. It could be a tag cloud page or some other index page. We had a site that would accidentally publish articles on our home page ahead of schedule. The point here is that when you have a dynamic site with a CMS, you really have to be on your toes, as the automation can get you into situations like this.
I would not use the noindex tag and remove it later. My concern would be that you are sending conflicting signals to Google. noindex tells Google to remove this page from the index.
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it." from GWT
When I read that - it sounds like this is not what you want for this page.
You could also set up your system to return a 404 on the URL until the content is live and then let it return a 200, but you run into the same issue of Google getting two opposite signals for the same page. Either way, if you first give Google the signal that you do not want something indexed, you are at the mercy of the next crawl to see if Google looks at it again.
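To make the 404-then-200 idea concrete, here is a rough sketch in Python. The function name `article_status` and the `publish_at` parameter are made up for illustration, and how the status would actually be wired into a given CMS or web framework is left out.

```python
from datetime import datetime, timezone
from http import HTTPStatus
from typing import Optional


def article_status(publish_at: datetime, now: Optional[datetime] = None) -> HTTPStatus:
    """Pick the HTTP status for an article URL based on its publish time.

    Hypothetical helper: before `publish_at` the URL answers 404 Not Found,
    from `publish_at` onwards it answers 200 OK.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return HTTPStatus.OK if now >= publish_at else HTTPStatus.NOT_FOUND
```

The same caveat applies as with noindex: a crawler that saw the 404 only learns about the 200 when it next requests the URL.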
Regardless, you need to get to the crux of the issue, how is Google finding this URL?
I would use a 3rd party spider tool. We have used Screaming Frog SEO Spider. There are others out there. You would be amazed what they find. The key to this tool is that when it finds something, it also tells you on what page it found it. We have big sites with thousands of pages and we have used it to find broken links to images and links to pages on our site that now 404. Really handy to clean things up. I bet it would find the page (or pages) on your site that link to the content. You can then update that page and not have to worry about using noindex etc. Also note that the spiders are much better than humans at finding this stuff. Even if you have looked, the spider looks at things differently.
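The "tells you on what page it found it" part is the useful bit, and it can be sketched in a few lines of Python. This is only an illustration using the standard library, not how Screaming Frog works internally. It assumes you already have the HTML of your pages in a dict keyed by URL (fetching them is left out), and the names `LinkCollector` and `find_referrers` are made up.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of all <a href="..."> links on one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, value))


def find_referrers(pages: Dict[str, str], target: str) -> List[str]:
    """Return the URLs of pages whose HTML links to `target`."""
    referrers = []
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        if target in collector.links:
            referrers.append(url)
    return sorted(referrers)
```

Calling `find_referrers(pages, "https://example.com/articles/future-story")` with your crawled pages would list every page that links to the unpublished article. It only looks at `<a href>` links, so feeds, sitemaps and JavaScript-generated links would need separate checks.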
It may also be as simple as searching the web for the URL with the link: operator. Google may show you where it is finding the link.
Good luck and please post back what you find. This is kind of like one of those "who dun it?" mystery shows!
-
There is no automated sitemap. We checked every page we could, including feeds.
-
Do you have an automated sitemap? On at least one occasion, I've found that to be a culprit.
Noindex means it won't be kept in the index. It doesn't mean it won't be crawled. I'm not sure how it would affect crawl timing, though. I would guess that Google assumes pages you don't want indexed can be crawled less frequently. Something to potentially try is the GWT Fetch as Googlebot tool, to force a new crawl of the page and see if that gets it into the index any faster.
http://googlewebmastercentral.blogspot.com/2011/08/submit-urls-to-google-with-fetch-as.html
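Before asking for a re-crawl, it may be worth confirming that the page really is served without noindex once the publish date has passed. A small sketch in Python, assuming you already have the page's HTML as a string; the names `RobotsMetaParser` and `has_noindex` are made up, and it only checks robots meta tags in the HTML, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # content may hold several comma-separated directives.
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tags include noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```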