Indexing non-indexed content and Google crawlers
-
On a news website we have a system where articles are given a publish date which is often in the future. The articles were showing up in Google before the publish date despite us not being able to find them linked from anywhere on the website.
I've added a 'noindex' meta tag to articles that shouldn't be live until a future date.
When the date comes for them to appear on the website, the noindex disappears. Is anyone aware of any issues doing this - say Google crawls a page that is noindex, then 2 hours later it finds out it should now be indexed? Should it still appear in Google search, News etc. as normal, as a new page?
Thanks.
-
Wow! Nice detective work! I could see how that one would slip under the radar.
Congrats on finding a needle in a haystack!
You should buy yourself the adult beverage of your choice and have a little toast!
Cheers!
-
-
I think Screaming Frog has a trial version, I forget if it limits total number of pages etc. as we bought it a while ago. At least you can try out and see. May be others who have more tools as well.
-
Thanks. I agree I need to get rid of that noindex. The site is new and doesn't have much in the way of tag clouds etc. yet, so it's not like we have a lot of pages to check.
I've used the link: attribute to try and find the offending links each time, but nothing showed up. I use Xenu Link Sleuth rather than Screaming Frog, and I can't find a way to find backlinks with Xenu. Do you know if you can with the free version of Screaming Frog? I've seen the free version described as "almost fully functional" - the number of crawlable links seems to be the main restriction.
-
I like the automated sitemap answer for the cause (as this has bitten me before), but you mentioned you do not have that. I would still bet that somewhere on your web site you are linking to the page that you do not want indexed. It could be a tag cloud page or some other index page. We had a site that it would accidentally publish out articles on our home page ahead of schedule. Point here is that when you have a dynamic site with a CMS, you really have to be on your toes with stuff like this as the automation can get you into situations like this.
I would not use the noindex tag and remove it later. My concern would be that you are sending conflicting signals to Google. noindex tells good to remove this page from the index.
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it." from GWT
When I read that - it sounds like this is not what you want for this page.
You could also setup your system to show a 404 on the URL until the content is live and then let it 200, but you run into the same issue of Google getting 2 opposite signals on the same page. Either way, if you first give the signal to Google that you do not want something indexed, you are at the mercy of the next crawl to see if Google looks at it again.
Regardless, you need to get to the crux of the issue, how is Google finding this URL?
I would use a 3rd party spider tool. We have used Screaming Frog SEO Spider. There are others out there. You would be amazed what they find. The key to this tool is that when it finds something, it also tells you on what page it found it. We have big sites with thousands of pages and we have used it to find broken links to images and links to pages on our site that now 404. Really handy to clean things up. I bet it would find where there is a link on your site that contains the page (or pages) that link to the content. You can then update that page and not have to worry about using noindex etc. Also not that the spiders are much better than humans at finding this stuff. Even if you have looked, the spider looks at things differently.
It also may be as simple as searching for the URL on the web with the link: attribute. Google may show you where it is finding the link.
Good luck and please post back what you find. This is kind of like one of those "who dun it?" mystery shows!
-
There is no automated sitemap. We checked every page we could, including feeds.
-
Do you have an automated sitemap? On at least one occasion, I've found that to be a culprit.
Noindex means it won't be kept in the index. It doesn't mean it won't be crawled. I'm not sure how it would affect crawl timing , tho. I would assume that Google would assume that you would want things not indexed crawled less frequently. Something to potentially try is to use the GWT Fetch as Googlebot tool to force a new crawl of the page and see if that gets it in the index any faster.
http://googlewebmastercentral.blogspot.com/2011/08/submit-urls-to-google-with-fetch-as.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Removing indexed internal search pages from Google when it's driving lots of traffic?
Hi I'm working on an E-Commerce site and the internal Search results page is our 3rd most popular landing page. I've also seen Google has often used this page as a "Google-selected canonical" on Search Console on a few pages, and it has thousands of these Search pages indexed. Hoping you can help with the below: To remove these results, is it as simple as adding "noindex/follow" to Search pages? Should I do it incrementally? There are parameters (brand, colour, size, etc.) in the indexed results and maybe I should block each one of them over time. Will there be an initial negative impact on results I should warn others about? Thanks!
Intermediate & Advanced SEO | | Frankie-BTDublin0 -
Google crawling different content--ever ok?
Here are a couple of scenarios I'm encountering where Google will crawl different content than my users on initial visit to the site--and which I think should be ok. Of course, it is normally NOT ok, I'm here to find out if Google is flexible enough to allow these situations: 1. My mobile friendly site has users select a city, and then it displays the location options div which includes an explanation for why they may want to have the program use their gps location. The user must choose the gps, the entire city, or he can enter a zip code, or choose a suburb of the city, which then goes to the link chosen. OTOH it is programmed so that if it is a Google bot it doesn't get just a meaningless 'choose further' page, but rather the crawler sees the page of results for the entire city (as you would expect from the url), So basically the program defaults for the entire city results for google bot, but for for the user it first gives him the initial ability to choose gps. 2. A user comes to mysite.com/gps-loc/city/results The site, seeing the literal words 'gps-loc' in the url goes out and fetches the gps for his location and returns results dependent on his location. If Googlebot comes to that url then there is no way the program will return the same results because the program wouldn't be able to get the same long latitude as that user. So, what do you think? Are these scenarios a concern for getting penalized by Google? Thanks, Ted
Intermediate & Advanced SEO | | friendoffood0 -
Does having all client websites on same server/same Google Analytics red flag Google?
If you have several clients, and they are all on the same server, and also under ONE Google Analytics account, will that negatively impact with Google? They all have different content and addresses, some have the same template, but with different images.
Intermediate & Advanced SEO | | BBuck1 -
Does anyone know how to appear with snippet that says something like: Jobs 1-10 of 80 in the beginning of the description on Google? e.g. like on: https://www.google.co.za/#q=pickers+and+packers
Does anyone know how to appear with snippet that says something like: Jobs 1-10 of 80 in the beginning of the description on Google? e.g. like on: https://www.google.co.za/#q=pickers+and+packers Any markup that could be used to be listed like this. Why is some sites listed like this and some not. Why is the adzuna.co.za page listed with Results 1-10 while some other with Jobs 1-10 ?
Intermediate & Advanced SEO | | classifiedtech0 -
Why Is Google Indexing These Product Pages On Shopify?
How can we communicate to Google the exact product pages we'd like indexed on our site? We're an apparel company that uses Shopify as our ecommerce platform. Website is sportiqe.com. Currently, Google is indexing all types of different pages on our site. **Example of a product page we want indexed: ** Product Page: sportiqe.com/products/PRODUCT-TITLE (Like This) **Examples of product pages being indexed: ** sportiqe.myshopify.com/products/PRODUCT-TITLE sportiqe.com/collections/COLLECTION-NAME/products/PRODUCT-TITLE See attached for an example of how two different "Boston Celtics Grateful Dead" shirts are being indexed. Any suggestions? We've used both Shopify and Google Webmaster tools to set our preferred domain (sportiqe.com). We've also added this snippet of code to our site three months ago thinking that would do the trick... {% if template == 'product' %}{% if collection %} {% endif %}{% endif %} sKwNZOl
Intermediate & Advanced SEO | | farmiloe0 -
Google Penalty or Not?
One of my sites I work with got this message: http://www.mysite: Unnatural inbound linksJune 27, 2013 Google has detected a pattern of artificial or unnatural links pointing to your site. Buying links or participating in link schemes in order to manipulate PageRank are violations of Google's Webmaster Guidelines. As a result, Google has applied a manual spam action to mysite.com/. There may be other actions on your site or parts of your site. But, when I got to manual actions it says: Manual Actions No manual webspam actions found. -- So which is it??? I have been doing link removal, but now I am confused if I need to do a reconsideration request or not.
Intermediate & Advanced SEO | | netviper0 -
Duplicate content reported on WMT for 301 redirected content
We had to 301 redirect a large number of URL's. Not Google WMT is telling me that we are having tons of duplicate page titles. When I looked into the specific URL's I realized that Google is listing an old URL's and the 301 redirected new URL as the source of the duplicate content. I confirmed the 301 redirect by using a server header tool to check the correct implementation of the 301 redirect from the old to the new URL. Question: Why is Google Webmaster Tool reporting duplicated content for these pages?
Intermediate & Advanced SEO | | SEOAccount320 -
My website keywords have been almost completely taken out of indexing in Google since 04/26/11 and I cannot determine why, anyone know?
I had 12 to 15 1st page Google rankings in the iPhone, iPad, app review vertical. As of 04/26/11 I have lost all rankings, traffic has gone from 1,000 to 1,200 a day to 150 to 350 a day. I was using a plugin for auto press releases, but have removed this and deleted the urls. I also have changed themes and hosting over the last 3 weeks. I have been trying to get SEO help, but cannot seem to get anyone to help me. thank you Mike
Intermediate & Advanced SEO | | crazymikesapps1