Indexing non-indexed content and Google crawlers
-
On a news website we have a system where articles are given a publish date which is often in the future. The articles were showing up in Google before the publish date despite us not being able to find them linked from anywhere on the website.
I've added a 'noindex' meta tag to articles that shouldn't be live until a future date.
When the date comes for them to appear on the website, the noindex disappears. Is anyone aware of any issues with doing this? Say Google crawls a page that is noindex, then 2 hours later finds out it should now be indexed: will it still appear in Google search, News etc. as normal, as a new page?
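For anyone following along, the logic described above might look roughly like the sketch below. This is only an illustration, not our actual CMS code: the `robots_meta` helper and its parameters are hypothetical names made up for the example.

```python
from datetime import datetime, timezone


def robots_meta(publish_at, now=None):
    """Return a noindex meta tag before the publish date, or '' once live.

    Hypothetical helper: `publish_at` is the article's scheduled publish
    time (timezone-aware); `now` can be passed in to make testing easier.
    """
    now = now or datetime.now(timezone.utc)
    if now < publish_at:
        # Article is not live yet: ask search engines not to index it.
        return '<meta name="robots" content="noindex">'
    # Article is live: emit nothing, so the page is indexable by default.
    return ""
```

The template would print the returned string inside the page's head, so the tag simply stops being emitted once the publish time has passed.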
Thanks.
-
Wow! Nice detective work! I could see how that one would slip under the radar.
Congrats on finding a needle in a haystack!
You should buy yourself the adult beverage of your choice and have a little toast!
Cheers!
-
I think Screaming Frog has a trial version. I forget whether it limits the total number of pages etc., as we bought it a while ago. At least you can try it out and see. Others here may know of more tools as well.
-
Thanks. I agree I need to get rid of that noindex. The site is new and doesn't have much in the way of tag clouds etc. yet, so it's not like we have a lot of pages to check.
I've used the link: operator to try to find the offending links each time, but nothing showed up. I use Xenu Link Sleuth rather than Screaming Frog, and I can't find a way to find backlinks with Xenu. Do you know if you can with the free version of Screaming Frog? I've seen the free version described as "almost fully functional"; the number of crawlable links seems to be the main restriction.
-
I like the automated sitemap answer for the cause (as this has bitten me before), but you mentioned you do not have one. I would still bet that somewhere on your website you are linking to the page that you do not want indexed. It could be a tag cloud page or some other index page. We had a site that would accidentally publish articles on our home page ahead of schedule. The point here is that when you have a dynamic site with a CMS, you really have to be on your toes, as the automation can get you into situations like this.
I would not use the noindex tag and remove it later. My concern would be that you are sending conflicting signals to Google. noindex tells Google to remove this page from the index.
"When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it." from GWT
When I read that - it sounds like this is not what you want for this page.
You could also set up your system to return a 404 on the URL until the content is live and then return a 200, but you run into the same issue of Google getting two opposite signals for the same page. Either way, if you first give Google the signal that you do not want something indexed, you are at the mercy of the next crawl to see if Google looks at it again.
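As a rough sketch of that 404-then-200 idea (hypothetical function name, not tied to any particular CMS or framework), the status code would simply be chosen from the publish date:

```python
from datetime import datetime, timezone


def status_for(publish_at, now=None):
    """Pick the HTTP status for an article URL from its publish time.

    Hypothetical helper: returns 404 while the article is still
    scheduled and 200 once the publish time has been reached.
    """
    now = now or datetime.now(timezone.utc)
    return 200 if now >= publish_at else 404
```

Keep in mind the caveat above still applies: a crawler that saw the 404 will only learn about the 200 whenever it next requests the URL.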
Regardless, you need to get to the crux of the issue, how is Google finding this URL?
I would use a third-party spider tool. We have used Screaming Frog SEO Spider, and there are others out there. You would be amazed what they find. The key to this tool is that when it finds something, it also tells you which page it found it on. We have big sites with thousands of pages, and we have used it to find broken links to images and links to pages on our site that now 404. It is really handy for cleaning things up. I bet it would find the page (or pages) on your site that link to the content. You can then update that page and not have to worry about using noindex etc. Also note that spiders are much better than humans at finding this stuff. Even if you have looked, the spider looks at things differently.
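To show what "tells you which page it found it on" means in practice, here is a minimal sketch using only the Python standard library. It is not how Screaming Frog works internally; it assumes you already have the HTML of your pages in a dict keyed by URL (the `pages` dict, `LinkCollector` and `find_referrers` are all made-up names for the example).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def find_referrers(pages, target_url):
    """Return the URLs of pages whose HTML links to target_url.

    `pages` is a hypothetical {page_url: html} mapping, e.g. built by
    fetching every page your crawler can reach.
    """
    referrers = []
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        if target_url in collector.links:
            referrers.append(url)
    return referrers
```

For example, with a home page that links to a tag page, and a tag page that links to the unpublished article, `find_referrers` would return only the tag page's URL, which is the page you would then need to fix.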
It may also be as simple as searching the web for the URL with the link: operator. Google may show you where it is finding the link.
Good luck, and please post back what you find. This is kind of like one of those "whodunit" mystery shows!
-
There is no automated sitemap. We checked every page we could, including feeds.
-
Do you have an automated sitemap? On at least one occasion, I've found that to be a culprit.
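If a sitemap does exist, one quick way to check it is to compare its URLs against the list of articles that should not be live yet. The sketch below assumes a standard sitemaps.org XML file already loaded as a string; the `leaked_urls` function and the list of unpublished URLs are hypothetical and would have to come from your own CMS.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def leaked_urls(sitemap_xml, unpublished_urls):
    """Return unpublished URLs that are already listed in the sitemap.

    `sitemap_xml` is the sitemap content as a string (without an
    encoding declaration); `unpublished_urls` is a hypothetical list
    of article URLs that are scheduled but not yet live.
    """
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
    return sorted(listed & set(unpublished_urls))
```

Any URL this returns is one the sitemap is advertising to crawlers ahead of schedule.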
Noindex means it won't be kept in the index. It doesn't mean it won't be crawled. I'm not sure how it would affect crawl timing, though. I would assume that Google takes noindex to mean you want the page crawled less frequently. Something to potentially try is the GWT Fetch as Googlebot tool, to force a new crawl of the page and see if that gets it into the index any faster.
http://googlewebmastercentral.blogspot.com/2011/08/submit-urls-to-google-with-fetch-as.html