Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company and we have locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Let's take our Washington DC location for example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
Not all of the SEO changes are live yet, so just bear with me. My question is really about why the first URL is still being indexed and crawled and showing fine in the search results while the second one (which we want to show) is not. The changes have been live for around a month now, which should be plenty of time to at least be indexed.
In fact, we don't want the first URL to be showing anymore; we'd like the second URL type to be showing across the board. Also, when I type into Google site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training I'm getting a message that Google can't read the page because of the robots.txt file. But we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same. I was also told that we've put in an order to have all those old links 301 redirected to the new ones. Still, I'm perplexed as to why these pages are not being indexed or crawled, even after I manually submitted the page in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so reply might be somewhat slow.
Yes, we have links to the pages on our sitemaps and I have done fetch requests. I did a check now and it seems that the niched "New York" page is being crawled now. Might have been a time issue as you suggested. But, our DC page still isn't being crawled. I'll check up on it periodically and see the progress. I really appreciate your suggestions - it's already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to that second page that isn't being indexed yet?
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our Webmasters. They adjusted the robots.txt, and it now reads as follows, which I think might still cause issues, though I'm not 100% sure why:
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
Now, this page is being indexed:
http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But a more niched page still isn't being indexed:
http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
-
The pages in question don't have any Meta Robots Tags on them. So once the Disallow in Robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If you don't have a Meta Robots Tag, the spiders consider it Index,Follow. Personally I prefer to include the index, follow tag anyway even if it isn't 100% necessary.
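The "no tag means index, follow" default described above can be checked with a few lines of Python. This is only a sketch: the class name MetaRobotsParser and the function effective_robots are made up for illustration, the HTML strings are invented samples rather than the actual pages, and it looks only at a meta tag named "robots" (not at crawler-specific tags or HTTP headers).

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives from a <meta name="robots"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.directives = None  # stays None when no meta robots tag is found

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives = [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def effective_robots(html):
    """Return the page's robots directives, defaulting to index, follow."""
    parser = MetaRobotsParser()
    parser.feed(html)
    if parser.directives is None:
        # No meta robots tag: treated the same as "index, follow".
        return ["index", "follow"]
    return parser.directives


# Invented sample pages, not the real learningtree.com markup.
no_tag = "<html><head><title>Training</title></head><body></body></html>"
blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print(effective_robots(no_tag))   # ['index', 'follow']
print(effective_robots(blocked))  # ['noindex', 'nofollow']
```

Keep in mind that a crawler can only read this tag if robots.txt lets it fetch the page in the first place.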
-
Thanks, Mike. That was incredibly helpful. See, I did click the link on the SERP when I did the "site" search on Google, but I was thinking it was a mistake. Are you able to see the disallow robot on the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/, which would be blocking http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been cached, so it will still appear in the SERPs. The bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly and you should 301 your old page to the new one.
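To see how the rules quoted in this thread would apply to specific paths, here is a rough Python sketch. It assumes longest-match precedence (the most specific matching rule wins, and Allow wins a tie), which is how Google documents its handling and how RFC 9309 describes it. It is a simplification, not Google's actual parser: it does plain prefix matching and ignores the * and $ wildcards. The function name is_allowed is hypothetical, and ORIGINAL_RULES contains only the one Disallow line quoted above, since the rest of the original file is not shown.

```python
# Rules for the "User-agent: *" group as (directive, path prefix) pairs.
# ORIGINAL_RULES is an assumption: only this one line was quoted in the thread.
ORIGINAL_RULES = [("disallow", "/htfu/")]

# The adjusted file as posted later in the thread.
UPDATED_RULES = [
    ("allow", "/htfu/"),
    ("disallow", "/htfu/app_data/"),
    ("disallow", "/htfu/bin/"),
    ("disallow", "/htfu/PrecompiledApp.config"),
    ("disallow", "/htfu/web.config"),
    ("disallow", "/"),
]


def is_allowed(rules, path):
    """Apply longest-match precedence using plain prefix matching."""
    best_len = -1
    allowed = True  # a path that matches no rule may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


reston = "/htfu/uswd44/reston/it-and-management-training"
new_york = "/htfu/usny27/new-york/sharepoint-training"

print(is_allowed(ORIGINAL_RULES, reston))             # False
print(is_allowed(UPDATED_RULES, reston))              # True
print(is_allowed(UPDATED_RULES, new_york))            # True
print(is_allowed(UPDATED_RULES, "/htfu/web.config"))  # False
```

Under these assumptions the adjusted file no longer blocks the location pages, because Allow: /htfu/ is more specific than Disallow: /. If a page under /htfu/ still isn't showing up, the cause is more likely recrawl timing than the robots.txt itself.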