Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company and we have locations/classrooms around the US. To better our search rankings and overall visibility, we made some changes to the on page content, URL structure, etc. Let's take our Washington DC location for example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
All of the SEO changes aren't live yet, so just bear with me. My question really regards why the first URL is still being indexed and crawled and showing fine in the search results and the second one (which we want to show) is not. Changes have been live for around a month now - plenty of time to at least be indexed.
In fact, we don't want the first URL to be showing anymore, we'd like the second URL type to be showing across the board. Also, when I type into Google site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training I'm getting a message that Google can't read the page because of the robots.txt file. But, we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same. I was also told that we've put in an order to have all those old links 301 redirected to the new ones. But still, I'm perplexed as to why these pages are not being indexed or crawled - even manually submitted it into Webmaster tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And, why is Google saying "A description for this result is not available because of this site's robots.txt"
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so reply might be somewhat slow.
Yes, we have links to the pages on our sitemaps and I have done fetch requests. I did a check now and it seems that the niched "New York" page is being crawled now. Might have been a time issue as you suggested. But, our DC page still isn't being crawled. I'll check up on it periodically and see the progress. I really appreciate your suggestions - it's already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? Its also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking into that second page that isn't being indexed yet?
-
Hi Mike,
As a follow up, I forwarded your suggestions to our Webmasters. The adjusted the robots.txt and now reads this, which I think still might cause issues and am not 100% sure why this is:
User-agent: * Allow: /htfu/ Disallow: /htfu/app_data/ Disallow: /htfu/bin/ Disallow: /htfu/PrecompiledApp.config Disallow: /htfu/web.config Disallow: / Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training But, a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training Suggestions?
-
The pages in question don't have any Meta Robots Tags on them. So once the Disallow in Robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If you don't have a Meta Robots Tag, the spiders consider it Index,Follow. Personally I prefer to include the index, follow tag anyway even if it isn't 100% necessary.
-
Thanks, Mike. That was incredibly helpful. See, I did click the link on the SERP when I did the "site" search on Google, but I was thinking it was a mistake. Are you able to see the disallow robot on the source code?
-
Your Robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/ which would be blocking http://www2.learningtree.com**/htfu/**uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been cached so will still appear in the SERPs.... the bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly and you should 301 your old page to the new one.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My site in 2 page
my site in 2 page how can i rank with this keywords in dubai legal translation in Dubai
White Hat / Black Hat SEO | | saharali150 -
Third part http links on the page source: Social engineering content warning from Google
Hi, We have received "Social engineering content" warning from Google and one of our important page and it's internal pages have been flagged as "Deceptive site ahead". We wonder what's the reason behind this as Google didn't point exactly to the specific part of the page which made us look so to the Google. We don't employ any such content on the page and the content is same for many months. As our site is WP hosted, we used a WordPress plugin for this page's layout which injected 2 http (non-https) links in our page code. We suspect if this is the reason behind this? Any ideas? Thanks
White Hat / Black Hat SEO | | vtmoz1 -
Schema Markup for regular web pages?
I'm a bit confused about what Schema markup should be applied to such regular, informative web pages.
White Hat / Black Hat SEO | | gray_jedi
We have a few pages describing our technology and solutions. These pages are not products or news articles. And they are not something that should be reviewed/rated. What Schema markup should be used for a standard run-of-the mill web page?
Is there a good reference / tutorial for optimizing the schema markup of an informational website? Any advice is much appreciated, thank you!0 -
Steps to Improve Page Rank and Domain Authority
Hi If the quality of web pages is very good I believe rankings should reflect this. My domain is at least 3 years old.
White Hat / Black Hat SEO | | SEOguy1
What steps would you recommend short term and long term in terms of how to improve page rank and domain authority? Thanks.0 -
Site De-Indexed except for Homepage
Hi Mozzers,
White Hat / Black Hat SEO | | emerald
Our site has suddenly been de-indexed from Google and we don't know why. All pages are de-indexed in Google Webmaster Tools (except for the homepage and sitemap), starting after 7 September: Please see screenshot attached to show this: 7 Sept 2014 - 76 pages indexed in Google Webmaster Tools 28 Sept until current - 3-4 pages indexed in Google Webmaster Tools including homepage and sitemaps. Site is: (removed) As a result all rankings for child pages have also disappeared in Moz Pro Rankings Tracker. Only homepage is still indexed and ranking. It seems like a technical issue blocking the site. I checked for robots.txt, noindex, nofollow, canonical and site crawl for any 404 errors but can't find anything. The site is online and accessible. No warnings or errors appear in Google Webmaster Tools. Some recent issues were that we moved from Shared to Dedicated Server around 7 Sept (using same host and location). Prior to the move our preferred domain was www.domain.com WITH www. However during the move, they set our domain as domain.tld WITHOUT the www. Running a site:domain.tld vs site:www.domain.tld command now finds pages indexed under non-www version, but no longer as www. version. Could this be a cause of de-indexing? Yesterday we had our host reset the domain to use www. again and we resubmitted our sitemap, but there is no change yet to the indexing. What else could be wrong? Any suggestions appeciated. Thanks. hDmSHN9.gif0 -
Duplicate content showing on local pages
I have several pages which are showing duplicate content on my site for web design. As its a very competitive market I had create some local pages so I rank high if someone is searching locally i.e web design birmingham, web design tamworth etc.. http://www.cocoonfxmedia.co.uk/web-design.html http://www.cocoonfxmedia.co.uk/web-design-tamworth.html http://www.cocoonfxmedia.co.uk/web-design-lichfield.html I am trying to work out what is the best way reduce the duplicate content. What would be the best way to remove the duplicate content? 1. 301 redirect (will I lose the existing page) to my main web design page with the geographic areas mentioned. 2. Re write the wording on each page and make it unique? Any assistance is much appreciated.
White Hat / Black Hat SEO | | Cocoonfxmedia0 -
Influence of users' comments on a page (on-page SEO)
Do you think when Google crawls your page, it "monitors" comments updates to use this as a ranking factor? If Google is looking for social signs, looking for comments updates might be a social sign as well (ok a lot easier to manipulate, but still social). thx
White Hat / Black Hat SEO | | gt30