Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company with locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Let's take our Washington, DC location as an example. The old URL was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
Not all of the SEO changes are live yet, so bear with me. My question is really about why the first URL is still being crawled and indexed and showing fine in the search results, while the second one (which is the one we want to show) is not. The changes have been live for around a month now, which should be plenty of time for the new page to at least get indexed.
In fact, we don't want the first URL to show anymore; we'd like the second URL format to show across the board. Also, when I search Google for site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training, I get a message that Google can't read the page because of the robots.txt file. But, as far as I know, we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same, and that an order has been put in to 301 redirect all of the old URLs to the new ones. Still, I'm perplexed as to why these pages are not being crawled or indexed; I've even submitted them manually in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so my replies might be somewhat slow.
Yes, we have links to the pages in our sitemaps, and I have done fetch requests. I just checked, and it seems the more niche "New York" page is being crawled now. It might have been a timing issue, as you suggested. But our DC page still isn't being crawled. I'll check on it periodically and watch the progress. I really appreciate your suggestions; they're already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or the site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to that second page that isn't being indexed yet?
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our webmasters. They adjusted the robots.txt, and it now reads as follows, which I think might still cause issues, though I'm not 100% sure why:
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /

Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But a more niche page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
-
The pages in question don't have any Meta Robots tags on them. So once the Disallow in robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If a page doesn't have a Meta Robots tag, the spiders treat it as index, follow. Personally, I prefer to include the index, follow tag anyway, even if it isn't 100% necessary.
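For illustration, a version of the file your webmasters posted above without the blanket Disallow: / (just a sketch, assuming the only goal is to keep bots out of the ASP.NET application folders and everything else should stay crawlable) would look like:

User-agent: *
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config

And if you do decide to add an explicit Meta Robots tag, it's a single line in each page's <head>:

<meta name="robots" content="index, follow">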
-
Thanks, Mike. That was incredibly helpful. I did click that link in the SERP when I did the site: search on Google, but I thought it was a mistake. Are you able to see the disallow directive in the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/, which would be blocking http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and was already cached, so it will still appear in the SERPs; the bots just won't be able to see changes made to it, because they can't crawl it.
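If you want to sanity-check what that file is blocking without waiting on Google to recrawl, here's a quick sketch using Python's standard-library robots.txt parser (purely illustrative; nothing in the fix depends on it):

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt, then test whether a given URL may be crawled.
rp = RobotFileParser("http://www2.learningtree.com/robots.txt")
rp.read()

new_url = "http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training"
print(rp.can_fetch("*", new_url))  # prints False while Disallow: /htfu/ is in place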
You need to fix the disallow so the bots can crawl your site correctly, and you should 301 redirect your old page to the new one.
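Since the old URLs are .aspx pages addressed by query string, the 301s will most likely be set up in IIS. A rough sketch of a single rule using the URL Rewrite module (assuming IIS with that module installed; the rule name and pattern are illustrative, and in practice you'd probably want a rewrite map covering every location ID rather than one rule per page):

<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Permanently redirect the old query-string URL for the DC/Reston location to its new path -->
        <rule name="Old location page to new URL" stopProcessing="true">
          <match url="^htfu/location\.aspx$" />
          <conditions>
            <add input="{QUERY_STRING}" pattern="^id=uswd44$" />
          </conditions>
          <action type="Redirect" url="/htfu/uswd44/reston/it-and-management-training"
                  appendQueryString="false" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>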