Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company and we have locations/classrooms around the US. To better our search rankings and overall visibility, we made some changes to the on page content, URL structure, etc. Let's take our Washington DC location for example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
All of the SEO changes aren't live yet, so just bear with me. My question really regards why the first URL is still being indexed and crawled and showing fine in the search results and the second one (which we want to show) is not. Changes have been live for around a month now - plenty of time to at least be indexed.
In fact, we don't want the first URL to show anymore; we'd like the second URL type to show across the board. Also, when I type site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training into Google, I get a message that Google can't read the page because of the robots.txt file. But we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same. I was also told that we've put in an order to have all those old links 301 redirected to the new ones. Still, I'm perplexed as to why these pages are not being indexed or crawled. I even manually submitted the page in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And, why is Google saying "A description for this result is not available because of this site's robots.txt"
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so reply might be somewhat slow.
Yes, we have links to the pages on our sitemaps and I have done fetch requests. I did a check now and it seems that the niched "New York" page is being crawled now. Might have been a time issue as you suggested. But, our DC page still isn't being crawled. I'll check up on it periodically and see the progress. I really appreciate your suggestions - it's already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to that second page that isn't being indexed yet?
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our Webmasters. They adjusted the robots.txt and it now reads as follows, which I think might still cause issues, and I'm not 100% sure why it is set up this way:
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But, a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
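One way to sanity-check the rules quoted above is Python's standard-library `urllib.robotparser`. This is only a sketch: the parser applies rules in file order (first match wins), whereas Google documents that it picks the most specific matching rule, so the two can disagree on other paths. For the two page URLs in this post both approaches should agree. The `ROBOTS_TXT` string is copied from the post; the `is_crawlable` helper and the `/some-other-section/` URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
"""

def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Caveat: urllib.robotparser uses first-match-wins ordering, which is an
    approximation of how Google evaluates Allow/Disallow rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for url in (
        "http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training",
        "http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training",
        "http://www2.learningtree.com/some-other-section/",  # hypothetical URL
    ):
        print(url, "->", "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

If both /htfu/ pages come back as allowed, the robots.txt is probably not what is holding the New York page back, which would point to crawl timing or internal linking instead. Note that the final `Disallow: /` still blocks everything outside /htfu/.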
-
The pages in question don't have any Meta Robots Tags on them. So once the Disallow in Robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If you don't have a Meta Robots Tag, the spiders consider it Index,Follow. Personally I prefer to include the index, follow tag anyway even if it isn't 100% necessary.
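The default described above (no Meta Robots Tag is treated as index, follow) can be illustrated with a small check built on Python's standard-library `html.parser`. This is a rough sketch, not how any search engine actually implements it; the `MetaRobotsParser` class and `effective_robots` function are hypothetical names, and only the generic `robots` meta name is handled (not bot-specific tags or HTTP headers).

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]

def effective_robots(html: str) -> dict:
    """Return the index/follow behaviour implied by the page's meta robots tags.

    With no tag present, both values default to True (index, follow).
    """
    parser = MetaRobotsParser()
    parser.feed(html)
    found = set(parser.directives)
    return {
        "index": "noindex" not in found and "none" not in found,
        "follow": "nofollow" not in found and "none" not in found,
    }

if __name__ == "__main__":
    print(effective_robots("<html><head><title>No tag</title></head></html>"))
    print(effective_robots('<head><meta name="robots" content="noindex, nofollow"></head>'))
```

An explicit `index, follow` tag yields the same result as no tag at all, which is why including it is a matter of preference.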
-
Thanks, Mike. That was incredibly helpful. See, I did click the link on the SERP when I did the "site" search on Google, but I was thinking it was a mistake. Are you able to see the robots disallow in the source code?
-
Your Robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/ which would be blocking http://www2.learningtree.com**/htfu/**uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been indexed, so it will still appear in the SERPs. The bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly and you should 301 your old page to the new one.
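Both points can be sketched in Python. The first part shows why a `Disallow: /htfu/` rule blocks the old and the new URL alike; the second shows the kind of old-to-new lookup a 301 redirect needs. Several things here are assumptions: the rule is placed in a `User-agent: *` group (the thread only quotes the Disallow line), the `NEW_PATHS` table is a hypothetical one-entry example, and `is_blocked` and `redirect_target` are illustrative helpers, not the site's actual ASP.NET redirect code.

```python
from urllib.parse import urlparse, parse_qs
from urllib.robotparser import RobotFileParser

# Assumption: the blocking rule sat in a catch-all group like this one.
OLD_ROBOTS_TXT = """\
User-agent: *
Disallow: /htfu/
"""

OLD_URL = "http://www2.learningtree.com/htfu/location.aspx?id=uswd44"
NEW_URL = "http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training"

# Hypothetical lookup table: location id -> new path.
NEW_PATHS = {"uswd44": "/htfu/uswd44/reston/it-and-management-training"}

def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots.txt forbids `agent` from crawling `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)

def redirect_target(old_url: str):
    """Return the URL a 301 should point to, or None if there is no mapping."""
    parts = urlparse(old_url)
    ids = parse_qs(parts.query).get("id", [])
    if not parts.path.endswith("/location.aspx") or not ids:
        return None
    new_path = NEW_PATHS.get(ids[0])
    if new_path is None:
        return None
    return f"{parts.scheme}://{parts.netloc}{new_path}"

if __name__ == "__main__":
    print("old URL blocked:", is_blocked(OLD_ROBOTS_TXT, OLD_URL))
    print("new URL blocked:", is_blocked(OLD_ROBOTS_TXT, NEW_URL))
    print("301 target:", redirect_target(OLD_URL))
```

Crawlers can only see and follow a 301 if they are allowed to request the old URL, so the Disallow needs fixing before the redirects can take effect.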