Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company and we have locations/classrooms around the US. To better our search rankings and overall visibility, we made some changes to the on page content, URL structure, etc. Let's take our Washington DC location for example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
All of the SEO changes aren't live yet, so just bear with me. My question really regards why the first URL is still being indexed and crawled and showing fine in the search results and the second one (which we want to show) is not. Changes have been live for around a month now - plenty of time to at least be indexed.
In fact, we don't want the first URL to be showing anymore, we'd like the second URL type to be showing across the board. Also, when I type into Google site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training I'm getting a message that Google can't read the page because of the robots.txt file. But, we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same. I was also told that we've put in an order to have all those old links 301 redirected to the new ones. But still, I'm perplexed as to why these pages are not being indexed or crawled - even manually submitted it into Webmaster tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And, why is Google saying "A description for this result is not available because of this site's robots.txt"
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so reply might be somewhat slow.
Yes, we have links to the pages on our sitemaps and I have done fetch requests. I did a check now and it seems that the niched "New York" page is being crawled now. Might have been a time issue as you suggested. But, our DC page still isn't being crawled. I'll check up on it periodically and see the progress. I really appreciate your suggestions - it's already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? Its also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking into that second page that isn't being indexed yet?
-
Hi Mike,
As a follow up, I forwarded your suggestions to our Webmasters. The adjusted the robots.txt and now reads this, which I think still might cause issues and am not 100% sure why this is:
User-agent: * Allow: /htfu/ Disallow: /htfu/app_data/ Disallow: /htfu/bin/ Disallow: /htfu/PrecompiledApp.config Disallow: /htfu/web.config Disallow: / Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training But, a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training Suggestions?
-
The pages in question don't have any Meta Robots Tags on them. So once the Disallow in Robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If you don't have a Meta Robots Tag, the spiders consider it Index,Follow. Personally I prefer to include the index, follow tag anyway even if it isn't 100% necessary.
-
Thanks, Mike. That was incredibly helpful. See, I did click the link on the SERP when I did the "site" search on Google, but I was thinking it was a mistake. Are you able to see the disallow robot on the source code?
-
Your Robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/ which would be blocking http://www2.learningtree.com**/htfu/**uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been cached so will still appear in the SERPs.... the bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly and you should 301 your old page to the new one.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplication content management across a subdir based multisite where subsites are projects of the main site and naturally adopt some ideas and goals from it
Hi, I have the following problem and would like which would be the best solution for it: I have a site codex21.gal that is actually part of a subdirectories based multisite (galike.net). It has a domain mapping setup, but it is hosted on a folder of galike.net multisite (galike.net/codex21). My main site (galike.net) works as a frame-brand for a series of projects aimed to promote the cultural & natural heritage of a region in NW Spain through creative projects focused on the entertainment, tourism and educational areas. The projects themselves will be a concretion (put into practice) of the general views of the brand, that acts more like a company brand. CodeX21 is one of those projects, it has its own logo, etc, and is actually like a child brand, yet more focused on a particular theme. I don't want to hide that it makes part of the GALIKE brand (in fact, I am planning to add the Galike logo to it, and a link to the main site on the menu). I will be making other projects, each of them with their own brand, hosted in subsites (subfolders) of galike.net multisites. Not all of them might have their own TLD mapped, some could simply be www.galike.net/projectname. The project codex21.gal subsite might become galike.net/codex21 if it would be better for SEO. Now, the problem is that my subsite codex21.gal re-states some principles, concepts and goals that have been defined (in other words) in the main site. Thus, there are some ideas (such as my particular vision on the possibilities of sustainable exploitation of that heritage, concepts I have developed myself as "narrative tourism" "geographical map as a non lineal story" and so on) that need to be present here and there on the subsite, since it is also philosophy of the project. BUT it seems that Google can penalise overlapping content in subdirectories based multisites, since they can seem a collection of doorways to access the same product (*) I have considered the possibility to substitute those overlapping ideas with links to the main page of the site, thought it seems unnatural from the user point of view to be brought off the page to read a piece of info that actually makes part of the project description (every other child project of Galike might have the same problem). I have considered also taking the subsite codex21 out of the network and host it as a single site in other server, but the problem of duplicated content might persist, and anyway, I should link it to my brand Galike somewhere, because that's kind of the "production house" of it. So which would be the best (white hat) strategy, from a SEO point of view, to arrange this brand-project philosophy overlapping? (*) “All the same IP address — that’s really not a problem for us. It’s really common for sites to be on the same IP address. That’s kind of the way the internet works. A lot of CDNs (content delivery networks) use the same IP address as well for different sites, and that’s also perfectly fine. I think the bigger issue that he might be running into is that all these sites are very similar. So, from our point of view, our algorithms might look at that and say “this is kind of a collection of doorway sites” — in that essentially they’re being funnelled toward the same product. The content on the sites is probably very similar. Then, from our point of view, what might happen is we will say we’ll pick one of these pages and index that and show that in the search results. That might be one variation that we could look at. In practice that wouldn’t be so problematic because one of these sites would be showing up in the search results. On the other hand, our algorithm might also be looking at this and saying this is clearly someone trying to overdo things with a collection of doorway sites and we’ll demote all of them. So what I recommend doing here is really trying to take a step back and focus on fewer sites and making those really strong, and really good and unique. So that they have unique content, unique products that they’re selling. So then you don’t have this collection of a lot of different sites that are essentially doing the same thing.” (John Mueller, Senior Webmaster Trend Analyst at Google. https://www.youtube.com/watch?time_continue=1&v=kQIyk-2-wRg&feature=emb_logo)
White Hat / Black Hat SEO | | PabloCulebras0 -
Does Lazy Loading Create Indexing Issues of products?
I have store with 5000+ products in one category & i m using Lazy Loading . Does this effects in indexing these 5000 products. as google says they index or read max 1000 links on one page.
White Hat / Black Hat SEO | | innovatebizz0 -
Site Search external hosted pages - Penguin
Hi All, On the site www.myworkwear.co.uk we have a an externally hosted site search that also creates separately hosted pages of popular searches which rank in Google and create traffic. An example of this is listed below: Google Search: blue work trousers (appears on front page of Google) Site Champion Page: http://workwear.myworkwear.co.uk/workwear/Navy%20Blue%20Work%20Trousers Nearest Category page: http://www.myworkwear.co.uk/category/Mens-Work-Trousers-936.htm Could this be a penalisation or duplication factor? Could these be interpreted as a dodgy link factor? Thanks in advance for your help. Kind Regards, Andy Southall
White Hat / Black Hat SEO | | MarzVentures0 -
Opinions sought on outbound Links page.
Hello Forum, I'm about the remove my outbound Links page at: http://www.pictureframe.com.au/---obs--picture-frames-links.html I think that Google could be assessing this page as a link scheme, ie: I-link-you-if-you-link me. I haven't received any messages from Google about this but I think the page may be devaluing my site. What do you guys~gals think? Thank you for any and all feedback Paul the Picture Framer
White Hat / Black Hat SEO | | Picframer0 -
Duplicate content showing on local pages
I have several pages which are showing duplicate content on my site for web design. As its a very competitive market I had create some local pages so I rank high if someone is searching locally i.e web design birmingham, web design tamworth etc.. http://www.cocoonfxmedia.co.uk/web-design.html http://www.cocoonfxmedia.co.uk/web-design-tamworth.html http://www.cocoonfxmedia.co.uk/web-design-lichfield.html I am trying to work out what is the best way reduce the duplicate content. What would be the best way to remove the duplicate content? 1. 301 redirect (will I lose the existing page) to my main web design page with the geographic areas mentioned. 2. Re write the wording on each page and make it unique? Any assistance is much appreciated.
White Hat / Black Hat SEO | | Cocoonfxmedia0 -
Pages Getting Deindexed
My Question Is I have 16 pages on my site that were all indexed until yesterday now there are only 3 indexed. I tried resubmitting my site map, and when i did it was the same result as before 3 pages indexed and 13 pages deindexed. I was wondering if someone could explain to me why this is happening and what I can do to fix it? Keep in mind my site is almost three months old, and this has happened before but, it fixed itself over time thanks.
White Hat / Black Hat SEO | | ilyaelbert0 -
Google Results Pages. after the bomb
So, ever since Google "went nuclear" a few weeks ago I have seen major fluctuations in search engine results pages. Basically what I am seeing is a settling down and RE-inclusion of some of my web properties. Basically I had a client affected by the hack job initially, but about a week later I not only saw my original ranking restored but a litany of other long tails appeared. I wasn't using any shady link techniques but did have considerable article distribution that MAY have connected me inadvertently to some of the "bad neighborhoods." The website itself is a great site with original relevant content, so if it is possible, Google definitely recognized some error in their destructive ranking adjustment and is making good on it for those sites that did not deserve the penalty. Alternatively, it could just be random Google reordering and I got lucky. What are your experiences with the updates?
White Hat / Black Hat SEO | | TheGrid0 -
Pages For Products That Don't Exist Yet?
Hi, I have a client that makes products that are accessories for other company's popular consumer products. Their own products on their website rank for other companies product names like, for made up example "2011 Super Widget" and then my client's product... "Charger." So, "Super Widget 2011 Charger" might be the type of term my client would rank for. Everybody knows the 2012 Super Widget will be out in some months and then my client's company will offer the 2012 Super Widget Charger. What do you think of launching pages now for the 2012 Super Widget Charger. even though it doesn't exist yet in order to give those pages time to rank while the terms are half as competitive. By the time the 2012 is available, these pages have greater authority/age and rank, instead of being a little late to the party? The pages would be like "coming soon" pages, but still optimized to the main product search term. About the only negative I see is that they'lll have a higher bounce rate/lower time on page since the 2012 doesn't even exist yet. That seems like less of a negative than the jump start on ranking. What do you think? Thanks!
White Hat / Black Hat SEO | | 945010