Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company and we have locations/classrooms around the US. To better our search rankings and overall visibility, we made some changes to the on page content, URL structure, etc. Let's take our Washington DC location for example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
All of the SEO changes aren't live yet, so just bear with me. My question really regards why the first URL is still being indexed and crawled and showing fine in the search results and the second one (which we want to show) is not. Changes have been live for around a month now - plenty of time to at least be indexed.
In fact, we don't want the first URL to be showing anymore, we'd like the second URL type to be showing across the board. Also, when I type into Google site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training I'm getting a message that Google can't read the page because of the robots.txt file. But, we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same. I was also told that we've put in an order to have all those old links 301 redirected to the new ones. But still, I'm perplexed as to why these pages are not being indexed or crawled - even manually submitted it into Webmaster tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And, why is Google saying "A description for this result is not available because of this site's robots.txt"
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so reply might be somewhat slow.
Yes, we have links to the pages on our sitemaps and I have done fetch requests. I did a check now and it seems that the niched "New York" page is being crawled now. Might have been a time issue as you suggested. But, our DC page still isn't being crawled. I'll check up on it periodically and see the progress. I really appreciate your suggestions - it's already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? Its also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking into that second page that isn't being indexed yet?
-
Hi Mike,
As a follow up, I forwarded your suggestions to our Webmasters. The adjusted the robots.txt and now reads this, which I think still might cause issues and am not 100% sure why this is:
User-agent: * Allow: /htfu/ Disallow: /htfu/app_data/ Disallow: /htfu/bin/ Disallow: /htfu/PrecompiledApp.config Disallow: /htfu/web.config Disallow: / Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training But, a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training Suggestions?
-
The pages in question don't have any Meta Robots Tags on them. So once the Disallow in Robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If you don't have a Meta Robots Tag, the spiders consider it Index,Follow. Personally I prefer to include the index, follow tag anyway even if it isn't 100% necessary.
-
Thanks, Mike. That was incredibly helpful. See, I did click the link on the SERP when I did the "site" search on Google, but I was thinking it was a mistake. Are you able to see the disallow robot on the source code?
-
Your Robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/ which would be blocking http://www2.learningtree.com**/htfu/**uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been cached so will still appear in the SERPs.... the bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly and you should 301 your old page to the new one.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl anamoly issue on Search Console
Has anyone checked the crwal anamoly issue under the index section on Search console? We recently move to a new site and I'm seeing a huge list of excluded urls which are classified as crawl anamoly (they all lead to 404 page). Does anyone know that if we need to 301 redirect all the links? Is there any other smarter/ more efficiently way to deal with them like set up canonical link (I thought that's what they're used for isn't it?) Thanks!
White Hat / Black Hat SEO | | greenshinenewenergy0 -
Why would a blank page rank? What am I missing about this page?
In terms of content, this page is blank. Yes, there's a sidebar and footer, but no content. I've seen a page like this rank before. I'm curious if they're implementing something on the back-end I don't realize or if this is just a fluke? Etc. Also, the DA of the site is only a 15, so I don't think that's the reason. http://www.thenurselawyer.com/component/tags/tag/20-pasco-county-personal-injury-lawyers.html Thanks, Ruben
White Hat / Black Hat SEO | | KempRugeLawGroup1 -
How does Google handle product detail page links hiden in a <noscript>tag?</noscript>
Hello, During my research of our website I uncovered that our visible links to our product detail pages (PDP) from grid/list view category-nav/search pages are <nofollowed>and being sent through a click tracking redirect with the (PDP) appended as a URL query string. But included with each PDP link is a <noscript>tag containing the actual PDP link. When I confronted our 3rd party e-commerce category-nav/search provider about this approach here is the response I recieved:</p> <p style="padding-left: 30px;">The purpose of these links is to firstly allow us to reliably log the click and then secondly redirect the visitor to the target PDP.<br /> In addition to the visible links there is also an "invisible link" inside the no script tag. The noscript tag prevents showing of the a tag by normal browsers but is found and executed by bots during crawling of the page.<br /> Here a link to a blog post where an SEO proved this year that the noscript tag is not ignored by bots: <a href="http://www.theseotailor.com.au/blog/hiding-keywords-noscript-seo-experiment/" target="_blank">http://www.theseotailor.com.au/blog/hiding-keywords-noscript-seo-experiment/<br /> </a> <br /> So the visible links are not obfuscating the PDP URL they have it encoded as it otherwise cannot be passed along as a URL query string. The plain PDP URL is part of the noscript tag ensuring discover-ability of PDPs by bots.</p> <p>Does anyone have anything in addition to this one blog post, to substantiate the claim that hiding our links in a <noscript> tag are in fact within the SEO Best Practice standards set by Google, Bing, etc...? </p> <p>Do you think that this method skirts the fine line of grey hat tactics? Will google/bing eventually penalize us for this?</p> <p>Does anyone have a better suggestion on how our 3rd party provider could track those clicks without using a URL redirect & hiding the actual PDP link?</p> <p>All insights are welcome...Thanks!</p> <p>Jordan K.</p></noscript></nofollowed>
White Hat / Black Hat SEO | | eImprovement-SEO0 -
Redirecting 86'd Brand Product Category Page
What would be the approach if my website is no longer selling products for a brand that is driving top organic traffic? Where should I redirect the traffic on the page? I'm trying to decide between the homepage or another similar brand product page.
White Hat / Black Hat SEO | | JMSCC0 -
Why is this Page Ranking for such a competitive keyword?
Hello MOZ Community! I have a question, I am hoping someone can help me understand. I am looking at this URL: http://goo.gl/BkSish ...it is ranking for this Keyword: POS Systems Now, this seems to be a pretty new URL, with few links being generated to it, as seen here: Open Site Explorer: http://moz.com/researchtools/ose/comparisons?site=http%3A%2F%2Fwww.shopkeep.com%2Fpos-system Majestic SEO: https://www.majesticseo.com/reports/site-explorer/link-profile?folder=&q=http%3A%2F%2Fwww.shopkeep.com%2Fpos-system&oq=http%3A%2F%2Fwww.shopkeep.com%2Fpos-system&IndexDataSource=F&wildcard=1 QUESTION: Can someone help us understand how or why this page is ranking so well, so quickly, for such a competitive Keyword? Thank you!
White Hat / Black Hat SEO | | mstpeter0 -
On-site duplication working - not penalised - any ideas?
I've noticed a website that has been set up with many virtually identical pages. For example many of them have the same content (minimal text, three video clips) and only the town name varies. Surely this is something that Google would be against? However the site is consistently ranking near the top of Google page 1, e.g. http://www.maxcurd.co.uk/magician-guildford.html for "magician Guildford", http://www.maxcurd.co.uk/magician-ascot.html for "magician Ascot" and so on (even when searching without localisation or personalisation). For years I've heard SEO experts say that this sort of thing is frowned on and that they will get penalised, but it never seems to happen. I guess there must be some other reason that this site is ranked highly - any ideas? The content is massively duplicated and the blog hasn't been updated since 2012 but it is ranking above many established older sites that have lots of varied content, good quality backlinks and regular updates. Thanks.
White Hat / Black Hat SEO | | MagicianUK0 -
Cloaking for better user experience and deeper indexing - grey or black?
I'm working on a directory that has around 800 results (image rich results) in the top level view. This will likely grow over time so needs support thousands. The main issue is that it is built in ajax so paginated pages are dynamically generated and look like duplicate content to search engines. If we limit the results, then not all of the individual directory listing pages can be found. I have an idea that serves users and search engines what they want but uses cloaking. Is it grey or black? I've read http://moz.com/blog/white-hat-cloaking-it-exists-its-permitted-its-useful and none of the examples quite apply. To allow users to browse through the results (without having a single page that has a slow load time) we include pagination links but which are not shown to search engines. This is a positive user experience. For search engines we display all results (since there is no limit the number of links so long as they are not spammy) on a single page. This requires cloaking, but is ultimately serving the same content in slightly different ways. 1. Where on the scale of white to black is this? 2. Would you do this for a client's site? 3. Would you do it for your own site?
White Hat / Black Hat SEO | | ServiceCrowd_AU0 -
Negative SEO to inner page: remove page or disavow links?
Someone decided to run a negative-SEO campaign, hitting one of the inner pages on my blog 😞 I noticed the links started to pile up yesterday but I assume there will be more to come over the next few days. The targeted page is of little value to my blog, so the question is: should I remove the affected page (hoping that the links won't affect the entire site) or to submit a disavow request? I'm not concerned about what happens to the affected page, but I want to make sure the entire site doesn't get affected as a result of the negative-SEO. Thanks in advance. Howard
White Hat / Black Hat SEO | | howardd0