Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed by Google right now, and I'm not quite sure why. A little background:
We are an IT training and management training company with locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Let's take our Washington DC location as an example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
Not all of the SEO changes are live yet, so just bear with me. My question is really about why the first URL is still being indexed and crawled and showing fine in the search results, while the second one (which we want to show) is not. The changes have been live for around a month now - plenty of time to at least get indexed.
In fact, we don't want the first URL to show anymore; we'd like the second URL format to show across the board. Also, when I search Google for site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training I get a message that Google can't read the page because of the robots.txt file. But we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same, and that we've put in an order to have all the old URLs 301 redirected to the new ones. Still, I'm perplexed as to why these pages are not being indexed or crawled - even after manually submitting them in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so my replies might be somewhat slow.
Yes, we have links to the pages in our sitemaps, and I have done fetch requests. I checked just now and it seems the niche "New York" page is being crawled - it might have been a timing issue, as you suggested. But our DC page still isn't being crawled. I'll check on it periodically and see how it progresses. I really appreciate your suggestions - they're already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or the site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet - do you have enough (or any) pages linking to that second URL that isn't being indexed yet?
-
Hi Mike,
As a follow up, I forwarded your suggestions to our webmasters. They adjusted the robots.txt and it now reads as follows - I think it still might cause issues, though I'm not 100% sure why:
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /

Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But a more niche page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
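In case it helps, here's how I've been sanity-checking that file on my end - just a rough sketch using Python's standard-library robotparser (which applies rules in the order they appear, while Googlebot uses the most specific match, so it's only an approximation of Googlebot's behaviour):

```python
# Rough sketch: test the adjusted robots.txt against the two URLs in question.
# Caveat: urllib.robotparser applies rules in the order they appear, whereas
# Googlebot uses the most specific (longest) matching rule, so treat the
# output as an approximation only.
from urllib import robotparser

robots_txt = """\
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

for url in (
    "http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training",
    "http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training",
):
    print(rp.can_fetch("Googlebot", url), url)
```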
-
The pages in question don't have any meta robots tags on them, so once the Disallow in robots.txt is gone and you do a fetch request in Webmaster Tools, the pages should get crawled and indexed fine. If a page doesn't have a meta robots tag, the spiders treat it as index, follow. Personally, I prefer to include the index, follow tag anyway, even if it isn't 100% necessary.
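If you'd rather check that programmatically than by viewing source, something along these lines works - a minimal sketch in standard-library Python that fetches a page and reports any meta robots tag it finds (the target URL is just an example; swap in whichever page you want to test):

```python
# Minimal sketch: fetch a page and report its meta robots tag, if any.
# No tag found is treated as the crawler default of "index, follow".
from html.parser import HTMLParser
from urllib import request


class MetaRobotsParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.robots_content = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "robots":
                self.robots_content = attrs.get("content")


# Example target -- swap in whichever page you want to check.
url = "http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training"
with request.urlopen(url, timeout=10) as resp:
    html = resp.read().decode("utf-8", errors="replace")

parser = MetaRobotsParser()
parser.feed(html)
print(parser.robots_content or "no meta robots tag (defaults to index, follow)")
```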
-
Thanks, Mike. That was incredibly helpful. I did click the link in the SERP when I ran the site: search on Google, but I thought it was a mistake. Are you able to see the disallow directive in the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/ which would be blocking http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and has already been cached, so it will still appear in the SERPs... the bots just won't be able to see changes made to it because they can't crawl it.
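You can confirm for yourself that the file exists and see exactly what it serves with a quick fetch - a rough sketch using Python's standard library:

```python
# Rough sketch: fetch the live robots.txt to confirm it exists and see
# exactly which rules are being served to crawlers.
from urllib import request, error

try:
    with request.urlopen("http://www2.learningtree.com/robots.txt", timeout=10) as resp:
        print("Status:", resp.status)
        print(resp.read().decode("utf-8", errors="replace"))
except error.HTTPError as err:
    print("No robots.txt served - got HTTP", err.code)
```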
You need to fix the disallow so the bots can crawl your site correctly, and you should 301 your old page to the new one.
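And once the 301s go in, you can verify the old URL really answers with a 301 to the right place using something like this - a quick sketch with Python's http.client, which doesn't follow redirects, so the raw status and Location header are visible:

```python
# Quick sketch: confirm the old URL answers with a 301 and points at the
# new URL. http.client does not follow redirects, so the raw status and
# Location header are visible.
import http.client

conn = http.client.HTTPConnection("www2.learningtree.com", timeout=10)
conn.request("GET", "/htfu/location.aspx?id=uswd44")
resp = conn.getresponse()

print("Status:", resp.status)                    # expect 301 once the redirect is live
print("Location:", resp.getheader("Location"))   # expect the new /htfu/uswd44/... URL
conn.close()
```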