Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't currently being indexed by Google, and I'm not quite sure why. A little background:
We are an IT training and management training company with locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Let's take our Washington DC location as an example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
Not all of the SEO changes are live yet, so bear with me. My question is really about why the first URL is still being crawled, indexed, and showing fine in the search results while the second one (which we want to show) is not. The changes have been live for around a month now - plenty of time for the new page to at least be indexed.
In fact, we don't want the first URL showing anymore; we'd like the second URL format to show across the board. Also, when I search Google for site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training, I get a message that Google can't read the page because of the robots.txt file - but as far as I know, we have no robots.txt file. I've been told by our web team that the two pages are exactly the same, and that a request has been put in to 301 redirect all the old URLs to the new ones. Still, I'm perplexed as to why these pages aren't being crawled or indexed - I've even manually submitted them in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so my replies might be somewhat slow.
Yes, we have links to the pages in our sitemaps, and I have done fetch requests. I just checked and it seems the niched "New York" page is being crawled now. It might have been a timing issue, as you suggested. But our DC page still isn't being crawled. I'll check on it periodically and track the progress. I really appreciate your suggestions - they're already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to that second URL that isn't being indexed yet?
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our webmasters. They adjusted the robots.txt, and it now reads as follows. I think it still might cause issues, though I'm not 100% sure why:
```
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
```
Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
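A quick way to sanity-check rules like these is to run them through a robots.txt parser, such as the one in Python's standard library. A minimal sketch, assuming the rules pasted above are the complete file:

```python
from urllib.robotparser import RobotFileParser

# The rules pasted above, tested locally rather than fetched from the live site.
rules = """\
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for url in [
    "http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training",
    "http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training",
]:
    print(url, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

One caveat: Python's parser applies rules in the order they appear (first match wins), while Googlebot gives precedence to the most specific (longest) matching path. In this file both readings reach the same verdict - Allow: /htfu/ is both listed first and longer than Disallow: / - so both pages should be crawlable, which points to crawl scheduling rather than blocking as the remaining cause.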
-
The pages in question don't have any meta robots tags on them, so once the Disallow in robots.txt is gone and you do a fetch request in Webmaster Tools, the pages should get crawled and indexed fine. If you don't have a meta robots tag, the spiders treat the page as index,follow. Personally, I prefer to include the index,follow tag anyway, even if it isn't 100% necessary.
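If you'd rather verify that from a script than by eyeballing the source, here is a rough sketch (a regex scan is fine for a spot check, though a proper audit should use an HTML parser; the URL is just the Alexandria page from above):

```python
import re
import urllib.request

url = "http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training"
html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")

# Look for any meta robots tag; absence means crawlers assume index,follow.
tag = re.search(r'<meta[^>]*name=["\']robots["\'][^>]*>', html, re.IGNORECASE)
print(tag.group(0) if tag else "no meta robots tag found - defaults to index,follow")
```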
-
Thanks, Mike. That was incredibly helpful. I did click the link on the SERP when I ran the site: search on Google, but I assumed it was a mistake. Were you able to see the disallow directive in the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/, which would be blocking http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is technically blocked too, it has been around longer and was already cached, so it will still appear in the SERPs - the bots just won't be able to see changes made to it, because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly, and you should 301 redirect your old page to the new one.
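Once the redirect order goes in, it's worth confirming that the old URL really answers with a 301 and points at the new address. A small sketch using only the standard library (http.client doesn't follow redirects, so the raw status stays visible; if the server mishandles HEAD, switch to GET):

```python
import http.client

conn = http.client.HTTPConnection("www2.learningtree.com")
conn.request("HEAD", "/htfu/location.aspx?id=uswd44")  # the old DC URL
resp = conn.getresponse()

print(resp.status, resp.reason)    # want: 301 Moved Permanently
print(resp.getheader("Location"))  # want: /htfu/uswd44/reston/it-and-management-training
conn.close()
```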