Page not being indexed or crawled and no idea why!
-
Hi everyone,
A few pages on our website aren't being indexed by Google right now, and I'm not quite sure why. A little background:
We are an IT and management training company with locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Take our Washington DC location, for example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
Not all of the SEO changes are live yet, so just bear with me. My question is really about why the first URL is still being crawled and indexed, and showing fine in the search results, while the second one (which we want to show) is not. The changes have been live for around a month now - plenty of time for the new pages to at least be indexed.
In fact, we don't want the first URL showing anymore; we'd like the second URL type to show across the board. Also, when I search Google for site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training, I get a message that Google can't read the page because of the robots.txt file. But we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same, and that we've put in an order to have all the old links 301 redirected to the new ones. Still, I'm perplexed as to why these pages are not being crawled or indexed - I've even manually submitted them in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so my replies might be somewhat slow.
Yes, we have links to the pages in our sitemaps, and I have done fetch requests. I checked just now and it seems the niched "New York" page is being crawled - it might have been a timing issue, as you suggested. But our DC page still isn't being crawled. I'll check on it periodically and watch the progress. I really appreciate your suggestions - they're already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to that second page that isn't being indexed yet?
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our webmasters. They adjusted the robots.txt, and it now reads as follows - I think it still might cause issues, though I'm not 100% sure why:
    User-agent: *
    Allow: /htfu/
    Disallow: /htfu/app_data/
    Disallow: /htfu/bin/
    Disallow: /htfu/PrecompiledApp.config
    Disallow: /htfu/web.config
    Disallow: /

Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
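As I understand it, Google resolves conflicts between Allow and Disallow by the most specific (longest) matching rule, so the file above would evaluate roughly like this for a few paths (my sketch, assuming Google's longest-match behavior; other crawlers may apply rules differently):

    /htfu/uswd74/alexandria/it-and-management-training
        Allow: /htfu/ is longer than Disallow: /         -> crawlable
    /htfu/usny27/new-york/sharepoint-training
        Allow: /htfu/ is longer than Disallow: /         -> crawlable
    /htfu/web.config
        Disallow: /htfu/web.config is the longest match  -> blocked
    /any-page-outside-htfu
        only Disallow: / matches                         -> blocked

If that's right, both location pages should be crawlable, and the trailing Disallow: / mainly blocks everything outside /htfu/ - which still seems risky for the rest of the site, but doesn't explain why the New York page is treated differently.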
-
The pages in question don't have any meta robots tags on them. So once the Disallow in robots.txt is gone and you do a fetch request in Webmaster Tools, the pages should get crawled and indexed fine. If a page doesn't have a meta robots tag, the spiders treat it as index, follow. Personally, I prefer to include the index, follow tag anyway, even if it isn't 100% necessary.
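For completeness, the tag I'd add inside each page's head is just this one line:

    <meta name="robots" content="index, follow">

It doesn't change what the spiders do by default, but it makes the intent explicit and is one less thing to wonder about when debugging indexing issues.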
-
Thanks, Mike. That was incredibly helpful. I did click the link in the SERP when I did the site: search on Google, but I thought it was a mistake. Are you able to see the robots disallow in the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/, which would be blocking http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been cached, so it will still appear in the SERPs; the bots just won't be able to see changes made to it, because they can't crawl it.
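If you want to double-check that kind of blocking yourself rather than rely on the SERP message, here's a minimal sketch using Python's standard library (note that it applies the classic first-match rule, which can differ slightly from Google's longest-match handling of Allow lines):

    # check_robots.py - does robots.txt block a given URL for Googlebot?
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser("http://www2.learningtree.com/robots.txt")
    parser.read()  # fetch and parse the live robots.txt

    url = ("http://www2.learningtree.com"
           "/htfu/uswd44/reston/it-and-management-training")
    # Prints False while Disallow: /htfu/ is in place
    print(parser.can_fetch("Googlebot", url))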
You need to fix the disallow so the bots can crawl your site correctly, and you should 301 your old page to the new one.
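A fixed robots.txt could look something like this - a sketch only, assuming you still want the app folders and config files kept out but everything else crawlable:

    User-agent: *
    Disallow: /htfu/app_data/
    Disallow: /htfu/bin/
    Disallow: /htfu/PrecompiledApp.config
    Disallow: /htfu/web.config

And for the 301s: the .aspx URLs suggest an ASP.NET/IIS stack, so one hypothetical way to redirect the old DC page is a rule in web.config via the IIS URL Rewrite module (your web guys will know how your actual setup maps location IDs to the new URLs):

    <!-- web.config fragment (requires the IIS URL Rewrite module) -->
    <system.webServer>
      <rewrite>
        <rules>
          <rule name="Redirect old DC location page" stopProcessing="true">
            <match url="^htfu/location\.aspx$" />
            <conditions>
              <add input="{QUERY_STRING}" pattern="^id=uswd44$" />
            </conditions>
            <action type="Redirect"
                    url="/htfu/uswd44/reston/it-and-management-training"
                    appendQueryString="false"
                    redirectType="Permanent" />
          </rule>
        </rules>
      </rewrite>
    </system.webServer>

In practice you'd want one pattern-based rule (or a rewrite map) covering every location ID rather than a rule per page, but the shape is the same.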