Blog page won't get indexed
-
Hi Guys,
I'm currently asked to work on a website. I noticed that the blog posts won't get indexed in Google. www.domain.com/blog does get indexed but the blogposts itself won't. They have been online for over 2 months now.
I found this in the robots.txt file:
Allow: / Disallow: /kitchenhandle/ Disallow: /blog/comments/ Disallow: /blog/author/ Disallow: /blog/homepage/feed/
I'm guessing that the last line causes this issue. Does anyone have an idea if this is the case and why they would include this in the robots.txt?
Cheers!
-
Thanks alot!
-
Hi Dirk,
Good observation, I missed the canonical part somehow. So, google is indexing the canonical URLs here which doesn't have /blog/ in it and that's the problem. Have a look at the indexed page for this particular instance here. Non /blog/ instance is indexed, which will take you to its /blog/ version with wrong canonical URL.
Solution: Either remove the canonical URLs on these pages to point them to the current page itself. And yeah! As rightly mentioned by Dirk, do a proper /blog/ page linking from the blog page and other pages from where you're linking these articles.
-
This is definitely the issue. Fix that canonical and they'll be indexed.
-
To update - even worse: on the blog itself you are linking to the canonical version - not to the /blog/ version. So it would be impossible for Google to index /blog/ type of content.
If you do woontrends 2016 site:www.keukensduitsland.nl you will notice that the canonical version is properly indexed (even with the strange js redirect.
Dirk
-
It's not related to the robots.txt - you can easily check that in Webmastertools (Crawl > Robots.txt tester)
First issue is the location of the link - if you put a small link to the blog hidden in the left corner at the bottom of the page Google is not going to attribute a lot of importance to this link.
Most important issue on your blog articles is the canonical - example:
http://www.keukensduitsland.nl/blog/woontrends-2016/ has as canonical url: http://www.keukensduitsland.nl/woontrends-2016/ - however this page will redirect you with javascript to the blog article.
Make the canonical self referencing and do a proper redirect on the other pages (301 rather than js redirect)
Dirk
-
Hi Happy SEO,
Well, the robots.txt looks find here. Could you try to fetch any of the blog page/post as google in the search console and share the screenshot here?
Also, to cross check the robots.txt (which looks fine though), you have robots.txt tester in search console where you can put any blog page/post to check if bots can crawl it. Please share a screenshot of that as well.
On a separate note, the sitemap.xml link mentioned in the robots.txt (http://www.keukensduitsland.nl/sitemap.xml) is broken. Fix that as well.
-
Hi Nitin,
The URL is www.keukensduitsland.nl (/blog). The link to the blog page is in the bottom left corner called "Keukennieuws".
-
Hi Happy SEO,
Could you please share the blog URL here? Sounds like an interesting issue and would love to give a try to help you with this
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages not indexed
Hey everyone Despite doing the necessary checks, we have this problem that only a part of the sitemap is indexed.
Technical SEO | | conversal
We don't understand why this indexation doesn't want to take place. The major problem is that only a part of the sitemap is indexed. For a client we have several projects on the website with several subpages, but only a few of these subpages are indexed. Each project has 5 to 6 subpages. They all should be indexed. Project: https://www.brody.be/nl/nieuwbouwprojecten/nieuwbouw-eeklo/te-koop-eeklo/ Mainly subelements of the page are indexed: https://www.google.be/search?source=hp&ei=gZT1Wv2ANouX6ASC5K-4Bw&q=site%3Abrody.be%2Fnl%2Fnieuwbouwprojecten%2Fnieuwbouw-eeklo%2F&oq=site%3Abrody.be%2Fnl%2Fnieuwbouwprojecten%2Fnieuwbouw-eeklo%2F&gs_l=psy-ab.3...30.11088.0.11726.16.13.1.0.0.0.170.1112.8j3.11.0....0...1c.1.64.psy-ab..4.6.693.0..0j0i131k1.0.p6DjqM3iJY0 Do you have any idea what is going wrong here?
Thanks for your advice! Frederik
Digital marketeer at Conversal0 -
Disallowing WP 'author' page archives
Hey Mozzers. I want to block my author archive pages, but not the primary page of each author. For example, I want to keep /author/jbentz/ but get rid of /author/jbentz/page/4/. Can I do that in robots by using a * where the author name would be populated. ' So, basically... my robots file would include something like this... Disallow: /author/*/page/ Will this work for my intended goal... or will this just disallow all of my author pages?
Technical SEO | | Netrepid0 -
Home Page .index.htm and .com Duplicate Page Content/Title
I have been whittling away at the duplicate content on my clients' sites, thanks to SEOmoz's pro report, and have been getting push back from the account manager at register.com (the site was built here and the owner doesn't want to move it). He says these are the exact same page and he can't access one to redirect to the other. Any suggestions? The SEOmoz report says there is duplicate content on both these urls: Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/index.htm Durango Mountain Biking | Durango Mountain Resort - Cascade Village http://www.cascadevillagehotel.com/ Your help is greatly appreciated! Sheryl
Technical SEO | | TOMMarketingLtd.0 -
De-indexing millions of pages - would this work?
Hi all, We run an e-commerce site with a catalogue of around 5 million products. Unfortunately, we have let Googlebot crawl and index tens of millions of search URLs, the majority of which are very thin of content or duplicates of other URLs. In short: we are in deep. Our bloated Google-index is hampering our real content to rank; Googlebot does not bother crawling our real content (product pages specifically) and hammers the life out of our servers. Since having Googlebot crawl and de-index tens of millions of old URLs would probably take years (?), my plan is this: 301 redirect all old SERP URLs to a new SERP URL. If new URL should not be indexed, add meta robots noindex tag on new URL. When it is evident that Google has indexed most "high quality" new URLs, robots.txt disallow crawling of old SERP URLs. Then directory style remove all old SERP URLs in GWT URL Removal Tool This would be an example of an old URL:
Technical SEO | | TalkInThePark
www.site.com/cgi-bin/weirdapplicationname.cgi?word=bmw&what=1.2&how=2 This would be an example of a new URL:
www.site.com/search?q=bmw&category=cars&color=blue I have to specific questions: Would Google both de-index the old URL and not index the new URL after 301 redirecting the old URL to the new URL (which is noindexed) as described in point 2 above? What risks are associated with removing tens of millions of URLs directory style in GWT URL Removal Tool? I have done this before but then I removed "only" some useless 50 000 "add to cart"-URLs.Google says themselves that you should not remove duplicate/thin content this way and that using this tool tools this way "may cause problems for your site". And yes, these tens of millions of SERP URLs is a result of a faceted navigation/search function let loose all to long.
And no, we cannot wait for Googlebot to crawl all these millions of URLs in order to discover the 301. By then we would be out of business. Best regards,
TalkInThePark0 -
No index directory pages?
All, I have a site built on WordPress with directory software (edirectory) on the backend that houses a directory of members. The Wordpress portion of the site is full of content and drives traffic through to the directory. Like most directories, the results pages are thin on content and mainly contain links to member profiles. Is it best to simply no index the search results for the directory portion of the site?
Technical SEO | | JSOC0 -
Google indexing page with description
Hello, We rank fairly high for a lot of terms but Google is not indexing our descriptions properly. An example is with "arnold schwarzenegger net worth". http://www.google.ca/search?q=arnold+schwarzenegger+net+worth&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a When we add content, we throw up a placeholder page first. The content gets added with no body content and the page only contains the net worth amount of the celebrity. We then go back through and re-add the descriptions and profile bio shortly after. Will that affect how the pages are getting indexed and is there a way we can get Google to go back to the page and try to index the description so it doesn't just appear as a straight link? Thanks, Alex
Technical SEO | | Anti-Alex0 -
Grr . . . Just can't seem to get there
mrswitch.com.au is one site that we are consistantly struggling with . . . It has a page rank of 3 which beats most of the competitors, but when it comes to Google AU searches such as Sydney Electrician and Electrician Sydney etc, we just can't seem to get there and the rankings keep dropping. We backlink and update the pages on a regular basis Any ideas? - Could it be the custom CMS system?
Technical SEO | | kayweb0 -
Seomoz is showing duplicate page content for my wordpress blog
Hi Everyone, My seomoz crawl diagnostics is indicating that I have duplicate content issues in the wordpress blog section of my site located at: http://www.cleversplash.com/blog/ What is the best strategy to deal with this? Is there a plugin that can resolve this? I really appreciate your help guys. Martin
Technical SEO | | RogersSEO0