Issue with Robots.txt file blocking meta description
-
Hi,
Can you please tell me why the following error is showing up in the SERPs for a website that was just re-launched 7 days ago with new pages (301 redirects are built in)?
A description for this result is not available because of this site's robots.txt – learn more.
Once we noticed it yesterday, we made some changes to the file and reduced the number of items in the disallow list.
Here is the current Robots.txt file:
# XML Sitemap & Google News Feeds version 4.2 - http://status301.net/wordpress-plugins/xml-sitemap-feed/
Sitemap: http://www.website.com/sitemap.xml
Sitemap: http://www.website.com/sitemap-news.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Other notes... the site was developed in WordPress and uses the following plugins:
- WooCommerce
- All-in-One SEO Pack
- Google Analytics for WordPress
- XML Sitemap & Google News Feeds
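For what it's worth, the file above only blocks the two WordPress system directories, which should not suppress meta descriptions on content pages. One quick way to verify what a given robots.txt actually blocks is Python's standard-library parser; a sketch using the rules above (the domain is a placeholder):

```python
from urllib import robotparser

# The rules from the robots.txt above (domain is a placeholder).
rules = """User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Content pages are crawlable; only the two WP system dirs are blocked.
print(rp.can_fetch("*", "http://www.website.com/some-page/"))      # True
print(rp.can_fetch("*", "http://www.website.com/wp-admin/x.php"))  # False
```

If a URL unexpectedly comes back `False` here, the live file differs from what you think is deployed.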
Currently, in the SERPs, it keeps jumping back and forth between showing the meta description for the www domain and showing the error message (above).
Originally, WP Super Cache was installed; it has since been deactivated, removed from wp-config.php, and deleted permanently.
One other thing to note, we noticed yesterday that there was an old xml sitemap still on file, which we have since removed and resubmitted a new one via WMT. Also, the old pages are still showing up in the SERPs.
Could it just be that this will take time for Google to review the new sitemap and re-index the new site?
If so, what kind of timeframes are you seeing these days for new pages to show up in SERPs? Days? Weeks? Thanks, Erin
-
At the moment, it doesn't seem that rel=publisher is doing all that much for sites (aside from sometimes showing better info in the Knowledge Graph listing on brand searches), but personally I believe its functionality and influence are going to be greatly expanded fairly soon, so it's well worth doing. As far as it contributing anything to help speed up indexing... I doubt it.
P.
-
Paul,
Thanks... you hit upon my hunch, that we will just have to wait.
Much of the information in the SERPs (meta descriptions, titles and URLs) is still old, even though it redirects to the new pages when I click.
Thanks for the tip... and about social media.
Do you think it will help to get the rel=publisher link to the Google+ page on the site?
Erin
-
A lot of people, especially WP users, use modules that may block certain spiders from crawling a site, but in your case, you don't seem to have any.
-
If you just changed the robots.txt file yesterday, my guess is you're going to have to be patient while the site gets recrawled, Erin. Any of the pages that are in the index and were cached before yesterday's robots update will still include the directive not to show the meta description (since that's the condition they were under when they were cached).
I suspect the pages you're seeing with meta descriptions were crawled since the robots update. Are you seeing the same page alternate between showing a meta description and not?
As far as old pages showing in the SERPs, again they'll all have to be crawled before the 301 redirects can be discovered and the search engines can begin to understand that they should be dropped. (Even then it can take days to weeks for the originals to drop out.)
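While waiting, it's worth confirming the 301s themselves are served correctly (a cached 302 or 200 would explain old URLs lingering). A minimal, self-contained sketch, with hypothetical paths and a throwaway local server standing in for the relaunched site:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    """Stand-in for the relaunched site: every old URL 301s to a new one."""
    def do_HEAD(self):
        self.send_response(301)
        self.send_header("Location", "/new-page/")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Request an old URL without following redirects, so the raw 301 is visible.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("HEAD", "/old-page/")
resp = conn.getresponse()
print(resp.status, resp.getheader("Location"))  # 301 /new-page/
server.shutdown()
```

Against the real site you'd point the `HTTPConnection` at your own domain; the important part is checking the status code and `Location` header without following the redirect.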
Another very effective way to help get the new site indexed faster is to attract some good-quality new links to the new pages. Social media can be especially effective for this, Google+ in particular.
Paul
-
Thanks!
What do I need to look for in the .htaccess file?
Here is what is there... and the rest (not shown) are redirects:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
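As for what to look for: the usual culprit in an .htaccess file is a user-agent based block. A hypothetical example of the kind of rule that would cause crawl problems (the standard WordPress block above contains nothing of the sort):

```apache
# A rule like this would return 403 Forbidden to Googlebot outright.
# If anything resembling it appears in .htaccess, remove it.
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
```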
-
Thanks for the tips! Let me check it out.
-
I'd also ensure it's not something to do with your .htaccess file.
-
Make sure the pages aren't blocked with a meta robots noindex tag.
Use Fetch as Google in WMT to request a full site recrawl.
Run brokenlinkcheck.com and see if their crawler is successfully crawling or if it's blocked.
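For the noindex check in the first point, you don't need a tool; the page source can be scanned directly. A small sketch using Python's standard library (the sample HTML is hypothetical; in practice you'd feed in the HTML fetched from the site):

```python
from html.parser import HTMLParser

class MetaRobotsCheck(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.robots = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots.append(a.get("content", ""))

# Hypothetical page source standing in for a fetched page.
html = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
parser = MetaRobotsCheck()
parser.feed(html)
blocked = any("noindex" in c for c in parser.robots)
print(blocked)  # True means the page asks engines not to index it
```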
Related Questions
-
Infinite scrolling issue?
Hi guys, I'm reviewing this e-commerce page: https://tinyurl.com/ybjjwr65. This Google article: https://webmasters.googleblog.com/2014/02/infinite-scroll-search-friendly.html mentions: "Make sure that you or your content management system produces a paginated series (component pages) to go along with your infinite scroll." How would you check this? Is there a tool to conduct this test? Cheers.
Intermediate & Advanced SEO | kayl870
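One way to run that check without a dedicated tool is to look for rel="next"/"prev" links on each component page of the series. A small sketch using Python's standard library (the URLs and HTML are hypothetical):

```python
from html.parser import HTMLParser

class PaginationLinks(HTMLParser):
    """Collects <link rel="next"> / <link rel="prev"> hrefs from a page."""
    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") in ("next", "prev"):
            self.links[a["rel"]] = a.get("href")

# Hypothetical HTML for component page 2 of a paginated series.
html = ('<head><link rel="prev" href="/products?page=1">'
        '<link rel="next" href="/products?page=3"></head>')
parser = PaginationLinks()
parser.feed(html)
print(parser.links)
```

If a component page yields an empty dict here, the paginated series Google's article asks for probably isn't being produced.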
How to solve this issue and avoid duplicated content?
My marketing team would like to serve up three pages of similar content: www.example.com/one, www.example.com/two and www.example.com/three; however, the challenge here is they'd like to have only one page with three different titles and images based on the user's entry point (one, two, or three). To avoid duplicated pages, how would you suggest this best be handled?
Intermediate & Advanced SEO | JoelHer0
Meta Tags (again)
Hey, I know this has been discussed to death, but looking back through previous postings there doesn't seem to be a consensus on the exact meta tags that an eCommerce site should include, specifically whether to remove the keywords tag or not, since it is believed that Yahoo potentially still makes use of it. Currently our homepage has the following meta tags:

<title>Buy Printer Cartridges | Ink and Toner Cartridge for Inkjet and Laser Printers</title>
<meta name="Description" content="Visit Refresh Cartridges for great prices on ink cartridges, toner cartridges, ink, printers and accessories." />
<meta name="Keywords" content="ink cartridges, cheap cartridges, inkjet cartridges, inkjet ink cartridges, ink cartridge, printer ink cartridges, laser cartridges, toner, laser printers" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="author" content="Ink Cartridges, Inkjet Cartridge, Printer Cartridge, Toner Cartridges Refresh Cartridges" />
<meta http-equiv="expires" content="0" />
<meta name="robots" content="noodp,index,follow" />
<meta name="Language" content="English" />
<meta http-equiv="Cache-Control" content="Public" />
<meta name="verify-v1" content="sJXqAAWP6ar/LTEOMyUgG6nqothxk62tJTid+ryBJxo=" />
<meta name="viewport" content="width=1024" />

This is too messy, but before I do something drastic that I'll possibly regret, please can you confirm that, in your opinion, I am best to remove everything with the exception of this:

<title>Buy Printer Cartridges | Ink and Toner Cartridge for Inkjet and Laser Printers</title>
<meta name="Description" content="Visit Refresh Cartridges for great prices on ink cartridges, toner cartridges, ink, printers and accessories." />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="viewport" content="width=1024" />

I realise there is a verify-v1 tag in there, but this can be done through a file on our server, so while cleaning up that might as well go. Would there be an argument for keeping any of the other tags or are they all pretty much redundant now? Many thanks! Chris
Intermediate & Advanced SEO | ChrisHolgate
How to leverage browser caching for a specific file
Hello all,
I am trying to figure out how to add leverage-browser-caching headers for these items:
http://maps.googleapis.com/maps/api/js?v=3.exp&sensor=false&language=en
http://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js
http://www.google-analytics.com/analytics.js
What's hard is I understand the purpose, but unlike a CSS file, how do you specify an expiration on an actual direct-path file? Any help or link to get help is appreciated. Chris
Intermediate & Advanced SEO | asbchris
Should comments and feeds be disallowed in robots.txt?
Hi, my robots file is currently set up as listed below. From an SEO point of view, is it good to disallow feeds, RSS and comments? I feel allowing comments would be a good thing because it's new content that may rank in the search engines, as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly. What's your take? I'm also concerned about /page being blocked. Not sure how that benefits my blog from an SEO point of view either. Look forward to your feedback. Thanks. Eddy

User-agent: Googlebot
Crawl-delay: 10
Allow: /*

User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
Intermediate & Advanced SEO | workathomecareers0
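One way to sanity-check what a file like this actually blocks (the /page/ concern in particular) is Python's standard-library robots parser; a sketch over a trimmed copy of the rules above:

```python
from urllib import robotparser

# Trimmed copy of the rules above, just enough to show the behaviour.
rules = """User-agent: Googlebot
Crawl-delay: 10
Allow: /*

User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /comments/
Disallow: /page/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot has its own record with Allow: /*, so it can fetch /page/2/...
print(rp.can_fetch("Googlebot", "http://example.com/page/2/"))      # True
# ...while every other crawler hits the Disallow: /page/ rule.
print(rp.can_fetch("SomeOtherBot", "http://example.com/page/2/"))   # False
print(rp.crawl_delay("Googlebot"))  # 10
```

Note the asymmetry: because Googlebot matches its own user-agent record first, the generic disallows never apply to it at all, which may or may not be what you intended.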
Should I block wordpress archive and tag?
I use WordPress and WordPress SEO by Yoast. I've set it up to add a noindex meta tag on all archive and tag pages. I don't think it's useful to include those pages in search results because there are quite a few, especially the tag archives. Should I consider anything else or change my mind? What do you think? Thanks
Intermediate & Advanced SEO | Akeif0
Canonicalization issue I can't work out
SEOmoz has kindly brought to my attention some canonicalization issues with my site. Firstly, I've adjusted http://capitalalist.com to 301 redirect to http://www.capitalalist.com via .htaccess. But the crawl has shown, for every page on my site, the problem below: http://www.capitalalist.com/cirque-du-soir http://www.capitalalist.com/cirque-du-soir/ It's just that last / that's causing the problem, but I can't seem to find anyone having the same issue before. BTW, I'm using WordPress if that makes a difference. Can anyone elaborate on the issue? How would I adjust my .htaccess file to redirect a request with a / on the end of it? Thanks in advance!
Intermediate & Advanced SEO | AdenBrands0
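For the .htaccess question above, a hypothetical sketch of a trailing-slash redirect rule. Note that WordPress's canonical permalinks are usually the slashed form, so confirm which version your theme links to before redirecting in either direction:

```apache
# Hypothetical: 301 any trailing-slash URL to the bare version,
# skipping real directories. Test on a staging copy before deploying.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [L,R=301]
```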
Old pages still crawled by SEs returning 404s. Better to 301 them or block with robots.txt?
Hello guys, A client of ours has thousands of pages returning 404s, visible in Google Webmaster Tools. These are all old pages which don't exist anymore, but Google keeps on detecting them. These pages belong to sections of the site which don't exist anymore. They are not linked externally and didn't provide much value even when they existed. What do you suggest we do: (a) do nothing, (b) redirect all these URLs/folders to the homepage through a 301, or (c) block these pages through the robots.txt? Are we inappropriately using part of the crawl budget set by search engines by not doing anything? Thanks
Intermediate & Advanced SEO | H-FARM0