Issue with Robots.txt file blocking meta description
-
Hi,
Can you please tell me why the following error is showing up in the serps for a website that was just re-launched 7 days ago with new pages (301 redirects are built in)?
A description for this result is not available because of this site's robots.txt – learn more.
Once we noticed it yesterday, we made some changed to the file and removed the amount of items in the disallow list.
Here is the current Robots.txt file:
# XML Sitemap & Google News Feeds version 4.2 - http://status301.net/wordpress-plugins/xml-sitemap-feed/ Sitemap: http://www.website.com/sitemap.xml Sitemap: http://www.website.com/sitemap-news.xml User-agent: * Disallow: /wp-admin/ Disallow: /wp-includes/ Other notes... the site was developed in WordPress and uses that followign plugins:
- WooCommerce All-in-One SEO Pack
- Google Analytics for WordPress
- XML Sitemap
- Google News Feeds
Currently, in the SERPs, it keeps jumping back and forth between showing the meta description for the www domain and showing the error message (above).
Originally, WP Super Cache was installed and has since been deactivated, removed from WP-config.php and deleted permanently.
One other thing to note, we noticed yesterday that there was an old xml sitemap still on file, which we have since removed and resubmitted a new one via WMT. Also, the old pages are still showing up in the SERPs.
Could it just be that this will take time, to review the new sitemap and re-index the new site?
If so, what kind of timeframes are you seeing these days for the new pages to show up in SERPs? Days, weeks? Thanks, Erin ```
-
At the moment, it doesn't seem that rel=publisher is doing all that much for sites (aside from sometimes showing better info ion the knowledge graph listing on Brand searches) but personally I believe it's functionality and influence are going to be greatly expanded fairly soon, so well worth doing. As far as it contributing anything to help speed up indexing... doubt it.
P.
-
Paul,
Thanks... you hit upon my hunch, that we will just have to wait.
Much of the information in the SERPs (metadescriptions, titles and urls) are still old,even though they redirect to the new pages when I click.
Thanks for the tip... and about social media.
Do you think it will help to get the rel=publisher link to the Google+ page on the site?
Erin
-
A lot of people, especially WP users use modules that may block certain spiders crawling your site, but in your case, you don't seem to have any.
-
If you just changed the robots.txt file yesterday, my guess is you're going to have to be patient while the site gets recrawled, Erin. Any of the pages that are in the index and were cached before yesterday's robots update will still include the directive not to include the metadescription (since that's the condition they were under when they were cached.)
I suspect the pages you're seeing with metadescriptions were crawled since the robots update. Are you seeing the same page change whether it shows metadescription or not?
As far as old pages showing in the SERPs, again they'll all have to be crawled before the 301 redirects can be discovered and the SEs can begin to understand they should be dropped. (Even then it can take days to weeks for the originals to drop out.)
Another very effective way to help get the new site indexed faster is to attract some good-quality new links to the new pages. Social Media can be especially effective for this, Google+ in particular.
Paul
-
Thanks!
What do I need to look for in the .htaccess file?
Here is what is there... and the rest (not shown) are redirects:
BEGIN WordPress <ifmodule mod_rewrite.c="">RewriteEngine On RewriteBase / RewriteRule ^index.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L]</ifmodule> # END WordPress
BEGIN WordPress <ifmodule mod_rewrite.c="">RewriteEngine On RewriteBase / RewriteRule ^index.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L]</ifmodule> # END WordPress
-
Thanks for the tips! Let me check it out.
-
I'd also insure its not something to do with your .htacess file.
-
Make sure the pages aren't blocked with meta robots noindex tag
Fetch as Google in WMT to request a full site recrawl.
Run brokenlinkcheck.com and see if their crawler is successfully crawling or if it's blocked.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How Important is Meta Description on Non-Entry Pages?
If a page is never used as an entry page to your website -- in other words it's an obscure, relatively unimportant page that never ranks high enough in the search engines to be in the first few pages of the search results for any significant number of searches -- does editing the META Description really have any significant benefit? I guess the question could also be phrased as, does the Google Search Algo factor in the META Description tag, or is it only used for display purposes on the search results and doesn't affect ranking?
Intermediate & Advanced SEO | | SeoJaz0 -
Best practice for disallowing URLS with Robots.txt
Hi Everybody, We are currently trying to tidy up the crawling errors which are appearing when we crawl the site. On first viewing, we were very worried to say the least:17000+. But after looking closer at the report, we found the majority of these errors were being caused by bad URLs featuring: Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/" Color - For example: ?color=91 Price - For example: "?price=650-700" Order - For example: ?dir=desc&order=most_popular Page - For example: "?p=1&standards=704" Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/" My question now is as a novice of working with Robots.txt, what would be the best practice for disallowing URLs featuring these from being crawled? Any advice would be appreciated!
Intermediate & Advanced SEO | | centurysafety0 -
Should I be using meta robots tags on thank you pages with little content?
I'm working on a website with hundreds of thank you pages, does it make sense to no follow, no index these pages since there's little content on them? I'm thinking this should save me some crawl budget overall but is there any risk in cutting out the internal links found on the thank you pages? (These are only standard site-wide footer and navigation links.) Thanks!
Intermediate & Advanced SEO | | GSO0 -
Weirdist Meta Description I've Seen in a SERP
For one e-commerce website, in place of the proper meta description, Google is showing a 318-character-long mix of snippets from the homepage content for the domain search (e.g. [example.com]). A brand search returns the correct meta description - as do the keywords the homepage ranks for. I know Google changes the meta description if it doesn't think it's relevant, but this one (there is only one) is and has (as far as we know) shown until now, and I've never seen such a mix of text in the SERP, and so many characters - it's picking up random text from bits of anchor text e.g. "privacy policy", title attributes from links, labels from radio buttons and more. The home page W3C validates apart from a couple of basic things like missing alt text. The only things that might be related that don't are some custom meta name tags added by the CMS - but I wouldn't think this would make any difference to whether a meta description is displayed properly or not? I've recommended we wait until tomorrow to see if Google fixes this on recrawl, but does anyone have any ideas if it doesn't? The homepage doesn't feature much standalone text, so I was thinking if we add a few extra words it might encourage Google to pick from that if it doesn't want to use the meta description. The text would have to be useful for users and fit in with the design of course, which could be awkward...
Intermediate & Advanced SEO | | Alex-Harford1 -
Whole site blocked by robots in webmaster tools
My URL is: www.wheretobuybeauty.com.auThis new site has been re-crawled over last 2 weeks, and in webmaster tools index status the following is displayed:Indexed 50,000 pagesblocked by robots 69,000Search query 'site:wheretobuybeauty.com.au' returns 55,000 pagesHowever, all pages in the site do appear to be blocked and over the 2 weeks, the google search query site traffic declined from significant to zero (proving this is in fact the case ).This is a Linux php site and has the following: 55,000 URLs in sitemap.xml submitted successfully to webmaster toolsrobots.txt file existed but did not have any entries to allow or disallow URLs - today I have removed robots.txt file completely URL re-direction within Linux .htaccess file - there are many rows within this complex set of re-directions. Developer has double checked this file and found that it is valid.I have read everything that google and other sources have on this topic and this does not help. Also checked webmaster crawl errors, crawl stats, malware and there is no problem there related to this issue.Is this a duplicate content issue - this is a price comparison site where approx half the products have duplicate product descriptions - duplicated because they are obtained from the suppliers through an XML data file. The suppliers have the descriptions from the files in their own sites.Help!!
Intermediate & Advanced SEO | | rrogers0 -
Google indexing issue?
Hey Guys, After a lot of hard work, we finally fixed the problem on our site that didn't seem to show Meta Descriptions in Google, as well as "noindex, follow" on tags. Here's my question: In our source code, I am seeing both Meta descriptions on pages, and posts, as well as noindex, follow on tag pages, however, they are still showing the old results and tags are also still showing in Google search after about 36 hours. Is it just a matter of time now or is something else wrong?
Intermediate & Advanced SEO | | ttb0 -
Why are new pages not being indexed, and old pages (now in robots.txt) remain in the index?
I currently have a site that was recently restructured, causing much of its content to be reposted, creating new URL's for each page. To avoid duplicates, all of the existing pages were added to the robots file. That said, it has now been over a week - I know Google has recrawled the site - and when I search for term X, it is stil the old page that is ranking, with the new one nowhere to be seen. I'm assuming it's a cached version, but why are so many of the old pages still appearing in the index? Furthermore, all "tags" pages (it's a Q&A site, like this one) were also added to the robots a few months ago, yet I think they are all still appearing in the index. Anyone got any ideas about why this is happening, and how I can get my new pages indexed?
Intermediate & Advanced SEO | | corp08030 -
How to fix issues regarding URL parameters?
Today, I was reading help article for URL parameters by Google. http://www.google.com/support/webmasters/bin/answer.py?answer=1235687 I come to know that, Google is giving value to URLs which ave parameters that change or determine the content of a page. There are too many pages in my website with similar value for Name, Price and Number of product. But, I have restricted all pages by Robots.txt with following syntax. URLs:
Intermediate & Advanced SEO | | CommercePundit
http://www.vistastores.com/table-lamps?dir=asc&order=name
http://www.vistastores.com/table-lamps?dir=asc&order=price
http://www.vistastores.com/table-lamps?limit=100 Syntax in Robots.txt
Disallow: /?dir=
Disallow: /?p=
Disallow: /*?limit= Now, I am confuse. Which is best solution to get maximum benefits in SEO?0