Issue with Robots.txt file blocking meta description
-
Hi,
Can you please tell me why the following error is showing up in the SERPs for a website that was re-launched 7 days ago with new pages (301 redirects are in place)?
A description for this result is not available because of this site's robots.txt – learn more.
Once we noticed it yesterday, we made some changes to the file and reduced the number of items in the Disallow list.
Here is the current Robots.txt file:
    # XML Sitemap & Google News Feeds version 4.2 - http://status301.net/wordpress-plugins/xml-sitemap-feed/
    Sitemap: http://www.website.com/sitemap.xml
    Sitemap: http://www.website.com/sitemap-news.xml

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/

Other notes: the site was developed in WordPress and uses the following plugins:
- WooCommerce
- All in One SEO Pack
- Google Analytics for WordPress
- XML Sitemap
- Google News Feeds
Currently, in the SERPs, it keeps jumping back and forth between showing the meta description for the www domain and showing the error message (above).
Originally, WP Super Cache was installed; it has since been deactivated, removed from wp-config.php and deleted permanently.
One other thing to note, we noticed yesterday that there was an old xml sitemap still on file, which we have since removed and resubmitted a new one via WMT. Also, the old pages are still showing up in the SERPs.
Could it just be that this will take time, to review the new sitemap and re-index the new site?
If so, what kind of timeframes are you seeing these days for the new pages to show up in SERPs? Days, weeks? Thanks, Erin
-
At the moment, it doesn't seem that rel=publisher is doing all that much for sites (aside from sometimes showing better info in the knowledge graph listing on brand searches), but personally I believe its functionality and influence are going to be greatly expanded fairly soon, so it's well worth doing. As far as it contributing anything to help speed up indexing... I doubt it.
P.
-
Paul,
Thanks... you hit upon my hunch, that we will just have to wait.
Much of the information in the SERPs (meta descriptions, titles and URLs) is still old, even though the results redirect to the new pages when I click.
Thanks for the tip... and about social media.
Do you think it will help to get the rel=publisher link to the Google+ page on the site?
Erin
-
A lot of people, especially WP users, use modules that may block certain spiders from crawling their site, but in your case you don't seem to have any.
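As a rough way to confirm that, the robots.txt posted in the question can be run through Python's standard-library `urllib.robotparser`. This is only a sketch: the `/shop/` path and the `blocked_paths` helper are made up for illustration, and Python's parser is not guaranteed to match Googlebot's behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser

# robots.txt rules as posted in the question (website.com is the poster's placeholder).
ROBOTS_TXT = """\
Sitemap: http://www.website.com/sitemap.xml
Sitemap: http://www.website.com/sitemap-news.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical sample paths; swap in real URLs from the site.
SAMPLE_PATHS = ["/", "/shop/", "/wp-admin/", "/wp-includes/js/jquery.js"]
print(blocked_paths(ROBOTS_TXT, SAMPLE_PATHS))
```

If only the `/wp-admin/` and `/wp-includes/` paths come back as blocked, the current file is not what is keeping the content pages from being described in the results.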
-
If you just changed the robots.txt file yesterday, my guess is you're going to have to be patient while the site gets recrawled, Erin. Any of the pages that are in the index and were cached before yesterday's robots update will still include the directive not to include the metadescription (since that's the condition they were under when they were cached.)
I suspect the pages you're seeing with metadescriptions were crawled since the robots update. Are you seeing the same page change whether it shows metadescription or not?
As far as old pages showing in the SERPs, again they'll all have to be crawled before the 301 redirects can be discovered and the SEs can begin to understand they should be dropped. (Even then it can take days to weeks for the originals to drop out.)
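While waiting for the recrawl, it may be worth confirming that the old URLs really answer with a 301 pointing at the intended new page. The sketch below uses only the standard library; the function names and the example URLs in the comments are hypothetical, and the network helper is defined but not called here.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_status_and_location(url, timeout=10):
    """Send a HEAD request without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status, location, old_url, expected_url):
    """True if the response is a 301 whose Location resolves to expected_url."""
    if status != 301 or not location:
        return False
    # Location may be relative, so resolve it against the old URL first.
    return urljoin(old_url, location) == expected_url


# Example usage (hypothetical URLs, requires network access):
# old = "http://www.website.com/old-page.html"
# new = "http://www.website.com/new-page/"
# status, location = fetch_status_and_location(old)
# print(is_permanent_redirect_to(status, location, old, new))
```

A 302, a redirect chain, or a redirect to the wrong page would each show up as `False` here and would be worth fixing before the crawlers come back.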
Another very effective way to help get the new site indexed faster is to attract some good-quality new links to the new pages. Social Media can be especially effective for this, Google+ in particular.
Paul
-
Thanks!
What do I need to look for in the .htaccess file?
Here is what is there... and the rest (not shown) are redirects:
    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress
-
Thanks for the tips! Let me check it out.
-
I'd also make sure it's not something to do with your .htaccess file.
-
Make sure the pages aren't blocked with a meta robots noindex tag.
Use Fetch as Google in WMT to request a recrawl of the site.
Run brokenlinkcheck.com and see whether its crawler can crawl the site successfully or is blocked.
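The first check above, looking for a meta robots noindex tag, can be sketched with Python's standard-library `html.parser`. The class and function names are made up for illustration, and the sample HTML is hypothetical; in practice you would feed in the source of a real page from the site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html):
    """Return True if the page source carries a noindex directive in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page sources for illustration.
blocked_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
open_page = '<html><head><meta name="robots" content="index,follow" /></head></html>'
print(has_noindex(blocked_page), has_noindex(open_page))
```

This only inspects the HTML; a noindex sent through an `X-Robots-Tag` HTTP header would need a separate check of the response headers.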