Removing robots.txt on WordPress site problem
-
Hi, I'm a little confused. I ticked the box in WordPress to allow search engines to crawl my site (I had previously asked them not to), but Google Webmaster Tools is telling me I still have a robots.txt blocking them, so I am unable to submit the sitemap.
I checked the source code and the robots instruction has gone, so I'm a little lost. Any ideas please?
-
Hi,
I edited the robots.txt file for my website http://debtfreefrombankruptcy.com yesterday to allow search engines to crawl my site. However, Google isn't recognizing the new file and is still saying that my sitemap is blocked from search. Here is a link to the file itself:
http://www.debtfreefrombankruptcy.com/robots.txt
The Blocked URLs tester said that the file allows Google to crawl the site, but in actuality it still isn't recognizing the new file. Any advice would be appreciated. Thanks!
-
I can help you out as this issue DROVE ME NUTS.
1. I didn't have a Robots.txt (yet)
2. I had Yoast installed
3. I'm pretty sure it created a Robots.txt even though it doesn't exist in my root (.com/here)
4. My Google Webmaster Tools shows this
User-agent:
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category//*
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /?
Disallow: /?
Allow: /wp-content/uploads
Allow: /assets
Create a Robots.txt
1. Log in to WordPress
2. Click SEO in your side toolbar (Yoast WordPress Plugin settings)
3. Go to edit files under SEO (in the side toolbar)
And now you have the option to edit your Robots.txt file.
-
Hi Sofia
I just checked and see your homepage indexed in google.co.uk with a cache date of April 26th. You should be all set!
-Dan
-
Quick update: by amending the robots.txt file and switching the sitemap plugin over to Yoast, I finally got the sitemap to index without robots.txt warnings, although the home page of the site was not indexed, 'oh dear'. 5 of the 7 pages in the sitemap were indexed by Google, so it's a start, but there is some more investigating to be done on my side.
-
Dan,
Can't thank you enough! The sitemap request is still pending in Google, maybe because I sent too many requests. But it's time to sit back and wait for the good news, hopefully. Thanks again.
-
Hi Sofia
I just ran the same validator on your sitemap and it went through fine - see screenshot
I meant that you should just be sure Google Webmaster Tools accepts the sitemap as valid. If it does, there's no need to run it through a 3rd party validator. Apologies if I didn't state it clearly!
Let me know, but from what I can see it looks good!
-Dan
EDIT - Looking more closely, it looks like you ran the homepage through the validator. You should enter the sitemap address itself in the validator: http://containerforsale.co.uk/sitemap.xml
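To illustrate why the homepage trips an XML validator while the sitemap does not, here is a minimal Python sketch. It is an assumption-laden illustration, not how XSV or Google works: the helper name looks_like_sitemap and the two sample strings are made up for this example. It relies only on the fact that a strict XML parser rejects HTML entities such as &raquo; and &nbsp; unless they are declared, which matches the "Undefined entity" warnings in the validator output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def looks_like_sitemap(text: str) -> bool:
    """Return True if text parses as XML and its root is a sitemap element.

    Hypothetical helper for illustration only; it checks well-formedness
    and the root tag, not full schema validity.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # HTML pages usually end up here (undefined entities, unclosed tags).
        return False
    return root.tag in (
        "{%s}urlset" % SITEMAP_NS,
        "{%s}sitemapindex" % SITEMAP_NS,
    )


# Made-up sample documents: a tiny sitemap and a tiny HTML homepage.
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://containerforsale.co.uk/</loc></url>"
    "</urlset>"
)
sample_homepage = (
    "<html><head><title>Home &raquo; Containers</title></head>"
    "<body>&nbsp;</body></html>"
)

print(looks_like_sitemap(sample_sitemap))
print(looks_like_sitemap(sample_homepage))
```

The point is only that the validator expects the sitemap URL, so feeding it the homepage URL produces entity errors that say nothing about the sitemap itself.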
-
Hi Dan,
I followed the above advice and switched to the Yoast generated sitemap but after testing on http://www.xml-sitemaps.com/validate-xml-sitemap.html I got the following result - no idea what it means but it looks nasty...
Schema validating with XSV 3.1-1 of 2007/12/11 16:20:05
Schema validator crashed
The maintainers of XSV will be notified, you don't need to
send mail about this unless you have extra information to provide.
If there are Schema errors reported below, try correcting
them and re-running the validation.
Target: http://containerforsale.co.uk
(Real name: http://containerforsale.co.uk
Server: Apache/2.2.22 (Unix) mod_ssl/2.2.22 OpenSSL/0.9.8e-fips-rhel5 mod_bwlimited/1.4)
The target was not assessed
Low-level XML well-formedness and/or validity processing output
Warning: Undefined entity raquo
in unnamed entity at line 16 char 83 of http://containerforsale.co.uk
Warning: Undefined entity nbsp
in unnamed entity at line 160 char 10 of http://containerforsale.co.uk
Error: Expected ; after entity name, but got =
in unnamed entity at line 274 char 631 of http://containerforsale.co.u
-
Sofia
You are using Yoast SEO plugin for WordPress, so use the XML sitemap within Yoast. You don't need a separate plugin for the XML sitemap. And yes, within Yoast turn the sitemap on.
Hope that helps!
-Dan
-
Indeed, thanks everyone - it's really appreciated!
I have updated the robots.txt as indicated and resubmitted the sitemap, but it looks like Google still has problems with my site, since the robots error warning is still there after the processing is done.
Quick question - I am using a plugin called Google XML Sitemaps which has the following tick box option.
'Add sitemap URL to the virtual robots.txt file.
The virtual robots.txt generated by WordPress is used. A real robots.txt file must NOT exist in the blog directory!'
Should this box be ticked or un-ticked please? FYI, I currently don't have the box ticked.
-
Thanks guys for all the responses and helping!
Three Things to try
1. Fix Robots.txt
Sofia - I just checked your robots.txt now and it reads:
User-agent: * Disallow: Sitemap: http://containerforsale.co.uk/sitemap.xml.gz
- with the sitemap on the same line as disallow. I'd check on that and make sure it's on a separate line.
- ALSO, you don't need the .gz on the sitemap file, just sitemap.xml
2. Re-submit Sitemap
- RESUBMIT your sitemap to Webmaster Tools and make sure it's valid.
3. Submit URL to Webmaster Tools (only last resort)
This is only a last-resort scenario; you shouldn't have to do this on the homepage if everything is correct.
- go to fetch as googlebot -> run the fetch -> then submit URL
- do this for the homepage
- see article on google blog for reference
Let us know if you're all set, thanks!
-Dan
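As a rough illustration of why the line breaks in step 1 matter, here is a small Python sketch using the standard library's urllib.robotparser (site_maps() needs Python 3.8 or later). This is Python's parser, not Googlebot's, so treat the behaviour as indicative only. The helper name inspect_robots and the constant names are made up for the example; the sitemap URL is the one from this thread without the .gz suffix.

```python
from urllib.robotparser import RobotFileParser


def inspect_robots(robots_txt, url, agent="Googlebot"):
    """Return (crawl_allowed, sitemap_urls) for an in-memory robots.txt body.

    Hypothetical helper for illustration; nothing is fetched over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.site_maps()


# Everything on one line, as the file read when it was checked.
ONE_LINE = "User-agent: * Disallow: Sitemap: http://containerforsale.co.uk/sitemap.xml"

# The same directives, one per line.
SEPARATE_LINES = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://containerforsale.co.uk/sitemap.xml\n"
)

HOMEPAGE = "http://containerforsale.co.uk/"

print(inspect_robots(ONE_LINE, HOMEPAGE))
print(inspect_robots(SEPARATE_LINES, HOMEPAGE))
```

With this parser, both versions allow the homepage to be crawled, but only the version with one directive per line exposes the Sitemap entry. The one-line version is read as a single odd User-agent value and the sitemap is never seen.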
-
Ok thanks Brent, I changed to
User-agent: *
Disallow:
Sitemap: http://containerforsale.co.uk/sitemap.xml.gz
Guess I will just have to wait for Google to refresh now...
-
Yes, the URLs being blocked are the includes from your WordPress install.
-
Thanks for the heads up.
The warning just says 7 URLs blocked by robots.txt. I have seen this issue posted on the WordPress boards by others, but with no real insight into solutions.
Perhaps I should try your idea of
Change the robots.txt file to this:
User-agent: *
Disallow:
-
Well there is a robots.txt file. You can view it here: http://containerforsale.co.uk/robots.txt
What warnings are you getting in your sitemap submission area? It appears to look alright: http://containerforsale.co.uk/sitemap.xml But I tried to validate it and got a 504 Gateway Time-out error. http://www.xml-sitemaps.com/index.php?op=validate-xml-sitemap&go=1&sitemapurl=http%3A%2F%2Fcontainerforsale.co.uk%2Fsitemap.xml&submit=Validate
-
It's weird, the front page warning on Google Webmaster Tools for robots has disappeared now, but I still get the warnings in the sitemap submission area. My host suggests I just wait a bit longer for Google to update, because he said the same as you: there doesn't seem to be any robots.txt file.
-
Doesn't appear to be blocked, so maybe it has something to do with your /wp-includes/ directory.
Change the robots.txt file to this:
User-agent: *
Disallow:
-
Hey Guys,
Thanks for your replies... the domain is http://containerforsale.co.uk. My host told me to look in the Public HTML file folder for the robots.txt file and just delete it, but I can't see it in there.
My host said he found a tester site and it doesn't report any issues:
http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php
This is the display I get from http://containerforsale.co.uk/robots.txt
User-agent *
Disallow: /wp-admin/
Disallow: /wp-includes/
-
Hi Sofia,
Two things you need to consider when troubleshooting this:
The actual robots.txt file (located in the root directory of your site) and the meta-robots tags in the <head> section of your HTML. When you say you checked the source code and the robots instructions were missing, I think you were talking about the meta-robots tags in the actual HTML of your site.
Webmaster Tools is probably referring to the actual robots.txt file in your domain's root path, which would differ entirely and not be visible by checking the HTML on your site. Like Nakul and Brent said, if you'll let us know your site's URL and paste the content of your robots.txt file here, I'm sure one of us can help you resolve the problem fairly quickly.
Thanks!
Anthony
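To make the distinction concrete, here is a minimal Python sketch that looks for meta-robots tags in a page's HTML, which is the thing you see when you view source. It says nothing about the robots.txt file, which lives at a separate URL. The class and function names are made up for this example, and the sample HTML strings are illustrative, not taken from the site.

```python
from html.parser import HTMLParser


class MetaRobotsFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when written without a value.
        if (attributes.get("name") or "").lower() == "robots":
            self.directives.append(attributes.get("content") or "")


def find_meta_robots(html):
    """Return a list of meta-robots directive strings found in html."""
    finder = MetaRobotsFinder()
    finder.feed(html)
    return finder.directives


# Illustrative pages: one with the tag, one without.
blocked_page = (
    '<html><head><meta name="robots" content="noindex,nofollow">'
    "</head><body></body></html>"
)
open_page = "<html><head><title>Home</title></head><body></body></html>"

print(find_meta_robots(blocked_page))
print(find_meta_robots(open_page))
```

An empty list here only means the HTML carries no meta-robots instruction. The robots.txt file still has to be checked separately at its own address.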
-
Copy whatever you have in your robots.txt file here and we will tell you the issue.
SEOmoz has a great article about Robots.txt files here: http://www.seomoz.org/learn-seo/robotstxt
-
The robots.txt would probably not be a part of the WordPress configuration. The allow-indexing setting is controlled via meta data by WordPress.
I would look for something like this in yourdomain.com/robots.txt
Disallow: /
or something like that. If that does not help, PM me your site URL and I would be glad to look it up for you.
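For a sense of how different Disallow lines behave, here is a short Python sketch using the standard library's urllib.robotparser. It is Python's interpretation, not Google's, so take it as a rough guide. The helper name allowed, the constant names, and the sample paths under yourdomain.com are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if agent may fetch url under the given robots.txt body.

    Hypothetical helper for illustration; the robots.txt is parsed from a string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# An empty Disallow blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# Only the WordPress system directories are blocked.
WP_DIRS_ONLY = "User-agent: *\nDisallow: /wp-admin/\nDisallow: /wp-includes/\n"

HOME = "http://yourdomain.com/"
INCLUDE = "http://yourdomain.com/wp-includes/js/example.js"

for label, body in (
    ("block all", BLOCK_ALL),
    ("allow all", ALLOW_ALL),
    ("wp dirs only", WP_DIRS_ONLY),
):
    print(label, allowed(body, HOME), allowed(body, INCLUDE))
```

The case to look for is the first one: a bare "Disallow: /" under "User-agent: *" blocks every URL, while blocking only /wp-admin/ and /wp-includes/ leaves the homepage crawlable.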