How to stop robots.txt restricting access to sitemap?
-
I'm working on a site right now and having an issue with the robots.txt file restricting access to the sitemap - with no web dev to help, I'm wondering how I can fix the issue myself?
The robots.txt page shows:
User-agent: *
Disallow: /
And then Sitemap: with the correct sitemap link
-
Hi there
Right now, you're telling crawlers not to crawl any part of your site, so the sitemap XML is blocked along with everything else. Do you want your site to be crawled completely? If so, simply change the rules in robots.txt to this, keeping your existing Sitemap: line underneath...
User-agent: *
Allow: /
Here is another great resource from SEOBook to check out!
Hope this helps! Good luck!
Patrick
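To sanity-check the change before uploading it, the two rule sets from this thread can be compared with Python's standard-library `urllib.robotparser`. This is only a rough sketch: the `example.com` sitemap URL and the helper name `is_allowed` are placeholders, not anything from the original site, and Python's parser is not guaranteed to behave exactly like any particular search engine's crawler.

```python
from urllib.robotparser import RobotFileParser

# Placeholder sitemap URL (assumption): swap in your real one.
SITEMAP_URL = "https://www.example.com/sitemap.xml"

# The rules the question describes: everything is disallowed.
BLOCKING_RULES = """User-agent: *
Disallow: /
Sitemap: https://www.example.com/sitemap.xml
"""

# The rules suggested in the answer: everything is allowed.
OPEN_RULES = """User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Hypothetical helper: parse robots.txt text and test one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


blocked_result = is_allowed(BLOCKING_RULES, SITEMAP_URL)
open_result = is_allowed(OPEN_RULES, SITEMAP_URL)

print("Sitemap crawlable with 'Disallow: /':", blocked_result)
print("Sitemap crawlable with 'Allow: /':", open_result)
```

With `Disallow: /` the sitemap URL falls under the blocked path like every other URL on the site, which is why listing it on a Sitemap: line alone does not make it reachable.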
Related Questions
-
Getting 'Indexed, not submitted in sitemap' for around a third of my site. But these pages ARE in the sitemap we submitted.
As in the title, we have a site with around 40k pages, but around a third of them are showing as "Indexed, not submitted in sitemap" in Google Search Console. We've double-checked the sitemaps we have submitted and the URLs are definitely in the sitemap. Any idea why this might be happening? Example URL with the error: https://www.teacherstoyourhome.co.uk/german-tutor/Egham Sitemap it is located on: https://www.teacherstoyourhome.co.uk/sitemap-subject-locations-surrey.xml
Technical SEO | | TTYH0 -
My Website stopped being in the Google Index
Hi there, My website is two weeks old. After I published it, it ranked at about page 10 or 11 for a week, maybe a bit longer. In the last few days it dropped out of the rankings, which I assumed was the Google algorithm doing its thing, but when I checked Google Search Console it said my domain is not in the index: 'This page is not in the index, but not because of an error. See the details below to learn why it wasn't indexed.' I click request indexing, and after a bit it goes green saying it was successfully indexed. Then when I refresh, it gives me the same message: 'This page is not in the index, but not because of an error. See the details below to learn why it wasn't indexed.' Not sure why it says this. Any ideas or help is appreciated, cheers.
Technical SEO | | sydneygardening0 -
Sitemap For Static Content And Blog
We'll be uploading a sitemap to Google Search Console for a new site. We have ~70-80 static pages that don't really change much (some may change as we modify a couple of pages over the course of the year). But we have a separate blog on the site which we will be adding content to frequently. How can I set up the sitemap to make sure that "future" blog posts will get picked up and indexed? I used a sitemap generator and it picked up the first blog post that's on the site, but I am wondering what happens with future ones. I don't want to resubmit a new sitemap each time we publish a new blog post.
Technical SEO | | vikasnwu0 -
How to use robots.txt to block areas on page?
Hi, Across the category/product pages on our site there are archive/shipping info sections, and the text is always the same. Would this be treated as duplicate content and be harmful for SEO? How can I alter robots.txt to tell Google not to crawl that particular text? Thanks for any advice!
Technical SEO | | LauraHT0 -
Robots.txt to disallow /index.php/ path
Hi SEOmoz, I have a problem with my Joomla site (yeah - me too!). I get a large number of /index.php/ URLs despite using a program to handle these issues. The URLs cause indexation errors with Google (404). Now, I fixed this issue once before, but the problem persists. So I thought, instead of wasting more time, couldn't I just disallow all paths containing /index.php/? I don't use that extension, but would it cause me any problems from an SEO perspective? How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
Technical SEO | | Mikkehl0 -
Sitemap blocking or not blocking, that is the question?
Hi from wet & overcast Wetherby, UK 😞 One's question is this... "Is the sitemap with plus boxes blocking bots, i.e. they can't get past this page: http://www.langleys.com/Site-Map.aspx" It's just the + boxes that concern me; I remember reading somewhere that JavaScript nav can be toxic. Is there a way to test JavaScript nav setups and see if they block bots or not? Thanks in advance 🙂
Technical SEO | | Nightwing0 -
Sitemap for 170 K webpages
I have 170 K pages on my website which I want to be indexed. I have created multiple HTML sitemaps (e.g. sitemap1.html, sitemap2.html, etc.), with each sitemap page having 3000 links. Is this the right approach, or should I switch to XML-based sitemaps, and multiple ones at that? Please suggest.
Technical SEO | | ArtiKalra0 -
Robots.txt question
Hello, What does the following command mean - User-agent: * Allow: / Does it mean that we are blocking all spiders? Is Allow supported in robots.txt? Thanks
Technical SEO | | seoug_20050