Have I constructed my robots.txt file correctly for sitemap autodiscovery?
-
Hi,
Here is my robots.txt:
User-agent: *
Sitemap: http://www.bedsite.co.uk/sitemaps/sitemap.xml
# Directories
Disallow: /sendfriend/
Disallow: /catalog/product_compare/
Disallow: /media/catalog/product/cache/
Disallow: /checkout/
Disallow: /categories/
Disallow: /blog/index.php/
Disallow: /catalogsearch/result/index/
Disallow: /links.html
I'm using Magento and want to make sure I have constructed my robots.txt file correctly for sitemap autodiscovery.
Thanks.
-
Hey, thanks for the response. There are about 14,000 URLs in the sitemap. It shouldn't freeze up - would you please try again?
http://www.bedsite.co.uk/sitemaps/sitemap.xml
I know what you mean about the allow-all rule.
-
Also, this is the best place to get your question answered.
From Google: "The Test robots.txt tool will show you if your robots.txt file is accidentally blocking Googlebot from a file or directory on your site, or if it's permitting Googlebot to crawl files that should not appear on the web." You can find it in Google Webmaster Tools.
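If you want a quick local check as well, Python's built-in urllib.robotparser module can parse the live robots.txt and report whether a given path is crawlable. A minimal sketch, reusing the URLs from this thread:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt
parser = RobotFileParser()
parser.set_url("http://www.bedsite.co.uk/robots.txt")
parser.read()

# Check a few paths against the rules for a generic crawler ("*")
for path in ["/", "/checkout/", "/catalogsearch/result/index/"]:
    allowed = parser.can_fetch("*", "http://www.bedsite.co.uk" + path)
    print(path, "allowed" if allowed else "disallowed")

Note that the Sitemap line is autodiscovery metadata rather than a crawl rule, so parsers like this simply ignore it; Google picks it up separately.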
-
The robots.txt looks fine. I always add an allow all, even knowing it is not necessary, because it makes me feel better, lol.
The problem you have is with the sitemap itself. How big is it? I cannot tell how many links you have because it locks up every time I open it in both Chrome and Firefox.
I tried a tool that is designed to pull sitemaps the way the search engines do, and it also freezes up.
How many links do you have?
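For reference, 14,000 URLs is well under the 50,000 URLs allowed in a single sitemap file, so the size itself is valid; it is just heavy for browsers and desktop tools to render. If it keeps choking tools, one common option is to split it into several smaller files behind a sitemap index and point the robots.txt Sitemap line at the index instead. A rough sketch of what such an index could look like (the file names are hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.bedsite.co.uk/sitemaps/sitemap-products-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.bedsite.co.uk/sitemaps/sitemap-products-2.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.bedsite.co.uk/sitemaps/sitemap-categories.xml</loc>
  </sitemap>
</sitemapindex>

Each child file then carries its own share of the URLs, and the index becomes the single autodiscovery entry point referenced from robots.txt.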
Related Questions
-
Blocking pages from Moz and Alexa robots
Hello, we want to block all pages in this directory from Moz and Alexa robots - /slabinventory/search/. Here is an example page - https://www.msisurfaces.com/slabinventory/search/granite/giallo-fiesta/los-angeles-slabs/msi/. Let me know if this is a valid disallow for what I'm trying to do.
User-agent: ia_archiver
Disallow: /slabinventory/search/*
User-agent: rogerbot
Disallow: /slabinventory/search/*
Thanks.
Technical SEO | Pushm
-
Protecting sitemaps - Good idea or humbug?
Is there a way to protect your sitemap.xml so that only Google can read it, and would it make sense to do this?
Technical SEO | Roverandom
-
NGINX 301 configuration - is it correct?
I'm by no means an expert in technical SEO... but I worry that my server admin isn't either. Below is his vhost configuration - can anyone check it? Is it correct and SEO-friendly?
server {
    listen *:80;
    server_name domainaddress.pl domainaddress.com.pl;
    root /home/www/domainaddress.pl/web;
    index index.html index.htm key-words.php index.php index.cgi index.pl index.xhtml;
    location /key {
        rewrite ^/key-words/$ http://domainaddress.pl/ permanent;
        rewrite ^/key-words.php$ http://domainaddress.pl/ break;
    }
    location / {
        if ($http_host ~ "^www.domainaddress.pl"){
            rewrite ^(.*)$ http://domainaddress.pl/$1 permanent;
        }
        rewrite ^/key-words.php$ http://domainaddress.pl/ permanent;
    }
}
Technical SEO | Nemo85
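A hedged observation on the snippet above, not a drop-in replacement: the www-to-non-www rewrite is more often handled in a separate server block with a plain 301, which avoids the if-inside-location pattern that nginx's documentation discourages. A sketch:

server {
    listen 80;
    server_name www.domainaddress.pl;
    # Send every www request to the bare domain with a permanent redirect
    return 301 http://domainaddress.pl$request_uri;
}
-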
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1 subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot redirect from www1 to www) and blocked them via robots.txt. But the number of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain info by personal message if you want to have a look at it.
Technical SEO | zeepartner
-
Sitemaps
Hi, I have a doubt about using sitemaps. My site is a news site and we have thousands of articles in every section. For example, we have an area called Technology, with articles going back to 1999! So the question is: how can I make the Google robot index them? Months ago, when you entered the Technology section we had a paginator without limits, but we noticed that this query consumed a lot of CPU per user every time it was clicked. So we decided to limit it to 10 pages of 15 records each. Now it works great, BUT I can see in Google Webmaster Tools that our index count decreased dramatically. The reason is simple: the bot has no way to reach older technology news articles because we limit the query to 150 records in total. Well, the question is: how can I fix this? Options: 1) leave the query without limits; 2) create a new button "all tech news" with a different query without a limit, but paginated with (for example) 200 records per page; 3) create a sitemap that contains all the tech articles. Any ideas? Thanks a lot.
Technical SEO | informatica810
-
Does anyone know a sitemap generation tool that updates your sitemap based on changes on your website?
We have a massive site with thousands of pages which we update every day. Is there a sitemap generator that can create Google sitemaps on the fly and change only based on changes to the site? Our site is much too large to create new sitemaps manually on a regular basis. Is there a tool that will run on the server and do this automatically?
Technical SEO | gwynethmarta
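For illustration only, not a specific product recommendation: the kind of server-side job being described can be a scheduled script that walks the published pages and rewrites the sitemap whenever something has changed, using each file's modification time as the lastmod value. A rough Python sketch with hypothetical paths and URL structure:

import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape

DOC_ROOT = "/var/www/example.com/public"  # hypothetical document root
BASE_URL = "https://www.example.com"      # hypothetical site URL

entries = []
for dirpath, _, filenames in os.walk(DOC_ROOT):
    for name in filenames:
        if not name.endswith(".html"):
            continue
        path = os.path.join(dirpath, name)
        # Use the file's modification time as the <lastmod> date
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        url = BASE_URL + "/" + os.path.relpath(path, DOC_ROOT).replace(os.sep, "/")
        entries.append((url, mtime.strftime("%Y-%m-%d")))

with open(os.path.join(DOC_ROOT, "sitemap.xml"), "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url, lastmod in entries:
        f.write("  <url><loc>%s</loc><lastmod>%s</lastmod></url>\n" % (escape(url), lastmod))
    f.write("</urlset>\n")

Run from cron or a CMS hook, this regenerates the sitemap only from what is actually published; a database-driven site would query its own tables for URLs and change dates instead.
-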
Omitting URLs from XML Sitemap - Bad??
Hi all, We are working on an extremely large retail site with some major duplicate content issues that we are in the process of remedying. The site also does not currently have an XML sitemap. Would it be advisable to create a small XML sitemap with only the main category pages for the time being, and then upload the complete sitemap after our duplicate content issues are resolved? Or should we wait to upload anything until all work is complete down to the product-page level and canonicals are in place? Will uploading an incomplete sitemap be seen as fraudulent or misleading by the search engines and prompt a penalty, or would having at least the main pages mapped while we continue work be okay? Please let me know if more info is needed to answer! Thanks in advance!
Technical SEO | seo32
-
Robots.txt usage
Hey guys, I am about to make an important improvement to our site's robots.txt. We have a large number of properties on our site and we have different views for them: list, gallery and map view. By default the list view shows up, and the user can navigate through to the gallery view. We do not want the gallery pages to get indexed and want to save our crawl budget for more important pages. This is one example from our site: http://www.holiday-rentals.co.uk/France/r31.htm When you click on "gallery view", the URL in your address bar will remain the same, but when you mouse over the "gallery view" tab it will show you a URL with the parameter "view=g". There are a number of parameters: "view=g", "view=l" and "view=m".
http://www.holiday-rentals.co.uk/France/r31.htm?view=l
http://www.holiday-rentals.co.uk/France/r31.htm?view=g
http://www.holiday-rentals.co.uk/France/r31.htm?view=m
Now my question is: if I restrict bots by adding "Disallow: ?view=" to our robots.txt, will it affect the list view too? I will be very thankful if you look into this for us. I will test this on another site within our network first, before putting it on the important one, to measure the impact, but I will be waiting for your recommendations. Many thanks, Hassan
Technical SEO | holidayseo
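An aside on the mechanics described above: the list view is served at the bare URL with no query string, so a rule that only matches the parameter would not touch it. On crawlers that support wildcards (Googlebot and Bingbot do), the pattern usually takes this shape - shown purely as an illustration, not a tested recommendation for this site:

User-agent: *
Disallow: /*?view=

The literal rule "Disallow: ?view=" would most likely match nothing, because robots.txt rules are matched from the start of the URL path; the leading /* wildcard is what lets the rule apply to any path followed by that parameter.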