I have two sitemaps which partly duplicate - one is blocked by robots.txt but can't figure out why!
-
Hi, I've just found two sitemaps - one of them is .php and represents part of the site structure on the website. The second is a .txt file which lists every page on the website. The .txt file is blocked via robots exclusion protocol (which doesn't appear to be very logical as it's the only full sitemap). Any ideas why a developer might have done that?
-
There are standards for .txt and .xml sitemaps, whereas there are no standards for HTML varieties. Neither type guarantees that the listed pages will be crawled, though. HTML sitemaps have the potential advantage of passing PageRank, which .txt and .xml varieties don't.
These days, XML sitemaps may be more common than .txt sitemaps, but both perform the same function.
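To make the difference between the two standardised formats concrete, here is a minimal sketch that builds both from the same list of URLs. The function names and the example pages are made up for illustration; the only things taken from the sitemaps.org protocol are the one-URL-per-line text format and the `urlset`/`url`/`loc` XML structure with its namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_txt_sitemap(urls):
    """Plain-text sitemap: one absolute URL per line, nothing else."""
    return "\n".join(urls) + "\n"


def build_xml_sitemap(urls):
    """XML sitemap: a <urlset> holding one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, for illustration only
pages = ["http://www.example.com/", "http://www.example.com/about"]
txt_sitemap = build_txt_sitemap(pages)
xml_sitemap = build_xml_sitemap(pages)

print(txt_sitemap)
print(xml_sitemap)
```

Both outputs list exactly the same pages; the XML version simply leaves room for optional per-URL metadata that the text format cannot carry.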
-
Yes, sitemap.txt is blocked for some strange reason. I know SEOs sometimes do this for various reasons, but in this case it just doesn't make sense, at least not to me.
-
Thanks for the useful feedback, Chris, much appreciated. Is it good practice to use both? I guess it's a good idea if the onsite version only includes top-level pages. PS. Just checking the nature of the block!
-
Luke,
The .php one would have been created as a navigation tool to help users find what they're looking for faster, as well as to provide HTML links that help search engine spiders reach all pages on the site. On small sites, such sitemaps often include every page; on large ones, they might list just the high-level pages. The .txt file is not HTML and exists solely to give search engines a full list of the site's URLs so that they can index all of its pages.
The robots.txt file can also be used to specify the location of the sitemap.txt file, such as:
sitemap: http://www.example.com/sitemap_location.txt
Are you sure the sitemap is being blocked by the robots.txt file or is the robots.txt file just listing the location of the sitemap.txt?
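One way to tell the two cases apart is to feed the robots.txt content to Python's standard `urllib.robotparser` and ask both questions separately. This is only a sketch: the helper name `inspect_robots` and both sample robots.txt bodies are made up for illustration, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def inspect_robots(robots_txt, url, user_agent="*"):
    """Return (blocked, listed) for `url` given the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # blocked: a Disallow rule stops this user agent from fetching the URL
    blocked = not parser.can_fetch(user_agent, url)
    # listed: the URL appears in a "Sitemap:" line (site_maps() is None if there are none)
    listed = url in (parser.site_maps() or [])
    return blocked, listed


# Hypothetical robots.txt that only announces where the sitemap lives
listing_only = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap_location.txt
"""

# Hypothetical robots.txt that actually blocks the sitemap file
blocking = """\
User-agent: *
Disallow: /sitemap_location.txt
"""

url = "http://www.example.com/sitemap_location.txt"
print(inspect_robots(listing_only, url))  # (False, True): listed, not blocked
print(inspect_robots(blocking, url))      # (True, False): blocked, not listed
```

If the real file gives the first result, the sitemap is simply being advertised to crawlers and nothing is wrong.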
Related Questions
-
Website can't break into Google Top100 for main keywords, considering 301 Redirect to a new domain
A little background on our case. Our website (for example, http://ourwebsite.com) officially went live in December 2015, but it wasn't optimized on-site and we hadn't done any off-site SEO for it. In April we decided to do a small redesign, and we did it on an online development server. Unfortunately, the developers didn't disallow crawlers, and the website got indexed while we were developing it there. The development version that got indexed in Google was http://dev.web.com/ourwebsite. We learned that it had been indexed when we migrated the redesigned website to the initial domain. When we did the migration we decided to add www, so it now looks like http://www.ourwebsite.com. Meanwhile, we deleted the development version from the development server and submitted a "Remove outdated content" request from the development server's Search Console. This was back in early May. It took about 15-20 days for the development version to get de-indexed and around 30 days for the original website (http://www.ourwebsite.com) to get indexed. Since then we have started our SEO campaign with press releases, outreach to bloggers for guest and sponsored posts, etc. The website currently has 55 backlinks from 44 referring domains (ahrefs: UR25, DR37; Moz: DA 6, PA 1) with various anchor text. We are tracking our main keywords and our brand keyword in the SERPs. For our brand keyword we are at position #10 in Google, but for the rest of the main (money) keywords we are not in the top 100 results. It is very frustrating to see no movement in the rankings for the past couple of months, and our bosses are demanding rankings and traffic. We are currently exploring the option of using another similar domain of ours and doing a complete 301 redirect from the original http://www.ourwebsite.com to http://www.ournewebsite.com. Does this sound like a good option to you?
If we do the 301 Redirect, will the link-juice be passed from the backlinks that we already have from the referring domains to the new domain? Or because the site seems "stuck," would it not pass any power to the new domain? Also, please share any other suggestions that we might use to at least break into the Top 100 results in Google? Thanks.
Intermediate & Advanced SEO | | DanielGorsky0 -
Application & understanding of robots.txt
Hello Moz World! I have been reading up on robots.txt files, and I understand the basics. I am looking for a deeper understanding of when to deploy particular tags, and when a page should be disallowed because it will affect SEO. I have been working with a software company that has a News & Events page which I don't think should be indexed. It changes every week, and is only relevant to potential customers who want to book a demo or attend an event, not so much to search engines. My initial thinking was that I should use a noindex/follow tag on that page, so the page would not be indexed but all its links would be crawled. I decided to look at some of our competitors' robots.txt files: Smartbear (https://smartbear.com/robots.txt), b2wsoftware (http://www.b2wsoftware.com/robots.txt) and labtech (http://www.labtechsoftware.com/robots.txt). I am still confused about what type of tags I should use, and how to gauge which set of tags is best for certain pages. I figured a static page is pretty much always good to index and follow, as long as it's public, and that I should always include a sitemap file. But what about a dynamic page? What about pages that are out of date? Will this help with soft 404s? This is a long one, but I appreciate all of the expert insight. Thanks ahead of time for all of the awesome responses. Best Regards, Will H.
Intermediate & Advanced SEO | | MarketingChimp100 -
Why some pages show schema and some don't in Google?
I notice Google displays the schema (reviews, price, availability, etc.) in results for only some of our item pages, even though they are in the same category and use the same template. Any ideas why this is happening? The pages were created around the same time, more than a year ago. The schema was also added a year ago.
Intermediate & Advanced SEO | | rbai0 -
Multiple Sitemaps Vs One Sitemap and Why 500 URLs?
I have a large website with rental listings in 14 markets, listings are added and taken off weekly if not daily. There are hundreds of listings in each market and all have their own landing page with a few pages associated. What is the best process here? I could run one sitemap and make each market's landing page .8 priority in the sitemap or make 14 sitemaps for each market and then have one sitemap for the general and static pages. From there, what would be the better way to structure? Should I keep all the big main landing pages in the general static sitemap or have them be at the top of the market segmented sitemaps? Also, I have over 5,000 urls, what is the best way to generate a sitemap over 500 urls? Is it necessary?
Intermediate & Advanced SEO | | Dom4410 -
Can I consolidate tasty link juice from several categories to one?
I have two categories currently "Men's Christian Jewellery" and "Women's Christian Jewellery" but neither pick up search engine traffic as well as just "Christian Jewellery" as a unisex category. My question is this; if I create a new category "Christian Jewellery" but then remove the two others and create 301 redirects from them to this new category, will this transfer all of the juice from the other pages to this new one? Thanks in advance for any replies! 🙂
Intermediate & Advanced SEO | | acecream0 -
I'm fascinated by SEO but the truth is, I don't have the time to do it. Who can I hire?
I'm fascinated by SEO but the truth is, I don't have the time to do it. I trust the Moz community more than some of those other SEO forums out there, so I'm asking you all: where can I go to find a good SEO firm that's affordable enough for a small startup? The next part of the question is, what should I expect to pay for services that will really make a difference? Please don't spam this thread. I seriously just want an honest opinion as to where I can find some credible help.
Intermediate & Advanced SEO | | Chaz880 -
Robots.txt disallow subdomain
Hi all, I have a development subdomain, which gets copied to the live domain. Because I don't want this dev domain to get crawled, I'd like to implement a robots.txt for this domain only. The problem is that I don't want this robots.txt to disallow the live domain. Is there a way to create a robots.txt for this development subdomain only? Thanks in advance!
Intermediate & Advanced SEO | | Partouter0 -
Duplicate block of text on category listings
Fellows, We are deciding whether we should include our category description on all pages of the category listing - for example; page 1, page 2, page 3... The category description is currently a few paragraphs of text that sits on page 1 of the category only at present. It also includes an image (linked to a large version of it) with appropriate ALT text. Would we benefit from including this introductory text on the rest of the pages in the category? Or should we leave it on the first page only? Would it flag up duplicate signals? Ideas please! Thanks.
Intermediate & Advanced SEO | | Peter2640