What are the negative implications of listing URLs in a sitemap that are then blocked in the robots.txt?
-
In running a crawl of a client's site I can see several URLs listed in the sitemap that are then blocked in the robots.txt file.
Other than perhaps using up crawl budget, are there any other negative implications?
-
I highly doubt it would affect rankings through low-quality issues, but it will trigger sitemap warnings in your GWT (Google Webmaster Tools) console. The issue is technically classified under 'Warnings', not 'Errors'. If you want those pages kept out of the index, the right thing to do is take the robots.txt block off and use a 'noindex' tag on the pages instead; Google has to be able to crawl a page to see the tag, so the block and the tag don't work together. That way the pages can stay in the sitemap but won't show up in the index. Otherwise, remove them from the sitemap if you don't want the warnings in GWT.
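If it helps to see which sitemap URLs are actually affected before deciding between the two options, here is a minimal sketch using only the Python standard library. The function names, the example.com URLs, and the sample robots.txt rules are all made up for illustration. Python's built-in parser does plain prefix matching, so rules that rely on Google-style wildcards may be evaluated differently than Googlebot would evaluate them.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def blocked_urls(sitemap_xml: str, robots_txt: str, user_agent: str = "Googlebot") -> list[str]:
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls(sitemap_xml) if not parser.can_fetch(user_agent, url)]


# Hypothetical sample data; substitute the client's real files.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

EXAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/private/report.html</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url in blocked_urls(EXAMPLE_SITEMAP, EXAMPLE_ROBOTS):
        print(url)
```

Each URL it prints is a candidate for either dropping from the sitemap or unblocking and tagging with 'noindex'.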
-
I personally do not think there is any SEO penalty for doing it. I do think, though, that it will skew the GWT metric that shows how many pages have been submitted versus how many have been indexed. I find that metric useful, and it stops being useful if a lot of the submitted pages are blocked by robots.txt.