Sitemap Contains Blocked Resources
-
Hey Mozzers,
I have several pages on my website that exist for on-site search purposes only. They sort products by range and answer direct search queries users type into the site. They are basically just product collections that are grouped elsewhere in different ways.
As such, I didn't want the SERPs getting their hands on them, so I blocked them in robots.txt so I could add them worry-free. However, Magento automatically pulls them into the sitemap.
This has caused Webmaster Tools to warn me that 21 URLs in the sitemap are blocked by robots.txt.
Is this terrible SEO-wise?
Should I have opted to noindex these URLs instead? I was concerned about thin content, so I really didn't want Google crawling them.
-
Thanks for the latest responses, guys.
I have researched this to death, and the way Magento generates the sitemap makes it impossible for me to exclude these URLs.
I will just unblock them in robots.txt and make them all noindex. This seems to solve all the problems; I will then block them again once I'm 100% sure they are deindexed.
Thanks again, chaps.
Big help as always.
-
OK, so first: because some of the pages are indexed, if you block access to them, they will never be removed from the index.
What you will need to do is add a noindex tag to the pages, but don't block access to them, so that Google can honour the noindex. Remove the pages via Search Console, and once you have confirmed they are all removed from the index, you will be good to then block access via robots.txt.
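For reference, a minimal sketch of the tag involved (the path in the comment is a hypothetical stand-in for your filtered collection pages):

```html
<!-- On each filtered collection page (e.g. a hypothetical
     /collections/filtered/ URL), inside <head>.
     This tells Google not to index the page, but it only works
     while crawling is NOT blocked in robots.txt, because Googlebot
     must fetch the page to see the tag. -->
<meta name="robots" content="noindex">
```

Only after the pages have dropped out of the index would you add the matching `Disallow` rule back to robots.txt.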
As CleverPhD said, ideally you don't want pages in the index that can't be crawled, but it isn't likely to cause a penalty of any sort (I have a client with about 70-80 blocked URLs, long story, and no issues in 12 months) if you are stuck because of Magento. Perhaps research how others have got around this?
-Andy
-
I would recommend that you try to get those pages out of your sitemap. If you look through the Google sitemap best practices, it states that the sitemap should only contain pages that Googlebot can access.
http://googlewebmastercentral.blogspot.com/2014/10/best-practices-for-xml-sitemaps-rssatom.html
URLs
URLs in XML sitemaps and RSS/Atom feeds should adhere to the following guidelines:
- Only include URLs that can be fetched by Googlebot. A common mistake is including URLs disallowed by robots.txt (which cannot be fetched by Googlebot) or including URLs of pages that don't exist.
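To illustrate the guideline above, here is a small sketch (all URLs and paths are hypothetical) using Python's standard-library `urllib.robotparser` to filter a sitemap's URL list down to what Googlebot is actually allowed to fetch:

```python
from urllib import robotparser

# Hypothetical robots.txt that disallows the filtered collection pages
# described in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /collections/filtered/
"""

# Hypothetical URLs pulled from the sitemap; /collections/filtered/
# stands in for the Magento search/sort pages blocked in robots.txt.
sitemap_urls = [
    "https://www.example.com/category/widgets",
    "https://www.example.com/collections/filtered/widgets-by-price",
]

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Keep only URLs Googlebot may fetch, per the guideline above.
allowed = [u for u in sitemap_urls if rp.can_fetch("Googlebot", u)]
print(allowed)  # the blocked /collections/filtered/ URL is dropped
```

A filter like this could run as a post-processing step on whatever sitemap file Magento generates, which may be easier than changing how Magento builds it.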
-
Hi Andy,
I just checked, and yes, they were previously indexed, and some of them still are.
-
Hi,
"Is this terrible SEO wise?"
Not really. It just means that Google can see there are pages they can't access, so they are informing you of this. No penalty is going to come from it. If these were old pages that are now 404s, it would be a different story.
I just want to be sure of something: were the pages previously open to Google? Are they currently indexed?
-Andy