RegEx help needed for robots.txt potential conflict

MSTJames

I've created a robots.txt file for a new Magento install, based on an existing example from the Magento help forums, but there's one part I can't decipher. It seems that I am both allowing and disallowing access to the same expression for pagination. My robots.txt file (and a lot of other Magento examples, it seems) includes both:

Allow: /*?p=

and

Disallow: /?p=&

I've searched for help on RegEx and I can't find what "&" does. It seems to me that I'm allowing crawler access to all pagination URLs, but then possibly disallowing access to any pagination URL that includes anything other than just the page number?

I've looked at several resources and there is practically no reference to what "&" does...

Can anyone shed any light on this, to ensure I am allowing suitable access to a shop?

Thanks in advance for any assistance

Marcus_Miller

Hey James

It looks to me like you are just disallowing access to any URLs that have more than the initial p= variable. So, you are reducing the impact of potential duplication through searches and the like.

Good

?p=1

Bad

?p=1&q=search string

I am no Magento expert, but this looks like a simple attempt to reduce the myriad duplication that search pages and similar can create inside a complex CMS like Magento.

The SEOMoz crawler tool should give you some good insight. To be sure, try removing the 'Disallow: /?p=&' line and see if you get a bucketload of duplicate content warnings.

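Ultimately, the thing to remember here is that the & is a literal character in the URL (the separator between query parameters) and not a special pattern character. robots.txt does not use full regex; for the major crawlers only * (any run of characters) and a trailing $ (end of URL) act as wildcards.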
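If it helps to see the matching spelled out, here is a rough Python sketch of how a crawler might evaluate these rules. It assumes the behaviour described in the Robots Exclusion Protocol (RFC 9309) and followed by Google: the longest matching pattern wins, and Allow wins a tie. Other crawlers may behave differently. The function names and the example paths are made up for illustration, and the second rule set uses a guessed wildcard form, /*?p=*&, which is not in the original post.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Only '*' (any run of characters) and a trailing '$' (end of URL)
    are special. Everything else, including '?' and '&', is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like robots.txt rules do
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# The two rules exactly as they appear in the question.
RULES_AS_POSTED = [("Allow", "/*?p="), ("Disallow", "/?p=&")]

# Assumption: a hypothetical wildcard variant, NOT taken from the post.
RULES_WILDCARD_GUESS = [("Allow", "/*?p="), ("Disallow", "/*?p=*&")]

for path in ("/shoes?p=2", "/shoes?p=2&q=red"):
    print(path, is_allowed(path, RULES_AS_POSTED), is_allowed(path, RULES_WILDCARD_GUESS))
```

Under these assumptions, the rules exactly as posted leave /shoes?p=2&q=red crawlable, because /?p=& only matches paths that literally begin with /?p=&. With the guessed wildcard form, the same URL is blocked while /shoes?p=2 stays allowed, which is the good/bad split described above. It is worth checking the real file in a robots.txt testing tool before relying on either reading.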

Hope that helps!
Marcus
