RegEx help needed for robots.txt potential conflict
-
I've created a robots.txt file for a new Magento install and used an existing site-map that was on the Magento help forums but the trouble is I can't decipher something. It seems that I am allowing and disallowing access to the same expression for pagination. My robots.txt file (and a lot of other Magento site-maps it seems) includes both:
Allow: /*?p=
and
Disallow: /?p=&
I've searched for help on RegEx and I can't see what "&" does but it seems to me that I'm allowing crawler access to all pagination URLs, but then possibly disallowing access to all pagination URLs that include anything other than just the page number?
I've looked at several resources and there is practically no reference to what "&" does...
Can anyone shed any light on this, to ensure I am allowing suitable access to a shop?
Thanks in advance for any assistance
-
Hey James
It looks to me like you are just disallowing access to any URLs that have more than the initial p= variable. So, you are reducing the impact of potential duplication through searches and the like.
Good
?p=1
Bad
?p=1&q=search string
I am no magento expert but this seems to be a simple attempt to reduce the myriad duplication that can happen with search pages and the like inside a complex CMS like Magento.
The SEOMoz crawler tool should give you some good insight and to be sure, try removing the 'Disallow: /?p=&' and see if you get a buckletload of duplicate content warnings.
Ultimately, the thing to remember here is that the & is part of the URL and not part of the regex.
Hope that helps!
Marcus
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What do I need to do for HTTPS switch in Webmaster Tools?
My site is currently verified using a meta tag for both Google and Bing. Will I need to recreate the meta tag or will I be able to use the same one?
Technical SEO | | EcommerceSite1 -
Google is indexing blocked content in robots.txt
Hi,Google is indexing some URLs that i don't want to be indexed and also is indexing the same URLs with https. This URLs are blocked in the file robots.txt.I've tried to block this URLs through Google WebmasterTools but Google doesn't let me do it because this URL are httpsThe file robots.txt is correct so, what can i do to avoid this content to be indexed?
Technical SEO | | elisainteractive0 -
Need some help Pagerank N/A
Hi All! We have a small e commerce site, www.ruggedpcstore.com, built on Wordpress. The home page has a PR of 3, our About page is a PR 0, and all other pages are PR N/A. We've been pulling our hair out trying to fix what's wrong. Any ideas?
Technical SEO | | CsmBill0 -
Help! Getting 5XX error
Keep getting a 5XX error and my site is obviously losing ranking, Asked the hoster. Nobody seems to know what is wrong. Site is www.monteverdetours.com I know this is probably an obvious problem and easy to fix but I don't know how to do it! Any comments will be greatly appreciated.
Technical SEO | | Llanero0 -
What steps can you take to help a site that does not change
Hi, i am working on a product and services website www.clairehegarty.co.uk but the problem i have is, the site does not really change. The home page stays the same and the only time it changes is when a new course is advertised. The most important page on the website is http://www.clairehegarty.co.uk/virtual-gastric-band-with-hypnotherapy but we have seen the site drop in rankings because the page is not being updated. This page has all the information you could want on weight loss but we have seen the page drop from number one in google to number four. I would like to know what steps we should take to increase our rankings in google and would be grateful for your suggestions. If i put in articles on the site, had a section where we put a new article every week, would this then get google to visit the whole site more and move our pages back up the rankings, or should we be looking at doing other things.
Technical SEO | | ClaireH-1848860 -
Internal search : rel=canonical vs noindex vs robots.txt
Hi everyone, I have a website with a lot of internal search results pages indexed. I'm not asking if they should be indexed or not, I know they should not according to Google's guidelines. And they make a bunch of duplicated pages so I want to solve this problem. The thing is, if I noindex them, the site is gonna lose a non-negligible chunk of traffic : nearly 13% according to google analytics !!! I thought of blocking them in robots.txt. This solution would not keep them out of the index. But the pages appearing in GG SERPS would then look empty (no title, no description), thus their CTR would plummet and I would lose a bit of traffic too... The last idea I had was to use a rel=canonical tag pointing to the original search page (that is empty, without results), but it would probably have the same effect as noindexing them, wouldn't it ? (never tried so I'm not sure of this) Of course I did some research on the subject, but each of my finding recommanded one of the 3 methods only ! One even recommanded noindex+robots.txt block which is stupid because the noindex would then be useless... Is there somebody who can tell me which option is the best to keep this traffic ? Thanks a million
Technical SEO | | JohannCR0 -
How can I exclude display ads from robots.txt?
Google has stated that you can do this to get spiders to content only, and faster. Our IT guy is saying it's impossible.
Technical SEO | | GregBeddor
Do you know how to exlude display ads from robots.txt? Any help would be much appreciated.0 -
Website Page Structuring and URL re writing - need helpful resources
Hello, I am not technically very sound and I need some good articles that teach me how to think about and go about website pages structuring and url rewriting that is seo friendly. I will be most obliged if some of you great seomoz-ers can pitch in with help. Regards, Talha ZigZag Solutions
Technical SEO | | TopGearMedia0