RegEx help needed for robots.txt potential conflict
-
I've created a robots.txt file for a new Magento install, based on an existing example from the Magento help forums, but there is one part I can't decipher. It seems that I am both allowing and disallowing access to the same expression for pagination. My robots.txt file (and, it seems, a lot of other Magento examples) includes both:
Allow: /*?p=
and
Disallow: /?p=&
I've searched for help on RegEx and I can't see what "&" does but it seems to me that I'm allowing crawler access to all pagination URLs, but then possibly disallowing access to all pagination URLs that include anything other than just the page number?
I've looked at several resources and there is practically no reference to what "&" does...
Can anyone shed any light on this, to ensure I am allowing suitable access to a shop?
Thanks in advance for any assistance
-
Hey James
It looks to me like you are just disallowing access to any URLs that have more than the initial p= variable. So, you are reducing the impact of potential duplication through searches and the like.
Good
?p=1
Bad
?p=1&q=search string
I am no Magento expert, but this seems to be a simple attempt to reduce the myriad duplication that can happen with search pages and the like inside a complex CMS like Magento.
The SEOmoz crawler tool should give you some good insight. To be sure, try removing the 'Disallow: /?p=&' and see if you get a bucketload of duplicate content warnings.
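Ultimately, the thing to remember here is that the & is part of the URL and not part of the pattern syntax. robots.txt rules aren't true regular expressions: the major crawlers only treat * (any run of characters) and $ (end of URL) as special, so & is matched as a literal character.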
Hope that helps!
Marcus