Blocking URL's with specific parameters from Googlebot
-
Hi,
I've discovered that Googlebot's are voting on products listed on our website and as a result are creating negative ratings by placing votes from 1 to 5 for every product. The voting function is handled using Javascript, as shown below, and the script prevents multiple votes so most products end up with a vote of 1, which translates to "poor".
How do I go about using robots.txt to block a URL with specific parameters only? I'm worried that I might end up blocking the whole product listing, which would result in de-listing from Google and the loss of many highly ranked pages.
DON'T want to block:
http://www.mysite.com/product.php?productid=1234
WANT to block:
http://www.mysite.com/product.php?mode=vote&productid=1234&vote=2
Javacript button code:
onclick="javascript: document.voteform.submit();"
Thanks in advance for any advice given.
Regards,
Asim -
Good to hear, I am glad you perservered
-
Tried them all now and all come back with "Success"... May be I'll post in the WMT Forum and see if anyone can shed light on this problem. Thanks for your help Alan, it's much appreciated.
-
Yes correct, did you try the other formats?
-
Tried "Fetch as Googlebot" in Diagnostics and it came back as "Success" so I guess the robots.txt directive is not working. I'm assuming it should have reported a failure message when attempting to fetch a URL containing "?mode=vote".
-
Wrong place, go to diagnostics, then look for fetch as googlebot
-
I added "Disallow: /mode=vote" to the robots.txt file and also manually entered it on Crawler Access page, then clicked "Test" and no errors were reported. The WMT page states that robots.txt was last downloaded 16 hours ago so I'll wait until it picks the file up again and then check for any errors. Hopefully that will do trick
-
Try this in robots.txt, I did not think that Google allows wild cards but i just read that they do.
Disallow: /*mode=vote*
or
Disallow: /*mode=vote
or
Disallow: /*mode
Then try in Google WMT to read with googlebot to see if it works.
The first in the list seems right to me, but I have seen others do it the other ways.
-
Thanks for the reply. The site was developed using PHP, mySQL and Javascript. I was hoping there was a way to do it without getting programmers involved...
-
dont think you are going to do it in robots.txt, rather do a 301 from mode=vote to non mode vote.
If you dont know how to put this into practise, tell me what your site is built with, if it is ASP.NET, i will show you how to impliment, if not someone else should be able to help.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Good alternatives to Xenu's Link Sleuth and AuditMyPc.com Sitemap Generator
I am working on scraping title tags from websites with 1-5 million pages. Xenu's Link Sleuth seems to be the best option for this, at this point. Sitemap Generator from AuditMyPc.com seems to be working too, but it starts handing up, when a sitemap file, the tools is working on,becomes too large. So basically, the second one looks like it wont be good for websites of this size. I know that Scrapebox can scrape title tags from list of url, but this is not needed, since this comes with both of the above mentioned tools. I know about DeepCrawl.com also, but this one is paid, and it would be very expensive with this amount of pages and websites too (5 million ulrs is $1750 per month, I could get a better deal on multiple websites, but this obvioulsy does not make sense to me, it needs to be free, more or less). Seo Spider from Screaming Frog is not good for large websites. So, in general, what is the best way to work on something like this, also time efficient. Are there any other options for this? Thanks.
Technical SEO | | blrs120 -
My old URL's are still indexing when I have redirected all of them, why is this happening?
I have built a new website and have redirected all my old URL's to their new ones but for some reason Google is still indexing the old URL's. Also, the page authority for all of my pages has dropped to 1 (apart from the homepage) but before they were between 12 to 15. Can anyone help me with this?
Technical SEO | | One2OneDigital0 -
How should I close my forum in a way that's best for SEO?
Hi Guys, I have a forum on a subdomain and it is no longer used. (like forum.mywebsite.com) It kind of feels like a dead limb and I don't know what's best to do for SEO. Should I just leave it as it is and let it stagnate? There is a link in the nav menu to the main domain so users have a chance to find the main domain. Or should I remove it and just redirect the whole subdomain to the main domain? I don't know if redirects would work as I doubt most of the threads would match our articles, plus there are 700 of them. The main domain is PR3 and so is the forum subdomain. Please help!
Technical SEO | | HCHQ0 -
GWT returning 200 for robots.txt, but it's actually returning a 404?
Hi, Just wondering if anyone has had this problem before. I'm just checking a client's GWT and I'm looking at their robots.txt file. In GWT, it's saying that it's all fine and returns a 200 code, but when I manually visit (or click the link in GWT) the page, it gives me a 404 error. As far as I can tell, the client has made no changes to the robots.txt recently, and we definitely haven't either. Has anyone had this problem before? Thanks!
Technical SEO | | White.net0 -
Google's Omitted Results - Attempt to De-Index
We're trying to get webpages from our QA site out of Google's index. We've inserted the NOINDEX tags. Google now shows only 3 results (down from 196,000), however, they offer a link to "show omitted results" at the bottom of the page. (A) Did we do something wrong? or (B) were we successful with our NOINDEX but Google will offer to show omitted results anyway? Please advise! Thanks!
Technical SEO | | BVREID0 -
Are the duplicate content and 302 redirects errors negatively affecting ranking in my client's OS Commerce site?
I am working on an OS Commerce site and struggling to get it to rank even for the domain name. Moz is showing a huge number of 302 redirects and duplicate content issues but the web developer claims they can not fix those because ‘that is how the software in which your website is created works’. Have you any experience of OS Commerce? Is it the 302 redirects and duplicate content errors negatively affecting the ranking?
Technical SEO | | Web-Incite0 -
Custom Permalinks (aka alias') - does it look spammy to googlebot?
I am moving my whole site over to wordpress (150+pgs). In the process I assigned pages to appropriate parent pages via "page attributes". I was really excited about this. I like how it organizes everything in the pages dashboard. I also think that the sitemap that comes with my theme can create something really great for visitors with this info. What I realized after doing that is that it changed my url to include the parent page. Basically, the url is now "domain.com/parent-page/child-page.html". This is rather disasterous because the url's of these newly created child pages on my old site are simple "domain.com/child-page". Not that they're defined as parent or child pages on my existing dreamweaver/html site... but you know what I mean - Right?! I got a plugin called "Permalink Editor" to let me customize the url. So, I went through all of the child pages and got rid of the parent page in the url. Then when I woke up this morning I realized that what I've created is a "permalink alias". That sounds a little bit scary to me. Perhaps like google could consider it spam and like I'm trying to "sculpt link flow". I'm not... I'm just trying to recreate my site as it is in wordpress. I want the site to be exactly the same in terms of the url's. But, I want the many benefit's of wordpress' CMS. Should I go an unassign all of the parent/child pages in the "Page Attributes". Or, am I being paranoid and should I leave it as is? fyi - this is the first page that came up with I searched for permalink alias. It looks kind of black-hatty to me?!
Technical SEO | | nsjadmin
- http://www.seodesignsolutions.com/blog/wordpress-seo/seo-ultimate-4-7/ Thanks so much. I look forward to a response!0 -
How Best to Handle 'Site Jacking' (Unauthorized Use of Someone else's Dedicated IP Address)
Anyone can point their domain to any IP address they want. I've found at least two domains (same owner) with two totally unrelated domains (to each other and to us) that are currently pointing their domains to our IP address. The IP address is on our dedicated server (we control the entire physical server) and is exclusive to only that one domain (so it isn't a virtual hosting misconfiguration issue) This has caused Google to index their two domains with duplicate content from our site (found by searching for site:www.theirdomain.com) Their site does not come up in the first 50 results though for any of the keywords we come up for so Google obviously knows THEY are the dupe content, not us (our site has been around for 12 years - much longer than them.) Their registration is private and we have not been able to contact these people. I'm not sure if this is just a mistake on the DNS for the two domains or it is someone doing this intentionally to try to harm our ranking. It has been going on for a while, so it is most likely not a mistake for two live sites as they would have noticed long ago they were pointing to the wrong IP. I can think of a variety of actions to take but I can find no information anywhere regarding what Google officially recommends doing in this situation, assuming you can't get a response. Here's my ideas. a) Approach it as a Digital Copyright Violation and go through the lengthy process of having their site taken down. Pro: Eliminates the issue. Con: Sort of a pain and we could be leaving possibly some link juice on the table? b) Modify .htaccess to do a 301 redirect from any URL not using our domain, to our domain. This means Google is going to see several domains all pointing to the same IP and all except our domain, 301 redirecting to our domain. Not sure if THAT will harm (or help) us? Would we not receive link juice then from any site out there that was linking to these other domains? Con: Google will see the context of the backlinks and their link text will not be related at all to our site. In addition, if any of these other domains pointing to our IP have backlinks from 'bad neighborhoods' I assume it could hurt us? c) Modify .htaccess to do a 404 File Not Found or 403 forbidden error? I posted in other forums and have gotten suggestions that are all over the map. In many cases the posters don't even understand what I'm talking about - thinking they are just normal backlinks. Argh! So I'm taking this to "The Experts" on SEOMoz.
Technical SEO | | jcrist1