Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Blocking URL's with specific parameters from Googlebot
-
Hi,
I've discovered that Googlebot's are voting on products listed on our website and as a result are creating negative ratings by placing votes from 1 to 5 for every product. The voting function is handled using Javascript, as shown below, and the script prevents multiple votes so most products end up with a vote of 1, which translates to "poor".
How do I go about using robots.txt to block a URL with specific parameters only? I'm worried that I might end up blocking the whole product listing, which would result in de-listing from Google and the loss of many highly ranked pages.
DON'T want to block:
http://www.mysite.com/product.php?productid=1234
WANT to block:
http://www.mysite.com/product.php?mode=vote&productid=1234&vote=2
Javacript button code:
onclick="javascript: document.voteform.submit();"
Thanks in advance for any advice given.
Regards,
Asim -
Good to hear, I am glad you perservered
-
Tried them all now and all come back with "Success"... May be I'll post in the WMT Forum and see if anyone can shed light on this problem. Thanks for your help Alan, it's much appreciated.
-
Yes correct, did you try the other formats?
-
Tried "Fetch as Googlebot" in Diagnostics and it came back as "Success" so I guess the robots.txt directive is not working. I'm assuming it should have reported a failure message when attempting to fetch a URL containing "?mode=vote".
-
Wrong place, go to diagnostics, then look for fetch as googlebot
-
I added "Disallow: /mode=vote" to the robots.txt file and also manually entered it on Crawler Access page, then clicked "Test" and no errors were reported. The WMT page states that robots.txt was last downloaded 16 hours ago so I'll wait until it picks the file up again and then check for any errors. Hopefully that will do trick
-
Try this in robots.txt, I did not think that Google allows wild cards but i just read that they do.
Disallow: /*mode=vote*
or
Disallow: /*mode=vote
or
Disallow: /*mode
Then try in Google WMT to read with googlebot to see if it works.
The first in the list seems right to me, but I have seen others do it the other ways.
-
Thanks for the reply. The site was developed using PHP, mySQL and Javascript. I was hoping there was a way to do it without getting programmers involved...
-
dont think you are going to do it in robots.txt, rather do a 301 from mode=vote to non mode vote.
If you dont know how to put this into practise, tell me what your site is built with, if it is ASP.NET, i will show you how to impliment, if not someone else should be able to help.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What's the best way to test Angular JS heavy page for SEO?
Hi Moz community, Our tech team has recently decided to try switching our product pages to be JavaScript dependent, this includes links, product descriptions and things like breadcrumbs in JS. Given my concerns, they will create a proof of concept with a few product pages in a QA environment so I can test the SEO implications of these changes. They are planning to use Angular 5 client side rendering without any prerendering. I suggested universal but they said the lift was too great, so we're testing to see if this works. I've read a lot of the articles in this guide to all things SEO and JS and am fairly confident in understanding when a site uses JS and how to troubleshoot to make sure everything is getting crawled and indexed. https://sitebulb.com/resources/guides/javascript-seo-resources/ However, I am not sure I'll be able to test the QA pages since they aren't indexable and lives behind a login. I will be able to crawl the page using Screaming Frog but that's generally regarded as what a crawler should be able to crawl and not really what Googlebot will actually be able to crawl and index. Any thoughts on this, is this concern valid? Thanks!
Technical SEO | | znotes0 -
301 Re-directing 'empty' domains
Hello, My client had purchased a few domains and 301 re-directed them, pointing to our main website. As far as I am aware the 'empty domains' are brand related but no content has ever been displayed on them, and I doubt they have much authority. The issue here is that we took a dive in ranking for our main keyword, I had a look on ahrefs and found the below: | www.empty-domain/our-keyword | 30 | 19 | 1 | fb 0
Technical SEO | | SO_UK
G+ 0
in 4 | REDIRECT 301 TO www.main-domain/our-keyword | 8 Feb '175 d | The ranking dip happened at the same time as the re-direct was re-discovered / re-crawled. Could the 'empty' URL in question been causing us any issues? I understand that this is terrible practice for 301 redirects, I was hoping someone in the community could shed light on any possible solution for this.0 -
Good alternatives to Xenu's Link Sleuth and AuditMyPc.com Sitemap Generator
I am working on scraping title tags from websites with 1-5 million pages. Xenu's Link Sleuth seems to be the best option for this, at this point. Sitemap Generator from AuditMyPc.com seems to be working too, but it starts handing up, when a sitemap file, the tools is working on,becomes too large. So basically, the second one looks like it wont be good for websites of this size. I know that Scrapebox can scrape title tags from list of url, but this is not needed, since this comes with both of the above mentioned tools. I know about DeepCrawl.com also, but this one is paid, and it would be very expensive with this amount of pages and websites too (5 million ulrs is $1750 per month, I could get a better deal on multiple websites, but this obvioulsy does not make sense to me, it needs to be free, more or less). Seo Spider from Screaming Frog is not good for large websites. So, in general, what is the best way to work on something like this, also time efficient. Are there any other options for this? Thanks.
Technical SEO | | blrs120 -
Why do some URLs for a specific client have "/index.shtml"?
Reviewing our client's URLs for a 301 redirect strategy, we have noticed that many URLs have "/index.shtml." The part we don'd understand is these URLs aren't the homepage and they have multiple folders followed by "/index.shtml" Does anyone happen to know why this may be occurring? Is there any SEO value in keeping the "/index.shtml" in the URL?
Technical SEO | | FranFerrara0 -
How to create site map for large site (ecommerce type) that has 1000's if not 100,000 of pages.
I know this is kind of a newbie question but I am having an amazing amount of trouble creating a sitemap for our site Bestride.com. We just did a complete redesign (look and feel, functionality, the works) and now I am trying to create a site map. Most of the generators I have used "break" after reaching some number of pages. I am at a loss as to how to create the sitemap. Any help would be greatly appreciated! Thanks
Technical SEO | | BestRide0 -
Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
Dear all, starting with my .htaccess file: RewriteEngine On
Technical SEO | | inlinear
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^./index.html
RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L] 1. I redirect all URL-requests with www. to the non www-version...
2. all requests with "index.html" will be redirected to "domain.com/" My questions are: A) When linking from a page to my frontpage (home) the best practice is?: "http://domain.com/" the best and NOT: "http://domain.com/index.php" B) When linking to the index of a subfolder "http://domain.com/products/index.php" I should link also to: "http://domain.com/products/" and not put also the index.php..., right? C) When I define the canonical ULR, should I also define it just: "http://domain.com/products/" or in this case I should link to the definite file: "http://domain.com/products**/index.php**" Is A) B) the best practice? and C) ? Thanks for all replies! 🙂
Holger0 -
Why am I not showing up in the SERP's or Google Local?
I have been trying to optimise the following site for both Google SERP's and Google Local - Pixel Primate The URL has been around for around 3 years now but they just updated the website and launched it in December 2012. I did the on-page optimisation early in January 2013 and Google seems to have indexed the changes, for the home page at least. One major keyword I am targeting for the home page is 'Web Design Leicester'. I understand that the DA is fairly low (24) so this is something I need to improve. However, I've experienced positive results fairly quickly from just on-page optimisation for other sites I have worked on. The site just doesn't seem to be ranking at all for any keywords. Maybe the industry type is just extremely competitve but I find it very strange to not be visible anywhere in the SERPs. The site does not seem to have any penalties as it ranks for 'Pixel Primate' and all pages appear when doing a site: search. Also what's strange is that I set up the Google Local listing years ago but it doesn't appear anywhere in the local listing, not even when I search for it manually. Any suggestions would be appreciated.
Technical SEO | | CWseo0 -
Google's "cache:" operator is returning a 404 error.
I'm doing the "cache:" operator on one of my sites and Google is returning a 404 error. I've swapped out the domain with another and it works fine. Has anyone seen this before? I'm wondering if G is crawling the site now? Thx!
Technical SEO | | AZWebWorks0