Robots.txt file - How to block thosands of pages when you don't have a folder path
-
Hello.
Just wondering if anyone has come across this and can tell me if it worked or not.Goal:
To block review pagesChallenge:
The URLs aren't constructed using folders, they look like this:
www.website.com/default.aspx?z=review&PG1234
www.website.com/default.aspx?z=review&PG1235
www.website.com/default.aspx?z=review&PG1236So the first part of the URL is the same (i.e. /default.aspx?z=review) and the unique part comes immediately after - so not as a folder. Looking at Google recommendations they show examples for ways to block 'folder directories' and 'individual pages' only.
Question:
If I add the following to the Robots.txt file will it block all review pages?User-agent: *
Disallow: /default.aspx?z=reviewMuch thanks,
Davinia -
Also remember that blocking in robots.txt doesn't prevent Google from indexing those URLs. If the URLs are already indexed or if they are linked to, either internally or externally they may still in appear in the index with limited snippet information. If so, you'll need to add a noindex meta tag to those pages.
-
An * added to the end! Great thank you!
-
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Head down to the pattern matching section.
I think
User-agent: *
Disallow: /default.aspx?z=review*should do the trick though.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My "search visibility" went from 3% to 0% and I don't know why.
My search visibility on here went from 3.5% to 3.7% to 0% to 0.03% and now 0.05% in a matter of 1 month and I do not know why. I make changes every week to see if I can get higher on google results. I do well with one website which is for a medical office that has been open for years. This new one where the office has only been open a few months I am having trouble. We aren't getting calls like I am hoping we would. In fact the only one we did receive I believe is because we were closest to him in proximity on google maps. I am also having some trouble with the "Links" aspect of SEO. Everywhere I see to get linked it seems you have to pay. We are a medical office we aren't selling products so not many Blogs would want to talk about us. Any help that could assist me with getting a higher rank on google would be greatly appreciated. Also any help with getting the search visibility up would be great as well.
Intermediate & Advanced SEO | | benjaminleemd1 -
No designated 404 page, but any made-up URL path displays homepage Good / Bad?
I have a custom website where if you type in companyxyz.com/_any-made-up-url _it displays the homepage. So then you will see the homepage and in the URL bar the made up URL path remains visible "companyxyz.com/any-made-up-url" Is this good or bad or not an issue?
Intermediate & Advanced SEO | | Rich_Coffman0 -
Robots.txt, Disallow & Indexed-Pages..
Hi guys, hope you're well. I have a problem with my new website. I have 3 pages with the same content: http://example.examples.com/brand/brand1 (good page) http://example.examples.com/brand/brand1?show=false http://example.examples.com/brand/brand1?show=true The good page has rel=canonical & it is the only page should be appear in Search results but Google has indexed 3 pages... I don't know how should do now, but, i am thinking 2 posibilites: Remove filters (true, false) and leave only the good page and show 404 page for others pages. Update robots.txt with disallow for these parameters & remove those URL's manually Thank you so much!
Intermediate & Advanced SEO | | thekiller990 -
Our parent company has included their sitemap links in our robots.txt file - will that have an impact on the way our site is crawled?
Our parent company has included their sitemap links in our robots.txt file. All of their sitemap links are on a different domain and I'm wondering if this will have any impact on our searchability or potential rankings.
Intermediate & Advanced SEO | | tsmith1310 -
We used to speak of too many links from same C block as bad, have CDN's like CloudFlare made that concept irrelevant?
Over lunch with our head of development, we were discussing the way CloudFlare and other CDN's help prevent DDOS attacks, etc. and I began to wonder about the IP address vs. the reverse proxy IP address. Before we would look to see commonalities in the IP as a way that search engines would modify the value to given links and most link software showed this. For ahrefs, I know they still show common IPs using the C block as the reference point. I began to get curious about what was the real IP when our head of dev said, that is the IP from CloudFlare... So, I ran a site in ahrefs and we got an older site we had developed years ago that showed up as follows: Actos-lawsuit.org 104.28.13.57 and again as 104.28.12.57 (duplicate C block is first three sets of numbers are the same and obviously, this has a .12 and a .13 so not duplicate.) Then we looked at our host to see what was the IP shown there: 104.239.226.120. So, this really begs a question of is C Block data or even IP address data still relevant with regard to links? What do the search engines see when they look for IP address now? Yes, I have an opinion, but would love to hear yours first!
Intermediate & Advanced SEO | | RobertFisher0 -
Sites still rank who don't seem like they should. Why?
So you've been MOZing and SEOing for years and we're all convinced of the 10x factor when it comes to content and ranking for certain search terms... right? So what do you do when some older sites that don't even produce content dominate the first page of a very important search term? They're home pages with very little content and have clearly all dabbled in pre Panda SEO. Surely people are still seeing this and wondering why?
Intermediate & Advanced SEO | | wearehappymedia0 -
What if my site isn't ready for Mobile Armageddon by April 21st??
Hello Moz Experts, I am fighting for one of our sites to be mobile optimized, but the fight is taking longer than anticipated (need approval from higher ups). What happens if my site is not ready by April 21st? Will it take long to recover, like Penguin? Or, will the recovery be fairly quick? Say I release a mobile version of my site a week later. Then Google will have to reindex it and rank me again. How long will that take before I regain my traffic? Thanks,
Intermediate & Advanced SEO | | TMI.com0 -
Google showing high volume of URLs blocked by robots.txt in in index-should we be concerned?
if we search site:domain.com vs www.domain.com, We see: 130,000 vs 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs showing are blocked by robots.txt. They are subdomains that we use as production environments (and contain similar content as the rest of our site). And, we also find the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back We were hit by Panda some time back--is this an issue we should address? Should we unblock the subdomains and add noindex, follow?
Intermediate & Advanced SEO | | nicole.healthline0