Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt & Disallow: /*? Question!
-
Hi,
I have a site where they have:
Disallow: /*?
Problem is we need the following indexed:
?utm_source=google_shopping
What would the best solution be? I have read:
User-agent: *
Allow: ?utm_source=google_shopping
Disallow: /*?Any ideas?
-
User-agent: * Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /archives/ Disallow: /? Allow: /comments/feed/ Disallow: /refer/ Disallow: /index.php Disallow: /wp-content/plugins/ Allow: /wp-admin/admin-ajax.php User-agent: Mediapartners-Google* Allow: / User-agent: Googlebot-Image Allow: /wp-content/uploads/ User-agent: Adsbot-Google Allow: / User-agent: Googlebot-Mobile Allow: / Sitemap: https://site.com/sitemap_index.xml
use this it will help you and your problem will solve
Regards
-
User-agent: * Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /archives/ Disallow: /? Allow: /comments/feed/ Disallow: /refer/ Disallow: /index.php Disallow: /wp-content/plugins/ Allow: /wp-admin/admin-ajax.php User-agent: Mediapartners-Google* Allow: / User-agent: Googlebot-Image Allow: /wp-content/uploads/ User-agent: Adsbot-Google Allow: / User-agent: Googlebot-Mobile Allow: / Sitemap: https://site.com/sitemap_index.xml
this will work ??
Regards
Sajad -
User-agent: * Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /archives/ Disallow: /*?* Allow: /comments/feed/ Disallow: /refer/ Disallow: /index.php Disallow: /wp-content/plugins/ Allow: /wp-admin/admin-ajax.php User-agent: Mediapartners-Google* Allow: / User-agent: Googlebot-Image Allow: /wp-content/uploads/ User-agent: Adsbot-Google Allow: / User-agent: Googlebot-Mobile Allow: / Sitemap: https://site.com/sitemap_index.xml use this it will help you Regards [Saad](https://clicktestworld.com/)
-
Hi Jeff,
Robots.txt tester as per the above link is definitely worth playing with and is the easiest route to achieving what you want.
Another reactive way of managing this is in some cases is to simply see the range of parameters Google has naturally crawled within Search Console.
You can see this in the old search console for now. So login and go to Crawl --> URL Parameters.
If Googlebot has encountered any ?=params it will list them. You'll then have an option how to manage them or exclude them from the index.
It can be a decent way of cleaning up a site with lot's of indexed pages (1,000+), although please be sure to read this documentation before using it: https://support.google.com/webmasters/answer/6080548?hl=en
-
With this kind of thing, it's really better to pick the specific parameters (or parameter combinations) which you'd like to exclude, e.g:
User-agent: *
Disallow: /shop/product/&size=*
Disallow: */shop/product/*?size=*
Disallow: /stockists?product=*
^ I just took the above from a robots.txt file which I have been working on, as these particular pages don't have 'pretty' URLs with unique content on. Very soon now that will change and the blocks will be lifted
If you are really 100% sure that there's only one param which you want to let through, then you'd go with:
User-agent: *
Disallow: /?
Allow: /?utm_source=google_shopping
Allow: /*&utm_source=google_shopping*
(or something pretty similar to that!)
Before you set anything live, get down a list of URLs which represent the blocks (and allows) which you want to achieve. Test it all with the Robots.txt tester (in Search Console) before you set anything live!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If my website do not have a robot.txt file, does it hurt my website ranking?
After a site audit, I find out that my website don't have a robot.txt. Does it hurt my website rankings? One more thing, when I type mywebsite.com/robot.txt, it automatically redirect to the homepage. Please help!
Intermediate & Advanced SEO | | binhlai0 -
Large robots.txt file
We're looking at potentially creating a robots.txt with 1450 lines in it. This will remove 100k+ pages from the crawl that are all old pages (I know, the ideal would be to delete/noindex but not viable unfortunately) Now the issue i'm thinking is that a large robots.txt will either stop the robots.txt from being followed or will slow our crawl rate down. Does anybody have any experience with a robots.txt of that size?
Intermediate & Advanced SEO | | ThomasHarvey0 -
What do you add to your robots.txt on your ecommerce sites?
We're looking at expanding our robots.txt, we currently don't have the ability to noindex/nofollow. We're thinking about adding the following: Checkout Basket Then possibly: Price Theme Sortby other misc filters. What do you include?
Intermediate & Advanced SEO | | ThomasHarvey0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | | YairSpolter0 -
301 redirect with /? in URL
For a Wordpress site that has the ending / in the URL with a ? after it... how can you do a 301 redirect to strip off anything after the / For example how to take this URL domain.com/article-name/?utm_source=feedburner and 301 to this URL domain.com/article-name/ Thank you for the help
Intermediate & Advanced SEO | | COEDMediaGroup0 -
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: Can I put a wildcard in the middle of the string, ex. /en/*/images/, or do I need to list out every single country for every language in the robots file. Anyone know of any workarounds?
Intermediate & Advanced SEO | | IHSwebsite0 -
Posing QU's on Google Variables "aclk", "gclid" "cd", "/aclk" "/search", "/url" etc
I've been doing a bit of stats research prompted by read the recent ranking blog http://www.seomoz.org/blog/gettings-rankings-into-ga-using-custom-variables There are a few things that have come up in my research that I'd like to clear up. The below analysis has been done on my "conversions". 1/. What does "/aclk" mean in the Referrer URL? I have noticed a strong correlation between this and "gclid" in the landing page variable. Does it mean "ad click" ?? Although they seem to "closely" correlate they don't exactly, so when I have /aclk in the referrer Url MOSTLY I have gclid in the landing page URL. BUT not always, and the same applies vice versa. It's pretty vital that I know what is the best way to monitor adwords PPC, so what is the best variable to go on? - Currently I am using "gclid", but I have about 25% extra referral URL's with /aclk in that dont have "gclid" in - so am I underestimating my number of PPC conversions? 2/. The use of the variable "cd" is great, but it is not always present. I have noticed that 99% of my google "Referrer URL's" either start with:
Intermediate & Advanced SEO | | James77
/aclk - No cd value
/search - No cd value
/url - Always contains the cd variable. What do I make of this?? Thanks for the help in advance!0