Robots.txt & Disallow: /*? Question!
-
Hi,
I have a site where they have:
Disallow: /*?
Problem is we need the following indexed:
?utm_source=google_shopping
What would the best solution be? I have read:
User-agent: *
Allow: ?utm_source=google_shopping
Disallow: /*?
Any ideas?
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://site.com/sitemap_index.xml
Use this; it should help solve your problem.
Regards
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://site.com/sitemap_index.xml
Will this work?
Regards
Sajad
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /*?*
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://site.com/sitemap_index.xml
Use this; it should help you.
Regards
[Saad](https://clicktestworld.com/)
-
Hi Jeff,
Robots.txt tester as per the above link is definitely worth playing with and is the easiest route to achieving what you want.
Another, more reactive way of managing this in some cases is simply to review the range of parameters Google has naturally crawled, which is listed in Search Console.
You can see this in the old Search Console for now: log in and go to Crawl --> URL Parameters.
If Googlebot has encountered any ?= params, it will list them. You'll then have options for how to manage them or exclude them from the index.
It can be a decent way of cleaning up a site with lots of indexed pages (1,000+), although please be sure to read this documentation before using it: https://support.google.com/webmasters/answer/6080548?hl=en
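If you also have your own list of crawled URLs (from a crawl export or server logs, for example), a rough way to see which parameters are in play is to count parameter names yourself. The sketch below is only an illustration: the function name `count_query_params` and the sample URLs are made up, and it does not reproduce what Search Console reports.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_params(urls):
    """Count how many URLs contain each query parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so a parameter like "?ref=" is still counted
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample URLs, for illustration only
sample_urls = [
    "https://site.com/a?size=9&utm_source=google_shopping",
    "https://site.com/b?size=10",
    "https://site.com/c",
]

for name, n in count_query_params(sample_urls).most_common():
    print(name, n)
```

A list like this makes it easier to decide which parameters deserve their own rule and which can be left alone.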
-
With this kind of thing, it's really better to pick the specific parameters (or parameter combinations) which you'd like to exclude, e.g.:
User-agent: *
Disallow: /shop/product/&size=*
Disallow: */shop/product/*?size=*
Disallow: /stockists?product=*
^ I just took the above from a robots.txt file which I have been working on, as these particular pages don't have 'pretty' URLs with unique content on them. Very soon now that will change and the blocks will be lifted.
If you are really 100% sure that there's only one param which you want to let through, then you'd go with:
User-agent: *
Disallow: /*?
Allow: /*?utm_source=google_shopping
Allow: /*&utm_source=google_shopping*
(or something pretty similar to that!)
Before you set anything live, write down a list of URLs which represent the blocks (and allows) you want to achieve, then test them all with the robots.txt tester in Search Console.
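For a quick offline check of such a URL list, here is a minimal sketch of Google-style rule matching: `*` matches any run of characters, a trailing `$` anchors the end, the longest matching pattern wins, and Allow wins a tie. It is a simplification written for this thread, not Google's parser. The helper names and test URLs are hypothetical, and the rules assume the site's original `Disallow: /*?`, so the Search Console tester should still have the final word.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url, rules):
    """Return True if the URL may be crawled under the given (kind, pattern) rules."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    best_len = -1
    allowed = True  # no matching rule means the URL is allowed
    for kind, pattern in rules:
        # re.match anchors at the start, like robots.txt prefix matching
        if rule_to_regex(pattern).match(target):
            is_allow = kind == "allow"
            longer = len(pattern) > best_len
            tie_goes_to_allow = len(pattern) == best_len and is_allow
            if longer or tie_goes_to_allow:
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# Assumed rule set: block every query string except the Shopping parameter
RULES = [
    ("disallow", "/*?"),
    ("allow", "/*?utm_source=google_shopping"),
    ("allow", "/*&utm_source=google_shopping"),
]

# Hypothetical URLs representing the blocks and allows to verify
for url in [
    "https://site.com/shoes?utm_source=google_shopping",
    "https://site.com/shoes?size=9&utm_source=google_shopping",
    "https://site.com/shoes?size=9",
    "https://site.com/shoes",
]:
    print(url, "allowed" if is_allowed(url, RULES) else "blocked")
```

Under the same matching, a pattern such as `/?` only applies to URLs whose path starts with `/?`, so it would not block something like `/shoes?size=9`.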