Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Using 2 wildcards in the robots.txt file
-
I have a URL string which I don't want to be indexed. it includes the characters _Q1 ni the middle of the string.
So in the robots.txt can I use 2 wildcards in the string to take out all of the URLs with that in it? So something like /_Q1. Will that pickup and block every URL with those characters in the string?
Also, this is not directly of the root, but in a secondary directory, so .com/.../_Q1. So do I have to format the robots.txt as //_Q1* as it will be in the second folder or just using /_Q1 will pickup everything no matter what folder it is on?
Thanks.
-
I'm not 100% positive, however it does make sense to use it this way.
User-agent: *
Disallow: /*_Q1$
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How important is the file extension in the URL for images?
I know that descriptive image file names are important for SEO. But how important is it to include .png, .jpg, .gif (or whatever file extension) in the url path? i.e. https://example.com/images/golden-retriever vs. https://example.com/images/golden-retriever.jpg Furthermore, since you can set the filename in the Content-Disposition response header, is there any need to include the descriptive filename in the URL path? Since I'm pulling most of our images from a database, it'd be much simpler to not care about simulating a filename, and just reference an image id in my templates. Example: 1. Browser requests GET /images/123456
Intermediate & Advanced SEO | | dsbud
2. Server responds with image setting both Content-Disposition, and Link (canonical) headers Content-Disposition: inline; filename="golden-retriever"
Link: <https: 123456="" example.com="" images="">; rel="canonical"</https:>1 -
Should I use the on classified listing pages that have expired?
We have went back and forth on this and wanted to get some outside input. I work for an online listing website that has classified ads on it. These ads are generated by companies on our site advertising weekend events around the country. We have about 10,000 companies that use our service to generate their online ads. This means that we have thousands of pages being created each week. The ads have lots of content: pictures, sale descriptions, and company information. After the ads have expired, and the sale is no longer happening, we are currently placing the in the heads of each page. The content is not relative anymore since the ad has ended. The only value the content offers a searcher is the images (there are millions on expired ads) and the descriptions of the items for sale. We currently are the leader in our industry and control most of the top spots on Google for our keywords. We have been worried about cluttering up the search results with pages of ads that are expired. In our Moz account right now we currently have over 28k crawler warnings alerting us to the being in the page heads of the expired ads. Seeing those warnings have made us nervous and second guessing what we are doing. Does anybody have any thoughts on this? Should we continue with placing the in the heads of the expired ads, or should we be allowing search engines to index the old pages. I have seen websites with discontinued products keeping the products around so that individuals can look up past information. This is the closest thing have seen to our situation. Any help or insight would be greatly appreciated! -Matt
Intermediate & Advanced SEO | | mellison0 -
Should I be using meta robots tags on thank you pages with little content?
I'm working on a website with hundreds of thank you pages, does it make sense to no follow, no index these pages since there's little content on them? I'm thinking this should save me some crawl budget overall but is there any risk in cutting out the internal links found on the thank you pages? (These are only standard site-wide footer and navigation links.) Thanks!
Intermediate & Advanced SEO | | GSO0 -
Using disavow tool for 404s
Hey Community, Got a question about the disavow tool for you. My site is getting thousands of 404 errors from old blog/coupon/you name it sites linking to our old URL structure (which used underscores and ended in .jsp). It seems like the webmasters of these sites aren't answering back or haven't updated their sites in ages so it's returning 404 errors. If I disavow these domains and/or links will it clear out these 404 errors in Google? I read the GWT help page on it, but it didn't seem to answer this question. Feel free to ask any questions that may help you understand the issue more. Thanks for your help,
Intermediate & Advanced SEO | | IceIcebaby
-Reed0 -
When to Use Schema vs. Facebook Open Graph?
I have a client who for regulatory reasons cannot engage in any social media: no Twitter, Facebook, or Google+ accounts. No social sharing buttons allowed on the site. The industry is medical devices. We are in the process of redesigning their site, and would like to include structured markup wherever possible. For example, there are lots of schema types under MedicalEntity: http://schema.org/MedicalEntity Given their lack of social media (and no plans to ever use it), does it make sense to incorporate OG tags at all? Or should we stick exclusively to the schemas documented on schema.org?
Intermediate & Advanced SEO | | Allie_Williams0 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to file that the robots.txt file had been updated and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that that vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly has anyone had any experience with this sort of situation and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
What Wordpress Update Services Should You Be Using on Your Wordpress Blog?
I have been told that pingomatic.com is all that you need however yesterday I went to a conference and others were recommending to have a good list of pinging services to cover all your bases Here are 4 that have been recommended: pingomatic technorati blogsearch.google.com feedburner Any others that should be included on this list? My goal is not to spam these ping lists however want to make sure my content is getting indexed quickly
Intermediate & Advanced SEO | | webestate0 -
Could you use a robots.txt file to disalow a duplicate content page from being crawled?
A website has duplicate content pages to make it easier for users to find the information from a couple spots in the site navigation. Site owner would like to keep it this way without hurting SEO. I've thought of using the robots.txt file to disallow search engines from crawling one of the pages. Would you think this is a workable/acceptable solution?
Intermediate & Advanced SEO | | gregelwell0