Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Using 410 To Remove URLs Starting With Same Word
-
We had a spam injection a few months ago. We successfully cleaned up the site and resubmitted it to Google. I recently received a notification showing a spike in 404 errors.
All of the URLs have a common word at the beginning, injected via the spam:
sitename.com/mono
sitename.com/mono.php?buy-good-essays
sitename.com/mono.php?professional-paper-writer
There are about 100 URLs in total with the same syntax, all with the word "mono" in them. Based on my research, it seems that it would be best to serve a 410. I wanted to know what line of .htaccess code would do that in bulk for any URL that has the word "mono" right after sitename.com/.
-
Martijn -
Thanks for your reply. I tried the code you provided; however, it still returned a 404 error. I was able to get the following to work properly. Are there any drawbacks to doing it this way?
RewriteRule ^mono(.*)$ - [NC,R=410,L]
The browser now shows the following any time the word "mono" appears immediately after "sitename.com/":
The requested resource
/mono.php
is no longer available on this server and there is no forwarding address. Please remove all references to this resource.
Additionally, a 410 Gone error was encountered while trying to use an ErrorDocument to handle the request.
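A rough way to answer the "any drawbacks?" question is to model what the RewriteRule ^mono(.*)$ - [NC,R=410,L] rule matches. The Python sketch below is a simplification, not Apache itself: it assumes, as mod_rewrite does in a root .htaccess file, that the pattern is tested against the URL path with the leading slash removed and without the query string. The function name matches_root_htaccess_rule and the /monorail-tickets path are hypothetical, chosen only for illustration.

```python
import re
from urllib.parse import urlsplit

# The RewriteRule pattern; the [NC] flag makes it case-insensitive.
RULE = re.compile(r"^mono(.*)$", re.IGNORECASE)


def matches_root_htaccess_rule(url: str) -> bool:
    """Simplified model of RewriteRule matching in a root .htaccess file.

    Only the URL path is tested (the ?query string is ignored), and the
    leading "/" is removed first, as mod_rewrite does in per-directory
    context.
    """
    path = urlsplit(url).path  # "/mono.php" for ".../mono.php?buy-good-essays"
    if path.startswith("/"):
        path = path[1:]
    return RULE.match(path) is not None


examples = [
    "http://sitename.com/mono",
    "http://sitename.com/mono.php?buy-good-essays",
    "http://sitename.com/MONO.php?professional-paper-writer",
    "http://sitename.com/monorail-tickets",  # hypothetical legitimate page
    "http://sitename.com/blog/mono-review",  # "mono" is not at the start
]

for url in examples:
    print(matches_root_htaccess_rule(url), url)
```

The fourth example is the main drawback: any legitimate URL that merely starts with "mono" would also get a 410. If that matters, a tighter pattern such as ^mono(\.php)?$ would limit the rule to the bare /mono path and /mono.php (with any query string).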
-
Thanks for the detailed response. Yes, there are some negative-SEO backlinks pointing to some of the URLs created during the spam injection. I've seen a few backlinks from other forum sites to one of the spam-created URLs on our site, which has hurt our rankings. For example, this URL was created on our site:
sitename.com/mono.php?best-resume-writing-service-for-it-professionals
I was confused by the following in your response: "If you can serve the 410s on a custom 410 page which also gives the Meta no-index directive, that will be a very strong signal to Google indeed that those aren't proper pages or fit for indexation"
- Is that all done via the .htaccess file? Code? Or is the meta no-index directive done in the robots.txt?
- Custom 410 page? I've seen some custom 404 pages, but not custom 410 pages. Would that be similar to a new 404 page?
Thanks for your response.
-
There are many ways to deal with this. If these were indeed spam URLs, someone may have pointed negative-SEO links at them (to water down your site's ranking power). As such, redirecting these URLs back to their parents could pull spam metrics 'onto' your site, which would be really bad. I can see why you are thinking about using a 410 (Gone).
Using canonical tags to stop Google from indexing those bad parameter-based URLs could also be helpful. If you 'canonicalled' those addresses to their non-parameter-based parents, Google would usually stop indexing them. When a URL 'canonicals' to another, different page, it marks itself as non-canonical and thus usually gets de-indexed (although the canonical tag is only a hint, not a directive). Again though, canonical tags interrelate pages. If those spam URLs were backed by negative-SEO attacks, using canonical tags would (again) be highly inadvisable, leaving your 410 suggestion as the better method.
Google supports wildcard rules in your robots.txt file, though its pattern syntax is very simplified: only the "*" wildcard and the "$" end-of-URL anchor are supported. In your robots.txt you could do something like:
User-agent: *
Disallow: /mono.php?*
That would cull Google's crawling of most of those URLs, but not necessarily their indexation. This would be something to do after Google has swallowed most of the 410s and 'got the message'. You shouldn't start out with this, because if Google can't crawl those URLs, it won't see your 410s! Just remember it, so that when the issue is resolved you can smack this down and stop the attack from occurring again (or at least, it will be preemptively nullified).
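To see what that Disallow line would and would not cover, here is a minimal Python sketch of Google-style robots.txt matching. It assumes Google's documented behaviour (a rule matches from the start of the path, query string included, "*" matches any run of characters, and a trailing "$" anchors the end); it is not Google's own parser. Python's urllib.robotparser isn't used because it does plain prefix matching without wildcard support. The helper names are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Simplified model: the rule matches from the start of the path,
    '*' matches any sequence of characters, and a trailing '$'
    anchors the match to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every character literally, then turn escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow: str) -> bool:
    """True if the Disallow pattern would block this path (plus query)."""
    return robots_pattern_to_regex(disallow).match(path_and_query) is not None


rule = "/mono.php?*"
for url in ["/mono.php?buy-good-essays", "/mono.php?professional-paper-writer", "/mono"]:
    print(is_disallowed(url, rule), url)
```

The last case is the catch: the bare sitename.com/mono URL from the question is not covered by that rule. Because matching is by prefix, the trailing "*" is also redundant; Disallow: /mono.php? behaves the same way.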
Finally, you have Meta "No-Index" tags. They don't stop Google from crawling a URL, but they will remove those URLs from Google's index. If you can serve the 410s on a custom 410 page which also gives the Meta no-index directive, that will be a very strong signal to Google that those aren't proper pages or fit for indexation.
So now we have a bit of an action plan:
- 410 the bad URLs alongside a Meta no-index directive served from the same URL
- Once Google has swallowed all that (this may take some weeks, or just over a month), back-plate it with robots.txt wildcards
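On the question of how a custom 410 page fits together: the 410 status code is set by the server, while the no-index directive lives in the page sent back with that status (as a meta robots tag, or equivalently an X-Robots-Tag HTTP header), not in robots.txt. It's built much like a custom 404 page; on Apache the usual hook is an ErrorDocument 410 directive pointing at that page. The Python WSGI app below is only a framework-neutral sketch of the response a crawler should receive; the page wording and the "/mono" prefix check are assumptions for illustration.

```python
from wsgiref.simple_server import make_server

# Hypothetical custom 410 page; the meta robots tag carries the no-index directive.
GONE_PAGE = b"""<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex">
<title>410 Gone</title>
</head>
<body>
<h1>This page has been permanently removed.</h1>
</body>
</html>
"""


def app(environ, start_response):
    """Serve a 410 with no-index for spam paths, a normal page otherwise."""
    path = environ.get("PATH_INFO", "")  # the path only, never the ?query
    if path.startswith("/mono"):
        start_response("410 Gone", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(GONE_PAGE))),
            ("X-Robots-Tag", "noindex"),  # header form of the same directive
        ])
        return [GONE_PAGE]
    body = b"normal page"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Local test server: try http://127.0.0.1:8000/mono.php?buy-good-essays
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Whatever serves the page, the check is the same: the response status must be 410 (not 200 or 404) and the body or headers must carry the noindex.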
With regard to your original question (sorry I took so long to get here), I'd use something like:
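RedirectMatch 410 ^/mono(\.php)?$
Unlike a plain Redirect, RedirectMatch takes a proper regex, and it is matched against the URL path only (the ?query part is never included, so all the spam URLs of the form /mono.php?something are caught, along with the bare /mono). The backslash says "whatever character follows me, treat that character as a literal value and do not apply its general regex function". It's the regex escape character. This would go in the .htaccess file at the root of your site, not in a subdirectory .htaccess file.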
Please sandbox-test my recommendation first. I'm really more of a technical data analyst than a developer!
This document seems to suggest that a .htaccess file will properly swallow "\" as the escape character:
https://premium.wpmudev.org/forums/topic/htaccess-redirects-with-special-characters
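The effect of the backslash escape is easy to check with Python's re module, whose syntax for these basics matches the regex flavour Apache uses. The sketch below compares an unescaped and an escaped pattern; the sample paths are made up for illustration. It also shows why the query string doesn't need matching: Apache tests these patterns against the URL path, which, like the path that urlsplit returns, excludes the ?query part.

```python
import re
from urllib.parse import urlsplit

loose = re.compile(r"^/mono.php")    # unescaped "." matches ANY single character
strict = re.compile(r"^/mono\.php")  # "\." matches only a literal dot

for path in ["/mono.php", "/mono-php", "/monoXphp-essays"]:
    print(path, bool(loose.match(path)), bool(strict.match(path)))

# The pattern only ever sees the path, so the query string needs no matching.
path_only = urlsplit("http://sitename.com/mono.php?buy-good-essays").path
print(path_only, bool(strict.match(path_only)))  # prints: /mono.php True
```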
Hope this helps!
-
Hi,
Have you also excluded these pages from the robots.txt file so you can make sure that they're also not being crawled?
The code for the redirect looks something like this:
RewriteEngine on
RewriteRule ^/mono* - [G,NC]
Martijn.
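One plausible reason the RewriteRule ^/mono* rule returned a 404 rather than a 410: in a root .htaccess file, mod_rewrite strips the leading slash before testing the pattern, so a pattern starting with ^/ never matches there. Note also that mono* means "mon" followed by zero or more "o" characters, not "mono followed by anything". The Python sketch below models both points with the re module; it's a simplification of Apache's per-directory matching, and per_dir_path is a hypothetical helper.

```python
import re


def per_dir_path(url_path: str) -> str:
    """Simplified: in a root .htaccess file the leading "/" is stripped
    before the RewriteRule pattern is tested."""
    return url_path[1:] if url_path.startswith("/") else url_path


as_posted = re.compile(r"^/mono*", re.IGNORECASE)  # [NC] -> case-insensitive
no_slash = re.compile(r"^mono", re.IGNORECASE)

for url_path in ["/mono", "/mono.php"]:
    p = per_dir_path(url_path)
    print(url_path, bool(as_posted.match(p)), bool(no_slash.match(p)))

# "o*" also allows zero o's, so the posted pattern matches "/mon" too.
print(bool(as_posted.match("/mon")))
```

In server-level configuration (for example a VirtualHost block) the leading slash is kept, which is why examples written for httpd.conf often start with ^/.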