Using 410 To Remove URLs Starting With Same Word
-
We had a spam injection a few months ago. We successfully cleaned up the site and resubmitted to google. I recently received a notification showing a spike in 404 errors.
All of the URLS have a common word at the beginning injected via the spam:
sitename.com/mono
sitename.com/mono.php?buy-good-essays
sitename.com/mono.php?professional-paper-writerThere's about 100 total URLS with the same syntax with the word "mono" in them. Based on my research, it seems that it would be best to serve a 410. I wanted to know what the line of HTACCESS code would be to do that in bulk for any URL that has the word "mono" after the sitename.com/
-
Martijn -
Thanks for your reply. I tried the code you provided, however it still provided a 404 error. I was able to get the following to work properly - any drawbacks to doing it this way?
RewriteRule ^mono(.*)$ - [NC,R=410,L]
The browser now shows the following anytime there is the word "mono" immediately after "sitename.com/"
The requested resource
/mono.php
is no longer available on this server and there is no forwarding address. Please remove all references to this resource.Additionally, a 410 Gone error was encountered while trying to use an ErrorDocument to handle the request.
-
Thanks for the detailed response. Yes, there are some negative-SEO backlinks to some of the URLs created during the spam injection. I've seen a few backlinks from other forum sites to our site to one of the spam created URLs which has hurt our rankings such as the following URL created on our site:
sitename.com/mono.php?best-resume-writing-service-for-it-professionals
I was confused by the following in your response: "If you can serve the 410s on a custom 410 page which also gives the Meta no-index directive, that will be a very strong signal to Google indeed that those aren't proper pages or fit for indexation"
- Is that all done view the htaccess file? Code? Or is the meta no-index directive done in the robots.txt?- custom 410 page? I've seen some 404 pages, but not custom 410 pages. Would that be similar to a new 404 page?
Thanks for your response.
-
There are so many ways to deal with this. If these were indeed spam URLs, someone may have attached negative-SEO links to them (to water down your site's ranking power). As such, redirecting these URLs back to their parents could pull spam metrics 'onto' your site which would be really bad. I can see why you are thinking about using 410 (gone)
Using Canonical tags to stop Google from indexing those bad parameter-based URLs could also be helpful. If you 'canonicalled' those addresses to their non-parameter based parents, Google would stop crawling those pages. When a URL 'canonicals' to another, different page - it cites itself as non-canonical, and thus gets de-indexed (usually, although this is only a directive). Again though, canonical tags interrelate pages. If those spam URLs were backed by negative SEO attacks, the usage of canonical tags would (again) be highly inadvisable (leaving your 410 suggestion as a better method).
Google listens for wildcard rules in your robots.txt file, though it runs very simplified regex (in fact I think only the "*" wildcard is supported). In your robots.txt you could do something like:
User-agent: *
Disallow: /mono.php?*That would cull Google's crawling of most of those URLs, but not necessarily the indexation. This would be something to do after Google has swallowed most of the 410s and 'got the message'. You shouldn't start out with this, as if Google can't crawl those URLs - it won't see your 410s! Just remember this, so that when the issue is resolved you can smack this down and stop the attack from occurring again (or at least, it will be preemptively nullified)
Finally you have Meta "No-Index" tags. They don't stop Google from crawling a URL, but they will remove those URLs from Google's index. If you can serve the 410s on a custom 410 page which also gives the Meta no-index directive, that will be a very strong signal to Google indeed that those aren't proper pages or fit for indexation
So now we have a bit of an action plan:
- 410 the bad URLs alongside a Meta no-index directive served from the same URL
- Once Google has swallowed all that (may be some weeks or just over 1 month), back-plate it with robots.txt wildcards
With regards to your oriignal question (sorry I took so long to get here) I'd use something like:
Redirect 410 /mono.php?*
I think .htaccess swallows proper regex (I think). The back slashes say "whatever character follows me, treat that character as a value and do not apply its general regex function". It's the regex escape character (usually). This would go in the .htaccess file at the root of your site, not in a subdir .htaccess file
Please sandbox text my recommendation first. I'm really more of a technical data analyst than a developer!
This document seems to suggest that a .htaccess file will properly swallow "" as the escape character:
https://premium.wpmudev.org/forums/topic/htaccess-redirects-with-special-characters
Hope this helps!
-
Hi,
Have you also excluded these pages from the robots.txt file so you can make sure that they're also not being crawled?
The code for the redirect looks something like this:RewriteEngine on
RewriteRule ^/mono* - [G,NC]Martijn.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can you use a seperate url for a interior product page on a site?
I have a friend that has a health insurance agency site. He wants to add a new page, for child health care insurance to his existing site. But the issue is, he brought a new URL; insurancemykidnow.com and he want's to use it for the new page. Now, I'm not sure I'm right on this, but I don't think that can be done? I'm I wrong? = Thanks in advance.
Technical SEO | | Coppell0 -
URL has caps, but canonical does not. Now what?
Hi, Just started working with a site that has the occasional url with a capital, but then the url in the canonical as lower case. Neither, when entered in a browser, resolves to the other. It's a Shopify site. What do you think I should do?
Technical SEO | | 945010 -
Remove Directory
www.enakliyat.com.tr/detaylar/page-6255 www.enakliyat.com.tr/detaylar/page-6240 www.enakliyat.com.tr/detaylar/page-6253 www.enakliyat.com.tr/detaylar/page-6255 ..... ..... ..... We have so many page like this page and we want to remove all this page from Google search results. How can I remove from Google Webmaster Tools Help!
Technical SEO | | iskq0 -
Updating content on URL or new URL
High Mozzers, We are an event organisation. Every year we produce like 350 events. All the events are on our website. A lot of these events are held every year. So i have an URL like www.domainname.nl/eventname So what would you do. This URL has some inbound links, some social mentions and so on. SO if the event will be held again in 2013. Would it be better to update the content on this URL or create a new one. I would keep this URL and update it because of the linkvalue and it is allready indexed and ranking for the desired keyword for that event. Cheers, Ruud
Technical SEO | | RuudHeijnen0 -
How to keep a URL social equity during a URL structure/name change?
We are in the process of making significant URL name/structure change to one of our property and we want to keep the social equity (likes, share, +1, tweets) from the old to the new URL. We have been trying many different option without success. We are running our social "button" in an iframe. Thanks
Technical SEO | | OlivierChateau0 -
Multiple URLs
I'm trying to check the URLs of this site- http://www.ofo.com.au, and I see that their old site has 301 re-directed to it...but the site http://ofo.com.au and http://outdoorfurnitureoutlet.com.au are both still up and I can't see any 301 redirects from them. Is it a problem even if when I do a site: search for them I get no results?
Technical SEO | | UnaRealidad0 -
Removing some of the indexed pages from my website
I am planning to remove some of the webpages from my website and these webpages are already indexed with search engine. Is there any way by which I need to inform search engine that these pages are no more available.
Technical SEO | | ArtiKalra0 -
Hyphen in URL
Hi, I would like to know if the following statement holds true today or it doesn't matter whether we use hyphens or underscore If you have a URL like keyword1_keyword2, Google will only return that page if the user searches for keyword1_keyword2 ( highly unlikely ) . But If you have a URL like keyword1-keyword2, that page can be returned for the searches - keyword1,keyword2 and even “keyword1keyword2” Thanks
Technical SEO | | seoug_20050