Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content - many posts will still be viewable - we have locked both new posts and new replies. More details here.
Best posts made by effectdigital
-
RE: Can subdomains hurt your primary domain's SEO?
Under certain circumstances, yes - a subdomain can harm the rankings and SEO performance of your main site. Low-level technical SEO issues like broken links usually wouldn't be a problem (unless the sub-site were creating loads of broken links pointing to the main site). A more substantial issue, like an active penalty (a manual notification in GSC) or a malware threat (or hacked content) on your subdomain, could assuredly harm the main site's rankings. Except in extreme circumstances such as these, I wouldn't worry about it too much.
-
RE: Strange landing page in Google Analytics
The default URL is what you should change. It is the setting which controls how the landing pages are written in the GA interface. It is necessary because Google Analytics (STILL) does not track protocol (only hostname and page paths), which is really annoying!
If GA just tracked protocol, all of this could be handled for you automatically. Since that's not the case, you have to amend this property setting
-
RE: Jump links?
Don't think this is possible with your site's (current) architecture as the parts you want to link to are 'generated' and exist in the modified source-code only, rather than the base source code.
Basically you can extend a URL using the hash-anchor (#) to connect to certain areas of a web-page, which are marked up a certain way.
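As a generic illustration (example.com and the element names here are placeholders, not your actual markup), the pattern is simply a fragment on the link that matches an id which is present in the delivered HTML:
<a href="https://www.example.com/some-page/#footer">Jump to the footer</a>
<!-- ...elsewhere on that page, in the base source code... -->
<div id="footer">Footer content that exists before any scripts run</div>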
I can take your URL here: https://www.sacramentotop10.com/business/chamber-of-commerce/
And extend it to jump straight to your footer, for example: https://www.sacramentotop10.com/business/chamber-of-commerce/#footer
This is because the footer always exists in your web-page's base (non-modified) source code:
- view-source:https://www.sacramentotop10.com/business/chamber-of-commerce/ - (this link will only work in Google Chrome!)
You can see the reference for your footer here:
- https://d.pr/i/AnLonX.png (screenshot)
That is a view of the page's non-modified source code, before any scripts (JavaScript etc) have run. That's what you get with right click -> View page source
Once your scripts have run, things look a little different. The modified source (which you can access via right click -> Inspect, using Google Chrome) is richer and paints a fuller picture
Here's the same area of code again but viewed via the modified source:
- https://d.pr/i/BRXg3B.png (screenshot)
As you can see, the modified source code (less Google / SEO friendly - as Google won't always execute your scripts when crawling!) contains the actual text for the copyright notice in your footer. The base source code (un-modified, more Google friendly as it's always read by Google) does not contain the actual text, only the ID which scripts need as an anchor to generate that text for you
Your problem is that hash anchors which should open that window for you won't work, because all of that content is stored in the modified source code (not in the base source).
Let's say you wanted to link to this one:
- https://d.pr/i/iQVOPs.png (screenshot)
And you wanted to make a link, that deep-links people just to this part here:
- https://d.pr/i/QvAwIe.png (screenshot)
Now technically if you inspect that frame, there is something with a decent ID to hook onto:
- https://d.pr/i/kMUsDO.png (screenshot)
So this link should technically work:
But sadly it doesn't do squat (I also tried https://www.sacramentotop10.com/business/chamber-of-commerce/#3328-1 and https://www.sacramentotop10.com/business/chamber-of-commerce/#3328). The reason is that the content (pop-up window, text etc) for the specified ID only exists in the modified source code (after scripts have run), but not in "View source" (the non-modified source code)
Check it out:
- https://d.pr/i/5GqIVY.png (screenshot)
You can see the 'hook' which JS grips into (to populate the content), but if you search for any content inside of the pop-up window in the base (non-modified) source code:
- https://d.pr/i/HhUdJK.png (screenshot)
Let's use that text ("Folsom Chamber of Commerce") as an example. Now go to the basic source code:
- https://d.pr/i/CmXVaP.png (screenshot)
Guess what?
- https://d.pr/i/7b2UrU.png (screenshot)
Bam! There's nothing there. Your problem is, you're trying to link to something which doesn't exist until scripts have run. You can use hash-anchors ('#') to deep-link, but not to generated areas of a web-page (as far as I am aware)
This functionality would require a restructure of your site in some way.
-
RE: What are best SEO plugins for wordpress?
I'm still watching RankMath like a hawk. It really does look very good
-
RE: Canonical: Same content but different countries
In response to your second question, it's fine to have /usa/ although /us/ or /en/ would be a more typical deployment (lots of people go like, /en-us/ and /en-gb/ as that structure allows for really granular international deployment!)
As long as the hreflangs are accurate and tell Google what language and region the URLs are for, and as long as they are deployed symmetrically with no conflicts or missing parts, it should be ok
Note that Google will expect to see different content on different regional URLs, sometimes even if they're the same language but targeted at different countries (tailor your content to your audience, don't just cut and paste sites and change tags and expect extra footprint). Stuff like shipping info and prices (currency shown) should also be different (otherwise don't even bother!)
Your hreflangs, if you are doing USA as your EN country, should not use 'en-gb' in the hreflang (instead they should use 'en-us')
If you're thinking that the HTML implementation will make the code bloated and messy, read this:
https://support.google.com/webmasters/answer/189077?hl=en
There are also HTTP header and XML sitemap deployment options (though IMO, HTML is always best and is the hardest, strongest signal)
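For illustration only (example.com and the /us/ and /gb/ folders are placeholders rather than your actual structure), a symmetrical HTML deployment would put something like this in the head of each regional URL:
<link rel="alternate" hreflang="en-us" href="https://www.example.com/us/" />
<link rel="alternate" hreflang="en-gb" href="https://www.example.com/gb/" />
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />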
-
RE: Google is indexing bad URLS
You have taken some good measures there, but it does take Google time to revisit URLs and re-index them (or remove them from the index!)
Did you know, a 404 doesn't tell Google a URL is gone for good - it just means 'not found', so Google may keep checking back on it? The status code you are looking to serve is 410 (Gone), which is a harder signal
Robots.txt (for Google) does in fact support wildcards. It's not full regex - in fact the only wildcards supported are "*" (asterisk: matching any character or string of characters) and "$" (matching the end of a URL). You could supplement with a rule like this:
User-agent: *
Disallow: /*revslider*
That should, theoretically, stop Google crawling any URL which contains the string "revslider". Be sure to validate any new robots.txt rules using Google Search Console to check they are working right! Remember that robots.txt affects crawling and not indexation! To give Google a directive not to index a URL, you should use the Meta no-index tag: https://support.google.com/webmasters/answer/93710?hl=en
The steps are (a rough sketch of steps 2 and 3 follows below):
- Remove your existing robots.txt rule (which would stop Google crawling the URL and thus stop them seeing a Meta no-index tag or any change in status code)
- Apply status 410 to those pages instead of 404
- Apply Meta no-index tags to the 410'ing URLs
- Wait for Google to digest and remove the pages from its index
- Put your robots.txt rule back to prevent it ever happening again
- Supplement with an additional wildcard rule
- Done!
- Hope that helps
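As a rough sketch of steps 2 and 3 (assuming an Apache host and a hypothetical /gone.html error page - neither is confirmed here, so treat this as illustration only), the .htaccess side could look something like:
# Serve 410 (Gone) for any URL whose path contains "revslider"
RedirectMatch 410 revslider
# Serve a custom error page body with those 410 responses
ErrorDocument 410 /gone.html
The /gone.html page itself would then carry the Meta no-index tag in its head.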
-
RE: How to deal with filter pages - Shopify
As Joe said, canonical is fine. No need to play with all the other tags - leave them alone!
-
RE: Canonical and Alternate Advice
Your problem is that you have two different sites loading on the same URL. If you are returning both the mobile and the desktop / laptop site on the same URL, you would be expected to be using responsive design. In fact, you may have re-invented another, different way to implement responsive design which is probably slightly less fluid yet slightly more efficient :')
Since your mobile and desktop pages both reside on exactly the same URL, I'd test the page(s) with this tool (the mobile friendly tool) and this tool (the page-speed insights tool). If Google correctly views your site as mobile friendly, and if within PageSpeed Insights Google is correctly differentiating between the mobile and desktop site versions (check the mobile and desktop tabs), then the page(s) should canonical to themselves (self-referencing canonical) and no alternate tag should be used or deployed. Google will misread an alternate tag which points to itself as an error. That tag is to be used when your separate mobile site (page) exists on a separate URL, like an 'm.' subdomain or something like that
Imagine you are Googlebot. You are crawling in desktop mode, load the desktop URL version and find that the page says, it (itself) is also the mobile page. You'd get really confused
Check to see whether your implementation is even supported by Google using the tools I linked you to. If it is, then just use self referencing canonical tags and do not deploy alternate tags (which would make no sense, since both versions of the site are on the same URL). When people build responsive sites (same source code on the same URL, but it's adaptive CSS which re-organises the contents of the page based upon viewport widths) - they don't use alternate tags, only canonicals
Since your situation is more similar to responsive design (from a crawling perspective) than it is to separate mobile site design, drop the alternate tags
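As a minimal illustration (the URL here is a placeholder, not your actual address), a self-referencing canonical is just this in the head of the page, pointing back at the page's own URL:
<link rel="canonical" href="https://www.example.com/some-page/" />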
-
RE: Google is indexing bad URLS
Hmm, that's interesting - it should work just as you say! This is the point where you need a developer's help rather than an SEO analyst's :') sorry!
Google will revisit 410s if it believes there is a legitimate reason to do so, but it's much less likely to revisit them than it is with 404s (which leave the door open for the content to return).
Plugins are your friends. Too many will overload a site and make it run pretty slowly (especially as PHP has no multi-threading support!) - but you would only need this plugin temporarily anyway.
You might have to start using something like PHPMyAdmin to browse your SQL databases. It's possible that the uninstall didn't work properly and there are still databases at work, generating fresh URLs. You can quash them at the database level if required, however I'd say go to a web developer as manual DB edits can be pretty hazardous to a non-expert
-
RE: Correct robots.txt for WordPress
Just seems overly complex and like there's way more in there than there needs to be
I'd go with something that 'just' does what you have stated that you want to achieve, and nothing else
User-Agent: *
Disallow: /wp-content/plugins/
Disallow: /comments
Disallow: /*?s=
Disallow: /*&s=
Disallow: /search
See if that helps
-
RE: Canonical and Alternate Advice
The self-referencing canonical advice was solid and I 100% agree with it. The rel=alternate advice, I felt, would cause problems (IMO). But as we all know, fiddly issues like this are highly subjective
-
RE: Google is indexing bad URLS
All of the plugins I can find allow the tag to be deployed on pages, posts etc. You pick from a pre-defined list of existing content, instead of just whacking in a URL and having it inserted (annoying!)
If you put an index.php at that location (the location of the 404), you could put whatever you wanted in it. Might work (maybe test with one). It would resolve with a 200, so you'd then need to force a 410 over the top. Not very scalable though...
-
RE: Using 410 To Remove URLs Starting With Same Word
There are so many ways to deal with this. If these were indeed spam URLs, someone may have attached negative-SEO links to them (to water down your site's ranking power). As such, redirecting these URLs back to their parents could pull spam metrics 'onto' your site which would be really bad. I can see why you are thinking about using 410 (gone)
Using canonical tags to stop Google from indexing those bad parameter-based URLs could also be helpful. If you 'canonicalled' those addresses to their non-parameter-based parents, Google would stop crawling those pages. When a URL 'canonicals' to another, different page, it cites itself as non-canonical and thus gets de-indexed (usually - although the canonical tag is only a hint, not a binding directive). Again though, canonical tags interrelate pages. If those spam URLs were backed by negative SEO attacks, the usage of canonical tags would (again) be highly inadvisable (leaving your 410 suggestion as a better method).
Google listens for wildcard rules in your robots.txt file, though it runs very simplified regex (in fact I think only the "*" wildcard is supported). In your robots.txt you could do something like:
User-agent: *
Disallow: /mono.php?*
That would cull Google's crawling of most of those URLs, but not necessarily the indexation. This would be something to do after Google has swallowed most of the 410s and 'got the message'. You shouldn't start out with this, as if Google can't crawl those URLs - it won't see your 410s! Just remember this, so that when the issue is resolved you can smack this down and stop the attack from occurring again (or at least, it will be preemptively nullified)
Finally you have Meta "No-Index" tags. They don't stop Google from crawling a URL, but they will remove those URLs from Google's index. If you can serve the 410s on a custom 410 page which also gives the Meta no-index directive, that will be a very strong signal to Google indeed that those aren't proper pages or fit for indexation
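That directive is just a single tag in the head of the custom 410 page, along these lines:
<meta name="robots" content="noindex">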
So now we have a bit of an action plan:
- 410 the bad URLs alongside a Meta no-index directive served from the same URL
- Once Google has swallowed all that (may be some weeks or just over 1 month), back-plate it with robots.txt wildcards
With regards to your original question (sorry I took so long to get here) I'd use something like:
Redirect 410 /mono\.php\?*
I think .htaccess swallows proper regex. The backslashes say "whatever character follows me, treat that character as a value and do not apply its general regex function". It's the regex escape character (usually). This would go in the .htaccess file at the root of your site, not in a subdir .htaccess file
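One caveat: the plain Redirect directive (mod_alias) only looks at the URL path and ignores the query string, so a pattern like the one above may not actually fire on those parameter-based URLs. A mod_rewrite sketch (assuming mod_rewrite is enabled on your Apache host - treat it as illustration, not a drop-in fix) which does inspect the query string would be something like:
RewriteEngine On
# any request for /mono.php with a non-empty query string gets a 410 (Gone)
RewriteCond %{QUERY_STRING} .
RewriteRule ^mono\.php$ - [G]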
Please sandbox test my recommendation first. I'm really more of a technical data analyst than a developer!
This document seems to suggest that a .htaccess file will properly swallow "\" as the escape character:
https://premium.wpmudev.org/forums/topic/htaccess-redirects-with-special-characters
Hope this helps!
-
RE: Few pages without SSL
Yes, that can hurt Google rankings. Insecure pages tend to rank less well, and over time that trend is only set to increase (with Google becoming less and less accepting of insecure pages, eventually they will probably be labelled a 'bad neighborhood' like gambling and porn sites). Additionally, URLs which link out to insecure pages (which are not on HTTPS) can also see adverse ranking effects (as Google knows that those pages are likely to direct users to insecure areas of the web)
At the moment, you can probably get by with some concessions. Those concessions would be, accepting that the insecure URLs probably won't rank very well compared with pages offering the same entertainment / functionality, which have fully embraced secure browsing (which are on HTTPS, which are still responsive, which don't link to insecure addresses)
If you're confident that the functionality you are offering fundamentally can't be offered through HTTPS, then that may be only a minor concern (as all your competitors are bound by the same restrictions). If you're wrong, though - you're gonna have a bad time. Being 'wrong' now may be more appealing than being 'dead wrong' later
Google will not remove the warnings your pages have, unless you play ball. If you think that won't bother your users, or that your competition is fundamentally incapable of a better, more secure integration - fair enough. Google is set to take more and more action on this over time
P.S: if your main, ranking pages are secure and if they don't directly link to this small subset of insecure pages, then you'll probably be ok (at least in the short term)
-
RE: Is there any benefit to changing 303 redirects to 301?
You should have used 301 redirects, which imply a 'permanent' move from one place to another. Google doesn't send link juice through 301 redirects because that's what the SEO industry says they should do - it's the other way around. Status code 301 implies that the contents of a web page have permanently moved from one URL to another, thus it 'may' be fair to shift all (or a portion) of SEO (ranking) equity from one address to another
Note that even if you do the right thing at the right time, it won't always work. If your redesign heavily removes content (which was previously perceived as useful) from a web page, don't expect the 301 redirect to carry 'all' the link juice from one page to another. Had this recently with a client who decided to streamline some of their more in-depth articles as part of a site redesign and a move to HTTPS (simultaneously). They did correctly use 301 redirects (A to B, nothing in the middle) and they did point all the posts from the old HTTP URLs to the HTTPS URLs on the new site (same domain, but again - protocol altered and change of design)
Because the posts contained quite radically different (stripped down) content on the new site, the 301 redirects only seemed to pass across between 25% and 33% of the ranking equity. They did everything right, but if you're telling Google that content has moved from one URL to another, you had better actually move the content (lies don't work)
If you take into account that even doing most things correctly can still cause some major issues, then using the wrong response code obviously greatly increases the risk of losing all (or much of) your ranking power
I'm going to say this now: one year on is probably way too late to get back to where you were just by changing some redirects. If that's your expectation, check yourself before you wreck yourself. Redirects (of any kind) slowly decay over time and most people think that a lot of the equity transfer has occurred by six months, let alone twelve. If you transferred your ranking equity into the void of cyberspace... well, it's probably 'mostly' gone by now. I'd still recommend converting the redirects as it really is your only option other than building your ranking equity back up from scratch
**Let's get onto why what you did was wrong** (the why is important!)
So to you, a '303' is a type of redirect. But in its wider context, it's actually a 'status code'. Not all status codes result in a redirect and they all mean completely different things. They basically tell a client or a web-browser, which makes a request (that results in some kind of error), what the best way to proceed is. Some just send information back, others perform more concrete actions like the 3XX codes (redirect codes)
One common thing we get on here is people saying: "I want to de-index some pages from Google, but I can't get Meta no-index into the source code, what can I do?" - very often I look at those questions and find the pages which they want de-indexed are sending status code 404. Status code (error) 404 simply means "this resource or page isn't available right now, but it might come back, so keep tabs on it". So quite often I suggest to them: well, you can deploy no-index in the HTTP header via X-Robots-Tag, but also why don't you change the status code from 404 to 410? Status code 410 roughly means "gone, not coming back, so don't bother coming back"
You did use a redirect code, but you used the wrong one which had the wrong meaning:
So what does status code 303 mean?
I cite from Wikipedia:
"The HTTP response status code 303 See Other is a way to redirect web applications to a new URI, particularly after a HTTP POST has been performed, since RFC 2616 (HTTP 1.1).
According to RFC 7231, which obsoletes RFC 2616, "A 303 response to a GET request indicates that the origin server does not have a representation of the target resource that can be transferred by the server over HTTP. However, the Location field value refers to a resource that is descriptive of the target resource, such that making a retrieval request on that other resource might result in a representation that is useful to recipients without implying that it represents the original target resource."
So in English a 303 translates roughly to:
"Hey web user. I can't give you the page you are requesting because it's gone, and I can't redirect you to that same content on another URL because guess what? It wasn't moved to another URL. That being said, I think this page I am going to send you to, is at least partially relevant. I'll send you there - ok?"
But you're only stating that the resource is partially equivalent, so you can only expect fractional (at best) equity transfer from one URL to the other
Using a 301 tells Google: "this exact page has moved to this other exact page and it's likely to be 75% the same or higher overall. Ok so maybe we changed how the nav menu looks and moved to HTTPS, but the written content and images and stuff that was unique to this page to begin with - that should basically be all the same. As such, you don't need to re-evaluate the ranking potential of this page"
... of course, Google still will (in many instances) re-evaluate the page against the query, which is why (although loads of people say they do) - 301 redirects don't always transfer 100% of your SEO equity. If the content is adjusted too much, even 301s don't save you and it's time to build up again from ground zero
As stated, redirects decay over time as the SEO equity moves from one place to another. In your case you have asked Google to move one portion of the equity from one URL to another (which they may or may not have, depending on content alterations) and also to delete the remaining portion of your ranking power. If that movement is now complete, then the gains from fixing the redirects won't be all you are hoping and dreaming of
It will help. Be sure that you do it, because it's a seconds to minutes change in your .htaccess file or web.config file. It's not hard, it's very simple and you could luck out. But with a whole year behind you... the odds aren't fantastic. Still it's some 'free' equity that you can get back, which you won't have to re-earn (so take it). But it won't be all-encompassing (sorry)
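For example (the paths here are placeholders, assuming an Apache .htaccess setup rather than web.config), the change is literally just swapping the status code on each rule:
# before: Redirect 303 /old-page/ https://www.example.com/new-page/
Redirect 301 /old-page/ https://www.example.com/new-page/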
-
RE: Is there any benefit to changing 303 redirects to 301?
It's a tricky lesson to learn, as Google often release posts and content which over-burden developers with false confidence (it's not the developer's fault). Basically, website owners and company owners often ask broad-brush questions and pressure Google to respond with simple, succinct answers (like in Matt's old Webmaster videos).
Google cave in to this pressure and say stuff like "yeah, doing redirects for your migration is good", but in some (not all) of their published content, completely neglect to mention that some redirect types are more worthy than others within the context of certain situations.
Developers read posts written by Google and just think "ok fine that's how it is now so we just do that" and, of course - unless you make a livelihood studying all this stuff, you end up pretty far wide of the mark.
I recently answered this question by a webmaster who had taken it for granted that, because Google 'can' crawl JS, they always will (under all circumstances). He made a move in terms of technical on-site architecture and saw losses as well
Just ask the guys who know!
And yes, do the redirects, you may as well. You might still get something back from it (probably not a lot though)