Salvaging links from WMT “Crawl Errors” list?
-
When someone links to your website, but makes a typo while doing it, those broken inbound links will show up in Google Webmaster Tools in the Crawl Errors section as “Not Found”. Often they are easy to salvage by just adding a 301 redirect in the htaccess file.
But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.
First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL, ( such as www.mydomain.com/pagenam ) then I fix it in htaccess this way:
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:
www.mydomain.com/pagename1.htmlsale www.mydomain.com/pagename2.htmlhttp:// www.mydomain.com/pagename3.html" www.mydomain.com/pagename4.html/
How is the htaccess Rewrite Rule typed up to send these oddballs to individual pages they were supposed to go to without the typo?
Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these:
www.webutation.net titlesaurus.com www.webstatsdomain.com www.ericksontribune.com www.addondashboard.com search.wiki.gov.cn www.mixeet.com dinasdesignsgraphics.com
Your help is greatly appreciated. Thanks!
Greg
-
Hi Gregory -
Yes, as Frederico mentions you do not have to put the rewrite cond. before every rewrite since it the htaccess is on your root its implied. You might need to do this if you creating multiple redirects for www to non-www etc.
Also Frederico is right - this isnt the best way to deal with these links, but I use a different solution. First I get a flat file of my inbound links using other tools as well as WMT, and then i run them through a test to ensure that the linking page still exist.
Then I go through the list and just remove the scraper / stats sites like webstatsdomain, alexa etc so that the list is more manageable. Then I decide which links are ok to keep (there's no real quick way to decide, and everyone has their own method). But the only links are "bad" would be ones that may violate Google's Webmaster Guidelines.
Your list should be quite small at this point, unless you had a bunch of links to a page that you subsequently moved or changed its URL. In that case, add the rewrite to htaccess. The remaining list you can simply contact the sites and notify them of the broken link and ask to have it fixed. This is the best case scenario (instead of having it go to a 404 or even a 301 redirect). If its a good link, its worth the effort.
Hope that helps!
-
Exactly.
Let's do some cleanup
To redirect everything domain.com/** to www.domain.com you need this:
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]That's it for the www and non-www redirection.
Then, you only need one line per 301 redirection you want to do, without the need of specifying those rewrite conds you had previously, doing it like this:
RewriteRule ^pagename1.html(.*)$ pagename1.html [R=301,L]
That will in fact redirect any www/non-www page like pagename1.htmlhgjdfh to www.domain.com/pagename1.html. The (.*) acts as a wildcard.
You also don't need to type the domain as you did in your examples. You just type the page (as it is in your same domain, you don't need to specify it): pagename1.html
-
Thank you Federico. I did not know about the ability to use (.*)$ to deal with any junk stuck to the end of html
So when you said "the rewrite conds are not needed" do you mean that instead of creating three lines of code for each 301 redirect, like this...
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...that the first two lines can be removed? So each 301 redirect rules is just one line like this...
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...without causing problems if the visitor is coming into the mydomain.com version or the www.mydomain.com version?
If so, that will sure help decrease the size of the file. But I thought that if we are directing everything to the www version, that those first two lines were needed.
Thanks again!
-
Well, if you still want to go that way, the rewrite conds there are not needed (as it is given that the htaccess IS in your domain). Then a rewrite rule for www.mydomain.com/pagename1.htmlsale should be:
RewriteRule ^pagename1.htmlsale$ pagename1.html [R=301,L]
Plus a rule to cover everything that is pagename1.html*** such as pagename1.html123, pagename1.html%22, etc. can be redirected with this rule:
RewriteRule ^pagename1.html(.*)$ pagename1.html [R=301,L]
-
Thanks Federico, I do have a good custom 404 page set up to help those who click a link with a typo.
But I still would like to know how to solve the questions asked above...
-
Although you can redirect any URL to the one you consider they wanted to link, you may end up with hundreds of rules in your htaccess.
I personally wouldn't use this approach, instead, you can build a really good 404 page, which will look into the typed URL and show a list of possible pages that the user was actually trying to reach, while still returning a 404 as the typed URL actually doesn't exists.
By using the above method you also avoid worrying about those links as you mentioned. No linkjuice is passed tho, but still traffic coming from those links will probably get the content they were looking for as your 404 page will list the possible URLs they were trying to reach...
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Internal link is creating duplicate content issues and generating 404s from website crawl.
Not sure what the best way to describe it but the site is built with Elementor page builder. We are finding out that a feature that is included with a pop modal window renders an HTML code as so: Click So when crawled I think the crawling is linking itself for some reason so the crawl returns something like this: xyz.com/builder/listing/ - what we want what we don't want xyz.com/builder/listing/ xyz.com/builder/listing/%23elementor-action%3Aaction%3Dpopup%3Aopen%26settings%3DeyJpZCI6Ijc2MCIsInRvZ2dsZSI6ZmFsc2V9/ xyz.com/builder/listing/%23elementor-action%3Aaction%3Dpopup%3Aopen%26settings%3DeyJpZCI6Ijc2MCIsInRvZ2dsZSI6ZmFsc2V9//%23elementor-action%3Aaction%3Dpopup%3Aopen%26settings%3DeyJpZCI6Ijc2MCIsInRvZ2dsZSI6ZmFsc2V9/ so you'll notice how that string in the HREF is appended each time and it loops a couple times. Could I 301 this issue, what's the best way to go about handling something like this? It's causing duplicate meta descriptions/content errors for some listing pages we have. I did add a rel='nofollow' to the anchor tag with JavaScript but not sure if that'll help.
Technical SEO | | JoseG-LP0 -
Worried About Broken Links
In Wordpress, I'm using a plugin called Broken Link Checker to check for broken links. Should I be worried about/spend time fixing outbound links that result in: 403 Forbidden -Server Not Found -Timeout -500 Internal Server Error -etc. Thanks for your help! Mike
Technical SEO | | naturalsociety3 -
Sitemap error in Webmaster tools - 409 error (conflict)
Hey guys, I'm getting this weird error when I submit my sitemap to Google. It says I'm getting a 409 error in my post-sitemap.xml file (https://cleargear.com/post-sitemap.xml). But when I check it, it looks totally fine. I am using YoastSEO to generate the sitemap.xml file. Has anyone else experienced this? Is this a big deal? If so, Does anyone know how to fix? Thanks EwTswL4
Technical SEO | | Extima-Christian0 -
500 - server error
Hi All, A site crawl reveals several server errors (status code 500) about a clients wordpress website. My question: what are the most common causes for server errors and what advice can I give about how to fix them? Thanks in advance,
Technical SEO | | WeAreDigital_BE
Jens0 -
Crawl Diagnostics - How to find where broken links are located?
Hi, One of my sites has a 4xx error that has been picked up in the crawl diagnostics section. It is a broken link. Does anybody know if it is possible for me to find out which page the broken link was found on? I have checked all of the pages on the site that I thought were linking to the page that seems to have a problem but all of these links are fine / not broken. Any ideas? Thanks
Technical SEO | | CherryK0 -
WIki Contextual Links
I want to understand what are Wiki Contextual Links and how are they helpful for SEO. I hear google likes them. Is that true?
Technical SEO | | KS__0 -
How to remove the 4XX Client error,Too many links in a single page Warning and Cannonical Notices.
Firstly,I am getting around 12 Errors in the category 4xx Client error. The description says that this is either bad or a broken link.How can I repair this ? Secondly, I am getting lots of warnings related to too many page links of a single page.I want to know how to tackle this ? Finally, I don't understand the basics of Cannonical notices.I have around 12 notices of this kind which I want to remove too. Please help me out in this regard. Thank you beforehand. Amit Ganguly http://aamthoughts.blogspot.com - Sustainable Sphere
Technical SEO | | amit.ganguly0 -
Why would you remove a canonical link?
Currently, my client's blog makes a duplicate page every time someone comments on a post. The previous SEO consultant told the developer to not put a canonical link directing it to the main blog post. Did taking out the canonical link result in these duplicate pages? My question is why would she recommend this action? Is it best to now add in the canonical link in or should we implement a 301 redirect or insert a index: no follow? Would adding a canonical link keep duplicate pages from happening in the future?
Technical SEO | | Scratch_MM0