Salvaging links from WMT “Crawl Errors” list?
-
When someone links to your website, but makes a typo while doing it, those broken inbound links will show up in Google Webmaster Tools in the Crawl Errors section as “Not Found”. Often they are easy to salvage by just adding a 301 redirect in the htaccess file.
But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.
First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL, ( such as www.mydomain.com/pagenam ) then I fix it in htaccess this way:
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:
www.mydomain.com/pagename1.htmlsale www.mydomain.com/pagename2.htmlhttp:// www.mydomain.com/pagename3.html" www.mydomain.com/pagename4.html/
How is the htaccess Rewrite Rule typed up to send these oddballs to individual pages they were supposed to go to without the typo?
Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these:
www.webutation.net titlesaurus.com www.webstatsdomain.com www.ericksontribune.com www.addondashboard.com search.wiki.gov.cn www.mixeet.com dinasdesignsgraphics.com
Your help is greatly appreciated. Thanks!
Greg
-
Hi Gregory -
Yes, as Frederico mentions you do not have to put the rewrite cond. before every rewrite since it the htaccess is on your root its implied. You might need to do this if you creating multiple redirects for www to non-www etc.
Also Frederico is right - this isnt the best way to deal with these links, but I use a different solution. First I get a flat file of my inbound links using other tools as well as WMT, and then i run them through a test to ensure that the linking page still exist.
Then I go through the list and just remove the scraper / stats sites like webstatsdomain, alexa etc so that the list is more manageable. Then I decide which links are ok to keep (there's no real quick way to decide, and everyone has their own method). But the only links are "bad" would be ones that may violate Google's Webmaster Guidelines.
Your list should be quite small at this point, unless you had a bunch of links to a page that you subsequently moved or changed its URL. In that case, add the rewrite to htaccess. The remaining list you can simply contact the sites and notify them of the broken link and ask to have it fixed. This is the best case scenario (instead of having it go to a 404 or even a 301 redirect). If its a good link, its worth the effort.
Hope that helps!
-
Exactly.
Let's do some cleanup
To redirect everything domain.com/** to www.domain.com you need this:
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]That's it for the www and non-www redirection.
Then, you only need one line per 301 redirection you want to do, without the need of specifying those rewrite conds you had previously, doing it like this:
RewriteRule ^pagename1.html(.*)$ pagename1.html [R=301,L]
That will in fact redirect any www/non-www page like pagename1.htmlhgjdfh to www.domain.com/pagename1.html. The (.*) acts as a wildcard.
You also don't need to type the domain as you did in your examples. You just type the page (as it is in your same domain, you don't need to specify it): pagename1.html
-
Thank you Federico. I did not know about the ability to use (.*)$ to deal with any junk stuck to the end of html
So when you said "the rewrite conds are not needed" do you mean that instead of creating three lines of code for each 301 redirect, like this...
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...that the first two lines can be removed? So each 301 redirect rules is just one line like this...
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...without causing problems if the visitor is coming into the mydomain.com version or the www.mydomain.com version?
If so, that will sure help decrease the size of the file. But I thought that if we are directing everything to the www version, that those first two lines were needed.
Thanks again!
-
Well, if you still want to go that way, the rewrite conds there are not needed (as it is given that the htaccess IS in your domain). Then a rewrite rule for www.mydomain.com/pagename1.htmlsale should be:
RewriteRule ^pagename1.htmlsale$ pagename1.html [R=301,L]
Plus a rule to cover everything that is pagename1.html*** such as pagename1.html123, pagename1.html%22, etc. can be redirected with this rule:
RewriteRule ^pagename1.html(.*)$ pagename1.html [R=301,L]
-
Thanks Federico, I do have a good custom 404 page set up to help those who click a link with a typo.
But I still would like to know how to solve the questions asked above...
-
Although you can redirect any URL to the one you consider they wanted to link, you may end up with hundreds of rules in your htaccess.
I personally wouldn't use this approach, instead, you can build a really good 404 page, which will look into the typed URL and show a list of possible pages that the user was actually trying to reach, while still returning a 404 as the typed URL actually doesn't exists.
By using the above method you also avoid worrying about those links as you mentioned. No linkjuice is passed tho, but still traffic coming from those links will probably get the content they were looking for as your 404 page will list the possible URLs they were trying to reach...
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Website crawl error
Hi all, When I try to crawl a website, I got next error message: "java.lang.IllegalArgumentException: Illegal cookie name" For the moment, I found next explanation: The errors indicate that one of the web servers within the same cookie domain as the server is setting a cookie for your domain with the name "path", as well as another cookie with the name "domain" Does anyone has experience with this problem, knows what it means and knows how to solve it? Thanks in advance! Jens
Technical SEO | | WeAreDigital_BE0 -
Www vs non www - Crawl Error 902
I have just taken over admin of my company website and I have been confronted with crawl error 902 on the existing campaign that has been running for years in Moz. This seems like an intermittent problem. I have searched and tried to go over many of the other solutions and non of them seem to help. The campaign is currently set-up with the url http://companywebsite.co.uk when I tried to do a Moz manual crawl using this URL I got an error message. I changed the link to crawl to http://www.companywebsite.co.uk and the crawl went off without a hitch and im currently waiting on the results. From testing I now know that if i go to the non-www version of my companies website then nothing happens it never loads. But if I go to the www version then it loads right away. I know for SEO you only want 1 of these URLS so you dont have duplicate content. But i thought the non-www should redirect to the www version. Not just be completely missing. I tried to set-up a new campaign with the defaults URL being the www version but Moz automatically changed it to the non-www version. It seems a cannot set up a new campaign with it automatically crawling the www version. Does it sound like im out the right path to finding this cause? Or can somebody else offer up a solution? Many thanks,
Technical SEO | | ATP
Ben .0 -
Pages appear fine in browser but 404 error when crawled?
I am working on an eCommerce website that has been written in WordPress with the shop pages in E commerce Plus PHP v6.2.7. All the shop product pages appear to work fine in a browser but 404 errors are returned when the pages are crawled. WMT also returns a 404 error when ‘fetch as Google’ is used. Here is a typical page: http://www.flyingjacket.com/proddetail.php?prod=Hepburn-Jacket Why is this page returning a 404 error when crawled? Please help?
Technical SEO | | Web-Incite0 -
Website not crawled
i added website www.nsale.in in add campaign, it shows only 1 page crawled. but its working fine for other sites, any idea why it failed ?
Technical SEO | | Dhinesh0 -
Too many links?
Hello! I've just started with SEOmoz, and am getting an error about too many links on a few of my blog posts - it's pages with high numbers of comments, and the links are coming from each commenter's profile (hopefully that makes sense they're not just random stuffed links). Is there a way to help this not cause a problem? Thanks!
Technical SEO | | PaulineMagnusson0 -
Wrong listing in SERP
For one of the keyword of my website wrong URL appearing in SERP.I wanted to remove that listing in SERPs.How can i remove that listing in SERP?
Technical SEO | | Alick3000 -
Domain Crawl Question
We have our domain hosted by two providers - web.com for the root and godaddy for the subdomain. Why SEOMOZ is not picking up the total pages of the entire domain?
Technical SEO | | AppleCapitalGroup0 -
OSE Link Differential
I have the chrome toolbar installed. In the SERP a site I was looking at had 686 links from 12 domains linking to the root domain. When I checked this site in OSE with filters set to all pages in root domain it shows 65 links from 12 domains. Can anyone explain the difference?
Technical SEO | | waynekolenchuk0