Salvaging links from WMT “Crawl Errors” list?
-
When someone links to your website but makes a typo while doing it, those broken inbound links will show up in Google Webmaster Tools in the Crawl Errors section as "Not Found". Often they are easy to salvage by just adding a 301 redirect in the .htaccess file.
But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.
First, let's look at the weird typo problem. If it is something easy, like they just dropped the last part of the URL (such as www.mydomain.com/pagenam), then I fix it in .htaccess this way:
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:
www.mydomain.com/pagename1.htmlsale
www.mydomain.com/pagename2.htmlhttp://
www.mydomain.com/pagename3.html"
www.mydomain.com/pagename4.html/
How should the .htaccess RewriteRule be written to send these oddballs to the individual pages they were supposed to reach, without the typo?
Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these:
www.webutation.net
titlesaurus.com
www.webstatsdomain.com
www.ericksontribune.com
www.addondashboard.com
search.wiki.gov.cn
www.mixeet.com
dinasdesignsgraphics.com
Your help is greatly appreciated. Thanks!
Greg
-
Hi Gregory -
Yes, as Federico mentions, you do not have to put the RewriteCond lines before every RewriteRule; since the .htaccess file sits at your site root, the host is implied. You might still need them if you are creating multiple redirects, such as www to non-www.
Also, Federico is right that this isn't the best way to deal with these links, but I use a different solution. First I get a flat file of my inbound links using other tools as well as WMT, and then I run them through a test to confirm that each linking page still exists.
Then I go through the list and remove the scraper/stats sites (webstatsdomain, alexa, etc.) so the list is more manageable. Then I decide which links are OK to keep; there is no real quick way to decide, and everyone has their own method. The only links that are truly "bad" are the ones that may violate Google's Webmaster Guidelines.
Your list should be quite small at this point, unless you had a bunch of links pointing to a page whose URL you later moved or changed. In that case, add the rewrite to .htaccess. For the rest of the list, simply contact the sites, notify them of the broken link, and ask to have it fixed. That is the best-case scenario (better than sending visitors through a 404, or even a 301 redirect). If it's a good link, it's worth the effort.
Hope that helps!
-
Exactly.
Let's do some cleanup
To redirect everything from domain.com to www.domain.com you need this:
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
That's it for the www and non-www redirection.
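For context, the snippets in this thread assume mod_rewrite is already enabled. A minimal complete .htaccess combining the canonical-host redirect with a single typo fix might look like this sketch (the domain and page names are placeholders):

RewriteEngine On

# Canonicalize the host: send any non-www request to www.domain.com
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

# Fix one known inbound typo (truncated page name)
RewriteRule ^pagenam$ pagename.html [R=301,L]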
Then, you only need one line per 301 redirection you want to do, without the need of specifying those rewrite conds you had previously, doing it like this:
RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
That will in fact redirect any www/non-www request like pagename1.htmlhgjdfh to www.domain.com/pagename1.html. The (.+) acts as a wildcard for whatever is stuck after .html. (Using .+ instead of .* matters here: with .*, the clean URL pagename1.html would match its own rule and redirect in a loop. The backslash just escapes the literal dot.)
You also don't need to type the full domain in the target, as you did in your examples. Since the target is on your own domain, you can just type the page: pagename1.html
-
Thank you Federico. I did not know about the ability to use a wildcard at the end of the pattern to deal with any junk stuck to the end of .html.
So when you said "the rewrite conds are not needed" do you mean that instead of creating three lines of code for each 301 redirect, like this...
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...that the first two lines can be removed? So each 301 redirect rule is just one line, like this...
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...without causing problems whether the visitor comes in on the mydomain.com version or the www.mydomain.com version?
If so, that will sure help decrease the size of the file. But I thought that if we are redirecting everything to the www version, those first two lines were needed.
Thanks again!
-
Well, if you still want to go that way, the RewriteCond lines are not needed (it's a given, since the .htaccess file is on your own domain). A rewrite rule for www.mydomain.com/pagename1.htmlsale would then be:
RewriteRule ^pagename1\.htmlsale$ pagename1.html [R=301,L]
Plus, everything of the form pagename1.html*** (such as pagename1.html123, pagename1.html%22, etc.) can be redirected with this rule:
RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
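Tying this back to the four oddball URLs in the original question: one catch-all rule per page covers all of them, since a wildcard after .html swallows whatever junk follows, whether it is "sale", "http://", a stray quote, or a trailing slash. A sketch using the placeholder page names from the question:

# One catch-all per page handles any garbage appended after .html
RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
RewriteRule ^pagename2\.html(.+)$ pagename2.html [R=301,L]
RewriteRule ^pagename3\.html(.+)$ pagename3.html [R=301,L]
RewriteRule ^pagename4\.html(.+)$ pagename4.html [R=301,L]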
-
Thanks Federico, I do have a good custom 404 page set up to help those who click a link with a typo.
But I still would like to know how to solve the questions asked above...
-
Although you can redirect any URL to the one you believe the linker intended, you may end up with hundreds of rules in your .htaccess.
I personally wouldn't use this approach. Instead, you can build a really good 404 page that looks at the typed URL and shows a list of the pages the user was probably trying to reach, while still returning a 404 status, since the typed URL doesn't actually exist.
Using that method, you also avoid worrying about those links. No link juice is passed, but traffic coming in through those links will probably still find the content they were looking for, since your 404 page will list the likely intended URLs...
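For reference, wiring up a custom 404 page in Apache is a single directive in .htaccess; the script name here is just an assumption for illustration:

# Serve a custom error page for 404s. The script can inspect the
# requested URL (REQUEST_URI) to suggest close-matching pages while
# still returning a 404 status.
ErrorDocument 404 /404.php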