Salvaging links from WMT “Crawl Errors” list?
-
When someone links to your website, but makes a typo while doing it, those broken inbound links will show up in Google Webmaster Tools in the Crawl Errors section as “Not Found”. Often they are easy to salvage by just adding a 301 redirect in the htaccess file.
But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.
First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL (such as www.mydomain.com/pagenam), then I fix it in htaccess this way:
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:
www.mydomain.com/pagename1.htmlsale
www.mydomain.com/pagename2.htmlhttp://
www.mydomain.com/pagename3.html"
www.mydomain.com/pagename4.html/
How should the htaccess RewriteRule be written to send each of these oddballs to the page it was supposed to go to without the typo?
Second, is there a quick and easy method or tool to tell us if a linking domain is good or spammy? I have incoming broken links from sites like these:
www.webutation.net
titlesaurus.com
www.webstatsdomain.com
www.ericksontribune.com
www.addondashboard.com
search.wiki.gov.cn
www.mixeet.com
dinasdesignsgraphics.com
Your help is greatly appreciated. Thanks!
Greg
-
Hi Gregory -
Yes, as Federico mentions, you do not have to put the RewriteCond lines before every RewriteRule: since the .htaccess file is in your root, the domain is implied. You might need them if you are creating multiple redirects, such as www to non-www.
Also, Federico is right that this isn't the best way to deal with these links, but I use a different solution. First I get a flat file of my inbound links using other tools as well as WMT, and then I run them through a test to make sure each linking page still exists.
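One way to run that "does the linking page still exist" test is sketched below with only the Python standard library. It assumes (hypothetically) that your export is a plain list of linking-page URLs and that "still links to me" means some `<a href>` on the page points at your domain, with or without www. The names `check_linking_pages`, `links_to_domain` and the labels `gone` / `no-link` / `ok` are illustrative, not part of WMT or any other tool mentioned here.

```python
from html.parser import HTMLParser
from http.client import HTTPException
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def bare_host(host: str) -> str:
    """Lower-case a hostname and drop a leading 'www.'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def href_host(href: str) -> str:
    """Hostname of an href, or '' for relative/mailto/malformed links."""
    try:
        return urlsplit(href).hostname or ""
    except ValueError:  # e.g. a malformed IPv6 literal
        return ""


def links_to_domain(page_html: str, my_domain: str) -> bool:
    """True if any <a href> on the page points at my_domain (www or not)."""
    parser = HrefCollector()
    parser.feed(page_html)
    parser.close()
    target = bare_host(my_domain)
    return any(bare_host(href_host(h)) == target for h in parser.hrefs)


def fetch(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page body, or None on 404/410, DNS failure, timeout, etc."""
    request = Request(url, headers={"User-Agent": "inbound-link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            # Assumes UTF-8; undecodable bytes are replaced instead of raising.
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):  # HTTPError/URLError are OSErrors
        return None


def check_linking_pages(
    urls: Iterable[str],
    my_domain: str,
    fetcher: Callable[[str], Optional[str]] = fetch,
) -> Dict[str, str]:
    """Label each linking page 'gone', 'no-link' or 'ok'."""
    results = {}
    for url in urls:
        page = fetcher(url)
        if page is None:
            results[url] = "gone"
        elif links_to_domain(page, my_domain):
            results[url] = "ok"
        else:
            results[url] = "no-link"
    return results
```

Pages labelled `gone` or `no-link` can be dropped from the list. The `fetcher` argument is there so you can swap in another HTTP client, or a dictionary lookup when testing offline.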
Then I go through the list and remove the scraper/stats sites like webstatsdomain, alexa, etc., so that the list is more manageable. Next I decide which links are OK to keep (there's no real quick way to decide, and everyone has their own method). The only links that are "bad" would be ones that may violate Google's Webmaster Guidelines.
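The scraper/stats clean-up can be automated with a small blocklist filter. This is a Python sketch: the blocklist starts with the two sites named above and is meant to be extended by hand, and `is_scraper` / `drop_scrapers` are hypothetical helper names. Matching on the hostname (the domain itself or any subdomain such as www.) rather than on a substring avoids dropping look-alike domains by accident.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# Starting blocklist: the two sites named in the answer above. Extend it by hand
# with any other scraper/stats domains that show up in your own export.
SCRAPER_DOMAINS = {"webstatsdomain.com", "alexa.com"}


def is_scraper(url: str, blocklist=SCRAPER_DOMAINS) -> bool:
    """True if the URL's host is a blocklisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocklist)


def drop_scrapers(urls: Iterable[str], blocklist=SCRAPER_DOMAINS) -> List[str]:
    """Keep only the linking pages that are not on blocklisted sites, in order."""
    return [u for u in urls if not is_scraper(u, blocklist)]
```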
Your list should be quite small at this point, unless you had a bunch of links to a page that you subsequently moved or whose URL you changed. In that case, add the rewrite to .htaccess. For the remaining links, you can simply contact the sites, notify them of the broken link, and ask to have it fixed. This is the best-case scenario (instead of having the link go to a 404 or even a 301 redirect). If it's a good link, it's worth the effort.
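A rough way to make that split is sketched below in Python, under stated assumptions: you have (linking page, broken URL) pairs, and a broken URL that many linking pages point at suggests a moved page worth a 301, while the rest go on a contact list grouped by linking page. The function name `triage` and the `redirect_threshold` of 3 are hypothetical choices, not a rule from the thread.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def triage(
    broken_links: Iterable[Tuple[str, str]],
    redirect_threshold: int = 3,
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split (linking_page, broken_url) pairs into 'add a 301' vs 'ask the site'.

    redirect_threshold is a hypothetical cut-off: a broken URL that at least
    that many linking pages point to is treated as a moved page worth a 301.
    """
    pairs = list(broken_links)  # iterated twice, so materialise once
    counts = Counter(broken for _, broken in pairs)
    redirect = sorted(url for url, n in counts.items() if n >= redirect_threshold)
    contact: Dict[str, List[str]] = defaultdict(list)
    for page, broken in pairs:
        if counts[broken] < redirect_threshold:
            contact[page].append(broken)
    return redirect, dict(contact)
```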
Hope that helps!
-
Exactly.
Let's do some cleanup.
To redirect everything on domain.com (any path) to www.domain.com, you need this:
RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
That's it for the www and non-www redirection.
Then you only need one line per 301 redirect you want to do, without the RewriteConds you had previously, like this:
RewriteRule ^pagename1\.html(.+)$ /pagename1.html [R=301,L]
That will redirect any www or non-www URL such as pagename1.htmlhgjdfh to pagename1.html, and the www rule above takes care of the host. The (.+) acts as a wildcard for one or more extra characters. Use (.+) rather than (.*): (.*) also matches pagename1.html itself, so the page would be redirected to itself in an endless loop.
You also don't need to type the domain as you did in your examples. Since the page is on the same domain, you just type its path: /pagename1.html
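A quick way to see what the wildcard does is to try the pattern with Python's `re` module, which behaves like Apache's PCRE for a simple pattern like this one. In an .htaccess file the rule is matched against the path without its leading slash, so the strings below are what the pattern actually sees. The check shows why `(.+)` is the safer wildcard: `(.*)` also matches the clean `pagename1.html`, so a rule built on it would redirect that page to itself over and over.

```python
import re

# In .htaccess (per-directory) context the rule is matched against the path
# without its leading slash, e.g. "pagename1.htmlhgjdfh".
STAR = re.compile(r"^pagename1\.html(.*)$")  # zero or more trailing characters
PLUS = re.compile(r"^pagename1\.html(.+)$")  # one or more trailing characters


def matches(pattern: re.Pattern, path: str) -> bool:
    """True if the rule would fire (and redirect) for this path."""
    return pattern.match(path) is not None


for path in ["pagename1.htmlhgjdfh", "pagename1.html"]:
    print(f"{path}: (.*) -> {matches(STAR, path)}, (.+) -> {matches(PLUS, path)}")
# pagename1.htmlhgjdfh: (.*) -> True, (.+) -> True
# pagename1.html: (.*) -> True, (.+) -> False
```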
-
Thank you, Federico. I did not know about using (.*)$ to deal with any junk stuck to the end of .html.
So when you said "the rewrite conds are not needed" do you mean that instead of creating three lines of code for each 301 redirect, like this...
RewriteCond %{HTTP_HOST} ^mydomain.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.mydomain.com$
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...that the first two lines can be removed? So each 301 redirect rule is just one line, like this...
RewriteRule ^pagenam$ "http://www.mydomain.com/pagename.html" [R=301,L]
...without causing problems if the visitor is coming into the mydomain.com version or the www.mydomain.com version?
If so, that will sure help decrease the size of the file. But I thought those first two lines were needed if we are directing everything to the www version.
Thanks again!
-
Well, if you still want to go that way, the RewriteConds there are not needed (it is a given that the .htaccess file is on your domain). A rewrite rule for www.mydomain.com/pagename1.htmlsale would then be:
RewriteRule ^pagename1\.htmlsale$ /pagename1.html [R=301,L]
Plus, everything that is pagename1.html followed by extra characters, such as pagename1.html123 or pagename1.html%22, can be redirected with this one rule (note (.+) rather than (.*), so that pagename1.html itself is not redirected to itself in a loop):
RewriteRule ^pagename1\.html(.+)$ /pagename1.html [R=301,L]
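If there are several pages to cover, one option (a sketch, not something the thread prescribes) is to generate the one-line rules from a list of page names and check them against the oddball URLs from the original question before pasting them into .htaccess. The `PAGES` list, `htaccess_rule` and `redirect_target` are hypothetical, and the Python matcher only approximates mod_rewrite; for example, Apache %-decodes the path before matching, so a trailing `%22` arrives as `"`, which is why the test paths below use the decoded form.

```python
import re
from typing import Optional

# Hypothetical list of real pages that keep attracting mangled inbound links.
PAGES = ["pagename1.html", "pagename2.html", "pagename3.html", "pagename4.html"]


def htaccess_rule(page: str) -> str:
    """One-line rule: 301 any '<page><extra characters>' back to '/<page>'."""
    return f"RewriteRule ^{re.escape(page)}(.+)$ /{page} [R=301,L]"


# Python approximation of those rules; like the [L] flag, the first match wins.
RULES = [(re.compile(f"^{re.escape(p)}(.+)$"), "/" + p) for p in PAGES]


def redirect_target(path: str) -> Optional[str]:
    """Return the 301 target for a (%-decoded) request path, or None if no rule fires."""
    rel = path.lstrip("/")  # per-directory rules match without the leading slash
    for pattern, target in RULES:
        if pattern.match(rel):
            return target
    return None


if __name__ == "__main__":
    for page in PAGES:
        print(htaccess_rule(page))
    # The oddball URLs from the question, as request paths:
    for path in ["/pagename1.htmlsale", "/pagename2.htmlhttp://",
                 '/pagename3.html"', "/pagename4.html/"]:
        print(path, "->", redirect_target(path))
```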
-
Thanks Federico, I do have a good custom 404 page set up to help those who click a link with a typo.
But I still would like to know how to solve the questions asked above...
-
Although you can redirect any URL to the one you think they meant to link to, you may end up with hundreds of rules in your htaccess.
I personally wouldn't use this approach. Instead, you can build a really good 404 page that looks at the requested URL and shows a list of pages the user was probably trying to reach, while still returning a 404 status, since the requested URL doesn't actually exist.
By using this method you also don't need to worry about the links you mentioned. No link juice is passed, but traffic coming from those links will probably still find the content it was looking for, as your 404 page will list the likely URLs.
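A minimal sketch of such a 404 page, assuming a Python WSGI setup and a hypothetical `KNOWN_PAGES` list (in practice it might come from your sitemap or CMS): `difflib.get_close_matches` picks the existing pages that look most like the requested path, best match first, and the handler still answers with a 404 status, so nothing is passed off as a real page. The 0.6 cutoff is difflib's default and just a starting point.

```python
import html
from difflib import get_close_matches

# Hypothetical list of real pages; in practice it might come from a sitemap.
KNOWN_PAGES = ["pagename.html", "pagename1.html", "pagename2.html", "contact.html"]


def suggest_pages(path, known_pages=KNOWN_PAGES, n=3, cutoff=0.6):
    """Return up to n existing pages that look most like the requested path, best first."""
    return get_close_matches(path.strip("/"), known_pages, n=n, cutoff=cutoff)


def not_found_app(environ, start_response):
    """Minimal WSGI 404 handler: keeps the 404 status but lists likely intended pages."""
    path = environ.get("PATH_INFO", "")
    items = "".join(
        f'<li><a href="/{html.escape(p)}">{html.escape(p)}</a></li>'
        for p in suggest_pages(path)
    )
    body = (
        "<h1>Page not found</h1>"
        f"<p>Were you looking for one of these?</p><ul>{items}</ul>"
    ).encode("utf-8")
    start_response("404 Not Found", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, not_found_app).serve_forever()` serves every path through this handler; hooking it up so that only missing pages reach it depends on your server setup and is left open here.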