Best posts made by timhatton
-
RE: 404 appearing in Sitelinks
Just thinking - does the 404 page return a correct 404 response in the header? i.e. make sure it's not returning a 200 ("this is a legit page") response.
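If you want a quick way to check outside the browser, here's a rough Python sketch (the URL is just a placeholder - use a path on your own site that definitely doesn't exist) which prints the status code the server actually sends back:
import urllib.error
import urllib.request

# Placeholder URL - swap in a page on your own site that shouldn't exist.
url = "http://www.example.com/this-page-should-not-exist"
try:
    response = urllib.request.urlopen(url)
    # If we get here, the server answered with a success code.
    print(url, "returned", response.status)  # 200 here is a "soft 404"
except urllib.error.HTTPError as e:
    print(url, "returned", e.code)           # 404 is what you want to see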
-
RE: 301 redirects from old to new pages with a lot of changes
The best way to think about this is - what happens if you don't do the 301 redirect from the old pages?
- Any links to those pages will be worthless (as the spiders will end up at a 404 error page, so the search engine will eventually drop the page from the search results)
- Any traffic which comes into those pages will get the same 404 page - so any referral traffic which follows those links in will be lost
- You're forcing the spiders to start from scratch again with your site - to spider and index all the content from the home page
What approach you take depends on your timescales, and how quickly you're going to create the new content. I'm also assuming that maintaining the existing URLs either isn't feasible or that there's more value in creating new URLs (e.g. the old ones were www.domain.com/index.php?id=3483844 and the new ones are www.domain.com/category/title-title-title).
I would be tempted to try and do both at once - just doing the redirect will get people going to the correct URL, but the content will still be regarded as "old" by the search engines. Substantially changed content going onto new URLs will get the benefit of the 301 from the old URL (for referral links etc.), plus a "look again" from the spiders as it will be considered 'fresh'.
There was a very good discussion a couple of weeks ago about new vs. old content at http://www.seomoz.org/blog/new-links-old-pages-whiteboard-friday and one from November at http://justinbriggs.org/methods-for-evaluating-freshness
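Once the redirects are in, a rough Python sketch like this (the old/new URLs below are just the placeholder examples from above) lets you spot-check that an old URL really answers with a 301 and points at the page you expect:
import http.client
from urllib.parse import urlsplit

# Placeholder URLs - these are just the example patterns from above.
old_url = "http://www.domain.com/index.php?id=3483844"
expected_target = "http://www.domain.com/category/title-title-title"

parts = urlsplit(old_url)
path = parts.path + ("?" + parts.query if parts.query else "")
conn = http.client.HTTPConnection(parts.netloc)
conn.request("GET", path)            # http.client doesn't follow redirects
resp = conn.getresponse()
print("Status:", resp.status)        # want 301 (permanent), not 302 or 200
print("Location:", resp.getheader("Location"))
print("Points at expected page:", resp.getheader("Location") == expected_target)
conn.close()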
-
RE: Disallow: /search/ in robots but soft 404s are still showing in GWT and Google search?
You could also look at using a meta robots noindex tag (<meta name="robots" content="noindex">) on the /search/ pages, rather than just blocking them in robots.txt, as this will get the existing URLs removed from the index.
-
RE: How can I tell if a website is a 'NoFollow'?
There are also various plugins for Chrome / Firefox which will highlight nofollowed links on a page - including SEOMoz's Mozbar (http://www.seomoz.org/seo-toolbar).
-
RE: Trouble Exporting Xenu Crawl to Excel
I think you might be doing Save As... rather than Export.
You should have the option "Export to Tab Separated File" under the File menu - that will export to a TSV file, which Excel can read quite happily.
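(And if you ever want to pull that export into a script instead of Excel, TSV is trivial to read - a tiny Python sketch, assuming the export was saved as xenu-export.txt, which is just a made-up filename:)
import csv

# "xenu-export.txt" is a made-up filename - use whatever you saved the export as.
with open("xenu-export.txt", newline="", encoding="utf-8", errors="replace") as f:
    for row in csv.reader(f, delimiter="\t"):
        print(row)  # each row is a list of the tab-separated columns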
-
RE: Setting up a 301 redirect from expired webpages
To answer your question directly - yes, there's a rule to put in .htaccess for this. It would be something like:
RewriteRule ^(.*)\.asp$ http://www.link.to/ourbrandspage [R=301,L] (the R=301 flag is what makes it a permanent redirect, and the dot needs escaping - someone who knows regex better may still want to check this)
However, redirecting everything to the same page is a bit of a waste - if the site has been around for a long time, then there may be inbound links to deep pages in the site which would be better off being redirected to the appropriate page on the new URL structure rather than dumping everyone on the same page.
If there's a pattern match which you can follow, then you can write regex to cope with this (e.g. if the old structure was http://www.whatever.com/blah.asp and the new one is http://www.whatever.com/blah.php then just do an .htaccess redirect from *.asp to *.php - something like RewriteRule ^(.*)\.asp$ /$1.php [R=301,L]). However, I'm going to bet it's not that simple.
Best is to do a proper map of existing links so you can direct the actual old URL to the most relevant URL on the new site.
I've had to do this kind of "emergency redirect fix" before, for sites with a lot of pages and no neat "pattern match" fix. The way I usually approach it is to get a list of the existing URL structure (from a backup version of the site, from Google Analytics, from Webmaster Tools, or at a pinch you can scrape the SERPs) to grab all the possible/indexed URLs and stick them in a spreadsheet. I then prioritise the highest-traffic pages - if you can see via Google Analytics (or server logs) which pages get the most inbound traffic, redirect those first to the most appropriate page on the new structure. That way you can carry on adding new rules to the .htaccess as you go along - you'll probably find that of the 1000s of old pages, there's a relatively small percentage which get the vast majority of inbound traffic.
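As a rough illustration of the spreadsheet approach - a small Python sketch (the redirects.csv filename and its old_path,new_url columns are just assumptions about how you might lay out the sheet) that turns the mapping into RewriteRule lines you can paste into .htaccess:
import csv
import re

# Assumes a CSV called redirects.csv with two columns: old_path,new_url
# e.g.  /blah.asp,http://www.whatever.com/new-section/blah
with open("redirects.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        old_path = row["old_path"].lstrip("/")  # .htaccess rules match without the leading slash
        # Escape regex characters so each rule only matches that exact old URL
        pattern = "^" + re.escape(old_path) + "$"
        print("RewriteRule {} {} [R=301,L]".format(pattern, row["new_url"]))
Start with the highest-traffic rows and keep appending to the file as you work down the list.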
Hope this helps!
-
RE: Are there any concerns moving a site to https?
I've just done this for a site which was part http and part https. The change was mainly for appearance - the client wanted the whole site to appear 'secure'. This was in December, and we haven't noticed any massive changes in the rankings for terms.
The obvious thing to consider is making sure you set up 301 redirects from http:// to https://, which can be done with a simple rule in .htaccess, something like
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
(the RewriteCond stops the rule firing again once you're already on https) though there may be other things which affect this, so check this through first (there's a big discussion on this here with various other options).
You'll need the https version of the GA tracking code, and to make sure that pages aren't referencing anything on a non-secure server (e.g. an image), or else users may get warning messages.
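For the "nothing referenced over plain http" check, a rough Python sketch (standard library only, and the URL is a placeholder) that fetches a page and lists any resource references still pointing at http://:
import urllib.request
from html.parser import HTMLParser

class InsecureRefFinder(HTMLParser):
    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # src covers images/scripts/iframes; link href covers stylesheets.
            # Plain <a href> links don't trigger mixed-content warnings.
            if value and value.startswith("http://") and (
                    name == "src" or (tag == "link" and name == "href")):
                print(tag, name, value)

# Placeholder URL - point this at your own pages.
page = "https://www.example.com/"
html = urllib.request.urlopen(page).read().decode("utf-8", errors="replace")
InsecureRefFinder().feed(html)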
-
RE: In my errors I have 2 different products on the same page?
Can't really do anything in depth here, but this is where the root of the issue is:
1. The GHD irons link (and I suspect this is the case for all the errors) is coming from this product page:
http://www.thehairroom.co.uk/hair-care-products/ghd-straightening-irons
Look at the links being generated in the left hand "Filter Results" menu - they all have the GHD link appended to a different category, and so...
2. ... because your CMS allows wildcard endings to URLs after the product category, this brings up the same page (it doesn't always work, but the following examples do):
e.g. http://www.thehairroom.co.uk/Tigi-Rockaholic-797658/something-else/norec
http://www.thehairroom.co.uk/redken-styling-products-577765/hello-seomoz-readers
Given that these aren't valid URLs, they should be returning a 404 page rather than a "200 - OK" result, which is why you're getting all these duplicate page errors.
Looks like a bit of a bug / bad design in whatever CMS is powering the site.
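If you want to see the duplication for yourself (or re-check once the CMS is patched), here's a rough Python sketch - it compares one of the made-up variant URLs above against its base URL, which I've guessed by stripping the wildcard ending off, and prints the status code and a hash of the HTML for each:
import hashlib
import urllib.error
import urllib.request

# The first URL is a made-up variant from above; the second is my guess at
# the real base URL (the variant with the wildcard ending stripped off).
urls = [
    "http://www.thehairroom.co.uk/Tigi-Rockaholic-797658/something-else/norec",
    "http://www.thehairroom.co.uk/Tigi-Rockaholic-797658",
]
for url in urls:
    try:
        body = urllib.request.urlopen(url).read()
        # Identical hashes confirm duplicate content; bear in mind dynamic
        # bits (session IDs etc.) can make the hashes differ slightly even
        # when the page is effectively the same.
        print(url, "-> 200, body hash", hashlib.md5(body).hexdigest())
    except urllib.error.HTTPError as e:
        print(url, "->", e.code)  # a 404 on the variant means the bug is fixed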
-
RE: Crawl Diagnostics - How to find where broken links are located?
+1 for Xenu - run Xenu over your site, and when it finds a broken link you can get a list of the pages it appears on.
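If you ever need a quick-and-dirty version of the same idea without Xenu, here's a rough Python sketch (standard library only, and the pages list is a placeholder) that checks the links on a handful of pages and reports each broken one together with the page it was found on:
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def status_of(url):
    try:
        return urllib.request.urlopen(url).status
    except urllib.error.HTTPError as e:
        return e.code
    except urllib.error.URLError:
        return "unreachable"

# Placeholder list - fill in the pages you want to check.
pages = ["http://www.example.com/"]
for page in pages:
    collector = LinkCollector()
    collector.feed(urllib.request.urlopen(page).read().decode("utf-8", errors="replace"))
    for link in collector.links:
        full = urljoin(page, link)
        if not full.startswith("http"):
            continue  # skip mailto:, javascript:, etc.
        code = status_of(full)
        if code in (404, 410, "unreachable"):
            print("Broken link:", full, "found on", page)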
-
RE: Multilingual drupal 7
Do those pages still exist? If they do, and there's a reason for them existing but you don't want them to appear in the index, then:
1. Block spider access to the pages with robots.txt
e.g.
User-agent: *
Disallow: /es/node/100
etc.
2. As you're on D7, use the Drupal Metatag module to set up a robots "noindex" meta tag on those pages
Google say they'll remove pages from the index which have a noindex meta tag on them (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710&from=61050&rd=1)
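If you go the robots.txt route, you can sanity-check the Disallow rule with Python's built-in robots.txt parser before relying on it - a small sketch, with www.example.com standing in for your own domain:
import urllib.robotparser

# www.example.com is a stand-in for your own domain.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()

print(rp.can_fetch("*", "http://www.example.com/es/node/100"))  # want False (blocked)
print(rp.can_fetch("*", "http://www.example.com/es/"))          # other pages still allowed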
-
RE: Faceted navigation, Affiliate links, Meta descriptions - Oh My!
Donford's answer is good then - Google are probably ignoring those meta description tags as not relevant to the search query. So it's not that they won't display them (assuming they're correctly set up), it's more that they're being ignored because they're not relevant.
From the horse's mouth: "Google will sometimes use the meta description of a page in search results snippets, if we think it gives users a more accurate description than would be possible purely from the on-page content."
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35624