Hi Marina,
If I understand your question correctly, you just don't want your Tumblr blog to be indexed by Google, in which case these steps will help: http://yourbusiness.azcentral.com/keep-tumblr-off-google-3061.html
Regards,
George
Hi Carly,
It needs to be done to each of the pages. In most cases, this is just a minor change to a single page template. Someone might tell you that you can add an entry to robots.txt to solve the problem, but that won't remove them from the index.
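Assuming the change in question is adding a robots noindex tag (rather than anything platform-specific), it would look like this in the <head> of the page template:
<meta name="robots" content="noindex">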
Looking at the links you provided, I'm not convinced you should deindex them all - as these are member profile pages which might have some value in terms of driving organic traffic and having unique content on them. That said I'm not party to how your site works, so this is just an observation.
Hope that helps,
George
Sites that have algorithmic or manual penalties can still have sitelinks. I have no idea if your site has a penalty or not, I'm just saying there's a real risk from looking at your backlink profile that you may have an algorithmic penalty now, or will get one (or manual penalty) in future. It depends on a number of factors, including the accuracy of your disavow.
If you're starting to rank well for competitive keywords, then that would be a case for staying put. Your visibility has yet to fully update in SearchMetrics - it's still low, though it obviously jumped when you launched the new site.
If you moved to the .net, you would have to give up on the .com, as redirects from it would pass the toxic link equity to your new domain. In the worst-case scenario, if you stuck with the .com, built lots of quality content, got links and promoted the brand, and then your bad link past caught up with you, it would be very difficult to cut free and move to a new domain.
That said, if you moved to the .net now, there's equally no guarantee that this is completely necessary (as you said, you're starting to rank). You're clearly aware that it's a huge decision to make and not one that you would want to take lightly.
Since you're at the point where you're just setting up a new site and probably about to pump money into marketing, it pays to be aware of the options. I don't have enough information to say which one you should take.
Hope that helps,
George
Hi again,
The horse may have bolted on this particular issue, but here's what I would have done in your position:
If there's no existing traffic to the domain that you want to keep, and the .com isn't critical to the branding (it's not in your logo), then personally I would have put the site on another domain that you own already (e.g. moneysite.net - assuming that is clean) and just killed the .com.
Having fought through a few Penguin penalties for existing brands, I can't imagine anything worse than launching a new site/brand that has someone else's dirty link laundry attached to it. There's still a chance you might get a manual penalty in future, which will hang over you like an axe.
It really depends on how much resource you have to start building real quality content that gets links and shares, and to keep on top of your disavows and potentially ongoing link removals.
It also depends on how critical organic traffic is to your business. If you have $50K a month to throw at PPC or affiliates then it may not matter.
George
Your site appears to be indexed OK, but your visibility is low. I checked, and "money site" is a low-competition keyword that you should be ranking better for.
Taking a look at your backlink profile (opensiteexplorer.org), it appears that there are a ton of toxic links pointing to the domain. This is almost certainly going to affect your rankings through Google Penguin, unless someone's already gone through a stringent disavow process.
Before you launched a new site on this domain, was it vetted to see if your predecessors had done any link building badness?
George
I would throw HTTP 410s for them all if they don't get traffic. A 410 carries a little more weight than a 404, and we're not talking about a small number of pages here. I wouldn't redirect them to the homepage, as you'll almost certainly get a ton of "soft 404s" in WMT if that's done all at once.
Matt Cutts on 404 vs 410: https://www.youtube.com/watch?v=xp5Nf8ANfOw
If they are getting traffic, then it'll be a harder job to unpick the pages that have value.
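If your server is Apache (an assumption on my part), returning the 410s can be as simple as a couple of .htaccess lines - these paths are made up, so adjust them to your own URL patterns:
Redirect gone /some-retired-page
RedirectMatch gone ^/retired-section/.*$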
George
Hi,
You'll need to provide the site details if you need help in diagnosing a penalty.
As a starting point I would log into Google Webmaster Tools to see if a manual penalty has been applied, I would also look in Analytics to see if your organic traffic overall has dropped across other pages on your website.
An algorithmic penalty is harder to diagnose, but can usually be recognised by aligning traffic drops with dates of Google algorithm releases.
George
Hi Monica,
It's almost certainly an issue related to the Backlinker plugin given that error message, though clearly it's not a straightforward fix. I found this post on the WordPress forum - perhaps this is your issue too (by member pee_dee):
"Look in header.php inside your current theme and find this line:
http://www.4llw4d.freefilesblog.com/jquery-1.6.3.min.js
This server is no longer able to provide the .js file linked to your theme. I found mine at:
http://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.6.3.min.js
Get a hold of the .js file (or google the heck out of the .js file you need) and point to it on your server."
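In practical terms - assuming your theme loads jQuery with a standard script tag in header.php - that means swapping the broken reference for a working copy, something like:
<script type="text/javascript" src="http://ajax.aspnetcdn.com/ajax/jQuery/jquery-1.6.3.min.js"></script>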
Hope that works
George
Hi,
I see a couple of assumptions in your question. The first is that a "keyword rich domain" will help you rank - it's becoming a less significant ranking factor in SERPs, so I wouldn't base the migration of an existing website that performs pretty well on the potential of a new domain targeting certain KWs.
The second assumption is that your existing domain is ranking purely because it's older. There are likely to be other factors at play here - particularly backlinks.
However, I realise that you need to restructure the website, and moving to a single domain with the complexes in subdirectories makes sense architecturally. You may well see a drop in rankings while you do this migration, so if organic is a key acquisition channel, investigate PPC options to bolster your traffic in the meantime.
As for the 301 - I agree it makes sense for users to 301 to the complex's subdirectory; however, Google Webmaster Tools doesn't support the migration of one domain to a subdirectory of another domain, which means it won't be as seamless as migrating to the root of the new domain.
One way around this would be to redirect the old domain to the root of the new domain, but provide very clear navigation so users can get to the relevant apartment complex. From a user's point of view, I would see that as an acceptable solution.
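To illustrate that workaround (assuming Apache, and using old-complex-domain.com / main-domain.com as placeholders), the old domain would simply send every request to the new root:
RedirectMatch 301 ^/ https://www.main-domain.com/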
George
It looks like this error is caused by a plugin you have installed and enabled on your WordPress site that probably isn't compatible with the version of WordPress you're running. If you disable the Backlinker plugin it will probably go away.
As for SEO impact - it appears to also have mangled your /robots.txt (which you should fix), and the user experience of seeing this error is poor and so it's worth fixing.
George
Link wheels are a pretty old school tactic and Google Penguin (links) & Panda (thin content) stamped out the wide-scale use of them.
Here's what I'd do in your situation:
1. Report his website(s) to Google, giving as much information as possible. The more information you can collect on the link wheel sites the better: https://www.google.com/webmasters/tools/paidlinks?pli=1&hl=en
2. There's no point you disavowing nofollow links to your website as they aren't passing link equity. You should only disavow them if they are followed links - for example if he was trying to get you a Google penalty by making it look like you were part of a paid link scheme (there's an example disavow file sketched after this list).
3. Have a look at the highest quality backlinks his websites have (open site explorer). Chances are he has decent links outside of his link wheel that you don't if he's ranking above you. Take a look at his domain authority to get a general sense of how strong his organic profile is.
4. Take a long, hard look at your own site, content, offering and backlinks and try to improve it. Can you create engaging content for your customers? Can you create a unique proposition that will make you stand apart from your competitors?
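On point 2, if you ever do find followed toxic links and need to disavow, the disavow file itself is just plain text along these lines (made-up domains):
# link wheel domains - hypothetical examples
domain:link-wheel-site-1.com
domain:link-wheel-site-2.com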
All in all, despite the frustration I would avoid agonising over any dubious SEO tactics being used by your competitors - so long as they aren't negative SEO attacks on you. If they're willing to take such short-sighted risks then they risk long term harm to their business.
George
A developer who tells you "W3C validation isn't important" is like a house builder telling you "Those small cracks in the walls are nothing to worry about".
George
Google has a policy for this - what you're doing is not advisable - you should be annotating the URLs. You can read the correct approach to take here: https://developers.google.com/webmasters/mobile-sites/mobile-seo/configurations/separate-urls
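In practice the annotations are just two tags (example.com and m.example.com are placeholders here): the desktop page points to its mobile equivalent, and the mobile page canonicals back to the desktop page.
On www.example.com/page:
<link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.example.com/page">
On m.example.com/page:
<link rel="canonical" href="http://www.example.com/page">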
Hi Jarrett,
Although the menus probably look different in your designs (an assumption on my part), the HTML looks identical on the link you provided (ULs/LIs). If the HTML is the same, then you'll use CSS to vary the appearance of them - specifically using viewport-based media queries on responsive mobile, which are designed for exactly this scenario.
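As a rough sketch (assuming a hypothetical .main-nav class on that UL), the same markup can simply be restyled at a mobile breakpoint:
.main-nav li { display: inline-block; } /* desktop: horizontal menu */
@media only screen and (max-width: 640px) {
.main-nav li { display: block; } /* mobile: stacked menu */
}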
Perhaps I'm missing some other dev reason why it can't be done, but using AJAX for this, even if you do attempt to block Google from crawling it, sounds like an over-engineered solution.
George
The official Google line would be to make them nofollow so it really depends on what your appetite for risk is.
In terms of whether your brand name is actually also a commercial keyword - Google it, and if you're ranking top then in theory it's being recognised as a brand.
In practice you will probably be able to get away with your brand name, or your full website address as the anchor text.
George
Some good responses already. I would add that if you're not already segmenting your audience then you definitely should be, to make sure you're measuring the 'real' performance. For example, if in your 180k subscriber list you have 90k people who haven't placed an order in 90 days and 10k customers who order with you every month, then your open rates within the 'engaged' proportion of users will be swamped by the staleness of the rest of the list. Subscriber lists grow with growing businesses and naturally develop dead wood, so churning out the same sorts of emails means the stats can gradually decline over time.
You can combat this by (very simply) segmenting your 'active' base from your inactive base - by all means send them the same email but track their stats separately. Then when you start to invest in your emails, you'll be able to see if your active base are affected, rather than them all being lumped together with any increases/decreases in the performance of key sections of your customer base being concealed.
Finally, in terms of email performance, I would treat CTR/open rate as purely informative, because really it's revenue and margin that matter to the business.
Hi,
This isn't the best forum for this question as it's about IIS configuration - you'd be best off hitting up Windows Server configuration forums.
It is possible to do what you want to do. Dynamic (application) content in IIS needs to run under an application pool in order to be processed. You do this by creating an application under the website in the IIS manager.
Static content typically should sit under a different virtual directory (doesn't need an application). This means you can set it to be cached to improve page load time for users.
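As a sketch only (assuming IIS 7+ - do check this with whoever manages the server), a web.config like this in the static virtual directory would get browsers caching those files for a week:
<configuration>
  <system.webServer>
    <staticContent>
      <clientCache cacheControlMode="UseMaxAge" cacheControlMaxAge="7.00:00:00" />
    </staticContent>
  </system.webServer>
</configuration>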
My advice would be to go back to the developer and look at his dev server set up, then copy it for your live server. Sorry it's hard to give you any more advice without a lot more information on the environment and code.
George
Hi,
Moz won't report this as an issue because there's no such detectable issue as having duplicate content on the same page. Duplicate content issues are discovered between two or more pages.
You could AJAX the mobile menu, but given that Google can and will crawl with JavaScript enabled, it will probably still come across the two menus.
Personally I'd tell the developer to come up with a responsive solution. Looking at the markup, I'm guessing there will be at least some similarities between desktop and mobile menu experiences. Possibly a bit more painful in the short term, but worth it in the long run.
George
Thanks Max, your feedback makes complete sense.
KW volume analysis is a big job but manageable, though I'm not even sure where I'd start with analysing whether people buy or not based on certain organic KWs. I'd probably have to set up AdWords campaigns and test conversion rates? Across a long tail of keywords, that's going to be expensive to get statistically significant results.
Assuming that I don't have the resources to do that immediately, but that I do have a duplicate content issue (at least Moz seems to think so), am I better off "fixing" it with my proposed solution, or would you hold off until the KW analysis was done? This section of the site gets very little organic traffic at the moment as it's also a very competitive space and it doesn't have many inbound links, so the risk of causing damage is low. I'm reluctant to start promoting this section and linking to it if I know there's a significant underlying duplicate content problem.
You're right about the URL too - it actually starts /Candy-Dispenser-Candies-Refills/*, I didn't think I'd get picked up on that!
Thanks,
George
Hi all,
I’m looking for some expert advice on use of canonicals to resolve duplicate content for an e-Commerce site. I’ve used a generic example to explain the problem (I do not really run a candy shop).
SCENARIO
I run a candy shop website that sells candy dispensers and the candy that goes in them. I sell about 5,000 different models of candy dispensers and 10,000 different types of candy.
Much of the candy fits in more than one candy dispenser, and some candy dispensers fit exactly the same types of candy as others.
To make things easy for customers who need to fill up their candy dispensers, I provide a “candy finder” tool on my website which takes them through three steps:
1. Pick your candy dispenser brand (e.g. Haribo)
2. Pick your candy dispenser type (e.g. soft candy or hard candy)
3. Pick your candy dispenser model (e.g. S4000-A)
RESULT: The customer is then presented with a list of candy products that they can buy, on a URL like this:
Candy-shop.com/haribo/soft-candy/S4000-A
All of these steps are presented as HTML pages with followable/indexable links.
PROBLEM:
There is a duplicate content issue with the results pages. This is because a lot of the candy dispensers fit exactly the same candy (e.g. S4000-A, S4000-B and S4000-C). This means that the content on these pages is basically the same, because the same candy products are listed. I'll call these the "duplicate dispensers", e.g.
Candy-shop.com/haribo/soft-candy/S4000-A
Candy-shop.com/haribo/soft-candy/S4000-B
Candy-shop.com/haribo/soft-candy/S4000-C
The page titles/headings change based on the dispenser model, but that's not enough for the pages to be deemed unique by Moz. I want to drive organic traffic from searches for the dispenser-model candy keywords, but I'm guessing that duplicate content like this is holding these dispenser pages back from ranking.
SOLUTIONS
1. Write unique content for each of the duplicate dispenser pages: Manufacturers add or discontinue about 500 dispenser models each quarter and I don’t have the resources to keep on top of this content. I would also question the real value of this content to a user when it’s pretty obvious what the products on the page are.
2. Pick one duplicate dispenser to act as the rel=canonical target and point all its duplicates at it. This doesn't work well because dispensers get discontinued, so I run the risk of losing canonicals at random or having them change as models become unavailable.
3. Create a single page with all of the duplicate dispensers on, and canonical all of the individual duplicate pages to that page.
e.g. Canonical: candy-shop.com/haribo/soft-candy/S4000-Series
Duplicates (which all point to canonical):
candy-shop.com/haribo/soft-candy/S4000-Series?model=A
candy-shop.com/haribo/soft-candy/S4000-Series?model=B
candy-shop.com/haribo/soft-candy/S4000-Series?model=C
PROPOSED SOLUTION
Option 3.
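To make that concrete, each of the model URLs would carry the same canonical tag pointing at the series page, e.g. on candy-shop.com/haribo/soft-candy/S4000-Series?model=A (and B and C):
<link rel="canonical" href="https://candy-shop.com/haribo/soft-candy/S4000-Series" />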
Anyone agree/disagree or have any other thoughts on how to solve this problem?
Thanks for reading.
1. If you add just noindex, Google will crawl the page, drop it from the index but it will also crawl the links on that page and potentially index them too. It basically passes equity to links on the page.
2. If you add nofollow, noindex, Google will crawl the page, drop it from the index but it will not crawl the links on that page. So no equity will be passed to them. As already established, Google may still put these links in the index, but it will display the standard "blocked" message for the page description.
If the links are internal, there's no harm in them being followed unless you're opening up the crawl to expose tons of duplicate content that isn't canonicalised.
noindex is often used with nofollow, but sometimes this is simply due to a misunderstanding of what impact they each have.
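For reference, the two variants in the page's <head> are:
1. <meta name="robots" content="noindex">
2. <meta name="robots" content="noindex, nofollow">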
George
Hi,
I took a quick look at your site, sitemap and index status: there are only 25 URLs in Google, but many more in the sitemap.
What I couldn't work out is where the /item-details/ URLs in the sitemap are linked from on your website - I can't get to them through Buying -> Catalogue. It won't help your indexing status if they aren't linked to from anywhere.
The biggest issue you have however is the way canonicals are set up on the problem pages. If you go to this page:
https://www.wilkinsons-auctioneers.co.uk/item-details/?ID=2710
It has the following canonical (without the id):
<link rel='canonical' href='https://www.wilkinsons-auctioneers.co.uk/item-details/' />
If you search on Google, that canonical URL is indexed, so if you fix this by adding the ID to the canonical, the item pages should start to appear in SERPs.
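In other words, that item page should output its own full URL (including the ID) as the canonical, i.e.:
<link rel='canonical' href='https://www.wilkinsons-auctioneers.co.uk/item-details/?ID=2710' />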
You have exactly the same problem on your auctions pages. e.g. https://www.wilkinsons-auctioneers.co.uk/auction-items/?id=13&pagenum=51
Another point that will help you rank is to use friendlier / more descriptive URLs for the items.
Hope that helps
George
Hi Lee,
The foundation site idea sounds like a real roundabout way of achieving organic traffic and hence sales - which from a high level I'm assuming is what you're trying to achieve. It would perhaps make more sense if you were going to use the Foundation site to drive referrals, or to use for PR, rather than solely for link equity purposes.
It wouldn't take much for Google to work out that the foundation site is a bit of a cynical attempt to gain rankings.
If I was you I'd focus on improving the content and linkability of your client's existing site and address some of the branding issues head on rather than side-stepping them with a sister website. You can incorporate the "foundation" idea into the existing website (perhaps on a subdomain or directory), which if done properly - with valuable content - will earn natural links and therefore gain far more organic value than having a sister website.
George
I've never come across any reason to be concerned about losing Page Authority by having a page canonical to itself.
No need to be concerned. Aside from all the really well documented best practices on canonicals, in your original question you've spotted at least one big site that does this. They pay the SEO big bucks and rank well.
Yes, this is a good idea as it's a catch-all for URLs that might include tracking parameters, or other parameters that don't affect the page content. When there are no tracking parameters, it would be more development and testing work to hide the canonical, and having it there doesn't cause any issues. It's also quite a brutal but effective catch-all if your page is accidentally accessible via other URLs - e.g. non-www or https.
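As an illustration (example.com is a placeholder), a page reached via a tracking URL would still declare its clean self as the canonical:
URL crawled: https://www.example.com/some-page/?utm_source=newsletter
<link rel="canonical" href="https://www.example.com/some-page/" />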
George
Hi,
I migrated a load of product category pages on one of my websites recently to cleaner URLs, and to force the crawl I submitted the new URLs (and their children) for indexing via WMT. This was to pick them up quickly - and it worked (within seconds). The old URLs appearing were never a problem. However, there are limits to the number of times you can do this, so that might be a sticking point for your solution as I'm guessing you have lots of products. Try it with one page (a low-traffic, low-selling product!) and see what happens - and let us know.
It's possible Google is holding onto your old URLs because they have a number of inbound links and the crawl will eventually catch up to only display the new URLs if you give it time.
Aside from agreeing with the sitemap submission suggestion, I'd also triple check that your 301s / canonicals are set up properly on your website's old URLs by firing Screaming Frog or another crawler at it.
George
Hi Graeme,
For old product pages - your solution is good regarding showing users alternatives to the out-of-stock products. No need for an "out of stock page", as there's no value in that for crawlers or users. Regarding point 2 - if you redirect discontinued product pages to category pages, that should be fine, although Google may regard it as a soft 404. If there are loads of products like this and you 301 them all in one go, then the chances are it will flag up in Google WMT. If there are a small number and you introduce them gradually, then you'll probably be fine.
For the crawl errors question, adding value to the pages in terms of related products is a good solution if that's viable and the pages will be different enough from each other (i.e. no duplicate content). One thing that isn't clear at the moment is if you're redirecting empty category pages all to the homepage - or if it's possible to redirect or canonical them to their parent category.
e.g. For home -> clothing -> men's clothing -> shoes
If all the men's shoes are discontinued, then redirect that page to men's clothing rather than to the homepage. This reduces your chances of getting a soft 404, and is also arguably a better user experience.
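If the site runs on Apache (an assumption on my part), that kind of redirect is a one-liner in .htaccess - hypothetical paths, obviously:
Redirect 301 /mens-clothing/shoes/ /mens-clothing/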
Hope that helps,
George
Yes, it's the worst possible scenario - they basically get trapped in SERPs. Google won't crawl them again until you allow crawling; then you can set noindex (to remove them from SERPs), and then add noindex,nofollow back on to keep them out of SERPs and to stop Google following any links on them.
Configuring URL parameters again is just a directive regarding the crawl and doesn't affect indexing status to the best of my knowledge.
In my experience, noindex is bulletproof but nofollow / robots.txt is very often misunderstood and can lead to a lot of problems as a result. Some SEOs think they can be clever in crafting the flow of PageRank through a site. The unsurprising reality is that Google just does what it wants.
George
Hi Rafal,
The key part of that statement is "we might still find and index information about disallowed URLs...". If you read the next sentence it says: "As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results".
If you look at moz.com/robots.txt you'll see an entry for:
Disallow: /pages/search_results*
But if you search this on Google:
site:moz.com/pages/search_results
You'll find there are 20 results in the index.
I used to agree with you, until I found out the hard way that if Google finds a link, regardless of whether it's disallowed in robots.txt or not, it can put the URL in the index, and it will remain there until you remove the crawl restriction and noindex it, or remove it from the index using Webmaster Tools.
George
SearchMetrics would be a good place to start - you won't get individual keyword historic performance but it will show your website's overall SEO visibility over time. Particularly useful for tying in with Google algorithmic updates.
George
Hi Finnmoto,
You're in luck - it does use a 301 for that homepage redirect. The results of this test were brought to you by the mighty Fiddler (http://www.telerik.com/fiddler).
I've migrated pages like this before and it can take a bit of time for the dust to settle. Remember you've migrated an entire website to a new subdomain in one go and that takes time for Google and other services to process (depending on how authoritative your site is).
It's worth crawling your entire site's old URL structure with ScreamingFrog to check the redirects were implemented correctly.
Regards,
George
+1 what Martijn says.
Even though the pages have no traffic, Google may still be crawling them if they have links, so there may still be some equity in keeping them. If this is a concern you can redirect them to the most relevant (quality) page on your site. If it's only a few pages this is fine, but any more than that and they might come up as "soft 404s" in Google WMT.
You can check if Google is crawling the pages by checking your web logs. That said, if the content is of no value and they don't have any decent links then I would just delete them and move on.
Regards,
George
Hi Tanveer,
It's hard to answer your questions without seeing the raw data. I presume these are external rather than internal links, and that they are genuinely new as opposed to only just having been discovered. I would start with going to Webmaster Tools, downloading your latest links and having a look at where they are coming from.
There could be a number of reasons for this, and so there's no point me speculating and you're right to investigate further. Using a link profile checker such as cognitiveseo.com will give you a clearer idea on the quality of any new links you acquire.
Feel free to post more information if you need,
Regards
George
The problem you're describing is almost exactly the reason why canonical URL functionality exists. Just pick your canonical (with or without slash - it doesn't matter) and make sure you roll it out consistently across your website and sitemap.
Regards,
George
Hi Nicole,
Personally I've had lots of issues getting images indexed on large websites - and I've come across other webmasters with the same problems. If you really want to get diagnostic then you need to start splitting out content into different sitemaps as SEO-Buzz suggests so you can see a clearer breakdown in Webmaster Tools.
Another approach you might want to try is doing some image link building - your image content is ripe for being active on Pinterest and other photo sharing platforms. Getting the content placed like this should help with indexing.
Regards,
George
Just adding this to robots.txt will not stop the pages being indexed:
Disallow: /*login?
It just means Google won't crawl the page itself (and so won't follow the links on it).
I would do one of the following:
1. Add noindex to the page. PR will still be passed to the page but they will no longer appear in SERPs.
2. Add a canonical on the page to: "www.exemple.com/user/login" - both options are sketched below.
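Using the URL from your question (with a scheme added, since a canonical should be an absolute URL):
Option 1: <meta name="robots" content="noindex">
Option 2: <link rel="canonical" href="http://www.exemple.com/user/login" />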
You're never going to try and get these pages to rank, so although it's worth fixing I wouldn't lose too much sleep on the impact of having duplicate content on registration pages (unless there are hundreds of them!).
Regards,
George
I bumped SEO-Buzz's answer.
In practice, sometimes it's impossible to write unique product descriptions. I've worked with websites that have over 15K products, with around 5K changing every quarter. To keep on top of that you'd need an army of copywriters.
In that situation I would recommend doing the following:
1. Make sure your category / hub page content is awesome / unique / relevant. These will then be your main landing pages.
2. Pick key product sections - based on high margin / good stock availability / competitive pricing - and update the product descriptions for them. If your client sees improved sales as a result, they will probably roll out this strategy to the rest of the site.
Hope that helps
George
Hi there,
Any reputable affiliate should be adding nofollow to outbound links, and I would be concerned if they weren't. Even if PR isn't passed on 302s, what's stopping the affiliate from making them 301s or simply direct links in future?
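For clarity, a nofollowed affiliate link just carries the rel attribute, e.g. (placeholder URL):
<a href="https://www.your-site.com/landing-page" rel="nofollow">Product name</a>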
Blocking GoogleBot using robots.txt on landing pages isn't going to stop Google indexing URLs it finds - you would have to use canonicals or noindex on the page for that. Even that wouldn't negate the impact of inbound links from potentially "toxic" websites.
I would say that you can't have your cake and eat it. If you're getting traffic and sales from these affiliate links then they're good for business. However, if the sites are poorly regarded by Google and they aren't using nofollow on the links, there's a good chance you will be penalised in organic listings.
I have used affiliates in the past, and learnt to keep them on a tight leash.
If you're concerned about your link profile I can recommend using cognitiveseo.com to analyse it.
Regards
George
Hi Sika,
What you're seeing isn't anything to be concerned about, and Moosa has already answered the cannibalisation part. I'd advise against tracking positions for individual keywords using a single tool from one week to the next. There's a natural fluctuation in rankings, and Moz is taking individual snapshots that might be up one day and down the next, so they're only really meaningful when tracked over a longer period of time. You're also not taking into account the long tail - bunches of similar keywords which may not be fluctuating nearly as much as the one you're tracking.
The real indicator of where you rank should be the organic traffic to the page. I doubt that traffic will be fluctuating even nearly as much as the positions for the few keywords you are looking at.
As for algorithmic negative impact - you would probably see significant drops across multiple tracked keywords if this was the case - and those drops would be sustained until you diagnosed and fixed the problem.
Regards,
George
I'm not certain "duplicate" means exactly the same thing when it comes to titles. I've seen instances, particularly on large ecommerce sites, where titles are blatantly auto-generated and are not displayed by Google in SERPs:
e.g. "buy <parent category="">and <subcategory products="">from xyz shop at great prices"</subcategory></parent>
In my view it's likely that Google is aware of this from a quality-guidelines standpoint. Where possible, titles should be individually crafted.
I agree with the 2 responses above.
Your blog is probably ranking because it has more links/shares (or at least more recent links/shares) and potentially more relevant content. You should try improving the content on the laptops products page.
You should also make it very easy for someone visiting your blog to get to your products to purchase them. If the blog is well written and useful/engaging, this might be a good opportunity for your content marketing to form a key part of your customer journey.
Hi there,
As far as your platform goes, product name changes simply shouldn't be causing 404s, and this can be (relatively) easily avoided by introducing the product ID to the end of the URL. The name can then change, but the product ID remains the identifier used to load the product on the page.
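As a hypothetical example of that URL pattern:
/products/blue-widget-pro/12345 (today)
/products/blue-widget-max/12345 (after a rename - same ID, so the product still resolves, and the old URL can simply 301 to the new one)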
With regards to your 40K pages without meta titles or descriptions, it's going to be almost impossible to fix that manually. It sounds as though you need to establish a business case, which could be done by fixing a few hundred of them (based on the ones that get the most traffic) and seeing if there's any improvement. This might not have an impact, though, as it sounds as though they aren't doing well in SEO as it is, although I agree there's a chance that these poorly optimised pages might be hurting your overall rankings.
The challenge you face sounds more political/strategic than technical, though. Either SEO has actual/potential value to your business or it doesn't. If content producers aren't versed in SEO, or focused on maintaining it and producing optimised pages and content, then you probably have an uphill battle ahead of you to get them to focus on it.
Good luck,
George
Hi,
My thoughts are that I would only advise redirecting the page if it were staying on the same site. Since you're setting up a completely new website, you should link from the existing page on the existing domain to the new page on the new domain.
George
Hi Aaron,
First off, since your rankings haven't been affected I would definitely hold off changing anything in WMT unless you're sure, as it might cause more harm than good. If you paginate what looks like potentially thousands of pages, I'm not convinced Google will look on this fondly. The URLs will probably also change regularly as more companies are incorporated, because the pages are set to show fixed list lengths.
Resolving the duplicate content onsite is definitely the best course of action. The fact that Moz is crawling these duplicate pages indicates that it's picking up links from somewhere on your site. If you are able to stop exposing these links and only link to the "preferred version", i.e. the canonical, then this will give you some control and a better understanding of the site's information architecture.
Regarding setting up of canonicals, I suspect that this will be a harder job as of the 3 duplicate URLs you provide, it's not immediately clear which one would be the canonical. There are probably also thousands of instances similar to this duplicate group across other company lists and Google will have picked at random which one it sees as the canonical on each one. Marking another URL in the group as the canonical stands to (at least temporarily) cause a drop in rankings and SEO visibility if done across thousands of pages simultaneously.
If I was you and I felt compelled to address the issue, I would pick a sample of ~10% of the duplicate groups, set a canonical on each of them and see what happens to rankings over 3-6 weeks. I would also add the canonicals to a sitemap and try to update any links on your website to make sure only the canonical is referenced.
It's risky though, as your rankings are good, even though I understand the principle of what you're trying to achieve. When I've done things like this in the past, it's been when a website has had nothing to lose.
George
Hi Aaron,
The search experience on the website is a bit unconventional in that you search for a company name and it returns pages of results alphabetically listed with the name you are searching for hopefully in there somewhere!
You could make changes to the pagination using rel=next/previous, but what you're displaying isn't really "true" results pagination. I would therefore be cautious about changing it if the site is ranking well.
Canonicals would only be required if you were showing the same content on different URLs. A quick "site:" search like the below only returns one result, so either Google isn't showing the duplicate URLs (very likely given your question) or it isn't a problem for you:
site:www.formationsdirect.com inurl:companysearchlist.aspx?name=AMNA+CONSTRUCTION+LTD
You can look in webmaster tools to see which query string parameters it is picking up and configure the behaviour you want GoogleBot to take. You can also get some sense of the duplication if it is an issue.
Regarding the company page URL you gave, anything after the # in the URL won't get crawled so you don't need to worry about canonicalising those.
Again, if it's ranking well, be very careful about trying to solve a problem that doesn't exist. If you can find duplicate content then definitely redirect or canonicalise it and see what kind of impact it has. I would do this before taking on anything more significant like the website information architecture and navigation.
George
The only thing I would add to the existing responses is that if, following a "site:www.mysite.com" query, you notice that some key landing pages haven't been indexed, then submit them via Webmaster Tools (Fetch as Google).
I would also make sure your sitemap is up to date and submitted via WMT too. It will also tell you how many of the sitemap URLs have been indexed.
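If it's helpful, the sitemap itself is just an XML list of your canonical URLs - www.mysite.com here is the placeholder from the site: example above:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.mysite.com/key-landing-page/</loc>
  </url>
</urlset>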
These 2 things could speed up your re-indexing. My guess is that if it's a reputable site, and the migration of URLs was done properly, you'll probably get re-indexed quickly anyway.
George
Hi,
I've been badly burnt by agencies in the past offering "quality" link building services and have done quite a lot of work on dealing with a conundrum similar to yours. Here is my advice:
Good luck,
George
I think Devanur gives some good advice regarding the gradual improvement of the content, though you're stuck in a bit of a catch-22 with regard to how Google views websites: you want to be able to sell lots of products, but don't have the resources for your company to present them in a unique or engaging fashion. This is something that Google wants webmasters to do, but the reality of your situation paints a completely different picture of what will give your company a decent ROI for updating vast amounts of product content.
If there isn't an obvious Panda problem, I wouldn't just noindex lots of pages without some thought and planning first. Before noindexing the pages I would look at what SEO traffic they're getting. Noindexing is a tried and tested method of bypassing potential Panda penalties, and although PageRank will still be passed, there's a chance you'll remove pages from the index that are driving traffic (even if it's long tail).
In addition to prioritising content production for indexed pages per Devanur's advice, I would also do some keyword analysis and prioritise the production of new content for terms which people are actually searching for before they purchase.
There's a Moz discussion here which might help you: http://moz.com/community/q/noindex-vs-page-removal-panda-recovery.
Regards
George
@methodicalweb