Posts made by Dr-Pete
-
RE: Multiple H1 tags are OK according to developer. I have my doubts. Please advise...
I think Ryan's point about HTML5 is good to keep in mind, but the problem is that we don't have any great guidance on what Google thinks about HTML5 right now, at least at this level of detail. They're waiting for the standard to evolve into common practice, just like the rest of us. I suspect, though, that if HTML5 changes the rules, they may scale back how harshly they judge it.
-
RE: Multiple H1 tags are OK according to developer. I have my doubts. Please advise...
To be fair, how do you know that they're "spammy", "abusive", or "irrelevant"? I've seen people just use them badly - for example, for CSS styling. Is it a best practice? No. Would I do it? No. Will it have major SEO implications in 2012? Probably not.
I've seen instances where an H1 was used badly, but not in a deliberately spammy or even irrelevant way. Developers often treat tags as much more interchangeable than they should.
-
RE: Is there an easier way from the server to prevent duplicate page content?
Yes, a 301-redirect is almost always a server-level directive. It's not a tag or HTML element. You can also create one with server-side code (by sending the 301 status and Location in the HTTP response headers from your page code), but that's typically harder and usually reserved for special cases.
-
RE: Do search engines penalize for too many domain aliases?
I agree with Syed - it really depends on what you're doing with them. If they're just placeholders that have never had a site on them, Google probably won't notice at all. If these were previously interconnected micro-sites and then you turned around and funneled them into one site to capture all the link-juice, then it could certainly look a little manipulative. What's the purpose of the 20+ domains?
-
RE: Is there an easier way from the server to prevent duplicate page content?
As long as the tactic you use returns a proper 301, no one method is really better than another. Ryan's approach works perfectly well for Apache-hosted sites.
-
RE: Is there an easier way from the server to prevent duplicate page content?
In most cases, I don't find sitewide canonical tags to really be necessary, but if they're done right, they can't hurt. The trick is that people often screw them up (and bad canonicals can be really bad). I do like one on the home-page, because it sweeps up all the weird variants that are so common for home pages.
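For reference, a home-page canonical is just one line in the <head> - a minimal sketch, using a placeholder URL and assuming "www" is your preferred version:
<!-- placed in the <head> of the home page and its variants (e.g. /index.php or ?sessionid=... versions) -->
<link rel="canonical" href="http://www.example.com/" />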
-
RE: Is there an easier way from the server to prevent duplicate page content?
Expounding is what I do. Other people use different words for it...
-
RE: Is there an easier way from the server to prevent duplicate page content?
Just to clarify, the rewrite that Ryan is proposing IS a 301-redirect (see the "R=301") - it's just one way to implement it. Done right, it can be used sitewide.
It's perfectly viable to also use canonicals (and I definitely think they're great to have on the home-page, for example), but I think the 301 is more standard practice here. It's best for search crawlers AND visitors to see your canonical URL (www vs. non-www, whichever you choose). That leads people to link to the "proper" version, bookmark it, promote it on social, etc.
Make sure, too, to use the canonical version internally. It's amazing how often people 301-redirect to "www." but then link to the non-www version internally, or vice-versa. Consistent signals are important.
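For anyone who wants to see it spelled out, here's a minimal sketch of the kind of Apache rewrite Ryan is describing - this assumes mod_rewrite is available and that you've picked "www" as your canonical version (flip the condition around if you prefer non-www), with example.com as a placeholder:
# .htaccess sketch - assumes mod_rewrite and a "www" preference
RewriteEngine On
# if the request came in without the "www."...
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# ...301-redirect to the same path on the www version
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]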
-
RE: Masses (5,168 issues found) of Duplicate content.
Don't look at individual URLs - at the scale of 5K plus, look at your site architecture and what kind of variants you're creating. For example, if you know that the show= and sort= parameters are a possible issue, you could go to Google and enter something like:
site:example.com inurl:show=
(warning: it will return pages with the word "show" in the URL, like "example.com/show-times" - not usually an issue, but it can be on rare occasion).
That'll give you a sense of how many cases that one parameter is creating. Odds are, you'll find a couple that are causing 500+ of the 5K duplicates, so start with those.
Search pagination is very tricky - you could canonicalize to "View All" as Chris Hill said, you could NOINDEX pages 2+, or you could try Google's new (but very complicated) approach:
http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
Problem is, that doesn't work on Bing and it's pretty easy to mess up.
The rel-canonical tag can scoop up sorts pretty well. You can also tell Google in Google Webmaster Tools what those parameters do, and whether to index them, but I've had mixed luck with that. If you're not having any serious problems, GWT is easy and worth a shot.
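If you do try rel=next/prev, it's just a pair of tags in the <head> of each page in the series - a rough sketch for page 2, with placeholder URLs:
<!-- on page 2 of the paginated series; page 1 only gets rel="next", the last page only gets rel="prev" -->
<link rel="prev" href="http://www.example.com/widgets?page=1" />
<link rel="next" href="http://www.example.com/widgets?page=3" />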
-
RE: How should I deal with the affiliate URLs we create through our affiliate program?
It depends a bit on the scale - if you have a massive amount of affiliate links, they could get devalued, so some people have masked them in the past. The 302 does that, in a sense, but it also blocks the link-juice (or most of it). The 301 redirect would pass the link-juice, but there's a little more risk. If you're talking about 1000s of affiliate links and that's 90% of your link profile, I'd be careful. If it's just one part of your link-building strategy, it's probably not a big risk.
As Istvan mentioned, the canonical tag is another option. Visitors would still see the affiliate URL (which may or may not be ideal, depending on your own business model), but search engines would pass link-juice to the target page. In that case, I'd make the canonical page match the landing page (the home-page might not be a good bet) - it's a bit situational.
This post is a couple of years old, but it covers the basic options:
-
RE: Limiting On Page Links
Brent and Syed are correct - nofollow no longer preserves link-juice. As for the 100-links "rule", it's really just a guideline. I discuss it in depth here:
http://www.seomoz.org/blog/how-many-links-is-too-many
It's really a balancing act - the more links you have, the less love each link gets. It's not just SEOs - it's true for visitors, too (as Ryan pointed out). More options aren't always a good thing.
The trick is that the balance really depends a lot on the site. I've seen sites with 160 links that were well-designed with ample authority to make that work. I've seen others with 80 links where it was a complete mess.
It's also important to note that multiple links from a given Page A to the same Page B are discounted by Google - so, if you have a link in the navigation and then a link to that same page in the content and footer, the 2nd and 3rd links are basically ignored. We still count them as part of the 100, but Google doesn't in most cases. It's a little tricky, since Google probably views navigation links differently from contextual (in-page) links, but the rule still generally holds. Only one link from Page A to any given Page B is going to get counted.
-
RE: Should I outsource any SEO work or can I do it all?
I think @Axt makes a great point - it's good to learn enough on your own (if you have the time) to understand the basics and build up what you can control. Then, when you're ready to hire someone, you have enough experience to know how to guide them.
The trick with outsourcing (whether foreign or domestic) is to know what you want, specifically. If you just hire a company to "build links", and you don't define what that means, you leave them open to any tactics and any reporting method they choose. They could come back having done virtually nothing, or they could come back having built a network of spam that creates a lot of risk for you. It's fine to ask a company what they'll do and then build the plan from that, but make sure it's all on paper. Don't assume that what they mean by "link-building" and what you mean are the same thing - get it all down and agreed on.
-
RE: How to deal with an media press content?
Are you distributing those press releases to other sites (via RSS)? In other words, do they appear on outside sites in identical form?
If so, you could use rel-canonical cross-domain or a syndication-source tag and point your copies to the original press releases. That should help offset any duplicate content issues. It will keep you from ranking directly for those press releases, though.
It really depends on the goals/scope. It's not uncommon to cross-post press releases, and if you're talking about one a month or so on a site with dozens of pages (or more), it's not a big issue. If you're talking about 100s of press releases, then you could certainly run into trouble.
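In case it helps, here's a rough sketch of what the cross-domain canonical looks like - it sits in the <head> of the copy and points to the original press release (placeholder URL):
<!-- in the <head> of the syndicated/duplicate copy, pointing to the original release -->
<link rel="canonical" href="http://www.example.com/press/original-release" />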
-
RE: Duplicate Content Issue from using filters on a directory listing site
Is your site relatively new? I currently show no pages from it in the Google index at all, which makes the duplicate content issue a bit moot (at least in the short-term).
The search filters and pagination are somewhat different issues. You could META NOINDEX any pages with the filter parameters active, or rel-canonical them to the unfiltered version (as @Steve25 said). Since no pages are indexed yet, you could also just "nofollow" the filter links ("Title", etc.), which should help prevent those filtered versions from getting crawled.
Pagination (pages 2+ of search) is a trickier issue, but it might be best to just NOINDEX, FOLLOW those. You could also let Google know in Google Webmaster Tools that the page= parameter is for pagination (I've found that to be hit-or-miss, but it is easy relative to other solutions).
For the empty profiles, it really depends on the scope. If you have a lot, I'd ideally want to code them to return a META NOINDEX while they're empty, then lift the NOINDEX once they have content posted. You'd have to do that dynamically, but it shouldn't be too tricky. That way, Google would only see new pages once they have some content in place.
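For reference, the META tag in question is just the line below, placed in the <head> of the filtered/paginated URLs (or emitted conditionally by your profile templates while a profile is empty) - a sketch, not specific to your platform:
<!-- tells Google to drop the page from the index but still follow its links -->
<meta name="robots" content="noindex, follow">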
-
RE: Duplicate pages, overly dynamic URL’s and long URL’s in Magento
I'm actually a fan of selectively (programmatically) NOINDEX'ing like that. I find that the GWT parameter blocking doesn't always scale well. I'm running into a lot of clients trying to use it on 100s or 1000s (or millions, actually) of pages and Google is mostly ignoring it. Very frustrating.
We're working on features to let you ignore certain warnings/notices if you feel they don't apply, but I do believe in being proactive about indexation issues. I think they matter a lot more than they used to, especially post-Panda.
I would double-check to see if there's a Magento plug-in to help, as this could be a common problem. Unfortunately, we don't have any Magento experts on-staff. I'll leave this open as a discussion question, in case any members have specific expertise.
-
RE: Duplicate Home Page content and title ... Fix with a 301?
Just a side note - your home-page TITLE is "Peruvian Soul | Peruvian Soul" - not that it's likely to look spammy to Google, but it just looks odd and isn't really helping anything. Might be an artifact of your CMS.
-
RE: How to block "print" pages from indexing
Sorry, but I have to jump in - do NOT use all of those signals simultaneously. You'll make a mess, and they'll interfere with each other. You can try Robots.txt or NOINDEX on the page level - my experience suggests NOINDEX is much more effective.
Also, do not nofollow the links yet - you'll block the crawl, and then the page-level cues (like NOINDEX) won't work. You can nofollow later. This is a common mistake and it will keep your fixes from working.
-
RE: How to block "print" pages from indexing
Rel-canonical, in practice, does essentially de-index the non-canonical version. Technically, it's not a de-indexation method, but it works that way.
-
RE: How to block "print" pages from indexing
I have to agree with Jen - Robots.txt isn't great for getting indexed pages out. It's good for prevention, but tends to be unreliable as a cure. META NOINDEX is probably more reliable.
One trick - DON'T nofollow the print links, at least not yet. You need Google to crawl and read the NOINDEX tags. Once the ?print pages are de-indexed, you could nofollow the links, too.
-
RE: Redirection
Yeah, I'm 100% with Keri - that seems like a really dangerous strategy - 1000s of sub-domains are likely to fragment your site and this is going to look like a low-value tactic to Google. The ranking ability of that extra keyword will be very small, and you could just as easily put it in a directory.
You're also at some Panda risk generating 1000s of pages (sub-domains, etc.) just to rank for keywords. In most cases, this is thin content. Granted, I don't know the details of what you're doing, but I'd strongly advise against this.
-
RE: Has anyone ever shared a list of all their backlink sources?
Just a heads up - there's chatter across the SEO industry that Google has been targeting (penalizing) blog networks the past couple of weeks and may be getting more aggressive. So, be careful out there.
-
RE: Help I don't understand Rel Canonical
Yeah, unfortunately, on a CMS or template-driven site, it's really easy to put a canonical tag in place that impacts the wrong pages. Hopefully, you caught it in time.
The 270 notices in our system are just telling you that 270 URLs we crawled had a canonical tag pointing to a different page. In this particular case, it was a problem, but it isn't always an issue.
Unfortunately, with the bad canonical tag in place, it's tough to tell why they were there before. This is usually just a notice, and non-critical, but once the bad canonicals clear out, let us know if you're still getting the notice (it may take a couple of weeks to go away).
-
RE: SeoMoz duplicate content
We do get a lot of scrapers, and Google doesn't manage it all that well, unfortunately. There are a couple of cues:
(1) As Harald said, Google does try to determine which came first. This can be tough, because auto-scrapers actually can get indexed before sources in some cases. Having a solid crawl structure, XML sitemaps, pinging relevant sites, etc. can help.
(2) If the sites link back to you (on purpose or accidentally, by including links you put in the content), it's a signal to Google that you're the source.
(3) If you're a high-authority site, you've generally got an edge. Most of our scrapers are pretty weak sites, so we're not in much danger. Unfortunately, I've seen times when scrapers outranked original content.
Proper syndication, with back-links or other signals (like syndication-source or cross-domain canonical) can definitely be good for sites. Having a ton of scrapers on a site that's relatively new/weak can be very negative, unfortunately. Then again, most new sites don't have a ton of scrapers, so it does balance out a little.
-
RE: Duplicate content issue
Sorry, I'm slightly confused. Are the wiki and forum duplicating each other, or are they each duplicating content on your root domain? Even in a sub-domain, they could be diluting content in your root domain, but it really depends a lot on the situation and the extent of the duplication.
You could use the canonical tag to point them to the source of the content, or you could block them (probably META NOINDEX), but I'd like to understand the goals a bit better.
-
RE: Negative SEO?
Yes - if the pages/links are gone, they will get discounted - it just takes time. If they all came to one page and that page is non-essential, you could temporarily NOINDEX it until the bad links clear up. Depending on the page, though, that's a bit drastic. That's really the only way to cut a link from the receiving end.
-
RE: SEO problem if homepage is 2 folders deep?
I should point out that directory "level" in the URL is not the same as how deep a page is. If a page that's one click off the home page sits 4 directory levels deep in the URL, the crawlers still see it as one layer down, because that's how they reached it. I agree that this structure is far from ideal, but I don't think it would automatically harm your internal PR flow.
-
RE: SEO problem if homepage is 2 folders deep?
It's not great. I find Google to be very stubborn about wanting to use the root level, and putting it 2 levels deep could cause some general crawl and canonicalization issues. Plus, Google may go looking for a folder or pages at the first level.
You're also adding duplicate keywords to every URL and pushing down the unique keywords. Even though keywords in the URL are a relatively weak ranking factor, it certainly isn't going to help you.
The risks may be small to moderate, but there's no up-side that I can see. Is this primarily a technical issue?
-
RE: Multiple URLs and Dup Content
One additional comment, and it's tricky. You need to find the crawl path creating these, BUT you don't necessarily want to block it yet. Add the canonical, and let Google keep crawling these pages. Otherwise, the canonical can't do its job properly. Then, once they've cleared out, fix the crawl path.
Are you seeing this in our (SEOmoz) tools or in Google? I'm not actually seeing these variants indexed, so it could potentially be a glitch. It looks a bit like some kind of session variable.
-
RE: Best way to improve page rank
One trick is that low-quality links can boost PageRank (in the toolbar), but then get devalued or even cause penalties down the road. As @Jarno said, links absolutely matter, but PR is a very, very crude measurement of the value of your links (and doesn't factor in trust, relevance, spam, etc.).
It's also important to note that Toolbar PR can be as much as 3-4 months out of date at any given time.
-
RE: Pages with Little Content
If you've already cut internal links and they're only in RSS, I probably would NOINDEX them, personally. You're basically already telling Google that they're low-value (for now, at least), yet you're still letting them be [possibly] indexed, which could dilute the rest of your index (and ranking power). With NOINDEX, people can still access them, but you're not risking Panda issues or other problems.
Over time, as you add content and boost your domain authority and link profile, you can start phasing in more pages.
-
RE: Do new Mozzers realise it takes effort to respond to their questions?
Thanks for the reminder! To be honest, I'm not sure we expected the kind of volume that Public Q&A would have after the re-launch. I know the Associates have been working to do a better job of endorsing good answers and generally participating. As you said, though, it's a community effort, and we can't do it without you guys (and gals).