Accidental Noindex/Mis-Canonicalisation - Please help!
-
Hi everybody,
I was hoping somebody might be able to help, as this is an issue my team and I have never come across before.
A client of ours recently migrated to a new site design. 301 redirects were properly implemented and the transition was fairly smooth.
However, we realised soon after that a sub-section of pages had one or both of the following errors (sketched below):
- They featured a canonical tag pointing to the wrong page
- They featured the 'meta noindex' tag
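To illustrate, the affected pages had something like the following in their <head> (the URLs here are placeholders, not the client's actual pages):

    <head>
      <!-- Canonical tag pointing at the WRONG page, telling Google to treat
           a different URL as the preferred version of this one -->
      <link rel="canonical" href="http://www.example.com/some-other-page/" />
      <!-- Meta noindex tag, telling Google to drop this page from its index -->
      <meta name="robots" content="noindex" />
    </head>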
After realising this, we immediately removed both the canonicals and the noindex tags. However, Google had already crawled the site while these were in place, and the pages subsequently dropped out of its index.
We re-submitted the affected pages to Google's index and used Webmaster Tools' 'Fetch as Google' on them. As an extra measure, we have also since explicitly 'allowed' the pages in the robots.txt file.
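For reference, the robots.txt change was along these lines (the path is a placeholder for the affected sub-section):

    User-agent: *
    # Explicitly allow the affected sub-section. Note that Allow only has an
    # effect if a broader Disallow rule would otherwise cover these paths;
    # robots.txt permits crawling by default.
    Allow: /affected-section/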
We found that the pages with only the noindex tag were re-indexed immediately, while the pages that had both the noindex tag and the incorrect canonical are still not being re-indexed.
Can anyone think of a reason why this might be the case? One of the pages which featured both tags was one of our most important organic landing pages, so we're eager to resolve this.
Any help or advice would be appreciated.
Thanks!
-
I'm not sure how helpful it is, in the sense of being good news, but I did something like this to one of my sites on purpose once, and wrote it up:
http://www.seomoz.org/blog/catastrophic-canonicalization
A couple of tips:
(1) I think what Oleg is saying, which I agree with, is that if Page A had a canonical to Page B, then instead of just removing the canonical tag, put in a canonical tag pointing from Page A back to Page A. Sometimes the self-referencing canonical will help override the old/bad canonical.
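In HTML terms, that self-referencing tag is just this in the <head> of Page A (the URL is a placeholder, of course):

    <!-- Page A declares its own URL as the preferred version; the hope is
         that this overrides the old/bad canonical Google crawled earlier -->
    <link rel="canonical" href="http://www.example.com/page-a.html" />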
(2) Fetch is a good bet, but I'd also re-submit an XML sitemap with just the "bad" URLs. It's not a cure-all, but it can help nudge Google.
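A bare-bones sitemap for that is just something like this (placeholder URLs, obviously); save it as its own file and submit it in WMT:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- List only the de-indexed URLs you want Google to revisit -->
      <url>
        <loc>http://www.example.com/affected-page-1/</loc>
      </url>
      <url>
        <loc>http://www.example.com/affected-page-2/</loc>
      </url>
    </urlset>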
Unfortunately, it really can take time to sort out. Make sure your internal links are correct as well. You could temporarily build new internal links (list a few of the affected pages on your home page, for example) to push some link juice their way. You could also post the proper URLs on Twitter/Facebook, etc., to give them a kick. Of course, that only works for a few pages, not for hundreds.
-
Yes, it may just be a waiting game, as Oleg mentioned. But to help speed up the process, you could link to some of those pages from a higher-level page (like the homepage or a department landing page). Don't spam it, though: keep it to no more than 100 links on a page (including navigation, footer, etc.).
I'd also recommend having an XML sitemap with all of your website's URLs on it, and submitting it in Google Webmaster Tools as well.
When they do get re-indexed, keep an eye on how they have been indexed: look at which keywords bring up each page in the SERPs (Raven Tools is an easy way to track keywords and see which URL comes up). If you find that the 'wrong' pages are ranking for a certain keyword, do some link building for that keyword pointing to the page/URL you actually want to rank.
Good luck!
Davinia
-
Hi Oleg,
Thanks for your response. Unfortunately, the canonical URL was another of our main organic landing pages, so a redirect wouldn't be appropriate in this situation.
I agree that it's just a matter of time, but it's frustrating that Google has crawled the site since we updated the pages and still hasn't re-indexed the page in question.
-
Can you set a canonical/redirect on the page that was incorrectly targeted, pointing back to the correct page?
i.e. if page1.html had a wrong canonical pointing to page2.html, change page2.html's canonical to point back to page1.html.
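For the redirect version of that, on Apache it would be a one-liner in .htaccess, something like this with the example filenames above (and only appropriate if page2.html can be retired in favour of page1.html):

    # 301 the wrongly-targeted page back to the correct one
    Redirect 301 /page2.html http://www.example.com/page1.html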
Overall, I think it's just a matter of time before Google is able to recrawl and fix itself... it's odd that canonical + noindex is slower than just noindex. Do whatever you can to get G to recrawl the pages.