Duplication Penalty through Specs?
-
I am trying to figure out how to correct a recently incurred duplication penalty on a partner site. I didn't see any posts on this yet specific to my problem. The site used to rank on page 1 of Google for all important keywords, but now many pages have been bumped to position 100 or lower due to duplication issues. This is an aviation site discussing airplanes. Each page discusses a different model, but each page also has the specs of the plane, and while the data is different for each plane, the specification terms are the same. See here:
Primary Function:
Crew:
Engine:
Thrust:
Weight Empty:
Max. Weight:
Length:
Wingspan:
Cruise Speed:
Max. Speed:
Climb:
Ceiling:
Range:
First Flight:
Year Deployed:
Is there an easy way to get Google to exclude these terms (not the data in the second column) from its page analysis, so that they stop causing the duplication issues we are seeing?
Thanks in advance!
-
Dear Dan. THANK YOU for this excellent information. We are still pretty new to this so this helps a lot!
We did some work already and are now also noticing strange behaviour around capitalization, which makes no sense to us at all. Any thoughts on that? Here are some more details:
I am finding out some interesting things about G search results as I further explore what has happened to my site.
The results are different if you use small case or capital letters such as RC A-10 Warthog or rc a-10 warthog.
The results are different at different times of the day, sometimes within minutes. Before 7 am today most of the pages which I modified were in the top 10 results, now about 1/2 of them have slipped to page two.
There is absolutely no logic to all the variables. For example, my site returned to its normal traffic (the same way it was before the algorithm change) of over 5,000 views per day on Saturday and Sunday. Just last weekend it barely got to 3,000 views. And, I haven't changed enough pages to the new format to directly affect those results. I wonder if Google has made an adjustment and that is why many of my pages are getting back to near where they were before? I look forward to tonight's final numbers for the day to see if it was solely for the weekend or if it continues.
However, using rc a-10 warthog in small letters, the air hogs page is nowhere to be found, yet my page is no. 4 as of right now. In caps, RC A-10 Warthog, it is no. 11 right now.
The above example may lead you to believe that the pages on my site should all be in the higher search results when using small case. However, check out Japanese Zero vs japanese zero. Just the opposite is true. In this case my page rates higher when entering it with Capital letters.
Yet, if you look at RC A-6 Intruder vs rc a-6 intruder, they are both around no. 7 in the search results.
Check out how little content is on the page that is higher in the results than mine for the A-6:
http://www.dhgate.com/jet-airplane-a-6-intruder-high-grade-rtf/p-ff80808128ed96260128f2db154e1e9f.html
Check out my A-6 page:
http://www.aviationtrivia.info/Grumman-A-6-Intruder.php
I think it is because DH Gate, like Amazon, simply has thousands of pages and G likes that, even if each page has little content. Do you concur?
I think what G is doing is known as the Google dance. It is reminiscent of the cipher codes used to encrypt top secret transmissions for the military, in that it is constantly changing so as to discourage detection. What may be a top page one day can rate up to 10 positions lower on another.
Many of my pages have slipped only slightly when G changed their algorithm, such as from no. 1 in the search results to no. 3. (Virtually all the top rated pages are now ones which have videos.) However, that has been enough to make a big difference in traffic and sales.
Other than start taking videos of some 500 rc airplanes featured on my site, putting the videos on my site and YouTube, and hoping that the videos will be highly rated, I don't know how to get my pages as high in the search results as the YouTube pages.
The other pages which usually are higher in G's search results are forums like RC Universe and RC Groups. They have tens of thousands of pages, none of which have been changed, dating back to the late 1990's. Not much I can do about that either.
My aim is to be in G's search results right after the videos and forums. I think that someone searching for a rc airplane for sale will go right past them if they know that Aviation Trivia has ALL the rc airplanes of a particular model listed on it on a single page.
The changes I've made, i.e. no longer dividing the page between the actual aircraft and rc models of it, have gotten those pages back into the top ten search results, but not for both caps and small letters.
-
It was after. The site however has some of the best data from a people perspective. We are trying to reexplain that to the G algo
-
Tried to add as a reply to you Ralf, but it's not working. Anyone know why I cannot reply to a response?
Ralf, I glanced at some pages to see if I could find anything. Here's a couple of things that I think you could change or work on to improve your standing with Google:
1. There is pretty wide consensus that pages with a lot of ads and adsense seem to have been hit the hardest. Some of your pages have up to 5 blocks of adsense. Perhaps 1 or 2 blocks of adsense would be better. Google doesn't seem to like it much when the adsense is "hidden" in a sense, as in, there's so much adsense and it looks so much like the actual content that users cannot tell the difference. Go easy on it and see if that helps.
2. You have very outdated code on your website. It seems your whole site is built in HTML tables. Your code to content ratio is going to suffer because of it, and I believe you are using HTML elements that are deprecated (e.g. font face). Perhaps a face lift of the site and an update of that clunky code could help speed up the site and present a better image to users and search engines.
3. You haven't signaled which URL of your site is the main URL. For example, you have 4 home pages according to Google:
www.aviationtrivia.info/index.php
All go to the exact same page, so that page is showing up under 4 different URLs. That is, one, duplicate content and, two, not making good use of your link juice. According to Open Site Explorer, you have over 22,000 links to your domain, but 13,000 go to the www version of your domain. So your links are split between your main domain and the www subdomain. Pick either the www or the non-www version and redirect the other one to it. This will consolidate your link juice much better.
4. From the numbers, 68 domains are giving you 13,000 links to your www url and 10 domains are giving you about 9,000 links to the non www url. 9,000 links from only 10 domains looks a little odd to me. The odd part is your home page(s) only account for about 1,000 of your total links. I didn't take the time to find out which page or pages are getting all those links, but it doesn't appear to be your home page. 9,000 links from 10 domains going to a page other than your home page just seems, well, odd to me. If you have any paid links or have participated in a suspicious link exchange of some kind, this could be harming you as well.
Hope that helps. Another tip would be to go into your Google Webmaster Tools account and see what it tells you. Often you can get good information from them to help you out.
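To make point 3 concrete, here is a minimal Python sketch of the idea of collapsing every home page variant onto one canonical URL. It is only an illustration under assumptions: the choice of the www host as the preferred one, the list of variants, and the `canonical_url` helper are all hypothetical, and on a real site the same mapping would normally be done with a server-side redirect rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www host is picked as the canonical one.
PREFERRED_HOST = "www.aviationtrivia.info"
# Assumption: /index.php serves the same content as "/".
INDEX_PATHS = ("/index.php",)


def canonical_url(url: str) -> str:
    """Map www/non-www and /index.php variants of a URL to one canonical form."""
    # Accept URLs written without a scheme, as in the post above.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    bare_host = host[4:] if host.startswith("www.") else host
    if bare_host == PREFERRED_HOST[4:]:
        host = PREFERRED_HOST
    path = parts.path or "/"
    if path in INDEX_PATHS:
        path = "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


# Hypothetical list of home page variants that all show the same page.
variants = [
    "aviationtrivia.info",
    "aviationtrivia.info/index.php",
    "www.aviationtrivia.info",
    "www.aviationtrivia.info/index.php",
]
canonical_home = {canonical_url(u) for u in variants}
```

If the mapping is right, `canonical_home` contains a single URL, which is the one every other variant should redirect to and the one links should point at.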
-
Was this before or AFTER the Panda Update? If after, you may have been hit by G's new algo which targets sites they deem to be of low quality.
-
Thank you for your thoughts, much appreciated! The Google change is what made us think it is duplication, and the only thing we could think of was the repeating specs.
It was Copyscape that led us to believe it was a duplication issue. But we had already checked Copyscape before the Google change, and we have since tested some content changes that made no difference to the Copyscape results, so I am beginning to think that Copyscape doesn't work properly.
Now going back to the problem itself, let me describe what we are seeing and maybe you have a better idea what could cause this and what we need to be looking at.
Virtually all of the pages on my Aviation Trivia website www.aviationtrivia.info have been downgraded by Google. The aircraft pages were mostly in the top ten search results for rc airplanes and the name of the aircraft such as F4U Corsair and the words "for sale", ie: "F4U Corsair for sale". A great number of those pages are now out of the first 50 search results.
The aircraft pages all contain original content. One such page, Sikorsky CH-53
http://www.aviationtrivia.info/Sikorsky-CH-53.php
is typical of the downgraded pages. It was in the first five page results under its name and now is around no. 60. The page even has an exclusive interview with a pilot of the aircraft.
A page that was in the top results for the search words "rc Airwolf Helicopter"
http://www.aviationtrivia.info/Airwolf-Helicopter.php
has now been downgraded to about no. 30. The no. 1 search result for rc Airwolf Helicopter is Century Helicopters 30 Size Airwolf Helicopter
http://www.centuryheli.com/products/helikits/cn1070airwolf/index.html?currentid=120
Their page minimally describes three helicopters they sell. My page at Aviation Trivia describes over 30 Airwolf helicopters for sale plus has information from a person who has flown one, plus additional information on where you can find specifics about the helicopter on popular websites.
The most popular page on my site, World's Fastest Aircraft
http://www.aviationtrivia.info/THE-100-FASTEST-AIRCRAFT.php
is still no. 1 in the Google search results, however what was the second most popular page, Largest Aircraft
http://www.aviationtrivia.info/THE-LARGEST-AIRCRAFT.php
has gone from no. 1 to about no. 50. What is really interesting is that, although I expect Wikipedia pages to always be above mine in the search results, the highest ranked page after Wikipedia is Global Aircraft - Top 50 Largest Aircraft.
http://www.globalaircraft.org/50_largest.htm
Looking at my page, which describes 100 aircraft in detail and provides links to pages in my site that go into their histories and full specifications, and then looking at the Global Aircraft page, which simply states the name of the aircraft, wingspan, and weight, I can't understand how it can now be ranked no. 1 and my page no. 50 when searching for "world's largest aircraft."
I have set up my site from the view point of a person who is interested in scale rc model aircraft and the aviation history of their actual aircraft. The information I put on the site is there to not only inform the people about the full scale aircraft, but to give them choices about the rc model airplanes that are available, and other information that may be helpful in choosing the right rc model for them.
Just to clarify, someone would search for rc F4U Corsair, or rc F-14 Tomcat and that would come up in the top search results as well as F4U Corsair for sale and F-14 Tomcat for sale, etc.
Hope that makes sense
Thanks!
Ralf
-
If I understand correctly, you have a site with several pages that discuss different models, and each of those pages has this same spec list? You are not being hit with a duplicate content penalty. If you were, every website in the world that does reviews would be hit hard. I just did a quick search on treadmill reviews. Look at those results on the top page. They all use the same specs on all their pages. Treadmilldoctor.com and treadmillreviews.com seem to use a near identical system to each other even. Yet none of them have duplicate penalties.
Without knowing your website, there is definitely some other reason you are not ranking anymore. There was a new algorithm change that could have affected you.
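To put a rough number on why a shared spec list alone should not look like duplicate content, here is a small Python sketch that compares two pages by word overlap (Jaccard similarity over word shingles). This is only an illustration, not how Google measures duplication, which is not public. The page texts and spec values are made-up placeholders, and `LABELS` is a subset of the labels from the question.

```python
def shingles(text: str, size: int = 1) -> set:
    """Return the set of consecutive word groups of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of items the two sets have in common (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Subset of the spec labels listed in the question.
LABELS = ["Primary Function:", "Crew:", "Engine:", "Thrust:", "Length:", "Wingspan:"]


def spec_page(body: str, values: list) -> str:
    """Build a page: unique body text followed by label/value spec rows."""
    rows = " ".join(f"{label} {value}" for label, value in zip(LABELS, values))
    return f"{body} {rows}"


# Placeholder content: the bodies and values are invented for illustration.
page_a = spec_page(
    "original write-up about the first model with its history and a pilot interview",
    ["a1", "a2", "a3", "a4", "a5", "a6"],
)
page_b = spec_page(
    "separate article covering another plane including design notes and flight records",
    ["b1", "b2", "b3", "b4", "b5", "b6"],
)

word_overlap = jaccard(shingles(page_a), shingles(page_b))
phrase_overlap = jaccard(shingles(page_a, 3), shingles(page_b, 3))
same_page = jaccard(shingles(page_a), shingles(page_a))
```

Even with very short placeholder bodies, the shared labels make up only a small part of each page, so `word_overlap` stays well below the 1.0 you get for a true copy (`same_page`), and `phrase_overlap` is no larger because the values sitting between the labels differ. On real pages with full articles, the labels would be an even smaller share.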