Resolving duplicate text issues with a duplicate image?
-
We are a listing site for overseas programs. Many of our listings share the same content, because in many cases exactly the same information applies. We have resolved duplicate content issues to some extent by making some of the content in these listings unique. However, for the remaining content, which will be the same across about 100 pages, we were wondering if it's better to use an image in place of the duplicate text (basically an image of the text in question). We know this is still inherently duplicate content (only it's a duplicate image instead of duplicate text). What's the best solution to this problem? Is a duplicate image just asking for trouble, or might it actually be a good idea?
-
Google won't index text embedded in an image on a webpage (currently it does so only for .pdf documents).
If you want a little more insurance, which you won't really need, use your handy robots.txt or rel="canonical".
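As a rough sketch of the robots.txt option, the rule below keeps crawlers out of a folder holding the text images, and Python's standard `urllib.robotparser` checks that the rule behaves as intended. The `/images/shared-text/` path and the `example.com` URLs are hypothetical placeholders, not anything taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the folder that holds
# the images of shared text. The path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/shared-text/
"""


def is_blocked(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules above disallow fetching url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs: one inside the blocked folder, one ordinary page.
    print(is_blocked("https://example.com/images/shared-text/visa-info.png"))
    print(is_blocked("https://example.com/programs/malaysia"))
```

Keep in mind that robots.txt only asks crawlers not to fetch a URL, so it is worth testing the rule against real paths before relying on it.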
As usual, keep your eyes forward:
"While search engines may not use OCR for indexing the content of web pages now, that doesn’t mean that they might not in the future, and there are some indications that the search engines are developing a much greater proficiency in the use of optical character recognition."
Here's that article, including some great references.
Good luck.
-
Could you point me to a valid reference on that OCR issue?
-
You should use the rel=canonical tag on duplicate content pages. Google can read text embedded in an image through an OCR algorithm, so a duplicate image is not a good option. Moreover, think about how these images will increase the load time of the web pages.
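For illustration, here is a minimal sketch of generating that tag for a set of near-duplicate URLs. The helper name `canonical_link_tag` and the `example.com` URLs are made up for this example; the only real piece is the `<link rel="canonical">` element itself, which goes in each page's `<head>`.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters like & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


if __name__ == "__main__":
    # Hypothetical near-duplicate URLs that should all point at one
    # preferred version. Both the preferred URL and the variants are
    # placeholders.
    preferred = "https://example.com/programs/study-abroad"
    duplicates = [
        "https://example.com/programs/study-abroad?ref=home",
        "https://example.com/programs/study-abroad?sort=date",
    ]
    for page in duplicates:
        print(page, "->", canonical_link_tag(preferred))
```

Note that pointing a page at a canonical URL signals that the other URL is the preferred one, so this suits true duplicates better than listings you want to rank individually.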
-
To directly answer your question, there are a few ways you can present content so that it is not readily crawlable by search engines: Flash, iframes, and images.
As far as good ideas go, I much prefer to offer real content that is unique to the given area. Let's say you are a US-based site offering programs for attending universities overseas. Add some content specific to each country's page to make it unique.
If you present Malaysia as a country, talk about its universities by name, the awards they have won, landmarks, and other items of interest such as its incredibly diverse forests. You can also provide testimonials from satisfied clients. Testimonials can help establish a lot of relevancy, as clients will often mention specifics about where they are from ("John from Miami, FL") and where they visited.
In short, you will achieve better results if you work within Google's system than by trying to work around it.
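If you want a rough way to see how much two listings still overlap after adding unique content, one option is to compare their word shingles. The sketch below uses Jaccard similarity over three-word shingles; this measure, the function names, and the sample text are my own assumptions for illustration, and it is not a description of how Google evaluates duplicate content.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Share of shingles two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0  # Both texts are too short to compare.
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Made-up listing text: a shared boilerplate sentence plus a
    # country-specific sentence for each page.
    shared = "Applications are reviewed on a rolling basis each semester."
    malaysia = shared + " Malaysia is known for its incredibly diverse forests."
    italy = shared + " Italy offers programs in Rome, Milan, and Venice."
    print(round(jaccard_similarity(malaysia, italy), 2))
```

A score near 1.0 means the two pages are nearly identical, while a lower score suggests the unique sections are doing their job.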