I'm curious where search engines draw the line on "duplicate content." Obviously the same article, or even the same article with minor edits, can and should be detected as a duplicate.
I have a use case with a database of similar, but not duplicate, content that changes over time. I want to serve this content via an HTML template, but I don't want the resulting 1000 pages to be considered duplicates of each other.
Example: Imagine local weather. You could create a template with fields for city name, longitude, latitude, altitude, and current weather conditions. The values would be different for each of the 1000 database entries (cities), and the "current weather conditions" field would change frequently (hourly, let's say).
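For concreteness, here's a rough sketch of the kind of templating I mean. The language, field names, and city data are just placeholders for illustration, not my actual setup:

    # Hypothetical sketch: one shared template, different values per city page.
    CITY_PAGE = """\
    <h1>Weather for {name}</h1>
    <ul>
      <li>Latitude: {lat}</li>
      <li>Longitude: {lon}</li>
      <li>Altitude: {alt} m</li>
      <li>Current conditions: {conditions}</li>
    </ul>
    """

    # Example database entry (made-up values); the surrounding markup is
    # identical for all 1000 cities, only these fields differ.
    city = {"name": "Springfield", "lat": 39.8, "lon": -89.65,
            "alt": 180, "conditions": "Partly cloudy, 72°F"}
    print(CITY_PAGE.format(**city))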
Now, if I have a nice hierarchical set of index pages (the first one maybe points to 50 state sub-pages, and each state page points to 20 city pages) leading to the 1000 city-specific pages, would the city-specific pages be considered 'duplicate' since they are based on the same HTML template but all have different values in key areas? Does the answer change based on the percentage of the template (or visible text) that changes for each city?
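In case it helps, the URL layout I have in mind looks roughly like this (paths are hypothetical, and the lists are truncated):

    # Hypothetical hierarchy: home -> 50 state index pages -> ~20 city pages each.
    index_hierarchy = {
        "/weather/": ["/weather/illinois/", "/weather/ohio/", ...],      # 50 state sub-pages
        "/weather/illinois/": ["/weather/illinois/springfield/", ...],   # ~20 city pages per state
    }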
My goal is to have these 1000 subpages as part of my site, have them indexed, and have each of them flow a little bit of link juice back to my home page.
Best practices? What should I be careful of?
Thanks!