Steps you can take to ensure your content is indexed and registered to your site before a scraper gets to it?
-
Hi,
A client's site has a significant amount of original content that has been blatantly copied and pasted into various competitor and article sites.
I'm working with the client to rejig lots of this content and to publish new content.
What steps would you recommend taking when the new, updated site is launched to ensure Google clearly attributes the content to the client's site first?
One thing I will be doing is submitting a new XML + HTML sitemap.
Thank you
-
There are no "best practices" established for the tag's usage at this point. On the one hand, it could technically be used on every page; on the other, it arguably should only be used when the page is an article, blog post, or other piece of an individual person's writing.
-
Thanks Alan.
Guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add it to EVERY page of the client's website, including the homepage? So even the usual about us/contact etc. pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions for helping them understand intent, relevance, ownership and authority. It's why Schema.org finally hit this year. None of their previous attempts have been good enough, and each has served a specific individual purpose.
So with Schema, the theory is there's a new, unified framework that can grow and evolve, without having to come up with individual solutions.
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find it, strip it out or replace the contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted until at least early next year, Google is encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
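For anyone who wants to see what these two tags look like in a page, here is a minimal sketch in Python that renders them as HTML strings. The function name, the example.com URLs, and the author name are made up for illustration, and the markup follows the commonly documented form of the "original-source" meta tag and a rel="author" byline link; check the search engines' current guidance before relying on either.

```python
from html import escape


def attribution_tags(page_url, author_url, author_name):
    """Return HTML snippets that declare the original source and the author.

    All three arguments are illustrative inputs, not values from this thread.
    """
    page = escape(page_url, quote=True)
    author = escape(author_url, quote=True)
    name = escape(author_name)
    return {
        # Goes inside <head>: points at the URL where the content first appeared.
        "head": f'<meta name="original-source" content="{page}">',
        # Goes in the page body, typically as the byline link.
        "byline": f'<a rel="author" href="{author}">{name}</a>',
    }


# Hypothetical page and author URLs.
tags = attribution_tags(
    "https://www.example.com/articles/widget-guide",
    "https://www.example.com/about/jane-doe",
    "Jane Doe",
)
print(tags["head"])
print(tags["byline"])
```

As noted above, a scraper can strip or rewrite both tags, so treat them as one signal in the mix rather than protection.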
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post, there is a section added at the end recommending rel="canonical".
-
Always have a sitemap.xml file that includes all the URLs you want indexed. Right after publishing, submit the sitemap.xml file (or files, if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Include the "original-source" meta tag in your page headers.
Include a Copyright line at the bottom of each page with the site or company name, and have that link to the home page.
This does not guarantee with 100% certainty that you'll get proper attribution; however, these are the best steps you can take in that regard.
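As a rough illustration of the sitemap step, here is a minimal Python sketch that builds a sitemap.xml from a list of URLs and splits a long list across several files. The function names and the example.com URLs are hypothetical; the 50,000-URL cap per file comes from the sitemaps.org protocol. Submitting the files is still done by hand through the webmaster tools and is not shown here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file limit in the sitemaps.org protocol


def build_sitemap(urls):
    """Return one sitemap.xml document, as a string, listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_FILE):
    """Return a list of sitemap documents, each holding at most max_urls URLs."""
    return [
        build_sitemap(urls[i:i + max_urls])
        for i in range(0, len(urls), max_urls)
    ]


# Hypothetical pages on the relaunched site.
pages = [
    "https://www.example.com/",
    "https://www.example.com/about-us",
    "https://www.example.com/articles/widget-guide",
]

for number, document in enumerate(split_into_sitemaps(pages), start=1):
    with open(f"sitemap-{number}.xml", "w", encoding="utf-8") as handle:
        handle.write(document)
```

Regenerating and resubmitting the file right after each batch of new content goes live is the part that matters for getting crawled before a scraper's copy.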