Steps you can take to ensure your content is indexed and registered to your site before a scraper gets to it?
-
Hi,
A clients site has significant amounts of original content that has blatantly been copied and pasted in various other competitor and article sites.
I'm working with the client to rejig lots of this content and to publish new content.
What steps would you recommend to undertake when the new, updated site is launched to ensure Google clearly attributes the content to the clients site first?
One thing I will be doing is submitting a new xml + html sitemap.
Thankyou
-
There are no "best practices" established for the tags' usage at this point. On the one hand, it could technically be used for every page, and on the other, should only be used when it's an article, blog post, or other individual person's writing.
-
Thanks Alan.
Guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add this to EVERY page of the clients website including the homepage? So even the usual about us/contact etc pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions for helping them understand intent, relevance, ownership and authority. It's why Schema.org finally hit this year. None of their previous attempts have been good enough, and each has served a specific individual purpose.
So with Schema, the theory is there's a new, unified framework that can grow and evolve, without having to come up with individual solutions.
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find it, strip it out or replace the contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted til at least early next year, Google's encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post there is a section added at the end recommending rel="canonical"
-
Always have a sitemap.xml file with all the URLs you want indexed included in it. Right after publishing, submit the sitemap.xml file (or files if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Include the Meta "original-source" tag in your page headers.
Include a Copyright line at the bottom of each page with the site or company name, and have that link to the home page.
This does not guarantee with 100% certainty that you'll get proper attribution, however these are the best steps you can take in that regard.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google indexed "Lorem Ipsum" content on an unfinished website
Hi guys. So I recently created a new WordPress site and started developing the homepage. I completely forgot to disallow robots to prevent Google from indexing it and the homepage of my site got quickly indexed with all the Lorem ipsum and some plagiarized content from sites of my competitors. What do I do now? I’m afraid that this might spoil my SEO strategy and devalue my site in the eyes of Google from the very beginning. Should I ask Google to remove the homepage using the removal tool in Google Webmaster Tools and ask it to recrawl the page after adding the unique content? Thank you so much for your replies.
Intermediate & Advanced SEO | | Ibis150 -
Site architecture, inner link strategy and duplicate or thin content HELP :)
Ok, can I just say I love that Moz exists! I am still very new to this whole website stuff. I've had a site for about 2 years that I have re-designed several times. It has been published this entire time as I made changes but I am now ready to create amazing content for my niche. Trouble is my target audience is in a very focused niche and my site is really only about 1 topic - life insurance for military families. I'm a military spouse who happens to be an experience life insurance agent offering plans to active duty service members, their spouses as well as veterans and retirees. So really I have 3 niches within a niche. I'm REALLY struggling on how to set up my site architecture. My site is basically fresh so it's a good time to get it hammered down as best as possible with my limited knowledge. Might I also add this is a very competitive space. My competitors are big, established brands who offer life insurance along with unaffiliated, informational sites like military.com or the va benefits site. The people in my niche rarely actually search for life insurance because they think they are all set by the military. When they do search it's very short which is common as this niche lives in a world of acronyms. I'm going to have to get real creative to see if there are any long tail keywords I can use as supporting posts but I think my best route is to attempt to rank for the short one to three keyword phrases this niche looks for while searching. Given my expertise on the subject I am able to write long 1000-5000 content on the matter that will also point out some considerations my competitors dont really cover. My challenge is I cant see how this can be broken into sub topics without having thin supporting content. It's my understanding that I should create these in order to inner link and have a shot at ranking. In thinking about my topic I feel like the supporting posts can only be so long. Furthermore, my three niches within my small overall niche search for short but different keywords. Seems I am struggling to put it all into words. Let me stop here with a question - is it bad to have one category in a website? If not I feel like this would solve my dilemma in making a good site map and content plan. it is possible to split my main topic into 3 categories. I heard somewhere you shouldn't inner link posts from different categories. Problem is if I dont it's not ideal for the user experience as the topics really arent that different. Example a military member might be researching his/her own life insurance and be curious about his spouses coverage. In order to satisfy this user's experience and increase the time on my site I should link to where they can find more dept on their spouses coverage which would be in a different category. Is this still acceptable since it's really not a different subject?
Intermediate & Advanced SEO | | insuretheheroes.com0 -
Can anyone tell me why this page has content wider than screen?
I am getting that error on my product pages. This link is in the errors http://www.wolfautomation.com/drive-accessory-safety-sto-module-i500 but when I look at it on mobile it is fine.
Intermediate & Advanced SEO | | Tylerj0 -
Question about moving content from one site to another without a 301
I could use a second opinion about moving content from some inactive sites to my main site. Once upon a time, we had a handful of geotargeted websites set up targeting various cities that we serve. This was in addition to our main site, which was mostly targeted to our primary office and ranked great for those keywords. Our main site has plenty of authority, has been around for ages, etc. We built out these geo-targeted sites with some good landing pages and kept them active with regularly scheduled blog posts which were unique and either interesting or helpful. Although we had a little success with these, we eventually saw the light and realized that our main site was strong enough to rank for these cities as well, which made life a whole lot easier, not to mention a lot less spammy. We've got some good content on these other sites that I'd like to use on our main site, especially the blog posts. Now that I've got it through my head that there's no such thing as a duplicate content penalty, I understand that I could just start moving this content over so long as I put a 301 redirect in place where the content used to be on these old sites. Which leads me to my question. Our SEO was careful not to have these other websites pointing to our main site to avoid looking like we were trying to do something shady from a link building perspective. His concern is that these redirects would undermine that effort and having a bunch of redirects from a half dozen sites could end up hurting us somehow. Do you think that is the case? What he is suggesting we do is remove all of the content that we'd like to use and use Webmaster Tools to request that this content be removed from the index. Then, after the sites have been recrawled, we'll check for ourselves to confirm they've been removed and proceed with using the content however we'd like. Thoughts?
Intermediate & Advanced SEO | | LeeAbrahamson0 -
medical site with no unique content
Hi I'm trying to promote an ecommerce site that sells vitamins and health goods. The site owner doesn't want to add texts in the product pages because it is medical material. therefore he Currently has non unique (duplicated) content in each product page' It is the same exact content all others have (taken From the manufacturer)' Any ideas? Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
Can Keyword-Stuffing on a Single Page Penalize My Entire Site?
Hi forum! I want to improve my internal linking through adding keyword-rich anchor text to my search results pages (my site has an internal search engine for products). For example, if I were a shoes store, my product search engine results are currently:
Intermediate & Advanced SEO | | Travis-W
-Running
-Hiking
-Walking
-Track and I want to make them actual keyword-terms by changing them to:
-Running Shoes
-Hiking Shoes
-Walking Shoes
-Track Shoes This creates a problem - the keyword "shoes" is stuffed on the page. I don't care how well these dynamic search results pages appear in search, only the actual product pages. Is it okay to keyword stuff on these pages, or would it penalize my entire site?0 -
How to resolve Duplicate Page Content issue for root domain & index.html?
SEOMoz returns a Duplicate Page Content error for a website's index page, with both domain.com and domain.com/index.html isted seperately. We had a rewrite in the htacess file, but for some reason this has not had an impact and we have since removed it. What's the best way (in an HTML website) to ensure all index.html links are automatically redirected to the root domain and these aren't seen as two separate pages?
Intermediate & Advanced SEO | | ContentWriterMicky0 -
Getting Google to Correct a Misspelled Site Link...Help!
My company website recently got its site links in google search... WooHoo! However, when you type TECHeGO into Google Search one of the links is spelled incorrectly. Instead of 'CONversion Optimization' its 'COversion Optimization'. At first I thought there was a misspelling on that page somewhere but there is not and have come to the conclusion that Google has made a mistake. I know that I can block the page in webmaster tools (No Thanks) but how in the crap can I get them to correct the spelling when no one really knows how to get them to appear in the first place? Riddle Me That Folks! sitelink.jpg
Intermediate & Advanced SEO | | TECHeGO0