What steps can you take to ensure your content is indexed and attributed to your site before a scraper gets to it?
-
Hi,
A client's site has a significant amount of original content that has blatantly been copied and pasted onto various competitor and article sites.
I'm working with the client to rejig lots of this content and to publish new content.
What steps would you recommend taking when the new, updated site is launched to ensure Google clearly attributes the content to the client's site first?
One thing I will be doing is submitting new XML and HTML sitemaps.
Thank you
-
There are no "best practices" established for the tags' usage at this point. On the one hand, it could technically be used for every page, and on the other, should only be used when it's an article, blog post, or other individual person's writing.
-
Thanks Alan.
Guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add it to EVERY page of the client's website, including the homepage? Even the usual About Us/Contact pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions that help it understand intent, relevance, ownership, and authority. It's why Schema.org finally hit this year. None of the previous attempts have been good enough, and each served only a specific, individual purpose.
So with Schema.org, the theory is that there's now a unified framework that can grow and evolve without requiring a separate, individual solution for each problem.
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find it, strip it out or replace the contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted until at least early next year, Google is encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema.org rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
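For anyone who hasn't implemented it yet, there are two common patterns: a link element in the head pointing at the author's profile page, or a visible byline link in the body (the URLs here are placeholders):

<!-- In the <head> of the article page -->
<link rel="author" href="http://www.example.com/about-the-author" />

<!-- Or as a visible byline link within the article -->
<a href="http://www.example.com/about-the-author" rel="author">Jane Smith</a>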
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post, there is a section added at the end recommending rel="canonical".
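If I've understood it correctly, that tag goes on the syndicated copy and points back at the original, something like this (placeholder URL):

<link rel="canonical" href="http://www.example.com/original-article" />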
-
Always have a sitemap.xml file that includes all the URLs you want indexed. Right after publishing, submit the sitemap.xml file (or files, if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Include the meta "original-source" tag in your page headers.
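A minimal sitemap.xml skeleton looks like this (URLs and dates are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/article-1</loc>
    <lastmod>2011-09-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/article-2</loc>
    <lastmod>2011-09-02</lastmod>
  </url>
</urlset>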
Include a Copyright line at the bottom of each page with the site or company name, and have that link to the home page.
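Something as simple as this in the footer does the job (company name, year, and URL are placeholders):

<p>&copy; 2011 Example Company. All rights reserved. <a href="http://www.example.com/">www.example.com</a></p>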
This does not guarantee with 100% certainty that you'll get proper attribution; however, these are the best steps you can take in that regard.