Steps you can take to ensure your content is indexed and registered to your site before a scraper gets to it?
-
Hi,
A clients site has significant amounts of original content that has blatantly been copied and pasted in various other competitor and article sites.
I'm working with the client to rejig lots of this content and to publish new content.
What steps would you recommend to undertake when the new, updated site is launched to ensure Google clearly attributes the content to the clients site first?
One thing I will be doing is submitting a new xml + html sitemap.
Thankyou
-
There are no "best practices" established for the tags' usage at this point. On the one hand, it could technically be used for every page, and on the other, should only be used when it's an article, blog post, or other individual person's writing.
-
Thanks Alan.
Guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add this to EVERY page of the clients website including the homepage? So even the usual about us/contact etc pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions for helping them understand intent, relevance, ownership and authority. It's why Schema.org finally hit this year. None of their previous attempts have been good enough, and each has served a specific individual purpose.
So with Schema, the theory is there's a new, unified framework that can grow and evolve, without having to come up with individual solutions.
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find it, strip it out or replace the contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted til at least early next year, Google's encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post there is a section added at the end recommending rel="canonical"
-
Always have a sitemap.xml file with all the URLs you want indexed included in it. Right after publishing, submit the sitemap.xml file (or files if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Include the Meta "original-source" tag in your page headers.
Include a Copyright line at the bottom of each page with the site or company name, and have that link to the home page.
This does not guarantee with 100% certainty that you'll get proper attribution, however these are the best steps you can take in that regard.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can Image Quality In Online Store Effect Main Site SEO?
My client is building a new shopping cart that will be mobile friendly BUT will have terrible image quality - tiny images, with text on them. How will these tiny images impact the SEO on the main site? Should they put the store on the mainsite.com or on a subdomain (store.mainsite.com)? Does google see subdomains as part of the main site? I would think that having thousands of shopping product pages could be beneficial to the main site SEO, as long as the images don't negate the content. Thoughts?
Intermediate & Advanced SEO | | jerrico10 -
Duplicate Page getting indexed and not the main page!
Main Page: www.domain.com/service
Intermediate & Advanced SEO | | Ishrat-Khan
Duplicate Page: www.domain.com/products-handler.php/?cat=service 1. My page was getting indexed properly in 2015 as: www.domain.com/service
2. Redesigning done in Aug 2016, a new URL pattern surfaced for my pages with parameter "products-handler"
3. One of my product landing pages had got 301-permanent redirected on the "products-handler" page
MAIN PAGE: www.domain.com/service GETTING REDIRECTED TO: www.domain.com/products-handler.php/?cat=service
4. This redirection was appearing until Nov 2016.
5. I took over the website in 2017, the main page was getting indexed and deindexed on and off.
6. This June it suddenly started showing an index of this page "domain.com/products-handler.php/?cat=service"
7. These "products-handler.php" pages were creating sitewide internal duplicacy, hence I blocked them in robots.
8. Then my page (Main Page: www.domain.com/service) got totally off the Google index Q1) What could be the possible reasons for the creation of these pages?
Q2) How can 301 get placed from main to duplicate URL?
Q3) When I have submitted my main URL multiple times in Search Console, why it doesn't get indexed?
Q4) How can I make Google understand that these URLs are not my preferred URLs?
Q5) How can I permanently remove these (products-handler.php) URLs? All the suggestions and discussions are welcome! Thanks in advance! 🙂0 -
How get an image on a third party site rank high on goode images?
Hello, I have sometimes articles written about my product online, is there anything else I can do except make a good file name for it, perhaps I can ask the site owner to modify in the article to make it rank higher? Also on some small websites I can see that images rank very high for the specific search term that is difficult to rank for in images, if I were to contact the site with a sponsored post request, what I should make sure the site adds except filename to that sponsored post... I think there are also some other methods such as reddit to make images rank high on third party page, just need to find out how... thanks a lot
Intermediate & Advanced SEO | | bidilover0 -
Blocking Certain Site Parameters from Google's Index - Please Help
Hello, So we recently used Google Webmaster Tools in an attempt to block certain parameters on our site from showing up in Google's index. One of our site parameters is essentially for user location and accounts for over 500,000 URLs. This parameter does not change page content in any way, and there is no need for Google to index it. We edited the parameter in GWT to tell Google that it does not change site content and to not index it. However, after two weeks, all of these URLs are still definitely getting indexed. Why? Maybe there's something we're missing here. Perhaps there is another way to do this more effectively. Has anyone else ran into this problem? The path we used to implement this action:
Intermediate & Advanced SEO | | Jbake
Google Webmaster Tools > Crawl > URL Parameters Thank you in advance for your help!0 -
How can a Page indexed without crawled?
Hey moz fans,
Intermediate & Advanced SEO | | atakala
In the google getting started guide it says **"
Note: **Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed.
" How can it happen, I dont really get the point.
Thank you0 -
Why is this site have a PR 0 rank? Anyone can figure this out? LegionSafety.com
Our site dropped in PR and we haven't done anything and not sure why the drop. Anyone have any recommendations?
Intermediate & Advanced SEO | | legionsafety0 -
Can PDF be seen as duplicate content? If so, how to prevent it?
I see no reason why PDF couldn't be considered duplicate content but I haven't seen any threads about it. We publish loads of product documentation provided by manufacturers as well as White Papers and Case Studies. These give our customers and prospects a better idea off our solutions and help them along their buying process. However, I'm not sure if it would be better to make them non-indexable to prevent duplicate content issues. Clearly we would prefer a solutions where we benefit from to keywords in the documents. Any one has insight on how to deal with PDF provided by third parties? Thanks in advance.
Intermediate & Advanced SEO | | Gestisoft-Qc1 -
The SEO process...step by step
In what order do you carry out your SEO? do you start with keyword research or competitor analysis or is it on-page and then on to link building, do you concentrate on article submissions and then move on to social networking? List your step by step process here.
Intermediate & Advanced SEO | | IPIM0