Competitor 'scraped' entire site - pretty much - what to do?
-
I just discovered that a competitor in the insurance lead generation space has completely copied my client's site's architecture, page names, titles, even the form, tweaking a word or two here and there so it isn't a 100% 'scrape'.
We put a lot of time into the site, only to have everything 'stolen'. What can we do about this? My client is very upset. I looked into filing a 'scraper' report through Google, but the slight modifications to the content mean it technically doesn't qualify as a 'scraped' site.
Please advise what course of action we can take, if any.
Thanks,
Greg -
5 Steps:
- Take screenshots of ALL webpages
- Get a report on exactly how many pages were scraped, and gather the evidence (usually Googling the site titles is very effective)
- Take screenshots of the metadata: right-click, choose View Source, and capture that as well
- Once all is recorded, send the website owner a Cease and Desist letter instructing them to take everything offline and manually remove the pages from search indexes
- If they don't comply at that point, any IP lawyer will help if you have all the documentation. Some will take the work pro bono because there's huge money to be won, especially if you have already done all the documentation work for them.
Do NOT issue Cease and Desist letters without the screenshots. Usually what these guys will do is change the appearance and add content to the meta tags, and at that point they will claim it was not plagiarized while still hurting you. Without the screenshots, it will not stand up in court.
However, if you documented the scraping, the only option the website owner will have is to take the plagiarized content offline completely. Any edits they make at that point are still considered scraping/plagiarism because you documented the offense.
We've been able to prosecute 13 companies already. One company we publicly called out on Twitter during a popular chat, which led to the company's downfall within 4 weeks.
FIGHT FOR YOUR CONTENT!
-
Hi again Greg,
Just one more option that is available to you if you happen to have a WordPress blog on the site (or have the option of rebuilding the entire site using WordPress).
You could install the Bad Behavior plugin for WordPress. The plugin integrates with Project Honey Pot, which tracks millions of bad IP addresses; the plugin gathers information and feeds it back to the honeypot. Bad Behavior also works against link spam, email and content harvesters, and other malicious visitors.
Sha
-
Thanks for all the details Rami.
-
Hi Ryan,
As long as others are benefiting and not bothered I am happy to answer your questions.
When setting up Distil you are able to allocate a specific record (subdomain) or the entire zone (domain) to be delivered through our cloud. This allows you to segregate what traffic you would like us to serve and what content you would like to handle through other delivery mechanisms. Distil honors all no-cache and cache-control directives, allowing you to easily customize what content we cache even if we are serving your entire site. Additionally, we do not cache any dynamic file types, ensuring that fresh content always functions properly. Location-based content will continue to function correctly because our service continues to pass the end user's IP through the host headers.
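To make that concrete, here is a minimal, hypothetical sketch of the kind of cache directives a site owner might set on their own host so that any directive-honoring caching layer skips the dynamic pages. It assumes an Apache host with mod_headers enabled, and the file names are placeholders, not anything Distil requires:

```apache
# Hypothetical sketch (assumes Apache + mod_headers): mark dynamic pages as
# non-cacheable so an upstream caching layer that honors these directives
# always fetches them fresh, while static assets stay cacheable.
<IfModule mod_headers.c>
    <FilesMatch "^(quote-form|account)\.php$">
        Header set Cache-Control "private, no-cache, no-store, must-revalidate"
    </FilesMatch>
    <FilesMatch "\.(css|js|png|jpg|gif)$">
        Header set Cache-Control "public, max-age=86400"
    </FilesMatch>
</IfModule>
```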
Clients are able to reduce their infrastructure after migrating onto our platform; however, it is important to note that you cannot downgrade to $5 shared hosting and expect the same results. Distil is able to reduce your server load by 50%-70%, but the remaining 30%-50% will still be handled by your backend, so you need to ensure any hosting you use can still handle that.
Our specialty is dealing with bots and all of our security measures surrounding that protection are automated. Any security concerns outside of that scope will be handled reactively with each individual client.
Our service is constantly adapting to ensure that we provide a holistic solution, and we go far beyond the suggestions mentioned above. Distil is set up to adapt intelligently on its own as it uncovers new bots, and we are also always adding new algorithms to catch bots. I do not want to say we are bot-proof, but we will catch well over 95% of bots and will quickly adapt to catch and stop any new derivatives.
Similar to most other cloud or CDN type services Google Analytics will not be impacted at all.
Amazon offers cloud computing, whereas Distil offers a managed security solution in the cloud. We utilize several cloud providers, including Amazon, for our infrastructure, but what makes Distil unique is the software running on that infrastructure. Amazon simply provides the computing power; we provide the intelligence to catch and stop malicious bots from scraping your website and ensure your content is protected.
Rami Essaid
www.distil.it
-
Greg,
There is only one thing that will help you move forward with your client: rewrite your texts and upgrade or tweak your site for better UX. That way the scraped site will look like a cheap copy. I have done that in the past. I know it's not fair, but that's how you can put this behind you.
PS: Rapid link building to forums and blogs will get you banned.
-
Thank you for the additional details Rami. If you are willing to share further information, I do have a few follow up questions.
-
Do you serve 100% of the content to users? Or do users still visit the site? I am interested to understand how dynamic content would be affected. Will location-based content, where information changes based on a user's IP, still function properly, or are there likely to be issues? Will "fresh" content, such as a new blog article which is receiving many comments or an active forum discussion, still function properly?
-
Since you are caching the target site, how much does the target site's own speed optimization still matter? If a client's site is on a shared server vs. a dedicated server, is speed still a concern?
-
You mentioned dealing with security concerns. Are your actions taken proactively? Or does a client need to recognize there is an issue and contact your company?
-
Specific to the original question asked in this Q&A, can some bots get past your system? Or do you believe it to be bot-proof? I am specifically referring to bad bots, not those of major search engines.
-
How would Google Analytics and other tools which monitor site traffic be impacted by your service? I am trying to determine if your service is "normal" cloud service or if there are differences.
-
What differences are there between the services you offer and the regular Amazon cloud service?
Thanks again for your time.
-
-
Hi Ryan,
Thanks for catching my typo and your interest. I am happy to answer your questions publicly and will definitely add your questions to the FAQ section we are currently working on.
The company is at distil.it, and yes, we are an American company located in San Francisco despite the Italian TLD.
We do not host your files permanently on our servers; instead, our service is layered on top of a standard host. We do, however, cache your content on our edge nodes exactly like a CDN to accelerate your site. This feature is already included in the pricing model.
With the enterprise plan we will work with clients to respond to specific threats that an organization may face. This could mean blocking certain countries from accessing your site, blocking certain IP ranges, or dealing with DoS attacks.
Although we can respond to most security concerns, there are still some security threats outside our scope.
Our page optimization and acceleration techniques are recognized by PageSpeed and YSlow, and the results are measurable. In one case study we improved our customer's page load time by 55%. There are still other optimization tricks that we do not handle, such as combining images into CSS sprites or setting browser caching.
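For readers wondering what "setting browser caching" looks like in practice, a minimal sketch is below. It assumes an Apache host with mod_expires enabled and is something the site owner would configure on their own origin server:

```apache
# Hypothetical sketch (assumes Apache + mod_expires): tell browsers how long
# to keep static assets, one of the optimizations handled on the origin host.
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType text/css "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
</IfModule>
```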
We try to accommodate our customers the best we can. Basic redirects like the one you mention would not be hard, and we would happily do this for regular customers within reason.
Pricing for the service is based on bandwidth used, and there is no extra cost for storage. For your specific scenario, though, we may not be a complete solution since our service is not currently optimized for video delivery.
Please feel free to ask any additional questions, we are happy to answer and help!
Rami
-
Hi Rami.
Sharing information about a relevant and useful service isn't advertising, it's educational and informative. You could have used a random name and mentioned the service, but you shared the information in a transparent, quality manner and I for one appreciate it.
I believe your signature is missing a character and you meant to use www.distil.it.
After reading about your product, I have some follow-up questions. I can send the questions to you privately, but I think others would benefit from the responses, so I will ask here if that is OK. I would humbly suggest adding this information to your site where appropriate or possibly in a FAQ section. If the information is already on your site and I missed it, I apologize.
-
It sounds like your solution offers cloud hosting. Is that correct? If so, is your hosting complete? In other words, do I maintain my regular web host or is your service in addition to my regular host?
-
It sounds like your Cloud Acceleration service is a CDN. Is that correct? Is this service an extra cost on top of the costs listed on your pricing page?
-
The Enterprise solution offers "Custom Security Algorithms". Can you share more details about what is involved?
-
Would it be fair to say your service handles 100% of security settings?
-
You mentioned caching, compression and minification. Would it be fair to say your service handles 100% of optimization settings? Along these lines, is your solution offered in such a manner to where your results are recognized by PageSpeed and YSlow? I always value results over any tool, but some clients latch onto certain tools and it would offer additional value if the tools recognized the results.
-
While your site ccTLD is .it, your contact number listed on your home page appears in the San Francisco area. Are you a US-based company?
-
You mention "the best support in the industry". For your regular (i.e. non-premium
) users, if a non-technical client requested basic changes such as to direct URLs which did not end in a slash to the equivalent URL which did end in a slash throughout their site, do you make these changes for them? How far are you able to assist customers? (I know it's a dangerous question to answer on some levels for you, but inquiring minds would like to know). -
I did not notice any pricing related to space on disk. I have a client who provides many self-hosted videos and the site is 30 GB. Are there any pricing or other issues related to the physical size of a site?
Your solution intrigues me because it addresses a wide array of hosting issues ranging from site speed to security to content scraping. I am anxious to learn more.
-
-
Thanks, Rami.
Your solution and offer are fascinating. And no worries about the shameless plug pitfall.
The issue for me is clients who may not quite fit into the category of being victims of the scraping/complete sleaze bag racket.
Rather, they are industry leaders who are often victimized by leading content farms (and you know who I mean!). Some poor schmuck gets 15 bucks for spending 15 minutes lifting our content and paraphrasing it, without attribution or links.
Ironically, said content farms claim to have turned over a new leaf, hired reputable journalists as so-called "editors-in-chief" and now want to "partner" with our leading SMEs.
As they used to say in 19th-century Russian novels, "What is to be done?"
-
hmmm...I like to pick my battles.
Scumbags are scumbags and will always find a way to win in the short term.
I like to live by two things my grandma taught me a long time ago...
"What goes around comes around" and "revenge is a dish best served cold"
As to there being an easy way out - you're an SEO, Ryan! You know the deal.
Sha
-
Hi All,
To follow up on Ryan's last post "offer an anti-bot copyright protection program", that is exactly what we have created at Distil. We are the first turnkey cloud solution that safeguards your revenue and reputation by protecting your web content from bots, data mining, and other malicious traffic.
I do not mean to shamelessly advertise but it seems relevant to mention our service. If anyone is interested in testing the solution please feel free to message me and I will be happy to extend a no obligation 30 day trial.
Rami
Founder, CEO
www.distil.it
-
Well darn, so there is no easy way out! I think this is a fantastic opportunity for you. You can create Sha Enterprises and offer an anti-bot copyright protection program which would protect sites.
-
Hi Ryan,
In this case Greg already knows the site has been scraped and duplicated. Blocking the scraper and serving the image via the bot-response.php script is simply a "gift" to the duplicate site if they return to update their stolen content, as they often do.
It is entirely possible to put the solution in place for well known scrapers such as Pagegrabber etc, but there are thousands of them, the people using them can easily change the name when they have been outed and anyone can write their own.
I understand that everyone wants a "list", but even if you Google "user agent blacklist" and find one, there will be problems. Adding thousands of rules to your .htaccess will eventually cause processing issues, the list will constantly be out of date, and so on.
As I explained at the outset, the key is to be aware of what is happening on your server and respond where necessary. Unfortunately, this is not a "set and forget" issue. In my experience though, bots will likely be visible in your logs long before they have scraped your entire site.
Sha
-
Love it!
-
I love the idea if we can figure out a way to get it to work. It would require someone stealing your code, you discovering the theft, putting the steps in place and then the bad site coming back for more.
-
I guess the use of bot-response.php and bot-response.gif is the gentle internet version of a public shaming campaign.
Sometimes it's a matter of picking your battles, but engineering enough of a win to make your client feel better without launching into an all-out war that could end up costing way more than you're willing to pay. :)
Sha
-
I agree you have to be very careful.
I am only suggesting this approach might be considered in certain circumstances.
Public shaming is an intermediate step somewhere between sending a friendly note, a C&D letter, and suing, provided:
- the other company's identity is known
- the other company cares about its reputation
I am not a lawyer. Nor do I play one on the Internet.
The other company might claim "tortious interference" in its business. (That was the claim against CBS in the tobacco case.) But it's a stretch. A truthful story in a mainstream media outlet poses little risk, IMHO. Any competent attorney could make the case that the purpose of the story was to inform the public. As for libel, you have to prove "actual malice" or "reckless disregard for the truth", an almost impossible standard to meet: proving you were lying and knew you were lying.
But who wants to go to court? One company I worked for had copyright infringement issues. Enthusiastic fans were using the name and logo without consent. A friendly email was usually all it took for them to either cease and desist or become official affiliates.
But these were basically good people who infringed out of ignorance.
It's different if you're dealing with dirtbags.
-
I love the idea, but there are two concerns I have about this approach. In order for this to work, the company has to be known. Usually known companies don't participate in content scraping.
Also, if you do launch a successful public shaming campaign, you could possibly open yourself up to legal damages. I know you are thinking "What? They stole from me!" You are taking action with the express purpose of harming another business. You need to be extremely careful.
There have been multiple court cases where a robber successfully sued a home or business owner when they were injured during a robbery. Of course we can agree that sounds insane, but it has really happened, and this situation is much more transparent. The other company can claim you stole the content from them, and then you smeared the company. I can personally attest that civil court cases are not set up so the good guy always wins or so that principles are upheld. Each side makes a legal case, the costs can quickly run into tens of thousands of dollars, and the side with the most money will often win. Be very careful before taking this approach.
-
Thanks. Very helpful.
-
It is a formal legal notification sent to the company involved. I research the site information, contact information, and domain registration information to determine the proper party involved. I also send the C&D via registered mail with proof of delivery. After the document has been delivered, I also send it to the site's "Contact Us" address. I take every step reasonably possible to ensure the document is received by the right party within the company, and I can document the date/time of receipt.
The letter provides the following:
-
identifies the company which owns the copyrighted or trademarked material
-
offers a means to contact the copyright and trademark owner
-
states that the copyright / trademark owner has become aware of the infringement
-
provides proof of ownership such as the copyright number, trademark number, etc.
-
identifies the location of the infringing content
-
states that my client has suffered harm as a result of the infringement. "Harm" can range from direct damages, such as decreased sales or decreased website traffic, to potential damage such as confusion in the marketplace.
Once the above points are established, the Cease and Desist demand is made. I also provide a follow-up date by which the corrective action needs to be completed. Finally, the specific next steps are covered with the following statement:
"This contact represents our goodwill effort to resolve this matter quickly and decisively. If further action is required, please be advised that statute 15 U.S.C. 1117(a) sets out the remedies available to the prevailing party in trademark infringement cases. They are: (1) defendant’s profits, (2) any damages sustained by the plaintiff, (3) the costs of the action, and (4) in exceptional cases, reasonable attorney’s fees."
There are a couple additional legal stipulations added as required by US law. The C&D is then signed, dated and delivered.
This letter works in a high percentage of cases. When it fails, a slightly modified version is sent to the web host. If that fails, then the next recourse is requesting Google directly to remove the site or content from their index.
If all else fails, you can sue the offending company. If you do go to court, the fact that you went through the above process and did everything possible to avoid court action will clearly benefit your case. I have never gone to that last step and I am not an attorney, but perhaps Sarah can comment further?
-
-
What does the C & D letter say? What is the threat? All the subsequent steps? Or do you just keep it vague and menacing (eg. "any and all remedies, including legal remedies")
-
Excellent answers.
On top of everything else, how about some out of the box thinking: public shaming.
It's a risky strategy, so it needs careful consideration.
But it's pretty clear your client is the victim of dirty pool.
We're talking truth and justice and virtue here, folks. Forces of darkness vs. forces of light.
If I were still a TV news director, and someone on my staff suggested this as a story idea, I'd jump all over it.
And the company that copied the site would not emerge looking good.
-
Hi Ryan,
The major problem is that any experienced programmer can easily write their own script to scrape a site. So there could be thousands of "bad bots" out there that have not been seen before.
There are a few recurring themes that appear amongst suspicious User Agents that are easy to spot - generally anything that has a name including words like grabber, siphon, leach, downloader, extractor, stripper, sucker or any name with a bad connotation like reaper, vampire, widow etc. Some of these guys just can't help themselves!
The most important thing though is to properly identify the ones that are giving you a problem by checking server logs and tracing where they originate from using Roundtrip DNS and WhoIs Lookups.
Matt Cutts wrote a post a long time ago on how to verify Googlebot, and of course the method applies to other search engines as well. The double-check is to then use WhoIs to verify that the IP address you have falls within the IP range assigned to Google (or whichever search engine you are checking).
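A minimal sketch of that roundtrip check in PHP is below; the IP address is just a placeholder for whatever shows up in your own logs, and you would still follow it with the WhoIs range check described above:

```php
<?php
// Roundtrip DNS check sketch: reverse-resolve the IP from your logs, confirm
// the hostname sits in the search engine's domain, then resolve that hostname
// forward and make sure it points back to the original IP.
$ip   = '66.249.66.1';        // placeholder: an IP claiming to be Googlebot
$host = gethostbyaddr($ip);   // reverse lookup, e.g. crawl-66-249-66-1.googlebot.com

$verified = preg_match('/\.googlebot\.com$/i', $host) // hostname belongs to Google
         && gethostbyname($host) === $ip;             // forward lookup matches

echo $verified ? "Verified Googlebot\n" : "Unverified or fake bot\n";
```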
If you are experienced at reading server logs it becomes fairly easy to spot spikes in hits, bandwidth etc which will alert you to bots. Depending which server stats package you are using, some or all of the bots may already be highlighted for you. Some packages do a much better job than others. Some provide only a limited list.
If you have access to a programmer who is easy to get along with, the best way to get your head around this is to sit down with them for an hour and walk through the process.
Hope that helps,
Sha
PS - I'm starting to think you sleep less than I do!
-
Wow! Amazing information on the bots, Sha. I never knew about this approach. My thinking was just that bad bots would ignore the robots.txt file and there was not much else a site owner could do.
I have to think there are a high number of "bad" bots out there using various names which often change. It also seems likely the IP addresses of these bad bots change frequently. By any chance do you, or anyone else, know of some form of "bad bots" list which is updated?
It seems like too much work for any normal site owner to compile and maintain a list of this nature.
I know...this is a stretch but hey, it doesn't hurt to ask, right?!
-
Hi Greg,
Awesome information there from Ryan!
Implementing the authorship markup is important in that it basically "outs" anyone who has already stolen your content by telling Google that they are not the original author. With authorship markup properly implemented, it really doesn't matter how many duplicates there are out there; Google will always see those sites as imposters, since no one else has the ability to verify their authorship with a link back from your Google profile.
It is possible to block scrapers from your server (blacklist) using IP address or User Agent if you are able to identify them. Identification is not very difficult if you have access to server logs, as there will be a number of clues in the log data. These include excessive hits, bandwidth used, requests for JavaScript and CSS files, and high numbers of 401 (Unauthorized) and 403 (Forbidden) HTTP error codes.
Some scrapers are also easily identifiable by User Agent (name). Once the IP address or user agent is known, instructions can be given to the server to block it and if you wish, to serve content which will identify the site as having been scraped.
If you are not able to specifically identify the bot(s) responsible, it is also possible to use alternatives like whitelisting bots that you know are OK. This needs to be handled carefully, as omissions from the whitelist could mean that you have actually banned bots that you want to crawl the site.
If using a LAMP setup (Apache server), the blocking instructions are added to the .htaccess file, with a PHP script handling any custom response you want to serve. For a Windows server, you can use a database or text file with FileSystemObject to redirect them to a dead-end page. Ours is a LAMP shop, so I am much more familiar with the .htaccess method.
If using .htaccess, you have the choice of returning a 403 Forbidden HTTP error, or using the bot-response.php script to serve an image which identifies the site as scraped.
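A minimal .htaccess sketch of both options is below; the User Agent keywords and the IP range are placeholders for whatever your own logs identify, and the bot-response.php path follows the naming used in this thread:

```apache
# Hypothetical sketch: block scrapers identified in your server logs.
RewriteEngine On

# Option 1: return 403 Forbidden for matching User Agents or a known bad IP range
RewriteCond %{HTTP_USER_AGENT} (grabber|siphon|leach|sucker|extractor) [NC,OR]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule .* - [F,L]

# Option 2 (alternative): serve the bot-response script instead of the real content
# RewriteCond %{REQUEST_URI} !^/bot-response\.php$
# RewriteCond %{HTTP_USER_AGENT} (grabber|siphon|leach|sucker|extractor) [NC]
# RewriteRule .* /bot-response.php [L]
```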
If using bot-response.php, the gif image should be made large enough to break the layout of the scraped site if they serve the content somewhere else. Usually a very large gif that reads something like "Content on this page has been scraped from yoursite.com. If you are the webmaster, please stop trying to steal our content." will do the job.
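A minimal sketch of what the bot-response.php script itself might contain is below; the gif filename follows the convention above, and the implementation details are an assumption rather than a canonical version:

```php
<?php
// bot-response.php sketch: sends the oversized "this content was scraped
// from yoursite.com" warning image described above instead of the real page.
header('Content-Type: image/gif');
header('Cache-Control: no-cache, no-store');
readfile(__DIR__ . '/bot-response.gif'); // the large warning image
exit;
```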
There is one VERY BIG note of caution if you are thinking of blocking bots from your server. You really need to be an experienced tech to do this. It is NOT something that should be attempted if you don't understand exactly what you are doing and what precautions need to be taken beforehand. There are two major things to consider:
- You can accidentally block the bots that you want to crawl your site. (Major search engines use many different crawlers to do different jobs. They do not always appear as Googlebot, Slurp, etc.)
- It is possible for people to create fake bots that appear to be legitimate. If you don't identify these, you will not solve the scraping problem.
The authenticity of bots can be verified using Roundtrip DNS Lookups and WhoIs Lookups to check the originating domain and IP address range.
It is possible to add a disallow statement for "bad bots" to your robots.txt file, but scrapers will generally ignore robots.txt by default, so this method is not recommended.
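For completeness, that disallow statement would look like the lines below (the bot name is a placeholder), but as noted, most scrapers simply ignore it:

```
User-agent: BadBot
Disallow: /
```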
Phew! Think that's everything covered.
Hope it helps,
Sha
-
Does the canonical tag work after the fact?
The canonical tag only works if the scraping site is dumb enough or lazy enough not to correct it. Fortunately, this applies in many circumstances.
Also, the scraping might have been a one-time thing, but often they will continue to scrape your site for updates and new content. It depends. If they return for new content, then yes, it would apply.
My suggestion would be to copyright your home page immediately. Additionally, add a new page to your site and copyright it. Then you have two pages on your site which are copyrighted, which offers you a lot more protection than you presently have.
One item I forgot to mention, Google Authorship. Use it.
http://googlewebmastercentral.blogspot.com/2011/06/authorship-markup-and-web-search.html
http://www.google.com/support/webmasters/bin/answer.py?answer=1408986
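For reference, the basic markup those links describe boils down to something like the snippet below; the Google profile URL is a placeholder, and the profile itself must link back to the site for verification to work:

```html
<!-- Sketch of Google Authorship markup: the byline links to the author's Google
     profile (placeholder URL), and that profile lists the site under
     "Contributor to" so Google can verify the connection both ways. -->
<a href="https://plus.google.com/YOUR_PROFILE_ID" rel="author">Author Name</a>
```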
-
Thanks - I am going to get started on these. Does the canonical tag work after the fact?
Thanks,
Greg -
Hi Greg.
Having a site scraped is unfortunately common. It is a frustrating experience which takes time and effort to address. Below are some suggestions:
-
Going forward, you can copyright at least some pages within your site. Even if you do not wish to copyright every page, by having some pages copyrighted you will have very clear legal rights if your entire site is scraped.
-
Add the canonical tag to each page, along with various clues throughout the site to indicate it really belongs to you (the tag itself is shown in the snippet at the end of this answer). Generally speaking, these operations are a bit lazy, which is why they steal from others rather than create their own content. If they do not notice and remove the canonical tag, you might receive all the SEO credit for the second site, and either way Google will understand to index your site as the primary source of the content.
-
You might rename a random image to mysite.com.jpg, as one suggestion. There are numerous other means by which you can drop indicators that the content is really yours. The reason this step is helpful is that the site which stole your content clearly falls into the no-ethics category. They know what they are doing, have likely used this practice before, and will do so again. As part of the process, they often will deny everything and may even claim you stole the site from them. These clues can assist in proving you are the true owner.
-
You should contact the offending site via registered mail with a "Cease and Desist" notification. Be certain to provide a deadline; I use 10 days as a timeline.
-
If the C&D does not work, contact their web host with a DMCA notice. If the host is reputable, they will honor the DMCA and take down the site. The problem is the host is required to contact the site and share your claim with the site owner. The site owner can respond with a statement saying the content is theirs, and then there is nothing further the host can do UNLESS you have a registered copyright or you have a helpful host who is willing to consider your evidence (i.e. the clues you left) and help you (their non-customer) over their paying customer. Some hosts are good this way.
-
You can always take legal action and sue the website and host in court. Again, the copyright is very important in court as it provides you with a significant advantage. Some sites will actually defend themselves in court with the intention to delay the trial as long as possible and drive up your expenses to literally tens of thousands of dollars so you give up.
The above process will work in a lot of cases, but not all. When it doesn't work, you have to take other approaches. Sometimes the site is owned, operated and hosted in a foreign country. Sometimes the country does not have enforceable copyright laws. In these cases, and in the others above, you can file the complaint with Google and they have the ability to remove the offending site from their index.
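As referenced in the canonical tag suggestion above, the tag itself is a single line in the head of each page; the URL below is a placeholder for that page's own address on the original site:

```html
<!-- Placed in the <head> of every page, pointing at the page's own URL on the
     original site. If a scraper copies the page wholesale without removing it,
     the tag keeps pointing back to the true source. -->
<link rel="canonical" href="http://www.yourclientsite.com/insurance-leads/" />
```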
-