Excel tips or tricks for duplicate content madness?
-
Dearest SEO Friends,
I'm working on a site that has over 2,400 instances of duplicate content (yikes!).
I'm hoping somebody could offer some Excel tips or tricks for managing my SEOmoz crawl diagnostics summary data file in a meaningful way, because right now this spreadsheet is not really helpful. Here's a hypothetical situation to describe why:
Say we had three columns of duplicate content. The data is displayed thusly:
| Column A | Column B | Column C |
| URL A    | URL B    | URL C    |
In a perfect world, this is easy to understand. I want URL A to be the canonical. But unfortunately, the way my spreadsheet is populated, this ends up happening:
| Column A | Column B | Column C |
| URL A    | URL B    | URL C    |
| URL B    | URL A    | URL C    |
| URL C    | URL A    | URL B    |
Essentially, every one of these URLs would end up being declared a canonical, rendering the tag ineffective. On a site with only a handful of errors this has never been a problem, because I can just spot-check my work. But the site I'm working on has thousands of instances, making it really hard to identify these patterns accurately, let alone at scale.
This is particularly problematic because some of these URLs are identified as duplicates 50+ times! So my spreadsheet has well over 100K cells!!! Madness!!! Obviously, I can't go through it manually. It would take me years to ensure the accuracy, and I'm assuming that's not really a scalable goal.
Here's what I would love, but I'm not getting my hopes up. Does anyone know of a formulaic way that Excel could identify row matches and think - "oh! these are all the same rows of data, just mismatched. I'll kill off duplicate rows, so only one truly unique row of data exists for this particular set"? Or some other workaround that could help me with my duplicate content madness?
Much appreciated, you Excel Gurus you!
-
Choose one of the URLs as the authoritative version and remove the duplicated content from the others.
-
FMLLC,
I use Excel 2010 so my approach would be as follows:
-
Make a backup copy of your file before you start.
-
You will need to sort the values within each row, but Excel's built-in left-to-right sort applies a single order to the whole selection rather than sorting each row independently, so you will need a macro.
-
Assuming your data starts in A1 and has no header row: put the macro in a general module (Alt+F11, then Insert => Module), go back to Excel, activate your sheet, then run the macro (Alt+F8).
Sub SortEachRowHorizontal()
    ' Sorts the values within each row left-to-right, so rows that
    ' contain the same URLs in a different order become identical.
    Dim rng As Range, rw As Range
    Set rng = Range("A1").CurrentRegion
    For Each rw In rng.Rows
        rw.Sort Key1:=rw.Cells(1), _
                Order1:=xlAscending, _
                Header:=xlNo, _
                OrderCustom:=1, _
                MatchCase:=False, _
                Orientation:=xlLeftToRight
    Next rw
End Sub
- Then highlight all your cells and go to Data -> Remove Duplicates.
The result should be all unique rows. I hope this helps.
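For anyone who would rather script the same two steps outside Excel (sort the values within each row, then drop duplicate rows), here is a rough sketch in Python using only the standard library. The file names are placeholders, and it assumes the export is a plain CSV with no header row; a real SEOmoz export may have extra columns you would need to account for.

```python
import csv

def dedupe_rows(in_path, out_path):
    """Collapse rows that contain the same values in any order.

    Mirrors the Excel approach: the sorted row is the dedupe key,
    and the first row seen for each key is kept as-is.
    """
    seen = set()
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Sort the row's values so "URL B, URL A, URL C" and
            # "URL A, URL B, URL C" produce the same key; ignore blanks.
            key = tuple(sorted(cell.strip() for cell in row if cell.strip()))
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
```

Running this on the three-row example above would leave just the single "URL A | URL B | URL C" row, since the other two rows contain the same URLs in a different order.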