Excel tips or tricks for duplicate content madness?
-
Dearest SEO Friends,
I'm working on a site that has over 2,400 instances of duplicate content (yikes!).
I'm hoping somebody can offer some Excel tips or tricks for managing my SEOmoz crawl diagnostics summary data file in a meaningful way, because right now the spreadsheet is not really helpful. Here's a hypothetical situation to describe why:
Say we had three columns of duplicate content. The data is displayed thusly:
Column A | Column B | Column C
URL A | URL B | URL C
In a perfect world, this is easy to understand. I want URL A to be the canonical. But unfortunately, the way my spreadsheet is populated, this ends up happening:
Column A | Column B | Column C
URL A | URL B | URL C
URL B | URL A | URL C
URL C | URL A | URL B
Essentially, every one of these URLs would end up being treated as the canonical, which makes the tag useless. On a site with only a few errors this has never been a problem, because I can just spot-check my work. But the site I'm working on has thousands of instances, making it really hard to identify these patterns accurately, let alone at scale.
This is particularly problematic because some of these URLs are identified as duplicates 50+ times! So my spreadsheet has well over 100K cells!!! Madness!!! Obviously, I can't go through it manually. It would take me years to ensure the accuracy, and I'm assuming that's not really a scalable goal.
Here's what I would love, but I'm not getting my hopes up. Does anyone know of a formulaic way that Excel could identify row matches and think, "Oh! These are all the same rows of data, just mismatched. I'll kill off the duplicate rows, so only one truly unique row of data exists for this particular set"? Or some other workaround that could help me with my duplicate content madness?
Much appreciated, you Excel Gurus you!
-
Choose one of the URLs as the authoritative one and remove the duplicated content from the others.
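As a rough illustration, if you would rather script this step than do it by hand, here is a minimal Python sketch. The rule it uses (shortest URL wins, ties broken alphabetically) is only an assumption for the example, as are the function name `pick_canonical` and the example.com URLs. Swap in whatever rule fits your site.

```python
def pick_canonical(group):
    """Return (canonical, duplicates) for one group of duplicate URLs.

    Hypothetical rule for illustration only: the shortest URL becomes the
    canonical, with ties broken alphabetically.
    """
    # Drop blank cells and repeated URLs, then sort for a stable order.
    urls = sorted({u.strip() for u in group if u and u.strip()})
    canonical = min(urls, key=lambda u: (len(u), u))
    duplicates = [u for u in urls if u != canonical]
    return canonical, duplicates


# Made-up URLs standing in for one row of the duplicate content report.
canonical, duplicates = pick_canonical([
    "http://example.com/page?ref=1",
    "http://example.com/page",
    "http://example.com/page/",
])
print(canonical)   # http://example.com/page
print(duplicates)  # ['http://example.com/page/', 'http://example.com/page?ref=1']
```

Every URL in `duplicates` would then point its canonical tag at `canonical`, or have its duplicated content removed.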
-
FMLLC,
I use Excel 2010 so my approach would be as follows:
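- Make a backup copy of your file before you start.
- You will need to sort the values within each row, one row at a time. Excel's built-in sort works on one range per pass, so doing this for thousands of rows calls for a macro.
- Assuming your data starts in A1 and has no header row, paste the macro below into a general module (Alt+F11, then Insert > Module), go back to Excel, activate your sheet, then run the macro from View > Macros (Tools=>Macro=>Macros in Excel 2003 and earlier), or press Alt+F8.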
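Sub SortEachRowHorizontal()
    ' Sorts the values in each row left to right, so rows holding the
    ' same URLs in a different order end up identical.
    Dim rng As Range, rw As Range
    Set rng = Range("A1").CurrentRegion
    For Each rw In rng.Rows
        ' Sort this row on its own, keyed on its first cell.
        rw.Sort Key1:=rw.Cells(1), _
                Order1:=xlAscending, _
                Header:=xlNo, _
                OrderCustom:=1, _
                MatchCase:=False, _
                Orientation:=xlLeftToRight
    Next rw
End Sub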
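- Then highlight all your cells and go to Data -> Remove Duplicates. Leave every column checked and "My data has headers" unchecked, so that only rows matching in every column are removed.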
The result should be all unique rows. I hope this helps.
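If you would rather work on the CSV export outside Excel, the same idea (sort within each row, then drop repeated rows) can be sketched in Python. This is only an illustration under assumptions: the function name `unique_url_groups` is made up, and the inline sample stands in for the real export, whose exact column layout isn't shown in this thread.

```python
import csv
import io


def unique_url_groups(rows):
    """Collapse rows that contain the same URLs in a different order."""
    seen = set()
    groups = []
    for row in rows:
        # Drop blank cells, then sort so the column order no longer matters.
        key = tuple(sorted(cell.strip() for cell in row if cell.strip()))
        # Keep only the first row seen for each distinct set of URLs.
        if key and key not in seen:
            seen.add(key)
            groups.append(list(key))
    return groups


# Stand-in for the exported file; with a real file you would pass
# csv.reader(open(path, newline="")) instead.
sample = io.StringIO(
    "URL A,URL B,URL C\n"
    "URL B,URL A,URL C\n"
    "URL C,URL A,URL B\n"
)
groups = unique_url_groups(csv.reader(sample))
print(groups)  # [['URL A', 'URL B', 'URL C']]
```

The three mismatched rows from the question collapse into a single group, which is the same result the macro plus Remove Duplicates should give you.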