Excel tips or tricks for duplicate content madness?
-
Dearest SEO Friends,
I'm working on a site that has over 2,400 instances of duplicate content (yikes!).
I'm hoping somebody could offer some excel tips or tricks to managing my SEOMoz crawl diagnostics summary data file in a meaningful way, because right now this spreadsheet is not really helpful. Here's a hypothetical situation to describe why:
Say we had three columns of duplicate content. The data is displayed thusly:
|
Column A
|
Column B
|
Column C
URL A
|
URL B
|
URL C
|
In a perfect world, this is easy to understand. I want URL A to be the canonical. But unfortunately, the way my spreadsheet is populated, this ends up happening:
|
Column A
|
Column B
|
Column C
URL A
|
URL B
|
URL C
URL B
|
URL A
|
URL C
URL C
|
URL A
|
URL B
|
Essentially all of these URLs would end up being called a canonical, thus rendering the effect of the tag ineffective. On a site with small errors, this has never been a problem, because I can just spot check my steps. But the site I'm working on has thousands of instances, making it really hard to identify or even scale these patterns accurately.
This is particularly problematic as some of these URLs are identified as duplicates 50+ times! So my spreadsheet has well over 100K cells!!! Madness!!! Obviously, I can't go through manually. It would take me years to ensure the accuracy, and I'm assuming that's not really a scalable goal.
Here's what I would love, but I'm not getting my hopes up. Does anyone know of a formulaic way that Excel could identify row matches and think - "oh! these are all the same rows of data, just mismatched. I'll kill off duplicate rows, so only one truly unique row of data exists for this particular set" ? Or some other work around that could help me with my duplicate content madness?
Much appreciated, you Excel Gurus you!
-
Choose one of the URL's as the authoritive and remove the dupped content from the others.
-
FMLLC,
I use Excel 2010 so my approach would be as follows:
-
Make a backup copy of your file before you start.
-
You will need to sort each row by value, but Excel has a 3 sort level limit, so you will need to add a macro.
-
Assuming your data starts in A1 and has no header row, Put it in a general module, go back to excel, activate your sheet, then run the macro from Tools=>Macro=>Macros.
Sub SortEachRowHorizontal()
Dim rng As Range, rw As Range
Set rng = Range("A1").CurrentRegion
For Each rw In rng.Rows
rw.Sort Key1:=rw(1), _
order1:=xlAscending, _
Header:=xlNo, _
OrderCustom:=1, _
MatchCase:=False, _
Orientation:=xlLeftToRight
Next
End Sub
- Then Highlight all your cells and then go to Data -> Remove Duplicates
The result should be all unique rows. I hope this helps.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate canonical tag issue
i have this site https://www.dealsmango.com/ which i have selected for canonical , but google is still selecting my old website https://www.selldealsmango.com/ , i have removed everything from old site only one page with new site link, and also put 301 redirect , but still when i click on request for indexing on google search console same error appears regarding duplicate canonical tag . what should i do? remove the canonical tag from old site which i don't want google to index, or what will be the best possible solution.
Moz Pro | | MudassirSultn0 -
Can Moz generate 404 Broken links report in an excel format
Hi all, Hope you are doing good and that all is well at your end. I'm a marketing team member here at Zephyr and I take care of the content side things for the team. I'm reaching out to you this morning in regards to figuring whether any Moz feature can be used for fixing broken links on our website and whether or not it can generate the broken links report which can be exported and saved in a spreadsheet. We are in the middle of cleaning up all the broken links which are currently live on our drupal site and which is why we are in need of a tool that generates the report of all the broken links onto an excel spreadsheet.Once we have the list of broken links set in a spreadsheet, we will work towards fixing those links. Meanwhile, we have also created a 404 error image for our readers who can be directed towards our homepage. Please, let me know if you can help me take care of these below-mentioned requests: 1. Suggest a tool /Moz feature which can generate the site report and list of broken links onto excel/spreadsheet 2. Once I've the list in an excel format, is there an automated way in which we can fix all the broken links which are currently live on our website, instead of manually deleting/unlinking those pages. Truly appreciate your help. Thank you! Is there an automated way in which we can fix all the broken links which are currently live on our website, instead of manually deleting/unlinking those pages.
Moz Pro | | LilianB0 -
Duplicate Content - Multiple URL's
I know a few of these problems come from products being in the same categories but I have no idea how to get rid of the url's that are showing duplicate content when the product is in the exact same place. Hard to explain, but here are URL examples. http://www.ocelco.com/store/pc/www.ocelco.com/store/pc/Bathtub-Floor-Corner-Stainless-Steel-Grab-Bar-Right-Hand-left-hand-pictured-688p3308.htm http://www.ocelco.com/store/pc/www.ocelco.com/store/pc/Bathtub-Floor-Corner-Stainless-Steel-Grab-Bar-Right-Hand-left-hand-pictured-696p3308.htm http://www.ocelco.com/store/pc/Bathtub-Floor-Corner-Stainless-Steel-Grab-Bar-Right-Hand-left-hand-pictured-p3308.htm http://www.ocelco.com/store/pc/Bathtub-Floor-Corner-Stainless-Steel-Grab-Bar-Right-Hand-left-hand-pictured-688p3308.htm Any Idea's how to fix / get rid of these URL's? Thanks!
Moz Pro | | Mike.Bean0 -
The pages that add robots as noindex will Crawl and marked as duplicate page content on seo moz ?
When we marked a page as noindex with robots like {<meta name="<a class="attribute-value">robots</a>" content="<a class="attribute-value">noindex</a>" />} will crawl and marked as duplicate page content(Its already a duplicate page content within the site. ie, Two links pointing to the same page).So we are mentioning both the links no need to index on SE.But after we made this and crawl reports have no change like it tooks the duplicate with noindex marked pages too. Please help to solve this problem.
Moz Pro | | trixmediainc0 -
How does SEOmoz pull its duplicate page title and content information?
I ask because I am getting errors based on URLs that do not even exist on our site. For example: http://www.robots.com/applications/abb/panasonic/robots this URL does not even exist for our site, but somehow it is listed in the error section of page title duplication tool. http://www.robots.com/applications/ exists, but there is no place to get to an ABB or a Panasonic robot from this page, not to mention an ABB/Panasonic (which for sure does not exist). ?? We have quite a few of these out there and just wondering how to find out where the link is coming from. When we checked our URLs through Integrity, links like the one listed above (which we had 29 of them listed) that do not show up. Thoughts? Thanks! Janelle
Moz Pro | | jwanner0 -
Issue: Duplicate page title
Hello, I have run the "Crawl Diagnostics" report using SEOmoz pro and it says that I have a total of 56 errors. 18 of those errors being duplicate content and another 38 errors being duplicate title tags. Now I have looked at both reports and detail and the reason I am getting there errors is due to the fact the it is checking "http" and "https". So for example: my website is http://www.widgets.com On the crawl diagnostics report, it also checks https://www.widgets.com So it looks like I have duplicate content and duplicate title tags because of this Now my question is this: Is this really duplicate content? If so, how do I fix this? Any help is greatly appreciated.
Moz Pro | | threebiz0 -
Broken Links and Duplicate Content Errors?
Hello everybody, I’m new to SEOmoz and I have a few quick questions regarding my error reports: In the past, I have used IIS as a tool to uncover broken links and it has revealed a large amount of varying types of "broken links" on our sites. For example, some of them were links on my site that went to external sites that were no longer available, others were missing images in my CSS and JS files. According to my campaign in SEOmoz, however, my site has zero broken links (4XX). Can anyone tell me why the IIS errors don’t show up in my SEOmoz report, and which of these two reports I should really be concerned about (for SEO purposes)? 2. Also in the "errors" section, I have many duplicate page titles and duplicate page content errors. Many of these "duplicate" content reports are actually showing the same page more than once. For example, the report says that "http://www.cylc.org/" has the same content as "http://www.cylc.org/index.cfm" and that, of course, is because they are the same page. What is the best practice for handling these duplicate errors--can anyone recommend an easy fix for this?
Moz Pro | | EnvisionEMI0 -
Company Name in Page Title creating thousands of "Duplicate Page Title" errors
I am new, and I just got back my crawl results (after a week or more). The first thing I noticed is that the "duplicate page title" is in the thousands, my urls and page titles are different. The only thing I can see is that our company name is at appended to the name of every title. I did search and found one other person with this problem, but no answer was given. Can anyone offer some advice? This doesn't seem right... Thanks,
Moz Pro | | AoyamaJPN0