Are the CSV downloads malformed when a comma appears in a URL?
-
Howdy folks, we've been a PRO member for about 24 hours now and I have to say we're loving it! One problem I am having, however, is with a CSV I exported from our crawl diagnostics summary.
The CSV contains all the data fine, but I am having problems with it when a URL contains a comma. I am making a little tool to work with the CSVs we download, and I can't parse them properly because URLs sometimes contain commas and aren't quoted the way other fields, such as meta_description_tag, are.
Is there something simple I'm missing or is it something that can be fixed?
Looking forward to learning more about the various tools. Thanks for the help.
-
I won't be too hard on the programmers - I'm a programmer myself. Our small business has developers and designers doing the bulk of the SEO. I can see you've looked into it as I have - there are many factors involved if I were to decide to "fix" this myself. To be honest, I don't fancy it - I'm hoping the better approach will come from the wonderful SEOmoz developers, who might put in a fix. Hint hint.
-
The first rule in this business is "You can't trust programmers."
I should know: I am a programmer and I used to manage teams of them.
You can't trust them to write something perfect, because they will always make huge assumptions based on what they know.
They should know that URLs can contain commas, and they should quote them.
If they didn't do that in the final field, it is a deficiency in the code, and your stuff isn't going to work unless you fix it manually.
What you need to do to fix this is to add a quote after the 10th comma and another at the end of each line.
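As a rough sketch of that manual fix in Python: the function name quote_last_field and the sample row are made up for illustration, and it assumes the unquoted URL is the last field and that none of the first ten fields contains a comma of its own.

```python
import csv

def quote_last_field(line: str, leading_commas: int = 10) -> str:
    """Wrap everything after the Nth comma in double quotes.

    Assumption: the first `leading_commas` fields contain no commas
    themselves, so a plain split finds the right boundary.
    """
    line = line.rstrip("\r\n")
    parts = line.split(",", leading_commas)  # at most N splits
    if len(parts) <= leading_commas:
        return line  # fewer commas than expected; leave the line alone
    tail = parts[-1]
    if len(tail) >= 2 and tail.startswith('"') and tail.endswith('"'):
        return line  # final field is already quoted
    tail = tail.replace('"', '""')  # CSV escapes a quote by doubling it
    return ",".join(parts[:-1] + ['"' + tail + '"'])

# Hypothetical row: ten short fields followed by a URL with a comma in it.
raw = "a,b,c,d,e,f,g,h,i,j,http://example.com/x,y\n"
fixed = quote_last_field(raw)
fields = next(csv.reader([fixed]))
print(len(fields), fields[-1])
```

It only works on rows where the comma count in the earlier fields is predictable.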
Unfortunately, even that is a problem.
The problem is there are other fields that may not be quoted, some of which can start with http://
There can also be line breaks in the title field, and possibly even in the link text field.
Quotes inside a field are escaped by doubling them, and fields with other special characters are wrapped in double quotes.
Titles and link text can also contain commas, so it is very complex.
Some of the fields are a bigger mess because they depend on the link text, and if the link text contains an image, you'll have quotes, equals signs, commas and all kinds of stuff. You can also have upper-ASCII characters and multibyte characters.
They did actually quote the first URL when it contains commas.
They really should have quoted every field.
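For comparison, here is a small sketch of what quoting every field looks like with Python's standard csv module. The column names and row values are invented for illustration and are not the actual export layout.

```python
import csv
import io

# Hypothetical rows with a comma in the URL, a line break in the title,
# and quotes plus a comma in the link text.
rows = [
    ["url", "title", "link_text"],
    ["http://example.com/a,b", "Line one\nline two", 'He said "hi", twice'],
]

# Write with every field quoted; embedded quotes are doubled automatically.
buf = io.StringIO(newline="")
csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
text = buf.getvalue()

# Read it back; a standard CSV parser copes with all of the above
# because the field boundaries are unambiguous.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed == rows)
```

When every field is quoted like this, the commas, quotes and line breaks inside the values stop being a parsing problem, which is why a fix on the export side would be the cleaner route.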