Are the CSV downloads malformed when a comma appears in a URL?

Safelincs

Howdy folks, we've been a PRO member for about 24 hours now and I have to say we're loving it! One problem I am having, however, is with a CSV I exported from our crawl diagnostics summary.

The CSV contains all the data fine, but I run into problems when a URL contains a comma. I am making a little tool to work with the CSVs we download, and I can't parse them properly because URLs sometimes contain commas and aren't quoted the way other fields, such as meta_description_tag, are.

Is there something simple I'm missing or is it something that can be fixed?

Looking forward to learning more about the various tools. Thanks for the help.

Safelincs

I won't be too hard on the programmers - I'm a programmer myself. Our small business has developers and designers doing the bulk of the SEO. I can see you've looked into it as I have - there are many factors involved if I were to decide to "fix" this myself. To be honest, I don't fancy it - I'm hoping the better approach will come from the wonderful SEO Moz developers who might put in a fix. Hint hint.

loopyal

The first rule in this business is "You can't trust programmers."

I should know, I am a programmer and I used to manage teams of them.

You can't trust them to write something perfect, because they will always make huge assumptions, based on what they know.

They should know that URLs can contain commas, and they should quote them.

If they didn't do that in the final field, it is a deficiency in the code and your stuff isn't going to work unless you fix it manually.

What you need to do to fix this is to add a quote after the 10th comma and also add one at the end of each line.
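As a rough sketch, that manual fix might look like this in Python. The function name `quote_last_field` and the example URL are made up for illustration, and it assumes none of the first 10 fields contains a comma of its own:

```python
def quote_last_field(line, leading_fields=10):
    """Wrap everything after the Nth comma in double quotes.

    Hypothetical helper: assumes the first `leading_fields` fields
    never contain a comma themselves.
    """
    text = line.rstrip("\r\n")
    # Split at most `leading_fields` times; the remainder is the final field.
    parts = text.split(",", leading_fields)
    if len(parts) <= leading_fields:
        return text  # fewer commas than expected, leave the line alone
    tail = parts[-1]
    if tail.startswith('"'):
        return text  # final field is already quoted
    # Double any embedded quotes, then wrap the field in quotes.
    quoted = '"' + tail.replace('"', '""') + '"'
    return ",".join(parts[:-1] + [quoted])


fixed = quote_last_field("a,b,c,d,e,f,g,h,i,j,http://example.com/x,y,z\n")
```

Here `fixed` ends with the single quoted field `"http://example.com/x,y,z"`, so a CSV parser would see 11 fields instead of 13.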

Unfortunately, even that is a problem.

The problem is there are other fields that may not be quoted, some of which can start with http://

There can also be line breaks in the title field, and possibly even in the link text field.

Quotes and other characters are escaped with double quotes.

Titles and link text can also contain commas, so it is very complex.

Some of the fields are a bigger mess because it depends on the link text, and if the link text contains an image, you'll have quotes and equals signs, commas and all kinds of stuff. You can also have upper-ASCII characters and multibyte characters.
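For the fields that are quoted properly, a real CSV parser copes with all of this where splitting on commas does not. Here is a small sketch using Python's standard `csv` module. The three column names and the sample row are invented for the example and are not the actual export layout, and it assumes embedded quotes are escaped by doubling them:

```python
import csv
import io

# Invented sample: a quoted URL with a comma, a title with a line break,
# and link text with doubled quotes and a comma.
sample = (
    'url,title,link_text\r\n'
    '"http://example.com/a,b","Line one\nline two","He said ""hi"", twice"\r\n'
)

# newline="" leaves line endings untouched so the csv module can
# handle line breaks inside quoted fields itself.
rows = list(csv.reader(io.StringIO(sample, newline="")))

header, record = rows
```

The parser returns one header row and one record of three fields, with the comma, the line break and the inner quotes all kept inside their fields. It cannot rescue a URL that was left unquoted, though.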

They did actually quote the first URL, if it contains commas.

They really should have quoted every field.
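To illustrate what quoting every field looks like, here is a minimal sketch with Python's `csv` module and its `QUOTE_ALL` setting. The function name `write_all_quoted` and the sample row are made up, and this is only one way an exporter could do it, not how the actual export is written:

```python
import csv
import io


def write_all_quoted(rows):
    """Serialize rows as CSV with every field wrapped in double quotes."""
    buf = io.StringIO(newline="")
    # QUOTE_ALL quotes every field; embedded quotes are doubled by default.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()


out = write_all_quoted([["http://example.com/a,b", 'Say "hi"', "301"]])

# Reading the output back should give the original fields unchanged.
round_trip = list(csv.reader(io.StringIO(out, newline="")))
```

With every field quoted, a comma in a URL is no longer ambiguous, and any standard CSV reader can recover the original values.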
