Are the CSV downloads malformed when a comma appears in a URL?
-
Howdy folks, we've been PRO members for about 24 hours now and I have to say we're loving it! One problem I am having, however, is with a CSV export of our crawl diagnostics summary.
The CSV contains all the data fine, but I am having problems with it when a URL contains a comma. I am making a little tool to work with the CSVs we download, and I can't parse them properly because URLs sometimes contain commas and aren't quoted the way other fields, such as meta_description_tag, are.
Is there something simple I'm missing, or is it something that can be fixed?
Looking forward to learning more about the various tools. Thanks for the help.
-
I won't be too hard on the programmers - I'm a programmer myself. Our small business has developers and designers doing the bulk of the SEO. I can see you've looked into it as I have - there are many factors involved if I were to decide to "fix" this myself. To be honest, I don't fancy it - I'm hoping the better approach will come from the wonderful SEOmoz developers, who might put in a fix. Hint hint.
-
The first rule in this business is "You can't trust programmers".
I should know, I am a programmer and I used to manage teams of them.
You can't trust them to write something perfect, because they will always make huge assumptions, based on what they know.
They should know that URLs can contain commas, and they should quote them.
If they didn't do that in the final field, it is a deficiency in the code, and your stuff isn't going to work unless you fix it manually.
What you need to do to fix this is to add a quote after the 10th comma and also add one at the end of each line.
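As a rough sketch of that manual fix, assuming the unquoted URL really is the last field and that none of the ten fields before it contains a comma of its own (the function name and the default of 10 are only illustrative):

```python
def quote_trailing_field(line: str, leading_commas: int = 10) -> str:
    """Wrap everything after the Nth comma in double quotes.

    Assumption: the first `leading_commas` fields contain no commas
    themselves, so a plain split lands on the right boundary.
    """
    stripped = line.rstrip("\r\n")
    parts = stripped.split(",", leading_commas)
    if len(parts) <= leading_commas:
        # Fewer commas than expected: leave the line untouched.
        return stripped
    tail = parts[leading_commas]
    if tail.startswith('"'):
        # The final field is already quoted.
        return stripped
    # Double any quotes inside the field, then wrap it.
    tail = tail.replace('"', '""')
    return ",".join(parts[:leading_commas]) + ',"' + tail + '"'
```

This only works line by line, so it breaks as soon as an earlier field contains a comma or a line break.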
Unfortunately, even that is a problem.
The problem is there are other fields that may not be quoted, some of which can start with http://
There can also be line breaks in the title field, and possibly even in the link text field.
Quotes are escaped by doubling them, and fields containing other special characters are wrapped in double quotes.
Titles and link text can also contain commas, so it is very complex.
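For the fields that are quoted properly, a standard CSV parser copes with all of this. Here is a minimal sketch using Python's csv module; the column names and sample row are made up for illustration and are not the real export layout:

```python
import csv
import io

# Hypothetical sample: a comma in the URL, a line break in the title,
# and doubled quotes plus a comma in the link text.
sample = (
    'url,title,link_text\r\n'
    '"http://example.com/a,b","Line one\nline two","She said ""hi"", twice"\r\n'
)

def parse_rows(text: str) -> list:
    """Parse CSV text, keeping embedded commas, newlines and quotes."""
    # newline="" stops Python from translating line endings, so the
    # csv module sees quoted line breaks exactly as written.
    return list(csv.reader(io.StringIO(text, newline="")))

rows = parse_rows(sample)
```

Splitting on commas by hand would turn that one record into several broken pieces, which is why the unquoted fields are the real problem.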
Some of the fields are a bigger mess because it depends on the link text, and if the link text contains an image, you'll have quotes and equals signs, commas and all kinds of stuff. You can also have upper ascii characters and multibyte characters.
They did actually quote the first URL, if it contains commas.
They really should have quoted every field.
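To show what "quote every field" looks like on the writing side, here is a small sketch with Python's csv module. It is not the actual export code, and the function name and sample values are invented:

```python
import csv
import io

def write_all_quoted(rows) -> str:
    """Serialize rows as CSV with every field wrapped in double quotes."""
    buf = io.StringIO(newline="")
    # QUOTE_ALL quotes every field; embedded quotes are doubled.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()

text = write_all_quoted([["http://example.com/a,b", 'He said "ok"', "3"]])
```

Output written this way reads back unchanged through csv.reader, whatever commas, quotes or line breaks the fields contain.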