Are the CSV downloads malformatted, when a comma appears in a URL?
-
Howdy folks, we've been a PRO member for about 24 hours now and I have to say we're loving it! One problem I am having with however is a CSV exported from our crawl diagnostics summary that I've downloaded.
The CSV contains all the data fine, however I am having problems with it when a URL contains a comma. I am making a little tool to work with the CSVs we download and I can't parse it properly because there sometimes URLs contain commas and aren't quoted the same as other fields, such as meta_description_tag, are.
Is there something simple I'm missing or is it something that can be fixed?
Looking forward to learn more about the various tools. Thanks for the help.
-
I won't be too hard on the programmers - I'm a programmer myself. Our small business has developers and designers doing the bulk of the SEO. I can see you've looked in to it as I have - there are many factors involved if I was to decide to "fix" this myself. To be honest, I don't fancy it - I'm hoping the better approach will come from the wonderful SEO Moz developers who might put in a fix. Hint hint.
-
The first rule in this business is "You can't trust programmers"
I should know, I am a programmer and I used to manage teams of them.
You can't trust them to write something perfect, because they will always make huge assumptions, based on what they know.
They should know that URLs can contain commas, and they should quote them.
If they didn't do that in the final field, it is a deficiency in the code and your stuff isn't going to workunless you fix it manually.
What you need to do to fix this is to add a quote after the 10th comma and also add one at the end of each line.
Unfortunately, even that is a problem.
The problem is there are other fields that may not be quoted, some of which can start with http://
There can also be line breaks in the title field, and possibly even in the link text field.
Quotes and other characters are escaped with double quotes.
Titles and link text can also contain commas, so it is very complex.
Some of the fields are a bigger mess because it depends on the link text, and if the link text contains an image, you'll have quotes and equals signs, commas and all kinds of stuff. You can also have upper ascii characters and multibyte characters.
They did actually quote the first URL, if it contains commas.
They really should have quoted every field
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Solving URL Too Long Issues
Moz.com is reporting that many URL's are to long, these particularly affect Product URL's where the URL is typically https://www.domainname.com/collections/category-name/products/product-name, (You guessed it we're using Shopify). However, we use Canonicals that ignore all most of the URL and are just structured https://www.domainname.com/product-name, so Google should be reading the Canonical and not the long-winded version. However, Moz cannot seem to spot this... does anyone else have this problem and how to solve so that we can satisfy the Moz.com crawl engine?
Moz Pro | | Tildenet0 -
Difference between urls and referring urls?
Sorry, nit new to this side of SEO We recently discovered we have over 200 critical crawler issues on our site (mainly 4xx) We exported the CSV and it shows both a URL link and a referring URL. Both lead to a 'page not found' so I have two questions? What is the difference between a URL and a referring URL? What is the best practice/how do we fix this issue? Is it one for our web developer? Appreciate the help.
Moz Pro | | ayrutd1 -
How to fix overly dynamic URLs for Volusion site?
We're currently getting over 5439 pages with an 'overly dynamic URL' warning in our Moz scan. The site is run on Volusion. Is there a way to fix this seeming Volusion error?
Moz Pro | | Brandon_Clay0 -
Open Site Explorer CSV export limit?
Hi! Something has been puzzling me. I've filter down a few things within open site explorer to produce some links of interest to me - around 500 records are showing When I try to export it via CSV however, only 25 links appear? Anyone know why and how I can get the rest?? David
Moz Pro | | rejigdigital0 -
Tool for tracking actions taken on problem urls
I am looking for tool suggestions that assist in keeping track of problem urls, the actions taken on urls, and help deal with tracking and testing a large number of errors gathered from many sources. So, what I want is to be able to export lists of url's and their problems from my current sets of tools (SEOmoz campaigns, Google WM, Bing WM,.Screaming Frog) and input them into a type of centralized DB that will allow me to see all of the actions that need to be taken on each url while at the same time removing duplicates as each tool finds a significant amount of the same issues. Example Case: SEOmoz and Google identify urls with duplicate title tags (example.com/url1 & example.com/url2) , while Screaming frog sees that example.com/url1 contains a link that is no longer valid (so terminates in a 404). When I import the three reports into the tool I would like to see that example.com/url1 has two issues pending, a duplicated title and a broken link, without duplicating the entry that both SEOmoz and Google found. I would also like to see historical information on the url, so if I have written redirects to it (to fix a previous problem), or if it used to be a broken page (i.e. 4XX or 5XX error) and is now fixed. Finally, I would like to not be bothered with the same issue twice. As Google is incredibly slow with updating their issues summary, I would like to not important duplicate issues (so the tool should recognize that the url is already in the DB and that it has been resolved). Bonus for any tool that uses Google and SEOmoz API to gather this info for me Bonus Bonus for any tool that is smart enough to check and mark as resolved issues as they come in (for instance, if a url has a 403 error it would check on import if it still resolved as a 403. If it did it would add it to the issue queue, if not it would be marked as fixed). Does anything like this exist? how do you deal with tracking and fixing thousands of urls and their problems and the duplicates created from using multiple tools. Thanks!
Moz Pro | | prima-2535090 -
Open site Explorer CSV output mixed up
If I download a CSV output of the external links and the characteristics of them, the output is mixed up. When I put the data in Columns, some columns are disordered. Do more people experience these problems? Is there some preference I have to adapt? Thanks! SFCTf
Moz Pro | | MartijnHoving820 -
'Appropriate Use of Rel Canonical', Critical Factor but appears correct on page
Hi, Trying to get the following page ranked unsuccessfully.... http://www.joules.com/en-GB/2/Collections-Quilted-Jackets/c01c02.r16.1 Instead a product page is being ranked, shown below.... http://www.joules.com/en-GB/Womens-Quilted-Jacket/Navy/M_HAMPTON/ProductDetail.raction When I run the on page report card it advises that the Rel Canonical tag needs to point to that page, but we have checked and it looks to be doing that already. Has anyone else had an issue like this? Thanks, Martin
Moz Pro | | rockethot0 -
We were unable to grade that page. We received a response code of 301\. URL content not parseable
I am using seomoz webapp tool for my SEO on my site. I have run into this issue. Please see the attached file as it has the screen scrape of the error. I am running an on page scan from seomoz for the following url: http://www.racquetsource.com/squash-racquets-s/95.htm When I run the scan I receive the following error: We were unable to grade that page. We received a response code of 301. URL content not parseable. This page had worked previously. I have tried to verify my 301 redirects and am unable to resolve this error. I can perform other on page scans and they work fine. Is this a known problem with this tool? I have verified ensuring I don't have it defined. Any help would be appreciated.
Moz Pro | | GeoffBatterham0