Talk:Delimiter-separated values
This article is rated C-class on Wikipedia's content assessment scale.
- This article was listed on Votes for deletion; the consensus was to keep it. See the archived discussion for further details.
Way Oversimplified
This is scandalously oversimplified. No one here actually working in computer science? Just dilettantes hanging about, right? See Eric Raymond for an excellent treatise on this subject. There is a huge difference between clumsy Microsoft technologies such as CSV and TSV on the one hand and the elegant DSV of Unix on the other. Microsoft's abominable CSV is not repeat not a DSV. Get back to work. — Preceding unsigned comment added by 81.50.44.156 (talk) 19:01, 20 March 2009 (UTC)
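The Unix-style DSV discussed here differs from CSV mainly in how a separator inside a field is protected. A minimal sketch, assuming colon-separated fields and a backslash escape (`\:` for a literal colon, `\\` for a literal backslash); this convention is an assumption rather than a quotation of Raymond's text, `split_dsv` and `join_dsv` are hypothetical names, and real files such as `/etc/passwd` may not support escaping at all:

```python
def split_dsv(line: str, sep: str = ":") -> list[str]:
    """Split one DSV record, treating a backslash as an escape character.

    Assumed convention: backslash-colon is a literal colon and a doubled
    backslash is a literal backslash.
    """
    fields: list[str] = []
    current: list[str] = []
    escaped = False
    for ch in line:
        if escaped:
            current.append(ch)  # take the escaped character literally
            escaped = False
        elif ch == "\\":
            escaped = True  # next character is data, not syntax
        elif ch == sep:
            fields.append("".join(current))  # separator ends the field
            current = []
        else:
            current.append(ch)
    if escaped:
        raise ValueError("record ends with a dangling backslash")
    fields.append("".join(current))
    return fields


def join_dsv(fields: list[str], sep: str = ":") -> str:
    """Inverse of split_dsv: escape backslashes first, then separators."""
    return sep.join(f.replace("\\", "\\\\").replace(sep, "\\" + sep) for f in fields)
```

Compared with CSV's quote doubling, a backslash escape can be handled one character at a time with a single flag, which keeps the parser to one loop.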
Suggested merge
Delimited has more links to it, but I don't like the idea of a past tense word for the article name. Plinth molecular gathered 15:10, 7 September 2006 (UTC)
- Merge to "delimiter." Right now "delimited" looks like the higher-quality article, but "delimiter" is more consistent with the way articles are named on Wikipedia. Assuming "delimiter" doesn't say anything that "delimited" doesn't say better, it might be a simple matter of replacing the text of "delimiter" with that of "delimited" and turning "delimited" into a redirect. Anton Mravcek 18:53, 7 September 2006 (UTC)
- DSV is a standard. It's a Unix standard. It doesn't want to be called delimited or delimiter. You've heard of Unix?
- I am against a merger because the DSV format is recognized by its users as different from the pseudo-CSV tab format. BUT tab-separated should be merged with comma-separated.--Frozenport (talk) 16:22, 25 July 2011 (UTC)
- Oppose. I oppose the merge as well since CSV, TSV and DSV are different enough to be discussed in separate articles. Mixing them all together would make it more difficult to explain their specific pros and cons. Since the merge templates have been up since April 2011 I am removing them again now. --Matthiaspaul (talk) 04:43, 3 August 2013 (UTC)
rename to Delimiter-separated values
This article should probably be renamed with the hyphen, since "delimiter-separated" is a compound adjective. —The preceding unsigned comment was added by Dreftymac (talk • contribs).
- Done. — xaosflux Talk 00:44, 19 October 2006 (UTC)
Stop hyphenating everything. It's a disease. Compound adjective? Where'd you find that? lolz —Preceding unsigned comment added by 81.50.44.156 (talk) 19:02, 20 March 2009 (UTC)
It would be in your English grammar textbook, had you ever bothered to read it. It's amazing to me how many of the illiterate manage to operate a computer and get on the internet. I long for the old days when you had to have some level of intelligence to get here. DAMN YOU AOL! — Preceding unsigned comment added by 129.119.81.135 (talk) 17:01, 10 June 2011 (UTC)
Please discuss hyphens at the Wikipedia:Manual of Style#Hyphens and the article English compound#Hyphenated compound modifiers, and their respective talk pages. Thank you. --DavidCary (talk) 14:10, 30 October 2015 (UTC)
escaping of delimiters etc
It might be useful to mention other solutions to delimiter escaping; e.g., CSV can deal with embedded delimiters (commas) and embedded escape characters (double quotes).
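Python's standard `csv` module is one widely available implementation of the quoting approach mentioned here: a field containing the separator is wrapped in double quotes, and a double quote inside a field is doubled. A short sketch (the sample rows are made up):

```python
import csv
import io

rows = [
    ["name", "comment"],
    ["Smith, John", 'said "hello"'],  # embedded comma and embedded double quotes
]

buf = io.StringIO()
# Default "excel" dialect: QUOTE_MINIMAL quoting, doubled quote characters, CRLF line ends.
csv.writer(buf).writerows(rows)
text = buf.getvalue()
print(text)
# name,comment
# "Smith, John","said ""hello"""

# Reading the text back recovers the original values exactly.
parsed = list(csv.reader(io.StringIO(text)))
```

Only the second row is quoted, because only its fields contain a comma or a double quote.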
Quotes
Why are the entries in the example quoted? There seems no reason for this.
Why are you even here? You obviously have no clue what this subject is about. Why are you here? Why?
RFC
One may mention that CSV is described in RFC 4180: http://www.rfc-editor.org/rfc/rfc4180.txt --Csmth (talk) 07:57, 22 February 2008 (UTC)
- Sorry, I am wrong. This article is not specific for CSV. Ignore my comment. --Csmth (talk) 08:04, 22 February 2008 (UTC)
Is "Delimiter" the same as "Separator" now?
It seems that “delimiter” has come to be synonymous with “separator”, but this is not how I learned it. Back when I was taking computer science classes, delimiters were used to mark the start and end of something. For example, a quotation uses delimiters. If you see something like, "it fell from the sky", the quoted text is delimited. A double-quote marks the beginning, and another double-quote marks the end. So delimiters set the boundaries of the thing being delimited.
If this were still true, you could not call a file in comma-separated values (CSV) format a comma-delimited file. Technically, I would argue it is not comma-delimited, it is comma-separated. By contrast, files that conform to a text markup language, like SGML or XML, are delimited in that the beginning and end of chunks of text are marked by start and end tags.
<rant>I don’t mean to be pedantic, but this is delimited.</rant> Spoodles (talk) 21:16, 29 January 2009 (UTC)
I have to agree. As a programmer, I find "delimiter separated" oxymoronic. Some examples:
- delimited: "This is delimited with quotes."
- separated: This,is,separated,with,commas.
- both: "This","is","both","delimited","and","separated."
A third way is with terminators:
- terminated: this;is;terminated;with;semicolons;
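The three styles listed above can be told apart in code. A minimal sketch using Python's standard `csv` module for the quoted case; `parse_terminated` is a hypothetical helper, not a library function:

```python
import csv
import io


def parse_terminated(text: str, term: str = ";") -> list[str]:
    """Split values that are each *followed* by a terminator (hypothetical helper)."""
    if not text:
        return []
    if not text.endswith(term):
        raise ValueError("last value is missing its terminator")
    return text.split(term)[:-1]  # drop the empty piece after the final terminator


# Separated: the comma sits only *between* values.
separated = "This,is,separated,with,commas."
sep_values = separated.split(",")

# Delimited and separated: quotes mark where each value starts and ends
# (Excel's "text qualifier", the csv module's quotechar); commas sit between them.
both = '"This","is","both","delimited","and","separated."'
both_values = next(csv.reader(io.StringIO(both)))

# Terminated: every value, including the last, is followed by a semicolon.
terminated = "this;is;terminated;with;semicolons;"
term_values = parse_terminated(terminated)
```

A plain `split(";")` on the terminated string would leave a trailing empty string, which is the practical difference between a separator and a terminator.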
Perhaps the confusion comes from Microsoft's misuse of the term delimiter in Microsoft Excel (in 2003, anyway) when loading character-separated data. They refer to the *separators* as 'delimiters'. Where delimiters are used (which they can be if the individual values might contain the separator character), they refer to them using the term 'text qualifier'.
62.232.250.50 (talk) 11:04, 17 March 2010 (UTC)
Ads in the article?
"Data validation tools like Flat File Checker may be used to identify and eliminate such errors before import." This looks like an advertisement for that tool. —Preceding unsigned comment added by 82.200.67.50 (talk) 10:33, 1 February 2010 (UTC)
Line Breaks
It has been my experience that some versions of some Microsoft products, as well as third-party products, will fail on a CSV file delimited by LF characters. They only work if the file is delimited by the CRLF sequence. --Jym (talk) 21:07, 23 March 2010 (UTC)
Yeah...--Frozenport (talk) 16:23, 25 July 2011 (UTC)
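One way to cope with tools that require CRLF is to re-emit the file with CRLF record terminators, which is also what RFC 4180 specifies. A sketch in Python, assuming the input is already valid CSV; `normalize_to_crlf` is a hypothetical name. Parsing first, instead of blindly replacing `\n` with `\r\n`, leaves line breaks inside quoted fields untouched:

```python
import csv
import io


def normalize_to_crlf(text: str) -> str:
    """Re-emit CSV text so that every record ends in CRLF.

    Line breaks inside quoted fields are kept as they were, because the
    text is parsed into records before it is written back out.
    """
    rows = list(csv.reader(io.StringIO(text, newline="")))
    out = io.StringIO(newline="")
    # QUOTE_MINIMAL (the default) re-quotes any field containing a line break.
    csv.writer(out, lineterminator="\r\n").writerows(rows)
    return out.getvalue()


lf_only = 'id,note\n1,"two\nlines"\n2,plain\n'
fixed = normalize_to_crlf(lf_only)
```

The embedded line break in the second record stays a bare LF; only the record terminators change.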
Delimiter-Separated Value Best Practices
[edit]What does everyone think about best practices for delimiter-separated values? I've commonly seen commas, pipes, tildes, and C-cedillas used as the delimiter, but there are certain risks associated with them (depending on the raw data). Is there a delimiter that should be used as a standard which is considered "safest" to avoid situations where (for example) a customer might have a C-cedilla in their name which throws off the file format after delimiters are inserted? I've also seen where people look for and remove/strip the delimiter being used from the raw data before adding in the delimiter to prevent this from happening, but my thought is that there may be a particular character that is least commonly found/used which would prevent the need to strip anything from the raw data. — Preceding unsigned comment added by 159.53.46.141 (talk) 20:13, 6 February 2014 (UTC)
Content Clarification
Just reading the discussion on the talk page shows that this article requires more clarity and explanation. The fact that someone even asked why the values are quoted, which is a valid question considering the article content alone, suggests that there is a major gap in the content. The concepts of delimited and separated values, used independently (not here, obviously) or in combination, are relatively simple. This article, however, does not explain clearly enough what a DSV is. That there is confusion between DSVs and CSVs would seem to indicate that the article should explain how a DSV differs from potential interpretations, such as CSV.
Even the naming is confusing. There may be a common usage standard for "delimiter-separated", but I'd have to say it's incorrect. "Comma-separated" is correct because the separator is the comma. Because CSVs aren't always separated by commas, sometimes they're called character-separated values (although commas are by far the most common, at least in my experience). But these DSV files, as I understand it (and, again, this is not clear in the article), are delimited and separated; delimited/delimiter is not qualifying or modifying separated/separator. There is no need for a hyphen here, and I would go as far as to say it is incorrect, grammatically speaking. But, as I said, there may be some common usage here of which I'm not aware.
But I digress. Getting back to the main point, this article seems to be written and reviewed by those who already know the content. WikiP articles should include a bit more content for the lay person, as well as distinguish the topic from common, potential confusions (e.g. DSV vs. CSV).
I would also suggest that some of this discussion is not consistent with WikiP's guidelines:
- "...in an atmosphere of mutual respect and cooperation. Don't be afraid to contribute as you are encouraged to be bold in editing in a fair and accurate manner."
- "While discussing matters, it is very important that you conduct yourself with civility and assume good faith on the part of others."
Just my thoughts. --FreeText (talk) 19:36, 1 August 2014 (UTC)
Delimiter vs. Separator
This poorly written article confuses "delimiters" with sep,ara,tors. They are two completely different things. — Preceding unsigned comment added by 64.126.68.153 (talk) 18:06, 5 September 2014 (UTC)
Database schema
I've reverted again. A lede should summarize the body, but the body has nothing on cross platform data exchange or database schemas. The reference to database schema is inappropriate because DSV files are flat and do not describe schemas. DSV is a file format, but it is not about converting file formats. It is just a least-common denominator format that can be used for data exchange. Glrx (talk) 23:02, 10 November 2016 (UTC)
- Right. We are talking past the same concepts in different words. I am not saying that DSV is what converts a file or defines a schema. (How did you get that out of it?) I am saying that because different schemas exist, which they do, you need a common format for data exchange between them, which DSV is. I question the thinking/reading that couldn't see that point being made, but <s>I'm not interested right now in spending more time here</s> I ended up taking another crack at it. Quercus solaris (talk) 23:29, 10 November 2016 (UTC)
- Uh, reading comprehension alert? I wrote "Despite that each of those applications has its own database schema and its own file format (for example, accdb or xlsx), they [THE APPLICATIONS] can all map the fields in a DSV file to their own schema and format." You wrote "DSV files are flat and offer NO support for schema conversions." No kidding! Who said anything about "providing support"? The DSV file doesn't have to DO anything, the application does it! I'm not just accepting that reversion. Quercus solaris (talk) 23:49, 10 November 2016 (UTC)
- Where are the sources for these claims? Are the issues being raised discussed in secondary sources that address DSV? "This is about as plain as it can be made" sounds in WP:OR. The unsourced statement can be made even simpler if schema are not mentioned. Why are you raising the topic of database schema for a flat interchange format that has an uninteresting schema? There's no hierarchy, no multiple tables, no integrity relations, no domain constraints. The topic is only about raw data exchange. It does not care about integrity relations. Now, you are changing database schema to database design.[1] That makes it worse. The schema is just the table layout; the database design comprises more than just the data. You've also changed schema to data model.[2] Why are you raising data model issues when DSV has no impact on the model? You are raising issues about complexity that the format does not address. Glrx (talk) 00:23, 11 November 2016 (UTC)
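To make the disputed point concrete: a DSV file carries at most a header row of field names, and every value arrives as a string; types and constraints are supplied by whichever application imports it. A minimal sketch, assuming a pipe-delimited file with a header row; the `Customer` type, its fields, and `load_customers` are hypothetical:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Customer:  # hypothetical application-side record type
    customer_id: int
    name: str
    balance: float


def load_customers(text: str, delimiter: str = "|") -> list[Customer]:
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # The file supplies only names and strings; the application chooses the types.
    return [
        Customer(customer_id=int(r["id"]), name=r["name"], balance=float(r["balance"]))
        for r in reader
    ]


data = "id|name|balance\n1|Ada|12.50\n2|Linus|0\n"
raw = list(csv.DictReader(io.StringIO(data), delimiter="|"))
customers = load_customers(data)
```

Any validation beyond these conversions (uniqueness of `id`, non-negative balances, and so on) would likewise live in the importing application, not in the file.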