Delimiter-separated values


Formats that use delimiter-separated values (also DSV)[1] store two-dimensional arrays of data by separating the values in each row with specific delimiter characters. Most database and spreadsheet programs are able to read or save data in a delimited format.

Delimited formats

Any character or sequence of characters may be used to separate the values, but the most common delimiters are the comma, tab, and colon.[1][2] The vertical bar (also called the pipe) and the space are also sometimes used.[1] In a comma-separated values (CSV) file the data items are separated by commas, while in a tab-separated values (TSV) file they are separated by tabs. Column headers are sometimes included as the first line, and each subsequent line is a row of data. The lines are separated by newlines.
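
As a minimal sketch of the difference between the two formats, the following Python uses the standard-library csv module to write the same small table once with a comma and once with a tab as the delimiter. The sample rows and the helper name to_dsv are illustrative assumptions and do not come from any specification.

```python
import csv
import io

# Illustrative data: a header line followed by rows of data.
ROWS = [
    ["Date", "Pupil", "Grade"],
    ["25 May", "Fred", "C"],
    ["15 July", "Jane", "A"],
]

def to_dsv(rows, delimiter):
    """Serialize rows as delimiter-separated text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_dsv(ROWS, ",")   # comma-separated values
tsv_text = to_dsv(ROWS, "\t")  # tab-separated values
```

Only the delimiter argument changes between the two calls; the row and line structure is the same in both outputs.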

For example, in the following data the fields in each record are delimited by commas, and the records by newlines:

"Date","Pupil","Grade"
"25 May","Bloggs, Fred","C"
"25 May","Doe, Jane","B"
"15 July","Bloggs, Fred","A"
"15 April","Muniz, Alvin ""Hank""","A"

Note the use of the double quote to enclose each field. This prevents the comma in the actual field value (Bloggs, Fred; Doe, Jane; etc.) from being interpreted as a field separator. It in turn necessitates a way to "escape" the field wrapper itself, in this case the double quote; it is customary to double any double quotes contained in a field, as with those surrounding "Hank". In this way, any ASCII text, including newlines, can be contained in a field.
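
The quoting convention can be checked with a quote-aware parser. The sketch below feeds the example records to Python's standard csv.reader, whose default dialect treats a doubled double quote inside a quoted field as one literal quote character. The variable names are chosen here for illustration.

```python
import csv
import io

# The example records from above, verbatim.
DATA = '''"Date","Pupil","Grade"
"25 May","Bloggs, Fred","C"
"25 May","Doe, Jane","B"
"15 July","Bloggs, Fred","A"
"15 April","Muniz, Alvin ""Hank""","A"
'''

# With the default dialect (doublequote=True), "" inside a quoted field
# is read back as a single literal double quote.
records = list(csv.reader(io.StringIO(DATA)))

header, rows = records[0], records[1:]
last_pupil = rows[-1][1]  # 'Muniz, Alvin "Hank"'
```

Each record yields exactly three fields, even though four of the five lines contain more than two commas.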

ASCII includes several control characters that are intended to be used as delimiters. They are 28 (file separator), 29 (group separator), 30 (record separator), and 31 (unit separator). These characters have not been widely adopted; some systems have replaced their control properties with more accepted controls such as CR/LF and TAB.
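
A minimal sketch of how two of these characters could be used, assuming that the data itself never contains them: fields are joined with the unit separator and records with the record separator. The function names encode and decode are hypothetical.

```python
# ASCII control characters reserved as separators (decimal 31 and 30).
UNIT_SEP = "\x1f"    # 31: separates fields within a record
RECORD_SEP = "\x1e"  # 30: separates records

def encode(rows):
    """Join fields with the unit separator and records with the record separator."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in rows)

def decode(text):
    """Split text produced by encode() back into a list of rows."""
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]

rows = [["25 May", "Bloggs, Fred", "C"], ["15 April", 'Muniz, Alvin "Hank"', "A"]]
assert decode(encode(rows)) == rows  # commas and quotes need no escaping
```

Because commas, quotes, and newlines are ordinary data under this scheme, no quoting or escaping step is needed, at the cost of a file that is awkward to view or edit in a text editor.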

Uses and applications

Due to their widespread use, comma- and tab-delimited text files can be opened by several kinds of applications, including most spreadsheet programs, and statistical analysis tools such as PSPP, without the user designating which delimiter has been used.[3][4]
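
How individual applications detect the delimiter is not specified here. One way to approximate it in Python is the standard library's csv.Sniffer, which applies heuristics to a text sample. The helper name detect_delimiter and the sample strings below are assumptions made for illustration.

```python
import csv

def detect_delimiter(sample, candidates=",\t"):
    """Guess which of the candidate delimiters a text sample uses."""
    # csv.Sniffer is heuristic and raises csv.Error when it cannot decide.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter

tab_sample = "Date\tPupil\tGrade\n25 May\tFred\tC\n15 July\tJane\tA\n"
comma_sample = "Date,Pupil,Grade\n25 May,Fred,C\n15 July,Jane,A\n"
```

Restricting the candidates to the comma and the tab keeps the guess from settling on another character, such as the space, that happens to occur regularly in the sample.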

Typically a delimited file format is indicated by a specification. Some specifications provide conventions for avoiding delimiter collision; others do not. Delimiter collision is a problem that occurs when a character that is intended as part of the data is interpreted as a delimiter instead. Comma- and space-separated formats often suffer from this problem, since in many contexts those characters are legitimate parts of a data field.
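
Delimiter collision can be illustrated with one record from the earlier example. In this sketch, naive splitting on the comma is compared with Python's quote-aware csv.reader; the variable names are illustrative.

```python
import csv
import io

line = '"25 May","Bloggs, Fred","C"'

# Naive splitting treats the comma inside the pupil's name as a delimiter,
# producing four fragments instead of three fields.
naive_fields = line.split(",")

# A quote-aware parser keeps the embedded comma as part of the data.
parsed_fields = next(csv.reader(io.StringIO(line)))
```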

Most such files avoid delimiter collision either by surrounding all data fields in double quotes, or by quoting only those data fields that contain the delimiter character. One problem with tab-delimited text files is that tabs are generally non-printing characters and are difficult to distinguish from spaces; as a result, files are sometimes corrupted when people edit them by hand. Another set of problems arises from errors in the file structure, which usually surface when the file is imported into a database (in the example above, such an error might be a missing first name for a pupil).
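
The two quoting strategies correspond to the QUOTE_ALL and QUOTE_MINIMAL constants of Python's csv module. The following sketch writes the same record both ways; the helper name write_row is hypothetical.

```python
import csv
import io

def write_row(fields, quoting):
    """Write one record as a comma-separated line using the given quoting rule."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator="\n").writerow(fields)
    return buffer.getvalue()

row = ["25 May", "Bloggs, Fred", "C"]

# Every field is wrapped in double quotes.
quote_all = write_row(row, csv.QUOTE_ALL)

# Only the field that contains the delimiter is quoted.
quote_minimal = write_row(row, csv.QUOTE_MINIMAL)
```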

Depending on the data itself, it may be beneficial to use non-standard characters such as the tilde (~) as delimiters. With the rising prevalence of web sites and other applications that store snippets of code in databases, wrapping fields in the double quote ("), which occurs in every hyperlink and image source tag, is not sufficient to avoid this type of collision. Since colons (:), semicolons (;), pipes (|), and many other characters are also used, it can be quite challenging to find a character that is not used elsewhere in the data.
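
One hedged way to look for a safe delimiter is to scan the data and report which candidate characters never occur in any field. The function name unused_delimiters, the candidate list, and the sample rows below are all assumptions made for illustration.

```python
def unused_delimiters(rows, candidates="~|;:\t,"):
    """Return the candidate characters that occur in no field of rows."""
    used = {char for fields in rows for field in fields for char in field}
    return [char for char in candidates if char not in used]

# Illustrative rows containing markup and punctuation.
rows = [
    ["link", '<a href="http://example.com/a;b">x</a>'],
    ["note", "a|b, c"],
]
free = unused_delimiters(rows)
```

Such a check only covers the data available at the time it is run; a character that is free today may still appear in later records, so a quoting or escaping convention remains the more robust safeguard.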

Notes and references

  1. DSV stands for Delimiter Separated Values. Raymond, Eric (2004). The Art of Unix Programming. Boston: Addison-Wesley. p. 113. ISBN 0131429019. http://www.catb.org/~esr/writings/taoup/html/ch05s02.html.
  2. Under UNIX, the colon is the most common DSV delimiter for values that may contain whitespace. Ibid.
  3. Knight, Andrew (2000). Basics of Matlab and beyond. Boca Raton: Chapman & Hall/CRC. ISBN 0849320399.
  4. Robbins, Arnold (2005). Classic Shell Scripting. Sebastopol: O'Reilly. ISBN 0596005954.

Wikimedia Foundation. 2010.
