Unicode Separated Values (USV)

601 Van Ness Ave #E3-359 San Francisco CA 94102 US 1-415-317-2700 joel@joelparkerhenderson.com https://linkedin.com/in/joelparkerhenderson

General
Internet Engineering Task Force
usv data format markup

Unicode Separated Values (USV) is a data format that uses Unicode characters to mark the parts of the data: units, records, groups, and files. USV builds on ASCII separated values (ASV), and provides pragmatic ways to edit data in text editors by using visual symbols and layouts.

Introduction

Unicode Separated Values (USV) is a data format useful for exchanging and converting data between various spreadsheet programs, databases, and streaming data services. This RFC explains USV. Additionally, we propose a new media type "text/usv", to be registered with IANA. We provide informative references for a USV git repository and a programming implementation as a USV Rust crate.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Media Type Language

The media type normative references are RFC 6838, RFC 2046, and RFC 4289.

ABNF Language

The ABNF normative reference is RFC 5234.

USV characters

Separators:

File Separator (FS) is U+001C or U+241C
Group Separator (GS) is U+001D or U+241D
Record Separator (RS) is U+001E or U+241E
Unit Separator (US) is U+001F or U+241F

Modifiers:

Escape (ESC) is U+001B or U+241B
End of Transmission (EOT) is U+0004 or U+2404

Liners:

Carriage Return (CR) is U+000D
Line Feed (LF) is U+000A
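
As a non-normative sketch, the character lists above could be written as Python constants. The names SEPARATORS, MODIFIERS, LINERS, and classify are illustrative assumptions and are not part of USV; only the code points come from the lists above.

```python
# Code points are copied from the lists above; all names are illustrative.
SEPARATORS = {
    "FS": ("\u001C", "\u241C"),
    "GS": ("\u001D", "\u241D"),
    "RS": ("\u001E", "\u241E"),
    "US": ("\u001F", "\u241F"),
}
MODIFIERS = {
    "ESC": ("\u001B", "\u241B"),
    "EOT": ("\u0004", "\u2404"),
}
LINERS = {"CR": "\u000D", "LF": "\u000A"}


def classify(ch: str) -> str:
    """Return the USV name of a character, or "content" if it has none."""
    for table in (SEPARATORS, MODIFIERS):
        for name, forms in table.items():
            if ch in forms:
                return name
    for name, form in LINERS.items():
        if ch == form:
            return name
    return "content"


print([classify(c) for c in "a\u241Fb\u001E\n"])
# prints ['content', 'US', 'content', 'RS', 'LF']
```

Each separator and modifier has a control-character form and a visible symbol form; the sketch treats the two forms as equivalent.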

Definition of the USV Format

Data

Data comprises units, records, groups, and files.

Unit

A unit comprises content characters. It runs until a Unit Separator (US): Example unit and unit separator:

Record

A record comprises units. It runs until a Record Separator (RS): Example record and record separator:

Group

A group comprises records. It runs until a Group Separator (GS): Example group and group separator:

File

A file comprises groups. It runs until a File Separator (FS): Example file and file separator:
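
The nesting described above can be sketched in Python as follows. This is a non-normative illustration: the function names are hypothetical, the sample contents "a" through "d" are made up, and the sketch assumes the symbol forms of the separators and content that needs no escaping.

```python
# Symbol forms of the separators (U+241F, U+241E, U+241D, U+241C).
US, RS, GS, FS = "\u241F", "\u241E", "\u241D", "\u241C"


def make_unit(content):
    """A unit is its content followed by a Unit Separator."""
    return content + US


def make_record(units):
    """A record is its units followed by a Record Separator."""
    return "".join(make_unit(u) for u in units) + RS


def make_group(records):
    """A group is its records followed by a Group Separator."""
    return "".join(make_record(r) for r in records) + GS


def make_file(groups):
    """A file is its groups followed by a File Separator."""
    return "".join(make_group(g) for g in groups) + FS


print(make_unit("a"))                            # a␟
print(make_record(["a", "b"]))                   # a␟b␟␞
print(make_group([["a", "b"], ["c", "d"]]))      # a␟b␟␞c␟d␟␞␝
print(make_file([[["a", "b"]], [["c", "d"]]]))   # a␟b␟␞␝c␟d␟␞␝␜
```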

Header

There may be an optional header appearing as the first item, with the same format as normal items. This header contains names corresponding to the fields in the data, and should contain the same number of fields as the rest of the data. The presence or absence of the header should be indicated via the optional "header" parameter of this media type. For example:
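
One way a reader might apply a header is to pair each header name with the unit in the same position of every later record. The sketch below is an assumption-laden illustration, not part of USV: the function names and the sample names and cities are hypothetical, the header is taken to be the first record, and escapes and liners are not handled.

```python
US, RS = "\u241F", "\u241E"  # symbol forms only, for brevity


def read_records(text):
    """Split USV text made of records into lists of units."""
    records = []
    for chunk in text.split(RS)[:-1]:         # text after the last RS is ignored
        records.append(chunk.split(US)[:-1])  # every unit ends with US
    return records


def rows_with_header(text):
    """Treat the first record as the header and map names to units."""
    records = read_records(text)
    if not records:
        return []
    header, *body = records
    return [dict(zip(header, row)) for row in body]


sample = "name␟city␟␞Alice␟Paris␟␞Bob␟Tokyo␟␞"  # hypothetical data
print(rows_with_header(sample))
# prints [{'name': 'Alice', 'city': 'Paris'}, {'name': 'Bob', 'city': 'Tokyo'}]
```

A reader that receives header=absent would skip the pairing step and return the records unchanged.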

Escape (ESC)

Escape (ESC) causes the next character to be treated as content, even if that character is a special character. Example: USV with a unit that contains an Escape + End of Transmission; because of the Escape, the End of Transmission is treated as content:
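
A writer and a reader might handle Escape as sketched below. This is non-normative: the function names are hypothetical, and the choice to write the symbol form U+241B rather than U+001B is an assumption.

```python
ESC_FORMS = ("\u001B", "\u241B")
ESC = "\u241B"  # assumption: the writer emits the symbol form

# Separators, ESC, and EOT in both forms.
SPECIAL = set(
    "\u001C\u241C\u001D\u241D\u001E\u241E\u001F\u241F"
    "\u001B\u241B\u0004\u2404"
)


def escape_content(text):
    """Prefix every special character with ESC so it reads as content."""
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in text)


def unescape_content(text):
    """Drop each ESC and keep the character that follows it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch in ESC_FORMS:
            ch = next(chars, "")  # the escaped character becomes content
        out.append(ch)
    return "".join(out)


escaped = escape_content("a\u2404b")
print(escaped)                    # a␛␄b
print(unescape_content(escaped))  # a␄b
```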

End of Transmission (EOT)

End of Transmission (EOT) tells any reader that it can stop reading. This can be useful for streaming data, such as to end a connection. This can also be useful for providing data files that contain USV data, then EOT, then additional non-USV information such as comments, images, attachments, etc.

EOT tells the data reader that it can stop.
EOT has no effect on the output content.

Example of a unit then an End of Transmission:
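
A reader that honors EOT could be sketched as below. The function name is hypothetical, and the sketch assumes that an escaped EOT does not stop the reader, consistent with the Escape rule.

```python
ESC = ("\u001B", "\u241B")
EOT = ("\u0004", "\u2404")


def read_until_eot(text):
    """Return the part of text before the first unescaped EOT."""
    i = 0
    while i < len(text):
        if text[i] in ESC:
            i += 2  # skip ESC and the character it turns into content
        elif text[i] in EOT:
            return text[:i]
        else:
            i += 1
    return text  # no EOT found: the whole text is data


print(read_until_eot("a␟␄a comment that is not USV"))  # a␟
print(read_until_eot("a␛␄b␟"))                         # a␛␄b␟
```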

ABNF grammar

Semantics

usv = *file
file = *group
group = *record
record = *unit
unit = *content-character

Syntax

usv = ( header-and-body / body ) '*' ; anything after the body is chaff
header-and-body = 1*unit-run / 1*record-run / 1*group-run / 1*file-run
body = *unit-run / *record-run / *group-run / *file-run

Runs

file-run = *( *liner-character file *liner-character FS )
group-run = *( *liner-character group *liner-character GS )
record-run = *( *liner-character record *liner-character RS )
unit-run = *( *liner-character unit *liner-character US )

Character classes

content-character = typical-character / ESC '*'
typical-character = '*' - special-character
special-character = US / RS / GS / FS / ESC / EOT
liner-character = CR / LF

Unicode symbols

FS = U+001C File Separator / U+241C Symbol for File Separator
GS = U+001D Group Separator / U+241D Symbol for Group Separator
RS = U+001E Record Separator / U+241E Symbol for Record Separator
US = U+001F Unit Separator / U+241F Symbol for Unit Separator
ESC = U+001B Escape / U+241B Symbol for Escape
EOT = U+0004 End of Transmission / U+2404 Symbol for End of Transmission
CR = U+000D Carriage Return
LF = U+000A Line Feed
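
The grammar can be read as a single pass over the characters. The following Python sketch is non-normative and rests on stated assumptions: the name parse_usv is hypothetical, the result is always nested as files, groups, records, units even when the input holds only units or records, liners at the start and end of a unit are dropped as suggested by the run rules, and content that is never closed by a Unit Separator is discarded.

```python
FS = ("\u001C", "\u241C")
GS = ("\u001D", "\u241D")
RS = ("\u001E", "\u241E")
US = ("\u001F", "\u241F")
ESC = ("\u001B", "\u241B")
EOT = ("\u0004", "\u2404")


def parse_usv(text):
    """Parse text into nested lists: files > groups > records > units."""
    files, groups, records, units, buf = [], [], [], [], []
    chars = iter(text)
    for ch in chars:
        if ch in ESC:
            buf.append(next(chars, ""))  # escaped character is content
        elif ch in EOT:
            break                        # the reader can stop here
        elif ch in US:
            # Assumption: liners around a unit are layout, not content.
            units.append("".join(buf).strip("\r\n"))
            buf = []
        elif ch in RS:
            records.append(units)
            units, buf = [], []
        elif ch in GS:
            if units:
                records.append(units)
            groups.append(records)
            records, units, buf = [], [], []
        elif ch in FS:
            if units:
                records.append(units)
            if records:
                groups.append(records)
            files.append(groups)
            groups, records, units, buf = [], [], [], []
        else:
            buf.append(ch)
    # Close any levels still open at end of input, so that input made
    # only of units, records, or groups keeps the same nested shape.
    if units:
        records.append(units)
    if records:
        groups.append(records)
    if groups:
        files.append(groups)
    return files


print(parse_usv("a␟b␟␞\nc␟d␟␞\n"))
# prints [[[['a', 'b'], ['c', 'd']]]]
```

Returning one fixed shape keeps callers simple; an implementation that must distinguish a unit-only stream from a file-level stream would need to track which separators it actually saw.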

Examples

Hello World

This kind of data …

… is represented in USV as two units:

If you prefer to see one unit per line, then you can add carriage returns and/or newlines:

Hello World Goodnight Moon

This kind of data …

… is represented in USV as two records, each with two units:

If you prefer to see one record per line, then you can add carriage returns and/or newlines:

Units, Records, Groups, Files

USV with 2 units by 2 records by 2 groups by 2 files:

If you prefer to see one record per line, then you can add carriage returns and/or newlines:

If you prefer to see one unit per line, then you can add carriage returns and/or newlines:
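
The one-item-per-line layouts described in these examples can be produced mechanically. The sketch below is an illustration only: the function name one_per_line is hypothetical, the sample words are assumed from the example titles, and the sketch adds a Line Feed after each chosen separator.

```python
RS = ("\u001E", "\u241E")
US = ("\u001F", "\u241F")
ESC = ("\u001B", "\u241B")


def one_per_line(text, separators):
    """Insert LF after every unescaped separator listed in `separators`."""
    out = []
    chars = iter(text)
    for ch in chars:
        out.append(ch)
        if ch in ESC:
            out.append(next(chars, ""))  # keep the escaped character as is
        elif ch in separators:
            out.append("\n")
    return "".join(out)


print(one_per_line("hello␟world␟", US), end="")
# hello␟
# world␟

print(one_per_line("hello␟world␟␞goodnight␟moon␟␞", RS), end="")
# hello␟world␟␞
# goodnight␟moon␟␞
```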

Articles

USV can format paragraphs, such as in this example data stream of articles; note the units contain leading liners and trailing liners.

Source Code Examples

Hello World using Rust and the USV crate

Hello World Goodnight Moon using Rust and the USV crate

MIME media type registration for text/usv

This section provides the MIME media type registration application information.

To: ietf-types@iana.org
Subject: Registration of MIME media type text/usv
MIME media type name: text
MIME subtype name: usv
Required parameters: none

Optional parameters: charset, header

Common usage of USV is UTF-8, but other character sets defined by IANA for the "text" tree may be used in conjunction with the "charset" parameter. The "header" parameter indicates the presence or absence of the header line. Valid values are "present" or "absent". Implementors choosing not to use this parameter must make their own decisions as to whether the header line is present or absent.
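
A receiver might read the two optional parameters as sketched below, using the Python standard library class email.message.Message. The function name usv_params is hypothetical, and falling back to UTF-8 when no charset is given is an assumption based on the common usage noted above, not a rule of the registration.

```python
from email.message import Message


def usv_params(content_type):
    """Return (charset, header) from a text/usv Content-Type value.

    header is "present", "absent", or None when the parameter is omitted.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/usv":
        raise ValueError("not a text/usv media type")
    charset = msg.get_param("charset", "UTF-8")  # assumed default
    header = msg.get_param("header")
    if header not in (None, "present", "absent"):
        raise ValueError("header must be 'present' or 'absent'")
    return charset, header


print(usv_params("text/usv; charset=UTF-8; header=present"))
# prints ('UTF-8', 'present')
print(usv_params("text/usv"))
# prints ('UTF-8', None)
```

When the result for header is None, the receiver has to decide for itself whether the first item is a header.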

Encoding considerations: This media type uses LF to denote line breaks. However, implementors should be aware that some implementations may not conform, i.e., they may incorrectly use other values.

Security considerations: USV files contain passive text data that should not pose any risks. However, it is possible in theory that malicious binary data may be included in order to exploit potential buffer overruns in the program processing USV data. Additionally, private data may be shared via this format (which of course applies to any text data).

Interoperability considerations: Implementors should "be conservative in what you do, be liberal in what you accept from others" (RFC 793) when processing USV data. Implementations deciding not to use the optional "header" parameter must make their own decision as to whether the header is absent or present.

Published specification: https://github.com/sixarm/usv

Applications that use this media type:

Spreadsheet programs, such as with import/export.
Database programs, such as with loading/saving text.
Data conversion utilities.

Additional information:

Magic number(s): none
File extension(s): usv
Apple macOS File Type Code(s): TEXT

Intended usage: COMMON
Author/Change controller: IESG
Contact: Joel Parker Henderson <joel@joelparkerhenderson.com>

IANA Considerations

We are requesting IANA to create a standard MIME media type "text/usv". We have filed an IANA request for this, with the same contact information.

Security Considerations

This document should not affect the security of the Internet.

References

Normative References

BCP 14: Key words for use in RFCs to Indicate Requirement Levels

Informative References

USV git repository at https://github.com/sixarm/usv
USV Rust crate at https://crates.io/crates/usv

Acknowledgements

The author would like to thank Y. Shafranovich, author of the CSV RFC, which provided guidance for this USV RFC. A special thank you goes to P.X.V.

Contributors

Thanks to all of the contributors.
