Canonical S-expressions

Canonical S-expressions

A Canonical S-expression (or csexp) is a binary encoding form of a subset of general S-expression. It was designed for use in SPKI - to retain the power of S-expressions, while achieving the compactness of a binary form and maximizing the speed of parsing.

An S-expression is composed of atoms, which are byte strings, and parentheses used to delimit lists or sub-lists. S-expressions are fully recursive. Typically, S-expressions are encoded as text, with spaces delimiting atoms and quotation marks used to surround atoms that contain spaces.

Atoms in a Canonical S-expression are encoded as length-prefixed byte strings. The length of the following byte string is expressed as an ASCII decimal number followed by a ":".

The sexp

(this "Canonical S-expression" has 5 atoms)

becomes the csexp

(4:this22:Canonical S-expression3:has1:55:atoms).

There is no white space to be normalized - thus the adjective "canonical" - and the atoms can be any binary string. So, a cryptographic hash value or a public key modulus that would have to be encoded in base64 or some other printable encoding can be expressed in csexp as its binary bytes.

Binary byte strings can represent various encodings. A csexp includes a non-S-expression construct for indicating the encoding of a string, when that encoding is not obvious. Any atom in csexp can be prefixed by a single atom in square brackets - such as " [4:JPEG] " or " [7:UNICODE] ".

Finally, an csexp as used in SPKI has one limitation compared to a full S-expression - that every list must start with an atom - and therefore there can be no empty lists.

Typically, a list's first atom is treated as one treats an element name in XML.

Comparisons to other encodings

There are other encodings in common use:
# XML
# ASN.1
# JSON

Generally, csexp has a parser one or two decimal orders of magnitude smaller than that of either XML or ASN.1. This small size and corresponding speed give csexp its main advantage. In addition to the parsing advantage, there are other differences.

csexp vs. XML

A csexp is roughly as expressive as XML. This is not surprising, since XML is described as an ASCII form for S-expressions. However, csexp does not have a concept like XML attributes (within an element). When encoding in csexp, one must plan on not using such attributes.

A csexp, like a general sexp, is fully recursive. XML, however, has limitations on recursive use of element names.

The first atom in a csexp list - the equivalent of an XML element name - can be any atom in any encoding (e.g., a JPEG, a UNICODE string, a WAV file, ...). XML element names are constrained to use a subset of the printable character set.

XML merges a sequence of strings within one element into a single string, while csexp allows a sequence of atoms within a list and those atoms remain separate from one another.

Finally, csexp is inherently binary while XML is printable - so binary quantities in XML must be encoded, for example using base64.

csexp vs. ASN.1

ASN.1 is a popular binary encoding form. However, it expresses only syntax (data types), not semantics. Two different structures - each a SEQUENCE of two INTEGERS - have identical representations on the wire (barring special tag choices to distinguish them). To parse an ASN.1 structure, one must tell the parser what set of structures one is expecting and the parser must match the data type being parsed against the structure options. This adds to the complexity of an ASN.1 parser.

A csexp structure, like an XML document, carries its own semantics (encoded in element names), and the parser for a csexp structure does not care what structure is being parsed. Once a wire-format expression has been parsed into an internal tree form (similar to XML's DOM), the consumer of that structure can examine it for conformance to what was expected.

Links

* [http://theory.lcs.mit.edu/~rivest/sexp.html Ron Rivest's sexp page]
* [http://theworld.com/~cme/html/spki.html#Code various uses of csexp, including s2x: translating csexp to XML]


Wikimedia Foundation. 2010.

Игры ⚽ Поможем сделать НИР

Look at other dictionaries:

  • Canonical form (Boolean algebra) — In Boolean algebra, any Boolean function can be expressed in a canonical form using the dual concepts of minterms and maxterms. Minterms are called products because they are the logical AND of a set of variables, and maxterms are called sums… …   Wikipedia

  • Post canonical system — A Post canonical system, as created by Emil Post, is a string manipulation system that starts with finitely many strings and repeatedly transforms them by applying a finite set of specified rules of a certain form, thus generating a formal… …   Wikipedia

  • Simple public key infrastructure — (SPKI, pronounced spoo key ) was born out of a joint effort to overcome the overcomplication and scalability problems of traditional X.509 public key infrastructure. It is specified in two Internet Engineering Task Force (IETF) Request For… …   Wikipedia

  • S-выражение — Термин S выражение или sexp (для символического выражения) относится к соглашению о способе записи полуструктурированных данных (англ.) в доступной для человеческого понимания текстовой форме. Символические выражения создаются, в основном,… …   Википедия

  • DFA minimization — In computer science, more specifically in the branch of automata theory, DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has minimum number of states. Here, two DFAs are called …   Wikipedia

  • biblical literature — Introduction       four bodies of written works: the Old Testament writings according to the Hebrew canon; intertestamental works, including the Old Testament Apocrypha; the New Testament writings; and the New Testament Apocrypha.       The Old… …   Universalium

  • Christianity — /kris chee an i tee/, n., pl. Christianities. 1. the Christian religion, including the Catholic, Protestant, and Eastern Orthodox churches. 2. Christian beliefs or practices; Christian quality or character: Christianity mixed with pagan elements; …   Universalium

  • BIBLE — THE CANON, TEXT, AND EDITIONS canon general titles the canon the significance of the canon the process of canonization contents and titles of the books the tripartite canon …   Encyclopedia of Judaism

  • Canon Law — • Canon law is the body of laws and regulations made by or adopted by ecclesiastical authority, for the government of the Christian organization and its members Catholic Encyclopedia. Kevin Knight. 2006. Canon Law     Canon Law …   Catholic encyclopedia

  • South Asian arts — Literary, performing, and visual arts of India, Pakistan, Bangladesh, and Sri Lanka. Myths of the popular gods, Vishnu and Shiva, in the Puranas (ancient tales) and the Mahabharata and Ramayana epics, supply material for representational and… …   Universalium

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”