SGML entity

In SGML and its derived markup languages HTML and XML, an entity is a named body of data associated with a document, or the unnamed "document entity" itself. Once defined, a named entity can be referenced any number of times within the document, via "entity references".

An entity typically consists of a sequence of one or more characters, either encapsulated within a declaration of the entity's existence, or encoded and stored externally in a text file. Such entities and their references are comparable to macros and include files in programming languages. However, entities don't necessarily have to contain parseable text; they may also be binary files that are included in a document only by reference, with the entity name as the value of an ENTITY-type attribute.

Types of entities

Entities are classified as general or parameter:
* A "general" entity can only be referenced within the document content.
* A "parameter" entity can only be referenced within the DTD.

Entities are further classified as internal or external:
* An "internal" entity exists within its own declaration.
* An "external" entity exists in an external file.

Entities are also further classified as parsed or unparsed:
* A "parsed" entity contains text which will be incorporated into the document and parsed, if the entity is referenced. A parameter entity can only be a parsed entity.
* An "unparsed" entity contains any kind of data, and a reference to it will result in the application merely being notified of the entity's presence; the content of the entity will not be parsed, even if it is text. An unparsed entity can only be external.
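The three classifications above are not fully independent: a parameter entity must be parsed, and an unparsed entity must be external. As an illustrative sketch only, the following Python models those two constraints and enumerates the combinations that remain. The class name `EntityKind` and its field names are hypothetical and are not part of any SGML or XML API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EntityKind:
    """Hypothetical model of the three classifications described above."""
    parameter: bool  # False means a general entity
    external: bool   # False means an internal entity
    parsed: bool     # False means an unparsed entity

    def is_valid(self) -> bool:
        # A parameter entity can only be a parsed entity.
        if self.parameter and not self.parsed:
            return False
        # An unparsed entity can only be external.
        if not self.parsed and not self.external:
            return False
        return True


# Enumerate all eight combinations and keep the ones the rules allow.
valid_kinds = [
    kind
    for kind in (EntityKind(p, e, s) for p, e, s in product((False, True), repeat=3))
    if kind.is_valid()
]

for kind in valid_kinds:
    print(kind)
```

Under these two rules, five of the eight combinations remain, and the only unparsed kind is the external general entity.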


An entity is defined via an "entity declaration" in a document's DTD. For example:

<!ENTITY greeting1 "Hello world">
<!ENTITY greeting2 SYSTEM "file:///hello.txt">
<!ENTITY % greeting3 "¡Hola!">
<!ENTITY greeting4 "%greeting3; means Hello!">

This DTD markup declares the following:
* An internal general entity named "greeting1" exists and consists of the string "Hello world".
* An external general entity named "greeting2" exists and consists of the text found in the resource identified by the URI "file:///hello.txt".
* An internal parameter entity named "greeting3" exists and consists of the string "¡Hola!".
* An internal general entity named "greeting4" exists and consists of the string "¡Hola! means Hello!".

Names for entities must follow the rules for SGML names, and there are limitations on where entities can be referenced.

Parameter entities are referenced by placing the entity name between "%" and ";". Parsed general entities are referenced by placing the entity name between "&" and ";". Unparsed entities are referenced by placing the entity name in the value of an attribute declared as type ENTITY.
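The two reference syntaxes for parsed entities can be illustrated with a toy expander. This is a deliberately simplified sketch, not how a real SGML or XML parser works: the function names, the dictionaries, and the regular expression used for names are assumptions made for illustration, and the name pattern is much narrower than the actual SGML name rules.

```python
import re

# Simplified name pattern (an assumption; real SGML/XML name rules are broader).
NAME = r"[A-Za-z][\w.-]*"


def expand_parameter_refs(literal: str, parameters: dict) -> str:
    """Replace %name; references, as happens inside DTD declarations."""
    return re.sub(rf"%({NAME});", lambda m: parameters[m.group(1)], literal)


def expand_general_refs(text: str, general: dict) -> str:
    """Replace &name; references, as happens inside document content."""
    return re.sub(rf"&({NAME});", lambda m: general[m.group(1)], text)


# Parameter entity from the example declarations.
parameter_entities = {"greeting3": "¡Hola!"}

# The %greeting3; reference is resolved when greeting4 is declared.
general_entities = {
    "greeting1": "Hello world",
    "greeting4": expand_parameter_refs("%greeting3; means Hello!", parameter_entities),
}

result = expand_general_refs("In Spanish, &greeting4;", general_entities)
print(result)
```

The sketch resolves the parameter reference when the declaration is processed and the general reference when the content is processed, which mirrors the split between DTD and document content described above.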

The general entities from the example above might be referenced in a document as follows:

<content>
  <info>'&greeting1;' is a common test string.</info>
  <info>The content of hello.txt is: &greeting2;</info>
  <info>In Spanish, &greeting4;</info>
</content>

When parsed, this document would be reported to the downstream application the same as if it had been written as follows, assuming the hello.txt file contains the text "Salutations":

<content>
  <info>'Hello world' is a common test string.</info>
  <info>The content of hello.txt is: Salutations</info>
  <info>In Spanish, ¡Hola! means Hello!</info>
</content>
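The replacement of an internal general entity can be observed with an ordinary XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module, chosen here only for illustration. It declares "greeting1" in an internal DTD subset and references it twice; the second `info` element is an addition for the example, and the external and parameter entities are left out to keep the document self-contained.

```python
import xml.etree.ElementTree as ET

# An XML document whose internal DTD subset declares one internal general entity.
DOCUMENT = """<!DOCTYPE content [
  <!ENTITY greeting1 "Hello world">
]>
<content>
  <info>'&greeting1;' is a common test string.</info>
  <info>&greeting1; again.</info>
</content>"""

root = ET.fromstring(DOCUMENT)

# The application sees the replacement text, not the entity references.
info_texts = [info.text for info in root.findall("info")]
for text in info_texts:
    print(text)
```

The application code never sees "&greeting1;" itself, only the replacement text, which is what the paragraph above means by the document being reported as if it had been written out in full.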

In SGML, a reference to an undeclared entity is an error unless a default entity has been declared using the reserved name #DEFAULT. For example:

<!ENTITY #DEFAULT "This entity is not defined">

Additional markup constructs and processor options may affect whether and how entities are processed. For example, a processor may optionally ignore external entities.
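As one concrete illustration of such a processor option, Python's standard `xml.sax` module exposes a feature flag, `feature_external_ges`, that controls whether external general entities are included. The sketch below turns the feature off explicitly and collects the character data that the parser reports; the `TextCollector` class is a hypothetical helper written for this example, and the URI is the one from the declarations above, which is never opened here.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical helper that gathers all character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


DOCUMENT = b"""<!DOCTYPE content [
  <!ENTITY greeting2 SYSTEM "file:///hello.txt">
]>
<content>The content of hello.txt is: &greeting2;</content>"""

parser = xml.sax.make_parser()
# Ask the processor not to include external general entities.
parser.setFeature(feature_external_ges, False)

collector = TextCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(DOCUMENT))

text = "".join(collector.chunks)
print(repr(text))
```

With the feature disabled, the reference to "greeting2" contributes no text, so the contents of the external file never reach the application.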

In SGML, entities have additional capabilities and nuances that are not described here.

In HTML, the DTDs are immutable; new entities cannot be defined. In XML, new entities can be defined, but XML only supports a limited subset of SGML entity functionality.

Character entities

Standard entity sets for SGML and some of its derivatives have been developed as mnemonic devices, to ease document authoring when there is a need to use characters that are not easily typed or that are not widely supported by legacy character encodings. Each such entity consists of just one character from the Universal Character Set. Although any character can be referenced using a numeric character reference, a character entity reference allows characters to be referenced by name instead of code point.

HTML 4, for example, has 252 built-in character entities that don't have to be explicitly declared. XML has five. XHTML has the same five as XML, but if its DTDs are explicitly used, then it has 253 (&apos; being the extra entity).
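The relationship between a character entity reference and the equivalent numeric character references can be checked with Python's standard `html` module, used here purely as an illustration. The sketch resolves the same character by name, by decimal code point, and by hexadecimal code point, and then consults `html.entities.name2codepoint`, which the Python documentation describes as a mapping of HTML 4 entity names to code points.

```python
import html
from html.entities import name2codepoint

# The same character referenced by name, by decimal and by hexadecimal code point.
named = html.unescape("&eacute;")
decimal = html.unescape("&#233;")
hexadecimal = html.unescape("&#xE9;")

same_character = named == decimal == hexadecimal == "\u00e9"

# The HTML 4 name table maps the mnemonic name to the code point.
code_point = name2codepoint["eacute"]

# "apos" is predefined in XML and present in the XHTML DTDs, but it is not an HTML 4 entity.
has_apos = "apos" in name2codepoint

print(same_character, hex(code_point), has_apos)
```

Note that `html.unescape` itself follows the larger HTML5 set of named references, so only the `name2codepoint` lookup reflects the HTML 4 set discussed above.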

The terms "entity", "character entity", "character reference", "character entity reference", and "entity reference" are often used interchangeably.

See also

* List of XML and HTML character entity references

External links

* Entity Management in SGML

Wikimedia Foundation. 2010.
