Element Type Declarations

An element type declaration defines an element and its possible content. A valid XML document contains only elements that are defined in the DTD.

Various keywords and characters specify an element’s content:

  • EMPTY for specifying that the defined element allows no content, i.e., it can't have any child elements or text (in a valid document, not even whitespace may appear between its start and end tags);
  • ANY for specifying that the defined element allows any content, without restriction, i.e., that it may have any number (including none) and any type of child elements (including text);
  • or an expression, specifying the only elements allowed as direct children in the content of the defined element; this content can be either:
    • a mixed content, which means that the content may include text, optionally interspersed with named child elements, but their order and number of occurrences can't be restricted; this can be:
      • ( #PCDATA ): historically meaning parsed character data, this means that only text is allowed in the content (no quantifier other than an optional "*" is allowed);
      • ( #PCDATA | element name | ... )*: a choice list (between parentheses, separated by "|" pipe characters, and terminated by the required "*" quantifier) that starts with #PCDATA and names one or more child elements; text and the named elements may then be used in any order and any number of times in the content.
    • an element content, which means that the content may contain only child elements and no text (whitespace between child elements is then ignored, just like comments). Such element content is specified as a content particle, in a notation resembling Backus–Naur form in which element names act as the non-terminal symbols and there are no terminal symbols. Element content consists of:
      • a content particle, which is either the name of an element declared in the DTD, a sequence list, or a choice list; it may be followed by an optional quantifier.
        • a sequence list means an ordered list (specified between parentheses and separated by a "," comma character) of one or more content particles: all the content particles must appear successively as direct children in the content of the defined element, at the specified position and relative order;
        • a choice list means a mutually exclusive list (specified between parentheses and separated by a "|" pipe character) of two or more content particles: exactly one of these content particles must appear in the content of the defined element at that position.
      • a quantifier, which is a single character that immediately follows the item it applies to and restricts the number of successive occurrences of that item at the specified position in the content of the element; it may be either:
        • + for specifying that there must be one or more occurrences of the item — the effective content of each occurrence may be different;
        • * for specifying that any number (zero or more) of occurrences is allowed — the item is optional and the effective content of each occurrence may be different;
        • ? for specifying that there must not be more than one occurrence — the item is optional;
        • If there is no quantifier, the specified item must occur exactly one time at the specified position in the content of the element.

For example:

<!ELEMENT html (head, body)>
<!ELEMENT p (#PCDATA | p | ul | dl | table | h1|h2|h3)*>
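
The remaining forms described above can be sketched in the same way. The element names below (br, note, title, para, em, strong, section, figure, table, list, item) are hypothetical, chosen only for illustration, and are not taken from any particular DTD:

<!-- Hypothetical element names, for illustration only -->
<!ELEMENT br EMPTY>
<!ELEMENT note ANY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA | em | strong)*>
<!ELEMENT section (title, para+, (figure | table)?, list*)>
<!ELEMENT list (item+)>

Under these assumed declarations, a section must begin with exactly one title, followed by one or more para elements, then at most one figure or table, then any number of list elements; a list must contain at least one item.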

Note that element type declarations are not enforced by non-validating SGML and XML parsers (in which case any elements are accepted, in any order and in any number of occurrences, in the parsed document), but the declarations themselves are still checked for well-formed syntax.
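
As a minimal sketch of this difference, the document below is well-formed but not valid against its own internal DTD subset, because body appears before head. The declarations for head and body are assumptions added only to make the example self-contained. A non-validating parser would be expected to accept it, whereas a validating parser should report a validity error:

<?xml version="1.0"?>
<!DOCTYPE html [
  <!ELEMENT html (head, body)>
  <!-- Assumed declarations, not part of the original example -->
  <!ELEMENT head (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<html><body>text</body><head>title</head></html>

Swapping the two children so that head comes first would make the same document valid.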