O Markup Language

This document specifies the O Markup Language (OML). The recommended file extension is .oml. Documents may be written in any character encoding that includes the ASCII symbol characters described below. Implementations should support at least UTF-8.

The representation of a document consists of a sequence of nodes. A node is either text or an element. An element has a label and a sequence of zero or more child nodes.

The paired ASCII symbol characters ( < [ { are called left beaks, and their counterparts ) > ] } are called right beaks. A sequence of one or two of the remaining 24 ASCII symbol characters !"#$%&'*+,-./:;=?@\^_`|~ is called an eye. A two-character eye takes precedence over a one-character eye. A sequence of one or more ASCII whitespace characters, namely horizontal tab (HT, 0x09, \t), line feed (LF, 0x0A, \n), vertical tab (VT, 0x0B, \v), form feed (FF, 0x0C, \f), carriage return (CR, 0x0D, \r), and space (SP, 0x20), is called a cheek.

In the representation after parsing, null characters (NUL, 0x00) are replaced with a replacement character if the character encoding provides one (for Unicode, the REPLACEMENT CHARACTER U+FFFD), and are removed otherwise.
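A non-normative sketch in Python of the character classes above. It is not taken from the reference implementation, and the names EYE_CHARS, read_eye, read_cheek, and replace_nul are hypothetical. Since NUL and U+FFFD are neither beaks, eyes, nor cheeks, replacing NUL before scanning gives the same result as replacing it in the representation after parsing; the removal alternative for other encodings is not shown.

```python
# Character classes of OML, transcribed from the definitions above.
LEFT_BEAKS = "(<[{"
RIGHT_BEAKS = ")>]}"
EYE_CHARS = "!\"#$%&'*+,-./:;=?@\\^_`|~"  # the 24 remaining ASCII symbols
CHEEK_CHARS = "\t\n\v\f\r "                # HT, LF, VT, FF, CR, SP


def read_eye(text: str, i: int) -> str:
    """Return the eye starting at index i, or "" if there is none.

    At most two eye characters are taken, so a two-character eye
    wins over a one-character eye.
    """
    end = i
    while end < len(text) and end - i < 2 and text[end] in EYE_CHARS:
        end += 1
    return text[i:end]


def read_cheek(text: str, i: int) -> str:
    """Return the (possibly empty) run of ASCII whitespace at index i."""
    end = i
    while end < len(text) and text[end] in CHEEK_CHARS:
        end += 1
    return text[i:end]


def replace_nul(text: str) -> str:
    """Replace NUL with U+FFFD, the option available for Unicode text."""
    return text.replace("\0", "\ufffd")
```

For example, in <!? the eye after the beak is the two characters !?, not just !.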

Parsing proceeds sequentially. Throughout the following, anything described as "potential" becomes text if it does not ultimately become an element. An occurrence of a left beak, an eye, and an optional cheek is a potential left head. An occurrence of an optional cheek, an eye, and a right beak is a right head, which attempts to create an element as described below. All other characters are text.
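A non-normative sketch of the sequential scan using Python regular expressions; LEFT_HEAD, RIGHT_HEAD, and scan are hypothetical names. The greedy {1,2} quantifier gives two-character eyes precedence. One choice here goes beyond the paragraph above and is an assumption: whitespace between a left head's eye and an immediately following right head is given to the right head, which is consistent with Case 8.

```python
import re

EYE = r"""[!"#$%&'*+,\-./:;=?@\\^_`|~]{1,2}"""  # greedy: two characters win
WS = r"[\t\n\v\f\r ]"                           # cheek characters
# Potential left head: left beak, eye, then a cheek, unless that cheek is
# followed by an eye and a right beak (assumption: it then belongs to the
# right head).
LEFT_HEAD = re.compile(rf"[(<\[{{]{EYE}(?:(?!{WS}*{EYE}[)>\]}}]){WS}*)?")
# Right head: optional cheek, eye, right beak.
RIGHT_HEAD = re.compile(rf"{WS}*{EYE}[)>\]}}]")


def scan(source: str) -> list:
    """Split source into ("left" | "right" | "text", raw) pieces."""
    pieces, i = [], 0
    while i < len(source):
        if m := LEFT_HEAD.match(source, i):
            pieces.append(("left", m.group()))
        elif m := RIGHT_HEAD.match(source, i):
            pieces.append(("right", m.group()))
        else:
            # Merge consecutive text characters into one piece.
            if pieces and pieces[-1][0] == "text":
                pieces[-1] = ("text", pieces[-1][1] + source[i])
            else:
                pieces.append(("text", source[i]))
            i += 1
            continue
        i = m.end()
    return pieces
```

For example, scan("(+ (* +)") yields the left head "(+ ", the left head "(*", and the right head " +)".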

Some left heads have a corresponding label. The set of these mappings from left heads to labels is called the vocabulary. A special element that modifies the vocabulary is called a vocabulary change; its default left head is <!. A left head carries the label that the vocabulary assigns to it, if any; otherwise its label is empty.
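One way to model the vocabulary is a dictionary keyed by a left head's beak and eye. This keying (which ignores the cheek) and the VOCAB_CHANGE marker are assumptions of this non-normative sketch; Case 6, where a mapping given to the left head written "(* " is later used by (*c*), is consistent with it.

```python
VOCAB_CHANGE = object()  # hypothetical marker for the vocabulary-change label


def default_vocabulary() -> dict:
    """Map a left head's (beak, eye) to its label; <! starts as the vocabulary change."""
    return {("<", "!"): VOCAB_CHANGE}


def label_of(vocabulary: dict, beak: str, eye: str):
    """Return the label a left head carries, or "" if none is assigned."""
    return vocabulary.get((beak, eye), "")
```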

An element is created when a right head is preceded by an unclosed left head that has the same kind of beak and a matching eye; the nearest such left head is used. Eyes match when they have the same length and either both have one character and are identical, or both have two characters and the left (right) character of the right head's eye is the same as the right (left) character of the left head's eye; in other words, the right eye mirrors the left eye. When this condition is met, the sequence between the left head and the right head is handled as follows: if the left head's label denotes a vocabulary change, the processing described below is performed; if the left head's label is non-empty, an element with that label and with that sequence as its children is created; and if the left head's label is empty and the left head is a direct child of a potential vocabulary change, the result becomes a potential element.
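The matching rule can be sketched as follows (non-normative; PAIRS, eyes_match, and find_opener are hypothetical names). Reading "nearest preceding left head" as the nearest unclosed left head that the right head can actually close, skipping non-matching ones, is an interpretation chosen because Cases 8 and 9 need it. For a two-character eye, the mirror condition is the same as reversing the string.

```python
PAIRS = {"(": ")", "<": ">", "[": "]", "{": "}"}


def eyes_match(left_eye: str, right_eye: str) -> bool:
    """Same length, and a two-character right eye mirrors the left eye."""
    return len(left_eye) == len(right_eye) and right_eye == left_eye[::-1]


def find_opener(open_heads: list, eye: str, beak: str):
    """Return the index of the nearest unclosed (beak, eye) left head that
    the right head (eye, beak) closes, or None if there is none.

    Interpretation (assumption): non-matching unclosed heads are skipped.
    """
    for k in range(len(open_heads) - 1, -1, -1):
        left_beak, left_eye = open_heads[k]
        if PAIRS[left_beak] == beak and eyes_match(left_eye, eye):
            return k
    return None
```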

To process a vocabulary change, each item of its content sequence is examined in order, and at most one of the following operations is applied to it. If the item is an element, its content sequence is examined: if that content is text, the text becomes the label mapped to the element's left head; if it is a left head, the mapping for that left head is transferred to the element's left head.
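A non-normative sketch of this processing over a hypothetical in-memory model; the LeftHead and Element classes are illustrative and not part of the specification. What happens when the transferred left head has no mapping, or when an element's content is empty or mixed, is not specified; this sketch simply leaves the vocabulary unchanged in those cases.

```python
from dataclasses import dataclass


@dataclass
class LeftHead:      # a left head that stayed unclosed
    beak: str
    eye: str


@dataclass
class Element:       # a (potential) element inside the vocabulary change
    head: LeftHead
    content: list    # strings, LeftHead objects, or nested Element objects


def apply_vocabulary_change(vocabulary: dict, items: list) -> None:
    """Update vocabulary in place from the content of a vocabulary change."""
    for item in items:
        if not isinstance(item, Element):
            continue  # text and bare left heads have no effect here
        key = (item.head.beak, item.head.eye)
        content = item.content
        if content and all(isinstance(part, str) for part in content):
            vocabulary[key] = "".join(content)  # text: it becomes the label
        elif len(content) == 1 and isinstance(content[0], LeftHead):
            source = (content[0].beak, content[0].eye)
            if source in vocabulary:
                # Transfer: the old left head loses its mapping (see Case 6).
                vocabulary[key] = vocabulary.pop(source)
```

Applied to the content of the vocabulary change in Case 6, the first item maps (+ to a, and the second moves that mapping to (*, leaving (+ unmapped.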

The language is now fully specified. The remainder of this document is non-normative.

Examples

This section presents examples. Each example is accompanied by its JSON output, enabling conformance testing against this document. The JSON output follows the JSON schema below, which represents the abstract syntax tree. This is the format produced by the reference implementation's executable, though conforming implementations are not required to produce this exact structure.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "urn:local:oml-ast.schema.json",
  "title": "Sample AST",
  "type": "array",
  "items": { "$ref": "#/$defs/node" },
  "$defs": {
    "node": {
      "anyOf": [
        { "type": "string" },
        { "$ref": "#/$defs/element" }
      ]
    },
    "element": {
      "type": "object",
      "properties": {
        "label": { "type": "string" },
        "children": {
          "type": "array",
          "items": { "$ref": "#/$defs/node" }
        }
      },
      "required": ["label", "children"],
      "additionalProperties": false
    }
  }
}
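For conformance testing without extra dependencies, the shape described by this schema can also be checked by hand. The helpers below are hypothetical and intended to accept exactly what the schema accepts; a general validator, such as the validate function of the third-party jsonschema package, could be used instead.

```python
import json


def is_node(value) -> bool:
    """A node is a string, or an object with exactly a string "label"
    and an array "children" of nodes."""
    if isinstance(value, str):
        return True
    return (
        isinstance(value, dict)
        and set(value) == {"label", "children"}
        and isinstance(value["label"], str)
        and isinstance(value["children"], list)
        and all(is_node(child) for child in value["children"])
    )


def is_document(value) -> bool:
    """The top level is an array of nodes."""
    return isinstance(value, list) and all(is_node(node) for node in value)


print(is_document(json.loads('[{"label":"a","children":["b"]}]')))  # True
```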

Any string is a valid source.

Case 1:

a

["a"]

The copyright notation (C) is not an element, because it contains no eye; it remains plain text.

Case 2:

(C)

["(C)"]

If a left head has no label in the vocabulary, no element is created.

Case 3:

(+a+)

["(+a+)"]

The following minimal vocabulary change maps the left head (* to the label a.

Case 4:

<!(*a*)!>(*b*)

[{"label":"a","children":["b"]}]

With two-character eyes, the right head's eye mirrors the left head's eye.

Case 5:

<!(:~a~:)!>(:~b~:)

[{"label":"a","children":["b"]}]

When a mapping is moved by a vocabulary change, the original mapping is removed.

Case 6:

<!(+a+)(* (+ *)!>(+b+)(*c*)

["(+b+)",{"label":"a","children":["c"]}]

If no element is created, any part that could have been a cheek is preserved.

Case 7:

(+ (* +)

["(+ (* +)"]

If an element is created, the cheeks of its left and right heads are removed.

Case 8:

<!(+a+)!>(+ (* +)

[{"label":"a","children":["(*"]}]

The mapping of the vocabulary change itself can be transferred to another left head. Here it moves from <! to <?, so the later <!a!> is plain text.

Case 9:

<! <? <! ?> !><? (+a+) ?><!a!>(+b+)

["<!a!>",{"label":"a","children":["b"]}]

Unclosed left heads remain plain text.

Case 10:

<

["<"]

Case 11:

<!

["<!"]

Case 12:

<!?

["<!?"]

Right heads without a matching left head remain plain text.

Case 13:

!

["!"]

Case 14:

!>

["!>"]

Case 15:

?!>

["?!>"]
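The cases above can be reproduced by a compact parser. The following is an unofficial, non-normative sketch in Python, not the reference implementation, and every name in it (parse, tokenize, VOCAB_CHANGE, and so on) is hypothetical. Where the normative text leaves room, it follows assumptions drawn from the examples: a right head closes the nearest unclosed left head that it matches, and unclosed heads in between become part of the content; whitespace directly before a right head belongs to that right head; a left head's label is looked up by its beak and eye when it is read; and a vocabulary change produces no output.

```python
import re
from collections import namedtuple

# Non-normative sketch. Assumptions beyond the text: the nearest matching
# unclosed left head is closed; a cheek before a right head belongs to it;
# labels are looked up by (beak, eye) when a left head is read.
EYE = r"""[!"#$%&'*+,\-./:;=?@\\^_`|~]{1,2}"""   # greedy: two characters win
WS = r"[\t\n\v\f\r ]"                              # cheek characters
LEFT = re.compile(rf"([(<\[{{])({EYE})(?:(?!{WS}*{EYE}[)>\]}}]){WS}*)?")
RIGHT = re.compile(rf"{WS}*({EYE})([)>\]}}])")
PAIR = {"(": ")", "<": ">", "[": "]", "{": "}"}
VOCAB_CHANGE = object()          # hypothetical marker for the vocabulary change
Head = namedtuple("Head", "beak eye raw")             # raw includes the cheek
Elem = namedtuple("Elem", "head label content tail")  # open frame or closed element


def tokenize(s):
    """Yield Head tuples, (raw, eye, beak) right heads, and text characters."""
    i = 0
    while i < len(s):
        if m := LEFT.match(s, i):
            yield Head(m[1], m[2], m[0])
        elif m := RIGHT.match(s, i):
            yield (m[0], m[1], m[2])
        else:
            yield s[i]
            i += 1
            continue
        i = m.end()


def change_vocabulary(vocab, items):
    for item in items:
        if not isinstance(item, Elem):
            continue
        key, content = (item.head.beak, item.head.eye), item.content
        if content and all(isinstance(c, str) for c in content):
            vocab[key] = "".join(content)                # text: set the label
        elif len(content) == 1 and isinstance(content[0], Head):
            src = (content[0].beak, content[0].eye)
            if src in vocab:
                vocab[key] = vocab.pop(src)              # head: move the mapping


def render(items):
    """Convert internal items to JSON-ready nodes, merging adjacent text."""
    out = []
    for item in items:
        if isinstance(item, Elem) and item.label:
            parts = [{"label": item.label, "children": render(item.content)}]
        elif isinstance(item, Elem):                     # unresolved potential element
            parts = render([item.head.raw, *item.content, item.tail])
        else:
            parts = [item.raw if isinstance(item, Head) else item]
        for part in parts:
            if isinstance(part, str) and out and isinstance(out[-1], str):
                out[-1] += part
            else:
                out.append(part)
    return out


def parse(source):
    vocab = {("<", "!"): VOCAB_CHANGE}
    stack = [Elem(None, None, [], "")]                   # stack[0] is the document

    def revert_top():                                    # unclosed head: keep as text
        top = stack.pop()
        stack[-1].content.extend([top.head, *top.content])

    for tok in tokenize(source.replace("\0", "\ufffd")):
        if isinstance(tok, Head):
            stack.append(Elem(tok, vocab.get((tok.beak, tok.eye), ""), [], ""))
        elif isinstance(tok, tuple):
            raw, eye, beak = tok
            k = next((k for k in range(len(stack) - 1, 0, -1)
                      if PAIR[stack[k].head.beak] == beak
                      and stack[k].head.eye[::-1] == eye), None)
            if k is None:                                # no opener: plain text
                stack[-1].content.append(raw)
                continue
            while len(stack) - 1 > k:                    # heads in between: content
                revert_top()
            elem = stack.pop()._replace(tail=raw)
            parent = stack[-1]
            if elem.label is VOCAB_CHANGE:               # consumed, no output
                change_vocabulary(vocab, elem.content)
            elif elem.label or parent.label is VOCAB_CHANGE:
                parent.content.append(elem)              # element or potential one
            else:                                        # no element: text again
                parent.content.extend([elem.head, *elem.content, raw])
        else:
            stack[-1].content.append(tok)
    while len(stack) > 1:
        revert_top()
    return render(stack[0].content)
```

For example, parse("<!(*a*)!>(*b*)") returns [{"label": "a", "children": ["b"]}], matching Case 4. Running every case this way is a quick smoke test of the assumptions listed above, not a substitute for the reference implementation.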

Changelog

The first edition was written on May 21, 2023. The design was inspired by TeX, XML, and Djot. On April 27, 2025, the acronym of this language was changed to OML. On May 17, 2026, the eighth edition added the handling of whitespace. On May 21, 2026, this language and the reference implementation were publicly released. On June 21, 2026, we added the handling of null characters, which are replaced with replacement characters.

License

Copyright (C) 2023-2026 gemmaro.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.  This file is offered as-is,
without any warranty.