Conventions

WebAssembly is a programming language that has multiple concrete representations (its binary format and the text format). Both map to a common structure. For conciseness, this structure is described in the form of an abstract syntax. All parts of this specification are defined in terms of this abstract syntax.

Grammar Notation

The following conventions are adopted in defining grammar rules for abstract syntax.

  • Terminal symbols (atoms) are written in sans-serif font or in symbolic form: i32, end, →, [, ].

  • Nonterminal symbols are written in italic font: valtype, instr.

  • A^n is a sequence of n ≥ 0 iterations of A.

  • A^* is a possibly empty sequence of iterations of A. (This is a shorthand for A^n used where n is not relevant.)

  • A^+ is a non-empty sequence of iterations of A. (This is a shorthand for A^n where n ≥ 1.)

  • A^? is an optional occurrence of A. (This is a shorthand for A^n where n ≤ 1.)

  • Productions are written sym ::= A_1 | … | A_n.

  • Large productions may be split into multiple definitions, indicated by ending the first one with explicit ellipses, sym ::= A_1 | …, and starting continuations with ellipses, sym ::= … | A_2.

  • Some productions are augmented with side conditions in parentheses, “(if condition)”, that provide a shorthand for a combinatorial expansion of the production into many separate cases.

  • If the same meta variable or non-terminal symbol appears multiple times in a production, then all those occurrences must have the same instantiation. (This is a shorthand for a side condition requiring multiple different variables to be equal.)
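
As an illustration only, the four iteration forms above (exactly n, possibly empty, non-empty, optional) can be read as constraints on the length of a sequence. The sketch below is not part of the specification; the function name allowed_length and the string tags "n", "*", "+", "?" are hypothetical and chosen just for this example.

```python
from typing import Optional


def allowed_length(form: str, length: int, n: Optional[int] = None) -> bool:
    """Return True if a sequence of `length` items fits the iteration form.

    `form` is a hypothetical tag: "n" (exactly n), "*" (possibly empty),
    "+" (non-empty), "?" (optional).
    """
    if length < 0:
        raise ValueError("a sequence cannot have negative length")
    if form == "n":
        # Exactly n iterations, with n >= 0.
        return n is not None and n >= 0 and length == n
    if form == "*":
        # Any number of iterations, including none.
        return True
    if form == "+":
        # At least one iteration (n >= 1).
        return length >= 1
    if form == "?":
        # Zero or one occurrence (n <= 1).
        return length <= 1
    raise ValueError(f"unknown iteration form: {form!r}")
```

For instance, an empty sequence fits the possibly-empty and optional forms but not the non-empty one.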

Auxiliary Notation

When dealing with syntactic constructs the following notation is also used:

  • ϵ denotes the empty sequence.

  • |s| denotes the length of a sequence s.

  • s[i] denotes the i-th element of a sequence s, starting from 0.

  • s[i:n] denotes the sub-sequence s[i] … s[i+n−1] of a sequence s.

  • s with [i] = A denotes the same sequence as s, except that the i-th element is replaced with A.

  • s with [i:n] = A^n denotes the same sequence as s, except that the sub-sequence s[i:n] is replaced with A^n.

  • concat(s^*) denotes the flat sequence formed by concatenating all sequences s_i in s^*.
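
The sequence notation above can be sketched with immutable Python tuples. This is a minimal illustration under stated assumptions, not a definition from the specification: the helper names sub, with_index, with_slice and concat are hypothetical. Note that in the slice notation the second number is a length n, not an end index as in Python slicing.

```python
from typing import Iterable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sub(s: Sequence[T], i: int, n: int) -> Tuple[T, ...]:
    """Sub-sequence of n elements starting at index i (n is a length)."""
    if i < 0 or n < 0 or i + n > len(s):
        raise IndexError("sub-sequence out of range")
    return tuple(s[i:i + n])


def with_index(s: Sequence[T], i: int, a: T) -> Tuple[T, ...]:
    """Copy of s in which the i-th element is replaced with a."""
    if not 0 <= i < len(s):
        raise IndexError("index out of range")
    return tuple(s[:i]) + (a,) + tuple(s[i + 1:])


def with_slice(s: Sequence[T], i: int, n: int, a_n: Sequence[T]) -> Tuple[T, ...]:
    """Copy of s in which the n elements starting at i are replaced with a_n."""
    if len(a_n) != n:
        raise ValueError("replacement must have exactly n elements")
    if i < 0 or n < 0 or i + n > len(s):
        raise IndexError("sub-sequence out of range")
    return tuple(s[:i]) + tuple(a_n) + tuple(s[i + n:])


def concat(ss: Iterable[Sequence[T]]) -> Tuple[T, ...]:
    """Flat sequence formed by concatenating all sequences in ss."""
    return tuple(x for s in ss for x in s)
```

All helpers return a new tuple and leave their argument unchanged, matching the reading of "with" as denoting a modified copy rather than an in-place update.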

Moreover, the following conventions are employed:

  • The notation x^n, where x is a non-terminal symbol, is treated as a meta variable ranging over respective sequences of x (similarly for x^*, x^+, x^?).

  • When given a sequence x^n, then the occurrences of x in a sequence written (A_1 x A_2)^n are assumed to be in point-wise correspondence with x^n (similarly for x^*, x^+, x^?). This implicitly expresses a form of mapping syntactic constructions over a sequence.
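
The point-wise correspondence can be pictured as an ordinary map over a sequence. The sketch below is only an analogy: the function pointwise and the placeholder strings "A1" and "A2" are hypothetical and stand for whatever fixed syntax surrounds each x.

```python
from typing import Callable, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def pointwise(xs: Sequence[X], build: Callable[[X], Y]) -> Tuple[Y, ...]:
    """Build one construct per element of xs, preserving order and length."""
    return tuple(build(x) for x in xs)


# Each element of xs is wrapped in the same surrounding construct.
xs = ("x0", "x1", "x2")
wrapped = pointwise(xs, lambda x: ("A1", x, "A2"))
```

The result has the same length as xs, and its i-th element is built from the i-th element of xs.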

Productions of the following form are interpreted as records that map a fixed set of fields field_i to “values” A_i, respectively:

r ::= {field_1 A_1, field_2 A_2, …}

The following notation is adopted for manipulating such records:

  • r.field denotes the contents of the field component of r.

  • r with field = A denotes the same record as r, except that the contents of the field component is replaced with A.

  • r_1 ⊕ r_2 denotes the composition of two records with the same fields of sequences by appending each sequence point-wise:

    {field_1 A_1^*, field_2 A_2^*, …} ⊕ {field_1 B_1^*, field_2 B_2^*, …} = {field_1 A_1^* B_1^*, field_2 A_2^* B_2^*, …}

  • ⨁ r^* denotes the composition of a sequence of records, respectively; if the sequence is empty, then all fields of the resulting record are empty.
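
Record composition can be sketched by modelling a record as a Python dict from field names to tuples. This is an illustrative assumption, not the specification's own encoding; the names compose and compose_all are hypothetical, and compose_all takes the field names explicitly so that the empty case can produce a record whose fields are all empty.

```python
from functools import reduce
from typing import Dict, Iterable, Sequence, Tuple

Record = Dict[str, Tuple]


def compose(r1: Record, r2: Record) -> Record:
    """Compose two records with the same fields by appending field-wise."""
    if r1.keys() != r2.keys():
        raise ValueError("records must have the same fields")
    return {f: tuple(r1[f]) + tuple(r2[f]) for f in r1}


def compose_all(records: Iterable[Record], fields: Sequence[str]) -> Record:
    """Compose a sequence of records; an empty sequence yields empty fields."""
    empty: Record = {f: () for f in fields}
    return reduce(compose, records, empty)
```

For example, composing {"field1": (1,), "field2": ()} with {"field1": (2, 3), "field2": (4,)} gives {"field1": (1, 2, 3), "field2": (4,)}.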

The update notation for sequences and records generalizes recursively to nested components accessed by “paths” pth ::= ([…] | .field)^+:

  • s with [i] pth = A is short for s with [i] = (s[i] with pth = A),

  • r with field pth = A is short for r with field = (r.field with pth = A),

where r with .field = A is shortened to r with field = A.
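
The recursive reading of path updates can be sketched as follows, assuming sequences are modelled as tuples and records as dicts. The function with_path and the encoding of a path as a tuple of steps (an int for an index step, a str for a field step) are hypothetical choices made for this example.

```python
from typing import Any, Sequence, Union

Step = Union[int, str]


def with_path(value: Any, path: Sequence[Step], a: Any) -> Any:
    """Copy of value in which the component reached by path is replaced with a.

    An int step selects a sequence element; a str step selects a record field.
    """
    if not path:
        raise ValueError("a path must contain at least one step")
    step, rest = path[0], path[1:]
    if isinstance(step, int):
        if not 0 <= step < len(value):
            raise IndexError("index out of range")
        # Update the i-th element, recursing if more steps remain.
        new = a if not rest else with_path(value[step], rest, a)
        return tuple(value[:step]) + (new,) + tuple(value[step + 1:])
    if step not in value:
        raise KeyError(step)
    # Update the named field, recursing if more steps remain.
    new = a if not rest else with_path(value[step], rest, a)
    return {**value, step: new}
```

With a single step the function reduces to the plain sequence or record update; with more steps it unfolds exactly as the two "is short for" rules describe, rebuilding each enclosing component around the updated inner one.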

Vectors

Vectors are bounded sequences of the form A^n (or A^*), where the A can either be values or complex constructions. A vector can have at most 2^32 − 1 elements.

vec(A) ::= A^n (if n < 2^32)
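
The length bound on vectors can be checked with a small sketch like the one below. The names MAX_VEC_LEN, vec_length_ok and vec are hypothetical and not taken from the specification; the code only illustrates that a length n is admissible exactly when n is less than 2 to the power 32.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")

# Largest admissible vector length: 2^32 - 1.
MAX_VEC_LEN = 2**32 - 1


def vec_length_ok(n: int) -> bool:
    """True if n is an admissible vector length (0 <= n < 2^32)."""
    return 0 <= n <= MAX_VEC_LEN


def vec(elems: Sequence[T]) -> Tuple[T, ...]:
    """Return elems as a tuple, rejecting sequences that exceed the bound."""
    if not vec_length_ok(len(elems)):
        raise ValueError("vector exceeds 2^32 - 1 elements")
    return tuple(elems)
```

The side condition n < 2^32 and the prose bound "at most 2^32 − 1 elements" state the same limit, which is why the check compares against MAX_VEC_LEN inclusively.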