Conventions

Validation checks that a WebAssembly module is well-formed. Only valid modules can be instantiated.

Validity is defined by a type system over the abstract syntax of a module and its contents. For each piece of abstract syntax, there is a typing rule that specifies the constraints that apply to it. All rules are given in two equivalent forms:

In prose, describing the meaning in intuitive form.
In formal notation, describing the rule in mathematical form. [1]

Note

The prose and formal rules are equivalent, so an understanding of the formal notation is not required to read this specification. The formalism offers a more concise description in a notation that is widely used in programming language semantics and is readily amenable to mathematical proof.

In both cases, the rules are formulated in a declarative manner. That is, they only formulate the constraints; they do not define an algorithm. The skeleton of a sound and complete algorithm for type-checking instruction sequences according to this specification is provided in the appendix.

Contexts

Validity of an individual definition is specified relative to a context, which collects relevant information about the surrounding module and the definitions in scope:

Types: the list of types defined in the current module.
Functions: the list of functions declared in the current module, represented by their function type.
Tables: the list of tables declared in the current module, represented by their table type.
Memories: the list of memories declared in the current module, represented by their memory type.
Globals: the list of globals declared in the current module, represented by their global type.
Element Segments: the list of element segments declared in the current module, represented by their element type.
Data Segments: the list of data segments declared in the current module, each represented by an $ok$ entry.
Locals: the list of locals declared in the current function (including parameters), represented by their value type.
Labels: the stack of labels accessible from the current position, represented by their result type.
Return: the return type of the current function, represented as an optional result type that is absent when no return is allowed, as in free-standing expressions.
References: the list of function indices that occur in the module outside functions and can hence be used to form references inside them.

In other words, a context contains a sequence of suitable types for each index space, describing each defined entry in that space. Locals, labels and return type are only used for validating instructions in function bodies, and are left empty elsewhere. The label stack is the only part of the context that changes as validation of an instruction sequence proceeds.

More concretely, contexts are defined as records $C$ with abstract syntax:

\begin{array}{llll}
C & ::= & \{ & \mathsf{types}~\mathit{functype}^{*}, \\
  &     &    & \mathsf{funcs}~\mathit{functype}^{*}, \\
  &     &    & \mathsf{tables}~\mathit{tabletype}^{*}, \\
  &     &    & \mathsf{mems}~\mathit{memtype}^{*}, \\
  &     &    & \mathsf{globals}~\mathit{globaltype}^{*}, \\
  &     &    & \mathsf{elems}~\mathit{reftype}^{*}, \\
  &     &    & \mathsf{datas}~\mathit{ok}^{*}, \\
  &     &    & \mathsf{locals}~\mathit{valtype}^{*}, \\
  &     &    & \mathsf{labels}~\mathit{resulttype}^{*}, \\
  &     &    & \mathsf{return}~\mathit{resulttype}^{?}, \\
  &     &    & \mathsf{refs}~\mathit{funcidx}^{*} ~\}
\end{array}
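As an informal illustration, the record above could be mirrored by an immutable data structure. The following is a minimal sketch and not part of the specification: the class name `Context`, the field name `return_` (chosen because `return` is a Python keyword), and the encoding of types as strings and tuples are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encodings used only in this sketch:
#   value type    -> a string such as "i32"
#   result type   -> a tuple of value types
#   function type -> a (params, results) pair of result types


@dataclass(frozen=True)
class Context:
    """One field per component of the context record C."""
    types: tuple = ()
    funcs: tuple = ()
    tables: tuple = ()
    mems: tuple = ()
    globals: tuple = ()
    elems: tuple = ()
    datas: tuple = ()
    locals: tuple = ()
    labels: tuple = ()
    return_: Optional[tuple] = None  # absent when no return is allowed
    refs: tuple = ()


# A context for the body of a function of type [i32 i32] -> [i32].
# Fields that are not mentioned stay empty, like omitted fields in C.
add_type = (("i32", "i32"), ("i32",))
ctx = Context(
    types=(add_type,),
    funcs=(add_type,),
    locals=("i32", "i32"),
    return_=("i32",),
)
```

Field access such as `ctx.locals[1]` then plays the role of $C.\mathsf{locals}[1]$ in this sketch.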

In addition to field access, written $C.\mathsf{field}$, the following notation is adopted for manipulating contexts:

When spelling out a context, empty fields are omitted.
$C,\mathsf{field}\,A^{*}$ denotes the same context as $C$ but with the elements $A^{*}$ prepended to its $\mathsf{field}$ component sequence.

Note

Indexing notation like $C.\mathsf{labels}[i]$ is used to look up indices in their respective index space in the context. Context extension notation $C,\mathsf{field}\,A$ is primarily used to locally extend relative index spaces, such as label indices. Accordingly, the notation is defined to prepend to the respective sequence, introducing a new relative index $0$ and shifting the existing ones.
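The effect of context extension on relative indices can be sketched as follows. This is an illustration under assumptions, not a definition from the specification: the reduced `Context` class and the helper `extend` are hypothetical names, and result types are encoded as tuples of strings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Context:
    """A reduced, hypothetical context with only two of the fields."""
    locals: tuple = ()
    labels: tuple = ()


def extend(ctx, field_name, *entries):
    """Return a copy of ctx with entries prepended to the named field."""
    old = getattr(ctx, field_name)
    return replace(ctx, **{field_name: tuple(entries) + old})


# The outer context has one label with result type [i32].
outer = Context(labels=(("i32",),))

# Entering a construct whose label has the empty result type [].
inner = extend(outer, "labels", ())

# The new label takes relative index 0; the old one shifts to index 1.
assert inner.labels[0] == ()
assert inner.labels[1] == ("i32",)
# The original context is unchanged, so the extension is local.
assert outer.labels == (("i32",),)
```

Because the sketch returns a new record rather than mutating the old one, the extension only applies where the extended context is passed on, which matches the local nature of the notation.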

Prose Notation

Validation is specified by stylised rules for each relevant part of the abstract syntax. The rules not only state constraints defining when a phrase is valid, they also classify it with a type. The following conventions are adopted in stating these rules.

A phrase $A$ is said to be “valid with type $T$” if and only if all constraints expressed by the respective rules are met. The form of $T$ depends on what $A$ is.

Note

For example, if $A$ is a function, then $T$ is a function type; for an $A$ that is a global, $T$ is a global type; and so on.

The rules implicitly assume a given context $C$.
In some places, this context is locally extended to a context $C'$ with additional entries. The formulation “Under context $C'$, … statement …” is adopted to express that the following statement must apply under the assumptions embodied in the extended context.

Formal Notation

Note

This section gives a brief explanation of the notation for specifying typing rules formally. For the interested reader, a more thorough introduction can be found in textbooks on the subject. [2]

The proposition that a phrase $A$ has a respective type $T$ is written $A : T$. In general, however, typing is dependent on a context $C$. To express this explicitly, the complete form is a judgement $C ⊢ A : T$, which says that $A : T$ holds under the assumptions encoded in $C$.

The formal typing rules use a standard approach for specifying type systems, rendering them into deduction rules. Every rule has the following general form:

\frac{\mathit{premise}_{1} \qquad \mathit{premise}_{2} \qquad \dots \qquad \mathit{premise}_{n}}{\mathit{conclusion}}

Such a rule is read as a big implication: if all premises hold, then the conclusion holds. Some rules have no premises; they are axioms whose conclusion holds unconditionally. The conclusion is always a judgement $C ⊢ A : T$, and there is one respective rule for each relevant construct $A$ of the abstract syntax.

Note

For example, the typing rule for the $\mathsf{i32.add}$ instruction can be given as an axiom:

\frac{}{C ⊢ \mathsf{i32.add} : [\mathsf{i32}~\mathsf{i32}] \to [\mathsf{i32}]}

The instruction is always valid with type $[\mathsf{i32}~\mathsf{i32}] \to [\mathsf{i32}]$ (saying that it consumes two $\mathsf{i32}$ values and produces one), independent of any side conditions.

An instruction like $\mathsf{local.get}$ can be typed as follows:

\frac{C.\mathsf{locals}[x] = t}{C ⊢ \mathsf{local.get}~x : [] \to [t]}

Here, the premise enforces that the immediate local index $x$ exists in the context. The instruction produces a value of its respective type $t$ (and does not consume any values). If $C.\mathsf{locals}[x]$ does not exist, then the premise does not hold, and the instruction is ill-typed.
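One possible algorithmic reading of the axiom for i32.add and the rule for local.get is sketched below. The rules themselves are declarative, so this is only an illustration under assumptions: the function `type_of`, the exception `Invalid`, and the tuple encoding of instructions and types are hypothetical and do not come from the specification.

```python
class Invalid(Exception):
    """Raised when a premise of a typing rule does not hold."""


def type_of(locals_, instr):
    """Return the (inputs, outputs) type of one instruction.

    `locals_` stands in for C.locals; `instr` is a tuple such as
    ("i32.add",) or ("local.get", 0) in this hypothetical encoding.
    """
    op = instr[0]
    if op == "i32.add":
        # Axiom: no premises, so the result does not depend on the context.
        return (("i32", "i32"), ("i32",))
    if op == "local.get":
        x = instr[1]
        # Premise C.locals[x] = t: the index x must exist in the context.
        if not 0 <= x < len(locals_):
            raise Invalid(f"unknown local {x}")
        t = locals_[x]
        return ((), (t,))
    raise Invalid(f"unsupported instruction {op}")


# With locals [i64 f32], local.get 1 has type [] -> [f32].
assert type_of(("i64", "f32"), ("local.get", 1)) == ((), ("f32",))
# i32.add has the same type under any context, even an empty one.
assert type_of((), ("i32.add",)) == (("i32", "i32"), ("i32",))
```

In this sketch, a failed premise surfaces as an `Invalid` exception, which corresponds to the instruction being ill-typed.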

Finally, a structured instruction requires a recursive rule, where the premise is itself a typing judgement:

\frac{C ⊢ \mathit{blocktype} : [t_{1}^{*}] \to [t_{2}^{*}] \qquad C,\mathsf{labels}\,[t_{2}^{*}] ⊢ \mathit{instr}^{*} : [t_{1}^{*}] \to [t_{2}^{*}]}{C ⊢ \mathsf{block}~\mathit{blocktype}~\mathit{instr}^{*}~\mathsf{end} : [t_{1}^{*}] \to [t_{2}^{*}]}

A $\mathsf{block}$ instruction is only valid when the instruction sequence in its body is. Moreover, the result type must match the block’s annotation $\mathit{blocktype}$. If so, then the $\mathsf{block}$ instruction has the same type as the body. Inside the body an additional label of the corresponding result type is available, which is expressed by extending the context $C$ with the additional label information for the premise.
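The recursive shape of the block rule can be sketched as a pair of mutually recursive functions. This is a deliberately simplified illustration and not the algorithm from the appendix: it handles only three instructions, ignores stack-polymorphic instructions, and assumes the block type is already given as a pair of result types, so the first premise needs no lookup. The names `check_seq`, `type_of`, and `Invalid` are hypothetical.

```python
class Invalid(Exception):
    """Raised when a premise of a typing rule does not hold."""


def check_seq(labels, instrs, params, results):
    """Check that instrs has type [params] -> [results] under `labels`."""
    stack = list(params)
    for instr in instrs:
        ins, outs = type_of(labels, instr)
        start = len(stack) - len(ins)
        # The top of the stack must provide the instruction's inputs.
        if start < 0 or tuple(stack[start:]) != tuple(ins):
            raise Invalid(f"operand mismatch at {instr[0]}")
        del stack[start:]
        stack.extend(outs)
    if tuple(stack) != tuple(results):
        raise Invalid("body does not produce the annotated result type")


def type_of(labels, instr):
    """Return the (inputs, outputs) type of one instruction."""
    op = instr[0]
    if op == "i32.const":
        return ((), ("i32",))
    if op == "i32.add":
        return (("i32", "i32"), ("i32",))
    if op == "block":
        _, (t1, t2), body = instr
        # Recursive premise: the body is checked under the context
        # extended with a new label of result type [t2*] at index 0.
        check_seq((t2,) + labels, body, t1, t2)
        # Conclusion: the block has the same type as its body.
        return (t1, t2)
    raise Invalid(f"unsupported instruction {op}")


body = [("i32.const", 1), ("i32.const", 2), ("i32.add",)]
block = ("block", ((), ("i32",)), body)
assert type_of((), block) == ((), ("i32",))
```

None of the three instructions in this sketch consumes a label, so the extended label stack is only threaded through to show where the extension happens; a branch instruction would be the natural consumer.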