Conventions¶
Validation checks that a WebAssembly module is well-formed. Only valid modules can be instantiated.
Validity is defined by a type system over the abstract syntax of a module and its contents. For each piece of abstract syntax, there is a typing rule that specifies the constraints that apply to it. All rules are given in two equivalent forms:
In prose, describing the meaning in intuitive form.
In formal notation, describing the rule in mathematical form. [1]
Note
The prose and formal rules are equivalent, so that understanding of the formal notation is not required to read this specification. The formalism offers a more concise description in notation that is used widely in programming languages semantics and is readily amenable to mathematical proof.
In both cases, the rules are formulated in a declarative manner. That is, they only formulate the constraints, they do not define an algorithm. The skeleton of a sound and complete algorithm for type-checking instruction sequences according to this specification is provided in the appendix.
Types¶
To define the semantics, the definition of some sorts of types is extended to include additional forms. By virtue of not being representable in either the binary format or the text format, these forms cannot be used in a program; they only occur during validation or execution.
The unique value type \(\href{../valid/conventions.html#syntax-valtype-ext}{\mathsf{bot}}\) is a bottom type that matches all value types. Similarly, \(\href{../valid/conventions.html#syntax-heaptype-ext}{\mathsf{bot}}\) is also used as a bottom type of all heap types.
Note
No validation rule uses bottom types explicitly, but various rules can pick any value or heap type, including bottom. This ensures the existence of principal types, and thus a validation algorithm without back tracking.
A concrete heap type can consist of a defined type directly. this occurs as the result of substituting a type index with its definition.
A concrete heap type may also be a recursive type index. Such an index refers to the \(i\)-th component of a surrounding recursive type. It occurs as the result of rolling up the definition of a recursive type.
Finally, the representation of supertypes in a sub type is generalized from mere type indices to heap types. They occur as defined types or recursive type indices after substituting type indices or rolling up recursive types.
Note
It is an invariant of the semantics that sub types occur only in one of two forms: either as “syntactic” types as in a source module, where all supertypes are type indices, or as “semantic” types, where all supertypes are resolved to either defined types or recursive type indices.
A type of any form is closed when it does not contain a heap type that is a type index or a recursive type index without a surrounding recursive type, i.e., all type indices have been substituted with their defined type and all free recursive type indices have been unrolled.
Note
Recursive type indices are internal to a recursive type. They are distinguished from regular type indices and represented such that two closed types are syntactically equal if and only if they have the same recursive structure.
Convention¶
The difference \(\mathit{rt}_1\href{../valid/conventions.html#aux-reftypediff}{\setminus}\mathit{rt}_2\) between two reference types is defined as follows:
\[\begin{split}\begin{array}{lll} (\href{../syntax/types.html#syntax-reftype}{\mathsf{ref}}~\href{../syntax/types.html#syntax-reftype}{\mathsf{null}}_1^?~\mathit{ht}_1) \href{../valid/conventions.html#aux-reftypediff}{\setminus} (\href{../syntax/types.html#syntax-reftype}{\mathsf{ref}}~\href{../syntax/types.html#syntax-reftype}{\mathsf{null}}~\mathit{ht}_2) &=& (\href{../syntax/types.html#syntax-reftype}{\mathsf{ref}}~\mathit{ht}_1) \\ (\href{../syntax/types.html#syntax-reftype}{\mathsf{ref}}~\href{../syntax/types.html#syntax-reftype}{\mathsf{null}}_1^?~\mathit{ht}_1) \href{../valid/conventions.html#aux-reftypediff}{\setminus} (\href{../syntax/types.html#syntax-reftype}{\mathsf{ref}}~\mathit{ht}_2) &=& (\href{../syntax/types.html#syntax-reftype}{\mathsf{ref}}~\href{../syntax/types.html#syntax-reftype}{\mathsf{null}}_1^?~\mathit{ht}_1) \\ \end{array}\end{split}\]
Note
This definition computes an approximation of the reference type that is inhabited by all values from \(\mathit{rt}_1\) except those from \(\mathit{rt}_2\). Since the type system does not have general union types, the defnition only affects the presence of null and cannot express the absence of other values.
Defined Types¶
Defined types denote the individual types defined in a module. Each such type is represented as a projection from the recursive type group it originates from, indexed by its position in that group.
Defined types do not occur in the binary or text format, but are formed by rolling up the recursive types defined in a module.
It is hence an invariant of the semantics that all recursive types occurring in defined types are rolled up.
Conventions¶
\(t[x^\ast \href{../valid/conventions.html#notation-subst}{\mathrel{\mathbf{:=}}} \mathit{dt}^\ast]\) denotes the parallel substitution of type indices \(x^\ast\) with defined types \(\mathit{dt}^\ast\) in type \(t\), provided \(|x^\ast| = |\mathit{dt}^\ast|\).
\(t[(\href{../syntax/types.html#syntax-heaptype}{\mathsf{rec}}~i)^\ast \href{../valid/conventions.html#notation-subst}{\mathrel{\mathbf{:=}}} \mathit{dt}^\ast]\) denotes the parallel substitution of recursive type indices \((\href{../syntax/types.html#syntax-heaptype}{\mathsf{rec}}~i)^\ast\) with defined types \(\mathit{dt}^\ast\) in type \(t\), provided \(|(\href{../syntax/types.html#syntax-heaptype}{\mathsf{rec}}~i)^\ast| = |\mathit{dt}^\ast|\).
\(t[\href{../valid/conventions.html#notation-subst}{\mathrel{\mathbf{:=}}} \mathit{dt}^\ast]\) is shorthand for the substitution \(t[x^\ast \href{../valid/conventions.html#notation-subst}{\mathrel{\mathbf{:=}}} \mathit{dt}^\ast]\), where \(x^\ast = 0 \cdots (|\mathit{dt}^\ast| - 1)\).
Rolling and Unrolling¶
In order to allow comparing recursive types for equivalence, their representation is changed such that all type indices internal to the same recursive type are replaced by recursive type indices.
Note
This representation is independent of the type index space, so that it is meaningful across module boundaries. Moreover, this representation ensures that types with equivalent recursive structure are also syntactically equal, hence allowing a simple equality check on (closed) types. It gives rise to an iso-recursive interpretation of types.
The representation change is performed by two auxiliary operations on the syntax of recursive types:
Rolling up a recursive type substitutes its internal type indices with corresponding recursive type indices.
Unrolling a recursive type substitutes its recursive type indices with the corresponding defined types.
These operations are extended to defined types and defined as follows:
In addition, the following auxiliary function denotes the expansion of a defined type:
Instruction Types¶
Instruction types classify the behaviour of instructions or instruction sequences, by describing how they manipulate the operand stack and the initialization status of locals:
An instruction type \([t_1^\ast] \mathrel{\href{../valid/conventions.html#syntax-instrtype}{\rightarrow}}_{x^\ast} [t_2^\ast]\) describes the required input stack with argument values of types \(t_1^\ast\) that an instruction pops off and the provided output stack with result values of types \(t_2^\ast\) that it pushes back. Moreover, it enumerates the indices \(x^\ast\) of locals that have been set by the instruction or sequence.
Note
Instruction types are only used for validation, they do not occur in programs.
Local Types¶
Local types classify locals, by describing their value type as well as their initialization status:
Note
Local types are only used for validation, they do not occur in programs.
Contexts¶
Validity of an individual definition is specified relative to a context, which collects relevant information about the surrounding module and the definitions in scope:
Types: the list of types defined in the current module.
Functions: the list of functions declared in the current module, represented by a defined type that expands to their function type.
Tables: the list of tables declared in the current module, represented by their table type.
Memories: the list of memories declared in the current module, represented by their memory type.
Globals: the list of globals declared in the current module, represented by their global type.
Tags: the list of tags declared in the current module, represented by their tag type.
Element Segments: the list of element segments declared in the current module, represented by the elements’ reference type.
Data Segments: the list of data segments declared in the current module, each represented by an \(\mathrel{\mbox{ok}}\) entry.
Locals: the list of locals declared in the current function (including parameters), represented by their local type.
Labels: the stack of labels accessible from the current position, represented by their result type.
Return: the return type of the current function, represented as an optional result type that is absent when no return is allowed, as in free-standing expressions.
References: the list of function indices that occur in the module outside functions and can hence be used to form references inside them.
In other words, a context contains a sequence of suitable types for each index space, describing each defined entry in that space. Locals, labels and return type are only used for validating instructions in function bodies, and are left empty elsewhere. The label stack is the only part of the context that changes as validation of an instruction sequence proceeds.
More concretely, contexts are defined as records \(C\) with abstract syntax:
In addition to field access written \(C.\mathsf{field}\) the following notation is adopted for manipulating contexts:
When spelling out a context, empty fields are omitted.
\(C,\mathsf{field}\,A^\ast\) denotes the same context as \(C\) but with the elements \(A^\ast\) prepended to its \(\mathsf{field}\) component sequence.
Note
Indexing notation like \(C.\href{../valid/conventions.html#context}{\mathsf{labels}}[i]\) is used to look up indices in their respective index space in the context. Context extension notation \(C,\mathsf{field}\,A\) is primarily used to locally extend relative index spaces, such as label indices. Accordingly, the notation is defined to append at the front of the respective sequence, introducing a new relative index \(0\) and shifting the existing ones.
Convention¶
Any form of type can be closed to bring it into closed form relative to a context it is valid in by substituting each type index \(x\) occurring in it with the corresponding defined type \(C.\href{../valid/conventions.html#context}{\mathsf{types}}[x]\), after first closing the types in \(C.\href{../valid/conventions.html#context}{\mathsf{types}}\) themselves.
Prose Notation¶
Validation is specified by stylised rules for each relevant part of the abstract syntax. The rules not only state constraints defining when a phrase is valid, they also classify it with a type. The following conventions are adopted in stating these rules.
A phrase \(A\) is said to be “valid with type \(T\)” if and only if all constraints expressed by the respective rules are met. The form of \(T\) depends on what \(A\) is.
Note
For example, if \(A\) is a function, then \(T\) is a function type; for an \(A\) that is a global, \(T\) is a global type; and so on.
The rules implicitly assume a given context \(C\).
In some places, this context is locally extended to a context \(C'\) with additional entries. The formulation “Under context \(C'\), … statement …” is adopted to express that the following statement must apply under the assumptions embodied in the extended context.
Formal Notation¶
Note
This section gives a brief explanation of the notation for specifying typing rules formally. For the interested reader, a more thorough introduction can be found in respective text books. [2]
The proposition that a phrase \(A\) has a respective type \(T\) is written \(A : T\). In general, however, typing is dependent on a context \(C\). To express this explicitly, the complete form is a judgement \(C \vdash A : T\), which says that \(A : T\) holds under the assumptions encoded in \(C\).
The formal typing rules use a standard approach for specifying type systems, rendering them into deduction rules. Every rule has the following general form:
Such a rule is read as a big implication: if all premises hold, then the conclusion holds. Some rules have no premises; they are axioms whose conclusion holds unconditionally. The conclusion always is a judgment \(C \vdash A : T\), and there is one respective rule for each relevant construct \(A\) of the abstract syntax.
Note
For example, the typing rule for the \(\href{../syntax/types.html#syntax-valtype}{\mathsf{i32}}.\href{../syntax/instructions.html#syntax-instr-numeric}{\mathsf{add}}\) instruction can be given as an axiom:
The instruction is always valid with type \([\href{../syntax/types.html#syntax-valtype}{\mathsf{i32}}~\href{../syntax/types.html#syntax-valtype}{\mathsf{i32}}] \mathrel{\href{../valid/conventions.html#syntax-instrtype}{\rightarrow}} [\href{../syntax/types.html#syntax-valtype}{\mathsf{i32}}]\) (saying that it consumes two \(\href{../syntax/types.html#syntax-valtype}{\mathsf{i32}}\) values and produces one), independent of any side conditions.
An instruction like \(\href{../syntax/instructions.html#syntax-instr-variable}{\mathsf{local.get}}\) can be typed as follows:
Here, the premise enforces that the immediate global index \(x\) exists in the context. The instruction produces a value of its respective type \(t\) (and does not consume any values). If \(C.\href{../valid/conventions.html#context}{\mathsf{globals}}[x]\) does not exist then the premise does not hold, and the instruction is ill-typed.
Finally, a structured instruction requires a recursive rule, where the premise is itself a typing judgement:
A \(\href{../syntax/instructions.html#syntax-instr-control}{\mathsf{block}}\) instruction is only valid when the instruction sequence in its body is. Moreover, the result type must match the block’s annotation \(\href{../syntax/instructions.html#syntax-blocktype}{\mathit{blocktype}}\). If so, then the \(\href{../syntax/instructions.html#syntax-instr-control}{\mathsf{block}}\) instruction has the same type as the body. Inside the body an additional label of the corresponding result type is available, which is expressed by extending the context \(C\) with the additional label information for the premise.