Conventions¶
The binary format for WebAssembly modules is a dense linear encoding of their abstract syntax. 1
The format is defined by an attribute grammar whose only terminal symbols are bytes. A byte sequence is a well-formed encoding of a module if and only if it is generated by the grammar.
Each production of this grammar has exactly one synthesized attribute: the abstract syntax that the respective byte sequence encodes. Thus, the attribute grammar implicitly defines a decoding function (i.e., a parsing function for the binary format).
Except for a few exceptions, the binary grammar closely mirrors the grammar of the abstract syntax.
Note
Some phrases of abstract syntax have multiple possible encodings in the binary format. For example, numbers may be encoded as if they had optional leading zeros. Implementations of decoders must support all possible alternatives; implementations of encoders can pick any allowed encoding.
The recommended extension for files containing WebAssembly modules in binary format is “\(\mathtt{.wasm}\)” and the recommended Media Type is “\(\mathtt{application/wasm}\)”.
- 1
Additional encoding layers – for example, introducing compression – may be defined on top of the basic representation defined here. However, such layers are outside the scope of the current specification.
Grammar¶
The following conventions are adopted in defining grammar rules for the binary format. They mirror the conventions used for abstract syntax. In order to distinguish symbols of the binary syntax from symbols of the abstract syntax, \(\mathtt{typewriter}\) font is adopted for the former.
Terminal symbols are bytes expressed in hexadecimal notation: \(\def\mathdef675#1{\mathtt{0x#1}}\mathdef675{0F}\).
Nonterminal symbols are written in typewriter font: \(\mathtt{valtype}, \mathtt{instr}\).
\(B^n\) is a sequence of \(n\geq 0\) iterations of \(B\).
\(B^\ast\) is a possibly empty sequence of iterations of \(B\). (This is a shorthand for \(B^n\) used where \(n\) is not relevant.)
\(B^?\) is an optional occurrence of \(B\). (This is a shorthand for \(B^n\) where \(n \leq 1\).)
\(x{:}B\) denotes the same language as the nonterminal \(B\), but also binds the variable \(x\) to the attribute synthesized for \(B\).
Productions are written \(\mathtt{sym} ::= B_1 \Rightarrow A_1 ~|~ \dots ~|~ B_n \Rightarrow A_n\), where each \(A_i\) is the attribute that is synthesized for \(\mathtt{sym}\) in the given case, usually from attribute variables bound in \(B_i\).
Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.
If the same meta variable or non-terminal symbol appears multiple times in a production (in the syntax or in an attribute), then all those occurrences must have the same instantiation. (This is a shorthand for a side condition requiring multiple different variables to be equal.)
Note
For example, the binary grammar for value types is given as follows:
Consequently, the byte \(\def\mathdef680#1{\mathtt{0x#1}}\mathdef680{7F}\) encodes the type \(\href{../syntax/types.html#syntax-valtype}{\mathsf{i32}}\), \(\def\mathdef681#1{\mathtt{0x#1}}\mathdef681{7E}\) encodes the type \(\href{../syntax/types.html#syntax-valtype}{\mathsf{i64}}\), and so forth. No other byte value is allowed as the encoding of a value type.
The binary grammar for limits is defined as follows:
That is, a limits pair is encoded as either the byte \(\def\mathdef684#1{\mathtt{0x#1}}\mathdef684{00}\) followed by the encoding of a \(\href{../syntax/values.html#syntax-int}{\mathit{u32}}\) value, or the byte \(\def\mathdef685#1{\mathtt{0x#1}}\mathdef685{01}\) followed by two such encodings. The variables \(n\) and \(m\) name the attributes of the respective \(\href{../binary/values.html#binary-int}{\def\mathdef643#1{{\mathtt{u}#1}}\mathdef643{\mathtt{32}}}\) nonterminals, which in this case are the actual unsigned integers those decode into. The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values.
Auxiliary Notation¶
When dealing with binary encodings the following notation is also used:
\(\epsilon\) denotes the empty byte sequence.
\(||B||\) is the length of the byte sequence generated from the production \(B\) in a derivation.
Vectors¶
Vectors are encoded with their \(\href{../binary/values.html#binary-int}{\def\mathdef643#1{{\mathtt{u}#1}}\mathdef643{\mathtt{32}}}\) length followed by the encoding of their element sequence.