Conventions

The binary format for WebAssembly modules is a dense linear encoding of their abstract syntax. [1]

The format is defined by an attribute grammar whose only terminal symbols are bytes. A byte sequence is a well-formed encoding of a module if and only if it is generated by the grammar.

Each production of this grammar has exactly one synthesized attribute: the abstract syntax that the respective byte sequence encodes. Thus, the attribute grammar implicitly defines a decoding function (i.e., a parsing function for the binary format).

Except for a few exceptions, the binary grammar closely mirrors the grammar of the abstract syntax.

Note

Some phrases of abstract syntax have multiple possible encodings in the binary format. For example, numbers may be encoded as if they had optional leading zeros. Implementations of decoders must support all possible alternatives; implementations of encoders can pick any allowed encoding.

The recommended extension for files containing WebAssembly modules in binary format is “.wasm” and the recommended Media Type is “application/wasm”.

Grammar

The following conventions are adopted in defining grammar rules for the binary format. They mirror the conventions used for abstract syntax. In order to distinguish symbols of the binary syntax from symbols of the abstract syntax, typewriter font is adopted for the former.

  • Terminal symbols are bytes expressed in hexadecimal notation: 0x0F.

  • Nonterminal symbols are written in typewriter font: valtype,instr.

  • Bn is a sequence of n0 iterations of B.

  • B is a possibly empty sequence of iterations of B. (This is a shorthand for Bn used where n is not relevant.)

  • B? is an optional occurrence of B. (This is a shorthand for Bn where n1.)

  • x:B denotes the same language as the nonterminal B, but also binds the variable x to the attribute synthesized for B.

  • Productions are written sym::=B1A1 |  | BnAn, where each Ai is the attribute that is synthesized for sym in the given case, usually from attribute variables bound in Bi.

  • Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.

  • If the same meta variable or non-terminal symbol appears multiple times in a production (in the syntax or in an attribute), then all those occurrences must have the same instantiation. (This is a shorthand for a side condition requiring multiple different variables to be equal.)

Note

For example, the binary grammar for number types is given as follows:

numtype::=0x7Fi32|0x7Ei64|0x7Df32|0x7Cf64

Consequently, the byte 0x7F encodes the type i32, 0x7E encodes the type i64, and so forth. No other byte value is allowed as the encoding of a number type.

The binary grammar for limits is defined as follows:

limits::=0x00  n:u32{min n,max ϵ}|0x01  n:u32  m:u32{min n,max m}

That is, a limits pair is encoded as either the byte 0x00 followed by the encoding of a u32 value, or the byte 0x01 followed by two such encodings. The variables n and m name the attributes of the respective u32 nonterminals, which in this case are the actual unsigned integers those decode into. The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values.

Auxiliary Notation

When dealing with binary encodings the following notation is also used:

  • ϵ denotes the empty byte sequence.

  • ||B|| is the length of the byte sequence generated from the production B in a derivation.

Vectors

Vectors are encoded with their u32 length followed by the encoding of their element sequence.

vec(B)::=n:u32  (x:B)nxn