1. Introduction
1.1. Introduction
WebAssembly (abbreviated Wasm [1]) is a safe, portable, low-level code format designed for efficient execution and compact representation. Its main goal is to enable high performance applications on the Web, but it does not make any Web-specific assumptions or provide Web-specific features, so it can be employed in other environments as well.
WebAssembly is an open standard developed by a W3C Community Group.
This document describes version 2.0 + tail calls + function references + gc (Draft 2024-09-19) of the core WebAssembly standard. It is intended that it will be superseded by new incremental releases with additional features in the future.
1.1.1. Design Goals
The design goals of WebAssembly are the following:
-
Fast, safe, and portable semantics:
-
Fast: executes with near native code performance, taking advantage of capabilities common to all contemporary hardware.
-
Safe: code is validated and executes in a memory-safe [2], sandboxed environment preventing data corruption or security breaches.
-
Well-defined: fully and precisely defines valid programs and their behavior in a way that is easy to reason about informally and formally.
-
Hardware-independent: can be compiled on all modern architectures, desktop or mobile devices and embedded systems alike.
-
Language-independent: does not privilege any particular language, programming model, or object model.
-
Platform-independent: can be embedded in browsers, run as a stand-alone VM, or integrated in other environments.
-
Open: programs can interoperate with their environment in a simple and universal manner.
-
-
Efficient and portable representation:
-
Compact: has a binary format that is fast to transmit by being smaller than typical text or native code formats.
-
Modular: programs can be split up in smaller parts that can be transmitted, cached, and consumed separately.
-
Efficient: can be decoded, validated, and compiled in a fast single pass, equally with either just-in-time (JIT) or ahead-of-time (AOT) compilation.
-
Streamable: allows decoding, validation, and compilation to begin as soon as possible, before all data has been seen.
-
Parallelizable: allows decoding, validation, and compilation to be split into many independent parallel tasks.
-
Portable: makes no architectural assumptions that are not broadly supported across modern hardware.
-
WebAssembly code is also intended to be easy to inspect and debug, especially in environments like web browsers, but such features are beyond the scope of this specification.
1.1.2. Scope
At its core, WebAssembly is a virtual instruction set architecture (virtual ISA). As such, it has many use cases and can be embedded in many different environments. To encompass their variety and enable maximum reuse, the WebAssembly specification is split and layered into several documents.
This document is concerned with the core ISA layer of WebAssembly. It defines the instruction set, binary encoding, validation, and execution semantics, as well as a textual representation. It does not, however, define how WebAssembly programs can interact with a specific environment they execute in, nor how they are invoked from such an environment.
Instead, this specification is complemented by additional documents defining interfaces to specific embedding environments such as the Web. These will each define a WebAssembly application programming interface (API) suitable for a given environment.
1.1.3. Security Considerations
WebAssembly provides no ambient access to the computing environment in which code is executed. Any interaction with the environment, such as I/O, access to resources, or operating system calls, can only be performed by invoking functions provided by the embedder and imported into a WebAssembly module. An embedder can establish security policies suitable for a respective environment by controlling or limiting which functional capabilities it makes available for import. Such considerations are an embedder’s responsibility and the subject of API definitions for a specific environment.
Because WebAssembly is designed to be translated into machine code running directly on the host’s hardware, it is potentially vulnerable to side channel attacks on the hardware level. In environments where this is a concern, an embedder may have to put suitable mitigations into place to isolate WebAssembly computations.
1.1.4. Dependencies
WebAssembly depends on two existing standards:
-
[IEEE-754-2019], for the representation of floating-point data and the semantics of respective numeric operations.
-
[UNICODE], for the representation of import/export names and the text format.
However, to make this specification self-contained, relevant aspects of the aforementioned standards are defined and formalized as part of this specification, such as the binary representation and rounding of floating-point values, and the value range and UTF-8 encoding of Unicode characters.
Note
The aforementioned standards are the authoritative source of all respective definitions. Formalizations given in this specification are intended to match these definitions. Any discrepancy in the syntax or semantics described is to be considered an error.
1.2. Overview
1.2.1. Concepts
WebAssembly encodes a low-level, assembly-like programming language. This language is structured around the following concepts.
- Values
-
WebAssembly provides only four basic number types. These are integers and [IEEE-754-2019] numbers, each in 32 and 64 bit width. 32 bit integers also serve as Booleans and as memory addresses. The usual operations on these types are available, including the full matrix of conversions between them. There is no distinction between signed and unsigned integer types. Instead, integers are interpreted by respective operations as either unsigned or signed in two’s complement representation.
In addition to these basic number types, there is a single 128 bit wide vector type representing different types of packed data. The supported representations are 4 32-bit, or 2 64-bit [IEEE-754-2019] numbers, or different widths of packed integer values, specifically 2 64-bit integers, 4 32-bit integers, 8 16-bit integers, or 16 8-bit integers.
Finally, values can consist of opaque references that represent pointers towards different sorts of entities. Unlike with other types, their size or representation is not observable.
- Instructions
-
The computational model of WebAssembly is based on a stack machine. Code consists of sequences of instructions that are executed in order. Instructions manipulate values on an implicit operand stack [1] and fall into two main categories. Simple instructions perform basic operations on data. They pop arguments from the operand stack and push results back to it. Control instructions alter control flow. Control flow is structured, meaning it is expressed with well-nested constructs such as blocks, loops, and conditionals. Branches can only target such constructs.
- Traps
-
Under some conditions, certain instructions may produce a trap, which immediately aborts execution. Traps cannot be handled by WebAssembly code, but are reported to the outside environment, where they typically can be caught.
- Functions
-
Code is organized into separate functions. Each function takes a sequence of values as parameters and returns a sequence of values as results. Functions can call each other, including recursively, resulting in an implicit call stack that cannot be accessed directly. Functions may also declare mutable local variables that are usable as virtual registers.
- Tables
-
A table is an array of opaque values of a particular reference type. It allows programs to select such values indirectly through a dynamic index operand. Thereby, for example, a program can call functions indirectly through a dynamic index into a table. This allows emulating function pointers by way of table indices.
- Linear Memory
-
A linear memory is a contiguous, mutable array of raw bytes. Such a memory is created with an initial size but can be grown dynamically. A program can load and store values from/to a linear memory at any byte address (including unaligned). Integer loads and stores can specify a storage size which is smaller than the size of the respective value type. A trap occurs if an access is not within the bounds of the current memory size.
- Modules
-
A WebAssembly binary takes the form of a module that contains definitions for functions, tables, and linear memories, as well as mutable or immutable global variables. Definitions can also be imported, specifying a module/name pair and a suitable type. Each definition can optionally be exported under one or more names. In addition to definitions, modules can define initialization data for their memories or tables that takes the form of segments copied to given offsets. They can also define a start function that is automatically executed.
- Embedder
-
A WebAssembly implementation will typically be embedded into a host environment. This environment defines how loading of modules is initiated, how imports are provided (including host-side definitions), and how exports can be accessed. However, the details of any particular embedding are beyond the scope of this specification, and will instead be provided by complementary, environment-specific API definitions.
In practice, implementations need not maintain an actual operand stack. Instead, the stack can be viewed as a set of anonymous registers that are implicitly referenced by instructions. The type system ensures that the stack height, and thus any referenced register, is always known statically.
1.2.2. Semantic Phases
Conceptually, the semantics of WebAssembly is divided into three phases. For each part of the language, the specification specifies each of them.
- Decoding
-
WebAssembly modules are distributed in a binary format. Decoding processes that format and converts it into an internal representation of a module. In this specification, this representation is modelled by abstract syntax, but a real implementation could compile directly to machine code instead.
- Validation
-
A decoded module has to be valid. Validation checks a number of well-formedness conditions to guarantee that the module is meaningful and safe. In particular, it performs type checking of functions and the instruction sequences in their bodies, ensuring for example that the operand stack is used consistently.
- Execution
-
Finally, a valid module can be executed. Execution can be further divided into two phases:
Instantiation. A module instance is the dynamic representation of a module, complete with its own state and execution stack. Instantiation executes the module body itself, given definitions for all its imports. It initializes globals, memories and tables and invokes the module’s start function if defined. It returns the instances of the module’s exports.
Invocation. Once instantiated, further WebAssembly computations can be initiated by invoking an exported function on a module instance. Given the required arguments, that executes the respective function and returns its results.
Instantiation and invocation are operations within the embedding environment.
2. Structure
2.1. Conventions
WebAssembly is a programming language that has multiple concrete representations (its binary format and the text format). Both map to a common structure. For conciseness, this structure is described in the form of an abstract syntax. All parts of this specification are defined in terms of this abstract syntax.
2.1.1. Grammar Notation
The following conventions are adopted in defining grammar rules for abstract syntax.
-
Terminal symbols (atoms) are written in sans-serif font or in symbolic form: .
-
Nonterminal symbols are written in italic font: .
-
is a sequence of iterations of .
-
is a possibly empty sequence of iterations of . (This is a shorthand for used where is not relevant.)
-
is a non-empty sequence of iterations of . (This is a shorthand for where .)
-
is an optional occurrence of . (This is a shorthand for where .)
-
Productions are written .
-
Large productions may be split into multiple definitions, indicated by ending the first one with explicit ellipses, , and starting continuations with ellipses, .
-
Some productions are augmented with side conditions in parentheses, “”, that provide a shorthand for a combinatorial expansion of the production into many separate cases.
-
If the same meta variable or non-terminal symbol appears multiple times in a production, then all those occurrences must have the same instantiation. (This is a shorthand for a side condition requiring multiple different variables to be equal.)
2.1.2. Auxiliary Notation
When dealing with syntactic constructs the following notation is also used:
-
denotes the empty sequence.
-
denotes the length of a sequence .
-
denotes the -th element of a sequence , starting from .
-
denotes the sub-sequence of a sequence .
-
denotes the same sequence as , except that the -th element is replaced with .
-
denotes the same sequence as , except that the sub-sequence is replaced with .
-
denotes the flat sequence formed by concatenating all sequences in .
Moreover, the following conventions are employed:
-
The notation , where is a non-terminal symbol, is treated as a meta variable ranging over respective sequences of (similarly for , , ).
-
When given a sequence , then the occurrences of in a sequence written are assumed to be in point-wise correspondence with (similarly for , , ). This implicitly expresses a form of mapping syntactic constructions over a sequence.
Productions of the following form are interpreted as records that map a fixed set of fields to “values” , respectively:
The following notation is adopted for manipulating such records:
-
denotes the contents of the component of .
-
denotes the same record as , except that the contents of the component is replaced with .
-
denotes the composition of two records with the same fields of sequences by appending each sequence point-wise:
-
denotes the composition of a sequence of records, respectively; if the sequence is empty, then all fields of the resulting record are empty.
The update notation for sequences and records generalizes recursively to nested components accessed by “paths” :
-
is short for ,
-
is short for ,
where is shortened to .
2.1.3. Vectors
Vectors are bounded sequences of the form (or ), where the can either be values or complex constructions. A vector can have at most elements.
2.2. Values
WebAssembly programs operate on primitive numeric values. Moreover, in the definition of programs, immutable sequences of values occur to represent more complex data, such as text strings or other vectors.
2.2.1. Bytes
The simplest form of value are raw uninterpreted bytes. In the abstract syntax they are represented as hexadecimal literals.
2.2.1.1. Conventions
-
The meta variable ranges over bytes.
-
Bytes are sometimes interpreted as natural numbers .
2.2.2. Integers
Different classes of integers with different value ranges are distinguished by their bit width and by whether they are unsigned or signed.
The class defines uninterpreted integers, whose signedness interpretation can vary depending on context. In the abstract syntax, they are represented as unsigned values. However, some operations convert them to signed based on a two’s complement interpretation.
Note
The main integer types occurring in this specification are , , , , , , , . However, other sizes occur as auxiliary constructions, e.g., in the definition of floating-point numbers.
2.2.2.1. Conventions
-
The meta variables range over integers.
-
Numbers may be denoted by simple arithmetics, as in the grammar above. In order to distinguish arithmetics like from sequences like , the latter is distinguished with parentheses.
2.2.3. Floating-Point
Floating-point data represents 32 or 64 bit values that correspond to the respective binary formats of the [IEEE-754-2019] standard (Section 3.3).
Every value has a sign and a magnitude. Magnitudes can either be expressed as normal numbers of the form , where is the exponent and is the significand whose most significant bit is , or as a subnormal number where the exponent is fixed to the smallest possible value and is ; among the subnormals are positive and negative zero values. Since the significands are binary values, normals are represented in the form , where is the bit width of ; similarly for subnormals.
Possible magnitudes also include the special values (infinity) and (NaN, not a number). NaN values have a payload that describes the mantissa bits in the underlying binary representation. No distinction is made between signalling and quiet NaNs.
where and with
A canonical NaN is a floating-point value where is a payload whose most significant bit is while all others are :
An arithmetic NaN is a floating-point value with , such that the most significant bit is while all others are arbitrary.
Note
In the abstract syntax, subnormals are distinguished by the leading 0 of the significand. The exponent of subnormals has the same value as the smallest possible exponent of a normal number. Only in the binary representation the exponent of a subnormal is encoded differently than the exponent of any normal number.
The notion of canonical NaN defined here is unrelated to the notion of canonical NaN that the [IEEE-754-2019] standard (Section 3.5.2) defines for decimal interchange formats.
2.2.3.1. Conventions
-
The meta variable ranges over floating-point values where clear from context.
2.2.4. Vectors
Numeric vectors are 128-bit values that are processed by vector instructions (also known as SIMD instructions, single instruction multiple data). They are represented in the abstract syntax using . The interpretation of lane types (integer or floating-point numbers) and lane sizes are determined by the specific instruction operating on them.
2.2.5. Names
Names are sequences of characters, which are scalar values as defined by [UNICODE] (Section 2.4).
Due to the limitations of the binary format, the length of a name is bounded by the length of its UTF-8 encoding.
2.2.5.1. Convention
-
Characters (Unicode scalar values) are sometimes used interchangeably with natural numbers .
2.3. Types
Various entities in WebAssembly are classified by types. Types are checked during validation, instantiation, and possibly execution.
2.3.1. Number Types
Number types classify numeric values.
The types and classify 32 and 64 bit integers, respectively. Integers are not inherently signed or unsigned, their interpretation is determined by individual operations.
The types and classify 32 and 64 bit floating-point data, respectively. They correspond to the respective binary floating-point representations, also known as single and double precision, as defined by the [IEEE-754-2019] standard (Section 3.3).
Number types are transparent, meaning that their bit patterns can be observed. Values of number type can be stored in memories.
2.3.1.1. Conventions
-
The notation denotes the bit width of a number type . That is, and .
2.3.2. Vector Types
Vector types classify vectors of numeric values processed by vector instructions (also known as SIMD instructions, single instruction multiple data).
The type corresponds to a 128 bit vector of packed integer or floating-point data. The packed data can be interpreted as signed or unsigned integers, single or double precision floating-point values, or a single 128 bit type. The interpretation is determined by individual operations.
Vector types, like number types are transparent, meaning that their bit patterns can be observed. Values of vector type can be stored in memories.
2.3.2.1. Conventions
-
The notation for bit width extends to vector types as well, that is, .
2.3.3. Heap Types
Heap types classify objects in the runtime store. There are three disjoint hierarchies of heap types:
-
function types classify functions,
-
aggregate types classify dynamically allocated managed data, such as structures, arrays, or unboxed scalars,
-
external types classify external references possibly owned by the embedder.
The values from the latter two hierarchies are interconvertible by ways of the and instructions. That is, both type hierarchies are inhabited by an isomorphic set of values, but may have different, incompatible representations in practice.
A heap type is either abstract or concrete.
The abstract type denotes the common supertype of all function types, regardless of their concrete definition. Dually, the type denotes the common subtype of all function types, regardless of their concrete definition. This type has no values.
The abstract type denotes the common supertype of all external references received through the embedder. This type has no concrete subtypes. Dually, the type denotes the common subtype of all forms of external references. This type has no values.
The abstract type denotes the common supertype of all aggregate types, as well as possibly abstract values produced by internalizing an external reference of type . Dually, the type denotes the common subtype of all forms of aggregate types. This type has no values.
The abstract type is a subtype of that includes all types for which references can be compared, i.e., aggregate values and .
The abstract types and denote the common supertypes of all structure and array aggregates, respectively.
The abstract type denotes unboxed scalars, that is, integers injected into references. Their observable value range is limited to 31 bits.
Note
An is not actually allocated in the store, but represented in a way that allows them to be mixed with actual references into the store without ambiguity. Engines need to perform some form of pointer tagging to achieve this, which is why 1 bit is reserved.
Although the types , , and are not inhabited by any values, they can be used to form the types of all null references in their respective hierarchy. For example, is the generic type of a null reference compatible with all function reference types.
A concrete heap type consists of a type index and classifies an object of the respective type defined in a module.
The syntax of heap types is extended with additional forms for the purpose of specifying validation and execution.
2.3.4. Reference Types
Reference types classify values that are first-class references to objects in the runtime store.
A reference type is characterised by the heap type it points to.
In addition, a reference type of the form is nullable, meaning that it can either be a proper reference to or null. Other references are non-null.
Reference types are opaque, meaning that neither their size nor their bit pattern can be observed. Values of reference type can be stored in tables.
2.3.4.1. Conventions
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
-
The reference type is an abbreviation for .
2.3.5. Value Types
Value types classify the individual values that WebAssembly code can compute with and the values that a variable accepts. They are either number types, vector types, or reference types.
The syntax of value types is extended with additional forms for the purpose of specifying validation.
2.3.5.1. Conventions
-
The meta variable ranges over value types or subclasses thereof where clear from context.
2.3.6. Result Types
Result types classify the result of executing instructions or functions, which is a sequence of values, written with brackets.
2.3.7. Function Types
Function types classify the signature of functions, mapping a vector of parameters to a vector of results. They are also used to classify the inputs and outputs of instructions.
2.3.8. Aggregate Types
Aggregate types describe compound objects consisting of multiple values. These are either structures or arrays, which both consist of a list of possibly mutable and possibly packed fields. Structures are heterogeneous, but require static indexing, while arrays need to be homogeneous, but allow dynamic indexing.
2.3.8.1. Conventions
-
The notation for bit width extends to packed types as well, that is, and .
-
The auxiliary function maps a storage type to the value type obtained when accessing a field:
2.3.9. Composite Types
Composite types are all types composed from simpler types, including function types and aggregate types.
2.3.10. Recursive Types
Recursive types denote a group of mutually recursive composite types, each of which can optionally declare a list of type indices of supertypes that it matches. Each type can also be declared final, preventing further subtyping.
In a module, each member of a recursive type is assigned a separate type index.
The syntax of sub types is generalized for the purpose of specifying validation and execution.
2.3.11. Limits
Limits classify the size range of resizeable storage associated with memory types and table types.
If no maximum is given, the respective storage can grow to any size.
2.3.12. Memory Types
Memory types classify linear memories and their size range.
The limits constrain the minimum and optionally the maximum size of a memory. The limits are given in units of page size.
2.3.13. Table Types
Table types classify tables over elements of reference type within a size range.
Like memories, tables are constrained by limits for their minimum and optionally maximum size. The limits are given in numbers of entries.
2.3.14. Global Types
Global types classify global variables, which hold a value and can either be mutable or immutable.
2.3.15. External Types
External types classify imports and external values with their respective types.
2.3.15.1. Conventions
The following auxiliary notation is defined for sequences of external types. It filters out entries of a specific kind in an order-preserving fashion:
2.4. Instructions
WebAssembly code consists of sequences of instructions. Its computational model is based on a stack machine in that instructions manipulate values on an implicit operand stack, consuming (popping) argument values and producing or returning (pushing) result values.
In addition to dynamic operands from the stack, some instructions also have static immediate arguments, typically indices or type annotations, which are part of the instruction itself.
Some instructions are structured in that they bracket nested sequences of instructions.
The following sections group instructions into a number of different categories.
2.4.1. Numeric Instructions
Numeric instructions provide basic operations over numeric values of specific type. These operations closely match respective operations available in hardware.
Numeric instructions are divided by number type. For each type, several subcategories can be distinguished:
-
Constants: return a static constant.
-
Unary Operations: consume one operand and produce one result of the respective type.
-
Binary Operations: consume two operands and produce one result of the respective type.
-
Tests: consume one operand of the respective type and produce a Boolean integer result.
-
Comparisons: consume two operands of the respective type and produce a Boolean integer result.
-
Conversions: consume a value of one type and produce a result of another (the source type of the conversion is the one after the “”).
Some integer instructions come in two flavors, where a signedness annotation distinguishes whether the operands are to be interpreted as unsigned or signed integers. For the other integer instructions, the use of two’s complement for the signed interpretation means that they behave the same regardless of signedness.
2.4.1.1. Conventions
Occasionally, it is convenient to group operators together according to the following grammar shorthands:
2.4.2. Vector Instructions
Vector instructions (also known as SIMD instructions, single instruction multiple data) provide basic operations over values of vector type.
Vector instructions have a naming convention involving a prefix that determines how their operands will be interpreted. This prefix describes the shape of the operand, written , and consisting of a packed numeric type and the number of lanes of that type. Operations are performed point-wise on the values of each lane.
Note
For example, the shape interprets the operand as four values, packed into an . The bit width of the numeric type times always is 128.
Instructions prefixed with do not involve a specific interpretation, and treat the as an value or a vector of 128 individual bits.
Vector instructions can be grouped into several subcategories:
-
Constants: return a static constant.
-
Unary Operations: consume one operand and produce one result.
-
Binary Operations: consume two operands and produce one result.
-
Ternary Operations: consume three operands and produce one result.
-
Tests: consume one operand and produce a Boolean integer result.
-
Shifts: consume a operand and a operand, producing one result.
-
Splats: consume a value of numeric type and produce a result of a specified shape.
-
Extract lanes: consume a operand and return the numeric value in a given lane.
-
Replace lanes: consume a operand and a numeric value for a given lane, and produce a result.
Some vector instructions have a signedness annotation which distinguishes whether the elements in the operands are to be interpreted as unsigned or signed integers. For the other vector instructions, the use of two’s complement for the signed interpretation means that they behave the same regardless of signedness.
2.4.2.1. Conventions
Occasionally, it is convenient to group operators together according to the following grammar shorthands:
2.4.3. Reference Instructions
Instructions in this group are concerned with accessing references.
The and instructions produce a null value or a reference to a given function, respectively.
The instruction checks for null, while converts a nullable to a non-null one, and traps if it encounters null.
The compares two references.
The instructions and test the dynamic type of a reference operand. The former merely returns the result of the test, while the latter performs a downcast and traps if the operand’s type does not match.
Note
The and instructions provides versions of the latter that branch depending on the success of the downcast instead of trapping.
2.4.4. Aggregate Instructions
Instructions in this group are concerned with creating and accessing references to aggregate types.
The instructions and allocate a new structure, initializing them either with operands or with default values. The remaining instructions on structs access individual fields, allowing for different sign extension modes in the case of packed storage types.
Similarly, arrays can be allocated either with an explicit initialization operand or a default value. Furthermore, allocates an array with statically fixed size, and and allocate an array and initialize it from a data or element segment, respectively. , , , and access individual slots, again allowing for different sign extension modes in the case of a packed storage type. produces the length of an array. fills a specified slice of an array with a given value and , , and copy elements to a specified slice of an array from a given array, data segment, or element segment, respectively.
The instructions and convert between type and an unboxed scalar.
The instructions and allow lossless conversion between references represented as type .
2.4.5. Parametric Instructions
Instructions in this group can operate on operands of any value type.
The instruction simply throws away a single operand.
The instruction selects one of its first two operands based on whether its third operand is zero or not. It may include a value type determining the type of these operands. If missing, the operands must be of numeric type.
Note
In future versions of WebAssembly, the type annotation on may allow for more than a single value being selected at the same time.
2.4.6. Variable Instructions
Variable instructions are concerned with access to local or global variables.
These instructions get or set the values of variables, respectively. The instruction is like but also returns its argument.
2.4.7. Table Instructions
Instructions in this group are concerned with tables table.
The and instructions load or store an element in a table, respectively.
The instruction returns the current size of a table. The instruction grows table by a given delta and returns the previous size, or if enough space cannot be allocated. It also takes an initialization value for the newly allocated entries.
The instruction sets all entries in a range to a given value.
The instruction copies elements from a source table region to a possibly overlapping destination region; the first index denotes the destination. The instruction copies elements from a passive element segment into a table. The instruction prevents further use of a passive element segment. This instruction is intended to be used as an optimization hint. After an element segment is dropped its elements can no longer be retrieved, so the memory used by this segment may be freed.
An additional instruction that accesses a table is the control instruction .
2.4.8. Memory Instructions
Instructions in this group are concerned with linear memory.
Memory is accessed with and instructions for the different number types. They all take a memory immediate that contains an address offset and the expected alignment (expressed as the exponent of a power of 2). Integer loads and stores can optionally specify a storage size that is smaller than the bit width of the respective value type. In the case of loads, a sign extension mode is then required to select appropriate behavior.
Vector loads can specify a shape that is half the bit width of . Each lane is half its usual size, and the sign extension mode then specifies how the smaller lane is extended to the larger lane. Alternatively, vector loads can perform a splat, such that only a single lane of the specified storage size is loaded, and the result is duplicated to all lanes.
The static address offset is added to the dynamic address operand, yielding a 33 bit effective address that is the zero-based index at which the memory is accessed. All values are read and written in little endian byte order. A trap results if any of the accessed memory bytes lies outside the address range implied by the memory’s current size.
Note
Future versions of WebAssembly might provide memory instructions with 64 bit address ranges.
The instruction returns the current size of a memory. The instruction grows memory by a given delta and returns the previous size, or if enough memory cannot be allocated. Both instructions operate in units of page size.
The instruction sets all values in a region to a given byte. The instruction copies data from a source memory region to a possibly overlapping destination region. The instruction copies data from a passive data segment into a memory. The instruction prevents further use of a passive data segment. This instruction is intended to be used as an optimization hint. After a data segment is dropped its data can no longer be retrieved, so the memory used by this segment may be freed.
2.4.9. Control Instructions
Instructions in this group affect the flow of control.
The instruction does nothing.
The instruction causes an unconditional trap.
The , and instructions are structured instructions. They bracket nested sequences of instructions, called blocks, terminated with, or separated by, or pseudo-instructions. As the grammar prescribes, they must be well-nested.
A structured instruction can consume input and produce output on the operand stack according to its annotated block type. It is given either as a type index that refers to a suitable function type reinterpreted as an instruction type, or as an optional value type inline, which is a shorthand for the instruction type .
Each structured control instruction introduces an implicit label. Labels are targets for branch instructions that reference them with label indices. Unlike with other index spaces, indexing of labels is relative by nesting depth, that is, label refers to the innermost structured control instruction enclosing the referring branch instruction, while increasing indices refer to those farther out. Consequently, labels can only be referenced from within the associated structured control instruction. This also implies that branches can only be directed outwards, “breaking” from the block of the control construct they target. The exact effect depends on that control construct. In case of or it is a forward jump, resuming execution after the matching . In case of it is a backward jump to the beginning of the loop.
Note
This enforces structured control flow. Intuitively, a branch targeting a or behaves like a statement in most C-like languages, while a branch targeting a behaves like a statement.
Branch instructions come in several flavors: performs an unconditional branch, performs a conditional branch, and performs an indirect branch through an operand indexing into the label vector that is an immediate to the instruction, or to a default target if the operand is out of bounds. The and instructions check whether a reference operand is null and branch if that is the case or not the case, respectively. Similarly, and attempt a downcast on a reference operand and branch if that succeeds, or fails, respectively.
The instruction is a shortcut for an unconditional branch to the outermost block, which implicitly is the body of the current function. Taking a branch unwinds the operand stack up to the height where the targeted structured control instruction was entered. However, branches may additionally consume operands themselves, which they push back on the operand stack after unwinding. Forward branches require operands according to the output of the targeted block’s type, i.e., represent the values produced by the terminated block. Backward branches require operands according to the input of the targeted block’s type, i.e., represent the values consumed by the restarted block.
The instruction invokes another function, consuming the necessary arguments from the stack and returning the result values of the call. The instruction invokes a function indirectly through a function reference operand. The instruction calls a function indirectly through an operand indexing into a table that is denoted by a table index and must contain function references. Since it may contain functions of heterogeneous type, the callee is dynamically checked against the function type indexed by the instruction’s second immediate, and the call is aborted with a trap if it does not match.
The , , and instructions are tail-call variants of the previous ones. That is, they first return from the current function before actually performing the respective call. It is guaranteed that no sequence of nested calls using only these instructions can cause resource exhaustion due to hitting an implementation’s limit on the number of active calls.
2.4.10. Expressions
Function bodies, initialization values for globals, elements and offsets of element segments, and offsets of data segments are given as expressions, which are sequences of instructions terminated by an marker.
In some places, validation restricts expressions to be constant, which limits the set of allowable instructions.
2.5. Modules
WebAssembly programs are organized into modules, which are the unit of deployment, loading, and compilation. A module collects definitions for types, functions, tables, memories, and globals. In addition, it can declare imports and exports and provide initialization in the form of data and element segments, or a start function.
Each of the vectors – and thus the entire module – may be empty.
2.5.1. Indices
Definitions are referenced with zero-based indices. Each class of definition has its own index space, as distinguished by the following classes.
The index space for functions, tables, memories and globals includes respective imports declared in the same module. The indices of these imports precede the indices of other definitions in the same index space.
Element indices reference element segments and data indices reference data segments.
The index space for locals is only accessible inside a function and includes the parameters of that function, which precede the local variables.
Label indices reference structured control instructions inside an instruction sequence.
Each aggregate type provides an index space for its fields.
2.5.1.1. Conventions
-
The meta variable ranges over label indices.
-
The meta variables range over indices in any of the other index spaces.
-
The notation denotes the set of indices from index space occurring free in . Sometimes this set is reinterpreted as the vector of its elements.
Note
For example, if is , then , or equivalently, the vector .
2.5.2. Types
The component of a module defines a vector of recursive types, each of consisting of a list of sub types referenced by individual type indices. All function or aggregate types used in a module must be defined in this component.
2.5.3. Functions
The component of a module defines a vector of functions with the following structure:
The of a function declares its signature by reference to a type defined in the module. The parameters of the function are referenced through 0-based local indices in the function’s body; they are mutable.
The declare a vector of mutable local variables and their types. These variables are referenced through local indices in the function’s body. The index of the first local is the smallest index not referencing a parameter.
The is an instruction sequence that upon termination must produce a stack matching the function type’s result type.
Functions are referenced through function indices, starting with the smallest index not referencing a function import.
2.5.4. Tables
The component of a module defines a vector of tables described by their table type:
A table is an array of opaque values of a particular reference type. Moreover, each table slot is initialized with the value given by a constant initializer expression. Tables can further be initialized through element segments.
The size in the limits of the table type specifies the initial size of that table, while its , if present, restricts the size to which it can grow later.
Tables are referenced through table indices, starting with the smallest index not referencing a table import. Most constructs implicitly reference table index .
2.5.5. Memories
The component of a module defines a vector of linear memories (or memories for short) as described by their memory type:
A memory is a vector of raw uninterpreted bytes. The size in the limits of the memory type specifies the initial size of that memory, while its , if present, restricts the size to which it can grow later. Both are in units of page size.
Memories can be initialized through data segments.
Memories are referenced through memory indices, starting with the smallest index not referencing a memory import. Most constructs implicitly reference memory index .
Note
In the current version of WebAssembly, at most one memory may be defined or imported in a single module, and all constructs implicitly reference this memory . This restriction may be lifted in future versions.
2.5.6. Globals
The component of a module defines a vector of global variables (or globals for short):
Each global stores a single value of the given global type. Its also specifies whether a global is immutable or mutable. Moreover, each global is initialized with an value given by a constant initializer expression.
Globals are referenced through global indices, starting with the smallest index not referencing a global import.
2.5.7. Element Segments
The initial contents of a table is uninitialized. Element segments can be used to initialize a subrange of a table from a static vector of elements.
The component of a module defines a vector of element segments. Each element segment defines a reference type and a corresponding list of constant element expressions.
Element segments have a mode that identifies them as either passive, active, or declarative. A passive element segment’s elements can be copied to a table using the instruction. An active element segment copies its elements into a table during instantiation, as specified by a table index and a constant expression defining an offset into that table. A declarative element segment is not available at runtime but merely serves to forward-declare references that are formed in code with instructions like .
The is given by a constant expression.
Element segments are referenced through element indices.
2.5.8. Data Segments
The initial contents of a memory are zero bytes. Data segments can be used to initialize a range of memory from a static vector of bytes.
The component of a module defines a vector of data segments.
Like element segments, data segments have a mode that identifies them as either passive or active. A passive data segment’s contents can be copied into a memory using the instruction. An active data segment copies its contents into a memory during instantiation, as specified by a memory index and a constant expression defining an offset into that memory.
Data segments are referenced through data indices.
Note
In the current version of WebAssembly, at most one memory is allowed in a module. Consequently, the only valid is .
2.5.9. Start Function
The component of a module declares the function index of a start function that is automatically invoked when the module is instantiated, after tables and memories have been initialized.
Note
The start function is intended for initializing the state of a module. The module and its exports are not accessible externally before this initialization has completed.
2.5.10. Exports
The component of a module defines a set of exports that become accessible to the host environment once the module has been instantiated.
Each export is labeled by a unique name. Exportable definitions are functions, tables, memories, and globals, which are referenced through a respective descriptor.
2.5.10.1. Conventions
The following auxiliary notation is defined for sequences of exports, filtering out indices of a specific kind in an order-preserving fashion:
2.5.11. Imports
The component of a module defines a set of imports that are required for instantiation.
Each import is labeled by a two-level name space, consisting of a name and a for an entity within that module. Importable definitions are functions, tables, memories, and globals. Each import is specified by a descriptor with a respective type that a definition provided during instantiation is required to match.
Every import defines an index in the respective index space. In each index space, the indices of imports go before the first index of any definition contained in the module itself.
Note
Unlike export names, import names are not necessarily unique. It is possible to import the same / pair multiple times; such imports may even have different type descriptions, including different kinds of entities. A module with such imports can still be instantiated depending on the specifics of how an embedder allows resolving and supplying imports. However, embedders are not required to support such overloading, and a WebAssembly module itself cannot implement an overloaded name.
3. Validation
3.1. Conventions
Validation checks that a WebAssembly module is well-formed. Only valid modules can be instantiated.
Validity is defined by a type system over the abstract syntax of a module and its contents. For each piece of abstract syntax, there is a typing rule that specifies the constraints that apply to it. All rules are given in two equivalent forms:
-
In prose, describing the meaning in intuitive form.
-
In formal notation, describing the rule in mathematical form. [1]
Note
The prose and formal rules are equivalent, so that understanding of the formal notation is not required to read this specification. The formalism offers a more concise description in notation that is used widely in programming languages semantics and is readily amenable to mathematical proof.
In both cases, the rules are formulated in a declarative manner. That is, they only formulate the constraints, they do not define an algorithm. The skeleton of a sound and complete algorithm for type-checking instruction sequences according to this specification is provided in the appendix.
3.1.1. Types
To define the semantics, the definition of some sorts of types is extended to include additional forms. By virtue of not being representable in either the binary format or the text format, these forms cannot be used in a program; they only occur during validation or execution.
The unique value type is a bottom type that matches all value types. Similarly, is also used as a bottom type of all heap types.
Note
No validation rule uses bottom types explicitly, but various rules can pick any value or heap type, including bottom. This ensures the existence of principal types, and thus a validation algorithm without back tracking.
A concrete heap type can consist of a defined type directly. this occurs as the result of substituting a type index with its definition.
A concrete heap type may also be a recursive type index. Such an index refers to the -th component of a surrounding recursive type. It occurs as the result of rolling up the definition of a recursive type.
Finally, the representation of supertypes in a sub type is generalized from mere type indices to heap types. They occur as defined types or recursive type indices after substituting type indices or rolling up recursive types.
Note
It is an invariant of the semantics that sub types occur only in one of two forms: either as “syntactic” types as in a source module, where all supertypes are type indices, or as “semantic” types, where all supertypes are resolved to either defined types or recursive type indices.
A type of any form is closed when it does not contain a heap type that is a type index or a recursive type index without a surrounding recursive type, i.e., all type indices have been substituted with their defined type and all free recursive type indices have been unrolled.
Note
Recursive type indices are internal to a recursive type. They are distinguished from regular type indices and represented such that two closed types are syntactically equal if and only if they have the same recursive structure.
3.1.1.1. Convention
-
The difference between two reference types is defined as follows:
Note
This definition computes an approximation of the reference type that is inhabited by all values from except those from . Since the type system does not have general union types, the defnition only affects the presence of null and cannot express the absence of other values.
3.1.2. Defined Types
Defined types denote the individual types defined in a module. Each such type is represented as a projection from the recursive type group it originates from, indexed by its position in that group.
Defined types do not occur in the binary or text format, but are formed by rolling up the recursive types defined in a module.
It is hence an invariant of the semantics that all recursive types occurring in defined types are rolled up.
3.1.2.1. Conventions
-
denotes the parallel substitution of type indices with defined types in type , provided .
-
denotes the parallel substitution of recursive type indices with defined types in type , provided .
-
is shorthand for the substitution , where .
3.1.3. Rolling and Unrolling
In order to allow comparing recursive types for equivalence, their representation is changed such that all type indices internal to the same recursive type are replaced by recursive type indices.
Note
This representation is independent of the type index space, so that it is meaningful across module boundaries. Moreover, this representation ensures that types with equivalent recursive structure are also syntactically equal, hence allowing a simple equality check on (closed) types. It gives rise to an iso-recursive interpretation of types.
The representation change is performed by two auxiliary operations on the syntax of recursive types:
-
Rolling up a recursive type substitutes its internal type indices with corresponding recursive type indices.
-
Unrolling a recursive type substitutes its recursive type indices with the corresponding defined types.
These operations are extended to defined types and defined as follows:
In addition, the following auxiliary function denotes the expansion of a defined type:
3.1.4. Instruction Types
Instruction types classify the behaviour of instructions or instruction sequences, by describing how they manipulate the operand stack and the initialization status of locals:
An instruction type describes the required input stack with argument values of types that an instruction pops off and the provided output stack with result values of types that it pushes back. Moreover, it enumerates the indices of locals that have been set by the instruction or sequence.
Note
Instruction types are only used for validation, they do not occur in programs.
3.1.5. Local Types
Local types classify locals, by describing their value type as well as their initialization status:
Note
Local types are only used for validation, they do not occur in programs.
3.1.6. Contexts
Validity of an individual definition is specified relative to a context, which collects relevant information about the surrounding module and the definitions in scope:
-
Types: the list of types defined in the current module.
-
Functions: the list of functions declared in the current module, represented by a defined type that expands to their function type.
-
Tables: the list of tables declared in the current module, represented by their table type.
-
Memories: the list of memories declared in the current module, represented by their memory type.
-
Globals: the list of globals declared in the current module, represented by their global type.
-
Element Segments: the list of element segments declared in the current module, represented by the elements’ reference type.
-
Data Segments: the list of data segments declared in the current module, each represented by an entry.
-
Locals: the list of locals declared in the current function (including parameters), represented by their local type.
-
Labels: the stack of labels accessible from the current position, represented by their result type.
-
Return: the return type of the current function, represented as an optional result type that is absent when no return is allowed, as in free-standing expressions.
-
References: the list of function indices that occur in the module outside functions and can hence be used to form references inside them.
In other words, a context contains a sequence of suitable types for each index space, describing each defined entry in that space. Locals, labels and return type are only used for validating instructions in function bodies, and are left empty elsewhere. The label stack is the only part of the context that changes as validation of an instruction sequence proceeds.
More concretely, contexts are defined as records with abstract syntax:
In addition to field access written the following notation is adopted for manipulating contexts:
-
When spelling out a context, empty fields are omitted.
-
denotes the same context as but with the elements prepended to its component sequence.
Note
Indexing notation like is used to look up indices in their respective index space in the context. Context extension notation is primarily used to locally extend relative index spaces, such as label indices. Accordingly, the notation is defined to append at the front of the respective sequence, introducing a new relative index and shifting the existing ones.
3.1.6.1. Convention
Any form of type can be closed to bring it into closed form relative to a context it is valid in by substituting each type index occurring in it with the corresponding defined type , after first closing the types in themselves.
3.1.7. Prose Notation
Validation is specified by stylised rules for each relevant part of the abstract syntax. The rules not only state constraints defining when a phrase is valid, they also classify it with a type. The following conventions are adopted in stating these rules.
-
A phrase is said to be “valid with type ” if and only if all constraints expressed by the respective rules are met. The form of depends on what is.
Note
For example, if is a function, then is a function type; for an that is a global, is a global type; and so on.
-
The rules implicitly assume a given context .
-
In some places, this context is locally extended to a context with additional entries. The formulation “Under context , … statement …” is adopted to express that the following statement must apply under the assumptions embodied in the extended context.
3.1.8. Formal Notation
Note
This section gives a brief explanation of the notation for specifying typing rules formally. For the interested reader, a more thorough introduction can be found in respective text books. [2]
The proposition that a phrase has a respective type is written . In general, however, typing is dependent on a context . To express this explicitly, the complete form is a judgement , which says that holds under the assumptions encoded in .
The formal typing rules use a standard approach for specifying type systems, rendering them into deduction rules. Every rule has the following general form:
Such a rule is read as a big implication: if all premises hold, then the conclusion holds. Some rules have no premises; they are axioms whose conclusion holds unconditionally. The conclusion always is a judgment , and there is one respective rule for each relevant construct of the abstract syntax.
Note
For example, the typing rule for the instruction can be given as an axiom:
The instruction is always valid with type (saying that it consumes two values and produces one), independent of any side conditions.
An instruction like can be typed as follows:
Here, the premise enforces that the immediate global index exists in the context. The instruction produces a value of its respective type (and does not consume any values). If does not exist then the premise does not hold, and the instruction is ill-typed.
Finally, a structured instruction requires a recursive rule, where the premise is itself a typing judgement:
A instruction is only valid when the instruction sequence in its body is. Moreover, the result type must match the block’s annotation . If so, then the instruction has the same type as the body. Inside the body an additional label of the corresponding result type is available, which is expressed by extending the context with the additional label information for the premise.
The semantics is derived from the following article: Andreas Haas, Andreas Rossberg, Derek Schuff, Ben Titzer, Dan Gohman, Luke Wagner, Alon Zakai, JF Bastien, Michael Holman. Bringing the Web up to Speed with WebAssembly. Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM 2017.
For example: Benjamin Pierce. Types and Programming Languages. The MIT Press 2002
3.2. Types
Simple types, such as number types are universally valid. However, restrictions apply to most other types, such as reference types, function types, as well as the limits of table types and memory types, which must be checked during validation.
Moreover, block types are converted to plain function types for ease of processing.
3.2.1. Number Types
Number types are always valid.
3.2.2. Vector Types
Vector types are always valid.
3.2.3. Heap Types
Concrete heap types are only valid when the type index is, while abstract ones are vacuously valid.
3.2.3.1.
-
The heap type is valid.
3.2.3.2.
-
The type must be defined in the context.
-
Then the heap type is valid.
3.2.4. Reference Types
Reference types are valid when the referenced heap type is.
3.2.4.1.
-
The heap type must be valid.
-
Then the reference type is valid.
3.2.5. Value Types
Valid value types are either valid number types, valid vector types, or valid reference types.
3.2.6. Block Types
Block types may be expressed in one of two forms, both of which are converted to instruction types by the following rules.
3.2.6.1.
-
The type must be defined in the context.
-
The expansion of must be a function type .
-
Then the block type is valid as instruction type .
3.2.6.2.
-
The value type must either be absent, or valid.
-
Then the block type is valid as instruction type .
3.2.7. Result Types
3.2.7.1.
-
Each value type in the type sequence must be valid.
-
Then the result type is valid.
3.2.8. Instruction Types
3.2.8.1.
-
The result type must be valid.
-
The result type must be valid.
-
Each local index in must be defined in the context.
-
Then the instruction type is valid.
3.2.9. Function Types
3.2.9.1.
-
The result type must be valid.
-
The result type must be valid.
-
Then the function type is valid.
3.2.10. Composite Types
3.2.10.1.
-
The function type must be valid.
-
Then the composite type is valid.
3.2.10.2.
-
For each field type in :
-
The field type must be valid.
-
-
Then the composite type is valid.
3.2.10.3.
-
The field type must be valid.
-
Then the composite type is valid.
3.2.11. Field Types
3.2.11.1.
-
The storage type must be valid.
-
Then the field type is valid.
3.2.11.2.
-
The packed type is valid.
3.2.12. Recursive Types
Recursive types are validated for a specific type index that denotes the index of the type defined by the recursive group.
3.2.12.1.
-
Either the sequence is empty.
-
Or:
-
The first sub type of the sequence must be valid for the type index .
-
The remaining sequence must be valid for the type index .
-
-
Then the recursive type is valid for the type index .
3.2.12.2.
-
The composite type must be valid.
-
The sequence may be no longer than .
-
For every type index in :
-
The type index must be smaller than .
-
The type index must exist in the context .
-
Let be the unrolling of the defined type .
-
The sub type must not contain .
-
Let be the composite type in .
-
The composite type must match .
-
-
Then the sub type is valid for the type index .
Note
The side condition on the index ensures that a declared supertype is a previously defined types, preventing cyclic subtype hierarchies.
Future versions of WebAssembly may allow more than one supertype.
3.2.13. Defined Types
3.2.13.1.
-
The recursive type must be valid for some type index .
-
Let be the defined type .
-
The number must be smaller than the length of the sequence of sub types.
-
Then the defined type is valid.
3.2.14. Limits
Limits must have meaningful bounds that are within a given range.
3.2.14.1.
-
The value of must not be larger than .
-
If the maximum is not empty, then:
-
Its value must not be larger than .
-
Its value must not be smaller than .
-
-
Then the limit is valid within range .
3.2.15. Table Types
3.2.15.1.
-
The limits must be valid within range .
-
The reference type must be valid.
-
Then the table type is valid.
3.2.16. Memory Types
3.2.16.1.
-
The limits must be valid within range .
-
Then the memory type is valid.
3.2.17. Global Types
3.2.17.1.
-
The value type must be valid.
-
Then the global type is valid.