Numerics

Numeric primitives are defined in a generic manner, by operators indexed over a bit width N.

Some operators are non-deterministic, because they can return one of several possible results (such as different NaN values). Technically, each operator thus returns a set of allowed values. For convenience, deterministic results are expressed as plain values, which are assumed to be identified with a respective singleton set.

Some operators are partial, because they are not defined on certain inputs. Technically, an empty set of results is returned for these inputs.

In formal notation, each operator is defined by equational clauses that apply in decreasing order of precedence. That is, the first clause that is applicable to the given arguments defines the result. In some cases, similar clauses are combined into one by using the notation ± or . When several of these placeholders occur in a single clause, then they must be resolved consistently: either the upper sign is chosen for all of them or the lower sign.

Note

For example, the fcopysign operator is defined as follows:

fcopysignN(±p1,±p2)=±p1fcopysignN(±p1,p2)=p1

This definition is to be read as a shorthand for the following expansion of each clause into two separate ones:

fcopysignN(+p1,+p2)=+p1fcopysignN(p1,p2)=p1fcopysignN(+p1,p2)=p1fcopysignN(p1,+p2)=+p1

Numeric operators are lifted to input sequences by applying the operator element-wise, returning a sequence of results. When there are multiple inputs, they must be of equal length.

op(c1n,,ckn)=op(c1n[0],,ckn[0])  op(c1n[n1],,ckn[n1])

Note

For example, the unary operator fabs, when given a sequence of floating-point values, return a sequence of floating-point results:

fabsN(zn)=fabsN(z[0])  fabsN(z[n])

The binary operator iadd, when given two sequences of integers of the same length, n, return a sequence of integer results:

iaddN(i1n,i2n)=iaddN(i1[0],i2[0])  iaddN(i1[n],i2[n])

Conventions:

  • The meta variable d is used to range over single bits.

  • The meta variable p is used to range over (signless) magnitudes of floating-point values, including nan and .

  • The meta variable q is used to range over (signless) rational magnitudes, excluding nan or .

  • The notation f1 denotes the inverse of a bijective function f.

  • Truncation of rational values is written trunc(±q), with the usual mathematical definition:

    trunc(±q)=±i(ifiN+q1<i+q)
  • Saturation of integers is written sat_uN(i) and sat_sN(i). The arguments to these two functions range over arbitrary signed integers.

    • Unsigned saturation, sat_uN(i) clamps i to between 0 and 2N1:

      sat_uN(i)=2N1(ifi>2N1)sat_uN(i)=0(ifi<0)sat_uN(i)=i(otherwise)
    • Signed saturation, sat_sN(i) clamps i to between 2N1 and 2N11:

    sat_sN(i)=signedN1(2N1)(ifi<2N1)sat_sN(i)=signedN1(2N11)(ifi>2N11)sat_sN(i)=i(otherwise)

Representations

Numbers and numeric vectors have an underlying binary representation as a sequence of bits:

bitsiN(i)=ibitsN(i)bitsfN(z)=fbitsN(z)bitsvN(i)=ibitsN(i)

The first case of these applies to representations of both integer value types and packed types.

Each of these functions is a bijection, hence they are invertible.

Integers

Integers are represented as base two unsigned numbers:

ibitsN(i)=dN1  d0(i=2N1dN1++20d0)

Boolean operators like , , or are lifted to bit sequences of equal length by applying them pointwise.

Floating-Point

Floating-point values are represented in the respective binary format defined by IEEE 754 (Section 3.4):

fbitsN(±(1+m2M)2e)=fsign(±) ibitsE(e+fbiasN) ibitsM(m)fbitsN(±(0+m2M)2e)=fsign(±) (0)E ibitsM(m)fbitsN(±)=fsign(±) (1)E (0)MfbitsN(±nan(n))=fsign(±) (1)E ibitsM(n)fbiasN=2E11fsign(+)=0fsign()=1

where M=signif(N) and E=expon(N).

Vectors

Numeric vectors of type vN have the same underlying representation as an iN. They can also be interpreted as a sequence of numeric values packed into a vN with a particular shape txM, provided that N=|t|M.

lanestxM(c)=c0  cM1(wherew=|t|/8b=bytesiN(c)ci=bytest1(b[iw:w]))

This function is a bijection on iN, hence it is invertible.

Storage

When a number is stored into memory, it is converted into a sequence of bytes in little endian byte order:

bytest(i)=littleendian(bitst(i))littleendian(ϵ)=ϵlittleendian(d8 d )=littleendian(d) ibits81(d8)

Again these functions are invertible bijections.

Integer Operations

Sign Interpretation

Integer operators are defined on iN values. Operators that use a signed interpretation convert the value using the following definition, which takes the two’s complement when the value lies in the upper half of the value range (i.e., its most significant bit is 1):

signedN(i)=i(0i<2N1)signedN(i)=i2N(2N1i<2N)

This function is bijective, and hence invertible.

Boolean Interpretation

The integer result of predicates – i.e., tests and relational operators – is defined with the help of the following auxiliary function producing the value 1 or 0 depending on a condition.

bool(C)=1(ifC)bool(C)=0(otherwise)

iaddN(i1,i2)

  • Return the result of adding i1 and i2 modulo 2N.

iaddN(i1,i2)=(i1+i2)mod2N

isubN(i1,i2)

  • Return the result of subtracting i2 from i1 modulo 2N.

isubN(i1,i2)=(i1i2+2N)mod2N

imulN(i1,i2)

  • Return the result of multiplying i1 and i2 modulo 2N.

imulN(i1,i2)=(i1i2)mod2N

idiv_uN(i1,i2)

  • If i2 is 0, then the result is undefined.

  • Else, return the result of dividing i1 by i2, truncated toward zero.

idiv_uN(i1,0)={}idiv_uN(i1,i2)=trunc(i1/i2)

Note

This operator is partial.

idiv_sN(i1,i2)

  • Let j1 be the signed interpretation of i1.

  • Let j2 be the signed interpretation of i2.

  • If j2 is 0, then the result is undefined.

  • Else if j1 divided by j2 is 2N1, then the result is undefined.

  • Else, return the result of dividing j1 by j2, truncated toward zero.

idiv_sN(i1,0)={}idiv_sN(i1,i2)={}(ifsignedN(i1)/signedN(i2)=2N1)idiv_sN(i1,i2)=signedN1(trunc(signedN(i1)/signedN(i2)))

Note

This operator is partial. Besides division by 0, the result of (2N1)/(1)=+2N1 is not representable as an N-bit signed integer.

irem_uN(i1,i2)

  • If i2 is 0, then the result is undefined.

  • Else, return the remainder of dividing i1 by i2.

irem_uN(i1,0)={}irem_uN(i1,i2)=i1i2trunc(i1/i2)

Note

This operator is partial.

As long as both operators are defined, it holds that i1=i2idiv_u(i1,i2)+irem_u(i1,i2).

irem_sN(i1,i2)

  • Let j1 be the signed interpretation of i1.

  • Let j2 be the signed interpretation of i2.

  • If i2 is 0, then the result is undefined.

  • Else, return the remainder of dividing j1 by j2, with the sign of the dividend j1.

irem_sN(i1,0)={}irem_sN(i1,i2)=signedN1(j1j2trunc(j1/j2))(wherej1=signedN(i1)j2=signedN(i2))

Note

This operator is partial.

As long as both operators are defined, it holds that i1=i2idiv_s(i1,i2)+irem_s(i1,i2).

inotN(i)

  • Return the bitwise negation of i.

inotN(i)=ibitsN1(ibitsN(i)ibitsN(2N1))

iandN(i1,i2)

  • Return the bitwise conjunction of i1 and i2.

iandN(i1,i2)=ibitsN1(ibitsN(i1)ibitsN(i2))

iandnotN(i1,i2)

  • Return the bitwise conjunction of i1 and the bitwise negation of i2.

iandnotN(i1,i2)=iandN(i1,inotN(i2))

iorN(i1,i2)

  • Return the bitwise disjunction of i1 and i2.

iorN(i1,i2)=ibitsN1(ibitsN(i1)ibitsN(i2))

ixorN(i1,i2)

  • Return the bitwise exclusive disjunction of i1 and i2.

ixorN(i1,i2)=ibitsN1(ibitsN(i1)ibitsN(i2))

ishlN(i1,i2)

  • Let k be i2 modulo N.

  • Return the result of shifting i1 left by k bits, modulo 2N.

ishlN(i1,i2)=ibitsN1(d2Nk 0k)(ifibitsN(i1)=d1k d2Nkk=i2modN)

ishr_uN(i1,i2)

  • Let k be i2 modulo N.

  • Return the result of shifting i1 right by k bits, extended with 0 bits.

ishr_uN(i1,i2)=ibitsN1(0k d1Nk)(ifibitsN(i1)=d1Nk d2kk=i2modN)

ishr_sN(i1,i2)

  • Let k be i2 modulo N.

  • Return the result of shifting i1 right by k bits, extended with the most significant bit of the original value.

ishr_sN(i1,i2)=ibitsN1(d0k+1 d1Nk1)(ifibitsN(i1)=d0 d1Nk1 d2kk=i2modN)

irotlN(i1,i2)

  • Let k be i2 modulo N.

  • Return the result of rotating i1 left by k bits.

irotlN(i1,i2)=ibitsN1(d2Nk d1k)(ifibitsN(i1)=d1k d2Nkk=i2modN)

irotrN(i1,i2)

  • Let k be i2 modulo N.

  • Return the result of rotating i1 right by k bits.

irotrN(i1,i2)=ibitsN1(d2k d1Nk)(ifibitsN(i1)=d1Nk d2kk=i2modN)

iclzN(i)

  • Return the count of leading zero bits in i; all bits are considered leading zeros if i is 0.

iclzN(i)=k(ifibitsN(i)=0k (1 d)?)

ictzN(i)

  • Return the count of trailing zero bits in i; all bits are considered trailing zeros if i is 0.

ictzN(i)=k(ifibitsN(i)=(d 1)? 0k)

ipopcntN(i)

  • Return the count of non-zero bits in i.

ipopcntN(i)=k(ifibitsN(i)=(0 1)k 0)

ieqzN(i)

  • Return 1 if i is zero, 0 otherwise.

ieqzN(i)=bool(i=0)

ieqN(i1,i2)

  • Return 1 if i1 equals i2, 0 otherwise.

ieqN(i1,i2)=bool(i1=i2)

ineN(i1,i2)

  • Return 1 if i1 does not equal i2, 0 otherwise.

ineN(i1,i2)=bool(i1i2)

ilt_uN(i1,i2)

  • Return 1 if i1 is less than i2, 0 otherwise.

ilt_uN(i1,i2)=bool(i1<i2)

ilt_sN(i1,i2)

ilt_sN(i1,i2)=bool(signedN(i1)<signedN(i2))

igt_uN(i1,i2)

  • Return 1 if i1 is greater than i2, 0 otherwise.

igt_uN(i1,i2)=bool(i1>i2)

igt_sN(i1,i2)

igt_sN(i1,i2)=bool(signedN(i1)>signedN(i2))

ile_uN(i1,i2)

  • Return 1 if i1 is less than or equal to i2, 0 otherwise.

ile_uN(i1,i2)=bool(i1i2)

ile_sN(i1,i2)

ile_sN(i1,i2)=bool(signedN(i1)signedN(i2))

ige_uN(i1,i2)

  • Return 1 if i1 is greater than or equal to i2, 0 otherwise.

ige_uN(i1,i2)=bool(i1i2)

ige_sN(i1,i2)

ige_sN(i1,i2)=bool(signedN(i1)signedN(i2))

iextendM_sN(i)

  • Let j be the result of computing wrapN,M(i).

  • Return extendM,Ns(j).

iextendM_sN(i)=extendM,Ns(wrapN,M(i))

ibitselectN(i1,i2,i3)

  • Let j1 be the bitwise conjunction of i1 and i3.

  • Let j3 be the bitwise negation of i3.

  • Let j2 be the bitwise conjunction of i2 and j3.

  • Return the bitwise disjunction of j1 and j2.

ibitselectN(i1,i2,i3)=iorN(iandN(i1,i3),iandN(i2,inotN(i3)))

iabsN(i)

  • Let j be the signed interpretation of i.

  • If j is greater than or equal to 0, then return i.

  • Else return the negation of j, modulo 2N.

iabsN(i)=i(ifsignedN(i)0)iabsN(i)=signedN(i)mod2N(otherwise)

inegN(i)

  • Return the result of negating i, modulo 2N.

inegN(i)=(2Ni)mod2N

imin_uN(i1,i2)

  • Return i1 if ilt_uN(i1,i2) is 1, return i2 otherwise.

imin_uN(i1,i2)=i1(ifilt_uN(i1,i2)=1)imin_uN(i1,i2)=i2(otherwise)

imin_sN(i1,i2)

  • Return i1 if ilt_sN(i1,i2) is 1, return i2 otherwise.

imin_sN(i1,i2)=i1(ifilt_sN(i1,i2)=1)imin_sN(i1,i2)=i2(otherwise)

imax_uN(i1,i2)

  • Return i1 if igt_uN(i1,i2) is 1, return i2 otherwise.

imax_uN(i1,i2)=i1(ifigt_uN(i1,i2)=1)imax_uN(i1,i2)=i2(otherwise)

imax_sN(i1,i2)

  • Return i1 if igt_sN(i1,i2) is 1, return i2 otherwise.

imax_sN(i1,i2)=i1(ifigt_sN(i1,i2)=1)imax_sN(i1,i2)=i2(otherwise)

iadd_sat_uN(i1,i2)

  • Let i be the result of adding i1 and i2.

  • Return sat_uN(i).

iadd_sat_uN(i1,i2)=sat_uN(i1+i2)

iadd_sat_sN(i1,i2)

  • Let j1 be the signed interpretation of i1

  • Let j2 be the signed interpretation of i2

  • Let j be the result of adding j1 and j2.

  • Return sat_sN(j).

iadd_sat_sN(i1,i2)=sat_sN(signedN(i1)+signedN(i2))

isub_sat_uN(i1,i2)

  • Let i be the result of subtracting i2 from i1.

  • Return sat_uN(i).

isub_sat_uN(i1,i2)=sat_uN(i1i2)

isub_sat_sN(i1,i2)

  • Let j1 be the signed interpretation of i1

  • Let j2 be the signed interpretation of i2

  • Let j be the result of subtracting j2 from j1.

  • Return sat_sN(j).

isub_sat_sN(i1,i2)=sat_sN(signedN(i1)signedN(i2))

iavgr_uN(i1,i2)

  • Let j be the result of adding i1, i2, and 1.

  • Return the result of dividing j by 2, truncated toward zero.

iavgr_uN(i1,i2)=trunc((i1+i2+1)/2)

iq15mulrsat_sN(i1,i2)

  • Return the result of sat_sN(ishr_sN(i1i2+214,15)).

iq15mulrsat_sN(i1,i2)=sat_sN(ishr_sN(i1i2+214,15))

Floating-Point Operations

Floating-point arithmetic follows the IEEE 754 standard, with the following qualifications:

  • All operators use round-to-nearest ties-to-even, except where otherwise specified. Non-default directed rounding attributes are not supported.

  • Following the recommendation that operators propagate NaN payloads from their operands is permitted but not required.

  • All operators use “non-stop” mode, and floating-point exceptions are not otherwise observable. In particular, neither alternate floating-point exception handling attributes nor operators on status flags are supported. There is no observable difference between quiet and signalling NaNs.

Note

Some of these limitations may be lifted in future versions of WebAssembly.

Rounding

Rounding always is round-to-nearest ties-to-even, in correspondence with IEEE 754 (Section 4.3.1).

An exact floating-point number is a rational number that is exactly representable as a floating-point number of given bit width N.

A limit number for a given floating-point bit width N is a positive or negative number whose magnitude is the smallest power of 2 that is not exactly representable as a floating-point number of width N (that magnitude is 2128 for N=32 and 21024 for N=64).

A candidate number is either an exact floating-point number or a positive or negative limit number for the given bit width N.

A candidate pair is a pair z1,z2 of candidate numbers, such that no candidate number exists that lies between the two.

A real number r is converted to a floating-point value of bit width N as follows:

  • If r is 0, then return +0.

  • Else if r is an exact floating-point number, then return r.

  • Else if r greater than or equal to the positive limit, then return +.

  • Else if r is less than or equal to the negative limit, then return .

  • Else if z1 and z2 are a candidate pair such that z1<r<z2, then:

    • If |rz1|<|rz2|, then let z be z1.

    • Else if |rz1|>|rz2|, then let z be z2.

    • Else if |rz1|=|rz2| and the significand of z1 is even, then let z be z1.

    • Else, let z be z2.

  • If z is 0, then:

    • If r<0, then return 0.

    • Else, return +0.

  • Else if z is a limit number, then:

    • If r<0, then return .

    • Else, return +.

  • Else, return z.

floatN(0)=+0floatN(r)=r(ifrexactN)floatN(r)=+(ifr+limitN)floatN(r)=(ifrlimitN)floatN(r)=closestN(r,z1,z2)(ifz1<r<z2(z1,z2)candidatepairN)closestN(r,z1,z2)=rectifyN(r,z1)(if|rz1|<|rz2|)closestN(r,z1,z2)=rectifyN(r,z2)(if|rz1|>|rz2|)closestN(r,z1,z2)=rectifyN(r,z1)(if|rz1|=|rz2|evenN(z1))closestN(r,z1,z2)=rectifyN(r,z2)(if|rz1|=|rz2|evenN(z2))rectifyN(r,±limitN)=±rectifyN(r,0)=+0(r0)rectifyN(r,0)=0(r<0)rectifyN(r,z)=z

where:

exactN=fNQlimitN=22expon(N)1candidateN=exactN{+limitN,limitN}candidatepairN={(z1,z2)candidateN2 | z1<z2zcandidateN,zz1zz2}evenN((d+m2M)2e)mmod2=0evenN(±limitN)true

NaN Propagation

When the result of a floating-point operator other than fneg, fabs, or fcopysign is a NaN, then its sign is non-deterministic and the payload is computed as follows:

  • If the payload of all NaN inputs to the operator is canonical (including the case that there are no NaN inputs), then the payload of the output is canonical as well.

  • Otherwise the payload is picked non-deterministically among all arithmetic NaNs; that is, its most significant bit is 1 and all others are unspecified.

This non-deterministic result is expressed by the following auxiliary function producing a set of allowed outputs from a set of inputs:

nansN{z}={+nan(n),nan(n) | n=canonN}(ifnan(n)z, n=canonN)nansN{z}={+nan(n),nan(n) | ncanonN}(otherwise)

faddN(z1,z2)

  • If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.

  • Else if both z1 and z2 are infinities of opposite signs, then return an element of nansN{}.

  • Else if both z1 and z2 are infinities of equal sign, then return that infinity.

  • Else if either z1 or z2 is an infinity, then return that infinity.

  • Else if both z1 and z2 are zeroes of opposite sign, then return positive zero.

  • Else if both z1 and z2 are zeroes of equal sign, then return that zero.

  • Else if either z1 or z2 is a zero, then return the other operand.

  • Else if both z1 and z2 are values with the same magnitude but opposite signs, then return positive zero.

  • Else return the result of adding z1 and z2, rounded to the nearest representable value.

faddN(±nan(n),z2)=nansN{±nan(n),z2}faddN(z1,±nan(n))=nansN{±nan(n),z1}faddN(±,)=nansN{}faddN(±,±)=±faddN(z1,±)=±faddN(±,z2)=±faddN(±0,0)=+0faddN(±0,±0)=±0faddN(z1,±0)=z1faddN(±0,z2)=z2faddN(±q,q)=+0faddN(z1,z2)=floatN(z1+z2)

fsubN(z1,z2)

  • If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.

  • Else if both z1 and z2 are infinities of equal signs, then return an element of nansN{}.

  • Else if both z1 and z2 are infinities of opposite sign, then return z1.

  • Else if z1 is an infinity, then return that infinity.

  • Else if z2 is an infinity, then return that infinity negated.

  • Else if both z1 and z2 are zeroes of equal sign, then return positive zero.

  • Else if both z1 and z2 are zeroes of opposite sign, then return z1.

  • Else if z2 is a zero, then return z1.

  • Else if z1 is a zero, then return z2 negated.

  • Else if both z1 and z2 are the same value, then return positive zero.

  • Else return the result of subtracting z2 from z1, rounded to the nearest representable value.

fsubN(±nan(n),z2)=nansN{±nan(n),z2}fsubN(z1,±nan(n))=nansN{±nan(n),z1}fsubN(±,±)=nansN{}fsubN(±,)=±fsubN(z1,±)=fsubN(±,z2)=±fsubN(±0,±0)=+0fsubN(±0,0)=±0fsubN(z1,±0)=z1fsubN(±0,±q2)=q2fsubN(±q,±q)=+0fsubN(z1,z2)=floatN(z1z2)

Note

Up to the non-determinism regarding NaNs, it always holds that fsubN(z1,z2)=faddN(z1,fnegN(z2)).

fmulN(z1,z2)

  • If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.

  • Else if one of z1 and z2 is a zero and the other an infinity, then return an element of nansN{}.

  • Else if both z1 and z2 are infinities of equal sign, then return positive infinity.

  • Else if both z1 and z2 are infinities of opposite sign, then return negative infinity.

  • Else if either z1 or z2 is an infinity and the other a value with equal sign, then return positive infinity.

  • Else if either z1 or z2 is an infinity and the other a value with opposite sign, then return negative infinity.

  • Else if both z1 and z2 are zeroes of equal sign, then return positive zero.

  • Else if both z1 and z2 are zeroes of opposite sign, then return negative zero.

  • Else return the result of multiplying z1 and z2, rounded to the nearest representable value.

fmulN(±nan(n),z2)=nansN{±nan(n),z2}fmulN(z1,±nan(n))=nansN{±nan(n),z1}fmulN(±,±0)=nansN{}fmulN(±,0)=nansN{}fmulN(±0,±)=nansN{}fmulN(±0,)=nansN{}fmulN(±,±)=+fmulN(±,)=fmulN(±q1,±)=+fmulN(±q1,)=fmulN(±,±q2)=+fmulN(±,q2)=fmulN(±0,±0)=+0fmulN(±0,0)=0fmulN(z1,z2)=floatN(z1z2)

fdivN(z1,z2)

  • If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.

  • Else if both z1 and z2 are infinities, then return an element of nansN{}.

  • Else if both z1 and z2 are zeroes, then return an element of nansN{z1,z2}.

  • Else if z1 is an infinity and z2 a value with equal sign, then return positive infinity.

  • Else if z1 is an infinity and z2 a value with opposite sign, then return negative infinity.

  • Else if z2 is an infinity and z1 a value with equal sign, then return positive zero.

  • Else if z2 is an infinity and z1 a value with opposite sign, then return negative zero.

  • Else if z1 is a zero and z2 a value with equal sign, then return positive zero.

  • Else if z1 is a zero and z2 a value with opposite sign, then return negative zero.

  • Else if z2 is a zero and z1 a value with equal sign, then return positive infinity.

  • Else if z2 is a zero and z1 a value with opposite sign, then return negative infinity.

  • Else return the result of dividing z1 by z2, rounded to the nearest representable value.

fdivN(±nan(n),z2)=nansN{±nan(n),z2}fdivN(z1,±nan(n))=nansN{±nan(n),z1}fdivN(±,±)=nansN{}fdivN(±,)=nansN{}fdivN(±0,±0)=nansN{}fdivN(±0,0)=nansN{}fdivN(±,±q2)=+fdivN(±,q2)=fdivN(±q1,±)=+0fdivN(±q1,)=0fdivN(±0,±q2)=+0fdivN(±0,q2)=0fdivN(±q1,±0)=+fdivN(±q1,0)=fdivN(z1,z2)=floatN(z1/z2)

fminN(z1,z2)

  • If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.

  • Else if either z1 or z2 is a negative infinity, then return negative infinity.

  • Else if either z1 or z2 is a positive infinity, then return the other value.

  • Else if both z1 and z2 are zeroes of opposite signs, then return negative zero.

  • Else return the smaller value of z1 and z2.

fminN(±nan(n),z2)=nansN{±nan(n),z2}fminN(z1,±nan(n))=nansN{±nan(n),z1}fminN(+,z2)=z2fminN(,z2)=fminN(z1,+)=z1fminN(z1,)=fminN(±0,0)=0fminN(z1,z2)=z1(ifz1z2)fminN(z1,z2)=z2(ifz2z1)

fmaxN(z1,z2)

  • If either z1 or z2 is a NaN, then return an element of nansN{z1,z2}.

  • Else if either z1 or z2 is a positive infinity, then return positive infinity.

  • Else if either z1 or z2 is a negative infinity, then return the other value.

  • Else if both z1 and z2 are zeroes of opposite signs, then return positive zero.

  • Else return the larger value of z1 and z2.

fmaxN(±nan(n),z2)=nansN{±nan(n),z2}fmaxN(z1,±nan(n))=nansN{±nan(n),z1}fmaxN(+,z2)=+fmaxN(,z2)=z2fmaxN(z1,+)=+fmaxN(z1,)=z1fmaxN(±0,0)=+0fmaxN(z1,z2)=z1(ifz1z2)fmaxN(z1,z2)=z2(ifz2z1)

fcopysignN(z1,z2)

  • If z1 and z2 have the same sign, then return z1.

  • Else return z1 with negated sign.

fcopysignN(±p1,±p2)=±p1fcopysignN(±p1,p2)=p1

fabsN(z)

  • If z is a NaN, then return z with positive sign.

  • Else if z is an infinity, then return positive infinity.

  • Else if z is a zero, then return positive zero.

  • Else if z is a positive value, then z.

  • Else return z negated.

fabsN(±nan(n))=+nan(n)fabsN(±)=+fabsN(±0)=+0fabsN(±q)=+q

fnegN(z)

  • If z is a NaN, then return z with negated sign.

  • Else if z is an infinity, then return that infinity negated.

  • Else if z is a zero, then return that zero negated.

  • Else return z negated.

fnegN(±nan(n))=nan(n)fnegN(±)=fnegN(±0)=0fnegN(±q)=q

fsqrtN(z)

  • If z is a NaN, then return an element of nansN{z}.

  • Else if z is negative infinity, then return an element of nansN{}.

  • Else if z is positive infinity, then return positive infinity.

  • Else if z is a zero, then return that zero.

  • Else if z has a negative sign, then return an element of nansN{}.

  • Else return the square root of z.

fsqrtN(±nan(n))=nansN{±nan(n)}fsqrtN()=nansN{}fsqrtN(+)=+fsqrtN(±0)=±0fsqrtN(q)=nansN{}fsqrtN(+q)=floatN(q)

fceilN(z)

  • If z is a NaN, then return an element of nansN{z}.

  • Else if z is an infinity, then return z.

  • Else if z is a zero, then return z.

  • Else if z is smaller than 0 but greater than 1, then return negative zero.

  • Else return the smallest integral value that is not smaller than z.

fceilN(±nan(n))=nansN{±nan(n)}fceilN(±)=±fceilN(±0)=±0fceilN(q)=0(if1<q<0)fceilN(±q)=floatN(i)(if±qi<±q+1)

ffloorN(z)

  • If z is a NaN, then return an element of nansN{z}.

  • Else if z is an infinity, then return z.

  • Else if z is a zero, then return z.

  • Else if z is greater than 0 but smaller than 1, then return positive zero.

  • Else return the largest integral value that is not larger than z.

ffloorN(±nan(n))=nansN{±nan(n)}ffloorN(±)=±ffloorN(±0)=±0ffloorN(+q)=+0(if0<+q<1)ffloorN(±q)=floatN(i)(if±q1<i±q)

ftruncN(z)

  • If z is a NaN, then return an element of nansN{z}.

  • Else if z is an infinity, then return z.

  • Else if z is a zero, then return z.

  • Else if z is greater than 0 but smaller than 1, then return positive zero.

  • Else if z is smaller than 0 but greater than 1, then return negative zero.

  • Else return the integral value with the same sign as z and the largest magnitude that is not larger than the magnitude of z.

ftruncN(±nan(n))=nansN{±nan(n)}ftruncN(±)=±ftruncN(±0)=±0ftruncN(+q)=+0(if0<+q<1)ftruncN(q)=0(if1<q<0)ftruncN(±q)=floatN(±i)(if+q1<i+q)

fnearestN(z)

  • If z is a NaN, then return an element of nansN{z}.

  • Else if z is an infinity, then return z.

  • Else if z is a zero, then return z.

  • Else if z is greater than 0 but smaller than or equal to 0.5, then return positive zero.

  • Else if z is smaller than 0 but greater than or equal to 0.5, then return negative zero.

  • Else return the integral value that is nearest to z; if two values are equally near, return the even one.

fnearestN(±nan(n))=nansN{±nan(n)}fnearestN(±)=±fnearestN(±0)=±0fnearestN(+q)=+0(if0<+q0.5)fnearestN(q)=0(if0.5q<0)fnearestN(±q)=floatN(±i)(if|iq|<0.5)fnearestN(±q)=floatN(±i)(if|iq|=0.5i even)

feqN(z1,z2)

  • If either z1 or z2 is a NaN, then return 0.

  • Else if both z1 and z2 are zeroes, then return 1.

  • Else if both z1 and z2 are the same value, then return 1.

  • Else return 0.

feqN(±nan(n),z2)=0feqN(z1,±nan(n))=0feqN(±0,0)=1feqN(z1,z2)=bool(z1=z2)

fneN(z1,z2)

  • If either z1 or z2 is a NaN, then return 1.

  • Else if both z1 and z2 are zeroes, then return 0.

  • Else if both z1 and z2 are the same value, then return 0.

  • Else return 1.

fneN(±nan(n),z2)=1fneN(z1,±nan(n))=1fneN(±0,0)=0fneN(z1,z2)=bool(z1z2)

fltN(z1,z2)

  • If either z1 or z2 is a NaN, then return 0.

  • Else if z1 and z2 are the same value, then return 0.

  • Else if z1 is positive infinity, then return 0.

  • Else if z1 is negative infinity, then return 1.

  • Else if z2 is positive infinity, then return 1.

  • Else if z2 is negative infinity, then return 0.

  • Else if both z1 and z2 are zeroes, then return 0.

  • Else if z1 is smaller than z2, then return 1.

  • Else return 0.

fltN(±nan(n),z2)=0fltN(z1,±nan(n))=0fltN(z,z)=0fltN(+,z2)=0fltN(,z2)=1fltN(z1,+)=1fltN(z1,)=0fltN(±0,0)=0fltN(z1,z2)=bool(z1<z2)

fgtN(z1,z2)

  • If either z1 or z2 is a NaN, then return 0.

  • Else if z1 and z2 are the same value, then return 0.

  • Else if z1 is positive infinity, then return 1.

  • Else if z1 is negative infinity, then return 0.

  • Else if z2 is positive infinity, then return 0.

  • Else if z2 is negative infinity, then return 1.

  • Else if both z1 and z2 are zeroes, then return 0.

  • Else if z1 is larger than z2, then return 1.

  • Else return 0.

fgtN(±nan(n),z2)=0fgtN(z1,±nan(n))=0fgtN(z,z)=0fgtN(+,z2)=1fgtN(,z2)=0fgtN(z1,+)=0fgtN(z1,)=1fgtN(±0,0)=0fgtN(z1,z2)=bool(z1>z2)

fleN(z1,z2)

  • If either z1 or z2 is a NaN, then return 0.

  • Else if z1 and z2 are the same value, then return 1.

  • Else if z1 is positive infinity, then return 0.

  • Else if z1 is negative infinity, then return 1.

  • Else if z2 is positive infinity, then return 1.

  • Else if z2 is negative infinity, then return 0.

  • Else if both z1 and z2 are zeroes, then return 1.

  • Else if z1 is smaller than or equal to z2, then return 1.

  • Else return 0.

fleN(±nan(n),z2)=0fleN(z1,±nan(n))=0fleN(z,z)=1fleN(+,z2)=0fleN(,z2)=1fleN(z1,+)=1fleN(z1,)=0fleN(±0,0)=1fleN(z1,z2)=bool(z1z2)

fgeN(z1,z2)

  • If either z1 or z2 is a NaN, then return 0.

  • Else if z1 and z2 are the same value, then return 1.

  • Else if z1 is positive infinity, then return 1.

  • Else if z1 is negative infinity, then return 0.

  • Else if z2 is positive infinity, then return 0.

  • Else if z2 is negative infinity, then return 1.

  • Else if both z1 and z2 are zeroes, then return 1.

  • Else if z1 is smaller than or equal to z2, then return 1.

  • Else return 0.

fgeN(±nan(n),z2)=0fgeN(z1,±nan(n))=0fgeN(z,z)=1fgeN(+,z2)=1fgeN(,z2)=0fgeN(z1,+)=0fgeN(z1,)=1fgeN(±0,0)=1fgeN(z1,z2)=bool(z1z2)

fpminN(z1,z2)

  • If z2 is less than z1 then return z2.

  • Else return z1.

fpminN(z1,z2)=z2(iffltN(z2,z1)=1)fpminN(z1,z2)=z1(otherwise)

fpmaxN(z1,z2)

  • If z1 is less than z2 then return z2.

  • Else return z1.

fpmaxN(z1,z2)=z2(iffltN(z1,z2)=1)fpmaxN(z1,z2)=z1(otherwise)

Conversions

extendM,Nu(i)

  • Return i.

extendM,Nu(i)=i

Note

In the abstract syntax, unsigned extension just reinterprets the same value.

extendM,Ns(i)

  • Let j be the signed interpretation of i of size M.

  • Return the two’s complement of j relative to size N.

extendM,Ns(i)=signedN1(signedM(i))

wrapM,N(i)

  • Return i modulo 2N.

wrapM,N(i)=imod2N

truncM,Nu(z)

  • If z is a NaN, then the result is undefined.

  • Else if z is an infinity, then the result is undefined.

  • Else if z is a number and trunc(z) is a value within range of the target type, then return that value.

  • Else the result is undefined.

truncM,Nu(±nan(n))={}truncM,Nu(±)={}truncM,Nu(±q)=trunc(±q)(if1<trunc(±q)<2N)truncM,Nu(±q)={}(otherwise)

Note

This operator is partial. It is not defined for NaNs, infinities, or values for which the result is out of range.

truncM,Ns(z)

  • If z is a NaN, then the result is undefined.

  • Else if z is an infinity, then the result is undefined.

  • If z is a number and trunc(z) is a value within range of the target type, then return that value.

  • Else the result is undefined.

truncM,Ns(±nan(n))={}truncM,Ns(±)={}truncM,Ns(±q)=trunc(±q)(if2N11<trunc(±q)<2N1)truncM,Ns(±q)={}(otherwise)

Note

This operator is partial. It is not defined for NaNs, infinities, or values for which the result is out of range.

trunc_sat_uM,N(z)

  • If z is a NaN, then return 0.

  • Else if z is negative infinity, then return 0.

  • Else if z is positive infinity, then return 2N1.

  • Else, return sat_uN(trunc(z)).

trunc_sat_uM,N(±nan(n))=0trunc_sat_uM,N()=0trunc_sat_uM,N(+)=2N1trunc_sat_uM,N(z)=sat_uN(trunc(z))

trunc_sat_sM,N(z)

  • If z is a NaN, then return 0.

  • Else if z is negative infinity, then return 2N1.

  • Else if z is positive infinity, then return 2N11.

  • Else, return sat_sN(trunc(z)).

trunc_sat_sM,N(±nan(n))=0trunc_sat_sM,N()=2N1trunc_sat_sM,N(+)=2N11trunc_sat_sM,N(z)=sat_sN(trunc(z))

promoteM,N(z)

  • If z is a canonical NaN, then return an element of nansN{} (i.e., a canonical NaN of size N).

  • Else if z is a NaN, then return an element of nansN{±nan(1)} (i.e., any arithmetic NaN of size N).

  • Else, return z.

promoteM,N(±nan(n))=nansN{}(ifn=canonN)promoteM,N(±nan(n))=nansN{+nan(1)}(otherwise)promoteM,N(z)=z

demoteM,N(z)

  • If z is a canonical NaN, then return an element of nansN{} (i.e., a canonical NaN of size N).

  • Else if z is a NaN, then return an element of nansN{±nan(1)} (i.e., any NaN of size N).

  • Else if z is an infinity, then return that infinity.

  • Else if z is a zero, then return that zero.

  • Else, return floatN(z).

demoteM,N(±nan(n))=nansN{}(ifn=canonN)demoteM,N(±nan(n))=nansN{+nan(1)}(otherwise)demoteM,N(±)=±demoteM,N(±0)=±0demoteM,N(±q)=floatN(±q)

convertM,Nu(i)

  • Return floatN(i).

convertM,Nu(i)=floatN(i)

convertM,Ns(i)

convertM,Ns(i)=floatN(signedM(i))

reinterprett1,t2(c)

  • Let d be the bit sequence bitst1(c).

  • Return the constant c for which bitst2(c)=d.

reinterprett1,t2(c)=bitst21(bitst1(c))

narrowM,Ns(i)

narrowM,Ns(i)=sat_sN(signedM(i))

narrowM,Nu(i)

narrowM,Nu(i)=sat_uN(signedM(i))