Add unsigned 32-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry or overflow flag), and store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag).
tmp[32:0] := a[31:0] + b[31:0] + (c_in > 0 ? 1 : 0)
MEM[out+31:out] := tmp[31:0]
dst[0] := tmp[32]
dst[7:1] := 0
ADX
Arithmetic
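The add-with-carry operation above can be modelled in scalar Python. This is a sketch of the pseudocode only: the function name `addcarry_u32` and the tuple return value are illustrative assumptions, whereas the documented operation stores the sum to memory at "out".

```python
MASK32 = (1 << 32) - 1

def addcarry_u32(c_in, a, b):
    """Return (carry_out, sum) for a 32-bit add with an 8-bit carry-in."""
    # Any non-zero carry-in counts as 1, as in the pseudocode.
    tmp = (a & MASK32) + (b & MASK32) + (1 if (c_in & 0xFF) > 0 else 0)
    # Bit 32 of the 33-bit intermediate is the carry-out.
    return (tmp >> 32) & 1, tmp & MASK32
```

For example, `addcarry_u32(0, 0xFFFFFFFF, 1)` wraps the sum to 0 and reports a carry-out of 1.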
Add unsigned 64-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry or overflow flag), and store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag).
tmp[64:0] := a[63:0] + b[63:0] + (c_in > 0 ? 1 : 0)
MEM[out+63:out] := tmp[63:0]
dst[0] := tmp[64]
dst[7:1] := 0
ADX
Arithmetic
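A common use of the carry-in and carry-out pair is chaining additions across the limbs of a wider integer. The sketch below assumes a little-endian list of 64-bit limbs; `addcarry_u64` and `add_limbs` are hypothetical helper names that only model the pseudocode.

```python
MASK64 = (1 << 64) - 1

def addcarry_u64(c_in, a, b):
    """Return (carry_out, sum) for a 64-bit add with carry-in."""
    tmp = (a & MASK64) + (b & MASK64) + (1 if c_in > 0 else 0)
    return tmp >> 64, tmp & MASK64

def add_limbs(x, y):
    """Add two equal-length little-endian lists of 64-bit limbs."""
    carry, out = 0, []
    for xi, yi in zip(x, y):
        # The carry-out of each limb feeds the carry-in of the next one.
        carry, s = addcarry_u64(carry, xi, yi)
        out.append(s)
    return carry, out
```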
Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
a[127:0] := ShiftRows(a[127:0])
a[127:0] := SubBytes(a[127:0])
a[127:0] := MixColumns(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
AES
Cryptography
Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
a[127:0] := ShiftRows(a[127:0])
a[127:0] := SubBytes(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
AES
Cryptography
Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
a[127:0] := InvShiftRows(a[127:0])
a[127:0] := InvSubBytes(a[127:0])
a[127:0] := InvMixColumns(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
AES
Cryptography
Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
a[127:0] := InvShiftRows(a[127:0])
a[127:0] := InvSubBytes(a[127:0])
dst[127:0] := a[127:0] XOR RoundKey[127:0]
AES
Cryptography
Perform the InvMixColumns transformation on "a" and store the result in "dst".
dst[127:0] := InvMixColumns(a[127:0])
AES
Cryptography
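InvMixColumns multiplies each 4-byte column of the state by a fixed matrix over GF(2^8). The sketch below follows the standard AES definition. It assumes the 128-bit value is given as 16 bytes with byte 0 in bits 7:0 and bytes 4c to 4c+3 forming column c. The names `gmul` and `inv_mix_columns` are illustrative.

```python
def gmul(x, y):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if y & 1:
            r ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11B  # reduce by the AES polynomial
        y >>= 1
    return r

def inv_mix_columns(state):
    """Apply InvMixColumns to a 16-byte state (assumed column-major layout)."""
    out = bytearray(16)
    for c in range(4):
        s0, s1, s2, s3 = state[4 * c:4 * c + 4]
        out[4 * c + 0] = gmul(s0, 14) ^ gmul(s1, 11) ^ gmul(s2, 13) ^ gmul(s3, 9)
        out[4 * c + 1] = gmul(s0, 9) ^ gmul(s1, 14) ^ gmul(s2, 11) ^ gmul(s3, 13)
        out[4 * c + 2] = gmul(s0, 13) ^ gmul(s1, 9) ^ gmul(s2, 14) ^ gmul(s3, 11)
        out[4 * c + 3] = gmul(s0, 11) ^ gmul(s1, 13) ^ gmul(s2, 9) ^ gmul(s3, 14)
    return bytes(out)
```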
Assist in expanding the AES cipher key by computing steps towards generating a round key for the encryption cipher using data from "a" and an 8-bit round constant specified in "imm8", and store the result in "dst".
X3[31:0] := a[127:96]
X2[31:0] := a[95:64]
X1[31:0] := a[63:32]
X0[31:0] := a[31:0]
RCON[31:0] := ZeroExtend32(imm8[7:0])
dst[31:0] := SubWord(X1)
dst[63:32] := RotWord(SubWord(X1)) XOR RCON
dst[95:64] := SubWord(X3)
dst[127:96] := RotWord(SubWord(X3)) XOR RCON
AES
Cryptography
Compute dot-product of BF16 (16-bit) floating-point pairs in tiles "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst".
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(a.row[m].bf16[2*k+0]) * FP32(b.row[k].bf16[2*n+0])
tmp.fp32[n] += FP32(a.row[m].bf16[2*k+1]) * FP32(b.row[k].bf16[2*n+1])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-BF16
Application-Targeted
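The loop nest above can be modelled on small Python lists. This is a sketch under stated assumptions: tiles are lists of rows, BF16 values are 16-bit patterns that widen to FP32 by placing them in the upper half of a 32-bit word, and accumulation uses Python floats (double precision), so hardware FP32 rounding and denormal handling are not modelled. All function names are hypothetical.

```python
import struct

def bf16_to_float(bits):
    """Widen a BF16 bit pattern to a float via the upper half of an FP32 word."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def float_to_bf16(x):
    """Truncate an FP32 value to a BF16 bit pattern (no rounding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def tile_dpbf16ps(dst, a, b):
    """dst: M x N floats; a: M x 2K BF16 patterns; b: K x 2N BF16 patterns."""
    for m in range(len(dst)):
        for k in range(len(a[m]) // 2):      # one dword of a holds one pair
            for n in range(len(dst[m])):
                dst[m][n] += bf16_to_float(a[m][2 * k]) * bf16_to_float(b[k][2 * n])
                dst[m][n] += bf16_to_float(a[m][2 * k + 1]) * bf16_to_float(b[k][2 * n + 1])
    return dst
```

Note that each row of "b" interleaves two logical rows of the right-hand matrix, which is why the pair index appears in both `a[m][2*k+i]` and `b[k][2*n+i]`.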
Compute dot-product of BF16 (16-bit) floating-point pairs in tiles "src0" and "src1", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(src0.row[m].bf16[2*k+0]) * FP32(src1.row[k].bf16[2*n+0])
tmp.fp32[n] += FP32(src0.row[m].bf16[2*k+1]) * FP32(src1.row[k].bf16[2*n+1])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-BF16
Application-Targeted
Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles "a" and "b" is interpreted as a complex number with an FP16 real part and an FP16 imaginary part. This operation calculates the imaginary part of the result. For each possible combination of (row of "a", column of "b"), it performs a set of multiplications and accumulations on all corresponding complex numbers (one from "a" and one from "b"). The imaginary part of the "a" element is multiplied with the real part of the corresponding "b" element, and the real part of the "a" element is multiplied with the imaginary part of the corresponding "b" element. The two accumulated results are added, and then accumulated into the corresponding row and column of "dst".
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+1])
tmp.fp32[n] += FP32(a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+0])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-COMPLEX
Application-Targeted
Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles "a" and "b" is interpreted as a complex number with an FP16 real part and an FP16 imaginary part. This operation calculates the real part of the result. For each possible combination of (row of "a", column of "b"), it performs a set of multiplications and accumulations on all corresponding complex numbers (one from "a" and one from "b"). The real part of the "a" element is multiplied with the real part of the corresponding "b" element, and the negated imaginary part of the "a" element is multiplied with the imaginary part of the corresponding "b" element. The two accumulated results are added, and then accumulated into the corresponding row and column of "dst".
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+0])
tmp.fp32[n] += FP32(-a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+1])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-COMPLEX
Application-Targeted
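The two complex-tile operations above together produce the imaginary and real halves of an ordinary complex matrix product. The sketch below illustrates this with Python `complex` values instead of packed FP16 pairs, which is a simplification: precision and storage layout are not modelled, and `tile_cmm` is a hypothetical name.

```python
def tile_cmm(dst_re, dst_im, a, b):
    """Accumulate the complex product a @ b into separate real/imaginary tiles.

    a is M x K and b is K x N, both lists of rows of Python complex numbers.
    """
    for m in range(len(a)):
        for k in range(len(a[m])):
            for n in range(len(b[k])):
                x, y = a[m][k], b[k][n]
                # Imaginary-part operation: re*im + im*re
                dst_im[m][n] += x.real * y.imag + x.imag * y.real
                # Real-part operation: re*re + (-im)*im
                dst_re[m][n] += x.real * y.real + (-x.imag) * y.imag
    return dst_re, dst_im
```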
Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single precision tile. Each dword element in input tiles "src0" and "src1" is interpreted as a complex number with FP16 real part and FP16 imaginary part. This function calculates the imaginary part of the result.
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+0]) * FP32(src1.row[k].fp16[2*n+1])
tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+1]) * FP32(src1.row[k].fp16[2*n+0])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-COMPLEX
Application-Targeted
Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single precision tile. Each dword element in input tiles "src0" and "src1" is interpreted as a complex number with FP16 real part and FP16 imaginary part. This function calculates the real part of the result.
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+0]) * FP32(src1.row[k].fp16[2*n+0])
tmp.fp32[n] += FP32(-src0.row[m].fp16[2*k+1]) * FP32(src1.row[k].fp16[2*n+1])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-COMPLEX
Application-Targeted
Compute dot-product of FP16 (16-bit) floating-point pairs in tiles "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst".
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(a.row[m].fp16[2*k+0]) * FP32(b.row[k].fp16[2*n+0])
tmp.fp32[n] += FP32(a.row[m].fp16[2*k+1]) * FP32(b.row[k].fp16[2*n+1])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-FP16
Application-Targeted
Compute dot-product of FP16 (16-bit) floating-point pairs in tiles "src0" and "src1", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+0]) * FP32(src1.row[k].fp16[2*n+0])
tmp.fp32[n] += FP32(src0.row[m].fp16[2*k+1]) * FP32(src1.row[k].fp16[2*n+1])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-FP16
Application-Targeted
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "a" with corresponding unsigned 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".
DEFINE DPBD(c, x, y) {
tmp1 := SignExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
tmp2 := SignExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
tmp3 := SignExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
tmp4 := SignExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
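The DPBD helper above differs between the INT8 variants only in how each byte is widened. Below is a sketch of the signed-by-unsigned case. The function names are hypothetical, and wrapping the result to 32 bits is an assumption, since the pseudocode does not state the overflow behaviour.

```python
def sign_extend8(v):
    """Interpret the low 8 bits of v as a signed two's-complement byte."""
    v &= 0xFF
    return v - 0x100 if v & 0x80 else v

def dword_bytes(d):
    """Split a 32-bit value into its four bytes, least significant first."""
    return [(d >> (8 * i)) & 0xFF for i in range(4)]

def dpbd_su(c, x, y):
    """c + sum of SignExtend32(x.byte[i]) * ZeroExtend32(y.byte[i])."""
    total = c
    for xb, yb in zip(dword_bytes(x), dword_bytes(y)):
        total += sign_extend8(xb) * yb
    return total & 0xFFFFFFFF  # assumed 32-bit wraparound
```

Swapping which operand passes through `sign_extend8` gives the other three variants.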
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".
DEFINE DPBD(c, x, y) {
tmp1 := ZeroExtend32(x.byte[0]) * SignExtend32(y.byte[0])
tmp2 := ZeroExtend32(x.byte[1]) * SignExtend32(y.byte[1])
tmp3 := ZeroExtend32(x.byte[2]) * SignExtend32(y.byte[2])
tmp4 := ZeroExtend32(x.byte[3]) * SignExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding unsigned 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".
DEFINE DPBD(c, x, y) {
tmp1 := ZeroExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
tmp2 := ZeroExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
tmp3 := ZeroExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
tmp4 := ZeroExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst".
DEFINE DPBD(c, x, y) {
tmp1 := SignExtend32(x.byte[0]) * SignExtend32(y.byte[0])
tmp2 := SignExtend32(x.byte[1]) * SignExtend32(y.byte[1])
tmp3 := SignExtend32(x.byte[2]) * SignExtend32(y.byte[2])
tmp4 := SignExtend32(x.byte[3]) * SignExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (a.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "src0" with corresponding signed 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
DEFINE DPBD(c, x, y) {
tmp1 := SignExtend32(x.byte[0]) * SignExtend32(y.byte[0])
tmp2 := SignExtend32(x.byte[1]) * SignExtend32(y.byte[1])
tmp3 := SignExtend32(x.byte[2]) * SignExtend32(y.byte[2])
tmp4 := SignExtend32(x.byte[3]) * SignExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "src0" with corresponding unsigned 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
DEFINE DPBD(c, x, y) {
tmp1 := SignExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
tmp2 := SignExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
tmp3 := SignExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
tmp4 := SignExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "src0" with corresponding signed 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
DEFINE DPBD(c, x, y) {
tmp1 := ZeroExtend32(x.byte[0]) * SignExtend32(y.byte[0])
tmp2 := ZeroExtend32(x.byte[1]) * SignExtend32(y.byte[1])
tmp3 := ZeroExtend32(x.byte[2]) * SignExtend32(y.byte[2])
tmp4 := ZeroExtend32(x.byte[3]) * SignExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "src0" with corresponding unsigned 8-bit integers in "src1", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
DEFINE DPBD(c, x, y) {
tmp1 := ZeroExtend32(x.byte[0]) * ZeroExtend32(y.byte[0])
tmp2 := ZeroExtend32(x.byte[1]) * ZeroExtend32(y.byte[1])
tmp3 := ZeroExtend32(x.byte[2]) * ZeroExtend32(y.byte[2])
tmp4 := ZeroExtend32(x.byte[3]) * ZeroExtend32(y.byte[3])
RETURN c + tmp1 + tmp2 + tmp3 + tmp4
}
FOR m := 0 TO dst.rows - 1
tmp := dst.row[m]
FOR k := 0 TO (src0.colsb / 4) - 1
FOR n := 0 TO (dst.colsb / 4) - 1
tmp.dword[n] := DPBD(tmp.dword[n], src0.row[m].dword[k], src1.row[k].dword[n])
ENDFOR
ENDFOR
write_row_and_zero(dst, m, tmp, dst.colsb)
ENDFOR
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-INT8
Application-Targeted
Load tile configuration from a 64-byte memory location specified by "mem_addr". The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If the specified palette_id is zero, that signifies the init state for both the tile config and the tile data, and the tiles are zeroed. Any invalid configuration will result in a #GP fault.
// Format of the memory payload. Numbers are byte offsets.
// 0: palette
// 1: start_row
// 2-15: reserved, must be zero
// 16-17: tile0.colsb
// 18-19: tile1.colsb
// 20-21: tile2.colsb
// ...
// 30-31: tile7.colsb
// 32-47: reserved, must be zero
// 48: tile0.rows
// 49: tile1.rows
// 50: tile2.rows
// ...
// 55: tile7.rows
// 56-63: reserved, must be zero
AMX-TILE
Application-Targeted
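The 64-byte payload described above can be assembled with `struct`. This is a sketch: the function name and argument shape are hypothetical, and storing each two-byte colsb field in little-endian order is an assumption based on x86 conventions rather than something the layout comment states.

```python
import struct

def make_tileconfig(palette, start_row, shapes):
    """Build a 64-byte tile configuration image.

    shapes is a list of up to 8 (rows, colsb) pairs, one per tile.
    """
    if len(shapes) > 8:
        raise ValueError("at most 8 tiles can be described")
    cfg = bytearray(64)            # reserved ranges stay zero
    cfg[0] = palette
    cfg[1] = start_row
    for i, (rows, colsb) in enumerate(shapes):
        struct.pack_into("<H", cfg, 16 + 2 * i, colsb)  # bytes 16-31
        cfg[48 + i] = rows                              # bytes 48-55
    return bytes(cfg)
```

For instance, `make_tileconfig(1, 0, [(16, 64)] * 8)` describes eight tiles of 16 rows by 64 bytes each.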
Store the current tile configuration to a 64-byte memory location specified by "mem_addr". The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If tiles are not configured, all zeroes will be stored to memory.
// Format of the memory payload. Numbers are byte offsets.
// 0: palette
// 1: start_row
// 2-15: reserved, must be zero
// 16-17: tile0.colsb
// 18-19: tile1.colsb
// 20-21: tile2.colsb
// ...
// 30-31: tile7.colsb
// 32-47: reserved, must be zero
// 48: tile0.rows
// 49: tile1.rows
// 50: tile2.rows
// ...
// 55: tile7.rows
// 56-63: reserved, must be zero
AMX-TILE
Application-Targeted
Load tile rows from memory specified by "base" address and "stride" into destination tile "dst" using the tile configuration previously configured via "_tile_loadconfig".
start := tileconfig.startRow
IF start == 0 // not restarting, zero incoming state
tilezero(dst)
FI
nbytes := dst.colsb
DO WHILE start < dst.rows
memptr := base + start * stride
write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
start := start + 1
OD
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-TILE
Application-Targeted
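The strided row addressing in the load pseudocode can be illustrated with a byte buffer. The sketch below models only the case where the configured start row is 0, and it takes `rows` and `colsb` as explicit arguments instead of reading them from a tile configuration. `tile_load` is a hypothetical name.

```python
def tile_load(mem, base, stride, rows, colsb):
    """Read `rows` rows of `colsb` bytes; row r starts at base + r * stride."""
    tile = []
    for r in range(rows):
        off = base + r * stride
        # stride may exceed colsb, so rows need not be contiguous in memory.
        tile.append(bytes(mem[off:off + colsb]))
    return tile
```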
Load tile rows from memory specified by "base" address and "stride" into destination tile "dst" using the tile configuration previously configured via "_tile_loadconfig". This intrinsic provides a hint to the implementation that the data will likely not be reused in the near future and the data caching can be optimized accordingly.
start := tileconfig.startRow
IF start == 0 // not restarting, zero incoming state
tilezero(dst)
FI
nbytes := dst.colsb
DO WHILE start < dst.rows
memptr := base + start * stride
write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
start := start + 1
OD
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-TILE
Application-Targeted
Release the tile configuration to return to the init state, which releases all storage it currently holds.
AMX-TILE
Application-Targeted
Store the tile specified by "src" to memory specified by "base" address and "stride" using the tile configuration previously configured via "_tile_loadconfig".
start := tileconfig.startRow
DO WHILE start < src.rows
memptr := base + start * stride
write_memory(memptr, src.colsb, src.row[start])
start := start + 1
OD
zero_tileconfig_start()
AMX-TILE
Application-Targeted
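The store pseudocode mirrors the load: each tile row is written at `base + row * stride`. A minimal sketch, assuming a start row of 0 and a mutable `bytearray` standing in for memory (`tile_store` is a hypothetical name):

```python
def tile_store(mem, base, stride, tile):
    """Write each row of `tile` (a list of bytes objects) into `mem`."""
    for r, row in enumerate(tile):
        off = base + r * stride
        # Only the row's own bytes are written; gaps between rows are untouched.
        mem[off:off + len(row)] = row
    return mem
```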
Zero the tile specified by "tdest".
nbytes := palette_table[tileconfig.palette_id].bytes_per_row
FOR i := 0 TO palette_table[tileconfig.palette_id].max_rows-1
FOR j := 0 TO nbytes-1
tdest.row[i].byte[j] := 0
ENDFOR
ENDFOR
AMX-TILE
Application-Targeted
Load tile rows from memory specified by "base" address and "stride" into destination tile "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
start := tileconfig.startRow
IF start == 0 // not restarting, zero incoming state
tilezero(dst)
FI
nbytes := dst.colsb
DO WHILE start < dst.rows
memptr := base + start * stride
write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
start := start + 1
OD
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-TILE
Application-Targeted
Store the tile specified by "src" to memory specified by "base" address and "stride". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
start := tileconfig.startRow
DO WHILE start < src.rows
memptr := base + start * stride
write_memory(memptr, src.colsb, src.row[start])
start := start + 1
OD
zero_tileconfig_start()
AMX-TILE
Application-Targeted
Load tile rows from memory specified by "base" address and "stride" into destination tile "dst". This intrinsic provides a hint to the implementation that the data will likely not be reused in the near future and the data caching can be optimized accordingly. The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
start := tileconfig.startRow
IF start == 0 // not restarting, zero incoming state
tilezero(dst)
FI
nbytes := dst.colsb
DO WHILE start < dst.rows
memptr := base + start * stride
write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes)
start := start + 1
OD
zero_upper_rows(dst, dst.rows)
zero_tileconfig_start()
AMX-TILE
Application-Targeted
Zero the tile specified by "dst". The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
nbytes := palette_table[tileconfig.palette_id].bytes_per_row
FOR i := 0 TO palette_table[tileconfig.palette_id].max_rows-1
FOR j := 0 TO nbytes-1
dst.row[i].byte[j] := 0
ENDFOR
ENDFOR
AMX-TILE
Application-Targeted
Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ACOS(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ACOS(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ACOSH(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ACOSH(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ASIN(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ASIN(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ASINH(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ASINH(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ATAN(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ATAN(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
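The two-argument form matters because ATAN2 uses the signs of both inputs to pick the quadrant, which a plain inverse tangent of the quotient cannot do. A scalar sketch over Python lists (the name `atan2_pd` is illustrative, and vector width and zeroing of upper bits are not modelled):

```python
import math

def atan2_pd(a, b):
    """Element-wise atan2(a[j], b[j]); results are in radians."""
    return [math.atan2(y, x) for y, x in zip(a, b)]
```

For example, the pairs (1, 1) and (-1, -1) have the same quotient but land in opposite quadrants.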
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ATANH(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ATANH(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := COS(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := COS(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := COSD(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
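The degree-based variants differ from the radian ones only in the unit of the input. A scalar sketch that converts first and then calls the radian cosine follows; `cosd_pd` is a hypothetical name, and a vector math library may handle exact cases such as 90 degrees differently from this naive conversion.

```python
import math

def cosd_pd(a):
    """Element-wise cosine of inputs expressed in degrees."""
    return [math.cos(math.radians(x)) for x in a]
```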
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := COSD(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := COSH(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := COSH(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
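As a scalar illustration of the hypotenuse operation, the sketch below applies Python's `math.hypot` element-wise. That is a swapped-in library routine computing the same quantity as the SQRT/POW expression, not a transcription of the pseudocode; `hypot_pd` is a hypothetical name.

```python
import math

def hypot_pd(a, b):
    """Element-wise sqrt(a[j]^2 + b[j]^2)."""
    return [math.hypot(x, y) for x, y in zip(a, b)]
```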
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SIN(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SIN(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SIN(a[i+63:i])
MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SIN(a[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SIND(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SIND(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SINH(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SINH(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := TAN(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := TAN(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := TAND(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := TAND(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := TANH(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := TANH(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Trigonometry
Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := CubeRoot(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := CubeRoot(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".
DEFINE CEXP(a[31:0], b[31:0]) {
result[31:0] := POW(FP32(e), a[31:0]) * COS(b[31:0])
result[63:32] := POW(FP32(e), a[31:0]) * SIN(b[31:0])
RETURN result
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := CEXP(a[i+31:i], a[i+63:i+32])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the natural logarithm of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".
DEFINE CLOG(a[31:0], b[31:0]) {
result[31:0] := LOG(SQRT(POW(a, 2.0) + POW(b, 2.0)))
result[63:32] := ATAN2(b, a)
RETURN result
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := CLOG(a[i+31:i], a[i+63:i+32])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the square root of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".
DEFINE CSQRT(a[31:0], b[31:0]) {
sign[31:0] := (b < 0.0) ? -FP32(1.0) : FP32(1.0)
result[31:0] := SQRT((a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
result[63:32] := sign * SQRT((-a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
RETURN result
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := CSQRT(a[i+31:i], a[i+63:i+32])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
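The three complex helpers above (CEXP, CLOG, CSQRT) can be modelled in scalar Python as a rough sketch. The function names and the "map_complex" helper are hypothetical. Python floats are double precision, so the values only approximate the single-precision results. The interleaved list mirrors the "real, imaginary, real, imaginary, ..." layout described in the entries.

```python
import cmath
import math

def cexp(a, b):
    """e**(a + b*i) as a (real, imag) pair, following CEXP."""
    return (math.exp(a) * math.cos(b), math.exp(a) * math.sin(b))

def clog(a, b):
    """Natural log of a + b*i as a (real, imag) pair, following CLOG."""
    return (math.log(math.sqrt(a * a + b * b)), math.atan2(b, a))

def csqrt(a, b):
    """Square root of a + b*i as a (real, imag) pair, following CSQRT."""
    sign = -1.0 if b < 0.0 else 1.0
    mag = math.sqrt(a * a + b * b)
    return (math.sqrt((a + mag) / 2.0), sign * math.sqrt((-a + mag) / 2.0))

def map_complex(fn, vec):
    """Apply fn to each adjacent (real, imag) pair of an interleaved list."""
    out = []
    for k in range(0, len(vec), 2):
        out.extend(fn(vec[k], vec[k + 1]))
    return out

# Eight lanes hold four complex numbers: 3+4i, 3-4i, 2i, -1.
roots = map_complex(csqrt, [3.0, 4.0, 3.0, -4.0, 0.0, 2.0, -1.0, 0.0])
```

Squaring each resulting pair should reproduce the corresponding input, which is a convenient way to check a port of these formulas.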
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := POW(e, a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := POW(FP32(e), a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := POW(10.0, a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := POW(2.0, a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
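Evaluating "POW(e, a) - 1.0" literally loses digits when "a" is close to zero, because the exponential rounds to a value near 1.0 before the subtraction. The sketch below compares the literal formula with Python's "math.expm1", used here only as an illustrative stand-in; how the intrinsic itself evaluates the expression is not stated above, and both helper names are hypothetical.

```python
import math

def expm1_naive(a):
    """Literal per-lane evaluation of POW(e, x) - 1.0."""
    return [math.exp(x) - 1.0 for x in a]

def expm1_lanes(a):
    """Per-lane e**x - 1 using a routine designed for small x."""
    return [math.expm1(x) for x in a]

x = 1e-10
# For tiny x, e**x - 1 is approximately x, so distance from x measures the error.
naive_err = abs(expm1_naive([x])[0] - x)
accurate_err = abs(expm1_lanes([x])[0] - x)
```

The same cancellation argument applies to the "natural logarithm of one plus" entries, where "math.log1p" would play the corresponding role.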
Compute the inverse cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := InvCubeRoot(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the inverse cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := InvCubeRoot(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := InvSQRT(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := InvSQRT(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := LOG(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := LOG(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := LOG(1.0 + a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := LOG(1.0 + a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
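The "floor(log2(x))" reading of the exponent-extraction entries above can be sketched with "math.frexp". This is a scalar model for finite, nonzero inputs only; taking the absolute value first is an assumption, zero, infinity, and NaN are not modelled, and the name "getexp_lanes" is hypothetical.

```python
import math

def getexp_lanes(a):
    """Per-lane floor(log2(|x|)) returned as floats (finite, nonzero x only)."""
    out = []
    for x in a:
        # frexp gives |x| = m * 2**e with 0.5 <= m < 1, so floor(log2|x|) = e - 1.
        _, e = math.frexp(abs(x))
        out.append(float(e - 1))
    return out

exps = getexp_lanes([8.0, 10.0, 0.75, -1.0])
```

Note that the result is stored as a floating-point number of the same width as the input, not as an integer.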
Compute the exponential value of packed double-precision (64-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the exponential value of packed single-precision (32-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm256_sqrt_pd".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SQRT(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm256_sqrt_ps".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SQRT(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := CDFNormal(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := CDFNormal(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
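CDFNormal is not expanded in the pseudocode above. A minimal sketch, assuming it denotes the standard normal distribution (mean 0, variance 1), which the text does not state explicitly; the name "cdfnorm_lanes" is hypothetical.

```python
import math

def cdfnorm_lanes(a):
    """Per-lane standard normal CDF: Phi(x) = 0.5 * erfc(-x / sqrt(2))."""
    return [0.5 * math.erfc(-x / math.sqrt(2.0)) for x in a]

probs = cdfnorm_lanes([0.0, 1.96, -1.96, 6.0])
```

Writing the CDF in terms of erfc rather than "1 + erf" keeps precision in the lower tail, where the result is close to zero.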
Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := InverseCDFNormal(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := InverseCDFNormal(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ERF(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ERF(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := 1.0 - ERF(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := 1.0 - ERF(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := 1.0 / (1.0 - ERF(a[i+63:i]))
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := 1.0 / (1.0 - ERF(a[i+31:i]))
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := 1.0 / ERF(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := 1.0 / ERF(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Probability/Statistics
Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 31
i := 8*j
IF b[i+7:i] == 0
#DE
FI
dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 15
i := 16*j
IF b[i+15:i] == 0
#DE
FI
dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 32*j
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 3
i := 64*j
IF b[i+63:i] == 0
#DE
FI
dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 31
i := 8*j
IF b[i+7:i] == 0
#DE
FI
dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 15
i := 16*j
IF b[i+15:i] == 0
#DE
FI
dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 32*j
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 3
i := 64*j
IF b[i+63:i] == 0
#DE
FI
dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed 32-bit integers into memory at "mem_addr".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
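TRUNCATE and REMAINDER in the integer-division entries above can be modelled per lane as follows. This is a sketch under two assumptions that the text does not spell out: the remainder takes the sign of the dividend (as in C), and a zero divisor is reported as an error, mirroring the "#DE" check shown in the earlier divide entries. All names are hypothetical.

```python
def to_signed32(x):
    """Wrap an integer to the signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def idivrem_epi32(a, b):
    """Per-lane signed division truncated toward zero, plus remainders."""
    quot, rem = [], []
    for x, y in zip(a, b):
        if y == 0:
            raise ZeroDivisionError("lane divisor is zero")
        # Python's // floors, so divide magnitudes and restore the sign.
        q = abs(x) // abs(y)
        if (x < 0) != (y < 0):
            q = -q
        quot.append(to_signed32(q))
        rem.append(to_signed32(x - q * y))
    return quot, rem

q, r = idivrem_epi32([7, -7, 7, -7], [2, 2, -2, -2])
```

Each lane satisfies a = q * b + r, which distinguishes truncating division from the flooring division Python performs natively.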
Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst".
FOR j := 0 to 31
i := 8*j
dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst".
FOR j := 0 to 15
i := 16*j
dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst".
FOR j := 0 to 3
i := 64*j
dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst".
FOR j := 0 to 31
i := 8*j
dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst".
FOR j := 0 to 15
i := 16*j
dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst".
FOR j := 0 to 3
i := 64*j
dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed unsigned 32-bit integers into memory at "mem_addr".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := CEIL(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := CEIL(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := FLOOR(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := FLOOR(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ROUND(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ROUND(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := TRUNCATE(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Miscellaneous
Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := TRUNCATE(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Miscellaneous
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Alternately add and subtract packed double-precision (64-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i] + b[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Alternately add and subtract packed single-precision (32-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i] + b[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
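The add/subtract pattern above subtracts in even-indexed lanes and adds in odd-indexed lanes. A small scalar sketch in Python, where the name "addsub_lanes" is hypothetical and a list stands in for the register:

```python
def addsub_lanes(a, b):
    """Even lanes compute a - b, odd lanes compute a + b."""
    return [x - y if j % 2 == 0 else x + y
            for j, (x, y) in enumerate(zip(a, b))]

out = addsub_lanes([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

The same function models the single-precision form when given eight lanes.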
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 3
i := 64*j
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the four products, and conditionally store the sum in "dst" using the low 4 bits of "imm8".
DEFINE DP(a[127:0], b[127:0], imm8[7:0]) {
FOR j := 0 to 3
i := j*32
IF imm8[(4+j)%8]
temp[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
temp[i+31:i] := FP32(0.0)
FI
ENDFOR
sum[31:0] := (temp[127:96] + temp[95:64]) + (temp[63:32] + temp[31:0])
FOR j := 0 to 3
i := j*32
IF imm8[j%8]
tmpdst[i+31:i] := sum[31:0]
ELSE
tmpdst[i+31:i] := FP32(0.0)
FI
ENDFOR
RETURN tmpdst[127:0]
}
dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
dst[255:128] := DP(a[255:128], b[255:128], imm8[7:0])
dst[MAX:256] := 0
AVX
Arithmetic
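The DP helper above uses the high four bits of "imm8" to select which products enter the sum and the low four bits to select which output lanes receive it, independently for each 128-bit half. A scalar sketch follows; the names "dp_128" and "dp_256" are hypothetical, and Python floats are double precision, so rounding differs from true single-precision arithmetic.

```python
def dp_128(a, b, imm8):
    """Conditional dot product over one 128-bit half (four lanes)."""
    # High nibble: which lane products are included.
    temp = [a[j] * b[j] if (imm8 >> (4 + j)) & 1 else 0.0 for j in range(4)]
    # Same pairing order as the pseudocode.
    total = (temp[3] + temp[2]) + (temp[1] + temp[0])
    # Low nibble: which output lanes receive the sum.
    return [total if (imm8 >> j) & 1 else 0.0 for j in range(4)]

def dp_256(a, b, imm8):
    """Apply dp_128 to the low and high halves with the same imm8."""
    return dp_128(a[:4], b[:4], imm8) + dp_128(a[4:], b[4:], imm8)

a = [1.0, 2.0, 3.0, 4.0, 1.0, 1.0, 1.0, 1.0]
b = [5.0, 6.0, 7.0, 8.0, 2.0, 2.0, 2.0, 2.0]
full = dp_256(a, b, 0xF1)     # all four products, result in lane 0 of each half
partial = dp_256(a, b, 0x33)  # lanes 0-1 multiplied, result in lanes 0-1
```

Because the two halves are processed separately, the result is two four-element dot products rather than one eight-element dot product.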
Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[63:0] := a[127:64] + a[63:0]
dst[127:64] := b[127:64] + b[63:0]
dst[191:128] := a[255:192] + a[191:128]
dst[255:192] := b[255:192] + b[191:128]
dst[MAX:256] := 0
AVX
Arithmetic
Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[31:0] := a[63:32] + a[31:0]
dst[63:32] := a[127:96] + a[95:64]
dst[95:64] := b[63:32] + b[31:0]
dst[127:96] := b[127:96] + b[95:64]
dst[159:128] := a[191:160] + a[159:128]
dst[191:160] := a[255:224] + a[223:192]
dst[223:192] := b[191:160] + b[159:128]
dst[255:224] := b[255:224] + b[223:192]
dst[MAX:256] := 0
AVX
Arithmetic
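The lane layout of the single-precision horizontal add above is easy to misread: within each 128-bit half, the two pair sums from "a" come first, followed by the two pair sums from "b". A scalar sketch, with the hypothetical name "hadd_ps_256":

```python
def hadd_ps_256(a, b):
    """Horizontal add of adjacent pairs, per 128-bit half (eight lanes in, eight out)."""
    out = []
    for base in (0, 4):  # low half, then high half
        ha, hb = a[base:base + 4], b[base:base + 4]
        out += [ha[1] + ha[0], ha[3] + ha[2],
                hb[1] + hb[0], hb[3] + hb[2]]
    return out

a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
b = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
sums = hadd_ps_256(a, b)
```

The interleaving of "a" and "b" results per half means the output is not simply all sums from "a" followed by all sums from "b".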
Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[63:0] := a[63:0] - a[127:64]
dst[127:64] := b[63:0] - b[127:64]
dst[191:128] := a[191:128] - a[255:192]
dst[255:192] := b[191:128] - b[255:192]
dst[MAX:256] := 0
AVX
Arithmetic
Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[31:0] := a[31:0] - a[63:32]
dst[63:32] := a[95:64] - a[127:96]
dst[95:64] := b[31:0] - b[63:32]
dst[127:96] := b[95:64] - b[127:96]
dst[159:128] := a[159:128] - a[191:160]
dst[191:160] := a[223:192] - a[255:224]
dst[223:192] := b[159:128] - b[191:160]
dst[255:224] := b[223:192] - b[255:224]
dst[MAX:256] := 0
AVX
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX
Arithmetic
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Logical
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Logical
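The logical entries above operate on the raw bit patterns of the floating-point lanes, not on their numeric values. One common use of the NOT-then-AND form is clearing the sign bit to obtain absolute values. The sketch below models that on 64-bit lanes; the helper names are hypothetical, and the choice of "-0.0" as the mask is an illustration rather than something the text prescribes.

```python
import struct

def f64_bits(x):
    """Reinterpret a double as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_f64(n):
    """Reinterpret a 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", n))[0]

def andnot_pd(a, b):
    """Per-lane (NOT a) AND b on the bit patterns."""
    mask = (1 << 64) - 1
    return [bits_f64((~f64_bits(x) & mask) & f64_bits(y)) for x, y in zip(a, b)]

# -0.0 has only the sign bit set, so NOT(-0.0) keeps every bit except the sign.
sign_mask = [-0.0] * 4
magnitudes = andnot_pd(sign_mask, [-3.5, 2.0, -0.0, -1e300])
```

The operand order matters: the complement is applied to "a", so the mask goes first.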
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX
Logical
Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value.
IF ((a[255:0] AND b[255:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
IF (((NOT a[255:0]) AND b[255:0]) == 0)
CF := 1
ELSE
CF := 0
FI
RETURN ZF
AVX
Logical
Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value.
IF ((a[255:0] AND b[255:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
IF (((NOT a[255:0]) AND b[255:0]) == 0)
CF := 1
ELSE
CF := 0
FI
RETURN CF
AVX
Logical
Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
IF ((a[255:0] AND b[255:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
IF (((NOT a[255:0]) AND b[255:0]) == 0)
CF := 1
ELSE
CF := 0
FI
IF (ZF == 0 && CF == 0)
dst := 1
ELSE
dst := 0
FI
AVX
Logical
Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
tmp[255:0] := a[255:0] AND b[255:0]
IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[255:0] := (NOT a[255:0]) AND b[255:0]
IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
CF := 1
ELSE
CF := 0
FI
dst := ZF
AVX
Logical
Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
tmp[255:0] := a[255:0] AND b[255:0]
IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[255:0] := (NOT a[255:0]) AND b[255:0]
IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
CF := 1
ELSE
CF := 0
FI
dst := CF
AVX
Logical
Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
tmp[255:0] := a[255:0] AND b[255:0]
IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[255:0] := (NOT a[255:0]) AND b[255:0]
IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0)
CF := 1
ELSE
CF := 0
FI
IF (ZF == 0 && CF == 0)
dst := 1
ELSE
dst := 0
FI
AVX
Logical
Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
tmp[127:0] := a[127:0] AND b[127:0]
IF (tmp[63] == 0 && tmp[127] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[127:0] := (NOT a[127:0]) AND b[127:0]
IF (tmp[63] == 0 && tmp[127] == 0)
CF := 1
ELSE
CF := 0
FI
dst := ZF
AVX
Logical
Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
tmp[127:0] := a[127:0] AND b[127:0]
IF (tmp[63] == 0 && tmp[127] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[127:0] := (NOT a[127:0]) AND b[127:0]
IF (tmp[63] == 0 && tmp[127] == 0)
CF := 1
ELSE
CF := 0
FI
dst := CF
AVX
Logical
Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
tmp[127:0] := a[127:0] AND b[127:0]
IF (tmp[63] == 0 && tmp[127] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[127:0] := (NOT a[127:0]) AND b[127:0]
IF (tmp[63] == 0 && tmp[127] == 0)
CF := 1
ELSE
CF := 0
FI
IF (ZF == 0 && CF == 0)
dst := 1
ELSE
dst := 0
FI
AVX
Logical
Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
tmp[255:0] := a[255:0] AND b[255:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[255:0] := (NOT a[255:0]) AND b[255:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
CF := 1
ELSE
CF := 0
FI
dst := ZF
AVX
Logical
Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
tmp[255:0] := a[255:0] AND b[255:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[255:0] := (NOT a[255:0]) AND b[255:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
CF := 1
ELSE
CF := 0
FI
dst := CF
AVX
Logical
Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
tmp[255:0] := a[255:0] AND b[255:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[255:0] := (NOT a[255:0]) AND b[255:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && \
tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0)
CF := 1
ELSE
CF := 0
FI
IF (ZF == 0 && CF == 0)
dst := 1
ELSE
dst := 0
FI
AVX
Logical
Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value.
tmp[127:0] := a[127:0] AND b[127:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[127:0] := (NOT a[127:0]) AND b[127:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
CF := 1
ELSE
CF := 0
FI
dst := ZF
AVX
Logical
Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value.
tmp[127:0] := a[127:0] AND b[127:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[127:0] := (NOT a[127:0]) AND b[127:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
CF := 1
ELSE
CF := 0
FI
dst := CF
AVX
Logical
Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
tmp[127:0] := a[127:0] AND b[127:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
ZF := 1
ELSE
ZF := 0
FI
tmp[127:0] := (NOT a[127:0]) AND b[127:0]
IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0)
CF := 1
ELSE
CF := 0
FI
IF (ZF == 0 && CF == 0)
dst := 1
ELSE
dst := 0
FI
AVX
Logical
Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF imm8[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX
Swizzle
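As a rough illustration of the immediate-controlled blend above, the sketch below models each vector as a list of four Python floats with element 0 first. The function name is hypothetical.

```python
def blend_pd256(a, b, imm8):
    """Pick b[j] where bit j of imm8 is set, otherwise a[j]."""
    return [b[j] if (imm8 >> j) & 1 else a[j] for j in range(4)]


# Bits 0 and 2 of the mask are set, so elements 0 and 2 come from b.
result = blend_pd256([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], 0b0101)
```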
Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF imm8[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX
Swizzle
Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF mask[i+63]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX
Swizzle
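The variable blend above selects on the sign bit of each mask element (bit i+63), not on whether the element is non-zero. A small Python sketch of that distinction, assuming vectors are lists of four floats and using hypothetical function names:

```python
import struct


def sign_bit64(x):
    """Return the IEEE-754 sign bit of a double, including for -0.0."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 63


def blendv_pd256(a, b, mask):
    """Pick b[j] where the sign bit of mask[j] is set, otherwise a[j]."""
    return [b[j] if sign_bit64(mask[j]) else a[j] for j in range(4)]
```

In this model a mask element of -0.0 selects from "b", while a positive non-zero value such as 5.0 selects from "a".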
Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF mask[i+31]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX
Swizzle
Shuffle double-precision (64-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst".
dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
dst[MAX:256] := 0
AVX
Swizzle
Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(b[127:0], imm8[5:4])
dst[127:96] := SELECT4(b[127:0], imm8[7:6])
dst[159:128] := SELECT4(a[255:128], imm8[1:0])
dst[191:160] := SELECT4(a[255:128], imm8[3:2])
dst[223:192] := SELECT4(b[255:128], imm8[5:4])
dst[255:224] := SELECT4(b[255:128], imm8[7:6])
dst[MAX:256] := 0
AVX
Swizzle
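The pseudocode above takes the two low results of each 128-bit lane from "a" and the two high results from "b", reusing the same four 2-bit selectors in both lanes. A Python sketch under the assumption that vectors are lists of eight elements (element 0 first); the function names are hypothetical:

```python
def select4(src, control):
    """Return one of four elements of a 128-bit lane, chosen by control[1:0]."""
    return src[control & 0b11]


def shuffle_ps256(a, b, imm8):
    dst = []
    for lane in (0, 4):                      # start index of each 128-bit lane
        lane_a = a[lane:lane + 4]
        lane_b = b[lane:lane + 4]
        dst += [
            select4(lane_a, imm8),           # imm8[1:0]
            select4(lane_a, imm8 >> 2),      # imm8[3:2]
            select4(lane_b, imm8 >> 4),      # imm8[5:4]
            select4(lane_b, imm8 >> 6),      # imm8[7:6]
        ]
    return dst


# imm8 = 0x4E encodes the selectors 2, 3, 0, 1 (lowest field first).
shuffled = shuffle_ps256(list(range(8)), list(range(10, 18)), 0x4E)
```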
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX
Swizzle
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX
Swizzle
Extract 128 bits (composed of integer data) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX
Swizzle
Extract a 32-bit integer from "a", selected with "index", and store the result in "dst".
dst[31:0] := (a[255:0] >> (index[2:0] * 32))[31:0]
AVX
Swizzle
Extract a 64-bit integer from "a", selected with "index", and store the result in "dst".
dst[63:0] := (a[255:0] >> (index[1:0] * 64))[63:0]
AVX
Swizzle
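Both integer extractions above reduce to a shift and a mask, with only the low bits of "index" taking part. A Python sketch, assuming the vector is a 256-bit Python integer and using hypothetical function names:

```python
def extract_epi32(a, index):
    """Model of (a >> (index[2:0] * 32))[31:0]."""
    return (a >> ((index & 0b111) * 32)) & 0xFFFFFFFF


def extract_epi64(a, index):
    """Model of (a >> (index[1:0] * 64))[63:0]."""
    return (a >> ((index & 0b11) * 64)) & 0xFFFFFFFFFFFFFFFF
```

Because the index is truncated to its low bits, an index of 13 selects the same 32-bit element as an index of 5 in this model.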
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], b[1:0])
dst[63:32] := SELECT4(a[127:0], b[33:32])
dst[95:64] := SELECT4(a[127:0], b[65:64])
dst[127:96] := SELECT4(a[127:0], b[97:96])
dst[159:128] := SELECT4(a[255:128], b[129:128])
dst[191:160] := SELECT4(a[255:128], b[161:160])
dst[223:192] := SELECT4(a[255:128], b[193:192])
dst[255:224] := SELECT4(a[255:128], b[225:224])
dst[MAX:256] := 0
AVX
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], b[1:0])
dst[63:32] := SELECT4(a[127:0], b[33:32])
dst[95:64] := SELECT4(a[127:0], b[65:64])
dst[127:96] := SELECT4(a[127:0], b[97:96])
dst[MAX:128] := 0
AVX
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
dst[159:128] := SELECT4(a[255:128], imm8[1:0])
dst[191:160] := SELECT4(a[255:128], imm8[3:2])
dst[223:192] := SELECT4(a[255:128], imm8[5:4])
dst[255:224] := SELECT4(a[255:128], imm8[7:6])
dst[MAX:256] := 0
AVX
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
dst[MAX:128] := 0
AVX
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".
IF (b[1] == 0) dst[63:0] := a[63:0]; FI
IF (b[1] == 1) dst[63:0] := a[127:64]; FI
IF (b[65] == 0) dst[127:64] := a[63:0]; FI
IF (b[65] == 1) dst[127:64] := a[127:64]; FI
IF (b[129] == 0) dst[191:128] := a[191:128]; FI
IF (b[129] == 1) dst[191:128] := a[255:192]; FI
IF (b[193] == 0) dst[255:192] := a[191:128]; FI
IF (b[193] == 1) dst[255:192] := a[255:192]; FI
dst[MAX:256] := 0
AVX
Swizzle
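Note that the variable-control shuffle above reads bit 1 of each 64-bit control element (b[1], b[65], b[129], b[193]), not bit 0. A Python sketch of the 256-bit form, assuming vectors are 256-bit Python integers; `pack64` and the other names are hypothetical helpers for this illustration:

```python
MASK64 = (1 << 64) - 1


def pack64(elements):
    """Build a bit vector from 64-bit elements, element 0 in the low bits."""
    return sum((e & MASK64) << (64 * k) for k, e in enumerate(elements))


def permutevar_pd256(a, b):
    dst = 0
    for j in range(4):
        lane_base = (j // 2) * 128           # elements stay inside their lane
        sel = (b >> (j * 64 + 1)) & 1        # bit 1 of control element j
        src = (a >> (lane_base + 64 * sel)) & MASK64
        dst |= src << (j * 64)
    return dst


a = pack64([0xA0, 0xA1, 0xA2, 0xA3])
ctrl = pack64([2, 0, 1, 2])                  # only the values 2 set bit 1
permuted = permutevar_pd256(a, ctrl)
```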
Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst".
IF (b[1] == 0) dst[63:0] := a[63:0]; FI
IF (b[1] == 1) dst[63:0] := a[127:64]; FI
IF (b[65] == 0) dst[127:64] := a[63:0]; FI
IF (b[65] == 1) dst[127:64] := a[127:64]; FI
dst[MAX:128] := 0
AVX
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".
IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI
IF (imm8[2] == 0) dst[191:128] := a[191:128]; FI
IF (imm8[2] == 1) dst[191:128] := a[255:192]; FI
IF (imm8[3] == 0) dst[255:192] := a[191:128]; FI
IF (imm8[3] == 1) dst[255:192] := a[255:192]; FI
dst[MAX:256] := 0
AVX
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst".
IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI
dst[MAX:128] := 0
AVX
Swizzle
Shuffle 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src1, src2, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src1[127:0]
1: tmp[127:0] := src1[255:128]
2: tmp[127:0] := src2[127:0]
3: tmp[127:0] := src2[255:128]
ESAC
IF control[3]
tmp[127:0] := 0
FI
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
AVX
Swizzle
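In the 128-bit-lane shuffle above, each nibble of "imm8" picks one of four source halves, and bit 3 of the nibble zeroes the result. A Python sketch, assuming vectors are 256-bit Python integers and using hypothetical function names:

```python
MASK128 = (1 << 128) - 1


def select4(src1, src2, control):
    """Choose a 128-bit half by control[1:0]; control[3] forces zero."""
    halves = (
        src1 & MASK128,
        (src1 >> 128) & MASK128,
        src2 & MASK128,
        (src2 >> 128) & MASK128,
    )
    tmp = halves[control & 0b11]
    if (control >> 3) & 1:
        tmp = 0
    return tmp


def permute2f128(a, b, imm8):
    low = select4(a, b, imm8 & 0xF)           # imm8[3:0]
    high = select4(a, b, (imm8 >> 4) & 0xF)   # imm8[7:4]
    return (high << 128) | low
```

For example, under this model an "imm8" of 0x20 places the low half of "a" in the low lane and the low half of "b" in the high lane.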
Shuffle 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src1, src2, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src1[127:0]
1: tmp[127:0] := src1[255:128]
2: tmp[127:0] := src2[127:0]
3: tmp[127:0] := src2[255:128]
ESAC
IF control[3]
tmp[127:0] := 0
FI
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
AVX
Swizzle
Shuffle 128 bits (composed of integer data) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src1, src2, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src1[127:0]
1: tmp[127:0] := src1[255:128]
2: tmp[127:0] := src2[127:0]
3: tmp[127:0] := src2[255:128]
ESAC
IF control[3]
tmp[127:0] := 0
FI
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
AVX
Swizzle
Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE (imm8[0]) OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX
Swizzle
Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE imm8[0] OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX
Swizzle
Copy "a" to "dst", then insert 128 bits from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE (imm8[0]) OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX
Swizzle
Copy "a" to "dst", and insert the 8-bit integer "i" into "dst" at the location specified by "index".
dst[255:0] := a[255:0]
sel := index[4:0]*8
dst[sel+7:sel] := i[7:0]
AVX
Swizzle
Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "index".
dst[255:0] := a[255:0]
sel := index[3:0]*16
dst[sel+15:sel] := i[15:0]
AVX
Swizzle
Copy "a" to "dst", and insert the 32-bit integer "i" into "dst" at the location specified by "index".
dst[255:0] := a[255:0]
sel := index[2:0]*32
dst[sel+31:sel] := i[31:0]
AVX
Swizzle
Copy "a" to "dst", and insert the 64-bit integer "i" into "dst" at the location specified by "index".
dst[255:0] := a[255:0]
sel := index[1:0]*64
dst[sel+63:sel] := i[63:0]
AVX
Swizzle
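The four integer insertions above follow one pattern: truncate "index" to the number of elements, compute the bit offset, and overwrite that field with the low bits of "i". A Python sketch that parameterises the element width, assuming the vector is a 256-bit Python integer; `insert_int` is a hypothetical name:

```python
def insert_int(a, i, index, width):
    """Copy a, then overwrite element `index` (of `width` bits) with i."""
    lanes = 256 // width                      # 32, 16, 8 or 4 elements
    sel = (index & (lanes - 1)) * width       # index[4:0], [3:0], [2:0] or [1:0]
    field = ((1 << width) - 1) << sel
    return (a & ~field) | ((i << sel) & field)
```

Bits of "i" above the element width are discarded, which mirrors the i[7:0], i[15:0], i[31:0] and i[63:0] slices in the pseudocode.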
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX
Swizzle
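The single-precision unpack entries above interleave within each 128-bit lane rather than across the whole register. A Python sketch, assuming vectors are lists of eight elements with element 0 first; the function names are hypothetical:

```python
def unpackhi_ps256(a, b):
    """Interleave the high two elements of each 128-bit lane of a and b."""
    dst = []
    for lane in (0, 4):
        dst += [a[lane + 2], b[lane + 2], a[lane + 3], b[lane + 3]]
    return dst


def unpacklo_ps256(a, b):
    """Interleave the low two elements of each 128-bit lane of a and b."""
    dst = []
    for lane in (0, 4):
        dst += [a[lane], b[lane], a[lane + 1], b[lane + 1]]
    return dst
```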
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 to 3
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 to 7
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 3
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 7
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed double-precision floating-point elements in "dst".
[round_note]
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ROUND(a[i+63:i], rounding)
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed single-precision floating-point elements in "dst".
[round_note]
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ROUND(a[i+31:i], rounding)
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := FLOOR(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := CEIL(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := FLOOR(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := CEIL(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ( a[i+63:i] OP b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
dst[MAX:128] := 0
AVX
Compare
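To make the predicate table above more concrete, the sketch below models only predicates 0 through 7 for the 128-bit double-precision form. It relies on an assumption that is not stated in the text above: in the predicate names, "O" (ordered) means the result is false when either input is NaN, and "U" (unordered) means it is true. The quiet/signalling suffix affects exception behaviour only and is not modelled. All Python names are hypothetical.

```python
import math

ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF


def unordered(a, b):
    return math.isnan(a) or math.isnan(b)


# Python's own float comparisons return False for NaN operands (and != returns
# True), which matches the assumed ordered/unordered behaviour of these eight.
PREDICATES = {
    0: lambda a, b: a == b,               # _CMP_EQ_OQ
    1: lambda a, b: a < b,                # _CMP_LT_OS
    2: lambda a, b: a <= b,               # _CMP_LE_OS
    3: unordered,                         # _CMP_UNORD_Q
    4: lambda a, b: a != b,               # _CMP_NEQ_UQ
    5: lambda a, b: not (a < b),          # _CMP_NLT_US
    6: lambda a, b: not (a <= b),         # _CMP_NLE_US
    7: lambda a, b: not unordered(a, b),  # _CMP_ORD_Q
}


def cmp_pd128(a, b, imm8):
    """a, b: lists of two floats. Returns two 64-bit all-ones/zero masks."""
    op = PREDICATES[imm8 & 0x1F]          # predicates 8..31 are not modelled
    return [ALL_ONES_64 if op(x, y) else 0 for x, y in zip(a, b)]
```

Each result element is a full-width mask rather than a boolean, so it can feed directly into a bitwise AND or a sign-bit blend.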
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ( a[i+63:i] OP b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] OP b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
dst[MAX:128] := 0
AVX
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ( a[i+31:i] OP b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
dst[63:0] := ( a[63:0] OP b[63:0] ) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
dst[31:0] := ( a[31:0] OP b[31:0] ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX
Compare
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*32
m := j*64
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Convert
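The widening conversion above reads four 32-bit elements at offsets j*32 and writes four 64-bit elements at offsets j*64. A Python sketch of the source side, assuming the input is a 128-bit Python integer and the output is a list of floats; the function name is hypothetical:

```python
def cvtepi32_pd(a):
    """Convert four packed signed 32-bit integers to four doubles."""
    out = []
    for j in range(4):
        raw = (a >> (32 * j)) & 0xFFFFFFFF
        # Reinterpret the 32-bit pattern as a two's-complement signed value.
        signed = raw - (1 << 32) if raw & 0x80000000 else raw
        out.append(float(signed))
    return out
```

Every 32-bit integer is exactly representable as a double, so this conversion does not round.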
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
ENDFOR
dst[MAX:128] := 0
AVX
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 32*j
dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
ENDFOR
dst[MAX:256] := 0
AVX
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
ENDFOR
dst[MAX:128] := 0
AVX
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
ENDFOR
dst[MAX:128] := 0
AVX
Convert
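The two double-to-integer conversions above differ only in rounding. The sketch below contrasts them for finite, in-range inputs. It assumes, beyond what the text states, that the non-truncating form uses round-to-nearest-even, which is the usual default rounding mode. Out-of-range and NaN inputs are not modelled, and the function names are hypothetical.

```python
def cvttpd_epi32(a):
    """Truncate toward zero, then return the 32-bit two's-complement pattern."""
    return [int(x) & 0xFFFFFFFF for x in a]


def cvtpd_epi32(a):
    """Round to nearest, ties to even (assumed default rounding mode)."""
    return [round(x) & 0xFFFFFFFF for x in a]
```

For an input of -1.7, truncation gives -1 (0xFFFFFFFF) while rounding gives -2 (0xFFFFFFFE).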
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Convert
Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".
dst[31:0] := a[31:0]
AVX
Convert
Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".
dst[63:0] := a[63:0]
AVX
Convert
Copy the lower 32-bit integer in "a" to "dst".
dst[31:0] := a[31:0]
AVX
Convert
Zero the contents of all XMM or YMM registers.
YMM0[MAX:0] := 0
YMM1[MAX:0] := 0
YMM2[MAX:0] := 0
YMM3[MAX:0] := 0
YMM4[MAX:0] := 0
YMM5[MAX:0] := 0
YMM6[MAX:0] := 0
YMM7[MAX:0] := 0
IF _64_BIT_MODE
YMM8[MAX:0] := 0
YMM9[MAX:0] := 0
YMM10[MAX:0] := 0
YMM11[MAX:0] := 0
YMM12[MAX:0] := 0
YMM13[MAX:0] := 0
YMM14[MAX:0] := 0
YMM15[MAX:0] := 0
FI
AVX
General Support
Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.
YMM0[MAX:128] := 0
YMM1[MAX:128] := 0
YMM2[MAX:128] := 0
YMM3[MAX:128] := 0
YMM4[MAX:128] := 0
YMM5[MAX:128] := 0
YMM6[MAX:128] := 0
YMM7[MAX:128] := 0
IF _64_BIT_MODE
YMM8[MAX:128] := 0
YMM9[MAX:128] := 0
YMM10[MAX:128] := 0
YMM11[MAX:128] := 0
YMM12[MAX:128] := 0
YMM13[MAX:128] := 0
YMM14[MAX:128] := 0
YMM15[MAX:128] := 0
FI
AVX
General Support
Return vector of type __m256 with undefined elements.
AVX
General Support
Return vector of type __m256d with undefined elements.
AVX
General Support
Return vector of type __m256i with undefined elements.
AVX
General Support
Broadcast a single-precision (32-bit) floating-point element from memory to all elements of "dst".
tmp[31:0] := MEM[mem_addr+31:mem_addr]
FOR j := 0 to 7
i := j*32
dst[i+31:i] := tmp[31:0]
ENDFOR
dst[MAX:256] := 0
AVX
Load
Swizzle
Broadcast a single-precision (32-bit) floating-point element from memory to all elements of "dst".
tmp[31:0] := MEM[mem_addr+31:mem_addr]
FOR j := 0 to 3
i := j*32
dst[i+31:i] := tmp[31:0]
ENDFOR
dst[MAX:128] := 0
AVX
Load
Swizzle
Broadcast a double-precision (64-bit) floating-point element from memory to all elements of "dst".
tmp[63:0] := MEM[mem_addr+63:mem_addr]
FOR j := 0 to 3
i := j*64
dst[i+63:i] := tmp[63:0]
ENDFOR
dst[MAX:256] := 0
AVX
Load
Swizzle
Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of "dst".
tmp[127:0] := MEM[mem_addr+127:mem_addr]
dst[127:0] := tmp[127:0]
dst[255:128] := tmp[127:0]
dst[MAX:256] := 0
AVX
Load
Swizzle
Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of "dst".
tmp[127:0] := MEM[mem_addr+127:mem_addr]
dst[127:0] := tmp[127:0]
dst[255:128] := tmp[127:0]
dst[MAX:256] := 0
AVX
Load
Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX
Load
Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX
Load
Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX
Load
Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX
Load
Load 256-bits of integer data from memory into "dst".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX
Load
Load 256-bits of integer data from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).
FOR j := 0 to 3
i := j*64
IF mask[i+63]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX
Load
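The masked-load pseudocode above can be modelled in scalar form. This is a sketch under assumptions: memory is represented as a Python list of four floats, the mask as four 64-bit integers, and the function name is a label chosen here.

```python
def maskload_pd(mem, mask):
    """Model of the 4-lane masked load of 64-bit elements.

    mem  : list of 4 floats standing in for MEM[mem_addr ...]
    mask : list of 4 integers; only bit 63 of each is examined
    """
    dst = []
    for j in range(4):
        high_bit_set = (mask[j] >> 63) & 1  # mask[i+63] in the pseudocode
        dst.append(mem[j] if high_bit_set else 0.0)
    return dst

# Lane 2 has a non-zero mask, but its high bit is clear, so it is zeroed.
print(maskload_pd([1.0, 2.0, 3.0, 4.0], [-1, 0, 1, 0x8000000000000000]))
```

Only the most significant bit of each mask element matters; the remaining 63 bits are ignored.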
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).
FOR j := 0 to 1
i := j*64
IF mask[i+63]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).
FOR j := 0 to 7
i := j*32
IF mask[i+31]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set).
FOR j := 0 to 3
i := j*32
IF mask[i+31]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX
Load
Load 256-bits of integer data from unaligned memory into "dst". This intrinsic may perform better than "_mm256_loadu_si256" when the data crosses a cache line boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX
Load
Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value in "dst".
"hiaddr" and "loaddr" do not need to be aligned on any particular boundary.
dst[127:0] := MEM[loaddr+127:loaddr]
dst[255:128] := MEM[hiaddr+127:hiaddr]
dst[MAX:256] := 0
AVX
Load
Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value in "dst".
"hiaddr" and "loaddr" do not need to be aligned on any particular boundary.
dst[127:0] := MEM[loaddr+127:loaddr]
dst[255:128] := MEM[hiaddr+127:hiaddr]
dst[MAX:256] := 0
AVX
Load
Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value in "dst".
"hiaddr" and "loaddr" do not need to be aligned on any particular boundary.
dst[127:0] := MEM[loaddr+127:loaddr]
dst[255:128] := MEM[hiaddr+127:hiaddr]
dst[MAX:256] := 0
AVX
Load
Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store 256-bits of integer data from "a" into memory.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store 256-bits of integer data from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store packed double-precision (64-bit) floating-point elements from "a" into memory using "mask".
FOR j := 0 to 3
i := j*64
IF mask[i+63]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX
Store
Store packed double-precision (64-bit) floating-point elements from "a" into memory using "mask".
FOR j := 0 to 1
i := j*64
IF mask[i+63]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using "mask".
FOR j := 0 to 7
i := j*32
IF mask[i+31]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using "mask".
FOR j := 0 to 3
i := j*32
IF mask[i+31]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX
Store
Store 256-bits of integer data from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX
Store
Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into two different 128-bit memory locations.
"hiaddr" and "loaddr" do not need to be aligned on any particular boundary.
MEM[loaddr+127:loaddr] := a[127:0]
MEM[hiaddr+127:hiaddr] := a[255:128]
AVX
Store
Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into two different 128-bit memory locations.
"hiaddr" and "loaddr" do not need to be aligned on any particular boundary.
MEM[loaddr+127:loaddr] := a[127:0]
MEM[hiaddr+127:hiaddr] := a[255:128]
AVX
Store
Store the high and low 128-bit halves (each composed of integer data) from "a" into two different 128-bit memory locations.
"hiaddr" and "loaddr" do not need to be aligned on any particular boundary.
MEM[loaddr+127:loaddr] := a[127:0]
MEM[hiaddr+127:hiaddr] := a[255:128]
AVX
Store
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
dst[31:0] := a[63:32]
dst[63:32] := a[63:32]
dst[95:64] := a[127:96]
dst[127:96] := a[127:96]
dst[159:128] := a[191:160]
dst[191:160] := a[191:160]
dst[223:192] := a[255:224]
dst[255:224] := a[255:224]
dst[MAX:256] := 0
AVX
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
dst[31:0] := a[31:0]
dst[63:32] := a[31:0]
dst[95:64] := a[95:64]
dst[127:96] := a[95:64]
dst[159:128] := a[159:128]
dst[191:160] := a[159:128]
dst[223:192] := a[223:192]
dst[255:224] := a[223:192]
dst[MAX:256] := 0
AVX
Move
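The odd-indexed and even-indexed duplication entries above both copy one element of each adjacent pair into both positions. A small sketch, treating the vector as a list of eight elements with element 0 in the lowest bits; the function names are labels chosen here.

```python
def dup_odd(a):
    """Each pair (2k, 2k+1) receives the odd-indexed element a[2k+1]."""
    return [a[j | 1] for j in range(len(a))]

def dup_even(a):
    """Each pair (2k, 2k+1) receives the even-indexed element a[2k]."""
    return [a[j & ~1] for j in range(len(a))]

a = [0, 1, 2, 3, 4, 5, 6, 7]
print(dup_odd(a))
print(dup_even(a))
```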
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst".
dst[63:0] := a[63:0]
dst[127:64] := a[63:0]
dst[191:128] := a[191:128]
dst[255:192] := a[191:128]
dst[MAX:256] := 0
AVX
Move
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := 1.0 / a[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SQRT(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SQRT(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX
Elementary Math Functions
Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a".
FOR j := 0 to 3
i := j*64
IF a[i+63]
dst[j] := 1
ELSE
dst[j] := 0
FI
ENDFOR
dst[MAX:4] := 0
AVX
Miscellaneous
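The mask-extraction pseudocode above collects one sign bit per element. A scalar sketch for the four-element double-precision case follows; it uses the standard `struct` module to read the IEEE 754 bit pattern, and the function name is a label chosen here.

```python
import struct

def movemask_pd(values):
    """Build a 4-bit mask from the sign bits of four doubles."""
    mask = 0
    for j, v in enumerate(values):
        # Reinterpret the double as a 64-bit unsigned integer.
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        if bits >> 63:          # a[i+63] in the pseudocode
            mask |= 1 << j      # dst[j] := 1
    return mask

print(bin(movemask_pd([1.0, -2.0, 0.0, -0.0])))
```

Note that negative zero sets its bit, because the test is on the sign bit and not on a numeric comparison with zero.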
Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".
FOR j := 0 to 7
i := j*32
IF a[i+31]
dst[j] := 1
ELSE
dst[j] := 0
FI
ENDFOR
dst[MAX:8] := 0
AVX
Miscellaneous
Return vector of type __m256d with all elements set to zero.
dst[MAX:0] := 0
AVX
Set
Return vector of type __m256 with all elements set to zero.
dst[MAX:0] := 0
AVX
Set
Return vector of type __m256i with all elements set to zero.
dst[MAX:0] := 0
AVX
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.
dst[63:0] := e0
dst[127:64] := e1
dst[191:128] := e2
dst[255:192] := e3
dst[MAX:256] := 0
AVX
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.
dst[31:0] := e0
dst[63:32] := e1
dst[95:64] := e2
dst[127:96] := e3
dst[159:128] := e4
dst[191:160] := e5
dst[223:192] := e6
dst[255:224] := e7
dst[MAX:256] := 0
AVX
Set
Set packed 8-bit integers in "dst" with the supplied values.
dst[7:0] := e0
dst[15:8] := e1
dst[23:16] := e2
dst[31:24] := e3
dst[39:32] := e4
dst[47:40] := e5
dst[55:48] := e6
dst[63:56] := e7
dst[71:64] := e8
dst[79:72] := e9
dst[87:80] := e10
dst[95:88] := e11
dst[103:96] := e12
dst[111:104] := e13
dst[119:112] := e14
dst[127:120] := e15
dst[135:128] := e16
dst[143:136] := e17
dst[151:144] := e18
dst[159:152] := e19
dst[167:160] := e20
dst[175:168] := e21
dst[183:176] := e22
dst[191:184] := e23
dst[199:192] := e24
dst[207:200] := e25
dst[215:208] := e26
dst[223:216] := e27
dst[231:224] := e28
dst[239:232] := e29
dst[247:240] := e30
dst[255:248] := e31
dst[MAX:256] := 0
AVX
Set
Set packed 16-bit integers in "dst" with the supplied values.
dst[15:0] := e0
dst[31:16] := e1
dst[47:32] := e2
dst[63:48] := e3
dst[79:64] := e4
dst[95:80] := e5
dst[111:96] := e6
dst[127:112] := e7
dst[143:128] := e8
dst[159:144] := e9
dst[175:160] := e10
dst[191:176] := e11
dst[207:192] := e12
dst[223:208] := e13
dst[239:224] := e14
dst[255:240] := e15
dst[MAX:256] := 0
AVX
Set
Set packed 32-bit integers in "dst" with the supplied values.
dst[31:0] := e0
dst[63:32] := e1
dst[95:64] := e2
dst[127:96] := e3
dst[159:128] := e4
dst[191:160] := e5
dst[223:192] := e6
dst[255:224] := e7
dst[MAX:256] := 0
AVX
Set
Set packed 64-bit integers in "dst" with the supplied values.
dst[63:0] := e0
dst[127:64] := e1
dst[191:128] := e2
dst[255:192] := e3
dst[MAX:256] := 0
AVX
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst[63:0] := e3
dst[127:64] := e2
dst[191:128] := e1
dst[255:192] := e0
dst[MAX:256] := 0
AVX
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst[31:0] := e7
dst[63:32] := e6
dst[95:64] := e5
dst[127:96] := e4
dst[159:128] := e3
dst[191:160] := e2
dst[223:192] := e1
dst[255:224] := e0
dst[MAX:256] := 0
AVX
Set
Set packed 8-bit integers in "dst" with the supplied values in reverse order.
dst[7:0] := e31
dst[15:8] := e30
dst[23:16] := e29
dst[31:24] := e28
dst[39:32] := e27
dst[47:40] := e26
dst[55:48] := e25
dst[63:56] := e24
dst[71:64] := e23
dst[79:72] := e22
dst[87:80] := e21
dst[95:88] := e20
dst[103:96] := e19
dst[111:104] := e18
dst[119:112] := e17
dst[127:120] := e16
dst[135:128] := e15
dst[143:136] := e14
dst[151:144] := e13
dst[159:152] := e12
dst[167:160] := e11
dst[175:168] := e10
dst[183:176] := e9
dst[191:184] := e8
dst[199:192] := e7
dst[207:200] := e6
dst[215:208] := e5
dst[223:216] := e4
dst[231:224] := e3
dst[239:232] := e2
dst[247:240] := e1
dst[255:248] := e0
dst[MAX:256] := 0
AVX
Set
Set packed 16-bit integers in "dst" with the supplied values in reverse order.
dst[15:0] := e15
dst[31:16] := e14
dst[47:32] := e13
dst[63:48] := e12
dst[79:64] := e11
dst[95:80] := e10
dst[111:96] := e9
dst[127:112] := e8
dst[143:128] := e7
dst[159:144] := e6
dst[175:160] := e5
dst[191:176] := e4
dst[207:192] := e3
dst[223:208] := e2
dst[239:224] := e1
dst[255:240] := e0
dst[MAX:256] := 0
AVX
Set
Set packed 32-bit integers in "dst" with the supplied values in reverse order.
dst[31:0] := e7
dst[63:32] := e6
dst[95:64] := e5
dst[127:96] := e4
dst[159:128] := e3
dst[191:160] := e2
dst[223:192] := e1
dst[255:224] := e0
dst[MAX:256] := 0
AVX
Set
Set packed 64-bit integers in "dst" with the supplied values in reverse order.
dst[63:0] := e3
dst[127:64] := e2
dst[191:128] := e1
dst[255:192] := e0
dst[MAX:256] := 0
AVX
Set
Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:256] := 0
AVX
Set
Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:256] := 0
AVX
Set
Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastb" instruction.
FOR j := 0 to 31
i := j*8
dst[i+7:i] := a[7:0]
ENDFOR
dst[MAX:256] := 0
AVX
Set
Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastw" instruction.
FOR j := 0 to 15
i := j*16
dst[i+15:i] := a[15:0]
ENDFOR
dst[MAX:256] := 0
AVX
Set
Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastd" instruction.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:256] := 0
AVX
Set
Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastq" instruction.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:256] := 0
AVX
Set
Set packed __m256 vector "dst" with the supplied values.
dst[127:0] := lo[127:0]
dst[255:128] := hi[127:0]
dst[MAX:256] := 0
AVX
Set
Set packed __m256d vector "dst" with the supplied values.
dst[127:0] := lo[127:0]
dst[255:128] := hi[127:0]
dst[MAX:256] := 0
AVX
Set
Set packed __m256i vector "dst" with the supplied values.
dst[127:0] := lo[127:0]
dst[255:128] := hi[127:0]
dst[MAX:256] := 0
AVX
Set
Set packed __m256 vector "dst" with the supplied values.
dst[127:0] := lo[127:0]
dst[255:128] := hi[127:0]
dst[MAX:256] := 0
AVX
Set
Set packed __m256d vector "dst" with the supplied values.
dst[127:0] := lo[127:0]
dst[255:128] := hi[127:0]
dst[MAX:256] := 0
AVX
Set
Set packed __m256i vector "dst" with the supplied values.
dst[127:0] := lo[127:0]
dst[255:128] := hi[127:0]
dst[MAX:256] := 0
AVX
Set
Cast vector of type __m256d to type __m256.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256 to type __m256d.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256 to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256i to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX
Cast
Extract an 8-bit integer from "a", selected with "index", and store the result in "dst".
dst[7:0] := (a[255:0] >> (index[4:0] * 8))[7:0]
AVX2
Swizzle
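The shift-and-slice form of the 8-bit extraction above can be checked with plain integer arithmetic. In this sketch the 256-bit vector is modelled as one Python integer; the function name is a label chosen here.

```python
def extract_epi8(a_bits, index):
    """Return byte number index[4:0] of a 256-bit value."""
    shift = (index & 0x1F) * 8      # only the low 5 bits of index are used
    return (a_bits >> shift) & 0xFF

# Byte k of the vector holds the value k (little-endian layout).
a_bits = int.from_bytes(bytes(range(32)), "little")
print(extract_epi8(a_bits, 5))
print(extract_epi8(a_bits, 37))   # 37 & 0x1F == 5, so the same byte is selected
```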
Extract a 16-bit integer from "a", selected with "index", and store the result in "dst".
dst[15:0] := (a[255:0] >> (index[3:0] * 16))[15:0]
AVX2
Swizzle
Blend packed 16-bit integers from "a" and "b" within 128-bit lanes using control mask "imm8", and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF imm8[j%8]
dst[i+15:i] := b[i+15:i]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
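In the 16-bit blend above, the index `imm8[j%8]` means the same 8-bit control is applied to both 128-bit lanes. A sketch with vectors as lists of sixteen elements; the function name is a label chosen here.

```python
def blend_epi16(a, b, imm8):
    """Pick b[j] where control bit (j mod 8) is set, else a[j]."""
    return [b[j] if (imm8 >> (j % 8)) & 1 else a[j] for j in range(16)]

a = list(range(16))
b = [100 + x for x in a]
# Bits 0 and 2 are set, so elements 0, 2 and their upper-lane twins 8, 10 come from b.
print(blend_epi16(a, b, 0b00000101))
```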
Blend packed 32-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF imm8[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Swizzle
Blend packed 32-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF imm8[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst".
FOR j := 0 to 31
i := j*8
IF mask[i+7]
dst[i+7:i] := b[i+7:i]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast the low packed 8-bit integer from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := a[7:0]
ENDFOR
dst[MAX:128] := 0
AVX2
Swizzle
Broadcast the low packed 8-bit integer from "a" to all elements of "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := a[7:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast the low packed 32-bit integer from "a" to all elements of "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:128] := 0
AVX2
Swizzle
Broadcast the low packed 32-bit integer from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast the low packed 64-bit integer from "a" to all elements of "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:128] := 0
AVX2
Swizzle
Broadcast the low packed 64-bit integer from "a" to all elements of "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:128] := 0
AVX2
Swizzle
Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst".
dst[127:0] := a[127:0]
dst[255:128] := a[127:0]
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst".
dst[127:0] := a[127:0]
dst[255:128] := a[127:0]
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:128] := 0
AVX2
Swizzle
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Broadcast the low packed 16-bit integer from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := a[15:0]
ENDFOR
dst[MAX:128] := 0
AVX2
Swizzle
Broadcast the low packed 16-bit integer from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := a[15:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Extract 128 bits (composed of integer data) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX2
Swizzle
Copy "a" to "dst", then insert 128 bits (composed of integer data) from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE (imm8[0]) OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX2
Swizzle
Shuffle 128-bits (composed of integer data) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src1, src2, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src1[127:0]
1: tmp[127:0] := src1[255:128]
2: tmp[127:0] := src2[127:0]
3: tmp[127:0] := src2[255:128]
ESAC
IF control[3]
tmp[127:0] := 0
FI
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0])
dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4])
dst[MAX:256] := 0
AVX2
Swizzle
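The two-source 128-bit shuffle above uses one 4-bit control per output half: bits 1:0 select a source half and bit 3 forces zero. A sketch in which each vector is a `(low, high)` pair and any Python value stands in for a 128-bit half; the function name is a label chosen here.

```python
def permute2x128(a, b, imm8):
    """a and b are (low_half, high_half) pairs; returns the (low, high) result."""
    def select4(control):
        if control & 0b1000:            # control[3] zeroes the half
            return 0
        return (a[0], a[1], b[0], b[1])[control & 0b11]
    return (select4(imm8 & 0xF), select4((imm8 >> 4) & 0xF))

a = ("a_lo", "a_hi")
b = ("b_lo", "b_hi")
print(permute2x128(a, b, 0x21))   # low half from a's high, high half from b's low
print(permute2x128(a, b, 0x08))   # low half zeroed, high half from a's low
```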
Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
dst[63:0] := SELECT4(a[255:0], imm8[1:0])
dst[127:64] := SELECT4(a[255:0], imm8[3:2])
dst[191:128] := SELECT4(a[255:0], imm8[5:4])
dst[255:192] := SELECT4(a[255:0], imm8[7:6])
dst[MAX:256] := 0
AVX2
Swizzle
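The cross-lane 64-bit shuffle above reads two control bits per destination element. A sketch with the vector as a list of four elements; the function name is a label chosen here.

```python
def permute4x64(a, imm8):
    """Destination element j takes the source element named by imm8[2j+1:2j]."""
    return [a[(imm8 >> (2 * j)) & 0b11] for j in range(4)]

a = [10, 11, 12, 13]
print(permute4x64(a, 0b00011011))  # controls 3, 2, 1, 0: reverses the elements
print(permute4x64(a, 0x00))        # every control is 0: broadcasts element 0
```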
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
dst[63:0] := SELECT4(a[255:0], imm8[1:0])
dst[127:64] := SELECT4(a[255:0], imm8[3:2])
dst[191:128] := SELECT4(a[255:0], imm8[5:4])
dst[255:192] := SELECT4(a[255:0], imm8[7:6])
dst[MAX:256] := 0
AVX2
Swizzle
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
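In the variable 32-bit shuffle above, `idx[i+2:i]` means only the low three bits of each index element are used. A sketch with both vectors as lists of eight integers; the function name is a label chosen here.

```python
def permutevar8x32(a, idx):
    """Destination element j takes a[idx[j] mod 8]."""
    return [a[idx[j] & 0b111] for j in range(8)]

a = [10, 11, 12, 13, 14, 15, 16, 17]
print(permutevar8x32(a, [7, 6, 5, 4, 3, 2, 1, 0]))
# Index bits above bit 2 are ignored: 8 selects element 0, 15 selects element 7.
print(permutevar8x32(a, [8, 9, 0, 0, 0, 0, 0, 15]))
```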
Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
dst[159:128] := SELECT4(a[255:128], imm8[1:0])
dst[191:160] := SELECT4(a[255:128], imm8[3:2])
dst[223:192] := SELECT4(a[255:128], imm8[5:4])
dst[255:224] := SELECT4(a[255:128], imm8[7:6])
dst[MAX:256] := 0
AVX2
Swizzle
Shuffle 8-bit integers in "a" within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
FOR j := 0 to 15
i := j*8
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[3:0] := b[i+3:i]
dst[i+7:i] := a[index*8+7:index*8]
FI
IF b[128+i+7] == 1
dst[128+i+7:128+i] := 0
ELSE
index[3:0] := b[128+i+3:128+i]
dst[128+i+7:128+i] := a[128+index*8+7:128+index*8]
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Swizzle
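The byte shuffle above never crosses a 128-bit lane: each control byte indexes only into its own 16-byte lane, and a set high bit zeroes the output byte. A sketch with both vectors as lists of 32 byte values; the function name is a label chosen here.

```python
def shuffle_epi8(a, b):
    """Per-lane byte shuffle of a, controlled by the bytes of b."""
    dst = []
    for j in range(32):
        lane_base = (j // 16) * 16          # 0 for the low lane, 16 for the high lane
        if b[j] & 0x80:                     # high bit of the control byte set
            dst.append(0)
        else:
            dst.append(a[lane_base + (b[j] & 0x0F)])
    return dst

a = list(range(100, 132))
reverse_in_lane = [15 - k for k in range(16)] * 2
print(shuffle_epi8(a, reverse_in_lane))
```

Because the same control value 0 selects byte 100 in the low lane and byte 116 in the high lane, one control pattern repeated in both lanes yields a lane-local permutation.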
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst".
dst[63:0] := a[63:0]
dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
dst[191:128] := a[191:128]
dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
dst[MAX:256] := 0
AVX2
Swizzle
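The high-half 16-bit shuffle above is easier to follow with element indices than with shifts. In this sketch the vector is a list of sixteen 16-bit elements, so elements 4 to 7 and 12 to 15 form the high 64 bits of the two lanes; the function name is a label chosen here.

```python
def shufflehi_epi16(a, imm8):
    """Shuffle the upper four words of each 128-bit lane; copy the lower four."""
    dst = list(a)                            # low 64 bits of each lane are copied
    for lane in (0, 8):                      # first word of each 128-bit lane
        for k in range(4):
            sel = (imm8 >> (2 * k)) & 0b11   # two control bits per destination word
            dst[lane + 4 + k] = a[lane + 4 + sel]
    return dst

print(shufflehi_epi16(list(range(16)), 0b00011011))  # reverses words 4..7 and 12..15
```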
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst".
dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
dst[127:64] := a[127:64]
dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
dst[255:192] := a[255:192]
dst[MAX:256] := 0
AVX2
Swizzle
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
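The INTERLEAVE_HIGH_BYTES helper above, applied once per 128-bit lane, can be written as a short loop. In this sketch the vectors are lists of 32 byte values; the function name is a label chosen here.

```python
def unpackhi_epi8(a, b):
    """Interleave bytes 8..15 of each 16-byte lane of a and b."""
    dst = []
    for lane in (0, 16):                 # first byte of each 128-bit lane
        for k in range(8, 16):           # high half of the lane
            dst += [a[lane + k], b[lane + k]]
    return dst

a = list(range(32))
b = [100 + x for x in a]
result = unpackhi_epi8(a, b)
print(result[:4], result[16:20])
```

The output stays lane-local: the low 16 bytes of the result come only from bytes 8 to 15 of the inputs, and the high 16 bytes only from bytes 24 to 31.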
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
AVX2
Swizzle
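The unpack records above all follow one pattern: within each 128-bit lane, take either the low or the high half of "a" and "b" and alternate their elements. A minimal Python sketch of that pattern, assuming a vector is modelled as a list of element values with the lowest element first (the name `unpack` and its parameters are illustrative, not part of any library):

```python
def unpack(a, b, elems_per_lane, high):
    """Interleave elements from the low or high half of each 128-bit lane.

    a and b are equal-length lists of element values, lowest element first.
    elems_per_lane is 16, 8, 4 or 2 for 8-, 16-, 32- or 64-bit elements.
    """
    half = elems_per_lane // 2
    dst = []
    for lane_start in range(0, len(a), elems_per_lane):
        first = lane_start + (half if high else 0)
        for k in range(first, first + half):
            dst.append(a[k])  # even result slots come from a
            dst.append(b[k])  # odd result slots come from b
    return dst


a = list(range(0, 8))      # eight 32-bit elements, i.e. two lanes of four
b = list(range(100, 108))
lo_result = unpack(a, b, 4, high=False)
hi_result = unpack(a, b, 4, high=True)
```

Note that the two lanes are processed independently, so elements never cross the 128-bit boundary.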
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := ABS(a[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ABS(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ABS(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Special Math Functions
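The signed and unsigned MAX/MIN records above share identical pseudocode; the difference lies only in how each bit pattern is interpreted. The following Python sketch illustrates this for 8-bit elements, assuming inputs are given as raw bit patterns in the range 0..255 (the helper names are hypothetical):

```python
def to_signed(x, bits):
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def max_signed(a, b, bits):
    """Element-wise maximum, comparing the patterns as signed integers."""
    return [x if to_signed(x, bits) >= to_signed(y, bits) else y
            for x, y in zip(a, b)]


def max_unsigned(a, b):
    """Element-wise maximum, comparing the patterns as unsigned integers."""
    return [max(x, y) for x, y in zip(a, b)]


a = [0xFF, 0x7F]
b = [0x01, 0x80]
signed_result = max_signed(a, b, 8)    # 0xFF is -1, 0x80 is -128
unsigned_result = max_unsigned(a, b)   # 0xFF is 255, 0x80 is 128
```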
Add packed 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
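To contrast the wrapping and saturating additions described above, here is a small Python sketch for 8-bit elements. It assumes signed inputs are given as values in -128..127 and unsigned inputs as 0..255; the function names are illustrative only.

```python
def saturate(x, lo, hi):
    """Clamp x into the closed range [lo, hi]."""
    return max(lo, min(hi, x))


def add_wrap_u8(a, b):
    """Plain add: the carry out of bit 7 is discarded."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def adds_signed8(a, b):
    """Signed saturating add (Saturate8 in the pseudocode)."""
    return [saturate(x + y, -128, 127) for x, y in zip(a, b)]


def adds_unsigned8(a, b):
    """Unsigned saturating add (SaturateU8 in the pseudocode)."""
    return [saturate(x + y, 0, 255) for x, y in zip(a, b)]
```

For example, 200 + 100 wraps to 44 with the plain add but clamps to 255 with the unsigned saturating add.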
Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
dst[15:0] := a[31:16] + a[15:0]
dst[31:16] := a[63:48] + a[47:32]
dst[47:32] := a[95:80] + a[79:64]
dst[63:48] := a[127:112] + a[111:96]
dst[79:64] := b[31:16] + b[15:0]
dst[95:80] := b[63:48] + b[47:32]
dst[111:96] := b[95:80] + b[79:64]
dst[127:112] := b[127:112] + b[111:96]
dst[143:128] := a[159:144] + a[143:128]
dst[159:144] := a[191:176] + a[175:160]
dst[175:160] := a[223:208] + a[207:192]
dst[191:176] := a[255:240] + a[239:224]
dst[207:192] := b[159:144] + b[143:128]
dst[223:208] := b[191:176] + b[175:160]
dst[239:224] := b[223:208] + b[207:192]
dst[255:240] := b[255:240] + b[239:224]
dst[MAX:256] := 0
AVX2
Arithmetic
Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
dst[31:0] := a[63:32] + a[31:0]
dst[63:32] := a[127:96] + a[95:64]
dst[95:64] := b[63:32] + b[31:0]
dst[127:96] := b[127:96] + b[95:64]
dst[159:128] := a[191:160] + a[159:128]
dst[191:160] := a[255:224] + a[223:192]
dst[223:192] := b[191:160] + b[159:128]
dst[255:224] := b[255:224] + b[223:192]
dst[MAX:256] := 0
AVX2
Arithmetic
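The ordering of results in the horizontal adds is easy to misread: within each 128-bit lane the sums from "a" come first, followed by the sums from "b". A Python sketch of the 32-bit variant, assuming each input is a list of eight signed 32-bit values (the names `hadd_epi32_model` and `wrap32` are hypothetical):

```python
def wrap32(x):
    """Wrap an integer to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def hadd_epi32_model(a, b):
    """Pairwise sums per 128-bit lane: a's pairs first, then b's pairs."""
    dst = []
    for lane in (0, 4):          # element index where each lane starts
        for src in (a, b):
            dst.append(wrap32(src[lane] + src[lane + 1]))
            dst.append(wrap32(src[lane + 2] + src[lane + 3]))
    return dst


result = hadd_epi32_model([1, 2, 3, 4, 5, 6, 7, 8],
                          [10, 20, 30, 40, 50, 60, 70, 80])
```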
Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
dst[15:0] := Saturate16(a[31:16] + a[15:0])
dst[31:16] := Saturate16(a[63:48] + a[47:32])
dst[47:32] := Saturate16(a[95:80] + a[79:64])
dst[63:48] := Saturate16(a[127:112] + a[111:96])
dst[79:64] := Saturate16(b[31:16] + b[15:0])
dst[95:80] := Saturate16(b[63:48] + b[47:32])
dst[111:96] := Saturate16(b[95:80] + b[79:64])
dst[127:112] := Saturate16(b[127:112] + b[111:96])
dst[143:128] := Saturate16(a[159:144] + a[143:128])
dst[159:144] := Saturate16(a[191:176] + a[175:160])
dst[175:160] := Saturate16(a[223:208] + a[207:192])
dst[191:176] := Saturate16(a[255:240] + a[239:224])
dst[207:192] := Saturate16(b[159:144] + b[143:128])
dst[223:208] := Saturate16(b[191:176] + b[175:160])
dst[239:224] := Saturate16(b[223:208] + b[207:192])
dst[255:240] := Saturate16(b[255:240] + b[239:224])
dst[MAX:256] := 0
AVX2
Arithmetic
Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
dst[15:0] := a[15:0] - a[31:16]
dst[31:16] := a[47:32] - a[63:48]
dst[47:32] := a[79:64] - a[95:80]
dst[63:48] := a[111:96] - a[127:112]
dst[79:64] := b[15:0] - b[31:16]
dst[95:80] := b[47:32] - b[63:48]
dst[111:96] := b[79:64] - b[95:80]
dst[127:112] := b[111:96] - b[127:112]
dst[143:128] := a[143:128] - a[159:144]
dst[159:144] := a[175:160] - a[191:176]
dst[175:160] := a[207:192] - a[223:208]
dst[191:176] := a[239:224] - a[255:240]
dst[207:192] := b[143:128] - b[159:144]
dst[223:208] := b[175:160] - b[191:176]
dst[239:224] := b[207:192] - b[223:208]
dst[255:240] := b[239:224] - b[255:240]
dst[MAX:256] := 0
AVX2
Arithmetic
Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
dst[31:0] := a[31:0] - a[63:32]
dst[63:32] := a[95:64] - a[127:96]
dst[95:64] := b[31:0] - b[63:32]
dst[127:96] := b[95:64] - b[127:96]
dst[159:128] := a[159:128] - a[191:160]
dst[191:160] := a[223:192] - a[255:224]
dst[223:192] := b[159:128] - b[191:160]
dst[255:224] := b[223:192] - b[255:224]
dst[MAX:256] := 0
AVX2
Arithmetic
Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
dst[15:0] := Saturate16(a[15:0] - a[31:16])
dst[31:16] := Saturate16(a[47:32] - a[63:48])
dst[47:32] := Saturate16(a[79:64] - a[95:80])
dst[63:48] := Saturate16(a[111:96] - a[127:112])
dst[79:64] := Saturate16(b[15:0] - b[31:16])
dst[95:80] := Saturate16(b[47:32] - b[63:48])
dst[111:96] := Saturate16(b[79:64] - b[95:80])
dst[127:112] := Saturate16(b[111:96] - b[127:112])
dst[143:128] := Saturate16(a[143:128] - a[159:144])
dst[159:144] := Saturate16(a[175:160] - a[191:176])
dst[175:160] := Saturate16(a[207:192] - a[223:208])
dst[191:176] := Saturate16(a[239:224] - a[255:240])
dst[207:192] := Saturate16(b[143:128] - b[159:144])
dst[223:208] := Saturate16(b[175:160] - b[191:176])
dst[239:224] := Saturate16(b[207:192] - b[223:208])
dst[255:240] := Saturate16(b[239:224] - b[255:240])
dst[MAX:256] := 0
AVX2
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
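As an illustration of the multiply-and-add operation above, the sketch below models it on Python lists of signed 16-bit values. The function names are assumptions made for this example. The final wrap to 32 bits only matters when all four inputs of a pair are -32768.

```python
def wrap32(x):
    """Wrap an integer to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def madd_epi16_model(a, b):
    """Multiply 16-bit pairs and add adjacent 32-bit products."""
    dst = []
    for j in range(0, len(a), 2):
        total = a[j] * b[j] + a[j + 1] * b[j + 1]
        dst.append(wrap32(total))
    return dst


dot_pairs = madd_epi16_model([1, 2, 3, 4], [5, 6, 7, 8])
```

Each output element is a two-term dot product, which is why this operation is a common building block for integer dot products and convolutions.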
Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
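The mixed-sign multiply-add above treats "a" as unsigned bytes and "b" as signed bytes, then saturates the 16-bit sum. A minimal Python sketch under those assumptions (inputs in 0..255 for `a` and -128..127 for `b`; the function name is illustrative):

```python
def maddubs_epi16_model(a, b):
    """Unsigned-by-signed byte products, adjacent pairs summed and saturated."""
    dst = []
    for j in range(0, len(a), 2):
        total = a[j] * b[j] + a[j + 1] * b[j + 1]
        dst.append(max(-32768, min(32767, total)))  # Saturate16
    return dst
```

Saturation is reachable in both directions: two products of 255 * 127 exceed 32767, and two products of 255 * -128 fall below -32768.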
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 15
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 15
i := j*16
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
FOR j := 0 to 15
i := j*16
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
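The rounding multiply above is often used for Q15 fixed-point arithmetic. The sketch below follows the pseudocode literally, assuming signed 16-bit inputs and relying on Python's arithmetic right shift for negative integers (the function name is hypothetical):

```python
def mulhrs_epi16_model(a, b):
    """Rounded high-half multiply: store bits [16:1] of ((a*b) >> 14) + 1."""
    dst = []
    for x, y in zip(a, b):
        tmp = ((x * y) >> 14) + 1
        bits = (tmp >> 1) & 0xFFFF                         # tmp[16:1]
        dst.append(bits - 0x10000 if bits & 0x8000 else bits)
    return dst


# 16384 is 0.5 in Q15, so 0.5 * 0.5 gives 8192, which is 0.25 in Q15.
quarter = mulhrs_epi16_model([16384], [16384])
```

One edge case worth knowing: -32768 * -32768 yields -32768 rather than the mathematically expected +1.0, because +1.0 is not representable in Q15.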
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
FOR j := 0 to 15
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Multiply the packed signed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst".
FOR j := 0 to 7
i := j*32
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".
FOR j := 0 to 31
i := j*8
tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
ENDFOR
FOR j := 0 to 3
i := j*64
dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + \
tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56]
dst[i+63:i+16] := 0
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
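The sum-of-absolute-differences operation above reduces each group of eight bytes to one value. A short Python sketch, assuming both inputs are lists of 32 unsigned bytes and that each result stands for one 64-bit element of "dst" (the function name is illustrative):

```python
def sad_epu8_model(a, b):
    """Sum |a - b| over each consecutive group of eight unsigned bytes."""
    return [
        sum(abs(x - y) for x, y in zip(a[g:g + 8], b[g:g + 8]))
        for g in range(0, len(a), 8)
    ]


a = [10] * 32
b = [7] * 8 + [10] * 8 + [0] * 8 + [255] * 8
sums = sad_epu8_model(a, b)
```

The largest possible sum is 8 * 255 = 2040, which fits in the low 16 bits of each 64-bit element as the description states.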
Negate packed signed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 31
i := j*8
IF b[i+7:i] < 0
dst[i+7:i] := -(a[i+7:i])
ELSE IF b[i+7:i] == 0
dst[i+7:i] := 0
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Negate packed signed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 15
i := j*16
IF b[i+15:i] < 0
dst[i+15:i] := -(a[i+15:i])
ELSE IF b[i+15:i] == 0
dst[i+15:i] := 0
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Negate packed signed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 7
i := j*32
IF b[i+31:i] < 0
dst[i+31:i] := -(a[i+31:i])
ELSE IF b[i+31:i] == 0
dst[i+31:i] := 0
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
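The three sign operations above apply the sign of "b" to "a". The following Python sketch models the 8-bit case, assuming signed inputs in -128..127; the helper names are hypothetical.

```python
def wrap8(x):
    """Wrap an integer to a signed 8-bit two's-complement value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def sign_epi8_model(a, b):
    """Negate, zero, or pass through each element of a based on b's sign."""
    dst = []
    for x, y in zip(a, b):
        if y < 0:
            dst.append(wrap8(-x))
        elif y == 0:
            dst.append(0)
        else:
            dst.append(x)
    return dst
```

Negating -128 wraps back to -128, since +128 does not fit in a signed byte.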
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Arithmetic
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst".
FOR j := 0 to 1
i := j*128
tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
dst[i+127:i] := tmp[127:0]
ENDFOR
dst[MAX:256] := 0
AVX2
Miscellaneous
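The byte-wise concatenate-and-shift above works independently on each 128-bit lane, with "a" forming the high half and "b" the low half of the temporary. A Python sketch, assuming each input is a list of 32 bytes with the lowest byte first (the function name is illustrative):

```python
def alignr_epi8_model(a, b, imm8):
    """Per lane: shift the 32-byte value (a:b) right by imm8 bytes, keep 16."""
    dst = []
    for lane in (0, 16):
        concat = b[lane:lane + 16] + a[lane:lane + 16]  # b low, a high
        shifted = concat[imm8:] + [0] * imm8            # zeros shift in
        dst.extend(shifted[:16])
    return dst


a = list(range(100, 132))
b = list(range(0, 32))
result = alignr_epi8_model(a, b, 4)
```

With a shift of 4, each lane keeps the top 12 bytes of its half of "b" followed by the bottom 4 bytes of its half of "a". Shifts of 32 or more produce all zeros.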
Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".
FOR j := 0 to 31
i := j*8
dst[j] := a[i+7]
ENDFOR
AVX2
Miscellaneous
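The mask creation above collects one bit per byte into a 32-bit integer. A minimal Python sketch, assuming the input is a list of 32 unsigned bytes with the lowest byte first (the function name is hypothetical):

```python
def movemask_epi8_model(a):
    """Pack the most significant bit of each byte into bit j of the result."""
    mask = 0
    for j, byte in enumerate(a):
        mask |= ((byte >> 7) & 1) << j
    return mask


mask = movemask_epi8_model([0x80] + [0x7F] * 30 + [0xFF])
```

A typical use is to follow a byte-wise comparison with this operation and then test or scan the resulting integer with ordinary scalar code.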
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
Eight SADs are performed for each 128-bit lane using one quadruplet from "b" and eight quadruplets from "a". One quadruplet is selected from "b" starting at the offset specified in "imm8". Eight quadruplets are formed from sequential 8-bit integers selected from "a" starting at the offset specified in "imm8".
DEFINE MPSADBW(a[127:0], b[127:0], imm8[2:0]) {
a_offset := imm8[2]*32
b_offset := imm8[1:0]*32
FOR j := 0 to 7
i := j*8
k := a_offset+i
l := b_offset
tmp[i*2+15:i*2] := ABS(Signed(a[k+7:k] - b[l+7:l])) + ABS(Signed(a[k+15:k+8] - b[l+15:l+8])) + \
ABS(Signed(a[k+23:k+16] - b[l+23:l+16])) + ABS(Signed(a[k+31:k+24] - b[l+31:l+24]))
ENDFOR
RETURN tmp[127:0]
}
dst[127:0] := MPSADBW(a[127:0], b[127:0], imm8[2:0])
dst[255:128] := MPSADBW(a[255:128], b[255:128], imm8[5:3])
dst[MAX:256] := 0
AVX2
Miscellaneous
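The offset handling in the MPSADBW pseudocode is easier to follow in byte units. The sketch below is one reading of it, assuming each input is a list of 32 unsigned bytes; the bit offsets of 32 in the pseudocode become byte offsets of 4 here, and the function names are illustrative.

```python
def mpsadbw_lane(a, b, ctrl):
    """One 128-bit lane: a and b hold 16 bytes, ctrl is a 3-bit control."""
    a_off = ((ctrl >> 2) & 1) * 4   # bit 2 selects byte 0 or byte 4 of a
    b_off = (ctrl & 3) * 4          # bits 1:0 select one quadruplet of b
    return [
        sum(abs(a[a_off + j + t] - b[b_off + t]) for t in range(4))
        for j in range(8)
    ]


def mpsadbw_epu8_model(a, b, imm8):
    """Low lane uses imm8[2:0], high lane uses imm8[5:3]."""
    return (mpsadbw_lane(a[:16], b[:16], imm8 & 7)
            + mpsadbw_lane(a[16:], b[16:], (imm8 >> 3) & 7))


a = list(range(32))
b = [0] * 32
sads = mpsadbw_epu8_model(a, b, 0)
```

Each of the eight results per lane slides the window over "a" by one byte while the quadruplet from "b" stays fixed.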
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
dst[7:0] := Saturate8(a[15:0])
dst[15:8] := Saturate8(a[31:16])
dst[23:16] := Saturate8(a[47:32])
dst[31:24] := Saturate8(a[63:48])
dst[39:32] := Saturate8(a[79:64])
dst[47:40] := Saturate8(a[95:80])
dst[55:48] := Saturate8(a[111:96])
dst[63:56] := Saturate8(a[127:112])
dst[71:64] := Saturate8(b[15:0])
dst[79:72] := Saturate8(b[31:16])
dst[87:80] := Saturate8(b[47:32])
dst[95:88] := Saturate8(b[63:48])
dst[103:96] := Saturate8(b[79:64])
dst[111:104] := Saturate8(b[95:80])
dst[119:112] := Saturate8(b[111:96])
dst[127:120] := Saturate8(b[127:112])
dst[135:128] := Saturate8(a[143:128])
dst[143:136] := Saturate8(a[159:144])
dst[151:144] := Saturate8(a[175:160])
dst[159:152] := Saturate8(a[191:176])
dst[167:160] := Saturate8(a[207:192])
dst[175:168] := Saturate8(a[223:208])
dst[183:176] := Saturate8(a[239:224])
dst[191:184] := Saturate8(a[255:240])
dst[199:192] := Saturate8(b[143:128])
dst[207:200] := Saturate8(b[159:144])
dst[215:208] := Saturate8(b[175:160])
dst[223:216] := Saturate8(b[191:176])
dst[231:224] := Saturate8(b[207:192])
dst[239:232] := Saturate8(b[223:208])
dst[247:240] := Saturate8(b[239:224])
dst[255:248] := Saturate8(b[255:240])
dst[MAX:256] := 0
AVX2
Miscellaneous
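The pack operations narrow two inputs into one result, and the per-lane ordering is the part most often gotten wrong: each 128-bit lane of "dst" holds that lane's eight values from "a" followed by that lane's eight values from "b". A Python sketch of the signed 16-to-8 case, assuming each input is a list of 16 signed 16-bit values (the function name is hypothetical):

```python
def packs_epi16_model(a, b):
    """Narrow 16-bit values to signed bytes with saturation, lane by lane."""
    def sat8(x):
        return max(-128, min(127, x))

    dst = []
    for lane in (0, 8):   # element index where each lane of a and b starts
        dst += [sat8(x) for x in a[lane:lane + 8]]
        dst += [sat8(x) for x in b[lane:lane + 8]]
    return dst


packed = packs_epi16_model([300] * 8 + [-300] * 8, [1] * 8 + [2] * 8)
```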
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
dst[15:0] := Saturate16(a[31:0])
dst[31:16] := Saturate16(a[63:32])
dst[47:32] := Saturate16(a[95:64])
dst[63:48] := Saturate16(a[127:96])
dst[79:64] := Saturate16(b[31:0])
dst[95:80] := Saturate16(b[63:32])
dst[111:96] := Saturate16(b[95:64])
dst[127:112] := Saturate16(b[127:96])
dst[143:128] := Saturate16(a[159:128])
dst[159:144] := Saturate16(a[191:160])
dst[175:160] := Saturate16(a[223:192])
dst[191:176] := Saturate16(a[255:224])
dst[207:192] := Saturate16(b[159:128])
dst[223:208] := Saturate16(b[191:160])
dst[239:224] := Saturate16(b[223:192])
dst[255:240] := Saturate16(b[255:224])
dst[MAX:256] := 0
AVX2
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
dst[7:0] := SaturateU8(a[15:0])
dst[15:8] := SaturateU8(a[31:16])
dst[23:16] := SaturateU8(a[47:32])
dst[31:24] := SaturateU8(a[63:48])
dst[39:32] := SaturateU8(a[79:64])
dst[47:40] := SaturateU8(a[95:80])
dst[55:48] := SaturateU8(a[111:96])
dst[63:56] := SaturateU8(a[127:112])
dst[71:64] := SaturateU8(b[15:0])
dst[79:72] := SaturateU8(b[31:16])
dst[87:80] := SaturateU8(b[47:32])
dst[95:88] := SaturateU8(b[63:48])
dst[103:96] := SaturateU8(b[79:64])
dst[111:104] := SaturateU8(b[95:80])
dst[119:112] := SaturateU8(b[111:96])
dst[127:120] := SaturateU8(b[127:112])
dst[135:128] := SaturateU8(a[143:128])
dst[143:136] := SaturateU8(a[159:144])
dst[151:144] := SaturateU8(a[175:160])
dst[159:152] := SaturateU8(a[191:176])
dst[167:160] := SaturateU8(a[207:192])
dst[175:168] := SaturateU8(a[223:208])
dst[183:176] := SaturateU8(a[239:224])
dst[191:184] := SaturateU8(a[255:240])
dst[199:192] := SaturateU8(b[143:128])
dst[207:200] := SaturateU8(b[159:144])
dst[215:208] := SaturateU8(b[175:160])
dst[223:216] := SaturateU8(b[191:176])
dst[231:224] := SaturateU8(b[207:192])
dst[239:232] := SaturateU8(b[223:208])
dst[247:240] := SaturateU8(b[239:224])
dst[255:248] := SaturateU8(b[255:240])
dst[MAX:256] := 0
AVX2
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst".
dst[15:0] := SaturateU16(a[31:0])
dst[31:16] := SaturateU16(a[63:32])
dst[47:32] := SaturateU16(a[95:64])
dst[63:48] := SaturateU16(a[127:96])
dst[79:64] := SaturateU16(b[31:0])
dst[95:80] := SaturateU16(b[63:32])
dst[111:96] := SaturateU16(b[95:64])
dst[127:112] := SaturateU16(b[127:96])
dst[143:128] := SaturateU16(a[159:128])
dst[159:144] := SaturateU16(a[191:160])
dst[175:160] := SaturateU16(a[223:192])
dst[191:176] := SaturateU16(a[255:224])
dst[207:192] := SaturateU16(b[159:128])
dst[223:208] := SaturateU16(b[191:160])
dst[239:224] := SaturateU16(b[223:192])
dst[255:240] := SaturateU16(b[255:224])
dst[MAX:256] := 0
AVX2
Miscellaneous
Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[255:0] := (a[255:0] AND b[255:0])
dst[MAX:256] := 0
AVX2
Logical
Compute the bitwise NOT of 256 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
dst[255:0] := ((NOT a[255:0]) AND b[255:0])
dst[MAX:256] := 0
AVX2
Logical
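The operand order of the AND-NOT operation above is a frequent source of bugs: it is the first operand, "a", that is inverted. A small Python sketch, assuming a 256-bit value is modelled as a non-negative Python integer (the names are illustrative):

```python
MASK256 = (1 << 256) - 1


def andnot_si256_model(a, b):
    """Compute (NOT a) AND b on 256-bit values."""
    return (~a & MASK256) & b


cleared = andnot_si256_model(0b1100, 0b1010)
```

In other words, the result keeps the bits of "b" wherever "a" has a zero bit.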
Compute the bitwise OR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[255:0] := (a[255:0] OR b[255:0])
dst[MAX:256] := 0
AVX2
Logical
Compute the bitwise XOR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[255:0] := (a[255:0] XOR b[255:0])
dst[MAX:256] := 0
AVX2
Logical
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ENDFOR
dst[MAX:256] := 0
AVX2
Probability/Statistics
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ENDFOR
dst[MAX:256] := 0
AVX2
Probability/Statistics
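The averaging operations above round halves upward because of the added 1, and the intermediate sum is wider than the element so it cannot overflow. A minimal Python sketch for unsigned bytes (the function name is hypothetical):

```python
def avg_epu8_model(a, b):
    """Rounded average of unsigned bytes: (a + b + 1) >> 1."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


averages = avg_epu8_model([1, 255, 0], [2, 255, 0])
```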
Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ( a[i+63:i] == b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 31
i := j*8
dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ( a[i+63:i] > b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
dst[MAX:256] := 0
AVX2
Compare
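The comparisons above do not return booleans; each element becomes all ones or all zeros, which makes the result usable as a bit mask. A Python sketch for 8-bit elements, assuming signed inputs in -128..127 and masks expressed as 0x00 or 0xFF (the function names are illustrative):

```python
def cmpgt_epi8_model(a, b):
    """Signed greater-than: 0xFF where a > b, otherwise 0."""
    return [0xFF if x > y else 0x00 for x, y in zip(a, b)]


def cmpeq_epi8_model(a, b):
    """Equality: 0xFF where a == b, otherwise 0."""
    return [0xFF if x == y else 0x00 for x, y in zip(a, b)]


gt_mask = cmpgt_epi8_model([1, -1, 5], [0, 0, 5])
eq_mask = cmpeq_epi8_model([1, -1, 5], [0, 0, 5])
```

There are no unsigned greater-than records in this group; the greater-than forms listed here are all signed.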
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 16*j
dst[i+31:i] := SignExtend32(a[k+15:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 16*j
dst[i+63:i] := SignExtend64(a[k+15:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 32*j
dst[i+63:i] := SignExtend64(a[k+31:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := j*8
l := j*16
dst[l+15:l] := SignExtend16(a[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 8*j
dst[i+31:i] := SignExtend32(a[k+7:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 8*j
dst[i+63:i] := SignExtend64(a[k+7:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 16*j
dst[i+31:i] := ZeroExtend32(a[k+15:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 16*j
dst[i+63:i] := ZeroExtend64(a[k+15:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 32*j
dst[i+63:i] := ZeroExtend64(a[k+31:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := j*8
l := j*16
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 8*j
dst[i+31:i] := ZeroExtend32(a[k+7:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 8*j
dst[i+63:i] := ZeroExtend64(a[k+7:k])
ENDFOR
dst[MAX:256] := 0
AVX2
Convert
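The widening conversions above differ only in how the new upper bits are filled. The sketch below shows both fills in Python, assuming source elements are given as raw bit patterns and that only as many low source elements are consumed as fit in the 256-bit result (the helper names are hypothetical):

```python
def sign_extend(value, from_bits):
    """Interpret the low from_bits of value as two's complement."""
    sign = 1 << (from_bits - 1)
    return (value & (sign - 1)) - (value & sign)


def zero_extend(value, from_bits):
    """Keep the low from_bits of value; upper bits become zero."""
    return value & ((1 << from_bits) - 1)


def cvtepi8_epi32_model(a):
    """Sign extend the low eight bytes of a to eight 32-bit values."""
    return [sign_extend(x, 8) for x in a[:8]]


def cvtepu8_epi32_model(a):
    """Zero extend the low eight bytes of a to eight 32-bit values."""
    return [zero_extend(x, 8) for x in a[:8]]


src = [0xFF, 0x80, 0x7F, 0x00] + [0] * 12   # sixteen source bytes
signed_wide = cvtepi8_epi32_model(src)
unsigned_wide = cvtepu8_epi32_model(src)
```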
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:256] := 0
AVX2
Load
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:256] := 0
AVX2
Load
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:256] := 0
AVX2
Load
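The gather address computation above can be sketched in byte units. This example assumes memory is a Python `bytes` object holding little-endian 32-bit integers, and it reads the trailing `* 8` in the pseudocode as a byte-to-bit conversion for the bit-indexed `MEM`; that reading is an interpretation, and the function name is illustrative.

```python
import struct


def i32gather_epi32_model(memory, base_addr, vindex, scale):
    """Load one 32-bit integer per index from base_addr + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    dst = []
    for index in vindex:                  # indices are signed
        addr = base_addr + index * scale  # byte address
        dst.append(struct.unpack_from("<i", memory, addr)[0])
    return dst


memory = struct.pack("<8i", 10, 11, 12, 13, 14, 15, 16, 17)
gathered = i32gather_epi32_model(memory, 0, [7, 0, 3, 3], 4)
```

Indices may repeat and may be negative relative to the base address, as long as the resulting address is valid.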
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
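When 64-bit elements are gathered with 32-bit indices, the element stride ("i := j*64") and the index stride ("m := j*32") differ, so the two-element form consumes only the two lowest indices of "vindex". A minimal sketch under the same byte-addressed assumptions as before; `gather_i64x2_idx32` is a hypothetical name, not a real API.

```python
import struct

def gather_i64x2_idx32(mem, base_addr, vindex, scale):
    """Hypothetical scalar model: two 64-bit elements, 32-bit signed indices.

    vindex may hold four 32-bit values, but only vindex[0] and vindex[1]
    are read, mirroring m := j*32 for j in 0..1.
    """
    dst = []
    for j in range(2):
        raw = vindex[j] & 0xFFFFFFFF
        idx = raw - (1 << 32) if raw & 0x80000000 else raw  # SignExtend64
        dst.append(struct.unpack_from("<Q", mem, base_addr + idx * scale)[0])
    return dst

mem = struct.pack("<4Q", 10, 20, 30, 40)
# Base at element 1; indices +1 and -1; the upper two indices are ignored.
result = gather_i64x2_idx32(mem, 8, [1, 0xFFFFFFFF, 99, 99], 8)
```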
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:256] := 0
AVX2
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:256] := 0
AVX2
Load
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:64] := 0
AVX2
Load
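With 64-bit indices and 32-bit elements, the two-index form fills only the low 64 bits of the result ("dst[MAX:64] := 0"). The sketch below models that, viewing the 128-bit result as four 32-bit lanes. The pseudocode uses the 64-bit index without an explicit sign extension; the model assumes 64-bit wrap-around address arithmetic, which makes an all-ones index behave like -1. The name `gather_i32_idx64x2` and the byte-addressed memory are illustrative assumptions.

```python
import struct

MASK64 = (1 << 64) - 1

def gather_i32_idx64x2(mem, base_addr, vindex, scale):
    """Hypothetical scalar model: two 32-bit elements, two 64-bit indices."""
    lanes = [0, 0, 0, 0]  # upper lanes stay zero, as in dst[MAX:64] := 0
    for j in range(2):
        # Assumed: the address computation wraps modulo 2**64.
        addr = (base_addr + vindex[j] * scale) & MASK64
        lanes[j] = struct.unpack_from("<I", mem, addr)[0]
    return lanes

mem = struct.pack("<4I", 7, 8, 9, 10)
# Base at element 2; indices +1 and all-ones (acts as -1 after wrap-around).
result = gather_i32_idx64x2(mem, 8, [1, MASK64], 4)
```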
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:64] := 0
AVX2
Load
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:256] := 0
AVX2
Load
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
IF mask[i+63]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
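The masked form selects per element between a memory load and the pass-through value in "src", based only on the highest bit of each mask element. In the pseudocode the address is computed inside the IF branch, so a masked-off lane never reads memory. A minimal sketch with hypothetical names (`mask_gather_f64x2`) and byte-addressed memory as an assumption:

```python
import struct

def mask_gather_f64x2(mem, src, base_addr, vindex, mask, scale):
    """Hypothetical scalar model: masked gather of two doubles, 32-bit indices.

    mask holds two raw 64-bit values; only bit 63 of each is examined.
    """
    dst = []
    for j in range(2):
        if (mask[j] >> 63) & 1:
            raw = vindex[j] & 0xFFFFFFFF
            idx = raw - (1 << 32) if raw >> 31 else raw  # SignExtend64
            dst.append(struct.unpack_from("<d", mem, base_addr + idx * scale)[0])
        else:
            dst.append(src[j])  # lane copied from src, memory untouched
    return dst

mem = struct.pack("<3d", 1.5, 2.5, 3.5)
# Lane 1 carries an out-of-range index but is masked off, so it is never read.
result = mask_gather_f64x2(mem, [-1.0, -2.0], 0, [2, 1000],
                           [1 << 63, (1 << 63) - 1], 8)
```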
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
IF mask[i+63]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:256] := 0
dst[MAX:256] := 0
AVX2
Load
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
IF mask[i+31]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
IF mask[i+31]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:256] := 0
dst[MAX:256] := 0
AVX2
Load
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
IF mask[i+31]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
IF mask[i+31]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:256] := 0
dst[MAX:256] := 0
AVX2
Load
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
IF mask[i+63]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
IF mask[i+63]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:256] := 0
dst[MAX:256] := 0
AVX2
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
IF mask[i+63]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
IF mask[i+63]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:256] := 0
dst[MAX:256] := 0
AVX2
Load
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
IF mask[i+31]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:64] := 0
dst[MAX:64] := 0
AVX2
Load
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
IF mask[i+31]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
IF mask[i+31]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:64] := 0
dst[MAX:64] := 0
AVX2
Load
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
IF mask[i+31]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
IF mask[i+63]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:128] := 0
dst[MAX:128] := 0
AVX2
Load
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
IF mask[i+63]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
mask[MAX:256] := 0
dst[MAX:256] := 0
AVX2
Load
Load packed 32-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).
FOR j := 0 to 3
i := j*32
IF mask[i+31]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Load
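A masked load zeroes the lanes whose mask element has its highest bit clear. One common use is reading the tail of an array that is shorter than a full vector. The sketch below is a scalar model only; `maskload_i32x4` is a hypothetical name and the byte offsets (`4 * j`) replace the bit offsets used in the pseudocode.

```python
import struct

def maskload_i32x4(mem, mem_addr, mask):
    """Hypothetical scalar model: masked load of four 32-bit integers."""
    dst = []
    for j in range(4):
        if mask[j] & 0x80000000:  # highest bit of the 32-bit mask element
            dst.append(struct.unpack_from("<I", mem, mem_addr + 4 * j)[0])
        else:
            dst.append(0)         # masked-off lanes are zeroed
    return dst

# Only three elements exist; lane 3 is masked off and never read.
mem = struct.pack("<3I", 11, 22, 33)
result = maskload_i32x4(mem, 0, [0x80000000, 0, 0xFFFFFFFF, 0x7FFFFFFF])
```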
Load packed 32-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).
FOR j := 0 to 7
i := j*32
IF mask[i+31]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Load
Load packed 64-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).
FOR j := 0 to 1
i := j*64
IF mask[i+63]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Load
Load packed 64-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element).
FOR j := 0 to 3
i := j*64
IF mask[i+63]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Load
Load 256 bits of integer data from memory into "dst" using a non-temporal memory hint.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX2
Load
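The 32-byte alignment requirement can be checked arithmetically before issuing such a load. The helpers below are illustrative only; `is_aligned` and `align_up` are hypothetical names, and addresses are modelled as plain integers.

```python
def is_aligned(addr, boundary=32):
    """True when addr is a multiple of boundary (32 bytes by default)."""
    return addr % boundary == 0

def align_up(addr, boundary=32):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)

aligned = is_aligned(64)      # 64 is a multiple of 32
rounded = align_up(72)        # next 32-byte boundary at or above 72
```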
Store packed 32-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).
FOR j := 0 to 3
i := j*32
IF mask[i+31]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX2
Store
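A masked store writes only the lanes whose mask element has its highest bit set and leaves the other memory locations unchanged. A minimal scalar sketch, assuming a mutable `bytearray` as memory and byte offsets; `maskstore_i32x4` is a hypothetical name.

```python
import struct

def maskstore_i32x4(mem, mem_addr, mask, a):
    """Hypothetical scalar model: masked store of four 32-bit integers."""
    for j in range(4):
        if mask[j] & 0x80000000:  # highest bit of the 32-bit mask element
            struct.pack_into("<I", mem, mem_addr + 4 * j, a[j] & 0xFFFFFFFF)
        # otherwise the destination bytes are left untouched

buf = bytearray(struct.pack("<4I", 1, 2, 3, 4))
maskstore_i32x4(buf, 0, [0x80000000, 0, 0, 0x80000000], [10, 20, 30, 40])
stored = struct.unpack("<4I", buf)
```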
Store packed 32-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).
FOR j := 0 to 7
i := j*32
IF mask[i+31]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX2
Store
Store packed 64-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).
FOR j := 0 to 1
i := j*64
IF mask[i+63]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX2
Store
Store packed 64-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).
FOR j := 0 to 3
i := j*64
IF mask[i+63]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX2
Store
Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] << (tmp*8)
dst[255:128] := a[255:128] << (tmp*8)
dst[MAX:256] := 0
AVX2
Shift
Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] << (tmp*8)
dst[255:128] := a[255:128] << (tmp*8)
dst[MAX:256] := 0
AVX2
Shift
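The byte shift operates on each 128-bit lane independently, so bytes shifted out of the low lane do not enter the high lane. The sketch below models the 256-bit value as a Python integer; `bslli_lanes` is a hypothetical name.

```python
LANE_MASK = (1 << 128) - 1

def bslli_lanes(a, imm8):
    """Hypothetical model: shift each 128-bit lane of a 256-bit int left by bytes."""
    tmp = imm8 & 0xFF
    if tmp > 15:
        tmp = 16                      # a full-lane shift clears the lane
    lo = ((a & LANE_MASK) << (tmp * 8)) & LANE_MASK
    hi = (((a >> 128) & LANE_MASK) << (tmp * 8)) & LANE_MASK
    return (hi << 128) | lo

# The top byte of the low lane (0xFF) is discarded, not carried into the high lane.
a = (0x11 << 128) | (0xFF << 120) | 0x22
shifted = bslli_lanes(a, 1)
```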
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
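With a shared "count", every lane is shifted by the same amount, and any count above 15 clears all lanes instead of wrapping. A scalar sketch, with `sll_epi16` as a hypothetical name and lanes held as a list of unsigned 16-bit values:

```python
def sll_epi16(a, count):
    """Hypothetical model: shift 16-bit lanes left by one shared 64-bit count."""
    if count > 15:
        return [0] * len(a)           # out-of-range counts zero every lane
    # Python integers are unbounded, so truncate each result to 16 bits.
    return [(x << count) & 0xFFFF for x in a]

lanes = [1, 0x8001] + [0] * 14
by_one = sll_epi16(lanes, 1)          # 0x8001 loses its top bit
by_sixteen = sll_epi16(lanes, 16)     # count > 15 gives all zeros
```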
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Shift
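In the variable-shift form each lane takes its own shift amount from the matching lane of "count", and a lane whose count is 32 or more becomes zero. A minimal scalar sketch; `sllv_epi32` is a hypothetical name.

```python
def sllv_epi32(a, count):
    """Hypothetical model: per-lane logical left shift of 32-bit lanes."""
    return [(x << c) & 0xFFFFFFFF if c < 32 else 0
            for x, c in zip(a, count)]

# Lane 2 uses count 32 and is therefore cleared.
result = sllv_epi32([1, 1, 1, 0xFFFFFFFF], [0, 31, 32, 4])
```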
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
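An arithmetic right shift replicates the sign bit, so a count above 15 leaves each lane as all ones (negative input) or all zeros (non-negative input). In the sketch below this is modelled by clamping the count to 15, which produces the same result as the explicit branch in the pseudocode. `sra_epi16` is a hypothetical name.

```python
def sra_epi16(a, count):
    """Hypothetical model: arithmetic right shift of unsigned-encoded 16-bit lanes."""
    shift = min(count, 15)            # shifting by 15 leaves only sign copies
    out = []
    for x in a:
        signed = x - 0x10000 if x & 0x8000 else x   # reinterpret as signed
        out.append((signed >> shift) & 0xFFFF)      # back to a 16-bit pattern
    return out

by_four = sra_epi16([0x8000, 0x7FFF, 0xFFF0], 4)
saturated = sra_epi16([0x8000, 0x7FFF, 0xFFF0], 100)
```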
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] >> (tmp*8)
dst[255:128] := a[255:128] >> (tmp*8)
dst[MAX:256] := 0
AVX2
Shift
Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] >> (tmp*8)
dst[255:128] := a[255:128] >> (tmp*8)
dst[MAX:256] := 0
AVX2
Shift
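Viewed as little-endian bytes, a right byte shift moves higher-numbered bytes of each 128-bit lane toward position 0 and fills the vacated top of the lane with zeros. The sketch below represents the 256-bit value as a list of 32 bytes; `bsrli_epi128` is a hypothetical name for this model.

```python
def bsrli_epi128(a, imm8):
    """Hypothetical model: shift each 16-byte lane of a 32-byte list right by bytes."""
    tmp = min(imm8 & 0xFF, 16)        # counts above 15 clear the lane
    out = []
    for lane in range(0, 32, 16):
        chunk = a[lane:lane + 16]
        out.extend(chunk[tmp:] + [0] * tmp)
    return out

a = list(range(32))
shifted = bsrli_epi128(a, 4)          # bytes 16..19 do not move into the low lane
```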
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX2
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX2
Shift
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
FOR i := 0 to 1
tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
ENDFOR
FOR j := 0 to 3
i := j*64
dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
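The double-block SAD above can be followed more easily in byte terms. The sketch below is a scalar model of the 256-bit form: it first builds "tmp" by selecting one dword of "b" per output dword within each 128-bit lane, then computes four sums per 64-bit lane using the byte offsets from the pseudocode. `dbsad_epu8_256` is a hypothetical name and the list-of-bytes representation is an assumption.

```python
def dbsad_epu8_256(a, b, imm8):
    """Hypothetical model: a, b are lists of 32 unsigned bytes; returns 16 sums."""
    # Step 1: shuffle dwords of b within each 128-bit lane under imm8 control.
    tmp = []
    for lane in range(2):
        for d in range(4):
            sel = (imm8 >> (2 * d)) & 3
            start = 16 * lane + 4 * sel
            tmp.extend(b[start:start + 4])
    # Step 2: four SADs per 64-bit lane; pairs are (offset in a, offset in tmp).
    dst = []
    for j in range(4):
        o = 8 * j
        for a_off, t_off in ((0, 0), (0, 1), (4, 2), (4, 3)):
            dst.append(sum(abs(a[o + a_off + k] - tmp[o + t_off + k])
                           for k in range(4)))
    return dst

zeros = [0] * 32
ramp = list(range(32))
identity = dbsad_epu8_256(zeros, ramp, 0xE4)   # imm8 0xE4 keeps dword order
broadcast = dbsad_epu8_256(zeros, ramp, 0x00)  # every dword takes dword 0 of its lane
```

With "a" all zeros each result is simply the sum of four consecutive bytes of "tmp", which makes the sliding 8-bit offsets visible in the output.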
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
FOR i := 0 to 1
tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
ENDFOR
FOR j := 0 to 3
i := j*64
tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
FOR i := 0 to 1
tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
ENDFOR
FOR j := 0 to 3
i := j*64
tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
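The writemask and zeromask variants differ only in the final per-element selection: bit j of "k" decides whether lane j receives the computed value or a fallback, which is the matching lane of "src" for a writemask and zero for a zeromask. A minimal sketch of that last step, with hypothetical helper names:

```python
def apply_writemask(tmp_dst, src, k):
    """Hypothetical helper: keep tmp_dst[j] where bit j of k is set, else src[j]."""
    return [t if (k >> j) & 1 else s
            for j, (t, s) in enumerate(zip(tmp_dst, src))]

def apply_zeromask(tmp_dst, k):
    """Hypothetical helper: keep tmp_dst[j] where bit j of k is set, else 0."""
    return [t if (k >> j) & 1 else 0 for j, t in enumerate(tmp_dst)]

computed = [1, 2, 3, 4]
merged = apply_writemask(computed, [9, 9, 9, 9], 0b0101)
zeroed = apply_zeromask(computed, 0b0101)
```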
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
tmp.dword[0] := b.dword[ imm8[1:0] ]
tmp.dword[1] := b.dword[ imm8[3:2] ]
tmp.dword[2] := b.dword[ imm8[5:4] ]
tmp.dword[3] := b.dword[ imm8[7:6] ]
FOR j := 0 to 1
i := j*64
dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
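The offset pattern in the 128-bit operation above is easier to follow with bytes indexed directly. The following is a minimal Python sketch of that pseudocode, not a hardware-accurate implementation; the function name `dbsad_bytes_128` and the list-of-integers vector representation (index 0 holds bits 7:0) are assumptions made for illustration.

```python
def dbsad_bytes_128(a, b, imm8):
    """Model of the unmasked 128-bit operation above.

    a, b: lists of 16 unsigned bytes, index 0 = least significant byte.
    Returns a list of eight 16-bit sums.
    """
    assert len(a) == 16 and len(b) == 16
    # tmp.dword[n] := b.dword[imm8[2n+1:2n]]
    tmp = []
    for n in range(4):
        sel = (imm8 >> (2 * n)) & 3
        tmp.extend(b[4 * sel:4 * sel + 4])

    def sad(a_off, t_off):
        # Sum of absolute differences over one quadruplet of bytes.
        return sum(abs(a[a_off + n] - tmp[t_off + n]) for n in range(4))

    dst = []
    for j in range(2):           # two 64-bit lanes
        base = 8 * j             # byte offset of the lane
        dst.append(sad(base, base))          # low quadruplet of a, tmp offset 0
        dst.append(sad(base, base + 1))      # low quadruplet of a, tmp offset 1
        dst.append(sad(base + 4, base + 2))  # high quadruplet of a, tmp offset 2
        dst.append(sad(base + 4, base + 3))  # high quadruplet of a, tmp offset 3
    return dst
```

With "a" all zero and `imm8 = 0xE4` (each 2-bit field selects the dword at its own position), each result is simply the sum of four consecutive bytes of "b" starting at the lane base plus 0, 1, 2, or 3.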
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
tmp.dword[0] := b.dword[ imm8[1:0] ]
tmp.dword[1] := b.dword[ imm8[3:2] ]
tmp.dword[2] := b.dword[ imm8[5:4] ]
tmp.dword[3] := b.dword[ imm8[7:6] ]
FOR j := 0 to 1
i := j*64
tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
tmp.dword[0] := b.dword[ imm8[1:0] ]
tmp.dword[1] := b.dword[ imm8[3:2] ]
tmp.dword[2] := b.dword[ imm8[5:4] ]
tmp.dword[3] := b.dword[ imm8[7:6] ]
FOR j := 0 to 1
i := j*64
tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*128
tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
tmp_dst[i+127:i] := tmp[127:0]
ENDFOR
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*128
tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
tmp_dst[i+127:i] := tmp[127:0]
ENDFOR
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
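The 128-bit concatenate-and-shift operations above can be sketched in Python as follows. This is an illustrative model of the pseudocode only; the function name `alignr_bytes_128` and the convention that `src=None` means zeromasking are assumptions, not part of the reference.

```python
def alignr_bytes_128(a, b, imm8, k=0xFFFF, src=None):
    """Model of the 128-bit masked byte-align operations above.

    a, b: lists of 16 bytes, index 0 = least significant byte.
    k: 16-bit mask. src: fallback vector for writemasking,
    or None for zeromasking (assumed convention for this sketch).
    """
    assert len(a) == 16 and len(b) == 16
    # (a << 128) OR b: b occupies the low 16 bytes, a the high 16 bytes.
    concat = b + a
    # Shift right by imm8 bytes, zero-filling from the top; keep the low 16 bytes.
    tmp = [concat[n + imm8] if n + imm8 < 32 else 0 for n in range(16)]
    return [
        tmp[j] if (k >> j) & 1 else (src[j] if src is not None else 0)
        for j in range(16)
    ]
```

For example, a shift of 4 bytes yields the upper 12 bytes of "b" followed by the lowest 4 bytes of "a", and a shift of 32 or more yields all zeros before masking.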
Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := b[i+7:i]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := b[i+7:i]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := b[i+15:i]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := b[i+15:i]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
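All four blend operations above share one rule: mask bit j picks the element from "b" when set and from "a" when clear. A minimal Python sketch of that rule follows; `mask_blend` is a hypothetical helper name, and vectors are modelled as lists with index 0 as the lowest element.

```python
def mask_blend(k, a, b):
    """Per-element select: b[j] where mask bit j is set, otherwise a[j].

    Works for any element width, since the pseudocode above only
    differs in element count and size between variants.
    """
    assert len(a) == len(b)
    return [b[j] if (k >> j) & 1 else a[j] for j in range(len(a))]
```

Note the operand order: a set bit selects "b", not "a".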
Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
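The masked broadcast operations above differ only in element count and in what an unset mask bit produces. The sketch below models both behaviours in Python; the name `mask_broadcast` and the `src=None` convention for zeromasking are assumptions for illustration.

```python
def mask_broadcast(k, a, n, src=None):
    """Broadcast the lowest element of a to n elements under mask k.

    src: fallback vector for writemasking, or None for zeromasking
    (assumed convention for this sketch).
    """
    low = a[0]  # a[7:0] or a[15:0] in the pseudocode above
    return [
        low if (k >> j) & 1 else (src[j] if src is not None else 0)
        for j in range(n)
    ]
```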
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
off := 16*idx[i+3:i]
dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := idx[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
off := 16*idx[i+3:i]
dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
off := 16*idx[i+3:i]
dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*16
off := 16*idx[i+3:i]
dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off]
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
off := 16*idx[i+2:i]
dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := idx[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
off := 16*idx[i+2:i]
dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
off := 16*idx[i+2:i]
dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*16
off := 16*idx[i+2:i]
dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off]
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
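In the two-source shuffles above, each 16-bit element of "idx" carries both an element index and a one-bit source selector. The following Python sketch models the unmasked form for either width; the function name `permute_two_sources` is hypothetical, and the masked variants would only add the fallback shown in their pseudocode ("idx", "a", or zero).

```python
def permute_two_sources(a, idx, b):
    """Model of the unmasked two-source 16-bit shuffles above.

    a, b, idx: lists of n 16-bit words (n is 8 or 16), index 0 = lowest.
    The low log2(n) bits of each idx word pick the element; the next
    bit picks the source (0 selects a, 1 selects b).
    """
    n = len(a)
    assert n in (8, 16) and len(b) == n and len(idx) == n
    dst = []
    for j in range(n):
        off = idx[j] & (n - 1)   # idx[i+2:i] for n=8, idx[i+3:i] for n=16
        use_b = idx[j] & n       # idx[i+3] for n=8, idx[i+4] for n=16
        dst.append(b[off] if use_b else a[off])
    return dst
```

Higher bits of each index word are ignored, so an index of 8 in the 8-element form means "element 0 of b".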
Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
id := idx[i+3:i]*16
IF k[j]
dst[i+15:i] := a[id+15:id]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
id := idx[i+3:i]*16
IF k[j]
dst[i+15:i] := a[id+15:id]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*16
id := idx[i+3:i]*16
dst[i+15:i] := a[id+15:id]
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
id := idx[i+2:i]*16
IF k[j]
dst[i+15:i] := a[id+15:id]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
id := idx[i+2:i]*16
IF k[j]
dst[i+15:i] := a[id+15:id]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*16
id := idx[i+2:i]*16
dst[i+15:i] := a[id+15:id]
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
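The single-source shuffles above reduce to an indexed gather followed by the mask step. A minimal Python sketch, assuming a hypothetical helper name `permute_words` and the convention that `src=None` means zeromasking:

```python
def permute_words(idx, a, k=None, src=None):
    """Model of the single-source 16-bit shuffles above.

    idx, a: lists of n 16-bit words (n is 8 or 16), index 0 = lowest.
    k: optional mask (None means all elements are written).
    src: fallback vector for writemasking, or None for zeromasking.
    """
    n = len(a)
    assert n in (8, 16) and len(idx) == n
    if k is None:
        k = (1 << n) - 1
    dst = []
    for j in range(n):
        if (k >> j) & 1:
            dst.append(a[idx[j] & (n - 1)])  # only the low index bits are used
        else:
            dst.append(src[j] if src is not None else 0)
    return dst
```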
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a".
FOR j := 0 to 31
i := j*8
IF a[i+7]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a".
FOR j := 0 to 15
i := j*8
IF a[i+7]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Miscellaneous
Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := 0xFF
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := 0xFF
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
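The two 8-bit conversions above are near-inverses: one collects the sign bit of each byte into a mask, the other expands each mask bit to a full byte. The Python sketch below models both directions; the names `msb_to_mask` and `mask_to_bytes` are hypothetical.

```python
def msb_to_mask(a):
    """Set mask bit j when the most significant bit of byte j is set."""
    k = 0
    for j, byte in enumerate(a):
        if byte & 0x80:
            k |= 1 << j
    return k


def mask_to_bytes(k, n):
    """Expand mask bit j to 0xFF (set) or 0x00 (clear) for n bytes."""
    return [0xFF if (k >> j) & 1 else 0 for j in range(n)]
```

Expanding a mask and collecting it again returns the original mask; the reverse round trip only preserves bytes that were already 0x00 or 0xFF.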
Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := 0xFFFF
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := 0xFFFF
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a".
FOR j := 0 to 15
i := j*16
IF a[i+15]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a".
FOR j := 0 to 7
i := j*16
IF a[i+15]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[4:0] := b[i+3:i] + (j & 0x10)
dst[i+7:i] := a[index*8+7:index*8]
FI
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Swizzle
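The term `(j & 0x10)` in the 256-bit byte shuffle above keeps every lookup inside the 128-bit lane of the destination byte. The following Python sketch of the writemask form makes that visible; the function name `mask_shuffle_bytes_256` is an assumption for illustration.

```python
def mask_shuffle_bytes_256(src, k, a, b):
    """Model of the 256-bit writemasked byte shuffle above.

    src, a, b: lists of 32 bytes, index 0 = least significant byte.
    k: 32-bit writemask.
    """
    assert len(src) == 32 and len(a) == 32 and len(b) == 32
    dst = []
    for j in range(32):
        if (k >> j) & 1:
            if b[j] & 0x80:
                dst.append(0)          # control MSB set: zero the byte
            else:
                # Low 4 control bits pick a byte; (j & 0x10) adds the
                # lane base, 0 for the low lane and 16 for the high lane.
                index = (b[j] & 0x0F) + (j & 0x10)
                dst.append(a[index])
        else:
            dst.append(src[j])         # mask bit clear: keep src
    return dst
```

With every control byte set to 0, the low lane is filled with byte 0 of "a" and the high lane with byte 16, never byte 0.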
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[4:0] := b[i+3:i] + (j & 0x10)
dst[i+7:i] := a[index*8+7:index*8]
FI
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Swizzle
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[3:0] := b[i+3:i]
dst[i+7:i] := a[index*8+7:index*8]
FI
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Swizzle
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[3:0] := b[i+3:i]
dst[i+7:i] := a[index*8+7:index*8]
FI
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Swizzle
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[63:0] := a[63:0]
tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
tmp_dst[191:128] := a[191:128]
tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[63:0] := a[63:0]
tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
tmp_dst[191:128] := a[191:128]
tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[63:0] := a[63:0]
tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
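The shift-and-slice notation in the 128-bit high-half shuffle above amounts to picking one of the four high words per 2-bit field of "imm8". A minimal Python sketch of the writemask form follows; the name `mask_shufflehi_words_128` is hypothetical.

```python
def mask_shufflehi_words_128(src, k, a, imm8):
    """Model of the 128-bit writemasked high-half word shuffle above.

    src, a: lists of 8 16-bit words, index 0 = lowest word.
    k: 8-bit writemask. imm8: four 2-bit selectors.
    """
    assert len(src) == 8 and len(a) == 8
    tmp = list(a[:4])                  # low 64 bits pass through unchanged
    for n in range(4):
        sel = (imm8 >> (2 * n)) & 3    # imm8[2n+1:2n]
        tmp.append(a[4 + sel])         # (a >> (sel*16))[79:64] is word 4+sel
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(8)]
```

For instance, `imm8 = 0x1B` has fields 3, 2, 1, 0 from low to high, so it reverses the four high words.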
Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[63:0] := a[63:0]
tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
tmp_dst[127:64] := a[127:64]
tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
tmp_dst[255:192] := a[255:192]
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
tmp_dst[127:64] := a[127:64]
tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
tmp_dst[255:192] := a[255:192]
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
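In the 256-bit low-half shuffle above, the same "imm8" control is applied independently to each 128-bit lane. The Python sketch below models the zeromask form; the function name `maskz_shufflelo_words_256` is an assumption for illustration.

```python
def maskz_shufflelo_words_256(k, a, imm8):
    """Model of the 256-bit zeromasked low-half word shuffle above.

    a: list of 16 16-bit words, index 0 = lowest word.
    k: 16-bit zeromask. imm8: four 2-bit selectors shared by both lanes.
    """
    assert len(a) == 16
    tmp = []
    for lane in range(2):
        base = 8 * lane                    # first word of this 128-bit lane
        for n in range(4):
            sel = (imm8 >> (2 * n)) & 3    # imm8[2n+1:2n]
            tmp.append(a[base + sel])      # pick from the lane's low 4 words
        tmp.extend(a[base + 4:base + 8])   # high 64 bits pass through
    return [tmp[j] if (k >> j) & 1 else 0 for j in range(16)]
```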
Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
tmp_dst[127:64] := a[127:64]
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
tmp_dst[127:64] := a[127:64]
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
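The sixteen assignments in INTERLEAVE_HIGH_BYTES above follow one pattern: bytes 8 through 15 of the two sources are emitted alternately, first source first. A minimal Python sketch of the 128-bit writemask form follows; the name `mask_unpackhi_bytes_128` is hypothetical.

```python
def mask_unpackhi_bytes_128(src, k, a, b):
    """Model of the 128-bit writemasked high-half byte interleave above.

    src, a, b: lists of 16 bytes, index 0 = least significant byte.
    k: 16-bit writemask.
    """
    assert len(src) == 16 and len(a) == 16 and len(b) == 16
    tmp = []
    for n in range(8, 16):     # high half of the 128-bit inputs
        tmp.append(a[n])       # even result bytes come from a
        tmp.append(b[n])       # odd result bytes come from b
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(16)]
```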
Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
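For the 256-bit word interleave above, INTERLEAVE_HIGH_WORDS is applied to each 128-bit lane separately, so results never cross lanes. The Python sketch below models the zeromask form; the function name `maskz_unpackhi_words_256` is an assumption for illustration.

```python
def maskz_unpackhi_words_256(k, a, b):
    """Model of the 256-bit zeromasked high-half word interleave above.

    a, b: lists of 16 16-bit words, index 0 = lowest word.
    k: 16-bit zeromask.
    """
    assert len(a) == 16 and len(b) == 16
    tmp = []
    for lane in range(2):
        base = 8 * lane                    # first word of this 128-bit lane
        for n in range(4, 8):              # high 64 bits of the lane
            tmp.append(a[base + n])
            tmp.append(b[base + n])
    return [tmp[j] if (k >> j) & 1 else 0 for j in range(16)]
```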
Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
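For the 256-bit forms the interleave runs independently in each 128-bit lane, which is easy to overlook. A minimal scalar sketch, assuming vectors are Python lists of 32 byte values (index 0 holds bits 7:0); the function names are hypothetical:

```python
def interleave_bytes(src1, src2):
    """Interleave the low 8 bytes of two 16-byte lanes."""
    out = []
    for n in range(8):
        out += [src1[n], src2[n]]
    return out


def unpack_low_bytes_256(a, b, k, src=None):
    """Per-lane interleave, then per-byte mask; src=None models zeromask."""
    tmp = []
    for lane in (0, 16):  # byte offsets of the two 128-bit lanes
        tmp += interleave_bytes(a[lane:lane + 16], b[lane:lane + 16])
    return [tmp[j] if (k >> j) & 1 else (0 if src is None else src[j])
            for j in range(32)]


a = list(range(32))
b = [x + 100 for x in range(32)]
r = unpack_low_bytes_256(a, b, 0xFFFFFFFF)
print(r[:4])     # [0, 100, 1, 101]
print(r[16:20])  # [16, 116, 17, 117]
```

Bytes 8..15 of each input never reach the output: the upper lane of the result is built from bytes 16..23, not from a continuation of the lower lane.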
Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Miscellaneous
Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Load
Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Load
Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Load
Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Load
Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Load
Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Load
Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Load
Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Load
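The masked loads above can be modelled in a few lines. This sketch treats memory as a Python `bytes` object and a vector as a list of byte values; `masked_load_bytes` and its parameters are hypothetical names, and `src=None` stands in for the zeromask form.

```python
def masked_load_bytes(mem, addr, k, n, src=None):
    """Load n bytes starting at mem[addr]; unselected elements come from src or 0."""
    out = []
    for j in range(n):
        if (k >> j) & 1:
            out.append(mem[addr + j])  # only selected elements touch memory
        else:
            out.append(0 if src is None else src[j])
    return out


mem = bytes([1, 2, 3, 4, 5])
tail_mask = (1 << len(mem)) - 1  # 0b11111: select only the bytes that exist

print(masked_load_bytes(mem, 0, tail_mask, 16))
# [1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(masked_load_bytes(mem, 1, 0b0101, 4, src=[9, 9, 9, 9]))
# [2, 9, 4, 9]
```

In this model, elements whose mask bit is clear are never read from `mem`, which mirrors the pseudocode: the `MEM[...]` access appears only in the `IF k[j]` branch.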
Load 256 bits (composed of 16 packed 16-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512BW
AVX512VL
Load
Load 256 bits (composed of 32 packed 8-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512BW
AVX512VL
Load
Load 128 bits (composed of 8 packed 16-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512BW
AVX512VL
Load
Load 128 bits (composed of 16 packed 8-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512BW
AVX512VL
Load
Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Move
Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Move
Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Move
Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Move
Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Move
Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Move
Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Move
Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Move
Store packed 16-bit integers from "a" into memory using writemask "k" (elements are not stored when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*16
IF k[j]
MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i]
FI
ENDFOR
AVX512BW
AVX512VL
Store
Store packed 16-bit integers from "a" into memory using writemask "k" (elements are not stored when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*16
IF k[j]
MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i]
FI
ENDFOR
AVX512BW
AVX512VL
Store
Store packed 8-bit integers from "a" into memory using writemask "k" (elements are not stored when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 31
i := j*8
IF k[j]
MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
FI
ENDFOR
AVX512BW
AVX512VL
Store
Store packed 8-bit integers from "a" into memory using writemask "k" (elements are not stored when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*8
IF k[j]
MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
FI
ENDFOR
AVX512BW
AVX512VL
Store
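A scalar sketch of the masked stores above, assuming memory is a Python `bytearray` and 16-bit elements are laid out little-endian (as on x86). The name `masked_store_words` is hypothetical.

```python
def masked_store_words(mem, addr, k, a):
    """Write each selected 16-bit element of a to mem; leave the rest untouched."""
    for j, value in enumerate(a):
        if (k >> j) & 1:
            lo = addr + 2 * j
            mem[lo:lo + 2] = value.to_bytes(2, "little")


mem = bytearray(b"\xff" * 16)
a = [0x1234, 0x5678, 0x9ABC, 0, 0, 0, 0, 0]
masked_store_words(mem, 0, 0b00000101, a)  # store elements 0 and 2 only
print(mem[:6].hex())  # 3412ffffbc9a
```

Unlike the masked loads, there is no `ELSE` branch: memory belonging to unselected elements keeps its previous contents.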
Store 256 bits (composed of 16 packed 16-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512BW
AVX512VL
Store
Store 256 bits (composed of 32 packed 8-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512BW
AVX512VL
Store
Store 128 bits (composed of 8 packed 16-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512BW
AVX512VL
Store
Store 128 bits (composed of 16 packed 8-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512BW
AVX512VL
Store
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := ABS(a[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := ABS(a[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := ABS(a[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := ABS(a[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := ABS(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := ABS(a[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := ABS(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := ABS(a[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
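The phrase "store the unsigned results" matters for the most negative input. A minimal scalar model of ABS on raw lane values, with hypothetical helper names and `src=None` standing in for the zeromask form:

```python
def abs_lane(x, bits):
    """x is the raw lane value (0 .. 2**bits - 1), read as a signed integer."""
    signed = x - (1 << bits) if x >= 1 << (bits - 1) else x
    return abs(signed) & ((1 << bits) - 1)


def abs_masked(a, k, bits, src=None):
    """Apply abs_lane per element under mask k."""
    return [abs_lane(a[j], bits) if (k >> j) & 1
            else (0 if src is None else src[j])
            for j in range(len(a))]


print(abs_lane(0xFF, 8))     # 1      (-1 -> 1)
print(abs_lane(0x80, 8))     # 128    (-128 -> 128, representable only as unsigned)
print(abs_lane(0x8000, 16))  # 32768
print(abs_masked([0xFF, 0x80, 0x7F, 0xFE], 0b0111, 8, src=[7, 7, 7, 7]))
# [1, 128, 127, 7]
```

Reading the result of `abs_lane(0x80, 8)` back as a signed 8-bit value would give -128 again, so callers that need a signed result have to handle that input separately.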
Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
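The plain add entries above have no Saturate call, so each lane is truncated to its width. A short scalar sketch under that reading (the name `add_masked` is hypothetical; lanes are raw unsigned values):

```python
def add_masked(a, b, k, bits, src=None):
    """Lane-wise add, truncated to `bits`; src=None models the zeromask form."""
    lane_mask = (1 << bits) - 1
    return [(a[j] + b[j]) & lane_mask if (k >> j) & 1
            else (0 if src is None else src[j])
            for j in range(len(a))]


print(add_masked([200, 1, 255, 16], [100, 2, 1, 16], 0b1011, 8, src=[7] * 4))
# [44, 3, 7, 32]
```

Element 0 wraps (200 + 100 = 300, truncated to 44) and element 2 is taken from `src` because its mask bit is clear. The same bit pattern results whether the lanes are interpreted as signed or unsigned.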
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
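Saturate8 and Saturate16 clamp the exact signed sum to the lane's signed range instead of wrapping. A scalar sketch on raw lane values, assuming hypothetical helpers `to_signed` and `adds_signed`:

```python
def to_signed(x, bits):
    """Reinterpret a raw lane value as a two's-complement signed integer."""
    return x - (1 << bits) if x >= 1 << (bits - 1) else x


def adds_signed(a, b, bits):
    """Signed saturating add; returns the raw lane bit pattern."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    total = to_signed(a, bits) + to_signed(b, bits)
    return max(lo, min(hi, total)) & ((1 << bits) - 1)


print(hex(adds_signed(0x7F, 0x01, 8)))        # 0x7f   (127 + 1 clamps to 127)
print(hex(adds_signed(0x80, 0xFF, 8)))        # 0x80   (-128 + -1 clamps to -128)
print(hex(adds_signed(0x7FFF, 0x7FFF, 16)))   # 0x7fff
print(hex(adds_signed(0x05, 0xFE, 8)))        # 0x3    (5 + -2, no clamping)
```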
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
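The unsigned forms clamp at the top of the lane's range only, since an unsigned sum cannot go below zero. A minimal sketch contrasting it with a wrapping add (function names are hypothetical):

```python
def adds_unsigned(a, b, bits):
    """Unsigned saturating add: clamp the exact sum to 2**bits - 1."""
    return min(a + b, (1 << bits) - 1)


def add_wrapping(a, b, bits):
    """Plain add truncated to the lane width, for comparison."""
    return (a + b) & ((1 << bits) - 1)


print(adds_unsigned(200, 100, 8), add_wrapping(200, 100, 8))  # 255 44
print(adds_unsigned(0xFFFF, 1, 16))                           # 65535
print(adds_unsigned(10, 20, 8))                               # 30
```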
Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
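The average in these entries rounds up on ties, and the expression `(a + b + 1) >> 1` only works if the sum is held in more bits than the lane. Python integers are unbounded, so a direct transcription already behaves that way; `avg_unsigned` is a hypothetical name.

```python
def avg_unsigned(a, b):
    """Rounding average of two unsigned lane values, computed without overflow."""
    return (a + b + 1) >> 1


print(avg_unsigned(1, 2))          # 2    (1.5 rounds up)
print(avg_unsigned(0, 1))          # 1    (0.5 rounds up)
print(avg_unsigned(255, 255))      # 255  (the 9-bit sum 511 is not truncated)
print(avg_unsigned(65535, 65534))  # 65535
```

In a fixed-width language the same formula would need a wider intermediate (for example 16 bits for 8-bit lanes) to give these results.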
Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
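The pseudocode for this multiply-add does not spell out the mixed signedness, so a small model helps: bytes of "a" are read as unsigned, bytes of "b" as signed, and the sum of the two products is saturated to signed 16 bits. This is a sketch of that description with hypothetical names; arguments are raw byte values.

```python
def s8(x):
    """Reinterpret a raw byte (0..255) as signed."""
    return x - 256 if x >= 128 else x


def maddubs_pair(a_lo, a_hi, b_lo, b_hi):
    """One 16-bit output lane: unsigned a bytes times signed b bytes, saturated."""
    total = a_hi * s8(b_hi) + a_lo * s8(b_lo)
    total = max(-32768, min(32767, total))
    return total & 0xFFFF  # raw 16-bit lane pattern


print(maddubs_pair(2, 3, 0xFF, 4))               # 10     (3*4 + 2*(-1))
print(hex(maddubs_pair(255, 255, 127, 127)))     # 0x7fff (64770 saturates)
print(hex(maddubs_pair(255, 255, 0x80, 0x80)))   # 0x8000 (-65280 saturates)
```

The mask in the entries above has one bit per 16-bit result, so each bit of "k" covers a pair of input bytes.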
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
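For the 16-bit multiply-add, the mask has one bit per 32-bit result, which is why the loops above run to 7 and 3 rather than 15 and 7. The following scalar sketch uses hypothetical names, raw 16-bit lane values as input, and `src=None` for the zeromask form. Note that the pseudocode applies no saturation to the 32-bit sum.

```python
def s16(x):
    """Reinterpret a raw 16-bit value as signed."""
    return x - 0x10000 if x >= 0x8000 else x


def madd_pair(a_lo, a_hi, b_lo, b_hi):
    """One 32-bit output lane: sum of two signed 16x16 products, truncated."""
    total = s16(a_hi) * s16(b_hi) + s16(a_lo) * s16(b_lo)
    return total & 0xFFFFFFFF


def madd_words(a, b, k, src=None):
    """a, b: lists of 16-bit lanes; result has half as many 32-bit lanes."""
    out = []
    for j in range(len(a) // 2):
        if (k >> j) & 1:
            out.append(madd_pair(a[2 * j], a[2 * j + 1], b[2 * j], b[2 * j + 1]))
        else:
            out.append(0 if src is None else src[j])
    return out


print(madd_words([1, 2, 3, 4], [5, 6, 7, 8], 0b11))  # [17, 53]
print(madd_words([1, 2, 3, 4], [5, 6, 7, 8], 0b01))  # [17, 0]
print(hex(madd_pair(0x8000, 0x8000, 0x8000, 0x8000)))  # 0x80000000
```

The last line shows the one input combination whose exact sum (2^31) does not fit in a signed 32-bit lane; in this model it wraps to the bit pattern of the most negative value.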
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
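The signed and unsigned maximum and minimum records above share the same pseudocode; they differ only in how each lane's bit pattern is interpreted. The sketch below illustrates that difference for 8-bit lanes. It is a scalar model under stated assumptions: `to_signed` and `masked_minmax` are hypothetical names, lanes are raw bit patterns in a list, and `src=None` selects the zeromask form.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def masked_minmax(op, a, b, k, bits, signed, src=None):
    """Scalar model of the masked MAX/MIN pseudocode (hypothetical helper).

    op     : Python's built-in max or min
    a, b   : lists of raw lane bit patterns (0 .. 2**bits - 1)
    k      : integer mask; bit j controls lane j
    signed : compare lanes as two's-complement values when True
    src    : fallback lanes (writemask form), or None (zeromask form)
    """
    mask = (1 << bits) - 1
    key = (lambda v: to_signed(v, bits)) if signed else (lambda v: v)
    dst = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            dst.append(op(x & mask, y & mask, key=key))
        else:
            dst.append(0 if src is None else src[j] & mask)
    return dst


a = [0x80, 0x7F, 0x01, 0xFF]
b = [0x01, 0x80, 0x02, 0x00]
print([hex(v) for v in masked_minmax(max, a, b, 0b1111, 8, signed=True)])
print([hex(v) for v in masked_minmax(max, a, b, 0b1111, 8, signed=False)])
```

In lane 0 the pattern `0x80` is -128 when signed and 128 when unsigned, so the signed maximum is `0x01` while the unsigned maximum is `0x80`.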
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
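The rounding step in the records above (shift right by 14, add 1, keep bits [16:1]) amounts to a round-to-nearest multiply of two Q15 fixed-point values. The sketch below models one lane; `to_signed` and `mulhrs_lane` are hypothetical names, and the Q15 reading is an interpretation rather than something the records state.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulhrs_lane(a, b):
    """Scalar model of one lane of the rounding multiply (hypothetical helper)."""
    product = to_signed(a, 16) * to_signed(b, 16)  # signed 32-bit intermediate
    tmp = (product >> 14) + 1                      # keep top 18 bits, add 1
    return to_signed(tmp >> 1, 16)                 # bits [16:1] as signed 16-bit


half = 0x4000  # 0.5 when read as Q15
print(mulhrs_lane(half, half))      # 8192, i.e. 0.25 in Q15
print(mulhrs_lane(3, half))         # 1.5 rounds to 2
print(mulhrs_lane(-32768, -32768))  # wraps to -32768; there is no saturation
```

The last call shows the one input pair whose result does not fit: (-1.0) * (-1.0) in Q15 would be +1.0, which is not representable, so bits [16:1] come out as `0x8000`.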
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
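The high-half and low-half multiply records above differ in which 16 bits of the 32-bit product they keep. Signedness only affects the high half, which is consistent with the low-half descriptions not naming a signedness. A minimal sketch, assuming raw 16-bit patterns as inputs; `to_signed` and `mul_halves` are hypothetical names.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_halves(a, b, signed):
    """Return (high16, low16) of the 32-bit product of two 16-bit lanes.

    a, b are raw 16-bit patterns; `signed` selects how they are interpreted.
    Both halves are returned as raw 16-bit patterns (hypothetical helper).
    """
    if signed:
        x, y = to_signed(a, 16), to_signed(b, 16)
    else:
        x, y = a & 0xFFFF, b & 0xFFFF
    product = (x * y) & 0xFFFFFFFF  # tmp[31:0]
    return product >> 16, product & 0xFFFF


hi_u, lo_u = mul_halves(0xFFFF, 0x0002, signed=False)
hi_s, lo_s = mul_halves(0xFFFF, 0x0002, signed=True)
print(hex(hi_u), hex(lo_u))  # unsigned: 65535 * 2
print(hex(hi_s), hex(lo_s))  # signed:   -1 * 2
```

For these inputs the low halves agree (`0xfffe`) while the high halves differ (`0x1` unsigned, `0xffff` signed).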
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
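The saturating subtraction records above clamp the exact difference to the lane's representable range instead of letting it wrap. The sketch below models one lane for both the signed (`Saturate8`/`Saturate16`) and unsigned (`SaturateU8`/`SaturateU16`) cases. `to_signed` and `subs_lane` are hypothetical names, and the clamping bounds are the usual two's-complement and unsigned ranges, assumed here rather than spelled out in the records.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def subs_lane(a, b, bits, signed):
    """Saturating a - b for one lane; returns the raw bit pattern.

    a, b are raw lane patterns; `signed` selects the interpretation and the
    clamping range (hypothetical helper).
    """
    mask = (1 << bits) - 1
    if signed:
        diff = to_signed(a, bits) - to_signed(b, bits)
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        diff = (a & mask) - (b & mask)
        low, high = 0, mask
    return max(low, min(high, diff)) & mask


print(hex(subs_lane(0x80, 0x01, 8, signed=True)))   # -128 - 1 clamps to -128
print(hex(subs_lane(0x7F, 0xFF, 8, signed=True)))   # 127 - (-1) clamps to 127
print(hex(subs_lane(0x10, 0x20, 8, signed=False)))  # 16 - 32 clamps to 0
print(hex((0x10 - 0x20) & 0xFF))                    # non-saturating form wraps
```

The last line shows what a plain wrapping subtraction of the same operands gives (`0xf0`), for comparison with the clamped unsigned result (`0x0`).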
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Arithmetic
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := Saturate16(a[31:0])
tmp_dst[31:16] := Saturate16(a[63:32])
tmp_dst[47:32] := Saturate16(a[95:64])
tmp_dst[63:48] := Saturate16(a[127:96])
tmp_dst[79:64] := Saturate16(b[31:0])
tmp_dst[95:80] := Saturate16(b[63:32])
tmp_dst[111:96] := Saturate16(b[95:64])
tmp_dst[127:112] := Saturate16(b[127:96])
tmp_dst[143:128] := Saturate16(a[159:128])
tmp_dst[159:144] := Saturate16(a[191:160])
tmp_dst[175:160] := Saturate16(a[223:192])
tmp_dst[191:176] := Saturate16(a[255:224])
tmp_dst[207:192] := Saturate16(b[159:128])
tmp_dst[223:208] := Saturate16(b[191:160])
tmp_dst[239:224] := Saturate16(b[223:192])
tmp_dst[255:240] := Saturate16(b[255:224])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := Saturate16(a[31:0])
tmp_dst[31:16] := Saturate16(a[63:32])
tmp_dst[47:32] := Saturate16(a[95:64])
tmp_dst[63:48] := Saturate16(a[127:96])
tmp_dst[79:64] := Saturate16(b[31:0])
tmp_dst[95:80] := Saturate16(b[63:32])
tmp_dst[111:96] := Saturate16(b[95:64])
tmp_dst[127:112] := Saturate16(b[127:96])
tmp_dst[143:128] := Saturate16(a[159:128])
tmp_dst[159:144] := Saturate16(a[191:160])
tmp_dst[175:160] := Saturate16(a[223:192])
tmp_dst[191:176] := Saturate16(a[255:224])
tmp_dst[207:192] := Saturate16(b[159:128])
tmp_dst[223:208] := Saturate16(b[191:160])
tmp_dst[239:224] := Saturate16(b[223:192])
tmp_dst[255:240] := Saturate16(b[255:224])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := Saturate16(a[31:0])
tmp_dst[31:16] := Saturate16(a[63:32])
tmp_dst[47:32] := Saturate16(a[95:64])
tmp_dst[63:48] := Saturate16(a[127:96])
tmp_dst[79:64] := Saturate16(b[31:0])
tmp_dst[95:80] := Saturate16(b[63:32])
tmp_dst[111:96] := Saturate16(b[95:64])
tmp_dst[127:112] := Saturate16(b[127:96])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := Saturate16(a[31:0])
tmp_dst[31:16] := Saturate16(a[63:32])
tmp_dst[47:32] := Saturate16(a[95:64])
tmp_dst[63:48] := Saturate16(a[127:96])
tmp_dst[79:64] := Saturate16(b[31:0])
tmp_dst[95:80] := Saturate16(b[63:32])
tmp_dst[111:96] := Saturate16(b[95:64])
tmp_dst[127:112] := Saturate16(b[127:96])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
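In the 32-bit to 16-bit pack records above, the 256-bit forms do not concatenate all of "a" followed by all of "b". Each 128-bit lane of the result holds four saturated values from "a" followed by four from "b". The sketch below reproduces that ordering; `saturate16`, `packs_epi32`, and `apply_mask` are hypothetical names, inputs are lists of signed 32-bit values, and `src=None` is an assumed convention for the zeromask form.

```python
def saturate16(value):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def packs_epi32(a, b):
    """Scalar model of the tmp_dst assignments (hypothetical helper).

    a, b: lists of signed 32-bit values, four per 128-bit lane.
    Returns signed 16-bit values, eight per 128-bit lane.
    """
    out = []
    for base in range(0, len(a), 4):  # one 128-bit lane at a time
        out.extend(saturate16(v) for v in a[base:base + 4])
        out.extend(saturate16(v) for v in b[base:base + 4])
    return out


def apply_mask(tmp, k, src=None):
    """Select tmp[j] where mask bit j is set, else src[j] (or 0 if src is None)."""
    return [t if (k >> j) & 1 else (0 if src is None else src[j])
            for j, t in enumerate(tmp)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [100000, -100000, 30, 40, 50, 60, 70, 80]
packed = packs_epi32(a, b)
print(packed)
print(apply_mask(packed, 0x00FF))  # zeromask keeps only the low 128-bit lane
```

The first two values taken from "b" are out of range, so they saturate to 32767 and -32768.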
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[7:0] := Saturate8(a[15:0])
tmp_dst[15:8] := Saturate8(a[31:16])
tmp_dst[23:16] := Saturate8(a[47:32])
tmp_dst[31:24] := Saturate8(a[63:48])
tmp_dst[39:32] := Saturate8(a[79:64])
tmp_dst[47:40] := Saturate8(a[95:80])
tmp_dst[55:48] := Saturate8(a[111:96])
tmp_dst[63:56] := Saturate8(a[127:112])
tmp_dst[71:64] := Saturate8(b[15:0])
tmp_dst[79:72] := Saturate8(b[31:16])
tmp_dst[87:80] := Saturate8(b[47:32])
tmp_dst[95:88] := Saturate8(b[63:48])
tmp_dst[103:96] := Saturate8(b[79:64])
tmp_dst[111:104] := Saturate8(b[95:80])
tmp_dst[119:112] := Saturate8(b[111:96])
tmp_dst[127:120] := Saturate8(b[127:112])
tmp_dst[135:128] := Saturate8(a[143:128])
tmp_dst[143:136] := Saturate8(a[159:144])
tmp_dst[151:144] := Saturate8(a[175:160])
tmp_dst[159:152] := Saturate8(a[191:176])
tmp_dst[167:160] := Saturate8(a[207:192])
tmp_dst[175:168] := Saturate8(a[223:208])
tmp_dst[183:176] := Saturate8(a[239:224])
tmp_dst[191:184] := Saturate8(a[255:240])
tmp_dst[199:192] := Saturate8(b[143:128])
tmp_dst[207:200] := Saturate8(b[159:144])
tmp_dst[215:208] := Saturate8(b[175:160])
tmp_dst[223:216] := Saturate8(b[191:176])
tmp_dst[231:224] := Saturate8(b[207:192])
tmp_dst[239:232] := Saturate8(b[223:208])
tmp_dst[247:240] := Saturate8(b[239:224])
tmp_dst[255:248] := Saturate8(b[255:240])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
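The 256-bit form packs within each 128-bit lane rather than across the whole vector, which the long list of `tmp_dst` assignments above makes easy to miss. A minimal scalar sketch, assuming list-of-integers vectors and a hypothetical helper name `packs_i16_to_i8_writemask_256`:

```python
def saturate8(x):
    """Clamp a signed integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def packs_i16_to_i8_writemask_256(src, k, a, b):
    """a, b: 16 signed 16-bit values each; src: 32 signed 8-bit values;
    k: integer whose low 32 bits form the writemask."""
    tmp = []
    for lane in range(2):        # two 128-bit lanes
        lo = lane * 8            # eight 16-bit elements per lane
        tmp += [saturate8(x) for x in a[lo:lo + 8]]
        tmp += [saturate8(x) for x in b[lo:lo + 8]]
    # Writemask: keep the src element wherever the mask bit is clear.
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(32)]


a = list(range(16))                    # 0, 1, ..., 15
b = [100 * (i + 1) for i in range(16)] # 100, 200, ..., 1600
src = [-1] * 32
out = packs_i16_to_i8_writemask_256(src, 0xFFFF00FF, a, b)
print(out)
```

With this input, result bytes 0 to 7 come from the low half of `a`, bytes 8 to 15 are taken from `src` because their mask bits are clear, bytes 16 to 23 come from the high half of `a`, and bytes 24 to 31 are the saturated high half of `b`.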
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[7:0] := Saturate8(a[15:0])
tmp_dst[15:8] := Saturate8(a[31:16])
tmp_dst[23:16] := Saturate8(a[47:32])
tmp_dst[31:24] := Saturate8(a[63:48])
tmp_dst[39:32] := Saturate8(a[79:64])
tmp_dst[47:40] := Saturate8(a[95:80])
tmp_dst[55:48] := Saturate8(a[111:96])
tmp_dst[63:56] := Saturate8(a[127:112])
tmp_dst[71:64] := Saturate8(b[15:0])
tmp_dst[79:72] := Saturate8(b[31:16])
tmp_dst[87:80] := Saturate8(b[47:32])
tmp_dst[95:88] := Saturate8(b[63:48])
tmp_dst[103:96] := Saturate8(b[79:64])
tmp_dst[111:104] := Saturate8(b[95:80])
tmp_dst[119:112] := Saturate8(b[111:96])
tmp_dst[127:120] := Saturate8(b[127:112])
tmp_dst[135:128] := Saturate8(a[143:128])
tmp_dst[143:136] := Saturate8(a[159:144])
tmp_dst[151:144] := Saturate8(a[175:160])
tmp_dst[159:152] := Saturate8(a[191:176])
tmp_dst[167:160] := Saturate8(a[207:192])
tmp_dst[175:168] := Saturate8(a[223:208])
tmp_dst[183:176] := Saturate8(a[239:224])
tmp_dst[191:184] := Saturate8(a[255:240])
tmp_dst[199:192] := Saturate8(b[143:128])
tmp_dst[207:200] := Saturate8(b[159:144])
tmp_dst[215:208] := Saturate8(b[175:160])
tmp_dst[223:216] := Saturate8(b[191:176])
tmp_dst[231:224] := Saturate8(b[207:192])
tmp_dst[239:232] := Saturate8(b[223:208])
tmp_dst[247:240] := Saturate8(b[239:224])
tmp_dst[255:248] := Saturate8(b[255:240])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[7:0] := Saturate8(a[15:0])
tmp_dst[15:8] := Saturate8(a[31:16])
tmp_dst[23:16] := Saturate8(a[47:32])
tmp_dst[31:24] := Saturate8(a[63:48])
tmp_dst[39:32] := Saturate8(a[79:64])
tmp_dst[47:40] := Saturate8(a[95:80])
tmp_dst[55:48] := Saturate8(a[111:96])
tmp_dst[63:56] := Saturate8(a[127:112])
tmp_dst[71:64] := Saturate8(b[15:0])
tmp_dst[79:72] := Saturate8(b[31:16])
tmp_dst[87:80] := Saturate8(b[47:32])
tmp_dst[95:88] := Saturate8(b[63:48])
tmp_dst[103:96] := Saturate8(b[79:64])
tmp_dst[111:104] := Saturate8(b[95:80])
tmp_dst[119:112] := Saturate8(b[111:96])
tmp_dst[127:120] := Saturate8(b[127:112])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[7:0] := Saturate8(a[15:0])
tmp_dst[15:8] := Saturate8(a[31:16])
tmp_dst[23:16] := Saturate8(a[47:32])
tmp_dst[31:24] := Saturate8(a[63:48])
tmp_dst[39:32] := Saturate8(a[79:64])
tmp_dst[47:40] := Saturate8(a[95:80])
tmp_dst[55:48] := Saturate8(a[111:96])
tmp_dst[63:56] := Saturate8(a[127:112])
tmp_dst[71:64] := Saturate8(b[15:0])
tmp_dst[79:72] := Saturate8(b[31:16])
tmp_dst[87:80] := Saturate8(b[47:32])
tmp_dst[95:88] := Saturate8(b[63:48])
tmp_dst[103:96] := Saturate8(b[79:64])
tmp_dst[111:104] := Saturate8(b[95:80])
tmp_dst[119:112] := Saturate8(b[111:96])
tmp_dst[127:120] := Saturate8(b[127:112])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := SaturateU16(a[31:0])
tmp_dst[31:16] := SaturateU16(a[63:32])
tmp_dst[47:32] := SaturateU16(a[95:64])
tmp_dst[63:48] := SaturateU16(a[127:96])
tmp_dst[79:64] := SaturateU16(b[31:0])
tmp_dst[95:80] := SaturateU16(b[63:32])
tmp_dst[111:96] := SaturateU16(b[95:64])
tmp_dst[127:112] := SaturateU16(b[127:96])
tmp_dst[143:128] := SaturateU16(a[159:128])
tmp_dst[159:144] := SaturateU16(a[191:160])
tmp_dst[175:160] := SaturateU16(a[223:192])
tmp_dst[191:176] := SaturateU16(a[255:224])
tmp_dst[207:192] := SaturateU16(b[159:128])
tmp_dst[223:208] := SaturateU16(b[191:160])
tmp_dst[239:224] := SaturateU16(b[223:192])
tmp_dst[255:240] := SaturateU16(b[255:224])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
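Unsigned saturation of a signed input means negative values clamp to 0, not to a large positive number. The sketch below models the 256-bit writemask pack in scalar code; `saturate_u16` and `packus_i32_to_u16_writemask_256` are hypothetical names, and vectors are plain Python lists.

```python
def saturate_u16(x):
    """Clamp a signed 32-bit integer to the unsigned 16-bit range [0, 65535]."""
    return max(0, min(0xFFFF, x))


def packus_i32_to_u16_writemask_256(src, k, a, b):
    """a, b: eight signed 32-bit values each; src: 16 unsigned 16-bit values;
    k: integer whose low 16 bits form the writemask."""
    tmp = []
    for lane in range(2):        # two 128-bit lanes
        lo = lane * 4            # four 32-bit elements per lane
        tmp += [saturate_u16(x) for x in a[lo:lo + 4]]
        tmp += [saturate_u16(x) for x in b[lo:lo + 4]]
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(16)]


a = [-1, 0, 65535, 65536, 1, 2, 3, 4]
b = [10, 20, 30, 40, -5, 70000, 50, 60]
packed_u16 = packus_i32_to_u16_writemask_256([9] * 16, 0x0FFF, a, b)
print(packed_u16)
```

The top four mask bits are clear in this example, so the last four elements are copied from `src`.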
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := SaturateU16(a[31:0])
tmp_dst[31:16] := SaturateU16(a[63:32])
tmp_dst[47:32] := SaturateU16(a[95:64])
tmp_dst[63:48] := SaturateU16(a[127:96])
tmp_dst[79:64] := SaturateU16(b[31:0])
tmp_dst[95:80] := SaturateU16(b[63:32])
tmp_dst[111:96] := SaturateU16(b[95:64])
tmp_dst[127:112] := SaturateU16(b[127:96])
tmp_dst[143:128] := SaturateU16(a[159:128])
tmp_dst[159:144] := SaturateU16(a[191:160])
tmp_dst[175:160] := SaturateU16(a[223:192])
tmp_dst[191:176] := SaturateU16(a[255:224])
tmp_dst[207:192] := SaturateU16(b[159:128])
tmp_dst[223:208] := SaturateU16(b[191:160])
tmp_dst[239:224] := SaturateU16(b[223:192])
tmp_dst[255:240] := SaturateU16(b[255:224])
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := SaturateU16(a[31:0])
tmp_dst[31:16] := SaturateU16(a[63:32])
tmp_dst[47:32] := SaturateU16(a[95:64])
tmp_dst[63:48] := SaturateU16(a[127:96])
tmp_dst[79:64] := SaturateU16(b[31:0])
tmp_dst[95:80] := SaturateU16(b[63:32])
tmp_dst[111:96] := SaturateU16(b[95:64])
tmp_dst[127:112] := SaturateU16(b[127:96])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := SaturateU16(a[31:0])
tmp_dst[31:16] := SaturateU16(a[63:32])
tmp_dst[47:32] := SaturateU16(a[95:64])
tmp_dst[63:48] := SaturateU16(a[127:96])
tmp_dst[79:64] := SaturateU16(b[31:0])
tmp_dst[95:80] := SaturateU16(b[63:32])
tmp_dst[111:96] := SaturateU16(b[95:64])
tmp_dst[127:112] := SaturateU16(b[127:96])
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[7:0] := SaturateU8(a[15:0])
tmp_dst[15:8] := SaturateU8(a[31:16])
tmp_dst[23:16] := SaturateU8(a[47:32])
tmp_dst[31:24] := SaturateU8(a[63:48])
tmp_dst[39:32] := SaturateU8(a[79:64])
tmp_dst[47:40] := SaturateU8(a[95:80])
tmp_dst[55:48] := SaturateU8(a[111:96])
tmp_dst[63:56] := SaturateU8(a[127:112])
tmp_dst[71:64] := SaturateU8(b[15:0])
tmp_dst[79:72] := SaturateU8(b[31:16])
tmp_dst[87:80] := SaturateU8(b[47:32])
tmp_dst[95:88] := SaturateU8(b[63:48])
tmp_dst[103:96] := SaturateU8(b[79:64])
tmp_dst[111:104] := SaturateU8(b[95:80])
tmp_dst[119:112] := SaturateU8(b[111:96])
tmp_dst[127:120] := SaturateU8(b[127:112])
tmp_dst[135:128] := SaturateU8(a[143:128])
tmp_dst[143:136] := SaturateU8(a[159:144])
tmp_dst[151:144] := SaturateU8(a[175:160])
tmp_dst[159:152] := SaturateU8(a[191:176])
tmp_dst[167:160] := SaturateU8(a[207:192])
tmp_dst[175:168] := SaturateU8(a[223:208])
tmp_dst[183:176] := SaturateU8(a[239:224])
tmp_dst[191:184] := SaturateU8(a[255:240])
tmp_dst[199:192] := SaturateU8(b[143:128])
tmp_dst[207:200] := SaturateU8(b[159:144])
tmp_dst[215:208] := SaturateU8(b[175:160])
tmp_dst[223:216] := SaturateU8(b[191:176])
tmp_dst[231:224] := SaturateU8(b[207:192])
tmp_dst[239:232] := SaturateU8(b[223:208])
tmp_dst[247:240] := SaturateU8(b[239:224])
tmp_dst[255:248] := SaturateU8(b[255:240])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[7:0] := SaturateU8(a[15:0])
tmp_dst[15:8] := SaturateU8(a[31:16])
tmp_dst[23:16] := SaturateU8(a[47:32])
tmp_dst[31:24] := SaturateU8(a[63:48])
tmp_dst[39:32] := SaturateU8(a[79:64])
tmp_dst[47:40] := SaturateU8(a[95:80])
tmp_dst[55:48] := SaturateU8(a[111:96])
tmp_dst[63:56] := SaturateU8(a[127:112])
tmp_dst[71:64] := SaturateU8(b[15:0])
tmp_dst[79:72] := SaturateU8(b[31:16])
tmp_dst[87:80] := SaturateU8(b[47:32])
tmp_dst[95:88] := SaturateU8(b[63:48])
tmp_dst[103:96] := SaturateU8(b[79:64])
tmp_dst[111:104] := SaturateU8(b[95:80])
tmp_dst[119:112] := SaturateU8(b[111:96])
tmp_dst[127:120] := SaturateU8(b[127:112])
tmp_dst[135:128] := SaturateU8(a[143:128])
tmp_dst[143:136] := SaturateU8(a[159:144])
tmp_dst[151:144] := SaturateU8(a[175:160])
tmp_dst[159:152] := SaturateU8(a[191:176])
tmp_dst[167:160] := SaturateU8(a[207:192])
tmp_dst[175:168] := SaturateU8(a[223:208])
tmp_dst[183:176] := SaturateU8(a[239:224])
tmp_dst[191:184] := SaturateU8(a[255:240])
tmp_dst[199:192] := SaturateU8(b[143:128])
tmp_dst[207:200] := SaturateU8(b[159:144])
tmp_dst[215:208] := SaturateU8(b[175:160])
tmp_dst[223:216] := SaturateU8(b[191:176])
tmp_dst[231:224] := SaturateU8(b[207:192])
tmp_dst[239:232] := SaturateU8(b[223:208])
tmp_dst[247:240] := SaturateU8(b[239:224])
tmp_dst[255:248] := SaturateU8(b[255:240])
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[7:0] := SaturateU8(a[15:0])
tmp_dst[15:8] := SaturateU8(a[31:16])
tmp_dst[23:16] := SaturateU8(a[47:32])
tmp_dst[31:24] := SaturateU8(a[63:48])
tmp_dst[39:32] := SaturateU8(a[79:64])
tmp_dst[47:40] := SaturateU8(a[95:80])
tmp_dst[55:48] := SaturateU8(a[111:96])
tmp_dst[63:56] := SaturateU8(a[127:112])
tmp_dst[71:64] := SaturateU8(b[15:0])
tmp_dst[79:72] := SaturateU8(b[31:16])
tmp_dst[87:80] := SaturateU8(b[47:32])
tmp_dst[95:88] := SaturateU8(b[63:48])
tmp_dst[103:96] := SaturateU8(b[79:64])
tmp_dst[111:104] := SaturateU8(b[95:80])
tmp_dst[119:112] := SaturateU8(b[111:96])
tmp_dst[127:120] := SaturateU8(b[127:112])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[7:0] := SaturateU8(a[15:0])
tmp_dst[15:8] := SaturateU8(a[31:16])
tmp_dst[23:16] := SaturateU8(a[47:32])
tmp_dst[31:24] := SaturateU8(a[63:48])
tmp_dst[39:32] := SaturateU8(a[79:64])
tmp_dst[47:40] := SaturateU8(a[95:80])
tmp_dst[55:48] := SaturateU8(a[111:96])
tmp_dst[63:56] := SaturateU8(a[127:112])
tmp_dst[71:64] := SaturateU8(b[15:0])
tmp_dst[79:72] := SaturateU8(b[31:16])
tmp_dst[87:80] := SaturateU8(b[47:32])
tmp_dst[95:88] := SaturateU8(b[63:48])
tmp_dst[103:96] := SaturateU8(b[79:64])
tmp_dst[111:104] := SaturateU8(b[95:80])
tmp_dst[119:112] := SaturateU8(b[111:96])
tmp_dst[127:120] := SaturateU8(b[127:112])
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 15
i := 16*j
l := 8*j
dst[l+7:l] := Saturate8(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
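Unlike the two-source pack operations, this down-conversion takes a single source and preserves element order: element j of the input becomes element j of the output. A minimal scalar sketch, with the hypothetical helper name `narrow_i16_to_i8_saturate` and lists standing in for vectors:

```python
def narrow_i16_to_i8_saturate(a):
    """Narrow each signed 16-bit value to 8 bits with signed saturation.
    The unused upper bits of the destination register are zero in the
    pseudocode; this model returns only the converted elements."""
    return [max(-128, min(127, x)) for x in a]


narrowed = narrow_i16_to_i8_saturate([300, -300, 127, -128, 5])
print(narrowed)
```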
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Store
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i])
FI
ENDFOR
AVX512BW
AVX512VL
Convert
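The masked store writes only the bytes whose mask bit is set and leaves every other memory location untouched. The following sketch models memory as a `bytearray`; the function name `store_i16_as_i8_saturate_masked` and the byte-addressed `base_addr` are assumptions made for illustration.

```python
def store_i16_as_i8_saturate_masked(mem, base_addr, k, a):
    """For each element j with mask bit k[j] set, saturate a[j] to signed
    8 bits and write it to mem[base_addr + j]. Other bytes are not modified."""
    for j, x in enumerate(a):
        if (k >> j) & 1:
            sat = max(-128, min(127, x))
            mem[base_addr + j] = sat & 0xFF   # store as a two's-complement byte


mem = bytearray([0xEE] * 20)
store_i16_as_i8_saturate_masked(mem, 2, 0b0101, [300, 1, -300, 2] + [0] * 12)
print(mem.hex())
```

Only offsets 2 and 4 change here (to 0x7F and 0x80). A real masked store likewise skips the inactive elements, which is why the pseudocode has no ELSE branch.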
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 7
i := 16*j
l := 8*j
dst[l+7:l] := Saturate8(a[i+15:i])
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Store
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i])
FI
ENDFOR
AVX512BW
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := SignExtend16(a[i+7:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
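Sign extension copies bit 7 of each source byte into the upper eight bits of the 16-bit result. A scalar sketch working on raw bit patterns follows; `sign_extend16` and `widen_i8_to_i16_writemask` are hypothetical helpers, and elements are given as unsigned bit patterns so the effect is visible in hexadecimal.

```python
def sign_extend16(byte):
    """Take an 8-bit pattern (0..255) and return the 16-bit two's-complement
    pattern of the same signed value."""
    value = byte - 0x100 if byte & 0x80 else byte
    return value & 0xFFFF


def widen_i8_to_i16_writemask(src, k, a):
    """Widen each byte of a where the mask bit is set; otherwise keep src."""
    return [sign_extend16(a[j]) if (k >> j) & 1 else src[j]
            for j in range(len(a))]


widened = widen_i8_to_i16_writemask([0x1111] * 4, 0b1011,
                                    [0x7F, 0x80, 0xFF, 0x01])
print([hex(x) for x in widened])
```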
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := SignExtend16(a[i+7:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := SignExtend16(a[i+7:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := SignExtend16(a[i+7:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 15
i := 16*j
l := 8*j
dst[l+7:l] := SaturateU8(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
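Here the source elements are treated as unsigned, which differs from the pack operations with unsigned saturation, where the source is signed. The sketch below contrasts the two readings of the same 16-bit pattern; both function names are hypothetical.

```python
def saturate_u8_from_unsigned(bits):
    """Source read as unsigned 16-bit: anything above 255 clamps to 255."""
    return min(bits & 0xFFFF, 0xFF)


def saturate_u8_from_signed(bits):
    """Source read as signed 16-bit: negatives clamp to 0, large values to 255."""
    bits &= 0xFFFF
    value = bits - 0x10000 if bits & 0x8000 else bits
    return max(0, min(0xFF, value))


for pattern in (0x0042, 0x0100, 0xFFFF):
    print(hex(pattern),
          saturate_u8_from_unsigned(pattern),
          saturate_u8_from_signed(pattern))
```

The pattern 0xFFFF is 65535 when read as unsigned and saturates to 255, but it is -1 when read as signed and saturates to 0.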
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Store
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i])
FI
ENDFOR
AVX512BW
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 7
i := 16*j
l := 8*j
dst[l+7:l] := SaturateU8(a[i+15:i])
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Store
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i])
FI
ENDFOR
AVX512BW
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 15
i := 16*j
l := 8*j
dst[l+7:l] := Truncate8(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
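Truncation simply keeps the low 8 bits of each element, so out-of-range values wrap instead of clamping. A minimal sketch on raw bit patterns, with the hypothetical helper name `narrow_16_to_8_truncate`:

```python
def narrow_16_to_8_truncate(a):
    """Keep the low byte of each 16-bit pattern (Truncate8 in the pseudocode)."""
    return [x & 0xFF for x in a]


truncated = narrow_16_to_8_truncate([0x1234, 0x00FF, 0x0100, 0xFF80])
print([hex(x) for x in truncated])
```

The input 0x0100 (256) becomes 0x00 here, whereas the saturating conversions would produce the largest representable value.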
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Store
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i])
FI
ENDFOR
AVX512BW
AVX512VL
Convert
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 16*j
l := 8*j
dst[l+7:l] := Truncate8(a[i+15:i])
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Store
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i])
FI
ENDFOR
AVX512BW
AVX512VL
Convert
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512BW
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Convert
Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Set
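A masked broadcast behaves like a per-element select between the scalar and the existing contents of `src`. The following scalar sketch assumes a list of 32 byte values for `src`; the name `broadcast_u8_writemask` is hypothetical.

```python
def broadcast_u8_writemask(src, k, a):
    """Write the low byte of a into every element whose mask bit is set;
    keep the src element otherwise."""
    return [a & 0xFF if (k >> j) & 1 else src[j] for j in range(len(src))]


src = list(range(32))
filled = broadcast_u8_writemask(src, 0x0000FFFF, 0xAB)
print(filled)
```

With the low 16 mask bits set, the first 16 bytes become 0xAB and the remaining 16 keep their `src` values.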
Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Set
Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Set
Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Set
Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Set
Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Set
Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Set
Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Set
Compare packed signed 8-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
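The CASE table above maps the low three bits of `imm8` to a comparison predicate. A scalar sketch of that dispatch follows, assuming lists of signed integers as vectors; `CMPINT` and `cmp_i8_mask` are hypothetical names, and NLT and NLE are modelled as `>=` and `>`, which is equivalent for integers.

```python
import operator

# Predicate selected by imm8[2:0], following the CASE table in the pseudocode.
CMPINT = {
    0: operator.eq,             # _MM_CMPINT_EQ
    1: operator.lt,             # _MM_CMPINT_LT
    2: operator.le,             # _MM_CMPINT_LE
    3: lambda x, y: False,      # _MM_CMPINT_FALSE
    4: operator.ne,             # _MM_CMPINT_NE
    5: operator.ge,             # _MM_CMPINT_NLT (not less-than)
    6: operator.gt,             # _MM_CMPINT_NLE (not less-than-or-equal)
    7: lambda x, y: True,       # _MM_CMPINT_TRUE
}


def cmp_i8_mask(a, b, imm8):
    """Return an integer mask with bit j set when a[j] OP b[j] holds."""
    op = CMPINT[imm8 & 0b111]
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        if op(x, y):
            k |= 1 << j
    return k


a = [-1, 0, 5, 7]
b = [0, 0, 4, 7]
for imm8 in range(8):
    print(imm8, bin(cmp_i8_mask(a, b, imm8)))
```

Because the elements are signed, -1 compares less than 0; an unsigned comparison of the same bit patterns would give the opposite answer for that element.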
Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
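With a zeromask, the result of a comparison is the unmasked comparison result ANDed with `k1`. The sketch below models the masked equality test in scalar code; `cmpeq_i8_mask_zeromask` is a hypothetical name and vectors are plain lists.

```python
def cmpeq_i8_mask_zeromask(k1, a, b):
    """Set bit j of the result only when k1[j] is set and a[j] == b[j]."""
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        if (k1 >> j) & 1 and x == y:
            k |= 1 << j
    return k


a = [1, 2, 3, 4]
b = [1, 0, 3, 4]
unmasked = cmpeq_i8_mask_zeromask(0b1111, a, b)
masked = cmpeq_i8_mask_zeromask(0b1001, a, b)
print(bin(unmasked), bin(masked))
```

This makes chained tests cheap to express: feeding the mask from one comparison in as `k1` for the next yields the elements that pass both.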
Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
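The unsigned and signed entries share the same pseudocode and differ only in how each 8-bit lane is interpreted. The scalar sketch below contrasts the two readings of the same bytes; all function names are hypothetical, and two's-complement interpretation of the signed lanes is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret a raw byte as a two's-complement signed value. */
int as_signed8(uint8_t x)
{
    return x >= 128 ? (int)x - 256 : (int)x;
}

/* Greater-than over unsigned 8-bit lanes (hypothetical model). */
uint32_t model_cmpgt_u8(const uint8_t *a, const uint8_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 32);
    uint32_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (a[j] > b[j])
            k |= (uint32_t)1 << j;
    return k;
}

/* Greater-than over the same bytes read as signed lanes. */
uint32_t model_cmpgt_i8(const uint8_t *a, const uint8_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 32);
    uint32_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (as_signed8(a[j]) > as_signed8(b[j]))
            k |= (uint32_t)1 << j;
    return k;
}
```

For example, the byte 0x80 is 128 when unsigned but -128 when signed, so comparing it with 0x01 gives opposite results in the two variants.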
Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*8
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*8
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
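For 16-bit elements the lane index becomes i := j*16, so each lane spans two bytes of the vector. The sketch below models that slicing on a raw byte array, assuming the usual x86 little-endian layout in which the lower-addressed byte holds bits i+7:i. The names lane_u16 and model_mask_cmplt_u16 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract lane j, i.e. bits [j*16+15 : j*16], from a byte array.
   Assumes little-endian byte order (an assumption of this sketch). */
uint16_t lane_u16(const uint8_t *v, int j)
{
    return (uint16_t)(v[2 * j] | (v[2 * j + 1] << 8));
}

/* Scalar model of the zeromasked unsigned 16-bit less-than compare. */
uint16_t model_mask_cmplt_u16(uint16_t k1, const uint8_t *a,
                              const uint8_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 16);  /* 16 lanes for a 256-bit vector */
    uint16_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (((k1 >> j) & 1) && lane_u16(a, j) < lane_u16(b, j))
            k = (uint16_t)(k | (1u << j));
    return k;
}
```

The comparison is made on the full 16-bit value, so a lane with a larger low byte can still compare as less-than when its high byte is smaller.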
Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*16
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*16
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
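Following the pseudocode above, where the mask input is named k1, the masked test operation can be modelled in scalar code as below. This is a sketch under that reading; model_mask_test_u8 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the masked test: k[j] is set when lane j of
   (a AND b) is non-zero and k1[j] is set. Hypothetical helper. */
uint32_t model_mask_test_u8(uint32_t k1, const uint8_t *a,
                            const uint8_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 32);
    uint32_t k = 0;
    for (int j = 0; j < lanes; j++) {
        uint8_t t = (uint8_t)(a[j] & b[j]);   /* intermediate 8-bit value */
        if (((k1 >> j) & 1u) && t != 0)
            k |= (uint32_t)1 << j;
    }
    return k;
}
```

A typical use is to check, per lane, whether any of the bits selected by "b" are set in "a".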
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 31
i := j*8
k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 15
i := j*8
k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 15
i := j*16
k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 7
i := j*16
k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 31
i := j*8
IF k1[j]
k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 31
i := j*8
k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 15
i := j*8
IF k1[j]
k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 15
i := j*8
k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 15
i := j*16
IF k1[j]
k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 15
i := j*16
k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 7
i := j*16
IF k1[j]
k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 7
i := j*16
k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512BW
AVX512VL
Compare
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
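The per-element variable left shift with a writemask can be sketched in scalar form as below. This is only a model under stated assumptions: `sllv16_writemask` is a hypothetical name, and vectors are represented as Python lists of 16-bit lane values.

```python
def sllv16_writemask(src, k, a, count):
    """Scalar model of a per-lane left shift on 16-bit lanes with a writemask.

    Each lane j of `a` is shifted by count[j]; counts of 16 or more give 0.
    Lanes whose mask bit is clear are copied from `src`.
    Hypothetical helper, not a real intrinsic.
    """
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            c = count[j] & 0xFFFF
            # Bits shifted past bit 15 are discarded.
            dst.append((a[j] << c) & 0xFFFF if c < 16 else 0)
        else:
            dst.append(src[j] & 0xFFFF)
    return dst
```

The explicit `c < 16` check matters: unlike a C shift, an out-of-range count is well defined here and simply clears the lane.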
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
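When a single count is applied to every lane, the pseudocode above compares the whole 64-bit value count[63:0] against 15. The sketch below models that with a Python integer; `sll16_writemask` and its parameter names are assumptions for illustration.

```python
def sll16_writemask(src, k, a, count64):
    """Scalar model of a uniform left shift on 16-bit lanes with a writemask.

    count64 stands for count[63:0]. Any value above 15 clears active lanes.
    Hypothetical helper, not a real intrinsic.
    """
    count64 &= (1 << 64) - 1  # only the low 64 bits of the count are examined
    dst = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            dst.append(0 if count64 > 15 else (x << count64) & 0xFFFF)
        else:
            dst.append(src[j] & 0xFFFF)
    return dst
```

One consequence worth noting: a count such as 2^32 does not wrap to a shift by zero; it exceeds 15 and therefore zeroes the active lanes.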
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
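The arithmetic (sign-filling) right shift above differs from a logical shift in how it treats negative lanes and oversized counts. A scalar sketch, assuming list-of-lanes inputs and the hypothetical names `to_signed16` and `srav16_writemask`:

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def srav16_writemask(src, k, a, count):
    """Scalar model of a per-lane arithmetic right shift with a writemask.

    Counts of 16 or more fill the lane with copies of the sign bit.
    Hypothetical helper, not a real intrinsic.
    """
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            c = count[j] & 0xFFFF
            if c < 16:
                # Python's >> on a negative int is already arithmetic.
                dst.append((to_signed16(a[j]) >> c) & 0xFFFF)
            else:
                dst.append(0xFFFF if a[j] & 0x8000 else 0)
        else:
            dst.append(src[j] & 0xFFFF)
    return dst
```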
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
AVX512VL
Shift
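For the immediate-count logical right shift with a zeromask, two separate conditions can zero a lane: a clear mask bit, or an immediate above 15. The sketch below models both; `srli16_zeromask` is a hypothetical name used only for this illustration.

```python
def srli16_zeromask(k, a, imm8):
    """Scalar model of a logical right shift by an 8-bit immediate, zeromasked.

    Inactive lanes and shifts by more than 15 both produce 0.
    Hypothetical helper, not a real intrinsic.
    """
    imm8 &= 0xFF
    dst = []
    for j, x in enumerate(a):
        if (k >> j) & 1 and imm8 <= 15:
            dst.append((x & 0xFFFF) >> imm8)  # zeros shift in from the top
        else:
            dst.append(0)
    return dst
```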
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512BW
AVX512VL
Shift
Reduce the packed 16-bit integers in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[15:0] + src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_ADD(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_ADD(a, 8)
AVX512BW
AVX512VL
Arithmetic
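The recursive halving in REDUCE_ADD can be mirrored directly in scalar code. The sketch below assumes the upper operand of each step is the 16-bit lane at index j+len, and that the final result is truncated to 16 bits as dst[15:0] implies; `reduce_add16` is a hypothetical name.

```python
def reduce_add16(lanes):
    """Fold a power-of-two number of 16-bit lanes by pairwise addition.

    Follows the REDUCE_ADD structure: add the upper half onto the lower
    half, then recurse on the lower half until two lanes remain.
    Hypothetical helper, not a real intrinsic.
    """
    lanes = [x & 0xFFFF for x in lanes]
    if len(lanes) == 2:
        return (lanes[0] + lanes[1]) & 0xFFFF
    half = len(lanes) // 2
    folded = [(lanes[j] + lanes[j + half]) & 0xFFFF for j in range(half)]
    return reduce_add16(folded)
```

Since addition modulo 2^16 is associative and commutative, the result equals the plain sum of all lanes reduced modulo 2^16, regardless of the folding order.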
Reduce the packed 16-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[15:0] + src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_ADD(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0
FI
ENDFOR
dst[15:0] := REDUCE_ADD(tmp, 8)
AVX512BW
AVX512VL
Arithmetic
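The masked sum works by replacing inactive lanes with 0, the identity for addition, before reducing. A compact scalar sketch, with `mask_reduce_add16` as a hypothetical name and the 16-bit wraparound assumed from dst[15:0]:

```python
def mask_reduce_add16(k, lanes):
    """Sum only the lanes whose mask bit is set, modulo 2**16.

    Inactive lanes are replaced by 0 so they do not affect the sum.
    Hypothetical helper, not a real intrinsic.
    """
    tmp = [(x & 0xFFFF) if (k >> j) & 1 else 0 for j, x in enumerate(lanes)]
    return sum(tmp) & 0xFFFF
```

With an all-zero mask the result is 0, since every lane has been replaced by the identity.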
Reduce the packed 16-bit integers in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[15:0] + src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_ADD(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_ADD(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[15:0] + src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] + src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_ADD(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0
FI
ENDFOR
dst[15:0] := REDUCE_ADD(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[7:0] + src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_ADD(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_ADD(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[7:0] + src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_ADD(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0
FI
ENDFOR
dst[7:0] := REDUCE_ADD(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[7:0] + src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_ADD(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_ADD(a, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[7:0] + src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] + src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_ADD(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0
FI
ENDFOR
dst[7:0] := REDUCE_ADD(tmp, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[15:0] * src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_MUL(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MUL(a, 8)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[15:0] * src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_MUL(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 1
FI
ENDFOR
dst[15:0] := REDUCE_MUL(tmp, 8)
AVX512BW
AVX512VL
Arithmetic
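In the masked multiplication above, inactive lanes are set to 1 rather than 0, because 1 is the identity for multiplication. The sketch below illustrates this; `mask_reduce_mul16` is a hypothetical name, and keeping only the low 16 bits after each multiply is an assumption drawn from the 16-bit lane assignments in the pseudocode.

```python
def mask_reduce_mul16(k, lanes):
    """Multiply together the lanes whose mask bit is set, modulo 2**16.

    Inactive lanes contribute a factor of 1.
    Hypothetical helper, not a real intrinsic.
    """
    result = 1
    for j, x in enumerate(lanes):
        factor = (x & 0xFFFF) if (k >> j) & 1 else 1
        result = (result * factor) & 0xFFFF  # keep only the low 16 bits
    return result
```

An all-zero mask therefore yields 1, not 0, and products that exceed 16 bits wrap silently.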
Reduce the packed 16-bit integers in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[15:0] * src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_MUL(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MUL(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[15:0] * src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] * src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_MUL(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 1
FI
ENDFOR
dst[15:0] := REDUCE_MUL(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[7:0] * src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_MUL(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MUL(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[7:0] * src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_MUL(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 1
FI
ENDFOR
dst[7:0] := REDUCE_MUL(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[7:0] * src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_MUL(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MUL(a, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[7:0] * src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] * src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_MUL(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 1
FI
ENDFOR
dst[7:0] := REDUCE_MUL(tmp, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[15:0] OR src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_OR(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_OR(a, 8)
AVX512BW
AVX512VL
Arithmetic
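Because OR is associative and commutative, the recursive REDUCE_OR above gives the same answer as a simple left-to-right fold. A short sketch using the standard library, with `reduce_or16` as a hypothetical name:

```python
from functools import reduce
from operator import or_


def reduce_or16(lanes):
    """OR together all 16-bit lanes.

    Equivalent in result to the halving scheme in REDUCE_OR.
    Hypothetical helper, not a real intrinsic.
    """
    return reduce(or_, (x & 0xFFFF for x in lanes), 0)
```

A result of zero means no bit was set in any lane, which makes this reduction a convenient "any bit set anywhere" check.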
Reduce the packed 16-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[15:0] OR src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_OR(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0
FI
ENDFOR
dst[15:0] := REDUCE_OR(tmp, 8)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[15:0] OR src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_OR(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_OR(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[15:0] OR src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] OR src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_OR(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0
FI
ENDFOR
dst[15:0] := REDUCE_OR(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[7:0] OR src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_OR(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_OR(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[7:0] OR src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_OR(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0
FI
ENDFOR
dst[7:0] := REDUCE_OR(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[7:0] OR src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_OR(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_OR(a, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[7:0] OR src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] OR src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_OR(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0
FI
ENDFOR
dst[7:0] := REDUCE_OR(tmp, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[15:0] AND src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_AND(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_AND(a, 8)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[15:0] AND src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_AND(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0xFFFF
FI
ENDFOR
dst[15:0] := REDUCE_AND(tmp, 8)
AVX512BW
AVX512VL
Arithmetic
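The masked AND reduction differs from the OR case mainly in its fill value: inactive lanes are set to 0xFFFF, the identity for AND, so they cannot clear any result bit. A minimal scalar sketch, assuming a hypothetical function name and a list of eight 16-bit lanes with lane 0 first:

```python
def mask_reduce_and_epi16(k, a):
    """Hypothetical scalar model: AND of the 16-bit lanes of `a` whose bit in `k` is set."""
    acc = 0xFFFF  # identity for AND; also the result when no lane is active
    for j, x in enumerate(a):
        if (k >> j) & 1:
            acc &= x & 0xFFFF
    return acc


a = [0xFFF0, 0x0FFF, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000]
low_two = mask_reduce_and_epi16(0b11, a)   # only lanes 0 and 1 participate
```

A flat loop is used here instead of the recursive halving; AND is associative and commutative, so both orders give the same result.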
Reduce the packed 16-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[15:0] AND src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_AND(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_AND(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 16-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[15:0] AND src[31:16]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := src[i+15:i] AND src[i+16*len+15:i+16*len]
ENDFOR
RETURN REDUCE_AND(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0xFFFF
FI
ENDFOR
dst[15:0] := REDUCE_AND(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[7:0] AND src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_AND(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_AND(a, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[7:0] AND src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_AND(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0xFF
FI
ENDFOR
dst[7:0] := REDUCE_AND(tmp, 16)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[7:0] AND src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_AND(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_AND(a, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed 8-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[7:0] AND src[15:8]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := src[i+7:i] AND src[i+8*len+7:i+8*len]
ENDFOR
RETURN REDUCE_AND(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0xFF
FI
ENDFOR
dst[7:0] := REDUCE_AND(tmp, 32)
AVX512BW
AVX512VL
Arithmetic
Reduce the packed signed 16-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MAX(a, 8)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := Int16(-0x8000)
FI
ENDFOR
dst[15:0] := REDUCE_MAX(tmp, 8)
AVX512BW
AVX512VL
Special Math Functions
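For the signed masked maximum, inactive lanes are filled with the most negative 16-bit value so they can never win the comparison. The sketch below is a scalar illustration under assumed names; `to_int16` is a helper introduced here to reinterpret a raw 16-bit pattern as a signed value.

```python
def to_int16(x):
    """Reinterpret the low 16 bits of x as a two's-complement signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mask_reduce_max_epi16(k, a):
    """Hypothetical scalar model: signed maximum of the active 16-bit lanes of `a`."""
    best = -0x8000  # fill value used by the pseudocode for inactive lanes
    for j, x in enumerate(a):
        if (k >> j) & 1:
            best = max(best, to_int16(x))
    return best


a = [0xFFFF, 0x7FFF, 0x8000, 0x0005, 0, 0, 0, 0]   # -1, 32767, -32768, 5, ...
max_02 = mask_reduce_max_epi16(0b0101, a)           # lanes 0 and 2: -1 and -32768
```

With an all-zero mask the model returns -32768, which matches what the pseudocode produces when every lane is replaced by the fill value.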
Reduce the packed signed 16-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MAX(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := Int16(-0x8000)
FI
ENDFOR
dst[15:0] := REDUCE_MAX(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MAX(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := Int8(-0x80)
FI
ENDFOR
dst[7:0] := REDUCE_MAX(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MAX(a, 32)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := Int8(-0x80)
FI
ENDFOR
dst[7:0] := REDUCE_MAX(tmp, 32)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 16-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MAX(a, 8)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0
FI
ENDFOR
dst[15:0] := REDUCE_MAX(tmp, 8)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 16-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MAX(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 16-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[15:0] > src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] > src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MAX(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0
FI
ENDFOR
dst[15:0] := REDUCE_MAX(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MAX(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0
FI
ENDFOR
dst[7:0] := REDUCE_MAX(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MAX(a, 32)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[7:0] > src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] > src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MAX(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0
FI
ENDFOR
dst[7:0] := REDUCE_MAX(tmp, 32)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 16-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MIN(a, 8)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := Int16(0x7FFF)
FI
ENDFOR
dst[15:0] := REDUCE_MIN(tmp, 8)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 16-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MIN(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := Int16(0x7FFF)
FI
ENDFOR
dst[15:0] := REDUCE_MIN(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MIN(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := Int8(0x7F)
FI
ENDFOR
dst[7:0] := REDUCE_MIN(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MIN(a, 32)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed signed 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := Int8(0x7F)
FI
ENDFOR
dst[7:0] := REDUCE_MIN(tmp, 32)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 16-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MIN(a, 8)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0xFFFF
FI
ENDFOR
dst[15:0] := REDUCE_MIN(tmp, 8)
AVX512BW
AVX512VL
Special Math Functions
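The unsigned masked minimum uses 0xFFFF as its fill value, the largest unsigned 16-bit number. The scalar sketch below (hypothetical function name, lanes as a Python list with lane 0 first) also shows why signedness matters: the pattern 0x8000 is large when read as unsigned, although it would be the smallest value under a signed reading.

```python
def mask_reduce_min_epu16(k, a):
    """Hypothetical scalar model: unsigned minimum of the active 16-bit lanes of `a`."""
    best = 0xFFFF  # fill value used by the pseudocode for inactive lanes
    for j, x in enumerate(a):
        if (k >> j) & 1:
            best = min(best, x & 0xFFFF)
    return best


a = [0x8000, 0x0001, 0xFFFF, 0x0007, 0, 0, 0, 0]
min_02 = mask_reduce_min_epu16(0b0101, a)   # lanes 0 and 2: 0x8000 and 0xFFFF
```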
Reduce the packed unsigned 16-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
dst[15:0] := REDUCE_MIN(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 16-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[15:0] < src[31:16] ? src[15:0] : src[31:16])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*16
src[i+15:i] := (src[i+15:i] < src[i+16*len+15:i+16*len] ? src[i+15:i] : src[i+16*len+15:i+16*len])
ENDFOR
RETURN REDUCE_MIN(src[16*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[i+15:i] := a[i+15:i]
ELSE
tmp[i+15:i] := 0xFFFF
FI
ENDFOR
dst[15:0] := REDUCE_MIN(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MIN(a, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0xFF
FI
ENDFOR
dst[7:0] := REDUCE_MIN(tmp, 16)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
dst[7:0] := REDUCE_MIN(a, 32)
AVX512BW
AVX512VL
Special Math Functions
Reduce the packed unsigned 8-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[7:0] < src[15:8] ? src[7:0] : src[15:8])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*8
src[i+7:i] := (src[i+7:i] < src[i+8*len+7:i+8*len] ? src[i+7:i] : src[i+8*len+7:i+8*len])
ENDFOR
RETURN REDUCE_MIN(src[8*len-1:0], len)
}
tmp := a
FOR j := 0 to 31
i := j*8
IF k[j]
tmp[i+7:i] := a[i+7:i]
ELSE
tmp[i+7:i] := 0xFF
FI
ENDFOR
dst[7:0] := REDUCE_MIN(tmp, 32)
AVX512BW
AVX512VL
Special Math Functions
Unpack and interleave 32 bits from masks "a" and "b", and store the 64-bit result in "dst".
dst[31:0] := b[31:0]
dst[63:32] := a[31:0]
dst[MAX:64] := 0
AVX512BW
Miscellaneous
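The 32-bit mask unpack above amounts to a shift and an OR on plain integers. A minimal sketch, assuming a hypothetical function name and Python integers standing in for the mask registers:

```python
def kunpack_32(a, b):
    """Hypothetical scalar model: low 32 bits of b in dst[31:0], low 32 bits of a in dst[63:32]."""
    return ((a & 0xFFFFFFFF) << 32) | (b & 0xFFFFFFFF)


combined = kunpack_32(0x00000001, 0x00000002)
```

Bits of "a" and "b" above bit 31 are ignored, which the masking in the model makes explicit.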
Unpack and interleave 16 bits from masks "a" and "b", and store the 32-bit result in "dst".
dst[15:0] := b[15:0]
dst[31:16] := a[15:0]
dst[MAX:32] := 0
AVX512BW
Miscellaneous
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
FOR i := 0 to 3
tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
ENDFOR
FOR j := 0 to 7
i := j*64
dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
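The double-block SAD is easier to follow at byte granularity. The sketch below is a scalar illustration of the unmasked operation above, with a hypothetical function name; "a" and "b" are assumed to be lists of 64 unsigned bytes (byte 0 first) and the result is a list of 32 16-bit sums.

```python
def dbsad_epu8(a, b, imm8):
    """Hypothetical scalar model of the unmasked 512-bit double-block SAD."""
    # Step 1: shuffle the dwords of b within each 128-bit lane according to imm8.
    tmp = [0] * 64
    for lane in range(4):
        for d in range(4):
            sel = (imm8 >> (2 * d)) & 3
            src = lane * 16 + sel * 4
            dst = lane * 16 + d * 4
            tmp[dst:dst + 4] = b[src:src + 4]
    # Step 2: four SADs per 64-bit lane. Each pair is (byte offset into a, byte offset into tmp).
    out = []
    for q in range(8):
        base = q * 8
        for a_off, t_off in ((0, 0), (0, 1), (4, 2), (4, 3)):
            out.append(sum(abs(a[base + a_off + t] - tmp[base + t_off + t])
                           for t in range(4)))
    return out


# imm8 = 0xE4 selects dwords 0, 1, 2, 3 in order, so tmp equals b.
sads = dbsad_epu8([0] * 64, list(range(64)), 0xE4)
```

With "a" all zero, each SAD is simply the sum of four consecutive bytes of the shuffled "b", which makes the sliding 8-bit offsets visible in the first few results.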
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
FOR i := 0 to 3
tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
ENDFOR
FOR j := 0 to 7
i := j*64
tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets.
FOR i := 0 to 3
tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ]
tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ]
tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ]
tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ]
ENDFOR
FOR j := 0 to 7
i := j*64
tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\
ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24])
tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\
ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32])
tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\
ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40])
tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\
ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48])
ENDFOR
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst".
FOR j := 0 to 3
i := j*128
tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
dst[i+127:i] := tmp[127:0]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
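The per-lane concatenate-and-shift above can be modelled with list slicing. This is a sketch under assumptions: the function name is hypothetical, "a" and "b" are lists of 64 bytes with byte 0 first, and "imm8" is a non-negative byte count.

```python
def alignr_epi8(a, b, imm8):
    """Hypothetical scalar model: per 128-bit lane, take 16 bytes of (a:b) starting at byte imm8."""
    out = []
    for lane in range(4):
        lo = lane * 16
        concat = b[lo:lo + 16] + a[lo:lo + 16]   # b is the low half, a the high half
        # Shifting right by imm8 bytes drops the lowest imm8 bytes; zeros shift in at the top.
        out.extend((concat[imm8:] + [0] * 16)[:16])
    return out


a = list(range(100, 164))
b = list(range(64))
shifted = alignr_epi8(a, b, 4)
```

A shift of 0 returns "b", a shift of 16 returns "a", and a shift of 32 or more returns all zeros, since nothing of the 32-byte temporary remains.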
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*128
tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
tmp_dst[i+127:i] := tmp[127:0]
ENDFOR
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*128
tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8)
tmp_dst[i+127:i] := tmp[127:0]
ENDFOR
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := b[i+7:i]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
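The byte blend is a per-element select, with a set mask bit choosing "b". A one-line scalar sketch, assuming a hypothetical function name and 64-element byte lists:

```python
def mask_blend_epi8(k, a, b):
    """Hypothetical scalar model: element j comes from b when bit j of k is set, else from a."""
    return [b[j] if (k >> j) & 1 else a[j] for j in range(64)]


blended = mask_blend_epi8(0b0110, [0] * 64, [1] * 64)
```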
Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := b[i+15:i]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Broadcast the low packed 8-bit integer from "a" to all elements of "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := a[7:0]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
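The three byte-broadcast entries above differ only in what an inactive element receives. The sketch below contrasts the writemask and zeromask forms; the function names are hypothetical and vectors are modelled as byte lists with element 0 first.

```python
def mask_broadcast_epi8(src, k, a):
    """Writemask form: inactive elements keep the corresponding byte of src."""
    low = a[0] & 0xFF
    return [low if (k >> j) & 1 else src[j] for j in range(64)]


def maskz_broadcast_epi8(k, a):
    """Zeromask form: inactive elements are set to 0."""
    low = a[0] & 0xFF
    return [low if (k >> j) & 1 else 0 for j in range(64)]


a = [7] + [0] * 15          # only the low byte of the source is used
merged = mask_broadcast_epi8([9] * 64, 0b11, a)
zeroed = maskz_broadcast_epi8(0b11, a)
```

The same merge-or-zero pattern applies to the other writemask and zeromask entries in this section.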
Broadcast the low packed 16-bit integer from "a" to all elements of "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := a[15:0]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
off := 16*idx[i+4:i]
dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := idx[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
off := 16*idx[i+4:i]
dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
off := 16*idx[i+4:i]
dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 31
i := j*16
off := 16*idx[i+4:i]
dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
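In the two-source shuffle, each 16-bit index element carries a 5-bit offset in bits 4:0 and a source selector in bit 5; higher bits are ignored. A scalar sketch of the unmasked form, assuming a hypothetical function name and 32-element lists:

```python
def permutex2var_epi16(a, idx, b):
    """Hypothetical scalar model: pick element idx[4:0] from b if idx[5] is set, else from a."""
    out = []
    for j in range(32):
        off = idx[j] & 0x1F           # bits 4:0 select the element
        from_b = (idx[j] >> 5) & 1    # bit 5 selects the source vector
        out.append(b[off] if from_b else a[off])
    return out


a = list(range(32))
b = list(range(1000, 1032))
reversed_a = permutex2var_epi16(a, [31 - j for j in range(32)], b)
```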
Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
id := idx[i+4:i]*16
IF k[j]
dst[i+15:i] := a[id+15:id]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
id := idx[i+4:i]*16
IF k[j]
dst[i+15:i] := a[id+15:id]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 31
i := j*16
id := idx[i+4:i]*16
dst[i+15:i] := a[id+15:id]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a".
FOR j := 0 to 63
i := j*8
IF a[i+7]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Miscellaneous
Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := 0xFF
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
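The two byte-wise conversions above (most significant bits to mask, and mask to all-ones or all-zeros bytes) can be sketched together as a scalar model. The names `bytes_to_mask` and `mask_to_bytes` are hypothetical, and the mask is modelled as a plain Python integer; both are assumptions of this sketch.

```python
def bytes_to_mask(a):
    """Bit j of the result is the most significant bit of byte j (64 bytes)."""
    assert len(a) == 64
    k = 0
    for j, byte in enumerate(a):
        if byte & 0x80:
            k |= 1 << j
    return k


def mask_to_bytes(k):
    """Byte j is 0xFF when bit j of k is set, otherwise 0."""
    return [0xFF if (k >> j) & 1 else 0 for j in range(64)]


a = [0] * 64
a[0] = 0x80
a[1] = 0x7F      # MSB clear, so bit 1 stays 0
a[63] = 0xFF
k = bytes_to_mask(a)
print(hex(k))    # 0x8000000000000001
```

Under this model the two functions round-trip: `bytes_to_mask(mask_to_bytes(k))` returns `k` for any 64-bit `k`.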
Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := 0xFFFF
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a".
FOR j := 0 to 31
i := j*16
IF a[i+15]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Miscellaneous
Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".
FOR j := 0 to 63
i := j*8
tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
ENDFOR
FOR j := 0 to 7
i := j*64
dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + \
tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56]
dst[i+63:i+16] := 0
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
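A scalar sketch of the sum-of-absolute-differences operation above, assuming each input is a list of 64 unsigned bytes and the result is returned as eight integers (one per 64-bit element, whose upper 48 bits are zero). The name `sad_bytes` is hypothetical.

```python
def sad_bytes(a, b):
    """Sum |a[n] - b[n]| over each group of 8 consecutive bytes."""
    assert len(a) == len(b) == 64
    dst = []
    for j in range(8):
        group = range(8 * j, 8 * j + 8)
        # The largest possible sum is 8 * 255 = 2040, which fits in 16 bits.
        dst.append(sum(abs(a[n] - b[n]) for n in group))
    return dst


print(sad_bytes([255] * 64, [0] * 64)[0])   # 2040
```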
Shuffle 8-bit integers in "a" within 128-bit lanes using the control in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[5:0] := b[i+3:i] + (j & 0x30)
dst[i+7:i] := a[index*8+7:index*8]
FI
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Swizzle
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[5:0] := b[i+3:i] + (j & 0x30)
dst[i+7:i] := a[index*8+7:index*8]
FI
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Swizzle
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
FOR j := 0 to 63
i := j*8
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[5:0] := b[i+3:i] + (j & 0x30)
dst[i+7:i] := a[index*8+7:index*8]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Swizzle
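The unmasked byte shuffle above stays within each 128-bit lane: the low 4 bits of the control byte pick a byte inside the lane, and `j & 0x30` supplies the lane base. A minimal scalar sketch, assuming vectors are lists of 64 unsigned bytes and using the hypothetical name `lane_shuffle_bytes`:

```python
def lane_shuffle_bytes(a, b):
    """Scalar model of the in-lane byte shuffle (a: data, b: control bytes)."""
    assert len(a) == len(b) == 64
    dst = []
    for j in range(64):
        if b[j] & 0x80:                       # b[i+7] set: zero the result byte
            dst.append(0)
        else:
            index = (b[j] & 0x0F) + (j & 0x30)  # position within the same lane
            dst.append(a[index])
    return dst


a = list(range(64))
b = [0x0F] * 64        # every byte selects position 15 of its own lane
b[0] = 0x80            # except byte 0, which is zeroed
out = lane_shuffle_bytes(a, b)
print(out[0], out[1], out[16], out[63])   # 0 15 31 63
```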
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[63:0] := a[63:0]
tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
tmp_dst[191:128] := a[191:128]
tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
tmp_dst[319:256] := a[319:256]
tmp_dst[335:320] := (a >> (imm8[1:0] * 16))[335:320]
tmp_dst[351:336] := (a >> (imm8[3:2] * 16))[335:320]
tmp_dst[367:352] := (a >> (imm8[5:4] * 16))[335:320]
tmp_dst[383:368] := (a >> (imm8[7:6] * 16))[335:320]
tmp_dst[447:384] := a[447:384]
tmp_dst[463:448] := (a >> (imm8[1:0] * 16))[463:448]
tmp_dst[479:464] := (a >> (imm8[3:2] * 16))[463:448]
tmp_dst[495:480] := (a >> (imm8[5:4] * 16))[463:448]
tmp_dst[511:496] := (a >> (imm8[7:6] * 16))[463:448]
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[63:0] := a[63:0]
tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
tmp_dst[191:128] := a[191:128]
tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
tmp_dst[319:256] := a[319:256]
tmp_dst[335:320] := (a >> (imm8[1:0] * 16))[335:320]
tmp_dst[351:336] := (a >> (imm8[3:2] * 16))[335:320]
tmp_dst[367:352] := (a >> (imm8[5:4] * 16))[335:320]
tmp_dst[383:368] := (a >> (imm8[7:6] * 16))[335:320]
tmp_dst[447:384] := a[447:384]
tmp_dst[463:448] := (a >> (imm8[1:0] * 16))[463:448]
tmp_dst[479:464] := (a >> (imm8[3:2] * 16))[463:448]
tmp_dst[495:480] := (a >> (imm8[5:4] * 16))[463:448]
tmp_dst[511:496] := (a >> (imm8[7:6] * 16))[463:448]
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst".
dst[63:0] := a[63:0]
dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
dst[191:128] := a[191:128]
dst[207:192] := (a >> (imm8[1:0] * 16))[207:192]
dst[223:208] := (a >> (imm8[3:2] * 16))[207:192]
dst[239:224] := (a >> (imm8[5:4] * 16))[207:192]
dst[255:240] := (a >> (imm8[7:6] * 16))[207:192]
dst[319:256] := a[319:256]
dst[335:320] := (a >> (imm8[1:0] * 16))[335:320]
dst[351:336] := (a >> (imm8[3:2] * 16))[335:320]
dst[367:352] := (a >> (imm8[5:4] * 16))[335:320]
dst[383:368] := (a >> (imm8[7:6] * 16))[335:320]
dst[447:384] := a[447:384]
dst[463:448] := (a >> (imm8[1:0] * 16))[463:448]
dst[479:464] := (a >> (imm8[3:2] * 16))[463:448]
dst[495:480] := (a >> (imm8[5:4] * 16))[463:448]
dst[511:496] := (a >> (imm8[7:6] * 16))[463:448]
dst[MAX:512] := 0
AVX512BW
Miscellaneous
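The twenty explicit assignments in the unmasked high-word shuffle above follow one pattern per lane: each 2-bit field of "imm8" picks one of the four high words of the same lane. A scalar sketch of that pattern, assuming a list of 32 unsigned 16-bit values per vector and the hypothetical name `shuffle_high_words`:

```python
def shuffle_high_words(a, imm8):
    """Scalar model: permute words 4..7 of every 8-word (128-bit) lane."""
    assert len(a) == 32
    dst = list(a)                        # low 64 bits of each lane are copied
    for lane in range(4):
        base = lane * 8 + 4              # first high word of this lane
        for n in range(4):
            sel = (imm8 >> (2 * n)) & 3  # imm8[2n+1:2n]
            dst[base + n] = a[base + sel]
    return dst


a = list(range(32))
print(shuffle_high_words(a, 0x1B)[:8])   # [0, 1, 2, 3, 7, 6, 5, 4]
```

With `imm8 = 0xE4` (fields 3, 2, 1, 0 from high to low) the model returns its input unchanged.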
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
tmp_dst[127:64] := a[127:64]
tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
tmp_dst[255:192] := a[255:192]
tmp_dst[271:256] := (a >> (imm8[1:0] * 16))[271:256]
tmp_dst[287:272] := (a >> (imm8[3:2] * 16))[271:256]
tmp_dst[303:288] := (a >> (imm8[5:4] * 16))[271:256]
tmp_dst[319:304] := (a >> (imm8[7:6] * 16))[271:256]
tmp_dst[383:320] := a[383:320]
tmp_dst[399:384] := (a >> (imm8[1:0] * 16))[399:384]
tmp_dst[415:400] := (a >> (imm8[3:2] * 16))[399:384]
tmp_dst[431:416] := (a >> (imm8[5:4] * 16))[399:384]
tmp_dst[447:432] := (a >> (imm8[7:6] * 16))[399:384]
tmp_dst[511:448] := a[511:448]
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
tmp_dst[127:64] := a[127:64]
tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
tmp_dst[255:192] := a[255:192]
tmp_dst[271:256] := (a >> (imm8[1:0] * 16))[271:256]
tmp_dst[287:272] := (a >> (imm8[3:2] * 16))[271:256]
tmp_dst[303:288] := (a >> (imm8[5:4] * 16))[271:256]
tmp_dst[319:304] := (a >> (imm8[7:6] * 16))[271:256]
tmp_dst[383:320] := a[383:320]
tmp_dst[399:384] := (a >> (imm8[1:0] * 16))[399:384]
tmp_dst[415:400] := (a >> (imm8[3:2] * 16))[399:384]
tmp_dst[431:416] := (a >> (imm8[5:4] * 16))[399:384]
tmp_dst[447:432] := (a >> (imm8[7:6] * 16))[399:384]
tmp_dst[511:448] := a[511:448]
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst".
dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
dst[127:64] := a[127:64]
dst[143:128] := (a >> (imm8[1:0] * 16))[143:128]
dst[159:144] := (a >> (imm8[3:2] * 16))[143:128]
dst[175:160] := (a >> (imm8[5:4] * 16))[143:128]
dst[191:176] := (a >> (imm8[7:6] * 16))[143:128]
dst[255:192] := a[255:192]
dst[271:256] := (a >> (imm8[1:0] * 16))[271:256]
dst[287:272] := (a >> (imm8[3:2] * 16))[271:256]
dst[303:288] := (a >> (imm8[5:4] * 16))[271:256]
dst[319:304] := (a >> (imm8[7:6] * 16))[271:256]
dst[383:320] := a[383:320]
dst[399:384] := (a >> (imm8[1:0] * 16))[399:384]
dst[415:400] := (a >> (imm8[3:2] * 16))[399:384]
dst[431:416] := (a >> (imm8[5:4] * 16))[399:384]
dst[447:432] := (a >> (imm8[7:6] * 16))[399:384]
dst[511:448] := a[511:448]
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512BW
Miscellaneous
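INTERLEAVE_HIGH_BYTES above alternates bytes 8..15 of each 128-bit lane of "a" and "b". The same thing as a loop, in a scalar sketch that assumes 64-byte lists and uses the hypothetical name `interleave_high_bytes`:

```python
def interleave_high_bytes(a, b):
    """Scalar model of the unmasked high-half byte interleave, lane by lane."""
    assert len(a) == len(b) == 64
    dst = []
    for lane in range(4):
        base = lane * 16 + 8             # first byte of the lane's high half
        for n in range(8):
            dst.append(a[base + n])      # even result bytes come from a
            dst.append(b[base + n])      # odd result bytes come from b
    return dst


a = list(range(64))
b = list(range(100, 164))
out = interleave_high_bytes(a, b)
print(out[:4], out[16:18])   # [8, 108, 9, 109] [24, 124]
```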
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Miscellaneous
Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512BW
Miscellaneous
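The low-half 16-bit interleave and its writemask form can be sketched in scalar Python as follows. The names `interleave_low_words` and `mask_interleave_low_words`, the 32-element list representation, and the integer mask are assumptions of this sketch.

```python
def interleave_low_words(a, b):
    """Alternate words 0..3 of each 8-word (128-bit) lane of a and b."""
    assert len(a) == len(b) == 32
    dst = []
    for lane in range(4):
        base = lane * 8
        for n in range(4):
            dst.append(a[base + n])
            dst.append(b[base + n])
    return dst


def mask_interleave_low_words(src, k, a, b):
    """Writemask form: keep src[j] wherever bit j of k is clear."""
    tmp = interleave_low_words(a, b)
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(32)]


a = list(range(32))
b = list(range(100, 132))
print(interleave_low_words(a, b)[:4])                          # [0, 100, 1, 101]
print(mask_interleave_low_words([9999] * 32, 0b0101, a, b)[:4])  # [0, 9999, 1, 9999]
```

The zeromask form is the same selection with 0 in place of `src[j]`.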
Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Load
Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Load
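The two masked 16-bit loads above differ only in what fills a masked-off element. A scalar sketch, assuming memory is modelled as a little-endian `bytes` buffer and `addr` is a byte offset into it; the function names and the `addr` parameter are hypothetical.

```python
import struct


def mask_load_words(src, k, mem, addr=0):
    """Writemask load: element j comes from memory if bit j of k is set,
    otherwise from src[j]. Masked-off elements are never read in this model."""
    dst = []
    for j in range(32):
        if (k >> j) & 1:
            (value,) = struct.unpack_from("<H", mem, addr + 2 * j)
            dst.append(value)
        else:
            dst.append(src[j])
    return dst


def zero_mask_load_words(k, mem, addr=0):
    """Zeromask load: masked-off elements become 0."""
    return mask_load_words([0] * 32, k, mem, addr)


mem = bytes(range(64))
out = mask_load_words([0xAAAA] * 32, 0b101, mem)
print([hex(v) for v in out[:3]])   # ['0x100', '0xaaaa', '0x504']
```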
Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Load
Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Load
Load 512 bits (composed of 32 packed 16-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512BW
Load
Load 512 bits (composed of 64 packed 8-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512BW
Load
Load 32-bit mask from memory into "k".
k[31:0] := MEM[mem_addr+31:mem_addr]
AVX512BW
Load
Load 64-bit mask from memory into "k".
k[63:0] := MEM[mem_addr+63:mem_addr]
AVX512BW
Load
Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Move
Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Move
Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Move
Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Move
Store packed 16-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 31
i := j*16
IF k[j]
MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i]
FI
ENDFOR
AVX512BW
Store
Store packed 8-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 63
i := j*8
IF k[j]
MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
FI
ENDFOR
AVX512BW
Store
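In the masked stores above there is no ELSE branch: memory at a masked-off position is left untouched. A scalar sketch of the 8-bit form, assuming memory is a mutable `bytearray` and `addr` is a byte offset; `mask_store_bytes` and `addr` are hypothetical names.

```python
def mask_store_bytes(mem, k, a, addr=0):
    """Write a[j] to mem[addr + j] only where bit j of k is set."""
    assert len(a) == 64
    for j in range(64):
        if (k >> j) & 1:
            mem[addr + j] = a[j] & 0xFF
    # Positions with a clear mask bit keep their previous contents.


mem = bytearray([0xEE] * 64)
mask_store_bytes(mem, 0b110, list(range(64)))
print(list(mem[:4]))   # [238, 1, 2, 238]
```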
Store 512 bits (composed of 32 packed 16-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512BW
Store
Store 512 bits (composed of 64 packed 8-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512BW
Store
Store 32-bit mask from "a" into memory.
MEM[mem_addr+31:mem_addr] := a[31:0]
AVX512BW
Store
Store 64-bit mask from "a" into memory.
MEM[mem_addr+63:mem_addr] := a[63:0]
AVX512BW
Store
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := ABS(a[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := ABS(a[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := ABS(a[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
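The descriptions above say the 8-bit absolute value is stored as an unsigned result. A scalar sketch makes the consequence concrete for the most negative input: -128 (bit pattern 0x80) maps to 128, which is again 0x80. The name `abs_signed_bytes` and the list-of-bit-patterns representation are assumptions of this sketch.

```python
def abs_signed_bytes(a):
    """Interpret each byte as signed, take ABS, return unsigned bit patterns."""
    dst = []
    for byte in a:
        signed = byte - 256 if byte & 0x80 else byte
        dst.append(abs(signed) & 0xFF)
    return dst


print(abs_signed_bytes([0x01, 0xFF, 0x80, 0x7F]))   # [1, 1, 128, 127]
```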
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ABS(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ABS(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ABS(a[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
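Saturate8 in the pseudocode above clamps the signed sum to the range -128..127 instead of letting it wrap. A scalar sketch, assuming inputs and outputs are lists of byte bit patterns (0..255); the helper names are hypothetical.

```python
def to_signed8(byte):
    """Reinterpret an 8-bit pattern as a signed value."""
    return byte - 256 if byte & 0x80 else byte


def saturate8(x):
    """Clamp to the signed 8-bit range."""
    return max(-128, min(127, x))


def sat_add_signed_bytes(a, b):
    """Element-wise signed saturating add; results returned as bit patterns."""
    return [saturate8(to_signed8(x) + to_signed8(y)) & 0xFF
            for x, y in zip(a, b)]


# 127 + 1 clamps to 127; -128 + -1 clamps to -128 (0x80); 16 + 32 = 48.
print(sat_add_signed_bytes([0x7F, 0x80, 0x10], [0x01, 0xFF, 0x20]))   # [127, 128, 48]
```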
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
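The plain add above keeps only the low 16 bits of each sum, so it wraps on overflow, whereas the SaturateU16 entries clamp. A small sketch contrasting the two, assuming lanes are given as raw 16-bit patterns in the range 0..0xFFFF (function names are illustrative only):

```python
# Wrapping versus unsigned-saturating 16-bit addition, one lane at a time.
# Function names are hypothetical; lanes are raw 16-bit patterns.

def add_epi16(a, b):
    """Wrapping add: keep only the low 16 bits of each sum."""
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]

def adds_epu16(a, b):
    """Unsigned saturating add: clamp each sum to 0..0xFFFF."""
    return [min(x + y, 0xFFFF) for x, y in zip(a, b)]

a = [0xFFFF, 0x8000, 1]
b = [0x0001, 0x8000, 2]
wrapped = add_epi16(a, b)
saturated = adds_epu16(a, b)
```

The first two lanes overflow: the wrapping form yields 0 for both, while the saturating form yields 0xFFFF.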
Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
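The "+ 1" in the average pseudocode above makes the result round half up rather than truncate, and the sum is formed at more than 8 bits so it cannot overflow. A scalar sketch, with avg_epu8 as an illustrative name:

```python
# Rounding average of unsigned 8-bit lanes, following (a + b + 1) >> 1.
# Python integers are unbounded, which stands in for the wider
# intermediate the pseudocode implies.

def avg_epu8(a, b):
    """Return the rounded-up average of each pair of unsigned bytes."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

result = avg_epu8([255, 1, 0, 10], [255, 2, 0, 20])
```

For the pair (1, 2) the result is 2, where a truncating average would give 1.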
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
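In the multiply-add entry above, bytes of "a" are read as unsigned and bytes of "b" as signed, which the pseudocode expresses only in the description. A scalar sketch that makes the two interpretations explicit, assuming both inputs are lists of raw byte values 0..255 (all names are illustrative):

```python
# Unsigned-by-signed byte multiply with pairwise add and signed saturation.
# Helper names are hypothetical; inputs are raw byte patterns.

def to_signed8(v):
    """Reinterpret a raw byte 0..255 as a signed 8-bit value."""
    return v - 256 if v >= 128 else v

def saturate16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def maddubs_epi16(a_bytes, b_bytes):
    """Combine adjacent byte pairs into saturated signed 16-bit lanes."""
    out = []
    for j in range(0, len(a_bytes), 2):
        lo = a_bytes[j] * to_signed8(b_bytes[j])          # unsigned * signed
        hi = a_bytes[j + 1] * to_signed8(b_bytes[j + 1])  # unsigned * signed
        out.append(saturate16(lo + hi))
    return out

a = [255, 255, 255, 255, 1, 2]
b = [0x7F, 0x7F, 0x80, 0x80, 3, 0xFF]
result = maddubs_epi16(a, b)
```

The first two output lanes show why saturation is needed: 2 * 255 * 127 and 2 * 255 * (-128) both fall outside the signed 16-bit range.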
Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
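The 16-bit multiply-add above sums two 32-bit products into a 32-bit lane without saturation. A scalar sketch follows, assuming signed lane values as input; the names are illustrative. The only input for which the sum does not fit in 32 bits is when all four operands of a pair are -32768, and the sketch models that case as wrapping.

```python
# Pairwise multiply of signed 16-bit lanes with a 32-bit horizontal add.
# Helper names are hypothetical.

def wrap_s32(x):
    """Reduce an integer to a signed 32-bit value with wraparound."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def madd_epi16(a, b):
    """Multiply lane-wise, then add each adjacent pair of products."""
    return [
        wrap_s32(a[j] * b[j] + a[j + 1] * b[j + 1])
        for j in range(0, len(a), 2)
    ]

dot_pairs = madd_epi16([1, 2, 3, 4], [5, 6, 7, 8])
edge_case = madd_epi16([-32768, -32768], [-32768, -32768])
```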
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
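The signed and unsigned maximum entries above share the same pseudocode; the difference lies in how each lane's bit pattern is interpreted before comparing. A scalar sketch using 8-bit lanes given as raw patterns (names are illustrative):

```python
# Signed versus unsigned lane-wise maximum over the same bit patterns.
# Helper names are hypothetical; lanes are raw patterns 0..2**bits - 1.

def to_signed(v, bits):
    """Reinterpret a raw bits-wide pattern as a two's-complement value."""
    return v - (1 << bits) if v >> (bits - 1) else v

def max_signed(a, b, bits):
    """Pick the lane whose signed interpretation is larger."""
    return [
        x if to_signed(x, bits) >= to_signed(y, bits) else y
        for x, y in zip(a, b)
    ]

def max_unsigned(a, b):
    """Pick the lane whose unsigned interpretation is larger."""
    return [max(x, y) for x, y in zip(a, b)]

a = [0x80, 0x01, 0xFF]
b = [0x7F, 0x02, 0x00]
signed_max = max_signed(a, b, 8)
unsigned_max = max_unsigned(a, b)
```

For the first lane, 0x80 is -128 when signed but 128 when unsigned, so the two variants select different operands.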
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
FOR j := 0 to 31
i := j*16
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
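The shift-by-14, add-1, take-bits-[16:1] sequence above amounts to a fixed-point multiply that rounds to nearest when both inputs are read as Q15 fractions. A scalar sketch, assuming signed lane values as input (names are illustrative). Python's right shift on negative integers is arithmetic, which matches the sign-extended 32-bit intermediate.

```python
# Rounded high-half multiply of signed 16-bit lanes.
# Helper names are hypothetical.

def to_signed16(v):
    """Reinterpret a raw 16-bit pattern as a signed value."""
    return v - 0x10000 if v & 0x8000 else v

def mulhrs_epi16(a, b):
    """Compute (((a * b) >> 14) + 1), then keep bits [16:1] per lane."""
    out = []
    for x, y in zip(a, b):
        tmp = ((x * y) >> 14) + 1
        out.append(to_signed16((tmp >> 1) & 0xFFFF))
    return out

a = [16384, 1, 1, -1, -32768]
b = [16384, 16384, 16383, 16384, -32768]
result = mulhrs_epi16(a, b)
```

Lane 0 is 0.5 * 0.5 = 0.25 in Q15 (8192). Lanes 1 to 3 show the rounding step around one half of a unit. Lane 4, -32768 * -32768, does not fit and comes back as -32768.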
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 31
i := j*16
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 31
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
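The unsigned and signed high-half multiplies above return different results for the same bit patterns, because the operands are widened differently before multiplying. A scalar sketch, assuming lanes are raw 16-bit patterns and results are returned as raw patterns too (names are illustrative):

```python
# High 16 bits of a 16x16-bit product, unsigned and signed variants.
# Helper names are hypothetical; lanes are raw patterns 0..0xFFFF.

def to_signed16(v):
    """Reinterpret a raw 16-bit pattern as a signed value."""
    return v - 0x10000 if v & 0x8000 else v

def mulhi_epu16(a, b):
    """Zero-extend, multiply, keep bits [31:16]."""
    return [((x * y) >> 16) & 0xFFFF for x, y in zip(a, b)]

def mulhi_epi16(a, b):
    """Sign-extend, multiply, keep bits [31:16]."""
    return [
        ((to_signed16(x) * to_signed16(y)) >> 16) & 0xFFFF
        for x, y in zip(a, b)
    ]

a = [0xFFFF, 0x4000, 0x8000]
b = [0xFFFF, 0x0004, 0x0002]
high_unsigned = mulhi_epu16(a, b)
high_signed = mulhi_epi16(a, b)
```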
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
FOR j := 0 to 31
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
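Unlike the high-half multiplies, the low 16 bits of the product do not depend on whether the operands are treated as signed or unsigned, which is why the entry above speaks of plain "16-bit integers" even though its pseudocode sign-extends. A scalar sketch checking this on raw 16-bit patterns (names are illustrative):

```python
# Low 16 bits of a 16x16-bit product, computed two ways.
# Helper names are hypothetical; lanes are raw patterns 0..0xFFFF.

def to_signed16(v):
    """Reinterpret a raw 16-bit pattern as a signed value."""
    return v - 0x10000 if v & 0x8000 else v

def mullo_epi16(a, b):
    """Sign-extend, multiply, keep bits [15:0], as in the pseudocode."""
    return [
        (to_signed16(x) * to_signed16(y)) & 0xFFFF
        for x, y in zip(a, b)
    ]

def mullo_unsigned_view(a, b):
    """Zero-extend instead; the low 16 bits come out identical."""
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]

a = [0xFFFF, 0x0100, 0x1234]
b = [0x0002, 0x0100, 0x0010]
low_signed = mullo_epi16(a, b)
low_unsigned = mullo_unsigned_view(a, b)
```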
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
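With unsigned saturation, a subtraction that would go negative is clamped to 0 instead of wrapping around. A scalar sketch contrasting this with the non-saturating 8-bit subtract listed earlier in this section, assuming lanes are unsigned byte values (names are illustrative):

```python
# Unsigned saturating versus wrapping 8-bit subtraction.
# Function names are hypothetical; lanes are values 0..255.

def subs_epu8(a, b):
    """Subtract with unsigned saturation: results below 0 become 0."""
    return [max(x - y, 0) for x, y in zip(a, b)]

def sub_epi8(a, b):
    """Subtract with wraparound: keep only the low 8 bits."""
    return [(x - y) & 0xFF for x, y in zip(a, b)]

a = [10, 200, 0]
b = [20, 50, 255]
saturated = subs_epu8(a, b)
wrapped = sub_epi8(a, b)
```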
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ENDFOR
dst[MAX:512] := 0
AVX512BW
Arithmetic
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := Saturate16(a[31:0])
tmp_dst[31:16] := Saturate16(a[63:32])
tmp_dst[47:32] := Saturate16(a[95:64])
tmp_dst[63:48] := Saturate16(a[127:96])
tmp_dst[79:64] := Saturate16(b[31:0])
tmp_dst[95:80] := Saturate16(b[63:32])
tmp_dst[111:96] := Saturate16(b[95:64])
tmp_dst[127:112] := Saturate16(b[127:96])
tmp_dst[143:128] := Saturate16(a[159:128])
tmp_dst[159:144] := Saturate16(a[191:160])
tmp_dst[175:160] := Saturate16(a[223:192])
tmp_dst[191:176] := Saturate16(a[255:224])
tmp_dst[207:192] := Saturate16(b[159:128])
tmp_dst[223:208] := Saturate16(b[191:160])
tmp_dst[239:224] := Saturate16(b[223:192])
tmp_dst[255:240] := Saturate16(b[255:224])
tmp_dst[271:256] := Saturate16(a[287:256])
tmp_dst[287:272] := Saturate16(a[319:288])
tmp_dst[303:288] := Saturate16(a[351:320])
tmp_dst[319:304] := Saturate16(a[383:352])
tmp_dst[335:320] := Saturate16(b[287:256])
tmp_dst[351:336] := Saturate16(b[319:288])
tmp_dst[367:352] := Saturate16(b[351:320])
tmp_dst[383:368] := Saturate16(b[383:352])
tmp_dst[399:384] := Saturate16(a[415:384])
tmp_dst[415:400] := Saturate16(a[447:416])
tmp_dst[431:416] := Saturate16(a[479:448])
tmp_dst[447:432] := Saturate16(a[511:480])
tmp_dst[463:448] := Saturate16(b[415:384])
tmp_dst[479:464] := Saturate16(b[447:416])
tmp_dst[495:480] := Saturate16(b[479:448])
tmp_dst[511:496] := Saturate16(b[511:480])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := Saturate16(a[31:0])
tmp_dst[31:16] := Saturate16(a[63:32])
tmp_dst[47:32] := Saturate16(a[95:64])
tmp_dst[63:48] := Saturate16(a[127:96])
tmp_dst[79:64] := Saturate16(b[31:0])
tmp_dst[95:80] := Saturate16(b[63:32])
tmp_dst[111:96] := Saturate16(b[95:64])
tmp_dst[127:112] := Saturate16(b[127:96])
tmp_dst[143:128] := Saturate16(a[159:128])
tmp_dst[159:144] := Saturate16(a[191:160])
tmp_dst[175:160] := Saturate16(a[223:192])
tmp_dst[191:176] := Saturate16(a[255:224])
tmp_dst[207:192] := Saturate16(b[159:128])
tmp_dst[223:208] := Saturate16(b[191:160])
tmp_dst[239:224] := Saturate16(b[223:192])
tmp_dst[255:240] := Saturate16(b[255:224])
tmp_dst[271:256] := Saturate16(a[287:256])
tmp_dst[287:272] := Saturate16(a[319:288])
tmp_dst[303:288] := Saturate16(a[351:320])
tmp_dst[319:304] := Saturate16(a[383:352])
tmp_dst[335:320] := Saturate16(b[287:256])
tmp_dst[351:336] := Saturate16(b[319:288])
tmp_dst[367:352] := Saturate16(b[351:320])
tmp_dst[383:368] := Saturate16(b[383:352])
tmp_dst[399:384] := Saturate16(a[415:384])
tmp_dst[415:400] := Saturate16(a[447:416])
tmp_dst[431:416] := Saturate16(a[479:448])
tmp_dst[447:432] := Saturate16(a[511:480])
tmp_dst[463:448] := Saturate16(b[415:384])
tmp_dst[479:464] := Saturate16(b[447:416])
tmp_dst[495:480] := Saturate16(b[479:448])
tmp_dst[511:496] := Saturate16(b[511:480])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
dst[15:0] := Saturate16(a[31:0])
dst[31:16] := Saturate16(a[63:32])
dst[47:32] := Saturate16(a[95:64])
dst[63:48] := Saturate16(a[127:96])
dst[79:64] := Saturate16(b[31:0])
dst[95:80] := Saturate16(b[63:32])
dst[111:96] := Saturate16(b[95:64])
dst[127:112] := Saturate16(b[127:96])
dst[143:128] := Saturate16(a[159:128])
dst[159:144] := Saturate16(a[191:160])
dst[175:160] := Saturate16(a[223:192])
dst[191:176] := Saturate16(a[255:224])
dst[207:192] := Saturate16(b[159:128])
dst[223:208] := Saturate16(b[191:160])
dst[239:224] := Saturate16(b[223:192])
dst[255:240] := Saturate16(b[255:224])
dst[271:256] := Saturate16(a[287:256])
dst[287:272] := Saturate16(a[319:288])
dst[303:288] := Saturate16(a[351:320])
dst[319:304] := Saturate16(a[383:352])
dst[335:320] := Saturate16(b[287:256])
dst[351:336] := Saturate16(b[319:288])
dst[367:352] := Saturate16(b[351:320])
dst[383:368] := Saturate16(b[383:352])
dst[399:384] := Saturate16(a[415:384])
dst[415:400] := Saturate16(a[447:416])
dst[431:416] := Saturate16(a[479:448])
dst[447:432] := Saturate16(a[511:480])
dst[463:448] := Saturate16(b[415:384])
dst[479:464] := Saturate16(b[447:416])
dst[495:480] := Saturate16(b[479:448])
dst[511:496] := Saturate16(b[511:480])
dst[MAX:512] := 0
AVX512BW
Convert
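The 32 assignment lines above follow a regular pattern: within each 128-bit lane, four saturated elements from "a" are followed by four from "b", so the output is interleaved per lane rather than being all of "a" followed by all of "b". A scalar sketch of that pattern, assuming each input is a list of signed 32-bit values whose length is a multiple of 4 (names are illustrative):

```python
# Per-128-bit-lane pack of signed 32-bit values to signed 16-bit values.
# Helper names are hypothetical.

def saturate16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def packs_epi32(a, b):
    """Pack a and b lane by lane: 4 elements of a, then 4 elements of b."""
    out = []
    for lane in range(len(a) // 4):
        lo, hi = 4 * lane, 4 * lane + 4
        out += [saturate16(v) for v in a[lo:hi]]
        out += [saturate16(v) for v in b[lo:hi]]
    return out

a = [70000, -70000] + list(range(2, 16))
b = [100 + i for i in range(16)]
packed = packs_epi32(a, b)
```

With 16 elements per input, the result has 32 elements, and the second group of four values from "a" appears only after the first group from "b".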
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[7:0] := Saturate8(a[15:0])
tmp_dst[15:8] := Saturate8(a[31:16])
tmp_dst[23:16] := Saturate8(a[47:32])
tmp_dst[31:24] := Saturate8(a[63:48])
tmp_dst[39:32] := Saturate8(a[79:64])
tmp_dst[47:40] := Saturate8(a[95:80])
tmp_dst[55:48] := Saturate8(a[111:96])
tmp_dst[63:56] := Saturate8(a[127:112])
tmp_dst[71:64] := Saturate8(b[15:0])
tmp_dst[79:72] := Saturate8(b[31:16])
tmp_dst[87:80] := Saturate8(b[47:32])
tmp_dst[95:88] := Saturate8(b[63:48])
tmp_dst[103:96] := Saturate8(b[79:64])
tmp_dst[111:104] := Saturate8(b[95:80])
tmp_dst[119:112] := Saturate8(b[111:96])
tmp_dst[127:120] := Saturate8(b[127:112])
tmp_dst[135:128] := Saturate8(a[143:128])
tmp_dst[143:136] := Saturate8(a[159:144])
tmp_dst[151:144] := Saturate8(a[175:160])
tmp_dst[159:152] := Saturate8(a[191:176])
tmp_dst[167:160] := Saturate8(a[207:192])
tmp_dst[175:168] := Saturate8(a[223:208])
tmp_dst[183:176] := Saturate8(a[239:224])
tmp_dst[191:184] := Saturate8(a[255:240])
tmp_dst[199:192] := Saturate8(b[143:128])
tmp_dst[207:200] := Saturate8(b[159:144])
tmp_dst[215:208] := Saturate8(b[175:160])
tmp_dst[223:216] := Saturate8(b[191:176])
tmp_dst[231:224] := Saturate8(b[207:192])
tmp_dst[239:232] := Saturate8(b[223:208])
tmp_dst[247:240] := Saturate8(b[239:224])
tmp_dst[255:248] := Saturate8(b[255:240])
tmp_dst[263:256] := Saturate8(a[271:256])
tmp_dst[271:264] := Saturate8(a[287:272])
tmp_dst[279:272] := Saturate8(a[303:288])
tmp_dst[287:280] := Saturate8(a[319:304])
tmp_dst[295:288] := Saturate8(a[335:320])
tmp_dst[303:296] := Saturate8(a[351:336])
tmp_dst[311:304] := Saturate8(a[367:352])
tmp_dst[319:312] := Saturate8(a[383:368])
tmp_dst[327:320] := Saturate8(b[271:256])
tmp_dst[335:328] := Saturate8(b[287:272])
tmp_dst[343:336] := Saturate8(b[303:288])
tmp_dst[351:344] := Saturate8(b[319:304])
tmp_dst[359:352] := Saturate8(b[335:320])
tmp_dst[367:360] := Saturate8(b[351:336])
tmp_dst[375:368] := Saturate8(b[367:352])
tmp_dst[383:376] := Saturate8(b[383:368])
tmp_dst[391:384] := Saturate8(a[399:384])
tmp_dst[399:392] := Saturate8(a[415:400])
tmp_dst[407:400] := Saturate8(a[431:416])
tmp_dst[415:408] := Saturate8(a[447:432])
tmp_dst[423:416] := Saturate8(a[463:448])
tmp_dst[431:424] := Saturate8(a[479:464])
tmp_dst[439:432] := Saturate8(a[495:480])
tmp_dst[447:440] := Saturate8(a[511:496])
tmp_dst[455:448] := Saturate8(b[399:384])
tmp_dst[463:456] := Saturate8(b[415:400])
tmp_dst[471:464] := Saturate8(b[431:416])
tmp_dst[479:472] := Saturate8(b[447:432])
tmp_dst[487:480] := Saturate8(b[463:448])
tmp_dst[495:488] := Saturate8(b[479:464])
tmp_dst[503:496] := Saturate8(b[495:480])
tmp_dst[511:504] := Saturate8(b[511:496])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[7:0] := Saturate8(a[15:0])
tmp_dst[15:8] := Saturate8(a[31:16])
tmp_dst[23:16] := Saturate8(a[47:32])
tmp_dst[31:24] := Saturate8(a[63:48])
tmp_dst[39:32] := Saturate8(a[79:64])
tmp_dst[47:40] := Saturate8(a[95:80])
tmp_dst[55:48] := Saturate8(a[111:96])
tmp_dst[63:56] := Saturate8(a[127:112])
tmp_dst[71:64] := Saturate8(b[15:0])
tmp_dst[79:72] := Saturate8(b[31:16])
tmp_dst[87:80] := Saturate8(b[47:32])
tmp_dst[95:88] := Saturate8(b[63:48])
tmp_dst[103:96] := Saturate8(b[79:64])
tmp_dst[111:104] := Saturate8(b[95:80])
tmp_dst[119:112] := Saturate8(b[111:96])
tmp_dst[127:120] := Saturate8(b[127:112])
tmp_dst[135:128] := Saturate8(a[143:128])
tmp_dst[143:136] := Saturate8(a[159:144])
tmp_dst[151:144] := Saturate8(a[175:160])
tmp_dst[159:152] := Saturate8(a[191:176])
tmp_dst[167:160] := Saturate8(a[207:192])
tmp_dst[175:168] := Saturate8(a[223:208])
tmp_dst[183:176] := Saturate8(a[239:224])
tmp_dst[191:184] := Saturate8(a[255:240])
tmp_dst[199:192] := Saturate8(b[143:128])
tmp_dst[207:200] := Saturate8(b[159:144])
tmp_dst[215:208] := Saturate8(b[175:160])
tmp_dst[223:216] := Saturate8(b[191:176])
tmp_dst[231:224] := Saturate8(b[207:192])
tmp_dst[239:232] := Saturate8(b[223:208])
tmp_dst[247:240] := Saturate8(b[239:224])
tmp_dst[255:248] := Saturate8(b[255:240])
tmp_dst[263:256] := Saturate8(a[271:256])
tmp_dst[271:264] := Saturate8(a[287:272])
tmp_dst[279:272] := Saturate8(a[303:288])
tmp_dst[287:280] := Saturate8(a[319:304])
tmp_dst[295:288] := Saturate8(a[335:320])
tmp_dst[303:296] := Saturate8(a[351:336])
tmp_dst[311:304] := Saturate8(a[367:352])
tmp_dst[319:312] := Saturate8(a[383:368])
tmp_dst[327:320] := Saturate8(b[271:256])
tmp_dst[335:328] := Saturate8(b[287:272])
tmp_dst[343:336] := Saturate8(b[303:288])
tmp_dst[351:344] := Saturate8(b[319:304])
tmp_dst[359:352] := Saturate8(b[335:320])
tmp_dst[367:360] := Saturate8(b[351:336])
tmp_dst[375:368] := Saturate8(b[367:352])
tmp_dst[383:376] := Saturate8(b[383:368])
tmp_dst[391:384] := Saturate8(a[399:384])
tmp_dst[399:392] := Saturate8(a[415:400])
tmp_dst[407:400] := Saturate8(a[431:416])
tmp_dst[415:408] := Saturate8(a[447:432])
tmp_dst[423:416] := Saturate8(a[463:448])
tmp_dst[431:424] := Saturate8(a[479:464])
tmp_dst[439:432] := Saturate8(a[495:480])
tmp_dst[447:440] := Saturate8(a[511:496])
tmp_dst[455:448] := Saturate8(b[399:384])
tmp_dst[463:456] := Saturate8(b[415:400])
tmp_dst[471:464] := Saturate8(b[431:416])
tmp_dst[479:472] := Saturate8(b[447:432])
tmp_dst[487:480] := Saturate8(b[463:448])
tmp_dst[495:488] := Saturate8(b[479:464])
tmp_dst[503:496] := Saturate8(b[495:480])
tmp_dst[511:504] := Saturate8(b[511:496])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
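The lane-by-lane layout written out above can be condensed into a short scalar model. The following is a minimal Python sketch, not the intrinsic itself: the function name maskz_packs_epi16_model is hypothetical, vectors are modeled as lists with element 0 in the lowest bits, and the mask "k" is assumed to be a plain integer.

```python
def saturate8(x):
    """Clamp a signed integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def maskz_packs_epi16_model(k, a, b):
    """Scalar model of the zero-masked signed 16-bit to 8-bit pack.

    a, b: lists of 32 signed 16-bit values (element 0 = bits 15:0).
    k:    64-bit integer mask, one bit per 8-bit result element.
    """
    assert len(a) == 32 and len(b) == 32
    tmp = []
    for lane in range(4):          # four 128-bit lanes
        lo = lane * 8              # each lane takes 8 words from a, then 8 from b
        tmp += [saturate8(x) for x in a[lo:lo + 8]]
        tmp += [saturate8(x) for x in b[lo:lo + 8]]
    # Zeromask: keep element j only when bit j of k is set.
    return [tmp[j] if (k >> j) & 1 else 0 for j in range(64)]
```

With an all-ones mask, result elements 0 to 7 come from the first eight words of "a", elements 8 to 15 from the first eight words of "b", and elements 16 to 23 from words 8 to 15 of "a".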
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
dst[7:0] := Saturate8(a[15:0])
dst[15:8] := Saturate8(a[31:16])
dst[23:16] := Saturate8(a[47:32])
dst[31:24] := Saturate8(a[63:48])
dst[39:32] := Saturate8(a[79:64])
dst[47:40] := Saturate8(a[95:80])
dst[55:48] := Saturate8(a[111:96])
dst[63:56] := Saturate8(a[127:112])
dst[71:64] := Saturate8(b[15:0])
dst[79:72] := Saturate8(b[31:16])
dst[87:80] := Saturate8(b[47:32])
dst[95:88] := Saturate8(b[63:48])
dst[103:96] := Saturate8(b[79:64])
dst[111:104] := Saturate8(b[95:80])
dst[119:112] := Saturate8(b[111:96])
dst[127:120] := Saturate8(b[127:112])
dst[135:128] := Saturate8(a[143:128])
dst[143:136] := Saturate8(a[159:144])
dst[151:144] := Saturate8(a[175:160])
dst[159:152] := Saturate8(a[191:176])
dst[167:160] := Saturate8(a[207:192])
dst[175:168] := Saturate8(a[223:208])
dst[183:176] := Saturate8(a[239:224])
dst[191:184] := Saturate8(a[255:240])
dst[199:192] := Saturate8(b[143:128])
dst[207:200] := Saturate8(b[159:144])
dst[215:208] := Saturate8(b[175:160])
dst[223:216] := Saturate8(b[191:176])
dst[231:224] := Saturate8(b[207:192])
dst[239:232] := Saturate8(b[223:208])
dst[247:240] := Saturate8(b[239:224])
dst[255:248] := Saturate8(b[255:240])
dst[263:256] := Saturate8(a[271:256])
dst[271:264] := Saturate8(a[287:272])
dst[279:272] := Saturate8(a[303:288])
dst[287:280] := Saturate8(a[319:304])
dst[295:288] := Saturate8(a[335:320])
dst[303:296] := Saturate8(a[351:336])
dst[311:304] := Saturate8(a[367:352])
dst[319:312] := Saturate8(a[383:368])
dst[327:320] := Saturate8(b[271:256])
dst[335:328] := Saturate8(b[287:272])
dst[343:336] := Saturate8(b[303:288])
dst[351:344] := Saturate8(b[319:304])
dst[359:352] := Saturate8(b[335:320])
dst[367:360] := Saturate8(b[351:336])
dst[375:368] := Saturate8(b[367:352])
dst[383:376] := Saturate8(b[383:368])
dst[391:384] := Saturate8(a[399:384])
dst[399:392] := Saturate8(a[415:400])
dst[407:400] := Saturate8(a[431:416])
dst[415:408] := Saturate8(a[447:432])
dst[423:416] := Saturate8(a[463:448])
dst[431:424] := Saturate8(a[479:464])
dst[439:432] := Saturate8(a[495:480])
dst[447:440] := Saturate8(a[511:496])
dst[455:448] := Saturate8(b[399:384])
dst[463:456] := Saturate8(b[415:400])
dst[471:464] := Saturate8(b[431:416])
dst[479:472] := Saturate8(b[447:432])
dst[487:480] := Saturate8(b[463:448])
dst[495:488] := Saturate8(b[479:464])
dst[503:496] := Saturate8(b[495:480])
dst[511:504] := Saturate8(b[511:496])
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[15:0] := SaturateU16(a[31:0])
tmp_dst[31:16] := SaturateU16(a[63:32])
tmp_dst[47:32] := SaturateU16(a[95:64])
tmp_dst[63:48] := SaturateU16(a[127:96])
tmp_dst[79:64] := SaturateU16(b[31:0])
tmp_dst[95:80] := SaturateU16(b[63:32])
tmp_dst[111:96] := SaturateU16(b[95:64])
tmp_dst[127:112] := SaturateU16(b[127:96])
tmp_dst[143:128] := SaturateU16(a[159:128])
tmp_dst[159:144] := SaturateU16(a[191:160])
tmp_dst[175:160] := SaturateU16(a[223:192])
tmp_dst[191:176] := SaturateU16(a[255:224])
tmp_dst[207:192] := SaturateU16(b[159:128])
tmp_dst[223:208] := SaturateU16(b[191:160])
tmp_dst[239:224] := SaturateU16(b[223:192])
tmp_dst[255:240] := SaturateU16(b[255:224])
tmp_dst[271:256] := SaturateU16(a[287:256])
tmp_dst[287:272] := SaturateU16(a[319:288])
tmp_dst[303:288] := SaturateU16(a[351:320])
tmp_dst[319:304] := SaturateU16(a[383:352])
tmp_dst[335:320] := SaturateU16(b[287:256])
tmp_dst[351:336] := SaturateU16(b[319:288])
tmp_dst[367:352] := SaturateU16(b[351:320])
tmp_dst[383:368] := SaturateU16(b[383:352])
tmp_dst[399:384] := SaturateU16(a[415:384])
tmp_dst[415:400] := SaturateU16(a[447:416])
tmp_dst[431:416] := SaturateU16(a[479:448])
tmp_dst[447:432] := SaturateU16(a[511:480])
tmp_dst[463:448] := SaturateU16(b[415:384])
tmp_dst[479:464] := SaturateU16(b[447:416])
tmp_dst[495:480] := SaturateU16(b[479:448])
tmp_dst[511:496] := SaturateU16(b[511:480])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
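As an illustration of the writemask form above, here is a small scalar sketch in Python. It is an assumption-laden model rather than the intrinsic: the name mask_packus_epi32_model is hypothetical, inputs are lists of signed 32-bit values, and "k" is a plain integer with one bit per 16-bit result element.

```python
def saturate_u16(x):
    """Clamp a signed integer to the unsigned 16-bit range [0, 65535]."""
    return max(0, min(0xFFFF, x))


def mask_packus_epi32_model(src, k, a, b):
    """Scalar model of the write-masked signed 32-bit to unsigned 16-bit pack.

    src:  list of 32 16-bit values used where the mask bit is clear.
    k:    32-bit integer mask.
    a, b: lists of 16 signed 32-bit values (element 0 = bits 31:0).
    """
    assert len(src) == 32 and len(a) == 16 and len(b) == 16
    tmp = []
    for lane in range(4):          # four 128-bit lanes
        lo = lane * 4              # each lane takes 4 dwords from a, then 4 from b
        tmp += [saturate_u16(x) for x in a[lo:lo + 4]]
        tmp += [saturate_u16(x) for x in b[lo:lo + 4]]
    # Writemask: inactive elements are copied from src.
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(32)]
```

Negative inputs saturate to 0 and inputs above 65535 saturate to 65535, because the source elements are read as signed while the result range is unsigned.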
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[15:0] := SaturateU16(a[31:0])
tmp_dst[31:16] := SaturateU16(a[63:32])
tmp_dst[47:32] := SaturateU16(a[95:64])
tmp_dst[63:48] := SaturateU16(a[127:96])
tmp_dst[79:64] := SaturateU16(b[31:0])
tmp_dst[95:80] := SaturateU16(b[63:32])
tmp_dst[111:96] := SaturateU16(b[95:64])
tmp_dst[127:112] := SaturateU16(b[127:96])
tmp_dst[143:128] := SaturateU16(a[159:128])
tmp_dst[159:144] := SaturateU16(a[191:160])
tmp_dst[175:160] := SaturateU16(a[223:192])
tmp_dst[191:176] := SaturateU16(a[255:224])
tmp_dst[207:192] := SaturateU16(b[159:128])
tmp_dst[223:208] := SaturateU16(b[191:160])
tmp_dst[239:224] := SaturateU16(b[223:192])
tmp_dst[255:240] := SaturateU16(b[255:224])
tmp_dst[271:256] := SaturateU16(a[287:256])
tmp_dst[287:272] := SaturateU16(a[319:288])
tmp_dst[303:288] := SaturateU16(a[351:320])
tmp_dst[319:304] := SaturateU16(a[383:352])
tmp_dst[335:320] := SaturateU16(b[287:256])
tmp_dst[351:336] := SaturateU16(b[319:288])
tmp_dst[367:352] := SaturateU16(b[351:320])
tmp_dst[383:368] := SaturateU16(b[383:352])
tmp_dst[399:384] := SaturateU16(a[415:384])
tmp_dst[415:400] := SaturateU16(a[447:416])
tmp_dst[431:416] := SaturateU16(a[479:448])
tmp_dst[447:432] := SaturateU16(a[511:480])
tmp_dst[463:448] := SaturateU16(b[415:384])
tmp_dst[479:464] := SaturateU16(b[447:416])
tmp_dst[495:480] := SaturateU16(b[479:448])
tmp_dst[511:496] := SaturateU16(b[511:480])
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := tmp_dst[i+15:i]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst".
dst[15:0] := SaturateU16(a[31:0])
dst[31:16] := SaturateU16(a[63:32])
dst[47:32] := SaturateU16(a[95:64])
dst[63:48] := SaturateU16(a[127:96])
dst[79:64] := SaturateU16(b[31:0])
dst[95:80] := SaturateU16(b[63:32])
dst[111:96] := SaturateU16(b[95:64])
dst[127:112] := SaturateU16(b[127:96])
dst[143:128] := SaturateU16(a[159:128])
dst[159:144] := SaturateU16(a[191:160])
dst[175:160] := SaturateU16(a[223:192])
dst[191:176] := SaturateU16(a[255:224])
dst[207:192] := SaturateU16(b[159:128])
dst[223:208] := SaturateU16(b[191:160])
dst[239:224] := SaturateU16(b[223:192])
dst[255:240] := SaturateU16(b[255:224])
dst[271:256] := SaturateU16(a[287:256])
dst[287:272] := SaturateU16(a[319:288])
dst[303:288] := SaturateU16(a[351:320])
dst[319:304] := SaturateU16(a[383:352])
dst[335:320] := SaturateU16(b[287:256])
dst[351:336] := SaturateU16(b[319:288])
dst[367:352] := SaturateU16(b[351:320])
dst[383:368] := SaturateU16(b[383:352])
dst[399:384] := SaturateU16(a[415:384])
dst[415:400] := SaturateU16(a[447:416])
dst[431:416] := SaturateU16(a[479:448])
dst[447:432] := SaturateU16(a[511:480])
dst[463:448] := SaturateU16(b[415:384])
dst[479:464] := SaturateU16(b[447:416])
dst[495:480] := SaturateU16(b[479:448])
dst[511:496] := SaturateU16(b[511:480])
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[7:0] := SaturateU8(a[15:0])
tmp_dst[15:8] := SaturateU8(a[31:16])
tmp_dst[23:16] := SaturateU8(a[47:32])
tmp_dst[31:24] := SaturateU8(a[63:48])
tmp_dst[39:32] := SaturateU8(a[79:64])
tmp_dst[47:40] := SaturateU8(a[95:80])
tmp_dst[55:48] := SaturateU8(a[111:96])
tmp_dst[63:56] := SaturateU8(a[127:112])
tmp_dst[71:64] := SaturateU8(b[15:0])
tmp_dst[79:72] := SaturateU8(b[31:16])
tmp_dst[87:80] := SaturateU8(b[47:32])
tmp_dst[95:88] := SaturateU8(b[63:48])
tmp_dst[103:96] := SaturateU8(b[79:64])
tmp_dst[111:104] := SaturateU8(b[95:80])
tmp_dst[119:112] := SaturateU8(b[111:96])
tmp_dst[127:120] := SaturateU8(b[127:112])
tmp_dst[135:128] := SaturateU8(a[143:128])
tmp_dst[143:136] := SaturateU8(a[159:144])
tmp_dst[151:144] := SaturateU8(a[175:160])
tmp_dst[159:152] := SaturateU8(a[191:176])
tmp_dst[167:160] := SaturateU8(a[207:192])
tmp_dst[175:168] := SaturateU8(a[223:208])
tmp_dst[183:176] := SaturateU8(a[239:224])
tmp_dst[191:184] := SaturateU8(a[255:240])
tmp_dst[199:192] := SaturateU8(b[143:128])
tmp_dst[207:200] := SaturateU8(b[159:144])
tmp_dst[215:208] := SaturateU8(b[175:160])
tmp_dst[223:216] := SaturateU8(b[191:176])
tmp_dst[231:224] := SaturateU8(b[207:192])
tmp_dst[239:232] := SaturateU8(b[223:208])
tmp_dst[247:240] := SaturateU8(b[239:224])
tmp_dst[255:248] := SaturateU8(b[255:240])
tmp_dst[263:256] := SaturateU8(a[271:256])
tmp_dst[271:264] := SaturateU8(a[287:272])
tmp_dst[279:272] := SaturateU8(a[303:288])
tmp_dst[287:280] := SaturateU8(a[319:304])
tmp_dst[295:288] := SaturateU8(a[335:320])
tmp_dst[303:296] := SaturateU8(a[351:336])
tmp_dst[311:304] := SaturateU8(a[367:352])
tmp_dst[319:312] := SaturateU8(a[383:368])
tmp_dst[327:320] := SaturateU8(b[271:256])
tmp_dst[335:328] := SaturateU8(b[287:272])
tmp_dst[343:336] := SaturateU8(b[303:288])
tmp_dst[351:344] := SaturateU8(b[319:304])
tmp_dst[359:352] := SaturateU8(b[335:320])
tmp_dst[367:360] := SaturateU8(b[351:336])
tmp_dst[375:368] := SaturateU8(b[367:352])
tmp_dst[383:376] := SaturateU8(b[383:368])
tmp_dst[391:384] := SaturateU8(a[399:384])
tmp_dst[399:392] := SaturateU8(a[415:400])
tmp_dst[407:400] := SaturateU8(a[431:416])
tmp_dst[415:408] := SaturateU8(a[447:432])
tmp_dst[423:416] := SaturateU8(a[463:448])
tmp_dst[431:424] := SaturateU8(a[479:464])
tmp_dst[439:432] := SaturateU8(a[495:480])
tmp_dst[447:440] := SaturateU8(a[511:496])
tmp_dst[455:448] := SaturateU8(b[399:384])
tmp_dst[463:456] := SaturateU8(b[415:400])
tmp_dst[471:464] := SaturateU8(b[431:416])
tmp_dst[479:472] := SaturateU8(b[447:432])
tmp_dst[487:480] := SaturateU8(b[463:448])
tmp_dst[495:488] := SaturateU8(b[479:464])
tmp_dst[503:496] := SaturateU8(b[495:480])
tmp_dst[511:504] := SaturateU8(b[511:496])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[7:0] := SaturateU8(a[15:0])
tmp_dst[15:8] := SaturateU8(a[31:16])
tmp_dst[23:16] := SaturateU8(a[47:32])
tmp_dst[31:24] := SaturateU8(a[63:48])
tmp_dst[39:32] := SaturateU8(a[79:64])
tmp_dst[47:40] := SaturateU8(a[95:80])
tmp_dst[55:48] := SaturateU8(a[111:96])
tmp_dst[63:56] := SaturateU8(a[127:112])
tmp_dst[71:64] := SaturateU8(b[15:0])
tmp_dst[79:72] := SaturateU8(b[31:16])
tmp_dst[87:80] := SaturateU8(b[47:32])
tmp_dst[95:88] := SaturateU8(b[63:48])
tmp_dst[103:96] := SaturateU8(b[79:64])
tmp_dst[111:104] := SaturateU8(b[95:80])
tmp_dst[119:112] := SaturateU8(b[111:96])
tmp_dst[127:120] := SaturateU8(b[127:112])
tmp_dst[135:128] := SaturateU8(a[143:128])
tmp_dst[143:136] := SaturateU8(a[159:144])
tmp_dst[151:144] := SaturateU8(a[175:160])
tmp_dst[159:152] := SaturateU8(a[191:176])
tmp_dst[167:160] := SaturateU8(a[207:192])
tmp_dst[175:168] := SaturateU8(a[223:208])
tmp_dst[183:176] := SaturateU8(a[239:224])
tmp_dst[191:184] := SaturateU8(a[255:240])
tmp_dst[199:192] := SaturateU8(b[143:128])
tmp_dst[207:200] := SaturateU8(b[159:144])
tmp_dst[215:208] := SaturateU8(b[175:160])
tmp_dst[223:216] := SaturateU8(b[191:176])
tmp_dst[231:224] := SaturateU8(b[207:192])
tmp_dst[239:232] := SaturateU8(b[223:208])
tmp_dst[247:240] := SaturateU8(b[239:224])
tmp_dst[255:248] := SaturateU8(b[255:240])
tmp_dst[263:256] := SaturateU8(a[271:256])
tmp_dst[271:264] := SaturateU8(a[287:272])
tmp_dst[279:272] := SaturateU8(a[303:288])
tmp_dst[287:280] := SaturateU8(a[319:304])
tmp_dst[295:288] := SaturateU8(a[335:320])
tmp_dst[303:296] := SaturateU8(a[351:336])
tmp_dst[311:304] := SaturateU8(a[367:352])
tmp_dst[319:312] := SaturateU8(a[383:368])
tmp_dst[327:320] := SaturateU8(b[271:256])
tmp_dst[335:328] := SaturateU8(b[287:272])
tmp_dst[343:336] := SaturateU8(b[303:288])
tmp_dst[351:344] := SaturateU8(b[319:304])
tmp_dst[359:352] := SaturateU8(b[335:320])
tmp_dst[367:360] := SaturateU8(b[351:336])
tmp_dst[375:368] := SaturateU8(b[367:352])
tmp_dst[383:376] := SaturateU8(b[383:368])
tmp_dst[391:384] := SaturateU8(a[399:384])
tmp_dst[399:392] := SaturateU8(a[415:400])
tmp_dst[407:400] := SaturateU8(a[431:416])
tmp_dst[415:408] := SaturateU8(a[447:432])
tmp_dst[423:416] := SaturateU8(a[463:448])
tmp_dst[431:424] := SaturateU8(a[479:464])
tmp_dst[439:432] := SaturateU8(a[495:480])
tmp_dst[447:440] := SaturateU8(a[511:496])
tmp_dst[455:448] := SaturateU8(b[399:384])
tmp_dst[463:456] := SaturateU8(b[415:400])
tmp_dst[471:464] := SaturateU8(b[431:416])
tmp_dst[479:472] := SaturateU8(b[447:432])
tmp_dst[487:480] := SaturateU8(b[463:448])
tmp_dst[495:488] := SaturateU8(b[479:464])
tmp_dst[503:496] := SaturateU8(b[495:480])
tmp_dst[511:504] := SaturateU8(b[511:496])
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := tmp_dst[i+7:i]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
dst[7:0] := SaturateU8(a[15:0])
dst[15:8] := SaturateU8(a[31:16])
dst[23:16] := SaturateU8(a[47:32])
dst[31:24] := SaturateU8(a[63:48])
dst[39:32] := SaturateU8(a[79:64])
dst[47:40] := SaturateU8(a[95:80])
dst[55:48] := SaturateU8(a[111:96])
dst[63:56] := SaturateU8(a[127:112])
dst[71:64] := SaturateU8(b[15:0])
dst[79:72] := SaturateU8(b[31:16])
dst[87:80] := SaturateU8(b[47:32])
dst[95:88] := SaturateU8(b[63:48])
dst[103:96] := SaturateU8(b[79:64])
dst[111:104] := SaturateU8(b[95:80])
dst[119:112] := SaturateU8(b[111:96])
dst[127:120] := SaturateU8(b[127:112])
dst[135:128] := SaturateU8(a[143:128])
dst[143:136] := SaturateU8(a[159:144])
dst[151:144] := SaturateU8(a[175:160])
dst[159:152] := SaturateU8(a[191:176])
dst[167:160] := SaturateU8(a[207:192])
dst[175:168] := SaturateU8(a[223:208])
dst[183:176] := SaturateU8(a[239:224])
dst[191:184] := SaturateU8(a[255:240])
dst[199:192] := SaturateU8(b[143:128])
dst[207:200] := SaturateU8(b[159:144])
dst[215:208] := SaturateU8(b[175:160])
dst[223:216] := SaturateU8(b[191:176])
dst[231:224] := SaturateU8(b[207:192])
dst[239:232] := SaturateU8(b[223:208])
dst[247:240] := SaturateU8(b[239:224])
dst[255:248] := SaturateU8(b[255:240])
dst[263:256] := SaturateU8(a[271:256])
dst[271:264] := SaturateU8(a[287:272])
dst[279:272] := SaturateU8(a[303:288])
dst[287:280] := SaturateU8(a[319:304])
dst[295:288] := SaturateU8(a[335:320])
dst[303:296] := SaturateU8(a[351:336])
dst[311:304] := SaturateU8(a[367:352])
dst[319:312] := SaturateU8(a[383:368])
dst[327:320] := SaturateU8(b[271:256])
dst[335:328] := SaturateU8(b[287:272])
dst[343:336] := SaturateU8(b[303:288])
dst[351:344] := SaturateU8(b[319:304])
dst[359:352] := SaturateU8(b[335:320])
dst[367:360] := SaturateU8(b[351:336])
dst[375:368] := SaturateU8(b[367:352])
dst[383:376] := SaturateU8(b[383:368])
dst[391:384] := SaturateU8(a[399:384])
dst[399:392] := SaturateU8(a[415:400])
dst[407:400] := SaturateU8(a[431:416])
dst[415:408] := SaturateU8(a[447:432])
dst[423:416] := SaturateU8(a[463:448])
dst[431:424] := SaturateU8(a[479:464])
dst[439:432] := SaturateU8(a[495:480])
dst[447:440] := SaturateU8(a[511:496])
dst[455:448] := SaturateU8(b[399:384])
dst[463:456] := SaturateU8(b[415:400])
dst[471:464] := SaturateU8(b[431:416])
dst[479:472] := SaturateU8(b[447:432])
dst[487:480] := SaturateU8(b[463:448])
dst[495:488] := SaturateU8(b[479:464])
dst[503:496] := SaturateU8(b[495:480])
dst[511:504] := SaturateU8(b[511:496])
dst[MAX:512] := 0
AVX512BW
Convert
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 31
i := 16*j
l := 8*j
dst[l+7:l] := Saturate8(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
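Unlike the pack operations, this down-conversion keeps elements in their original order and produces a 256-bit result. The sketch below follows the bit indices "i" and "l" of the pseudocode directly; it is a Python model under the assumption that a vector is represented as one large integer, and the names to_signed and cvtsepi16_epi8_model are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate8(x):
    """Clamp a signed integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def cvtsepi16_epi8_model(a):
    """a: 512-bit vector as a Python int. Returns the 256-bit result as an int."""
    dst = 0
    for j in range(32):
        i = 16 * j                              # bit offset of source word j
        l = 8 * j                               # bit offset of result byte j
        word = to_signed(a >> i, 16)
        dst |= (saturate8(word) & 0xFF) << l    # store as a two's-complement byte
    return dst
```

For example, the words 0x0005, 0x7FFF, 0x8000 and 0xFFFF map to the bytes 0x05, 0x7F, 0x80 and 0xFF.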
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
Store
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i])
FI
ENDFOR
AVX512BW
Convert
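The masked store differs from the register forms in that inactive elements are neither zeroed nor copied: the corresponding bytes in memory are simply left alone. A minimal Python sketch of that behavior follows, assuming memory is modeled as a bytearray and "base_addr" as an offset into it; the function name is hypothetical. The pseudocode addresses memory in bits, so element j is taken here to be the byte at offset j from "base_addr".

```python
def saturate8(x):
    """Clamp a signed integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def mask_cvtsepi16_storeu_epi8_model(mem, base_addr, k, a):
    """Store saturated bytes for active elements only.

    mem:       bytearray standing in for memory (modified in place).
    base_addr: byte offset into mem; no alignment is required.
    k:         32-bit integer mask.
    a:         list of 32 signed 16-bit values.
    """
    for j in range(32):
        if (k >> j) & 1:
            # Write the result as a two's-complement byte.
            mem[base_addr + j] = saturate8(a[j]) & 0xFF
        # Inactive elements: the byte at base_addr + j is not touched.
```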
Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 to 31
i := j*8
l := j*16
dst[l+15:l] := SignExtend16(a[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
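Sign extension replicates bit 7 of each source byte into the upper eight bits of the 16-bit result. The following Python sketch shows this on raw bit patterns; the function name cvtepi8_epi16_model is hypothetical and the vectors are assumed to be lists of unsigned raw values.

```python
def cvtepi8_epi16_model(a):
    """a: list of 32 raw bytes (0..255). Returns 32 raw 16-bit words (0..65535)."""
    assert len(a) == 32
    out = []
    for byte in a:
        if byte & 0x80:
            out.append(byte | 0xFF00)   # negative: fill bits 15:8 with ones
        else:
            out.append(byte)            # non-negative: bits 15:8 stay zero
    return out
```

The byte 0x80 (-128) becomes 0xFF80, which is still -128 when read as a signed 16-bit value.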
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := SignExtend16(a[i+7:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := SignExtend16(a[i+7:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 31
i := 16*j
l := 8*j
dst[l+7:l] := SaturateU8(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
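Here the source elements are read as unsigned, so the only possible saturation is at the upper bound. The Python sketch below models that and, for contrast, shows how the same bit pattern would saturate if it were read as signed, as in the pack forms that take signed input. All function names are hypothetical.

```python
def saturate_u8(x):
    """Clamp an integer to the unsigned 8-bit range [0, 255]."""
    return max(0, min(255, x))


def as_signed16(w):
    """Reinterpret a raw 16-bit word (0..65535) as a signed value."""
    return w - 0x10000 if w & 0x8000 else w


def cvtusepi16_epi8_model(a):
    """a: list of 32 raw 16-bit words, treated as unsigned."""
    assert len(a) == 32
    return [saturate_u8(w & 0xFFFF) for w in a]
```

The word 0xFFFF is 65535 when read as unsigned and saturates to 255; read as signed it is -1, which unsigned saturation would clamp to 0 instead.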
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
Store
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i])
FI
ENDFOR
AVX512BW
Convert
Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 31
i := 16*j
l := 8*j
dst[l+7:l] := Truncate8(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
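Truncation keeps the low eight bits of each 16-bit element and discards the rest, so out-of-range values wrap rather than clamp. A minimal Python sketch, with a hypothetical function name and vectors modeled as lists of raw 16-bit words:

```python
def cvtepi16_epi8_model(a):
    """a: list of 32 raw 16-bit words. Returns the low byte of each word."""
    assert len(a) == 32
    return [w & 0xFF for w in a]
```

For instance, 300 (0x012C) truncates to 0x2C (44), whereas the saturating conversions would yield 127 (signed) or 255 (unsigned).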
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+15:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
Store
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i])
FI
ENDFOR
AVX512BW
Convert
Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := 16*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+15:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512BW
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 to 31
i := j*8
l := j*16
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
l := j*16
IF k[j]
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Convert
Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Set
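The write-masked broadcast can be modeled in a few lines. This is a Python sketch under the assumption that "src" is a list of 64 byte values and "k" a plain integer; the function name mask_set1_epi8_model is hypothetical.

```python
def mask_set1_epi8_model(src, k, a):
    """Broadcast the low 8 bits of a into the active elements.

    src: list of 64 byte values used where the mask bit is clear.
    k:   64-bit integer mask.
    a:   integer; only a[7:0] is used.
    """
    assert len(src) == 64
    low = a & 0xFF
    return [low if (k >> j) & 1 else src[j] for j in range(64)]
```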
Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[7:0]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Set
Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Set
Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[15:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Set
Compare packed signed 8-bit integers in "a" and "b" based on the comparison predicate specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
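The "imm8" selection and the per-element loop above can be sketched as a lookup table plus a mask-building loop. This is a scalar Python model, not the intrinsic: the name cmp_epi8_mask_model is hypothetical, inputs are lists of signed values, and the result mask is returned as a Python integer.

```python
import operator

# Predicates in the order of the CASE statement (imm8[2:0] = 0..7).
PREDICATES = [
    operator.eq,            # 0: _MM_CMPINT_EQ
    operator.lt,            # 1: _MM_CMPINT_LT
    operator.le,            # 2: _MM_CMPINT_LE
    lambda x, y: False,     # 3: _MM_CMPINT_FALSE
    operator.ne,            # 4: _MM_CMPINT_NE
    operator.ge,            # 5: _MM_CMPINT_NLT (not less than)
    operator.gt,            # 6: _MM_CMPINT_NLE (not less than or equal)
    lambda x, y: True,      # 7: _MM_CMPINT_TRUE
]


def cmp_epi8_mask_model(a, b, imm8):
    """a, b: lists of 64 signed 8-bit values. Returns the 64-bit mask as an int."""
    assert len(a) == 64 and len(b) == 64
    op = PREDICATES[imm8 & 0x7]     # only imm8[2:0] is used
    k = 0
    for j in range(64):
        if op(a[j], b[j]):
            k |= 1 << j             # set bit j when the comparison holds
    return k
```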
Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" based on the comparison predicate specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
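With a zeromask, a result bit can be set only when the corresponding bit of "k1" is set, so the outcome equals the unmasked comparison result ANDed with "k1". A minimal Python sketch of the loop above, with a hypothetical function name and masks modeled as integers:

```python
def mask_cmpeq_epi8_mask_model(k1, a, b):
    """a, b: lists of 64 8-bit values; k1: 64-bit integer mask.

    Returns the 64-bit result mask as an int.
    """
    assert len(a) == 64 and len(b) == 64
    k = 0
    for j in range(64):
        # Bit j is set only if element j is active and the elements are equal.
        if (k1 >> j) & 1 and a[j] == b[j]:
            k |= 1 << j
    return k
```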
Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison predicate specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
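The unsigned form uses the same "imm8" encoding as the signed one but reads each byte as a value in 0 to 255, which changes the outcome of the ordering predicates for bytes with bit 7 set. The Python sketch below models this on raw bytes; the names cmp_epu8_mask_model and as_signed8 are hypothetical, and as_signed8 is included only to show the signed reading of the same bit pattern.

```python
import operator

# Predicates in the order of the CASE statement (imm8[2:0] = 0..7).
PREDICATES = [
    operator.eq, operator.lt, operator.le, lambda x, y: False,
    operator.ne, operator.ge, operator.gt, lambda x, y: True,
]


def as_signed8(x):
    """Reinterpret a raw byte (0..255) as a signed 8-bit value."""
    return x - 0x100 if x & 0x80 else x


def cmp_epu8_mask_model(a, b, imm8):
    """a, b: lists of 64 raw bytes, compared as unsigned. Returns the mask as an int."""
    assert len(a) == 64 and len(b) == 64
    op = PREDICATES[imm8 & 0x7]
    k = 0
    for j in range(64):
        if op(a[j] & 0xFF, b[j] & 0xFF):
            k |= 1 << j
    return k
```

The byte 0xFF compares greater than 0x01 here (255 > 1), while under a signed reading it is -1 and compares less than 1.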
Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 63
i := j*8
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
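The zeromask behavior described above can be read as "compute the unmasked comparison, then AND the result with k1". The scalar C sketch below checks that reading for the not-equal byte comparison. Both function names are hypothetical, and the vectors are modeled as byte arrays.

```c
#include <stdint.h>

/* Unmasked model: bit j is set when byte j of a differs from byte j of b. */
uint64_t cmpneq_epu8_mask_model(const uint8_t a[64], const uint8_t b[64])
{
    uint64_t k = 0;
    for (int j = 0; j < 64; j++) {
        if (a[j] != b[j])
            k |= (uint64_t)1 << j;
    }
    return k;
}

/* Zeromasked model: elements whose k1 bit is clear always produce 0. */
uint64_t mask_cmpneq_epu8_mask_model(uint64_t k1, const uint8_t a[64],
                                     const uint8_t b[64])
{
    uint64_t k = 0;
    for (int j = 0; j < 64; j++) {
        if ((k1 >> j) & 1) {
            if (a[j] != b[j])
                k |= (uint64_t)1 << j;
        }
    }
    return k;
}
```

For any inputs, `mask_cmpneq_epu8_mask_model(k1, a, b)` equals `k1 & cmpneq_epu8_mask_model(a, b)` in this model.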
Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
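The signed and unsigned comparison entries share the same pseudocode shape and differ only in how each 16-bit lane is interpreted. A minimal scalar C sketch of the less-than case, using hypothetical function names and plain arrays in place of vectors, makes the difference visible:

```c
#include <stdint.h>

/* Signed model: lanes are interpreted as two's-complement int16_t. */
uint32_t cmplt_epi16_mask_model(const int16_t a[32], const int16_t b[32])
{
    uint32_t k = 0;
    for (int j = 0; j < 32; j++) {
        if (a[j] < b[j])
            k |= (uint32_t)1 << j;
    }
    return k;
}

/* Unsigned model: the same bit patterns are interpreted as uint16_t. */
uint32_t cmplt_epu16_mask_model(const uint16_t a[32], const uint16_t b[32])
{
    uint32_t k = 0;
    for (int j = 0; j < 32; j++) {
        if (a[j] < b[j])
            k |= (uint32_t)1 << j;
    }
    return k;
}
```

A lane holding the bit pattern 0xFFFF compares as -1 (less than 1) in the signed model and as 65535 (not less than 1) in the unsigned model.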
Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 31
i := j*16
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 63
i := j*8
k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 31
i := j*16
k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 63
i := j*8
IF k1[j]
k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 63
i := j*8
k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:64] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 31
i := j*16
IF k1[j]
k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 31
i := j*16
k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512BW
Compare
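The four test entries above differ only in whether a mask bit is set for a non-zero or a zero AND result. A scalar C sketch of the unmasked 16-bit forms, with hypothetical function names and arrays standing in for the vectors, shows that the two results are bitwise complements of each other:

```c
#include <stdint.h>

/* Bit j is set when (a[j] AND b[j]) is non-zero. */
uint32_t test_epi16_mask_model(const uint16_t a[32], const uint16_t b[32])
{
    uint32_t k = 0;
    for (int j = 0; j < 32; j++) {
        if ((a[j] & b[j]) != 0)
            k |= (uint32_t)1 << j;
    }
    return k;
}

/* Bit j is set when (a[j] AND b[j]) is zero. */
uint32_t testn_epi16_mask_model(const uint16_t a[32], const uint16_t b[32])
{
    uint32_t k = 0;
    for (int j = 0; j < 32; j++) {
        if ((a[j] & b[j]) == 0)
            k |= (uint32_t)1 << j;
    }
    return k;
}
```

In the masked forms, both results would additionally be ANDed with the input mask, so they are complements only within the selected elements.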
Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] << (tmp*8)
dst[255:128] := a[255:128] << (tmp*8)
dst[383:256] := a[383:256] << (tmp*8)
dst[511:384] := a[511:384] << (tmp*8)
dst[MAX:512] := 0
AVX512BW
Shift
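The key point of the byte shift above is that each 128-bit lane is shifted independently, so bytes never cross a lane boundary. A minimal scalar sketch, assuming the vector is stored as 64 little-endian bytes and using the hypothetical name `bslli_epi128_model`:

```c
#include <stdint.h>

/* Shift each 16-byte lane left by imm8 bytes, filling with zeros.
   dst and a must not overlap in this sketch. */
void bslli_epi128_model(uint8_t dst[64], const uint8_t a[64], unsigned imm8)
{
    int n = (int)(imm8 & 0xFF);
    if (n > 15)
        n = 16;                      /* the whole lane becomes zero */
    for (int lane = 0; lane < 4; lane++) {
        for (int i = 0; i < 16; i++) {
            /* Byte i of the lane takes byte i-n of the same lane, or zero. */
            dst[lane * 16 + i] = (i >= n) ? a[lane * 16 + i - n] : 0;
        }
    }
}
```

Shifting by one byte therefore zeros byte 0 of every lane (bytes 0, 16, 32 and 48 of the vector) rather than carrying byte 15 into byte 16.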
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
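To make the per-element count and the writemask concrete, here is a scalar C sketch of the entry above. The name `mask_sllv_epi16_model` is hypothetical, and arrays stand in for the vectors. Note that a count of 16 or more yields zero, as in the pseudocode.

```c
#include <stdint.h>

/* Per-element left shift with writemask: lanes whose mask bit is clear
   are copied from src; counts of 16 or more produce 0. */
void mask_sllv_epi16_model(uint16_t dst[32], const uint16_t src[32], uint32_t k,
                           const uint16_t a[32], const uint16_t count[32])
{
    for (int j = 0; j < 32; j++) {
        if ((k >> j) & 1) {
            if (count[j] < 16)
                dst[j] = (uint16_t)((uint32_t)a[j] << count[j]); /* keep low 16 bits */
            else
                dst[j] = 0;
        } else {
            dst[j] = src[j];
        }
    }
}
```

The widening to `uint32_t` before the shift is a choice made for this sketch so that the C shift is well defined for every count below 16.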
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0)
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
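The arithmetic right shifts above differ from the logical ones in what is shifted in and in the result for counts above 15. The following scalar C sketch contrasts the two immediate forms. The function names are hypothetical, and the sign fill is written without relying on the implementation-defined result of right-shifting a negative value in C.

```c
#include <stdint.h>

/* Arithmetic right shift of 32 signed 16-bit elements by imm8[7:0]. */
void srai_epi16_model(int16_t dst[32], const int16_t a[32], unsigned imm8)
{
    unsigned n = imm8 & 0xFF;
    for (int j = 0; j < 32; j++) {
        int32_t x = a[j];
        if (n > 15) {
            dst[j] = (int16_t)((x < 0) ? -1 : 0);   /* all sign bits */
        } else if (x < 0) {
            /* Portable sign fill: shift the complement, then complement again. */
            dst[j] = (int16_t)~(~x >> n);
        } else {
            dst[j] = (int16_t)(x >> n);
        }
    }
}

/* Logical right shift of 32 unsigned 16-bit elements, for contrast. */
void srli_epi16_model(uint16_t dst[32], const uint16_t a[32], unsigned imm8)
{
    unsigned n = imm8 & 0xFF;
    for (int j = 0; j < 32; j++)
        dst[j] = (n > 15) ? 0 : (uint16_t)(a[j] >> n);
}
```

For the bit pattern 0x8000, a shift of 16 or more gives 0xFFFF in the arithmetic model and 0 in the logical model.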
Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] >> (tmp*8)
dst[255:128] := a[255:128] >> (tmp*8)
dst[383:256] := a[383:256] >> (tmp*8)
dst[511:384] := a[511:384] >> (tmp*8)
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF count[i+15:i] < 16
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 31
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512BW
Shift
Add 32-bit masks in "a" and "b", and store the result in "k".
k[31:0] := a[31:0] + b[31:0]
k[MAX:32] := 0
AVX512BW
Mask
Add 64-bit masks in "a" and "b", and store the result in "k".
k[63:0] := a[63:0] + b[63:0]
k[MAX:64] := 0
AVX512BW
Mask
Compute the bitwise AND of 32-bit masks "a" and "b", and store the result in "k".
k[31:0] := a[31:0] AND b[31:0]
k[MAX:32] := 0
AVX512BW
Mask
Compute the bitwise AND of 64-bit masks "a" and "b", and store the result in "k".
k[63:0] := a[63:0] AND b[63:0]
k[MAX:64] := 0
AVX512BW
Mask
Compute the bitwise NOT of 32-bit mask "a" and then AND with "b", and store the result in "k".
k[31:0] := (NOT a[31:0]) AND b[31:0]
k[MAX:32] := 0
AVX512BW
Mask
Compute the bitwise NOT of 64-bit mask "a" and then AND with "b", and store the result in "k".
k[63:0] := (NOT a[63:0]) AND b[63:0]
k[MAX:64] := 0
AVX512BW
Mask
Compute the bitwise NOT of 32-bit mask "a", and store the result in "k".
k[31:0] := NOT a[31:0]
k[MAX:32] := 0
AVX512BW
Mask
Compute the bitwise NOT of 64-bit mask "a", and store the result in "k".
k[63:0] := NOT a[63:0]
k[MAX:64] := 0
AVX512BW
Mask
Compute the bitwise OR of 32-bit masks "a" and "b", and store the result in "k".
k[31:0] := a[31:0] OR b[31:0]
k[MAX:32] := 0
AVX512BW
Mask
Compute the bitwise OR of 64-bit masks "a" and "b", and store the result in "k".
k[63:0] := a[63:0] OR b[63:0]
k[MAX:64] := 0
AVX512BW
Mask
Compute the bitwise XNOR of 32-bit masks "a" and "b", and store the result in "k".
k[31:0] := NOT (a[31:0] XOR b[31:0])
k[MAX:32] := 0
AVX512BW
Mask
Compute the bitwise XNOR of 64-bit masks "a" and "b", and store the result in "k".
k[63:0] := NOT (a[63:0] XOR b[63:0])
k[MAX:64] := 0
AVX512BW
Mask
Compute the bitwise XOR of 32-bit masks "a" and "b", and store the result in "k".
k[31:0] := a[31:0] XOR b[31:0]
k[MAX:32] := 0
AVX512BW
Mask
Compute the bitwise XOR of 64-bit masks "a" and "b", and store the result in "k".
k[63:0] := a[63:0] XOR b[63:0]
k[MAX:64] := 0
AVX512BW
Mask
Shift the bits of 32-bit mask "a" left by "count" while shifting in zeros, and store the least significant 32 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 31
k[31:0] := a[31:0] << count[7:0]
FI
AVX512BW
Mask
Shift the bits of 64-bit mask "a" left by "count" while shifting in zeros, and store the least significant 64 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 63
k[63:0] := a[63:0] << count[7:0]
FI
AVX512BW
Mask
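The mask left shifts above return zero once the count reaches the mask width, whereas a plain C shift by the full width of the type is undefined. A small sketch with hypothetical function names makes the guard explicit:

```c
#include <stdint.h>

/* 32-bit mask shift: counts above 31 give 0. */
uint32_t kshiftli_mask32_model(uint32_t a, unsigned count)
{
    unsigned n = count & 0xFF;       /* only count[7:0] is used */
    return (n <= 31) ? (a << n) : 0;
}

/* 64-bit mask shift: counts above 63 give 0. */
uint64_t kshiftli_mask64_model(uint64_t a, unsigned count)
{
    unsigned n = count & 0xFF;       /* only count[7:0] is used */
    return (n <= 63) ? (a << n) : 0;
}
```

For example, shifting the mask value 1 by 31 in the 32-bit model gives 0x80000000, and shifting it by 32 gives 0.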
Shift the bits of 32-bit mask "a" right by "count" while shifting in zeros, and store the least significant 32 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 31
k[31:0] := a[31:0] >> count[7:0]
FI
AVX512BW
Mask
Shift the bits of 64-bit mask "a" right by "count" while shifting in zeros, and store the least significant 64 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 63
k[63:0] := a[63:0] >> count[7:0]
FI
AVX512BW
Mask
Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".
tmp[31:0] := a[31:0] OR b[31:0]
IF tmp[31:0] == 0x0
dst := 1
ELSE
dst := 0
FI
IF tmp[31:0] == 0xFFFFFFFF
MEM[all_ones+7:all_ones] := 1
ELSE
MEM[all_ones+7:all_ones] := 0
FI
AVX512BW
Mask
Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[31:0] := a[31:0] OR b[31:0]
IF tmp[31:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".
tmp[31:0] := a[31:0] OR b[31:0]
IF tmp[31:0] == 0xFFFFFFFF
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".
tmp[63:0] := a[63:0] OR b[63:0]
IF tmp[63:0] == 0x0
dst := 1
ELSE
dst := 0
FI
IF tmp[63:0] == 0xFFFFFFFFFFFFFFFF
MEM[all_ones+7:all_ones] := 1
ELSE
MEM[all_ones+7:all_ones] := 0
FI
AVX512BW
Mask
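A scalar C sketch of the 64-bit OR test may help here. It compares the full 64-bit OR result against zero and against all ones, as the description states. The name `kortest_mask64_u8_model` is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when (a OR b) is all zeros; writes 1 to *all_ones when
   (a OR b) has all 64 bits set, otherwise writes 0. */
unsigned char kortest_mask64_u8_model(uint64_t a, uint64_t b,
                                      unsigned char *all_ones)
{
    uint64_t tmp = a | b;
    *all_ones = (unsigned char)(tmp == UINT64_MAX);
    return (unsigned char)(tmp == 0);
}
```

A value such as 0xFF, whose low byte is all ones but whose upper bits are clear, reports neither condition in this model.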
Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[63:0] := a[63:0] OR b[63:0]
IF tmp[63:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".
tmp[63:0] := a[63:0] OR b[63:0]
IF tmp[63:0] == 0xFFFFFFFFFFFFFFFF
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Compute the bitwise AND of 32-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".
tmp1[31:0] := a[31:0] AND b[31:0]
IF tmp1[31:0] == 0x0
dst := 1
ELSE
dst := 0
FI
tmp2[31:0] := (NOT a[31:0]) AND b[31:0]
IF tmp2[31:0] == 0x0
MEM[and_not+7:and_not] := 1
ELSE
MEM[and_not+7:and_not] := 0
FI
AVX512BW
Mask
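The two outputs of the test above answer different questions: "dst" reports whether "a" and "b" share no set bits, while "and_not" reports whether every set bit of "b" is also set in "a". A scalar sketch, using the hypothetical name `ktest_mask32_u8_model`:

```c
#include <stdint.h>

/* Returns 1 when (a AND b) is zero; writes 1 to *and_not when
   ((NOT a) AND b) is zero, otherwise writes 0. */
unsigned char ktest_mask32_u8_model(uint32_t a, uint32_t b,
                                    unsigned char *and_not)
{
    *and_not = (unsigned char)((~a & b) == 0);
    return (unsigned char)((a & b) == 0);
}
```

With `a = 0x0F` and `b = 0xF0` the masks are disjoint, so the return value is 1 and `*and_not` is 0. With `a = 0xFF` and `b = 0x0F`, "b" is a subset of "a", so the return value is 0 and `*and_not` is 1.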
Compute the bitwise AND of 32-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[31:0] := a[31:0] AND b[31:0]
IF tmp[31:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Compute the bitwise NOT of 32-bit mask "a" and then AND with "b"; if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[31:0] := (NOT a[31:0]) AND b[31:0]
IF tmp[31:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Compute the bitwise AND of 64-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".
tmp1[63:0] := a[63:0] AND b[63:0]
IF tmp1[63:0] == 0x0
dst := 1
ELSE
dst := 0
FI
tmp2[63:0] := (NOT a[63:0]) AND b[63:0]
IF tmp2[63:0] == 0x0
MEM[and_not+7:and_not] := 1
ELSE
MEM[and_not+7:and_not] := 0
FI
AVX512BW
Mask
Compute the bitwise AND of 64-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[63:0] := a[63:0] AND b[63:0]
IF tmp[63:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Compute the bitwise NOT of 64-bit mask "a" and then AND with "b"; if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[63:0] := (NOT a[63:0]) AND b[63:0]
IF tmp[63:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512BW
Mask
Convert 32-bit mask "a" into an integer value, and store the result in "dst".
dst := ZeroExtend32(a[31:0])
AVX512BW
Mask
Convert 64-bit mask "a" into an integer value, and store the result in "dst".
dst := ZeroExtend64(a[63:0])
AVX512BW
Mask
Convert integer value "a" into a 32-bit mask, and store the result in "k".
k := ZeroExtend32(a[31:0])
AVX512BW
Mask
Convert integer value "a" into a 64-bit mask, and store the result in "k".
k := ZeroExtend64(a[63:0])
AVX512BW
Mask
Broadcast the low 8 bits from input mask "k" to all 64-bit elements of "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ZeroExtend64(k[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Miscellaneous
Broadcast the low 8 bits from input mask "k" to all 64-bit elements of "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ZeroExtend64(k[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Miscellaneous
Broadcast the low 16 bits from input mask "k" to all 32-bit elements of "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ZeroExtend32(k[15:0])
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Miscellaneous
Broadcast the low 16 bits from input mask "k" to all 32-bit elements of "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ZeroExtend32(k[15:0])
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Miscellaneous
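The mask broadcasts above zero-extend the low mask bits and replicate them into every element. A minimal sketch in Python, assuming the vector is represented as one large integer with element 0 in the lowest bits; the function name `broadcast_mask_to_lanes` and its parameters are hypothetical, chosen only for illustration.

```python
def broadcast_mask_to_lanes(k, mask_bits, lane_bits, lanes):
    """Scalar model: copy the low `mask_bits` of k into each lane of dst."""
    value = k & ((1 << mask_bits) - 1)   # k[mask_bits-1:0], zero-extended
    dst = 0
    for j in range(lanes):
        i = j * lane_bits                # bit offset of lane j
        dst |= value << i
    return dst


# 16-bit mask into four 32-bit elements (the 128-bit form)
print(hex(broadcast_mask_to_lanes(0x1ABCD, 16, 32, 4)))
```

Bits of "k" above the broadcast width (bit 16 in the call above) are dropped before replication, matching `ZeroExtend32(k[15:0])`.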
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 7
i := j*32
FOR k := 0 to j-1
m := k*32
dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Compare
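The conflict-detection loop above may be easier to follow on a list of element values. This is a minimal sketch in Python, assuming element 0 of the list is the least significant element; the name `conflict_model` is hypothetical and the masked variants are not modelled.

```python
def conflict_model(a):
    """Scalar model: for each element j, set bit k of the result
    when a[j] equals an earlier element a[k] (k < j)."""
    dst = []
    for j, x in enumerate(a):
        bits = 0
        for k in range(j):           # only elements closer to the LSB
            if a[k] == x:
                bits |= 1 << k
        dst.append(bits)             # upper bits stay zero
    return dst


print(conflict_model([7, 3, 7, 7]))  # -> [0, 0, 1, 5]
```

Element 3 yields 5 (binary 101) because it matches elements 0 and 2 but not element 1. Element 0 is always 0, since no element precedes it.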
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 7
i := j*32
IF k[j]
FOR l := 0 to j-1
m := l*32
dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Compare
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 7
i := j*32
IF k[j]
FOR l := 0 to j-1
m := l*32
dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Compare
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 3
i := j*32
FOR k := 0 to j-1
m := k*32
dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Compare
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 3
i := j*32
IF k[j]
FOR l := 0 to j-1
m := l*32
dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Compare
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 3
i := j*32
IF k[j]
FOR l := 0 to j-1
m := l*32
dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 3
i := j*64
FOR k := 0 to j-1
m := k*64
dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 3
i := j*64
IF k[j]
FOR l := 0 to j-1
m := l*64
dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 3
i := j*64
IF k[j]
FOR l := 0 to j-1
m := l*64
dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 1
i := j*64
FOR k := 0 to j-1
m := k*64
dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 1
i := j*64
IF k[j]
FOR l := 0 to j-1
m := l*64
dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 1
i := j*64
IF k[j]
FOR l := 0 to j-1
m := l*64
dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Compare
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*32
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Bit Manipulation
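The leading-zero loop above can be checked against a direct scalar version. A minimal sketch in Python for a single element, following the pseudocode step by step; the name `lzcnt_model` and the `width` parameter are hypothetical.

```python
def lzcnt_model(x, width=32):
    """Scalar model: count zero bits above the highest set bit of x."""
    x &= (1 << width) - 1
    tmp = width - 1
    count = 0
    while tmp >= 0 and ((x >> tmp) & 1) == 0:
        tmp -= 1
        count += 1
    return count


print(lzcnt_model(0x00010000))      # -> 15
print(lzcnt_model(0))               # -> 32 (all bits are zero)
print(lzcnt_model(1, width=64))     # -> 63
```

For a value already reduced to `width` bits, the result equals `width - x.bit_length()`, which is a convenient cross-check. A zero input returns the full element width.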
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512CD
AVX512VL
Bit Manipulation
Broadcast the low 8 bits from input mask "k" to all 64-bit elements of "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ZeroExtend64(k[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512CD
Swizzle
Broadcast the low 16 bits from input mask "k" to all 32-bit elements of "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ZeroExtend32(k[15:0])
ENDFOR
dst[MAX:512] := 0
AVX512CD
Swizzle
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 15
i := j*32
FOR k := 0 to j-1
m := k*32
dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ENDFOR
dst[MAX:512] := 0
AVX512CD
Compare
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 15
i := j*32
IF k[j]
FOR l := 0 to j-1
m := l*32
dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Compare
Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 15
i := j*32
IF k[j]
FOR l := 0 to j-1
m := l*32
dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0
ENDFOR
dst[i+31:i+j] := 0
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 7
i := j*64
FOR k := 0 to j-1
m := k*64
dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ENDFOR
dst[MAX:512] := 0
AVX512CD
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 7
i := j*64
IF k[j]
FOR l := 0 to j-1
m := l*64
dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Compare
Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst".
FOR j := 0 to 7
i := j*64
IF k[j]
FOR l := 0 to j-1
m := l*64
dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0
ENDFOR
dst[i+63:i+j] := 0
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Compare
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ENDFOR
dst[MAX:512] := 0
AVX512CD
Bit Manipulation
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Bit Manipulation
Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp := 31
dst[i+31:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+31:i] := dst[i+31:i] + 1
OD
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ENDFOR
dst[MAX:512] := 0
AVX512CD
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Bit Manipulation
Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp := 63
dst[i+63:i] := 0
DO WHILE (tmp >= 0 AND a[i+tmp] == 0)
tmp := tmp - 1
dst[i+63:i] := dst[i+63:i] + 1
OD
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512CD
Bit Manipulation
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
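The writemask and zeromask forms above differ only in what is written to elements whose mask bit is clear. A minimal sketch in Python, assuming each floating-point element is given as its raw 64-bit pattern (the operation is purely bitwise); the name `masked_andnot_model` is hypothetical, and passing `src=None` to select zeromask behaviour is a convention invented for this example.

```python
def masked_andnot_model(a, b, k, src=None, elem_bits=64):
    """Scalar model of (NOT a) AND b per element under mask k.
    src given  -> writemask: inactive elements are copied from src.
    src is None -> zeromask: inactive elements become 0."""
    full = (1 << elem_bits) - 1
    dst = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:                 # mask bit j set: compute
            dst.append(~x & y & full)
        elif src is not None:            # writemask fallback
            dst.append(src[j])
        else:                            # zeromask fallback
            dst.append(0)
    return dst


a = [0xF0, 0xF0]
b = [0xFF, 0xFF]
print(masked_andnot_model(a, b, 0b01, src=[1, 2]))  # -> [15, 2]
print(masked_andnot_model(a, b, 0b01))              # -> [15, 0]
```

The same selection logic applies to the AND, OR, and XOR records that follow; only the per-element expression changes.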
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Logical
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*32
n := (j % 2)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
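The index expression `n := (j % 2)*32` above means even destination elements receive source element 0 and odd ones receive source element 1. A minimal sketch in Python on a list of elements; the name `broadcast_x2_model` is hypothetical and masking is left out.

```python
def broadcast_x2_model(a, count):
    """Scalar model: repeat the lower two elements of a across count lanes."""
    return [a[j % 2] for j in range(count)]


# Only a[0] and a[1] are read; the upper source elements are ignored.
print(broadcast_x2_model([1.0, 2.0, 3.0, 4.0], 8))
# -> [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
```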
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 3
i := j*64
n := (j % 2)*64
dst[i+63:i] := a[n+63:n]
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*32
n := (j % 2)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst".
FOR j := 0 to 3
i := j*32
n := (j % 2)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst".
FOR j := 0 to 3
i := j*64
n := (j % 2)*64
dst[i+63:i] := a[n+63:n]
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
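As a rough illustration of the CASE statement above, the 256-bit source can be modelled as a Python integer. The name extract128 is hypothetical; the sketch only shows that bit 0 of imm8 alone selects the half.

```python
MASK128 = (1 << 128) - 1


def extract128(a, imm8):
    """Return the lower (imm8[0] == 0) or upper (imm8[0] == 1) 128 bits of 'a'."""
    if imm8 & 1:
        return (a >> 128) & MASK128
    return a & MASK128


if __name__ == "__main__":
    a = (0xBBBB << 128) | 0xAAAA
    print(hex(extract128(a, 0)), hex(extract128(a, 1)))
```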
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR j := 0 to 3
i := j*64
k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
ENDFOR
k[MAX:4] := 0
AVX512DQ
AVX512VL
Miscellaneous
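The operation above calls CheckFPClass_FP64 without defining it, and the category table sits behind the unexpanded [fpclass_note] placeholder. The sketch below assumes the commonly documented bit assignment for imm8 (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +Inf, bit 4 -Inf, bit 5 denormal, bit 6 finite negative, bit 7 SNaN); treat that mapping as an assumption to verify against the vendor documentation. Lanes are passed as raw 64-bit patterns, and the effect of denormals-are-zero mode is ignored.

```python
import struct


def f64_bits(x):
    """Raw IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def check_fp_class_fp64(bits, imm8):
    """Return 1 if the value falls in any category enabled in imm8 (assumed mapping)."""
    neg = (bits >> 63) & 1
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    exp_ones, exp_zero, frac_zero = exp == 0x7FF, exp == 0, frac == 0
    quiet = (frac >> 51) & 1
    classes = [
        exp_ones and not frac_zero and quiet == 1,              # bit 0: QNaN
        not neg and exp_zero and frac_zero,                     # bit 1: +0
        neg and exp_zero and frac_zero,                         # bit 2: -0
        not neg and exp_ones and frac_zero,                     # bit 3: +Inf
        neg and exp_ones and frac_zero,                         # bit 4: -Inf
        exp_zero and not frac_zero,                             # bit 5: denormal
        neg and not exp_ones and not (exp_zero and frac_zero),  # bit 6: finite negative
        exp_ones and not frac_zero and quiet == 0,              # bit 7: SNaN
    ]
    return int(any(c and (imm8 >> n) & 1 for n, c in enumerate(classes)))


def fpclass_pd_mask(lanes_bits, imm8):
    """Build mask k with one bit per lane, as in the FOR loop above."""
    k = 0
    for j, bits in enumerate(lanes_bits):
        k |= check_fp_class_fp64(bits, imm8) << j
    return k
```

For example, with lanes holding 0.0, -0.0, +Inf and -1.5, an imm8 of 0x06 (both zeros) would select the first two lanes under this mapping.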
Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512DQ
AVX512VL
Miscellaneous
Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR j := 0 to 1
i := j*64
k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
ENDFOR
k[MAX:2] := 0
AVX512DQ
AVX512VL
Miscellaneous
Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512DQ
AVX512VL
Miscellaneous
Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR j := 0 to 7
i := j*32
k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
ENDFOR
k[MAX:8] := 0
AVX512DQ
AVX512VL
Miscellaneous
Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512DQ
AVX512VL
Miscellaneous
Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR j := 0 to 3
i := j*32
k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
ENDFOR
k[MAX:4] := 0
AVX512DQ
AVX512VL
Miscellaneous
Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512DQ
AVX512VL
Miscellaneous
Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE imm8[0] OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
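The insert operation above can be sketched with Python integers standing in for the 256-bit and 128-bit vectors. The helper name insert128 is an assumption for illustration only.

```python
MASK128 = (1 << 128) - 1


def insert128(a, b, imm8):
    """Copy 'a', then overwrite its lower or upper 128 bits with 'b' per imm8[0]."""
    lo = a & MASK128
    hi = (a >> 128) & MASK128
    if imm8 & 1:
        hi = b & MASK128   # imm8[0] == 1: replace bits 255:128
    else:
        lo = b & MASK128   # imm8[0] == 0: replace bits 127:0
    return (hi << 128) | lo


if __name__ == "__main__":
    a = (0x11 << 128) | 0x22
    print(hex(insert128(a, 0x33, 0)))
    print(hex(insert128(a, 0x33, 1)))
```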
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Copy "a" to "dst", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE imm8[0] OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a".
FOR j := 0 to 7
i := j*32
IF a[i+31]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512DQ
AVX512VL
Miscellaneous
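A scalar sketch of the sign-bit gathering described above, assuming lanes are given as unsigned 32-bit integers in a list (lane 0 first). The function name msb_to_mask32 is hypothetical.

```python
def msb_to_mask32(lanes):
    """Set bit j of the result when bit 31 of lane j is set."""
    k = 0
    for j, value in enumerate(lanes):
        k |= ((value >> 31) & 1) << j
    return k


if __name__ == "__main__":
    print(bin(msb_to_mask32([0x80000000, 1, 0xFFFFFFFF, 0x7FFFFFFF])))
```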
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a".
FOR j := 0 to 3
i := j*32
IF a[i+31]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512DQ
AVX512VL
Miscellaneous
Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := 0xFFFFFFFF
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
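The mask-to-vector expansion above is the inverse direction of the sign-bit extraction. A minimal sketch, with the hypothetical name mask_to_lanes32 and a list of unsigned 32-bit lanes as the result:

```python
def mask_to_lanes32(k, lanes=8):
    """Lane j becomes all ones when bit j of 'k' is set, otherwise zero."""
    return [0xFFFFFFFF if (k >> j) & 1 else 0 for j in range(lanes)]


if __name__ == "__main__":
    print([hex(v) for v in mask_to_lanes32(0b101, lanes=4)])
```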
Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := 0xFFFFFFFF
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := 0xFFFFFFFFFFFFFFFF
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := 0xFFFFFFFFFFFFFFFF
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a".
FOR j := 0 to 3
i := j*64
IF a[i+63]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512DQ
AVX512VL
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a".
FOR j := 0 to 1
i := j*64
IF a[i+63]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
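The RANGE definition above combines a min/max selection with a separate sign-control step. The following scalar sketch mirrors the pseudocode as written, using Python floats for the 64-bit lanes. It does not attempt to model NaN handling or any special cases the pseudocode leaves out, and the names range_fp64 and range_pd are assumptions.

```python
import math


def range_fp64(src1, src2, op_ctl, sign_ctl):
    """Mirror of the RANGE pseudocode for one lane."""
    if op_ctl == 0:      # min
        tmp = src1 if src1 <= src2 else src2
    elif op_ctl == 1:    # max
        tmp = src2 if src1 <= src2 else src1
    elif op_ctl == 2:    # absolute min
        tmp = src1 if abs(src1) <= abs(src2) else src2
    else:                # absolute max
        tmp = src2 if abs(src1) <= abs(src2) else src1
    if sign_ctl == 0:    # sign from src1
        return math.copysign(tmp, src1)
    if sign_ctl == 1:    # sign from the compare result
        return tmp
    if sign_ctl == 2:    # clear sign bit
        return abs(tmp)
    return -abs(tmp)     # set sign bit


def range_pd(a, b, imm8):
    """Apply RANGE lane by lane; imm8[1:0] is opCtl, imm8[3:2] is signSelCtl."""
    return [range_fp64(x, y, imm8 & 3, (imm8 >> 2) & 3) for x, y in zip(a, b)]


if __name__ == "__main__":
    # Minimum, keeping the sign of the compare result.
    print(range_pd([1.0, -3.0], [-2.0, 2.0], 0b0100))
```

Note that with imm8 = 0 the sign is taken from "a", so the minimum of 1.0 and -2.0 comes back as 2.0 rather than -2.0.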
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
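ReduceArgumentPD above subtracts from the input a copy of itself rounded to imm8[7:4] fraction bits. The sketch below models that for finite, moderately sized inputs only. The rounding-mode mapping for imm8[1:0] (0 = nearest even, 1 = down, 2 = up, 3 = truncate) is an assumption, since the source refers to it only through the unexpanded [round_imm_note] placeholder, and the remaining bits of imm8[3:0] are ignored. The function name reduce_argument_pd is hypothetical.

```python
import math


def reduce_argument_pd(x, imm8):
    """x - 2^-m * ROUND(2^m * x) for one finite lane (assumed rounding modes)."""
    m = (imm8 >> 4) & 0xF          # fraction bits to preserve
    mode = imm8 & 0x3              # assumed rounding-mode field
    scaled = math.ldexp(x, m)      # 2^m * x, exact for moderate inputs
    if mode == 0:
        rounded = round(scaled)    # round half to even
    elif mode == 1:
        rounded = math.floor(scaled)
    elif mode == 2:
        rounded = math.ceil(scaled)
    else:
        rounded = math.trunc(scaled)
    return x - math.ldexp(float(rounded), -m)


if __name__ == "__main__":
    # m = 1, truncate: 2.75 keeps 2.5 and the reduced argument is 0.25.
    print(reduce_argument_pd(2.75, 0x13))
```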
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Miscellaneous
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
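Convert_FP64_To_Int64 is not defined in this listing, so the following Python sketch is only one plausible reading of the entry above. It assumes the default round-to-nearest-even mode and assumes that NaN, infinite, or out-of-range inputs yield the bit pattern 0x8000000000000000 (shown here as -2**63). The function names are hypothetical.

```python
import math

INT64_INDEFINITE = -2**63  # assumed result for inputs that cannot be represented

def convert_fp64_to_int64(x):
    """Round to nearest (ties to even), falling back to the assumed indefinite value."""
    if math.isnan(x) or math.isinf(x):
        return INT64_INDEFINITE
    r = round(x)  # Python's round() uses ties-to-even for floats
    return r if -2**63 <= r < 2**63 else INT64_INDEFINITE

def convert_lanes_fp64_to_int64(a):
    """Apply the conversion to each 64-bit lane, as the FOR loop does."""
    return [convert_fp64_to_int64(x) for x in a]

print(convert_lanes_fp64_to_int64([2.5, 3.5, -2.5, 1e30]))
```

The tie cases 2.5 and 3.5 land on different sides because ties go to the even integer.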
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
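The writemask and zeromask entries above differ only in what an unselected lane receives. A minimal Python sketch of that selection step, with lanes held in a list and the hypothetical helper name apply_mask:

```python
def apply_mask(k, results, src=None):
    """Pick results[j] where bit j of k is set; otherwise src[j] (writemask) or 0 (zeromask)."""
    return [r if (k >> j) & 1 else (0 if src is None else src[j])
            for j, r in enumerate(results)]

results = [10, 20, 30, 40]                        # already-converted lanes
print(apply_mask(0b0110, results, [1, 2, 3, 4]))  # writemask form
print(apply_mask(0b0110, results))                # zeromask form
```

Passing src selects the writemask behaviour; omitting it selects the zeromask behaviour.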
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
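For the unsigned conversion above, a scalar Python sketch might look as follows. Convert_FP64_To_UInt64 is not defined in this listing; the sketch assumes round-to-nearest-even and assumes that NaN, infinite, negative, or too-large inputs yield 2**64 - 1. All names are hypothetical.

```python
import math

UINT64_MAX = 2**64 - 1  # assumed result for inputs that cannot be represented

def convert_fp64_to_uint64(x):
    """Round to nearest (ties to even); unrepresentable values map to the assumed all-ones result."""
    if math.isnan(x) or math.isinf(x):
        return UINT64_MAX
    r = round(x)
    return r if 0 <= r <= UINT64_MAX else UINT64_MAX

def convert_lanes_fp64_to_uint64(a):
    """Apply the conversion to each 64-bit lane."""
    return [convert_fp64_to_uint64(x) for x in a]

print(convert_lanes_fp64_to_uint64([0.5, 1.5, -1.0, 2.0**64]))
```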
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
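The widening entry above reads 32-bit lanes at bit offset l and writes 64-bit lanes at bit offset i. The Python sketch below models the registers as little-endian byte strings to make the two offsets visible. It assumes finite, in-range inputs and round-to-nearest-even; widen_fp32_to_int64 is a hypothetical name.

```python
import struct

def widen_fp32_to_int64(a_bytes):
    """a_bytes: 16 bytes holding four FP32 lanes; returns 32 bytes holding four int64 lanes."""
    out = bytearray(32)
    for j in range(4):
        i, l = j * 64, j * 32                           # bit offsets, as in the pseudocode
        (x,) = struct.unpack_from("<f", a_bytes, l // 8)
        struct.pack_into("<q", out, i // 8, round(x))   # ties to even
    return bytes(out)

a = struct.pack("<4f", 1.0, -2.0, 2.5, 1000.0)
print(struct.unpack("<4q", widen_fp32_to_int64(a)))
```

Only the low 128 bits of the source are consumed, while the result fills all 256 bits.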
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
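Converting a 64-bit integer to double precision is not always exact, because a double carries only 53 significant bits. The Python sketch below illustrates this for the entry above, assuming round-to-nearest-even; int64_lanes_to_fp64 is a hypothetical name.

```python
def int64_lanes_to_fp64(a):
    """Convert each signed 64-bit lane to a double (Python's int-to-float rounds to nearest even)."""
    return [float(v) for v in a]

# 2**53 + 1 is not representable as a double and rounds to 2**53.
print(int64_lanes_to_fp64([1, -3, 2**53, 2**53 + 1]))
```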
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
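In the narrowing entry above, four 64-bit lanes produce four 32-bit lanes, so the result occupies only 128 bits. The Python sketch below performs the integer-to-FP32 rounding with integer arithmetic so that it rounds once (going through a double first could round twice). It assumes round-to-nearest-even, and both function names are hypothetical.

```python
import struct

def int64_to_fp32(v):
    """Round a signed 64-bit integer to the nearest FP32 value (ties to even)."""
    n = abs(v)
    shift = max(n.bit_length() - 24, 0)   # FP32 keeps 24 significant bits
    if shift:
        q, rem = n >> shift, n & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if rem > half or (rem == half and q & 1):
            q += 1
        n = q << shift                    # now exactly representable in FP32
    return float(-n if v < 0 else n)

def narrow_int64_to_fp32(a):
    """a: four signed 64-bit ints; returns 16 bytes with lane j at bit offset j*32."""
    return struct.pack("<4f", *(int64_to_fp32(v) for v in a))

print(struct.unpack("<4f", narrow_int64_to_fp32([1, -2, 2**24 + 1, 2**24 + 3])))
```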
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512DQ
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
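The truncating conversion above differs from the non-truncating one only in how the fraction is discarded. The Python sketch below contrasts the two for finite, in-range inputs, assuming the non-truncating form uses round-to-nearest-even. Both function names are hypothetical.

```python
import math

def fp64_to_int64_truncate(a):
    """Drop the fraction (round toward zero) in each lane."""
    return [math.trunc(x) for x in a]

def fp64_to_int64_nearest(a):
    """Round each lane to the nearest integer, ties to even."""
    return [round(x) for x in a]

a = [2.7, -2.7, 2.5, -0.9]
print(fp64_to_int64_truncate(a))
print(fp64_to_int64_nearest(a))
```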
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
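The description above calls the source lanes unsigned, while the pseudocode reuses the helper name Convert_Int64_To_FP64. The Python sketch below assumes the description is the intended behaviour and shows why the interpretation matters: the same 64-bit pattern converts to very different doubles. The helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def lane_as_unsigned(bits):
    """Interpret a 64-bit pattern as an unsigned integer."""
    return bits & MASK64

def lane_as_signed(bits):
    """Interpret the same 64-bit pattern as a two's-complement signed integer."""
    u = bits & MASK64
    return u - (1 << 64) if u >> 63 else u

bits = 0xFFFFFFFFFFFFFFFF
print(float(lane_as_unsigned(bits)))  # unsigned reading, rounds up to 2**64
print(float(lane_as_signed(bits)))    # signed reading of the same bits
```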
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512DQ
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512DQ
AVX512VL
Convert
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Arithmetic
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Arithmetic
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ENDFOR
dst[MAX:256] := 0
AVX512DQ
AVX512VL
Arithmetic
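The low-half multiply above keeps bits 63:0 of each 128-bit product. A minimal Python sketch, with lanes given as unsigned 64-bit patterns and the hypothetical name mullo_64:

```python
MASK64 = (1 << 64) - 1

def mullo_64(a, b):
    """Multiply lane-wise and keep only the low 64 bits of each product."""
    return [(x * y) & MASK64 for x, y in zip(a, b)]

# 2**63 * 2 overflows to 0; (2**64 - 1) squared leaves 1 in the low half.
print(mullo_64([3, 2**63, MASK64, 7], [5, 2, MASK64, 0]))
```

The low 64 bits are the same whether the inputs are read as signed or unsigned, which is why the entry does not distinguish between the two.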
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Arithmetic
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Arithmetic
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ENDFOR
dst[MAX:128] := 0
AVX512DQ
AVX512VL
Arithmetic
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
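The NOT-then-AND entry above operates on the raw bit patterns of the floating-point lanes. One common use, shown here as a Python sketch, is clearing the sign bit by passing -0.0 (only the sign bit set) as "a". The helper names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1

def f64_bits(x):
    """Return the 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_f64(b):
    """Return the double with the given 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def andnot_pd(a, b):
    """Per lane: (NOT a) AND b on the bit patterns."""
    return [bits_f64(~f64_bits(x) & f64_bits(y) & MASK64) for x, y in zip(a, b)]

# With a = -0.0 the sign bit of b is cleared, giving the absolute value.
print(andnot_pd([-0.0, -0.0], [-3.5, 2.0]))
```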
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Logical
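The AND, OR, and XOR entries above operate on the raw bit patterns of the floating-point lanes, which makes them useful for sign manipulation. The sketch below illustrates this on 32-bit lanes; `bitwise_ps_model` is a hypothetical helper, and passing the second operand as raw integer patterns is a simplification chosen for this example.

```python
import operator
import struct

def f32_bits(x):
    """Reinterpret a float as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(bits):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def bitwise_ps_model(op, a, b_bits):
    """Apply a bitwise op lane by lane; b_bits holds raw 32-bit patterns."""
    return [bits_f32(op(f32_bits(x), m)) for x, m in zip(a, b_bits)]

SIGN = 0x80000000
vals = [1.5, -2.0, 0.25, -8.0]  # all exactly representable in single precision

negated = bitwise_ps_model(operator.xor, vals, [SIGN] * 4)         # flip sign bit
absolute = bitwise_ps_model(operator.and_, vals, [~SIGN & 0xFFFFFFFF] * 4)  # clear sign bit
forced_neg = bitwise_ps_model(operator.or_, vals, [SIGN] * 4)      # set sign bit
```

Here `negated` is `[-1.5, 2.0, -0.25, 8.0]`, `absolute` is `[1.5, 2.0, 0.25, 8.0]`, and `forced_neg` is `[-1.5, -2.0, -0.25, -8.0]`.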
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
n := (j % 2)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
n := (j % 8)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 8)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 8)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
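The broadcast entries above all follow one pattern: destination lane j reads source lane j modulo the group size (2 or 8 here). A minimal sketch of that indexing, with the hypothetical name `broadcast_model` and lanes represented as list elements:

```python
def broadcast_model(a, group, lanes, k=None, src=None):
    """Repeat the lowest `group` lanes of `a` across `lanes` output lanes.

    k:   optional mask integer; bit j controls output lane j.
    src: fallback lanes for a writemask; None means zeromask behavior.
    """
    dst = []
    for j in range(lanes):
        if k is None or (k >> j) & 1:
            dst.append(a[j % group])   # n := (j % group) * element_width
        elif src is not None:
            dst.append(src[j])
        else:
            dst.append(0)
    return dst
```

For example, `broadcast_model([10, 20], 2, 8, k=0b00001111)` gives `[10, 20, 10, 20, 0, 0, 0, 0]`: the mask is applied per output lane after the broadcast pattern is chosen.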
Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
n := (j % 2)*64
dst[i+63:i] := a[n+63:n]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
n := (j % 2)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 2)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
n := (j % 8)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 8)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 8)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
n := (j % 2)*64
dst[i+63:i] := a[n+63:n]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 2)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[255:0] := a[255:0]
1: dst[255:0] := a[511:256]
ESAC
dst[MAX:256] := 0
AVX512DQ
Miscellaneous
Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Miscellaneous
Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Miscellaneous
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[1:0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
2: dst[127:0] := a[383:256]
3: dst[127:0] := a[511:384]
ESAC
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
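The extract entries above select one aligned group of lanes using the low bit(s) of "imm8" and then apply the mask to the extracted lanes, not to the source. A sketch of this under the assumption that the number of groups is a power of two (2 or 4 in the entries above); `extract_model` is a hypothetical name:

```python
def extract_model(a, imm8, out_lanes, k=None, src=None):
    """Extract `out_lanes` consecutive lanes from `a`, selected by imm8.

    With 2 groups only imm8[0] is used; with 4 groups, imm8[1:0].
    """
    groups = len(a) // out_lanes
    sel = imm8 & (groups - 1)                       # ignore higher imm8 bits
    tmp = a[sel * out_lanes:(sel + 1) * out_lanes]
    dst = []
    for j in range(out_lanes):
        if k is None or (k >> j) & 1:
            dst.append(tmp[j])
        elif src is not None:
            dst.append(src[j])   # writemask
        else:
            dst.append(0)        # zeromask
    return dst
```

With sixteen 32-bit lanes numbered 0 to 15, `extract_model(list(range(16)), 1, 8)` returns lanes 8 through 15, matching the `a[511:256]` case.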
Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[255:0] := a[255:0]
1: dst[255:0] := a[511:256]
ESAC
dst[MAX:256] := 0
AVX512DQ
Miscellaneous
Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Miscellaneous
Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Miscellaneous
Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[1:0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
2: dst[127:0] := a[383:256]
3: dst[127:0] := a[511:384]
ESAC
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR j := 0 to 7
i := j*64
k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
ENDFOR
k[MAX:8] := 0
AVX512DQ
Miscellaneous
Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512DQ
Miscellaneous
Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR j := 0 to 15
i := j*32
k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
ENDFOR
k[MAX:16] := 0
AVX512DQ
Miscellaneous
Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0])
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512DQ
Miscellaneous
Test the lower double-precision (64-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k".
[fpclass_note]
k[0] := CheckFPClass_FP64(a[63:0], imm8[7:0])
k[MAX:1] := 0
AVX512DQ
Miscellaneous
Test the lower double-precision (64-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
[fpclass_note]
IF k1[0]
k[0] := CheckFPClass_FP64(a[63:0], imm8[7:0])
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512DQ
Miscellaneous
Test the lower single-precision (32-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k".
[fpclass_note]
k[0] := CheckFPClass_FP32(a[31:0], imm8[7:0])
k[MAX:1] := 0
AVX512DQ
Miscellaneous
Test the lower single-precision (32-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
[fpclass_note]
IF k1[0]
k[0] := CheckFPClass_FP32(a[31:0], imm8[7:0])
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512DQ
Miscellaneous
Copy "a" to "dst", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE (imm8[0]) OF
0: dst[255:0] := b[255:0]
1: dst[511:256] := b[255:0]
ESAC
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE imm8[1:0] OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
2: dst[383:256] := b[127:0]
3: dst[511:384] := b[127:0]
ESAC
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
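The insert entries above first build a full-width temporary (a copy of "a" with one aligned group overwritten by "b") and only then apply the mask across all lanes. The sketch below models that order of operations; `insert_model` is a hypothetical name and the group count is assumed to be a power of two.

```python
def insert_model(a, b, imm8, k=None, src=None):
    """Copy `a`, overwrite one aligned group with `b`, then apply the mask."""
    groups = len(a) // len(b)
    sel = imm8 & (groups - 1)           # imm8[0] or imm8[1:0]
    tmp = list(a)                       # tmp := a
    tmp[sel * len(b):(sel + 1) * len(b)] = b
    dst = []
    for j in range(len(a)):
        if k is None or (k >> j) & 1:
            dst.append(tmp[j])
        elif src is not None:
            dst.append(src[j])   # writemask
        else:
            dst.append(0)        # zeromask
    return dst
```

Note that a masked-off lane takes its value from "src" (or zero), even if that lane lies inside the inserted group.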
Copy "a" to "dst", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE imm8[0] OF
0: dst[255:0] := b[255:0]
1: dst[511:256] := b[255:0]
ESAC
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "dst", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE imm8[1:0] OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
2: dst[383:256] := b[127:0]
3: dst[511:384] := b[127:0]
ESAC
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a".
FOR j := 0 to 15
i := j*32
IF a[i+31]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512DQ
Miscellaneous
Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := 0xFFFFFFFF
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k".
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := 0xFFFFFFFFFFFFFFFF
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a".
FOR j := 0 to 7
i := j*64
IF a[i+63]
k[j] := 1
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512DQ
Miscellaneous
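The four entries above convert between a vector of integers and a mask register in both directions. A minimal sketch of the two directions, parameterized by element width (32 or 64); the function names are hypothetical and lanes are assumed to hold non-negative bit patterns:

```python
def movepi_mask_model(a, width):
    """Build a mask whose bit j is the most significant bit of lane j."""
    k = 0
    for j, x in enumerate(a):
        if (x >> (width - 1)) & 1:
            k |= 1 << j
    return k

def movm_model(k, lanes, width):
    """Expand mask bit j to an all-ones or all-zeros lane j."""
    ones = (1 << width) - 1
    return [ones if (k >> j) & 1 else 0 for j in range(lanes)]
```

Expanding a mask and then collecting the most significant bits again returns the original mask, so the two operations round-trip in that direction.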
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
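The RANGE pseudocode above can be traced with a scalar model. The sketch below follows the two CASE blocks literally for ordinary finite inputs; it makes no attempt to model how the hardware treats NaN or signed zeros, and the names `range_model` and `range_pd_model` are assumptions made for this example.

```python
import struct

SIGN64 = 1 << 63

def f64_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_f64(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def range_model(src1, src2, op_ctl, sign_ctl):
    """One lane of RANGE: pick a value by op_ctl, then fix its sign by sign_ctl."""
    if op_ctl == 0:    # min
        tmp = src1 if src1 <= src2 else src2
    elif op_ctl == 1:  # max
        tmp = src2 if src1 <= src2 else src1
    elif op_ctl == 2:  # absolute min
        tmp = src1 if abs(src1) <= abs(src2) else src2
    else:              # absolute max
        tmp = src2 if abs(src1) <= abs(src2) else src1
    tbits = f64_bits(tmp)
    low = tbits & (SIGN64 - 1)           # tmp[62:0]
    if sign_ctl == 0:                    # sign from a
        bits = (f64_bits(src1) & SIGN64) | low
    elif sign_ctl == 1:                  # sign from compare result
        bits = tbits
    elif sign_ctl == 2:                  # clear sign bit
        bits = low
    else:                                # set sign bit
        bits = SIGN64 | low
    return bits_f64(bits)

def range_pd_model(a, b, imm8):
    """Unmasked form: decode imm8[1:0] and imm8[3:2], apply per lane."""
    op_ctl, sign_ctl = imm8 & 3, (imm8 >> 2) & 3
    return [range_model(x, y, op_ctl, sign_ctl) for x, y in zip(a, b)]
```

For instance, `range_model(-3.0, 2.0, 1, 0)` selects the max (2.0) but takes its sign from the first operand, giving -2.0, which shows that the sign control is applied after the selection.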
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
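Because "imm8" packs two independent 2-bit fields, it can be easier to build the value from named constants than to write the literal by hand. The sketch below assumes nothing beyond the field layout stated in the entry above; the enum names and the range_imm8 helper are hypothetical and are not part of any Intel header.

```c
#include <assert.h>

/* Hypothetical names for the imm8[1:0] operation control values. */
enum range_op {
    RANGE_OP_MIN     = 0,
    RANGE_OP_MAX     = 1,
    RANGE_OP_ABS_MIN = 2,
    RANGE_OP_ABS_MAX = 3
};

/* Hypothetical names for the imm8[3:2] sign control values. */
enum range_sign {
    RANGE_SIGN_FROM_A   = 0,
    RANGE_SIGN_FROM_CMP = 1,
    RANGE_SIGN_CLEAR    = 2,
    RANGE_SIGN_SET      = 3
};

/* Pack the two 2-bit fields into the low nibble of imm8. */
static unsigned range_imm8(enum range_op op, enum range_sign sign)
{
    assert((unsigned)op <= 3u && (unsigned)sign <= 3u);
    return ((unsigned)sign << 2) | (unsigned)op;
}
```

With these names, range_imm8(RANGE_OP_ABS_MAX, RANGE_SIGN_CLEAR) yields 0x0B, which requests the larger magnitude of the two inputs returned as a non-negative value.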
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
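The per-lane masking in the loop above can be modeled over plain arrays. The following is a rough sketch under stated assumptions: a 512-bit vector is represented as 16 floats, the function names are hypothetical, passing a NULL "src" is this sketch's own convention for the zeromask form, and NaN handling is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 16 };                       /* 512 bits / 32-bit lanes */
#define SIGN32 UINT32_C(0x80000000)

static uint32_t f32_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float bits_f32(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* One 32-bit lane of RANGE (NaN inputs are not modeled). */
static float range_f32(float s1, float s2, unsigned imm8)
{
    float a1 = bits_f32(f32_bits(s1) & ~SIGN32);   /* ABS(src1) */
    float a2 = bits_f32(f32_bits(s2) & ~SIGN32);   /* ABS(src2) */
    float tmp;

    switch (imm8 & 0x3u) {                         /* imm8[1:0] */
    case 0:  tmp = (s1 <= s2) ? s1 : s2; break;
    case 1:  tmp = (s1 <= s2) ? s2 : s1; break;
    case 2:  tmp = (a1 <= a2) ? s1 : s2; break;
    default: tmp = (a1 <= a2) ? s2 : s1; break;
    }

    uint32_t mag = f32_bits(tmp) & ~SIGN32;        /* tmp[30:0] */
    switch ((imm8 >> 2) & 0x3u) {                  /* imm8[3:2] */
    case 0:  return bits_f32((f32_bits(s1) & SIGN32) | mag);
    case 1:  return tmp;
    case 2:  return bits_f32(mag);
    default: return bits_f32(SIGN32 | mag);
    }
}

/* Writemask form when src != NULL, zeromask form when src == NULL. */
static void range_ps_masked(float dst[LANES], const float *src, uint16_t k,
                            const float a[LANES], const float b[LANES],
                            unsigned imm8)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (int j = 0; j < LANES; j++) {
        if ((k >> j) & 1u)
            dst[j] = range_f32(a[j], b[j], imm8);  /* mask bit set: compute */
        else
            dst[j] = src ? src[j] : 0.0f;          /* mask bit clear: merge or zero */
    }
}
```

Only lanes whose mask bit is set are computed; every other lane is either carried over from "src" or zeroed, which is the only difference between the writemask and zeromask entries.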
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
IF k[0]
dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
IF k[0]
dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
IF k[0]
dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
IF k[0]
dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0]
1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0]
2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0]
3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0])
1: dst[63:0] := tmp[63:0]
2: dst[63:0] := (0 << 63) OR (tmp[62:0])
3: dst[63:0] := (1 << 63) OR (tmp[62:0])
ESAC
RETURN dst
}
dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
IF k[0]
dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
IF k[0]
dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
IF k[0]
dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit.
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
IF k[0]
dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max.
imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note]
DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) {
CASE opCtl[1:0] OF
0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0]
1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0]
2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0]
3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0]
ESAC
CASE signSelCtl[1:0] OF
0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0])
1: dst[31:0] := tmp[31:0]
2: dst[31:0] := (0 << 31) OR (tmp[30:0])
3: dst[31:0] := (1 << 31) OR (tmp[30:0])
ESAC
RETURN dst
}
dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
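The four masked scalar records above differ only in what the low lane receives when mask bit 0 is clear. The Python sketch below shows that selection under stated assumptions: vectors are modeled as two-element lists with index 0 as the low lane, `select_low_sd` and `reduce_fn` are hypothetical names, and the reduction itself is passed in rather than modeled.

```python
from typing import Callable, Optional, Sequence

def select_low_sd(k: int,
                  a: Sequence[float],
                  b: Sequence[float],
                  reduce_fn: Callable[[float], float],
                  src: Optional[Sequence[float]] = None) -> list:
    """Return [low, high] lanes for the masked scalar (sd) forms."""
    if k & 1:                 # only mask bit 0 is consulted
        low = reduce_fn(b[0])
    elif src is not None:
        low = src[0]          # writemask: keep the low lane of src
    else:
        low = 0.0             # zeromask: clear the low lane
    return [low, a[1]]        # the upper lane always comes from a
```

Passing `src` selects writemask behaviour, and omitting it selects zeromask behaviour. Higher bits of `k` have no effect in these scalar forms.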
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
tmp[63:0] := src1[63:0] - tmp[63:0]
IF IsInf(tmp[63:0])
tmp[63:0] := FP64(0.0)
FI
RETURN tmp[63:0]
}
dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
tmp[31:0] := src1[31:0] - tmp[31:0]
IF IsInf(tmp[31:0])
tmp[31:0] := FP32(0.0)
FI
RETURN tmp[31:0]
}
dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512DQ
Miscellaneous
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
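The two records above round each lane to an integer before storing it. As a minimal sketch, assuming round-to-nearest-even (the usual default rounding mode), the per-lane helper can be modeled in Python as follows. The function names are hypothetical, and inputs that are NaN, infinite, or outside the signed 64-bit range are rejected because the records do not say what is stored in those cases.

```python
import math

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def convert_fp64_to_int64(x: float) -> int:
    """Round x to the nearest integer, ties to even (sketch only)."""
    if not math.isfinite(x):
        raise ValueError("NaN and infinity are not modeled")
    r = round(x)                       # Python round() resolves ties to even
    if not INT64_MIN <= r <= INT64_MAX:
        raise ValueError("result does not fit in a signed 64-bit lane")
    return r

def cvtpd_epi64_model(a):
    """Apply the conversion to each of the 8 lanes of a (a list of floats)."""
    assert len(a) == 8
    return [convert_fp64_to_int64(x) for x in a]
```

Ties go to the even neighbour, so 0.5 and 2.5 become 0 and 2, while 1.5 becomes 2.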
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
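The masked records above all follow the same per-lane rule: lane j takes the converted value when k[j] is set, and otherwise takes either the corresponding lane of "src" (writemask) or zero (zeromask). A small Python sketch of that rule, with vectors modeled as lists and the hypothetical name `apply_mask`:

```python
def apply_mask(k, computed, src=None):
    """Merge per-lane results under mask k.

    computed: list of already converted lane values.
    src:      lanes to keep where the mask bit is clear (writemask);
              if omitted, cleared lanes become 0 (zeromask).
    """
    out = []
    for j, value in enumerate(computed):
        if (k >> j) & 1:
            out.append(value)      # mask bit set: take the new result
        elif src is not None:
            out.append(src[j])     # writemask: copy from src
        else:
            out.append(0)          # zeromask: zero the lane
    return out
```

For example, with k = 0b10100101 only lanes 0, 2, 5 and 7 receive converted values.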
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
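In the widening records above, the source index l advances by 32 bits per lane while the destination index i advances by 64 bits, so a 256-bit input fills a 512-bit result. The Python sketch below models registers as plain integers to make that indexing concrete. The helper names are hypothetical, round-to-nearest-even is assumed, and only finite, in-range lanes are handled.

```python
import struct

def pack_ps(values):
    """Pack FP32 values into an integer bit vector, lane 0 in the low bits."""
    out = 0
    for j, v in enumerate(values):
        out |= int.from_bytes(struct.pack("<f", v), "little") << (32 * j)
    return out

def bits(value, hi, lo):
    """Return value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def cvtps_epi64_model(a: int) -> int:
    """Widen 8 FP32 lanes of a into 8 signed 64-bit lanes (sketch only)."""
    dst = 0
    for j in range(8):
        i = j * 64                                   # destination bit offset
        l = j * 32                                   # source bit offset
        raw = bits(a, l + 31, l).to_bytes(4, "little")
        lane = struct.unpack("<f", raw)[0]
        result = round(lane)                         # assumes ties-to-even rounding
        dst |= (result & (2**64 - 1)) << i           # store as two's complement
    return dst
```

Negative results appear in the destination as two's-complement bit patterns, so -2 is stored as 2^64 - 2.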
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
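Narrowing a 64-bit integer to single precision can lose information, because FP32 carries only 24 significand bits. That is why these records involve rounding even though the input is an integer. The Python sketch below performs the rounding on the integer directly, assuming round-to-nearest-even. `convert_int64_to_fp32` and `is_exact_fp32` are hypothetical names, and the result is returned as a Python float whose value is exactly representable in FP32.

```python
import struct

def convert_int64_to_fp32(n: int) -> float:
    """Round a signed 64-bit integer to 24 significant bits, ties to even."""
    if not -2**63 <= n < 2**63:
        raise ValueError("n must fit in a signed 64-bit lane")
    sign = -1.0 if n < 0 else 1.0
    mag = abs(n)
    shift = mag.bit_length() - 24          # bits that do not fit in the significand
    if shift > 0:
        q, r = divmod(mag, 1 << shift)
        half = 1 << (shift - 1)
        if r > half or (r == half and q & 1):
            q += 1                         # round up, or break a tie toward even
        mag = q << shift
    return sign * float(mag)               # exact: at most 24 significant bits remain

def is_exact_fp32(v: float) -> bool:
    """Check that v survives a round trip through the FP32 encoding."""
    return struct.unpack("<f", struct.pack("<f", v))[0] == v
```

For instance, 16777217 (2^24 + 1) is a tie between two FP32 neighbours and rounds to 16777216. Eight such results occupy only 256 bits, which matches the `dst[MAX:256] := 0` line in these records.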
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
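The "with truncation" records above drop the fractional part instead of applying a rounding mode, which changes the result whenever the fraction is 0.5 or more in magnitude. A short Python sketch of the difference, using hypothetical function names, assuming round-to-nearest-even for the non-truncating case, and handling finite, in-range inputs only:

```python
import math

def convert_fp64_to_int64_truncate(x: float) -> int:
    """Discard the fraction of x (round toward zero)."""
    if not math.isfinite(x):
        raise ValueError("NaN and infinity are not modeled")
    r = math.trunc(x)
    if not -2**63 <= r < 2**63:
        raise ValueError("result does not fit in a signed 64-bit lane")
    return r

def convert_fp64_to_int64_nearest(x: float) -> int:
    """Round x to the nearest integer, ties to even, for comparison."""
    if not math.isfinite(x):
        raise ValueError("NaN and infinity are not modeled")
    r = round(x)
    if not -2**63 <= r < 2**63:
        raise ValueError("result does not fit in a signed 64-bit lane")
    return r
```

Truncation maps 2.7 to 2 and -2.7 to -2, whereas nearest-even rounding gives 3 and -3.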
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_UInt64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := Convert_UInt64_To_FP64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
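The records above take unsigned inputs, so each 64-bit lane has to be read as a value in the range 0 to 2^64 - 1 before it is converted. The Python sketch below contrasts the unsigned and signed readings of the same bit pattern. The function names are hypothetical, and round-to-nearest-even is assumed (Python's int-to-float conversion rounds that way).

```python
MASK64 = 2**64 - 1

def lane_as_unsigned(lane_bits: int) -> int:
    """Interpret a 64-bit pattern as an unsigned integer."""
    return lane_bits & MASK64

def lane_as_signed(lane_bits: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    value = lane_bits & MASK64
    return value - 2**64 if value >> 63 else value

def convert_uint64_to_fp64(lane_bits: int) -> float:
    """Unsigned reading followed by conversion to double precision."""
    return float(lane_as_unsigned(lane_bits))

def convert_int64_to_fp64(lane_bits: int) -> float:
    """Signed reading followed by conversion to double precision."""
    return float(lane_as_signed(lane_bits))
```

The two readings agree whenever bit 63 is clear. For the all-ones pattern, the unsigned conversion rounds to 2^64 while the signed conversion yields -1.0.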
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_UInt64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_UInt64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_UInt64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := Convert_UInt64_To_FP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Convert
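The writemask and zeromask records above differ only in what an inactive lane receives. A minimal Python sketch of that behaviour follows, assuming a vector is modelled as a list of 8 integers and the mask as a plain int. The helper name `cvt_u64_to_f64` and its parameters are hypothetical, and the sketch assumes round-to-nearest-even, which need not match the rounding selected by the variants that carry a rounding note.

```python
def cvt_u64_to_f64(a, k=0xFF, src=None):
    """Scalar model of the 8-lane unsigned 64-bit -> double conversion.

    a   : list of 8 unsigned 64-bit integers
    k   : 8-bit mask, bit j controls lane j
    src : fallback lanes for writemask behaviour; None selects zeromask
    """
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            # Python's int -> float conversion rounds to nearest, ties to even.
            dst.append(float(a[j] & 0xFFFFFFFFFFFFFFFF))
        elif src is not None:
            dst.append(src[j])   # writemask: keep the lane from src
        else:
            dst.append(0.0)      # zeromask: clear the lane
    return dst
```

Calling it with `src=None` models the zeromask form; passing a list for `src` models the writemask form.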
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := j*64
l := j*32
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512DQ
Convert
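Because a single-precision value has only a 24-bit significand, most 64-bit inputs to the conversions above must be rounded. The sketch below shows one way to model that rounding with integer arithmetic, assuming round-to-nearest with ties to even; the function name `u64_to_fp32` is hypothetical and the rounding mode is an assumption, since the records with a rounding note allow other modes.

```python
def u64_to_fp32(x):
    """Round an unsigned 64-bit integer to the nearest float32 value (ties to even).

    The result is returned as a Python float that holds an exactly
    representable single-precision value.
    """
    x &= 0xFFFFFFFFFFFFFFFF
    if x < (1 << 24):
        return float(x)           # fits in the 24-bit significand exactly
    shift = x.bit_length() - 24   # number of low bits that are rounded away
    q = x >> shift                # top 24 bits
    rem = x & ((1 << shift) - 1)  # discarded bits
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1                    # round up; a carry to 2**24 is still exact
    return float(q << shift)
```

Going through a double first and then narrowing could round twice, which is why the sketch rounds directly from the integer.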
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Arithmetic
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Arithmetic
Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst".
FOR j := 0 to 7
i := j*64
tmp[127:0] := a[i+63:i] * b[i+63:i]
dst[i+63:i] := tmp[63:0]
ENDFOR
dst[MAX:512] := 0
AVX512DQ
Arithmetic
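The three multiply records above keep only the low 64 bits of each 128-bit product. A small Python sketch of that truncation, assuming lanes are held in a list (the name `mullo64_lanes` is hypothetical):

```python
MASK64 = (1 << 64) - 1

def mullo64_lanes(a, b):
    """Per-lane low 64 bits of the full product of two integer vectors."""
    # Masking the inputs maps negative Python ints to their two's-complement
    # bit patterns; the low 64 bits of the product are the same either way.
    return [((x & MASK64) * (y & MASK64)) & MASK64 for x, y in zip(a, b)]
```

For example, `mullo64_lanes([2**63], [2])` yields `[0]` because the only set bit of the product lies above bit 63.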
Add 8-bit masks in "a" and "b", and store the result in "k".
k[7:0] := a[7:0] + b[7:0]
k[MAX:8] := 0
AVX512DQ
Mask
Add 16-bit masks in "a" and "b", and store the result in "k".
k[15:0] := a[15:0] + b[15:0]
k[MAX:16] := 0
AVX512DQ
Mask
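The mask additions above assign the sum to a fixed-width field, so any carry out of the top bit is dropped. A short sketch of that wrap-around, with the hypothetical helper `kadd` taking the width as a parameter:

```python
def kadd(a, b, width=8):
    """Add two mask values and keep only the low `width` bits."""
    m = (1 << width) - 1
    return ((a & m) + (b & m)) & m
```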
Compute the bitwise AND of 8-bit masks "a" and "b", and store the result in "k".
k[7:0] := a[7:0] AND b[7:0]
k[MAX:8] := 0
AVX512DQ
Mask
Compute the bitwise NOT of 8-bit mask "a" and then AND with "b", and store the result in "k".
k[7:0] := (NOT a[7:0]) AND b[7:0]
k[MAX:8] := 0
AVX512DQ
Mask
Compute the bitwise NOT of 8-bit mask "a", and store the result in "k".
k[7:0] := NOT a[7:0]
k[MAX:8] := 0
AVX512DQ
Mask
Compute the bitwise OR of 8-bit masks "a" and "b", and store the result in "k".
k[7:0] := a[7:0] OR b[7:0]
k[MAX:8] := 0
AVX512DQ
Mask
Compute the bitwise XNOR of 8-bit masks "a" and "b", and store the result in "k".
k[7:0] := NOT (a[7:0] XOR b[7:0])
k[MAX:8] := 0
AVX512DQ
Mask
Compute the bitwise XOR of 8-bit masks "a" and "b", and store the result in "k".
k[7:0] := a[7:0] XOR b[7:0]
k[MAX:8] := 0
AVX512DQ
Mask
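The 8-bit mask logic operations above can be modelled with ordinary integer operators, as long as each result is truncated to 8 bits. The function names below are hypothetical and chosen only for this sketch:

```python
M8 = 0xFF  # an 8-bit mask occupies bits 7:0; higher bits are cleared


def kand8(a, b):
    return a & b & M8


def kandn8(a, b):
    # NOT is applied to "a" only, then the result is ANDed with "b".
    return (~a) & b & M8


def knot8(a):
    return (~a) & M8


def kor8(a, b):
    return (a | b) & M8


def kxnor8(a, b):
    return (~(a ^ b)) & M8


def kxor8(a, b):
    return (a ^ b) & M8
```

The final `& M8` matters in Python because `~` produces a negative integer rather than an 8-bit value.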
Shift the bits of 8-bit mask "a" left by "count" while shifting in zeros, and store the least significant 8 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 7
k[7:0] := a[7:0] << count[7:0]
FI
AVX512DQ
Mask
Shift the bits of 8-bit mask "a" right by "count" while shifting in zeros, and store the least significant 8 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 7
k[7:0] := a[7:0] >> count[7:0]
FI
AVX512DQ
Mask
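In both shift records above the result is cleared first and only written when the count is at most 7, so larger counts produce zero. A sketch of that rule, using the hypothetical helpers `kshiftl8` and `kshiftr8`:

```python
def kshiftl8(a, count):
    """Shift an 8-bit mask left; counts above 7 give 0."""
    count &= 0xFF                      # only count[7:0] is examined
    if count > 7:
        return 0
    return ((a & 0xFF) << count) & 0xFF


def kshiftr8(a, count):
    """Shift an 8-bit mask right, filling with zeros; counts above 7 give 0."""
    count &= 0xFF
    if count > 7:
        return 0
    return (a & 0xFF) >> count
```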
Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".
tmp[7:0] := a[7:0] OR b[7:0]
IF tmp[7:0] == 0x0
dst := 1
ELSE
dst := 0
FI
IF tmp[7:0] == 0xFF
MEM[all_ones+7:all_ones] := 1
ELSE
MEM[all_ones+7:all_ones] := 0
FI
AVX512DQ
Mask
Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[7:0] := a[7:0] OR b[7:0]
IF tmp[7:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512DQ
Mask
Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".
tmp[7:0] := a[7:0] OR b[7:0]
IF tmp[7:0] == 0xFF
dst := 1
ELSE
dst := 0
FI
AVX512DQ
Mask
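The three OR-test records above report two independent conditions on the same OR result. The sketch below returns both as a tuple; the name `kortest8` and the tuple return are assumptions made for illustration, since the first record instead writes the second flag through a pointer.

```python
def kortest8(a, b):
    """Return (all_zeros, all_ones) for the OR of two 8-bit masks."""
    tmp = (a | b) & 0xFF
    return int(tmp == 0x00), int(tmp == 0xFF)
```

The two single-result records correspond to reading only the first or only the second element of the tuple.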
Compute the bitwise AND of 8-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".
tmp1[7:0] := a[7:0] AND b[7:0]
IF tmp1[7:0] == 0x0
dst := 1
ELSE
dst := 0
FI
tmp2[7:0] := (NOT a[7:0]) AND b[7:0]
IF tmp2[7:0] == 0x0
MEM[and_not+7:and_not] := 1
ELSE
MEM[and_not+7:and_not] := 0
FI
AVX512DQ
Mask
Compute the bitwise AND of 8-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[7:0] := a[7:0] AND b[7:0]
IF tmp[7:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512DQ
Mask
Compute the bitwise NOT of 8-bit mask "a" and then AND with "b"; if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[7:0] := (NOT a[7:0]) AND b[7:0]
IF tmp[7:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512DQ
Mask
Compute the bitwise AND of 16-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not".
tmp1[15:0] := a[15:0] AND b[15:0]
IF tmp1[15:0] == 0x0
dst := 1
ELSE
dst := 0
FI
tmp2[15:0] := (NOT a[15:0]) AND b[15:0]
IF tmp2[15:0] == 0x0
MEM[and_not+7:and_not] := 1
ELSE
MEM[and_not+7:and_not] := 0
FI
AVX512DQ
Mask
Compute the bitwise AND of 16-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[15:0] := a[15:0] AND b[15:0]
IF tmp[15:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512DQ
Mask
Compute the bitwise NOT of 16-bit mask "a" and then AND with "b"; if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[15:0] := (NOT a[15:0]) AND b[15:0]
IF tmp[15:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512DQ
Mask
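The 8-bit and 16-bit test records above all derive from two checks: whether `a AND b` is zero, and whether `(NOT a) AND b` is zero. A sketch that returns both flags for either width follows; the name `ktest` and the `width` parameter are hypothetical.

```python
def ktest(a, b, width=8):
    """Return (and_is_zero, andn_is_zero) for two masks of the given width."""
    m = (1 << width) - 1
    and_is_zero = int((a & b & m) == 0)
    andn_is_zero = int(((~a) & b & m) == 0)
    return and_is_zero, andn_is_zero
```

The second flag is 1 exactly when every bit set in "b" is also set in "a".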
Convert 8-bit mask "a" into an integer value, and store the result in "dst".
dst := ZeroExtend32(a[7:0])
AVX512DQ
Mask
Convert integer value "a" into an 8-bit mask, and store the result in "k".
k := a[7:0]
AVX512DQ
Mask
Load 8-bit mask from memory into "k".
k[7:0] := MEM[mem_addr+7:mem_addr]
AVX512DQ
Load
Store 8-bit mask from "a" into memory.
MEM[mem_addr+7:mem_addr] := a[7:0]
AVX512DQ
Store
Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ACOS(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ACOS(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ACOS(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ACOS(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ACOSH(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ACOSH(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ACOSH(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ACOSH(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ASIN(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ASIN(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ASIN(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ASIN(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ASINH(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ASINH(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ASINH(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ASINH(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
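The two-operand inverse tangent records above use ATAN2, which looks at the signs of both inputs and therefore distinguishes quadrants that a plain quotient would merge. A lane-wise sketch with an optional writemask, built on Python's `math.atan2`; the helper name `atan2_lanes` and its list-based vectors are assumptions:

```python
import math


def atan2_lanes(a, b, k=None, src=None):
    """ATAN2(a[j], b[j]) per lane; inactive lanes are copied from src."""
    out = []
    for j, (y, x) in enumerate(zip(a, b)):
        if k is None or (k >> j) & 1:
            out.append(math.atan2(y, x))   # result in radians, range [-pi, pi]
        else:
            out.append(src[j])
    return out
```

With `a = [1.0]` and `b = [-1.0]` the result is close to 3*pi/4, whereas `math.atan(1.0 / -1.0)` would give -pi/4.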
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ATAN(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ATAN(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ATAN(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ATAN(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ATANH(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ATANH(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ATANH(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ATANH(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := COS(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := COS(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := COS(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := COS(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := COSD(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := COSD(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := COSD(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := COSD(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
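The COSD records above take their inputs in degrees, unlike the COS records that take radians. One way to model the degree form is to convert each lane to radians first, as in this sketch; `cosd_lanes` is a hypothetical name, and the conversion step is an assumption about how to emulate the operation, not a description of how any library computes it.

```python
import math


def cosd_lanes(a):
    """Cosine of each lane, with the lane interpreted as an angle in degrees."""
    return [math.cos(math.radians(x)) for x in a]
```

Because the degree-to-radian conversion is itself rounded, this sketch does not return an exact zero for 90 degrees.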
Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := COSH(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := COSH(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := COSH(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := COSH(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SIN(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SIN(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SIN(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SIN(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SINH(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SINH(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SINH(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SINH(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SIND(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SIND(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SIND(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SIND(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := TAN(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := TAN(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := TAN(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := TAN(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := TAND(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := TAND(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := TAND(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := TAND(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := TANH(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := TANH(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := TANH(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := TANH(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SIN(a[i+63:i])
MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
cos_res[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SIN(a[i+63:i])
MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
ELSE
dst[i+63:i] := sin_src[i+63:i]
MEM[mem_addr+i+63:mem_addr+i] := cos_src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
cos_res[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SIN(a[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
cos_res[MAX:512] := 0
AVX512F
Trigonometry
Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SIN(a[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
ELSE
dst[i+31:i] := sin_src[i+31:i]
MEM[mem_addr+i+31:mem_addr+i] := cos_src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
cos_res[MAX:512] := 0
AVX512F
Trigonometry
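The combined sine and cosine records above produce two result vectors from one input, each with its own fallback source under a writemask. The sketch below returns both vectors instead of writing the cosine through a pointer; the name `sincos_lanes` and that return convention are assumptions made for illustration.

```python
import math


def sincos_lanes(a, k=None, sin_src=None, cos_src=None):
    """Return (sin_out, cos_out); inactive lanes come from sin_src / cos_src."""
    sin_out, cos_out = [], []
    for j, x in enumerate(a):
        if k is None or (k >> j) & 1:
            sin_out.append(math.sin(x))
            cos_out.append(math.cos(x))
        else:
            sin_out.append(sin_src[j])
            cos_out.append(cos_src[j])
    return sin_out, cos_out
```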
Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := CubeRoot(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := CubeRoot(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := CubeRoot(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := CubeRoot(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
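The cube root records above are defined for negative inputs as well, which a naive power expression does not handle in Python (a negative float raised to `1/3` yields a complex number). A small sketch that keeps the sign, using the hypothetical helper `cube_root`; its accuracy is only approximate and is not meant to match any particular library.

```python
import math


def cube_root(x):
    """Real cube root of x, carrying the sign of x through to the result."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def cube_root_lanes(a):
    """Apply cube_root to every lane of a list-based vector."""
    return [cube_root(x) for x in a]
```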
Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := POW(10.0, a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := POW(10.0, a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := POW(2.0, a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := POW(2.0, a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := POW(e, a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := POW(e, a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := POW(FP32(e), a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := POW(FP32(e), a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
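The pseudocode writes this operation as POW(e, a) - 1.0. Evaluated literally in floating point, that expression loses all precision for inputs near zero, which is why math libraries offer a dedicated routine. A minimal sketch using Python's `math.expm1` as a stand-in; how the vector implementation computes its result is not stated in the source.

```python
import math

def expm1_naive(a):
    # Literal transcription of POW(e, a) - 1.0.
    return [math.exp(x) - 1.0 for x in a]

def expm1_lanes(a):
    # Dedicated routine that avoids the cancellation near zero.
    return [math.expm1(x) for x in a]

tiny = [1e-18]
naive = expm1_naive(tiny)    # exp(1e-18) rounds to 1.0, so the difference is 0.0
direct = expm1_lanes(tiny)   # stays close to 1e-18
```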
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
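The formula SQRT(POW(a, 2.0) + POW(b, 2.0)) describes the mathematical result, but a literal evaluation can overflow in the intermediate squares even when the hypotenuse itself is representable. A small sketch comparing the literal form with Python's `math.hypot`; whether the vector implementation guards against this overflow is not stated in the source, so treat the comparison as illustrative only.

```python
import math

def hypot_naive(a, b):
    # Direct transcription of the pseudocode formula.
    return math.sqrt(a * a + b * b)

big = 1e200
naive = hypot_naive(big, big)   # a * a overflows to inf before the square root
safe = math.hypot(big, big)     # finite, roughly sqrt(2) * 1e200
```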
Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := InvSQRT(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := InvSQRT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := InvSQRT(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := InvSQRT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
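The pseudocode expresses the base-10 logarithm as LOG(a) / LOG(10.0). A scalar sketch of that quotient next to a dedicated routine (Python's `math.log10`, used here as a stand-in). The two can differ in the last bits, so exact comparisons against the quotient form are best avoided.

```python
import math

def log10_quotient(a):
    # Literal transcription: natural log divided by ln(10).
    return [math.log(x) / math.log(10.0) for x in a]

def log10_lanes(a):
    return [math.log10(x) for x in a]

a = [1.0, 10.0, 1000.0, 0.01]
quotient = log10_quotient(a)
direct = log10_lanes(a)
```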
Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := LOG(1.0 + a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LOG(1.0 + a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := LOG(1.0 + a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LOG(1.0 + a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := LOG(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LOG(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := LOG(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LOG(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
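The "floor(log2(x))" reading of the exponent extraction can be checked with a scalar model. This sketch uses `math.frexp` and assumes finite, nonzero, normal inputs. It takes the magnitude of the input, which is an assumption. Handling of zero, infinity, NaN, and denormals is not covered by the source and is not modelled. The name `getexp` is illustrative.

```python
import math

def getexp(x):
    """floor(log2(|x|)) for a finite, nonzero x, returned as a float."""
    # frexp gives abs(x) == mantissa * 2**exponent with 0.5 <= mantissa < 1,
    # so the floor of log2 is one less than that exponent.
    mantissa, exponent = math.frexp(abs(x))
    return float(exponent - 1)

lanes = [1.0, 8.0, 10.0, 0.75, -3.0]
exps = [getexp(x) for x in lanes]   # [0.0, 3.0, 3.0, -1.0, 1.0]
```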
Compute the exponential value of packed double-precision (64-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of packed double-precision (64-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of packed single-precision (32-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the exponential value of packed single-precision (32-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Computes the reciprocal of packed double-precision (64-bit) floating-point elements in "a", storing the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (1.0 / a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Computes the reciprocal of packed double-precision (64-bit) floating-point elements in "a", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Computes the reciprocal of packed single-precision (32-bit) floating-point elements in "a", storing the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (1.0 / a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Computes the reciprocal of packed single-precision (32-bit) floating-point elements in "a", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := CDFNormal(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := CDFNormal(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := CDFNormal(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := CDFNormal(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := InverseCDFNormal(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := InverseCDFNormal(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := InverseCDFNormal(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := InverseCDFNormal(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
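CDFNormal and InverseCDFNormal are inverses of each other, which a scalar round trip can illustrate. The sketch uses Python's `statistics.NormalDist` and assumes the standard normal distribution (mean 0, standard deviation 1). The source says only "the normal distribution", so those parameters are an assumption.

```python
from statistics import NormalDist

std = NormalDist()  # assumed parameters: mu = 0.0, sigma = 1.0

def cdf_lanes(a):
    return [std.cdf(x) for x in a]

def inv_cdf_lanes(p):
    # Each probability must lie strictly between 0 and 1.
    return [std.inv_cdf(x) for x in p]

xs = [-1.5, 0.0, 0.5, 2.0]
ps = cdf_lanes(xs)          # probabilities in (0, 1)
back = inv_cdf_lanes(ps)    # recovers xs up to rounding
```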
Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ERF(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ERF(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := 1.0 - ERF(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := 1.0 - ERF(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
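The expression 1.0 - ERF(a) defines the complementary error function mathematically, but a literal evaluation returns zero once ERF(a) rounds to 1.0. A short sketch with Python's `math.erfc` as a stand-in for a dedicated routine; the accuracy of the vector implementation is not stated in the source.

```python
import math

x = 10.0
naive = 1.0 - math.erf(x)   # erf(10.0) rounds to 1.0, so this is 0.0
direct = math.erfc(x)       # tiny but nonzero, on the order of 1e-45
```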
Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ERF(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ERF(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := 1.0 - ERF(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := 1.0 - ERF(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := 1.0 / ERF(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := 1.0 / ERF(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := 1.0 / ERF(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := 1.0 / ERF(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := 1.0 / (1.0 - ERF(a[i+63:i]))
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := 1.0 / (1.0 - ERF(a[i+63:i]))
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := 1.0 / (1.0 - ERF(a[i+31:i]))
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := 1.0 / (1.0 - ERF(a[i+31:i]))
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Probability/Statistics
Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := CEIL(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := CEIL(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := CEIL(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := CEIL(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := FLOOR(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := FLOOR(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := FLOOR(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := FLOOR(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds each packed double-precision (64-bit) floating-point element in "a" to the nearest integer value and stores the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := NearbyInt(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds each packed double-precision (64-bit) floating-point element in "a" to the nearest integer value and stores the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := NearbyInt(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds each packed single-precision (32-bit) floating-point element in "a" to the nearest integer value and stores the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := NearbyInt(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds each packed single-precision (32-bit) floating-point element in "a" to the nearest integer value and stores the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := NearbyInt(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := RoundToNearestEven(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RoundToNearestEven(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := RoundToNearestEven(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Rounds the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RoundToNearestEven(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ROUND(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ROUND(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := TRUNCATE(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := TRUNCATE(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := TRUNCATE(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := TRUNCATE(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 15
i := 32*j
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
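The zero-divisor check in the pseudocode above can be illustrated with a scalar model. This is a sketch under stated assumptions: div_epi32_model is a hypothetical name, the #DE fault is represented by a false return value, and the destination is left untouched on failure (the pseudocode does not say what "dst" holds after a fault). The INT32_MIN / -1 case is also rejected because it overflows in C; the source lists only the zero-divisor condition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of lane-wise signed division.
 * Returns false in place of raising #DE. */
bool div_epi32_model(int32_t *dst, const int32_t *a, const int32_t *b,
                     size_t lanes)
{
    /* First pass: look for any lane that would fault. */
    for (size_t j = 0; j < lanes; j++) {
        if (b[j] == 0) {
            return false;                      /* stands in for #DE */
        }
        if (a[j] == INT32_MIN && b[j] == -1) {
            return false;                      /* assumption: overflow rejected */
        }
    }
    /* Second pass: C integer division truncates toward zero. */
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = a[j] / b[j];
    }
    return true;
}
```

Because C integer division truncates toward zero, -7 / 2 yields -3 here, which matches the "truncated results" wording of the description.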
Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 63
i := 8*j
IF b[i+7:i] == 0
#DE
FI
dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 31
i := 16*j
IF b[i+15:i] == 0
#DE
FI
dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 64*j
IF b[i+63:i] == 0
#DE
FI
dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
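A scalar sketch of the remainder loop above. It assumes that REMAINDER is the remainder of truncating division, so the result takes the sign of the dividend, as with C's % operator; the source does not define REMAINDER further. The name rem_epi32_model is hypothetical, and the guard against a zero divisor is added only because % is undefined for it in C (the pseudocode above shows no such check).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of lane-wise signed remainder.
 * Returns false if any lane cannot be evaluated in C. */
bool rem_epi32_model(int32_t *dst, const int32_t *a, const int32_t *b,
                     size_t lanes)
{
    for (size_t j = 0; j < lanes; j++) {
        if (b[j] == 0 || (a[j] == INT32_MIN && b[j] == -1)) {
            return false;   /* guard added by this sketch, not by the source */
        }
    }
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = a[j] % b[j];   /* sign follows the dividend a[j] */
    }
    return true;
}
```

With this definition each lane satisfies a == (a / b) * b + r, which ties the remainder entries to the division entries listed earlier.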
Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst".
FOR j := 0 to 63
i := 8*j
dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst".
FOR j := 0 to 31
i := 16*j
dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst".
FOR j := 0 to 7
i := 64*j
dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 15
i := 32*j
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 63
i := 8*j
IF b[i+7:i] == 0
#DE
FI
dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 31
i := 16*j
IF b[i+15:i] == 0
#DE
FI
dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 64*j
IF b[i+63:i] == 0
#DE
FI
dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst".
FOR j := 0 to 63
i := 8*j
dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst".
FOR j := 0 to 31
i := 16*j
dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst".
FOR j := 0 to 7
i := 64*j
dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
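The final pseudocode line, dst[MAX:256] := 0, is easy to overlook. The sketch below models it under the assumption that MAX is bit 511, so the destination is a 512-bit register holding eight doubles; the type zmm_pd and the function mask_add_pd_256 are hypothetical names for illustration.

```c
#include <stdint.h>

enum { YMM_LANES_PD = 4, ZMM_LANES_PD = 8 };

/* Hypothetical 512-bit register viewed as eight doubles (assumes MAX = 511). */
typedef struct {
    double lane[ZMM_LANES_PD];
} zmm_pd;

/* 256-bit masked add: lanes 0..3 are computed or copied from src,
 * lanes 4..7 (bits 256 and above) are cleared. */
void mask_add_pd_256(zmm_pd *dst, const double src[YMM_LANES_PD], uint8_t k,
                     const double a[YMM_LANES_PD],
                     const double b[YMM_LANES_PD])
{
    for (int j = 0; j < YMM_LANES_PD; j++) {
        dst->lane[j] = ((k >> j) & 1) ? a[j] + b[j] : src[j];
    }
    for (int j = YMM_LANES_PD; j < ZMM_LANES_PD; j++) {
        dst->lane[j] = 0.0;                 /* dst[MAX:256] := 0 */
    }
}
```

Whatever the upper lanes held before the call is gone afterwards, regardless of the mask: the writemask only governs the lanes inside the 256-bit operation.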
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
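The two entries above differ only in what an inactive lane receives. A side-by-side scalar sketch, with hypothetical function names mask_add_pd and maskz_add_pd and the mask passed as an 8-bit integer whose bit j controls lane j:

```c
#include <stddef.h>
#include <stdint.h>

/* Writemask form: an inactive lane keeps the value from src. */
void mask_add_pd(double *dst, const double *src, uint8_t k,
                 const double *a, const double *b, size_t lanes)
{
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = ((k >> j) & 1) ? a[j] + b[j] : src[j];
    }
}

/* Zeromask form: an inactive lane is set to zero. */
void maskz_add_pd(double *dst, uint8_t k,
                  const double *a, const double *b, size_t lanes)
{
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = ((k >> j) & 1) ? a[j] + b[j] : 0.0;
    }
}
```

With k = 0x3 and four lanes, both functions compute lanes 0 and 1; lanes 2 and 3 come from src in the first function and are zero in the second.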
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
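The three multiply-add entries above share one computation and differ only in the fallback for inactive lanes: copy from "c", copy from "a", or zero. The sketch below captures that with a single hypothetical function, fmadd_pd_model, whose fallback pointer selects the variant (NULL meaning zero). The expression is copied literally from the pseudocode; a fused operation rounds only once, which fma() from <math.h> would model more closely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of masked multiply-add.
 * fallback == c    : inactive lanes copied from c
 * fallback == a    : inactive lanes copied from a
 * fallback == NULL : inactive lanes zeroed */
void fmadd_pd_model(double *dst, uint8_t k,
                    const double *a, const double *b, const double *c,
                    const double *fallback, size_t lanes)
{
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1) {
            dst[j] = (a[j] * b[j]) + c[j];
        } else {
            dst[j] = fallback ? fallback[j] : 0.0;
        }
    }
}
```

Passing c, a, or NULL as the fallback reproduces the three descriptions in turn, which makes the otherwise near-identical entries easier to tell apart.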
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
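The add/subtract pattern by element index in the entries above is clearer in an unmasked scalar sketch. The function name fmaddsub_pd_model is hypothetical, and the expressions follow the pseudocode literally, so the single rounding of a fused operation is not modelled.

```c
#include <stddef.h>

/* Hypothetical scalar model of the (j & 1) test in the pseudocode:
 * even-indexed lanes subtract c, odd-indexed lanes add c. */
void fmaddsub_pd_model(double *dst, const double *a, const double *b,
                       const double *c, size_t lanes)
{
    for (size_t j = 0; j < lanes; j++) {
        if ((j & 1) == 0) {
            dst[j] = (a[j] * b[j]) - c[j];   /* lanes 0, 2, ... */
        } else {
            dst[j] = (a[j] * b[j]) + c[j];   /* lanes 1, 3, ... */
        }
    }
}
```

For a = {1, 2, 3, 4}, b = {2, 2, 2, 2} and c = {1, 1, 1, 1}, the products are {2, 4, 6, 8} and the results are {1, 5, 5, 9}.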
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
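The three entries above differ only in what an unselected lane receives: the value from "c", the value from "a", or zero. A minimal Python sketch of that difference follows. The helper `fmsubadd` and its `fallback` parameter are illustrative names, not part of any real API, and the arithmetic is ordinary double-precision with separate rounding of the multiply and the add.

```python
def fmsubadd(a, b, c, k, fallback=None):
    """Model of the alternating multiply-add/subtract loop (hypothetical helper).

    fallback is the list copied into unselected lanes; None means zeromask.
    """
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            if (j & 1) == 0:
                dst.append(a[j] * b[j] + c[j])   # even lane: add c
            else:
                dst.append(a[j] * b[j] - c[j])   # odd lane: subtract c
        else:
            dst.append(0.0 if fallback is None else fallback[j])
    return dst


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 10.0, 10.0, 10.0]
c = [1.0, 1.0, 1.0, 1.0]
k = 0b0011

from_c = fmsubadd(a, b, c, k, fallback=c)   # unselected lanes copied from c
from_a = fmsubadd(a, b, c, k, fallback=a)   # unselected lanes copied from a
zeroed = fmsubadd(a, b, c, k)               # unselected lanes zeroed
```

Lanes 0 and 1 are identical in all three results; only lanes 2 and 3 change with the masking flavor.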
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in odd-indexed elements, add in even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
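The phrase "intermediate result" in the entries above matters because a fused operation rounds only once, after the final add. The Python sketch below contrasts a single-rounding model of -(a*b) + c with the two-rounding version obtained from separate multiply and add steps. Both function names are invented for this example. The fused model uses exact rational arithmetic and handles finite inputs only; NaNs, infinities and the sign of a zero result are not modeled.

```python
from fractions import Fraction


def fnmadd_fused(a, b, c):
    """-(a*b) + c computed exactly, then rounded once to a double."""
    exact = -(Fraction(a) * Fraction(b)) + Fraction(c)
    return float(exact)


def fnmadd_unfused(a, b, c):
    """-(a*b) + c with the product rounded before the add (two roundings)."""
    return -(a * b) + c


# a * b is exactly 1 - 2**-60, which rounds to 1.0 when stored as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = 1.0

fused = fnmadd_fused(a, b, c)
unfused = fnmadd_unfused(a, b, c)
```

With these inputs the single-rounding model keeps the small residual 2**-60, while the two-step version loses it and returns 0.0.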
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
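The expression -(a*b) - c in the entries above is the same as -(a*b) + (-c), so the "subtract from the negated product" form can be read as the "add to the negated product" form with the sign of "c" flipped. The Python sketch below checks this on a small writemask example. `fnmsub_mask` and `fnmadd_mask` are hypothetical helper names, and the multiply and add are rounded separately here rather than fused.

```python
def fnmsub_mask(a, k, b, c):
    """Writemask model: unselected lanes keep the value from a."""
    return [(-(a[j] * b[j]) - c[j]) if (k >> j) & 1 else a[j]
            for j in range(len(a))]


def fnmadd_mask(a, k, b, c):
    """Same masking, but adds c to the negated product."""
    return [(-(a[j] * b[j]) + c[j]) if (k >> j) & 1 else a[j]
            for j in range(len(a))]


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [1.0, -1.0, 0.5, 8.0]
k = 0b1011   # lane 2 is not selected

sub_result = fnmsub_mask(a, k, b, c)
add_result = fnmadd_mask(a, k, b, [-x for x in c])
```

Lane 2 is passed through from "a" in both results because its mask bit is clear.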
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
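The MAX in the pseudocode above is not a symmetric mathematical maximum when a NaN is involved. The Python sketch below assumes the rule documented for the underlying packed-maximum instruction, namely that the second operand is returned unless the first compares strictly greater, which means a NaN in either input yields the value from "b". The helper names are invented for this example and the rule itself should be treated as an assumption to verify against the instruction reference.

```python
import math


def max_lane(a, b):
    # Assumed rule: return a only if it compares strictly greater than b.
    # Any comparison involving NaN is false, so b is returned in that case.
    return a if a > b else b


def mask_max(src, k, a, b):
    """Writemask model: unselected lanes are copied from src."""
    return [max_lane(a[j], b[j]) if (k >> j) & 1 else src[j]
            for j in range(len(a))]


nan = float("nan")
out = mask_max([9.0, 9.0, 9.0, 9.0], 0b0111,
               [1.0, nan, 5.0, 7.0],
               [2.0, 3.0, nan, 8.0])
```

Lane 1 has a NaN in "a" and yields 3.0 from "b"; lane 2 has a NaN in "b" and yields that NaN; lane 3 is masked off and keeps 9.0 from "src".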
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
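Signed zeros are the other case where MIN in the pseudocode above is order-dependent. The Python sketch below assumes the rule documented for the underlying packed-minimum instruction: the second operand is returned unless the first compares strictly less, so when both inputs are zeros of either sign the result is the zero from "b". The helper names are made up for illustration and the rule is an assumption to check against the instruction reference.

```python
import math


def min_lane(a, b):
    # Assumed rule: return a only if it compares strictly less than b.
    # +0.0 and -0.0 compare equal, so b is returned for a pair of zeros.
    return a if a < b else b


def maskz_min(k, a, b):
    """Zeromask model: unselected lanes are set to 0."""
    return [min_lane(a[j], b[j]) if (k >> j) & 1 else 0.0
            for j in range(len(a))]


out = maskz_min(0b0111,
                [0.0, -0.0, 4.0, 1.0],
                [-0.0, 0.0, 2.0, 0.5])

# copysign exposes the sign bit of each zero result
signs = [math.copysign(1.0, x) for x in out]
```

Swapping the two zero operands changes the sign of the result, which is why argument order matters for these entries.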
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
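For the single-precision multiply entries above, each lane is a 32-bit value, so the product is rounded to single precision rather than to a Python double. A small Python sketch of that rounding follows. It uses the standard `struct` module to round to 32 bits; `to_f32` and `maskz_mul_ps` are hypothetical helper names, and results that overflow the single-precision range are not handled.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def maskz_mul_ps(k, a, b):
    """Zeromask model of the packed single-precision multiply."""
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            # The double product of two 24-bit significands is exact,
            # so one final rounding to 32 bits models the lane result.
            dst.append(to_f32(to_f32(a[j]) * to_f32(b[j])))
        else:
            dst.append(0.0)
    return dst


x = 1.0 + 2.0 ** -23           # exactly representable in single precision
r = maskz_mul_ps(0b11111101, [x] * 8, [x] * 8)
```

The exact product is 1 + 2**-22 + 2**-46; the last term is below single-precision resolution and is rounded away in the selected lanes.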
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ABS(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ABS(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
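The wording "store the unsigned results" in the entries above matters for one input: the most negative 32-bit value has no positive signed counterpart, so its absolute value only fits when the lane is read as unsigned. The Python sketch below models lanes as 32-bit patterns to show this. The helper names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mask_abs_epi32(src, k, a):
    """Writemask model: unselected lanes are copied from src."""
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            dst.append(abs(to_signed32(a[j])) & MASK32)
        else:
            dst.append(src[j] & MASK32)
    return dst


out = mask_abs_epi32([7, 7, 7, 7], 0b0111,
                     [0xFFFFFFFF, 5, 0x80000000, 0xFFFFFFFE])
```

Lane 2 holds -2147483648 on input and the bit pattern 0x80000000 on output, which reads as 2147483648 when treated as unsigned.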
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ABS(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ABS(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ABS(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ABS(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ABS(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ABS(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ABS(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ABS(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
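The 32-bit additions above keep only the low 32 bits of each sum, so an overflowing lane wraps around and never carries into its neighbor. A brief Python sketch of the eight-lane writemask form, using a made-up helper name:

```python
def mask_add_epi32(src, k, a, b):
    """Writemask model of packed 32-bit addition with wraparound."""
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            dst.append((a[j] + b[j]) & 0xFFFFFFFF)   # keep the low 32 bits
        else:
            dst.append(src[j])                       # copy from src
    return dst


src = [100, 101, 102, 103, 104, 105, 106, 107]
a = [0xFFFFFFFF, 1, 2, 3, 4, 5, 6, 0x7FFFFFFF]
b = [1, 1, 1, 1, 1, 1, 1, 1]

out = mask_add_epi32(src, 0b10000011, a, b)
```

Lane 0 wraps to 0, and lane 7 produces 0x80000000, which is negative if the lane is later read as signed.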
Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
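The bit-range notation in the pseudocode above (dst[i+63:i], dst[MAX:128]) can also be modeled directly on a single large integer standing in for the register. The following Python sketch does this for the two-lane 64-bit zeromask add. The function name is hypothetical, and the register is represented as an unbounded Python integer.

```python
LANE_MASK = 0xFFFFFFFFFFFFFFFF


def maskz_add_epi64_128(k, a, b):
    """a and b hold 128-bit register values as integers; returns dst as an integer."""
    dst = 0
    for j in range(2):
        i = j * 64
        if (k >> j) & 1:
            lane_a = (a >> i) & LANE_MASK        # a[i+63:i]
            lane_b = (b >> i) & LANE_MASK        # b[i+63:i]
            dst |= ((lane_a + lane_b) & LANE_MASK) << i
        # a clear mask bit leaves the lane at 0 (zeromask)
    # dst[MAX:128] := 0 holds by construction: no bit above 127 is ever set
    return dst


a = (5 << 64) | 0xFFFFFFFFFFFFFFFF
b = (7 << 64) | 1

both = maskz_add_epi64_128(0b11, a, b)
low_only = maskz_add_epi64_128(0b01, a, b)
```

The low lane overflows and wraps to 0 without carrying into the high lane, so `both` contains only the high-lane sum 12 shifted into bits 127:64.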
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
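For the signed 32-bit maximum above, each lane must be interpreted as a two's-complement value before comparing. The Python sketch below models the four-lane zeromask form on 32-bit patterns, with helper names chosen for this example only.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def maskz_max_epi32(k, a, b):
    """Zeromask model of the packed signed 32-bit maximum."""
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            m = max(to_signed32(a[j]), to_signed32(b[j]))
            dst.append(m & 0xFFFFFFFF)   # store back as a 32-bit pattern
        else:
            dst.append(0)
    return dst


out = maskz_max_epi32(0b1011,
                      [0xFFFFFFFF, 3, 9, 0x80000000],
                      [1, 0xFFFFFFFD, 4, 0x7FFFFFFF])
```

In lane 0 the pattern 0xFFFFFFFF is -1, so the signed maximum is 1 even though 0xFFFFFFFF is the larger bit pattern.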
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
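The signed and unsigned maximum entries share the same pseudocode; only the interpretation of each lane's bit pattern inside MAX differs. The sketch below illustrates this on identical inputs, assuming lanes are stored as unsigned 32-bit Python integers. The helper names are hypothetical.

```python
def to_signed(x, bits=32):
    """Reinterpret an unsigned bit pattern as a two's-complement value."""
    return x - (1 << bits) if x >> (bits - 1) else x


def max_signed(a, b, bits=32):
    """Lane-wise maximum with lanes compared as signed integers."""
    return [
        x if to_signed(x, bits) >= to_signed(y, bits) else y
        for x, y in zip(a, b)
    ]


def max_unsigned(a, b):
    """Lane-wise maximum with lanes compared as unsigned integers."""
    return [max(x, y) for x, y in zip(a, b)]


a = [0xFFFFFFFF, 5]  # lane 0 is -1 when read as signed
b = [1, 3]
print(max_signed(a, b))    # [1, 5]
print(max_unsigned(a, b))  # [4294967295, 5]
```

The same distinction applies to the MIN entries and to the 64-bit element widths (pass `bits=64` in this model).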
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
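In the signed multiply entries above, only the low 32 bits of each 64-bit lane take part, and they are sign-extended before the multiplication. A rough scalar model, assuming 64-bit lanes are stored as unsigned Python integers (the names `sign_extend` and `mul_epi32` are hypothetical helpers for this sketch):

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Read the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_epi32(a, b):
    """Per 64-bit lane: SignExtend64(low 32 bits of a) * SignExtend64(low 32 bits of b)."""
    return [
        (sign_extend(x, 32) * sign_extend(y, 32)) & MASK64
        for x, y in zip(a, b)
    ]


# The upper half 0xDEADBEEF is ignored; the low half 0xFFFFFFFF is -1.
result = mul_epi32([0xDEADBEEFFFFFFFFF, 3], [2, 4])
print([hex(v) for v in result])  # ['0xfffffffffffffffe', '0xc']
```

The first lane is -2 stored as a 64-bit two's-complement pattern.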
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
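The low-half multiply entries above do not state a signedness because the low 32 bits of a product are the same whether the operands are read as signed or unsigned. A short sketch of the `tmp[63:0]` / `tmp[31:0]` steps, assuming lanes are unsigned 32-bit Python integers and using the hypothetical helper name `mullo_epi32`:

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mullo_epi32(a, b):
    """Per 32-bit lane: form the 64-bit product, keep only its low 32 bits."""
    out = []
    for x, y in zip(a, b):
        tmp = (x * y) & MASK64      # tmp[63:0] := a * b
        out.append(tmp & MASK32)    # dst := tmp[31:0]
    return out


# 0x10000 * 0x10000 = 2**32, whose low 32 bits are zero.
print(mullo_epi32([0x10000, 3], [0x10000, 5]))       # [0, 15]
# 0xFFFFFFFF is -1 as signed; the result 0xFFFFFFFE is -2 as signed.
print([hex(v) for v in mullo_epi32([0xFFFFFFFF], [2])])  # ['0xfffffffe']
```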
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
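For the unsigned multiply entries above, the low 32 bits of each 64-bit lane are zero-extended, so the full product always fits in 64 bits. A minimal sketch under the assumption that lanes are unsigned 64-bit Python integers; `mul_epu32` and `maskz_mul_epu32` are hypothetical names for this model.

```python
MASK32 = (1 << 32) - 1


def mul_epu32(a, b):
    """Per 64-bit lane: multiply the low 32 bits of a and b as unsigned values."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]


def maskz_mul_epu32(k, a, b):
    """Zeromask form: lanes whose mask bit is clear are set to 0."""
    return [p if (k >> j) & 1 else 0 for j, p in enumerate(mul_epu32(a, b))]


# Upper halves are ignored; 0xFFFFFFFF is read as 4294967295, not -1.
print([hex(v) for v in mul_epu32([0xDEADBEEFFFFFFFFF], [2])])  # ['0x1fffffffe']
print(maskz_mul_epu32(0b10, [3, 4], [5, 6]))                    # [0, 24]
```

The largest possible product, 0xFFFFFFFF * 0xFFFFFFFF, equals 0xFFFFFFFE00000001, which is why no truncation step appears in the pseudocode.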
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
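The integer subtraction entries above operate on fixed-width lanes, so a result that would be negative wraps around modulo 2^32 (or 2^64). The following sketch models the 32-bit writemask form, assuming lanes are unsigned Python integers; `mask_sub_epi32` is a hypothetical helper name.

```python
MASK32 = (1 << 32) - 1


def mask_sub_epi32(src, k, a, b):
    """Lane j is (a[j] - b[j]) mod 2**32 if bit j of k is set, else src[j]."""
    return [
        (x - y) & MASK32 if (k >> j) & 1 else s
        for j, (s, x, y) in enumerate(zip(src, a, b))
    ]


out = mask_sub_epi32([9, 9, 9, 9], 0b0101, [0, 5, 7, 1], [1, 2, 3, 4])
print(out)  # [4294967295, 9, 4, 9]
```

Lane 0 computes 0 - 1 and wraps to 0xFFFFFFFF, which reads as -1 under a signed interpretation; lanes 1 and 3 are inactive and keep the `src` value.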
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := (1.0 / a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (1.0 / a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
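The reciprocal entries above promise a relative error below 2^-14 rather than an exact quotient. The sketch below shows how such a bound can be checked, and how one Newton-Raphson step (a standard refinement technique, not something the entries themselves describe) roughly squares the error. The function `approx_recip` is a stand-in that rounds the significand to 16 fractional bits; it is an assumption for illustration and does not reproduce the hardware's actual approximation.

```python
import math


def approx_recip(x, frac_bits=16):
    """Stand-in approximation: 1/x with its significand rounded to frac_bits bits."""
    m, e = math.frexp(1.0 / x)          # 1/x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << frac_bits
    return math.ldexp(round(m * scale) / scale, e)


def refine_recip(x, y0):
    """One Newton-Raphson step for f(y) = 1/y - x."""
    return y0 * (2.0 - x * y0)


def rel_err(approx, exact):
    return abs(approx - exact) / abs(exact)


x = 3.0
y0 = approx_recip(x)
y1 = refine_recip(x, y0)
print(rel_err(y0, 1.0 / x) < 2**-14)  # True
print(rel_err(y1, 1.0 / x) < 2**-28)  # True
```

If the starting estimate has relative error e, the refined value has relative error about e^2, so an estimate within 2^-14 yields roughly 2^-28 after one step.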
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := (1.0 / a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (1.0 / a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
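The reciprocal square root entries above combine an approximate result with the usual mask handling. A lane-wise sketch of the writemask form follows, assuming vectors are Python lists of floats. The function `approx_rsqrt` is a hypothetical stand-in that rounds the significand to 16 fractional bits so that its error stays under the documented 2^-14 bound; it is not the hardware's algorithm.

```python
import math


def approx_rsqrt(x, frac_bits=16):
    """Stand-in approximation of 1/sqrt(x) with a rounded significand."""
    m, e = math.frexp(1.0 / math.sqrt(x))
    scale = 1 << frac_bits
    return math.ldexp(round(m * scale) / scale, e)


def mask_rsqrt14(src, k, a):
    """Lane j is approx_rsqrt(a[j]) if bit j of k is set, else src[j]."""
    return [
        approx_rsqrt(x) if (k >> j) & 1 else s
        for j, (s, x) in enumerate(zip(src, a))
    ]


# Lane 2 holds 0.0 but is masked off, so this model never evaluates it.
out = mask_rsqrt14([-1.0] * 4, 0b1011, [4.0, 2.0, 0.0, 9.0])
print(out[0], out[2])  # 0.5 -1.0
```

Active lanes can be compared against `1.0 / math.sqrt(x)` to confirm that the relative error of the stand-in is below 2^-14.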
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Arithmetic
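The writemask and zeromask forms of the masked subtraction above differ only in what an inactive lane receives. The following is a minimal Python sketch of that lane selection, not the intrinsics themselves. It assumes lanes are held in plain lists with lane 0 first and that the mask "k" is an integer; the names mask_sub and maskz_sub are illustrative. Python floats are double precision, so single-precision rounding is not modelled.

```python
def mask_sub(src, k, a, b):
    """Writemask form: lane j is a[j] - b[j] when bit j of k is set, else src[j]."""
    return [a[j] - b[j] if (k >> j) & 1 else src[j] for j in range(len(a))]


def maskz_sub(k, a, b):
    """Zeromask form: lane j is a[j] - b[j] when bit j of k is set, else 0.0."""
    return [a[j] - b[j] if (k >> j) & 1 else 0.0 for j in range(len(a))]
```

With k = 0b0101 only lanes 0 and 2 are computed; the other lanes come from "src" in the writemask form and are zero in the zeromask form.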
Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst".
temp[511:256] := a[255:0]
temp[255:0] := b[255:0]
temp[511:0] := temp[511:0] >> (32*imm8[2:0])
dst[255:0] := temp[255:0]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
temp[511:256] := a[255:0]
temp[255:0] := b[255:0]
temp[511:0] := temp[511:0] >> (32*imm8[2:0])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := temp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
temp[511:256] := a[255:0]
temp[255:0] := b[255:0]
temp[511:0] := temp[511:0] >> (32*imm8[2:0])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := temp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst".
temp[255:128] := a[127:0]
temp[127:0] := b[127:0]
temp[255:0] := temp[255:0] >> (32*imm8[1:0])
dst[127:0] := temp[127:0]
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
temp[255:128] := a[127:0]
temp[127:0] := b[127:0]
temp[255:0] := temp[255:0] >> (32*imm8[1:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := temp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
temp[255:128] := a[127:0]
temp[127:0] := b[127:0]
temp[255:0] := temp[255:0] >> (32*imm8[1:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := temp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
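The concatenate-and-shift entries above can be read as a sliding window over the elements of "b" followed by the elements of "a". The following is a minimal Python sketch under stated assumptions: operands are lists with lane 0 first, the lane count is a power of two, and the name alignr_elements and its optional k/src parameters are illustrative rather than part of any real API. Passing k without src models the zeromask form.

```python
def alignr_elements(a, b, imm8, k=None, src=None):
    """Scalar model of the element-wise concatenate-and-shift-right operation."""
    n = len(a)                        # lanes per operand (power of two assumed)
    shift = imm8 & (n - 1)            # only the low log2(n) bits of imm8 are used
    temp = list(b) + list(a)          # b fills the low half, a the high half
    shifted = temp[shift:shift + n]   # shift right by whole elements, keep the low n
    if k is None:
        return shifted
    # Masked forms: inactive lanes come from src (writemask) or are zero (zeromask).
    return [shifted[j] if (k >> j) & 1 else (src[j] if src is not None else 0)
            for j in range(n)]
```

Because the shift count is masked to the lane count, a 4-lane call with imm8 = 5 behaves like imm8 = 1.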
Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst".
temp[511:256] := a[255:0]
temp[255:0] := b[255:0]
temp[511:0] := temp[511:0] >> (64*imm8[1:0])
dst[255:0] := temp[255:0]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
temp[511:256] := a[255:0]
temp[255:0] := b[255:0]
temp[511:0] := temp[511:0] >> (64*imm8[1:0])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := temp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
temp[511:256] := a[255:0]
temp[255:0] := b[255:0]
temp[511:0] := temp[511:0] >> (64*imm8[1:0])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := temp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst".
temp[255:128] := a[127:0]
temp[127:0] := b[127:0]
temp[255:0] := temp[255:0] >> (64*imm8[0])
dst[127:0] := temp[127:0]
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
temp[255:128] := a[127:0]
temp[127:0] := b[127:0]
temp[255:0] := temp[255:0] >> (64*imm8[0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := temp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
temp[255:128] := a[127:0]
temp[127:0] := b[127:0]
temp[255:0] := temp[255:0] >> (64*imm8[0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := temp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
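The blend entries above pick each lane from "b" when the control bit is set and from "a" otherwise. The following is a minimal Python sketch of that selection, assuming list operands with lane 0 first and an integer mask; the name mask_blend is illustrative.

```python
def mask_blend(k, a, b):
    """Lane j comes from b when bit j of k is set, otherwise from a."""
    return [b[j] if (k >> j) & 1 else a[j] for j in range(len(a))]
```

Note the operand order: a set bit selects "b", not "a".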
Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*32
n := (j % 4)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*32
n := (j % 4)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
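The 4-element broadcast entries above repeat a 128-bit group across the destination, which the pseudocode expresses with the index n := (j % 4)*32. The following is a minimal Python sketch, assuming list operands with lane 0 first; the name broadcast_x4 and its lanes/k/src parameters are illustrative. Passing k without src models the zeromask form.

```python
def broadcast_x4(a, lanes=8, k=None, src=None):
    """Repeat the 4 elements of a across `lanes` destination lanes."""
    out = []
    for j in range(lanes):
        if k is None or (k >> j) & 1:
            out.append(a[j % 4])      # element j of dst mirrors element j mod 4 of a
        elif src is not None:
            out.append(src[j])        # writemask: keep the src lane
        else:
            out.append(0)             # zeromask: clear the lane
    return out
```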
Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 64
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[255:m] := src[255:m]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 64
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[255:m] := 0
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 64
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[127:m] := src[127:m]
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 64
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[127:m] := 0
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 32
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[255:m] := src[255:m]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 32
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[255:m] := 0
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 32
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[127:m] := src[127:m]
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 32
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[127:m] := 0
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
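The compress entries above pack the active lanes of "a" into the low end of "dst" and fill the rest from "src" or with zeros. The following is a minimal Python sketch, assuming list operands with lane 0 first and an integer mask; the name compress is illustrative, and omitting src models the zeromask form.

```python
def compress(a, k, src=None):
    """Pack lanes of a whose mask bit is set into the low lanes of the result."""
    n = len(a)
    packed = [a[j] for j in range(n) if (k >> j) & 1]
    m = len(packed)                   # number of active lanes
    # Remaining upper lanes: same positions of src (writemask) or zeros (zeromask).
    tail = list(src[m:]) if src is not None else [0.0] * (n - m)
    return packed + tail
```

The pass-through lanes come from the same positions in "src", matching dst[255:m] := src[255:m] in the pseudocode.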
Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
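The expand entries above are the inverse of compress: consecutive low elements of "a" are scattered to the lanes whose mask bit is set. The following is a minimal Python sketch, assuming list operands with lane 0 first and an integer mask; the name expand is illustrative, and omitting src models the zeromask form.

```python
def expand(a, k, src=None):
    """Scatter the low elements of a, in order, to lanes whose mask bit is set."""
    out = []
    m = 0                             # index of the next unread element of a
    for j in range(len(a)):
        if (k >> j) & 1:
            out.append(a[m])
            m += 1
        else:
            out.append(src[j] if src is not None else 0.0)
    return out
```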
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
ESAC
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
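The 128-bit extract entries above use only bit 0 of "imm8" to choose the low or high half of a 256-bit source. The following is a minimal Python sketch, assuming an 8-element list with lane 0 first; the name extract_x4 and its k/src parameters are illustrative. Passing k without src models the zeromask form.

```python
def extract_x4(a, imm8, k=None, src=None):
    """Select the low (imm8 bit 0 clear) or high (bit 0 set) 4 elements of a."""
    half = imm8 & 1                   # only bit 0 of imm8 is examined
    tmp = list(a[4 * half:4 * half + 4])
    if k is None:
        return tmp
    # Masked forms: inactive lanes come from src (writemask) or are zero (zeromask).
    return [tmp[j] if (k >> j) & 1 else (src[j] if src is not None else 0)
            for j in range(4)]
```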
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
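The double-precision fix-up entries above classify each element of "b" into one of eight tokens, then use that token to pick a 4-bit response from the matching element of "c". The following is a rough Python sketch of that table lookup under several assumptions not stated in the text: ZERO_VALUE_TOKEN covers both signed zeros, ONE_VALUE_TOKEN means exactly +1.0, QNAN_Indefinite is the bit pattern 0xFFF8000000000000, and MAX_FLOAT is the largest finite double. All function names are illustrative, and the exception-flag reporting controlled by "imm8" is not modelled.

```python
import math
import struct
import sys

QNAN, SNAN, ZERO, ONE, NEG_INF, POS_INF, NEG_VALUE, POS_VALUE = range(8)


def bits64(x):
    """Raw IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits64(bits):
    """Python float with the given binary64 bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def classify(x):
    """Map a value to its token (assumed class definitions, see lead-in)."""
    if math.isnan(x):
        return QNAN if (bits64(x) >> 51) & 1 else SNAN   # bit 51 is the quiet bit
    if x == 0.0:
        return ZERO
    if x == 1.0:
        return ONE
    if math.isinf(x):
        return NEG_INF if x < 0 else POS_INF
    return NEG_VALUE if x < 0 else POS_VALUE


# Responses that yield a fixed constant (numbering follows the pseudocode).
CONSTANTS = {
    4: -math.inf, 5: math.inf, 7: -0.0, 8: 0.0, 9: -1.0, 10: 1.0,
    11: 0.5, 12: 90.0, 13: math.pi / 2,
    14: sys.float_info.max, 15: -sys.float_info.max,
}


def fixupimm_lane(src1, src2, table, daz=False):
    """One element: classify src2, look up its response in table, build the result."""
    exponent = (bits64(src2) >> 52) & 0x7FF
    tsrc = 0.0 if (daz and exponent == 0) else src2      # DAZ flushes denormals
    response = (table >> (4 * classify(tsrc))) & 0xF
    if response == 0:
        return src1
    if response == 1:
        return tsrc
    if response == 2:
        # QNaN(tsrc): force the quiet bit; only meaningful when tsrc is a NaN.
        return from_bits64(bits64(tsrc) | (1 << 51))
    if response == 3:
        return from_bits64(0xFFF8000000000000)           # assumed QNaN indefinite
    if response == 6:
        return math.copysign(math.inf, tsrc)
    return CONSTANTS[response]


def fixupimm_pd(a, b, c, k=None, zero=False):
    """Packed form; inactive lanes keep a[j] (writemask) or become 0.0 (zeromask)."""
    return [fixupimm_lane(a[j], b[j], c[j]) if k is None or (k >> j) & 1
            else (0.0 if zero else a[j])
            for j in range(len(a))]
```

As a usage illustration, a table of (10 << 8) | 8 maps zero inputs to +1.0 and quiet NaNs to +0.0, and leaves every other class at response 0, which returns the element of "a".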
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" selects which input classes report the #ZE or #IE exception flags.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign ? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" controls which exception flags (#ZE, #IE) are reported for each input class.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign ? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" controls which exception flags (#ZE, #IE) are reported for each input class.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign ? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" controls which exception flags (#ZE, #IE) are reported for each input class.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign ? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" controls which exception flags (#ZE, #IE) are reported for each input class.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN: j := 0
SNAN_TOKEN: j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign ? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
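The exponent extraction and its two masked forms can be sketched in Python as follows. This is a minimal model, assuming positive, finite, nonzero inputs only (zeros, denormals, infinities and NaNs are not handled); the function names `get_exp` and `getexp_masked` are hypothetical, and lanes are represented as plain Python lists.

```python
import math


def get_exp(x):
    """floor(log2(x)) as a float, for positive finite nonzero x only."""
    _, e = math.frexp(x)  # x == m * 2**e with 0.5 <= m < 1
    return float(e - 1)


def getexp_masked(a, k, src=None):
    """Apply get_exp per lane; inactive lanes take src[j] (writemask) or 0.0 (zeromask)."""
    out = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            out.append(get_exp(x))
        else:
            out.append(src[j] if src is not None else 0.0)
    return out
```

Passing `src` models the writemask form and omitting it models the zeromask form; bit j of `k` selects lane j, matching `k[j]` in the pseudocode.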
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 3
i := j*64
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
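A rough Python sketch of mantissa normalization for a single element follows. The encodings of "interv" and "sc" are not given in this entry, so the sketch assumes one particular choice: the target interval is [1, 2) and the result keeps the sign of the source. Both are assumptions, the name `get_mant_1_2` is hypothetical, and only finite nonzero inputs are modeled.

```python
import math


def get_mant_1_2(x):
    """Return sign(x) * m with 1 <= m < 2, for finite nonzero x only.

    Assumes the [1, 2) interval and source-sign behavior; other settings
    of "interv" and "sc" are not modeled.
    """
    m, _ = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0        # rescale the magnitude into [1, 2)
```

Under these assumptions, 10.0 normalizes to 1.25 (since 10 = 1.25 * 2^3) and -0.375 normalizes to -1.5.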
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 1
i := j*64
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 7
i := j*32
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 3
i := j*32
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE (imm8[0]) OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
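The three insert forms above differ only in how the intermediate "tmp" is merged into "dst". A minimal Python sketch, modeling the 256-bit vector as a list of eight 32-bit lanes and "b" as four lanes (function names are hypothetical):

```python
def insert128(a, b, imm8):
    """Copy a, then overwrite its low or high 128-bit half (4 lanes) with b."""
    tmp = list(a)
    if imm8 & 1:
        tmp[4:8] = b  # imm8[0] == 1: bits 255:128
    else:
        tmp[0:4] = b  # imm8[0] == 0: bits 127:0
    return tmp


def mask_insert128(src, k, a, b, imm8):
    """Writemask form: inactive lanes are copied from src."""
    tmp = insert128(a, b, imm8)
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(8)]


def maskz_insert128(k, a, b, imm8):
    """Zeromask form: inactive lanes are zeroed."""
    tmp = insert128(a, b, imm8)
    return [tmp[j] if (k >> j) & 1 else 0 for j in range(8)]
```

Note that the mask is applied per 32-bit lane after the insert, so a mask can suppress part of the inserted 128-bit block.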
Copy "a" to "dst", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8".
dst[255:0] := a[255:0]
CASE (imm8[0]) OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
ESAC
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[255:0] := a[255:0]
CASE (imm8[0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
ESAC
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
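The blend is a per-lane select between the two sources. A minimal Python sketch, with vectors modeled as lists of lanes and a hypothetical function name:

```python
def mask_blend(k, a, b):
    """Lane j comes from b when bit j of k is set, otherwise from a."""
    return [b[j] if (k >> j) & 1 else a[j] for j in range(len(a))]
```

The same function covers the 128-bit and 256-bit entries and both element widths, since only the lane count changes.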
Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
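The masked broadcasts above can be sketched in Python as follows, with vectors modeled as lists of lanes. The function names and the explicit `lanes` parameter are illustrative choices, not part of the entries.

```python
def mask_broadcast(src, k, a):
    """Writemask form: active lanes receive a[0], inactive lanes keep src[j]."""
    return [a[0] if (k >> j) & 1 else src[j] for j in range(len(src))]


def maskz_broadcast(k, a, lanes):
    """Zeromask form: active lanes receive a[0], inactive lanes become 0."""
    return [a[0] if (k >> j) & 1 else 0 for j in range(lanes)]
```

Only the lowest element of "a" is ever read; the remaining elements of "a" have no effect on the result.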
Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 32
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[255:m] := src[255:m]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 32
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[255:m] := 0
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
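Compression packs the active lanes to the low end of the destination and fills the rest from "src" or with zeros. A minimal Python sketch, with vectors modeled as lists of lanes and hypothetical function names:

```python
def mask_compress(src, k, a):
    """Pack active lanes of a to the front; remaining positions come from src."""
    active = [a[j] for j in range(len(a)) if (k >> j) & 1]
    return active + list(src[len(active):])


def maskz_compress(k, a):
    """Pack active lanes of a to the front; remaining positions are zero."""
    active = [a[j] for j in range(len(a)) if (k >> j) & 1]
    return active + [0] * (len(a) - len(active))
```

The pass-through in the writemask form is positional: once m lanes have been packed, the destination takes "src" lanes m and above, which corresponds to `dst[255:m] := src[255:m]` in the pseudocode.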
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 32
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[127:m] := src[127:m]
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 32
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[127:m] := 0
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 64
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[255:m] := src[255:m]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 64
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[255:m] := 0
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 64
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[127:m] := src[127:m]
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 64
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[127:m] := 0
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
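The single-source shuffle uses only the low three bits of each index element, as `idx[i+2:i]` in the pseudocode indicates. A minimal Python sketch for eight 32-bit lanes, with hypothetical function names:

```python
def permutexvar(idx, a):
    """Lane j of the result is a[idx[j] mod 8]; higher index bits are ignored."""
    return [a[idx[j] & 7] for j in range(8)]


def mask_permutexvar(src, k, idx, a):
    """Writemask form: inactive lanes are copied from src."""
    shuffled = permutexvar(idx, a)
    return [shuffled[j] if (k >> j) & 1 else src[j] for j in range(8)]


def maskz_permutexvar(k, idx, a):
    """Zeromask form: inactive lanes are zeroed."""
    shuffled = permutexvar(idx, a)
    return [shuffled[j] if (k >> j) & 1 else 0 for j in range(8)]
```

Because any lane can read from any source lane, an index vector of 7 down to 0 reverses the vector, and an all-zero index vector broadcasts lane 0.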
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := idx[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := (idx[i+3]) ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
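The two-source shuffle above splits each index into a 3-bit element offset and a 1-bit table selector. A minimal scalar sketch of the unmasked 256-bit form follows, assuming lanes are held as Python lists; the name `permutex2var32` is hypothetical.

```python
def permutex2var32(a, idx, b):
    """Hypothetical scalar model of the unmasked two-source 32-bit shuffle (8 lanes)."""
    dst = []
    for j in range(8):
        off = idx[j] & 0x7           # idx[i+2:i]: element offset within the chosen source
        use_b = (idx[j] >> 3) & 1    # idx[i+3]: 0 selects a, 1 selects b
        dst.append(b[off] if use_b else a[off])
    return dst


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [100, 101, 102, 103, 104, 105, 106, 107]
idx = [0, 8, 1, 9, 2, 10, 3, 11]
interleaved = permutex2var32(a, idx, b)
print(interleaved)  # [0, 100, 1, 101, 2, 102, 3, 103]
```

In the masked variants, the lane kept when a mask bit is clear comes from "idx", from "a", or is zeroed, as each description states.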
Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
IF k[j]
dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := idx[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
IF k[j]
dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
IF k[j]
dst[i+31:i] := (idx[i+2]) ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := idx[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := (idx[i+2]) ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
off := idx[i]*64
IF k[j]
dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := idx[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
off := idx[i]*64
IF k[j]
dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
off := idx[i]*64
IF k[j]
dst[i+63:i] := (idx[i+1]) ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 1
i := j*64
off := idx[i]*64
dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := idx[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := (idx[i+3]) ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*32
off := idx[i+2:i]*32
dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
IF k[j]
dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := idx[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
IF k[j]
dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
IF k[j]
dst[i+31:i] := (idx[i+2]) ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 3
i := j*32
off := idx[i+1:i]*32
dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := idx[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := (idx[i+2]) ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 3
i := j*64
off := idx[i+1:i]*64
dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
off := idx[i]*64
IF k[j]
dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := idx[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
off := idx[i]*64
IF k[j]
dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
off := idx[i]*64
IF k[j]
dst[i+63:i] := (idx[i+1]) ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 1
i := j*64
off := idx[i]*64
dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
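In the variable-control form above, the selector for each element is bit 1 of the matching 64-bit control element (b[1], b[65], b[129], b[193]), not bit 0. The scalar sketch below makes that explicit. The function name `permutevar_pd_writemask` and the list representation are assumptions for illustration only.

```python
def permutevar_pd_writemask(src, k, a, b):
    """Hypothetical scalar model: 4 double lanes; b holds one integer control word per lane."""
    tmp = []
    for j in range(4):
        base = (j // 2) * 2          # first element of the 128-bit lane containing j
        sel = (b[j] >> 1) & 1        # bit 1 of the control element picks low or high half
        tmp.append(a[base + sel])
    # Writemask: lanes with a cleared mask bit are taken from src.
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(4)]


a = [1.0, 2.0, 3.0, 4.0]
src = [9.0] * 4
swapped = permutevar_pd_writemask(src, 0b1111, a, [2, 0, 2, 0])
print(swapped)  # [2.0, 1.0, 4.0, 3.0]
```

A control word with only bit 0 set selects the low element, which is an easy mistake to make when building the control vector by hand.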
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
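The SELECT4 pseudocode above applies the same four 2-bit fields of "imm8" to each 128-bit lane independently. A scalar sketch under the same list-of-lanes assumption follows; `permute_ps_writemask` is a hypothetical name.

```python
def permute_ps_writemask(src, k, a, imm8):
    """Hypothetical scalar model: 8 float lanes, shuffled within each group of 4."""
    tmp = []
    for j in range(8):
        lane = (j // 4) * 4                      # start of the 128-bit lane containing j
        sel = (imm8 >> (2 * (j % 4))) & 0x3      # 2-bit field for this position
        tmp.append(a[lane + sel])
    # Writemask: lanes with a cleared mask bit are taken from src.
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(8)]


a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
src = [-1.0] * 8
reversed_in_lane = permute_ps_writemask(src, 0xFF, a, 0x1B)
print(reversed_in_lane)  # [3.0, 2.0, 1.0, 0.0, 7.0, 6.0, 5.0, 4.0]
```

The control 0x1B encodes the fields 3, 2, 1, 0 from low to high bits, so it reverses each lane; 0xE4 would be the identity.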
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
id := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
id := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
dst[63:0] := SELECT4(a[255:0], imm8[1:0])
dst[127:64] := SELECT4(a[255:0], imm8[3:2])
dst[191:128] := SELECT4(a[255:0], imm8[5:4])
dst[255:192] := SELECT4(a[255:0], imm8[7:6])
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
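Unlike the within-lane shuffles, the "across lanes" form above lets every 2-bit field of "imm8" address any of the four 64-bit elements of the whole 256-bit source. A short scalar sketch, with the hypothetical name `permutex_pd` and lanes held as a Python list:

```python
def permutex_pd(a, imm8):
    """Hypothetical scalar model of the unmasked cross-lane shuffle (4 double lanes)."""
    # Field j occupies imm8 bits [2j+1:2j] and indexes the full 4-element source.
    return [a[(imm8 >> (2 * j)) & 0x3] for j in range(4)]


a = [1.0, 2.0, 3.0, 4.0]
reversed_all = permutex_pd(a, 0x1B)
print(reversed_all)  # [4.0, 3.0, 2.0, 1.0]
```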
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 3
i := j*64
id := idx[i+1:i]*64
dst[i+63:i] := a[id+63:id]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*32
id := idx[i+2:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
id := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
id := idx[i+1:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
dst[63:0] := SELECT4(a[255:0], imm8[1:0])
dst[127:64] := SELECT4(a[255:0], imm8[3:2])
dst[191:128] := SELECT4(a[255:0], imm8[5:4])
dst[255:192] := SELECT4(a[255:0], imm8[7:6])
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 3
i := j*64
id := idx[i+1:i]*64
dst[i+63:i] := a[id+63:id]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
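The expand pseudocode above reads elements of "a" consecutively from the low end and scatters them to the lanes whose mask bit is set. The running offset `m` advances only on active lanes. A scalar sketch follows, assuming Python lists for the vectors; the name `expand32_writemask` is hypothetical.

```python
def expand32_writemask(src, k, a):
    """Hypothetical scalar model: 8 lanes of 32-bit integers, lane 0 first."""
    dst = []
    m = 0                      # index of the next contiguous element to read from a
    for j in range(8):
        if (k >> j) & 1:
            dst.append(a[m])   # active lane: take the next unread element of a
            m += 1
        else:
            dst.append(src[j]) # inactive lane: keep src (the zeromask form would store 0)
    return dst


a = [1, 2, 3, 4, 5, 6, 7, 8]
src = [0] * 8
expanded = expand32_writemask(src, 0b10100101, a)
print(expanded)  # [1, 0, 2, 0, 0, 3, 0, 4]
```

Only as many elements of "a" are consumed as there are set bits in "k"; the remaining high elements of "a" are ignored.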
Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
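The expand operation in the two entries above reads elements of "a" consecutively from the low end and places them at the positions whose mask bit is set. A minimal Python sketch of that loop follows; the function name `expand_epi32` and the `src=None` convention for zeromasking are assumptions made for the example.

```python
def expand_epi32(a, k, src=None):
    """Scalar model of the 32-bit expand with writemask or zeromask.

    a:   list of integers, element 0 is bits [31:0].
    k:   mask; bit j decides whether output element j receives data from a.
    src: fallback values for masked-off elements; None means zeromask.
    """
    out = []
    m = 0  # index of the next unread element of a (m advances by 32 bits in the pseudocode)
    for j in range(len(a)):
        if (k >> j) & 1:
            out.append(a[m])
            m += 1
        else:
            out.append(0 if src is None else src[j])
    return out


a = [1, 2, 3, 4, 5, 6, 7, 8]
print(expand_epi32(a, 0b10100110))               # [0, 1, 2, 0, 0, 3, 0, 4]
print(expand_epi32(a, 0b10100110, src=[9] * 8))  # [9, 1, 2, 9, 9, 3, 9, 4]
```

Only as many elements of "a" are consumed as there are set bits in the mask; the remaining high elements of "a" are never read.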
Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
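In the 256-bit shuffle above, the same imm8 control is applied independently to each 128-bit lane, so elements never cross a lane boundary. The following Python sketch models the unmasked core of the operation (the `tmp_dst` computation); `shuffle_epi32` is a hypothetical name and the list layout (element 0 is bits [31:0]) is an assumption of the example.

```python
def shuffle_epi32(a, imm8):
    """Scalar model of the per-lane 32-bit shuffle (the tmp_dst step).

    a: list of 4 or 8 integers; each group of four is one 128-bit lane.
    Every lane is shuffled with the same four 2-bit selectors from imm8.
    """
    assert len(a) % 4 == 0
    out = []
    for lane in range(0, len(a), 4):
        block = a[lane:lane + 4]
        out.extend(block[(imm8 >> (2 * n)) & 0b11] for n in range(4))
    return out


# Reversing within each lane leaves the lanes themselves in place.
print(shuffle_epi32([0, 1, 2, 3, 4, 5, 6, 7], 0b00011011))  # [3, 2, 1, 0, 7, 6, 5, 4]
```

Passing a four-element list gives the 128-bit behaviour, where there is only one lane.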
Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
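INTERLEAVE_HIGH_DWORDS, as used in the entries above, takes the upper two 32-bit elements of each 128-bit lane of "a" and "b" and alternates them. A small Python model of the unmasked step, with the hypothetical name `unpackhi_epi32` and lists standing in for vectors (element 0 is bits [31:0]):

```python
def unpackhi_epi32(a, b):
    """Scalar model of the per-lane high-half 32-bit interleave (tmp_dst)."""
    assert len(a) == len(b) and len(a) % 4 == 0
    out = []
    for lane in range(0, len(a), 4):
        # Elements 2 and 3 of the lane are its high 64 bits.
        out += [a[lane + 2], b[lane + 2], a[lane + 3], b[lane + 3]]
    return out


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
print(unpackhi_epi32(a, b))  # [2, 12, 3, 13, 6, 16, 7, 17]
```

The result shows why "high half" is a per-lane notion for the 256-bit form: elements 2, 3, 6 and 7 are used, not elements 4 to 7.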
Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
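The low-half 64-bit interleave above combines a per-lane INTERLEAVE_QWORDS step with an element-wise mask select. The sketch below models both stages in Python; the function name `mask_unpacklo_epi64` and the use of `src=None` to request zeromasking are assumptions made for illustration.

```python
def mask_unpacklo_epi64(a, b, k, src=None):
    """Scalar model of the per-lane low-half 64-bit interleave with masking.

    a, b: lists of 2 or 4 integers; each pair is one 128-bit lane.
    k:    mask; bit j enables output element j.
    src:  fallback for masked-off elements; None means they become 0.
    """
    assert len(a) == len(b) and len(a) % 2 == 0
    tmp = []
    for lane in range(0, len(a), 2):
        tmp += [a[lane], b[lane]]  # low qword of a, then low qword of b
    return [
        tmp[j] if (k >> j) & 1 else (0 if src is None else src[j])
        for j in range(len(a))
    ]


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
print(mask_unpacklo_epi64(a, b, 0b1111))                    # [0, 10, 2, 12]
print(mask_unpacklo_epi64(a, b, 0b0110))                    # [0, 10, 2, 0]
print(mask_unpacklo_epi64(a, b, 0b0110, src=[7, 7, 7, 7]))  # [7, 10, 2, 7]
```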
Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
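RoundScaleFP64 above computes 2^-m * ROUND(2^m * x), where m = imm8[7:4] is the number of fraction bits to keep. A hedged Python sketch for a single element follows. It assumes the usual encoding of the rounding field (imm8[1:0]: 0 nearest-even, 1 toward negative infinity, 2 toward positive infinity, 3 toward zero; imm8[2] selects MXCSR.RC), which comes from the rounding note referenced in the descriptions rather than from the pseudocode itself. The name `roundscale_f64` is hypothetical, and exception flags and signaling NaNs are not modelled.

```python
import math


def roundscale_f64(x, imm8):
    """Scalar model of RoundScaleFP64 for one double-precision element."""
    m = (imm8 >> 4) & 0xF  # imm8[7:4]: fraction bits to preserve
    if (imm8 >> 2) & 1:
        # Assumption: imm8[2] means "use MXCSR.RC", which this sketch cannot see.
        raise NotImplementedError("MXCSR-controlled rounding is not modelled")
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        scaled = math.ldexp(x, m)  # x * 2**m
    except OverflowError:
        return x  # mirrors the IsInf(tmp) branch of the pseudocode
    mode = imm8 & 0b11
    if mode == 0:
        r = round(scaled)       # Python rounds halves to even
    elif mode == 1:
        r = math.floor(scaled)
    elif mode == 2:
        r = math.ceil(scaled)
    else:
        r = math.trunc(scaled)
    # copysign keeps the sign of x when the result is zero.
    return math.copysign(math.ldexp(float(r), -m), x)


print(roundscale_f64(1.2345, 0x20))  # 1.25 (two fraction bits, nearest)
print(roundscale_f64(1.2345, 0x21))  # 1.0  (two fraction bits, round down)
```

A packed form is then just the per-element map, for example `[roundscale_f64(x, imm8) for x in a]`, with the mask select applied afterwards.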
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
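For ordinary finite inputs, the SCALE routine above reduces to a * 2^floor(b), computed independently for each element. The following Python sketch illustrates that core case only. The names `scalef_f64` and `scalef_pd` are hypothetical, MXCSR.DAZ is assumed to be clear, NaN operands simply yield NaN without distinguishing quiet from signaling, and infinite operands are deliberately left out rather than guessed at.

```python
import math


def scalef_f64(a, b):
    """Scalar model of SCALE for one double-precision element pair."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if math.isinf(a) or math.isinf(b):
        raise NotImplementedError("infinite operands are outside this sketch")
    try:
        return math.ldexp(a, math.floor(b))  # a * 2**floor(b)
    except OverflowError:
        # Assumption: an overflowing result is reported as a signed infinity.
        return math.copysign(math.inf, a)


def scalef_pd(a, b):
    """Packed form: element j of the result uses a[j] and b[j]."""
    return [scalef_f64(x, y) for x, y in zip(a, b)]


print(scalef_pd([3.0, 3.0, 1.0, -5.0], [2.7, -1.5, 0.0, 1.0]))
# [12.0, 0.75, 1.0, -10.0]
```

Note the floor: an exponent of -1.5 scales by 2^-2, not 2^-1.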
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
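As a rough illustration of the SCALE pseudocode above, the following Python sketch models the 128-bit scale operation on lists of four floats. The names `scale` and `scalef_ps` and the `k`/`src` parameters are hypothetical and chosen only for this example. The sketch assumes finite, non-NaN inputs, ignores MXCSR.DAZ, and computes in double precision rather than rounding to single precision.

```python
import math


def scale(src1, src2):
    """Model of SCALE for finite, non-NaN inputs: src1 * 2**floor(src2)."""
    # math.floor returns an int, which is the exponent type ldexp expects.
    return math.ldexp(src1, math.floor(src2))


def scalef_ps(a, b, k=None, src=None):
    """Scale each element of a by 2**floor(b[j]).

    k   : optional integer bit mask; bit j enables element j.
    src : optional fallback list (writemask form); when k is given and
          src is None, masked-off elements are zeroed (zeromask form).
    """
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        if k is None or (k >> j) & 1:
            out.append(scale(x, y))
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0.0)
    return out


a = [3.0, 1.0, 5.0, 8.0]
b = [2.7, -1.5, 0.0, -3.0]
unmasked = scalef_ps(a, b)
merged = scalef_ps(a, b, k=0b0101, src=[9.0, 9.0, 9.0, 9.0])
zeroed = scalef_ps(a, b, k=0b0101)
```

Note that FLOOR rounds toward negative infinity, so an exponent of -1.5 scales by 2**-2, not 2**-1.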
Shuffle 128-bit lanes (each composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".
dst.m128[0] := a.m128[imm8[0]]
dst.m128[1] := b.m128[imm8[1]]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
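The lane selection above can be sketched in Python as follows. This is a minimal model, assuming 256-bit vectors are represented as lists of eight values; the function name `shuffle_f32x4_256` is hypothetical. Bit 0 of "imm8" picks the 128-bit lane of "a" that lands in the low half of the result, and bit 1 picks the lane of "b" that lands in the high half.

```python
def shuffle_f32x4_256(a, b, imm8):
    """Model of the unmasked 256-bit lane shuffle on lists of 8 elements."""
    lanes_a = [a[0:4], a[4:8]]  # a.m128[0], a.m128[1]
    lanes_b = [b[0:4], b[4:8]]  # b.m128[0], b.m128[1]
    low = lanes_a[imm8 & 1]           # dst.m128[0] := a.m128[imm8[0]]
    high = lanes_b[(imm8 >> 1) & 1]   # dst.m128[1] := b.m128[imm8[1]]
    return low + high


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
r1 = shuffle_f32x4_256(a, b, 0b01)
r2 = shuffle_f32x4_256(a, b, 0b10)
```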
Shuffle 128-bit lanes (each composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".
dst.m128[0] := a.m128[imm8[0]]
dst.m128[1] := b.m128[imm8[1]]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".
dst.m128[0] := a.m128[imm8[0]]
dst.m128[1] := b.m128[imm8[1]]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst.m128[0] := a.m128[imm8[0]]
tmp_dst.m128[1] := b.m128[imm8[1]]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle 128-bit lanes (each composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".
dst.m128[0] := a.m128[imm8[0]]
dst.m128[1] := b.m128[imm8[1]]
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
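A possible scalar model of the double-precision shuffle and its two masking forms is shown below. The helper names `shuffle_pd_256` and `apply_mask` are assumptions made for this sketch, and 256-bit vectors are represented as lists of four floats. Each bit of "imm8" chooses the low or high element of one 128-bit lane, alternating between "a" and "b".

```python
def shuffle_pd_256(a, b, imm8):
    """Model of the 256-bit within-lane shuffle on lists of 4 doubles."""
    def bit(n):
        return (imm8 >> n) & 1

    return [
        a[0 + bit(0)],  # tmp_dst[63:0]    from lane 0 of a
        b[0 + bit(1)],  # tmp_dst[127:64]  from lane 0 of b
        a[2 + bit(2)],  # tmp_dst[191:128] from lane 1 of a
        b[2 + bit(3)],  # tmp_dst[255:192] from lane 1 of b
    ]


def apply_mask(tmp, k, src=None):
    """Writemask form when src is given, zeromask form otherwise."""
    out = []
    for j, t in enumerate(tmp):
        if (k >> j) & 1:
            out.append(t)
        else:
            out.append(0.0 if src is None else src[j])
    return out


a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 11.0, 12.0, 13.0]
tmp = shuffle_pd_256(a, b, 0b0101)
merged = apply_mask(tmp, 0b0011, src=[-1.0, -1.0, -1.0, -1.0])
zeroed = apply_mask(tmp, 0b1000)
```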
Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements from "a" and "b" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Shuffle single-precision (32-bit) floating-point elements from "a" and "b" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
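The SELECT4 helper and the way the four 2-bit fields of "imm8" are consumed can be sketched like this. The sketch assumes vectors are plain Python lists (eight elements for the 256-bit form), and the names `select4` and `shuffle_ps_256` are hypothetical. The two low result elements of each lane come from "a" and the two high ones from "b"; the same control byte is reused for every 128-bit lane.

```python
def select4(lane, control):
    """Return the element of a 4-element lane picked by control[1:0]."""
    return lane[control & 3]


def shuffle_ps_256(a, b, imm8):
    """Model of the 256-bit within-lane shuffle on lists of 8 elements."""
    out = []
    for base in (0, 4):  # one iteration per 128-bit lane
        lane_a = a[base:base + 4]
        lane_b = b[base:base + 4]
        out += [
            select4(lane_a, imm8),       # imm8[1:0]
            select4(lane_a, imm8 >> 2),  # imm8[3:2]
            select4(lane_b, imm8 >> 4),  # imm8[5:4]
            select4(lane_b, imm8 >> 6),  # imm8[7:6]
        ]
    return out


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
reversed_fields = shuffle_ps_256(a, b, 0x1B)  # fields 3, 2, 1, 0
identity_fields = shuffle_ps_256(a, b, 0xE4)  # fields 0, 1, 2, 3
```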
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
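To make the high-half interleave concrete, here is a small Python sketch of INTERLEAVE_HIGH_DWORDS and INTERLEAVE_HIGH_QWORDS applied lane by lane. It assumes vectors are lists of elements, and the driver name `unpackhi` with its `per_lane` parameter is invented for this example.

```python
def interleave_high_dwords(src1, src2):
    """Four 32-bit elements per lane: take elements 2 and 3 of each source."""
    return [src1[2], src2[2], src1[3], src2[3]]


def interleave_high_qwords(src1, src2):
    """Two 64-bit elements per lane: take element 1 of each source."""
    return [src1[1], src2[1]]


def unpackhi(a, b, per_lane, interleave):
    """Apply an interleave helper independently to each 128-bit lane."""
    out = []
    for base in range(0, len(a), per_lane):
        out += interleave(a[base:base + per_lane], b[base:base + per_lane])
    return out


ps = unpackhi([0, 1, 2, 3, 4, 5, 6, 7],
              [10, 11, 12, 13, 14, 15, 16, 17],
              4, interleave_high_dwords)
pd = unpackhi([0, 1, 2, 3], [10, 11, 12, 13], 2, interleave_high_qwords)
```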
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Miscellaneous
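The low-half interleave follows the same pattern. The sketch below models INTERLEAVE_DWORDS and INTERLEAVE_QWORDS per 128-bit lane and then applies a zeromask; the names `unpacklo` and `zero_mask` are hypothetical, and vectors are assumed to be Python lists.

```python
def interleave_dwords(src1, src2):
    """Four 32-bit elements per lane: take elements 0 and 1 of each source."""
    return [src1[0], src2[0], src1[1], src2[1]]


def interleave_qwords(src1, src2):
    """Two 64-bit elements per lane: take element 0 of each source."""
    return [src1[0], src2[0]]


def unpacklo(a, b, per_lane, interleave):
    """Apply an interleave helper independently to each 128-bit lane."""
    out = []
    for base in range(0, len(a), per_lane):
        out += interleave(a[base:base + per_lane], b[base:base + per_lane])
    return out


def zero_mask(tmp, k):
    """Keep element j when bit j of k is set, otherwise write 0."""
    return [t if (k >> j) & 1 else 0 for j, t in enumerate(tmp)]


ps = unpacklo([0, 1, 2, 3, 4, 5, 6, 7],
              [10, 11, 12, 13, 14, 15, 16, 17],
              4, interleave_dwords)
pd = unpacklo([0, 1, 2, 3], [10, 11, 12, 13], 2, interleave_qwords)
ps_low_lane_only = zero_mask(ps, 0b00001111)
```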
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 3
i := j*64
k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
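One way to read the 32-entry predicate table is that every pair of elements falls into exactly one of four relations (less-than, equal, greater-than, unordered), and each predicate is true for a fixed subset of them. The Python sketch below models only the resulting mask bits under that reading. It is an illustration, not the hardware definition: the names `TRUE_RELATIONS`, `relation`, and `cmp_pd_mask` are made up here, and the signaling (S) versus quiet (Q) distinction is ignored because it affects exception behaviour rather than the result, which is why predicates n and n+16 share a table entry.

```python
import math

LT, EQ, GT, UN = "LT", "EQ", "GT", "UN"

# Relations for which each predicate (imm8[3:0]) yields 1.
TRUE_RELATIONS = [
    {EQ},              # 0  _CMP_EQ_OQ    (16 _CMP_EQ_OS)
    {LT},              # 1  _CMP_LT_OS    (17 _CMP_LT_OQ)
    {LT, EQ},          # 2  _CMP_LE_OS    (18 _CMP_LE_OQ)
    {UN},              # 3  _CMP_UNORD_Q  (19 _CMP_UNORD_S)
    {LT, GT, UN},      # 4  _CMP_NEQ_UQ   (20 _CMP_NEQ_US)
    {EQ, GT, UN},      # 5  _CMP_NLT_US   (21 _CMP_NLT_UQ)
    {GT, UN},          # 6  _CMP_NLE_US   (22 _CMP_NLE_UQ)
    {LT, EQ, GT},      # 7  _CMP_ORD_Q    (23 _CMP_ORD_S)
    {EQ, UN},          # 8  _CMP_EQ_UQ    (24 _CMP_EQ_US)
    {LT, UN},          # 9  _CMP_NGE_US   (25 _CMP_NGE_UQ)
    {LT, EQ, UN},      # 10 _CMP_NGT_US   (26 _CMP_NGT_UQ)
    set(),             # 11 _CMP_FALSE_OQ (27 _CMP_FALSE_OS)
    {LT, GT},          # 12 _CMP_NEQ_OQ   (28 _CMP_NEQ_OS)
    {EQ, GT},          # 13 _CMP_GE_OS    (29 _CMP_GE_OQ)
    {GT},              # 14 _CMP_GT_OS    (30 _CMP_GT_OQ)
    {LT, EQ, GT, UN},  # 15 _CMP_TRUE_UQ  (31 _CMP_TRUE_US)
]


def relation(x, y):
    """Classify a pair of floats; any NaN operand makes it unordered."""
    if math.isnan(x) or math.isnan(y):
        return UN
    if x < y:
        return LT
    if x > y:
        return GT
    return EQ


def cmp_pd_mask(a, b, imm8, k1=None):
    """Return the result mask as an int; k1 is an optional zeromask."""
    allowed = TRUE_RELATIONS[imm8 & 0xF]
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        if k1 is not None and not ((k1 >> j) & 1):
            continue  # masked-off element: k[j] stays 0
        if relation(x, y) in allowed:
            k |= 1 << j
    return k


a = [1.0, float("nan"), 2.0, 3.0]
b = [2.0, 1.0, 2.0, 1.0]
lt_mask = cmp_pd_mask(a, b, 1)
neq_uq_mask = cmp_pd_mask(a, b, 4)
neq_uq_masked = cmp_pd_mask(a, b, 4, k1=0b1110)
```

The NaN in element 1 shows the ordered/unordered split: it sets the bit for _CMP_NEQ_UQ and _CMP_UNORD_Q but not for _CMP_LT_OS or _CMP_ORD_Q.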
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
i := j*32
k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
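The integer predicate selection can be modelled in a few lines of Python. This is a sketch under the assumption that each element is already a Python int within the signed 32-bit range; `CMPINT_OPS` and `cmp_epi32_mask` are names invented for the example. _MM_CMPINT_NLT and _MM_CMPINT_NLE are written as "greater than or equal" and "greater than", which is equivalent for integers.

```python
import operator

# Indexed by imm8[2:0].
CMPINT_OPS = [
    operator.eq,         # 0 _MM_CMPINT_EQ
    operator.lt,         # 1 _MM_CMPINT_LT
    operator.le,         # 2 _MM_CMPINT_LE
    lambda x, y: False,  # 3 _MM_CMPINT_FALSE
    operator.ne,         # 4 _MM_CMPINT_NE
    operator.ge,         # 5 _MM_CMPINT_NLT (not less-than)
    operator.gt,         # 6 _MM_CMPINT_NLE (not less-than-or-equal)
    lambda x, y: True,   # 7 _MM_CMPINT_TRUE
]


def cmp_epi32_mask(a, b, imm8):
    """Return an int whose bit j is the comparison result for element j."""
    op = CMPINT_OPS[imm8 & 7]
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        if op(x, y):
            k |= 1 << j
    return k


a = [1, -1, 5, 5, 0, 7, -3, 2]
b = [1, 0, 4, 5, 1, 7, -4, 3]
eq_mask = cmp_epi32_mask(a, b, 0)
lt_mask = cmp_epi32_mask(a, b, 1)
nlt_mask = cmp_epi32_mask(a, b, 5)
```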
Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
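The dedicated greater-than-or-equal and greater-than records above correspond to the imm8 predicates NLT (5) and NLE (6), since for integers "not less-than" is `>=` and "not less-than-or-equal" is `>`. The following scalar C sketch illustrates that correspondence for four signed 64-bit lanes. All function names are hypothetical, and arrays stand in for vector registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: predicate selected by imm8[2:0] on signed 64-bit lanes. */
int cmpint_i64(int64_t a, int64_t b, int imm8)
{
    switch (imm8 & 7) {
    case 0:  return a == b;  /* _MM_CMPINT_EQ    */
    case 1:  return a <  b;  /* _MM_CMPINT_LT    */
    case 2:  return a <= b;  /* _MM_CMPINT_LE    */
    case 3:  return 0;       /* _MM_CMPINT_FALSE */
    case 4:  return a != b;  /* _MM_CMPINT_NE    */
    case 5:  return a >= b;  /* _MM_CMPINT_NLT   */
    case 6:  return a >  b;  /* _MM_CMPINT_NLE   */
    default: return 1;       /* _MM_CMPINT_TRUE  */
    }
}

/* Hypothetical scalar model of the imm8 form over `lanes` elements. */
uint8_t model_cmp_i64(const int64_t *a, const int64_t *b, int lanes, int imm8)
{
    assert(lanes >= 0 && lanes <= 8);
    uint8_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (cmpint_i64(a[j], b[j], imm8))
            k |= (uint8_t)(1u << j);
    return k;
}

/* Hypothetical scalar model of the dedicated greater-than-or-equal form. */
uint8_t model_cmpge_i64(const int64_t *a, const int64_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 8);
    uint8_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (a[j] >= b[j])
            k |= (uint8_t)(1u << j);
    return k;
}

/* Hypothetical scalar model of the dedicated greater-than form. */
uint8_t model_cmpgt_i64(const int64_t *a, const int64_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 8);
    uint8_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (a[j] > b[j])
            k |= (uint8_t)(1u << j);
    return k;
}
```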
Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
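The unsigned records share their pseudocode with the signed ones, so the difference lies only in how each lane's bits are interpreted. The scalar C sketch below illustrates this with a less-than comparison on the same bit patterns. The names `as_signed32`, `model_cmplt_signed32`, and `model_cmplt_unsigned32` are hypothetical, and plain arrays stand in for vector registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of x as a signed value without relying on
   implementation-defined integer conversion. */
int32_t as_signed32(uint32_t x)
{
    int32_t s;
    memcpy(&s, &x, sizeof s);
    return s;
}

/* Hypothetical model: lane-wise less-than, lanes read as signed integers. */
uint8_t model_cmplt_signed32(const uint32_t *a, const uint32_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 8);
    uint8_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (as_signed32(a[j]) < as_signed32(b[j]))
            k |= (uint8_t)(1u << j);
    return k;
}

/* Hypothetical model: lane-wise less-than, lanes read as unsigned integers. */
uint8_t model_cmplt_unsigned32(const uint32_t *a, const uint32_t *b, int lanes)
{
    assert(lanes >= 0 && lanes <= 8);
    uint8_t k = 0;
    for (int j = 0; j < lanes; j++)
        if (a[j] < b[j])
            k |= (uint8_t)(1u << j);
    return k;
}
```

For example, a lane holding 0xFFFFFFFF is -1 when read as signed and therefore less than 0, but it is the largest possible value when read as unsigned.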
Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*32
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*32
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 3
i := j*64
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 1
i := j*64
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is non-zero.
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
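The masked test above can be sketched in scalar C as follows. `mask_test_epi32_model` is a hypothetical name, and the 256-bit operands are modeled as arrays of eight `uint32_t`; this is an assumption-laden model of the pseudocode, not the hardware instruction.

```c
#include <stdint.h>

/* Hypothetical scalar model: bit j of the result is set only when
   k1[j] is set and lanes a[j], b[j] share at least one set bit. */
uint8_t mask_test_epi32_model(uint8_t k1, const uint32_t a[8],
                              const uint32_t b[8])
{
    uint8_t k = 0;
    for (int j = 0; j < 8; j++) {
        int selected = (k1 >> j) & 1;
        if (selected && (a[j] & b[j]) != 0)
            k |= (uint8_t)(1u << j);
    }
    return k;
}
```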
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 7
i := j*32
k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is non-zero.
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 3
i := j*32
k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is non-zero.
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 3
i := j*64
k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is non-zero.
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 1
i := j*64
k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is zero.
FOR j := 0 to 7
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 7
i := j*32
k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
AVX512VL
Compare
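For the unmasked 8-lane forms, the zero test above yields the lane-wise complement of the non-zero test. The scalar C sketch below shows this; both function names are hypothetical and the operands are modeled as plain arrays.

```c
#include <stdint.h>

/* Hypothetical model of the non-zero test: bit j set when (a[j] & b[j]) != 0. */
uint8_t test_epi32_model(const uint32_t a[8], const uint32_t b[8])
{
    uint8_t k = 0;
    for (int j = 0; j < 8; j++)
        if ((a[j] & b[j]) != 0)
            k |= (uint8_t)(1u << j);
    return k;
}

/* Hypothetical model of the zero test: bit j set when (a[j] & b[j]) == 0. */
uint8_t testn_epi32_model(const uint32_t a[8], const uint32_t b[8])
{
    uint8_t k = 0;
    for (int j = 0; j < 8; j++)
        if ((a[j] & b[j]) == 0)
            k |= (uint8_t)(1u << j);
    return k;
}
```

With a zeromask applied, the two results are no longer complements, since masked-off lanes produce 0 in both.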
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is zero.
FOR j := 0 to 3
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 3
i := j*32
k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is zero.
FOR j := 0 to 3
i := j*64
IF k1[j]
k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 3
i := j*64
k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:4] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1"; bits are zeroed when the corresponding bit of "k1" is not set) if the intermediate value is zero.
FOR j := 0 to 1
i := j*64
IF k1[j]
k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 1
i := j*64
k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:2] := 0
AVX512F
AVX512VL
Compare
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 64
m := base_addr
FOR j := 0 to 3
i := j*64
IF k[j]
MEM[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
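The compress-store loop above packs only the selected lanes, back to back, starting at "base_addr". A minimal scalar sketch in C for the 4-lane double-precision case follows. `compressstore_pd_model` is a hypothetical name, and the returned element count is a convenience of this model; the pseudocode itself returns nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: write the lanes of a[] whose mask bit is set
   to consecutive slots at base_addr. Memory past the last written slot is
   left untouched. Returns how many elements were written (model-only). */
size_t compressstore_pd_model(double *base_addr, uint8_t k, const double a[4])
{
    size_t m = 0;                      /* next free output slot */
    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1u) {
            base_addr[m] = a[j];
            m++;
        }
    }
    return m;
}
```

The pseudocode advances "m" by 64 because it addresses memory in bits; the model advances by one `double`, which is the same distance.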
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 64
m := base_addr
FOR j := 0 to 1
i := j*64
IF k[j]
MEM[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 32
m := base_addr
FOR j := 0 to 7
i := j*32
IF k[j]
MEM[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 32
m := base_addr
FOR j := 0 to 3
i := j*32
IF k[j]
MEM[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
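A scalar sketch of the aligned masked store above, for illustration only. `mask_store_pd_model` is a hypothetical name, and the `-1` return value is merely a stand-in for the general-protection exception the description mentions; a scalar model cannot fault in the same way.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 4-lane aligned masked store.
   Lanes whose mask bit is clear leave memory unchanged. */
int mask_store_pd_model(double *mem_addr, uint8_t k, const double a[4])
{
    /* Model-only stand-in for the alignment requirement (32 bytes). */
    if (((uintptr_t)mem_addr % 32u) != 0)
        return -1;

    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1u)
            mem_addr[j] = a[j];
    }
    return 0;
}
```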
Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 1
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 32-bit integers from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 32-bit integers from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 64-bit integers from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 64-bit integers from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 1
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 32-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 32-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 64-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed 64-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 1
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 1
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 32
m := base_addr
FOR j := 0 to 7
i := j*32
IF k[j]
MEM[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 32
m := base_addr
FOR j := 0 to 3
i := j*32
IF k[j]
MEM[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 64
m := base_addr
FOR j := 0 to 3
i := j*64
IF k[j]
MEM[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 64
m := base_addr
FOR j := 0 to 1
i := j*64
IF k[j]
MEM[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
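The masked scatter above can be modeled in scalar C as below. The pseudocode multiplies by 8 because it addresses memory in bits; this sketch works in bytes, so the offset is simply the sign-extended index times "scale". `mask_i32scatter_epi32_model` is a hypothetical name, and the sequential loop order is an assumption of the model.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the 8-lane masked scatter with 32-bit indices.
   Each index is sign-extended to 64 bits, so negative indices address
   memory below base_addr. */
void mask_i32scatter_epi32_model(void *base_addr, uint8_t k,
                                 const int32_t vindex[8],
                                 const int32_t a[8], int scale)
{
    unsigned char *base = (unsigned char *)base_addr;
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u) {
            int64_t byte_off = (int64_t)vindex[j] * (int64_t)scale;
            memcpy(base + byte_off, &a[j], sizeof a[j]);
        }
    }
}
```

In this sequential model, when two selected lanes share an index the higher lane's value is the one left in memory.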
Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
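With 64-bit indices and 32-bit data, the index stride (m := j*64) and the data stride (i := j*32) differ, so four indices drive only four 32-bit elements. A minimal scalar sketch in C, using the hypothetical name `i64scatter_epi32_model` and byte offsets instead of the pseudocode's bit offsets:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the unmasked 4-lane scatter:
   64-bit indices, 32-bit elements. The index is used at full width,
   so no sign extension step is needed. */
void i64scatter_epi32_model(void *base_addr, const int64_t vindex[4],
                            const int32_t a[4], int scale)
{
    unsigned char *base = (unsigned char *)base_addr;
    for (int j = 0; j < 4; j++) {
        int64_t byte_off = vindex[j] * (int64_t)scale;
        memcpy(base + byte_off, &a[j], sizeof a[j]);
    }
}
```

A "scale" of 4 makes the indices act as element indices into an `int32_t` array; a "scale" of 8 places each element in every other slot.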
Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
AVX512VL
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
AVX512VL
Store
Store 256-bits (composed of 4 packed 64-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512F
AVX512VL
Store
Store 256-bits (composed of 8 packed 32-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512F
AVX512VL
Store
Store 128-bits (composed of 2 packed 64-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512F
AVX512VL
Store
Store 128-bits (composed of 4 packed 32-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512F
AVX512VL
Store
Store 256-bits (composed of 4 packed 64-bit integers) from "a" into memory.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512F
AVX512VL
Store
Store 256-bits (composed of 8 packed 32-bit integers) from "a" into memory.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512F
AVX512VL
Store
Store 128-bits (composed of 2 packed 64-bit integers) from "a" into memory.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512F
AVX512VL
Store
Store 128-bits (composed of 4 packed 32-bit integers) from "a" into memory.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512F
AVX512VL
Store
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
m := j*64
IF k[j]
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ELSE
dst[m+63:m] := src[m+63:m]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
m := j*64
IF k[j]
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ELSE
dst[m+63:m] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
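The two entries above differ only in what happens to lanes whose mask bit is clear. The sketch below is a scalar Python model of both loops, written under these assumptions: lanes are passed as Python lists, the mask "k" as an integer, and the function name `cvt_epi32_pd` is hypothetical.

```python
def cvt_epi32_pd(a, k=None, src=None):
    """Scalar model of the masked int32 -> double conversion loops.

    a   : list of signed 32-bit lane values
    k   : mask as an integer (bit j controls lane j); None means all lanes active
    src : fallback lanes for writemask behaviour; None selects zeromask behaviour
    """
    dst = []
    for j, lane in enumerate(a):
        if k is None or (k >> j) & 1:
            dst.append(float(lane))   # Convert_Int32_To_FP64 (exact for 32-bit input)
        elif src is not None:
            dst.append(src[j])        # writemask: copy the lane from src
        else:
            dst.append(0.0)           # zeromask: zero the lane
    return dst


a = [1, -2, 3, -4]
src = [9.5, 9.5, 9.5, 9.5]
merged = cvt_epi32_pd(a, k=0b0101, src=src)  # lanes 1 and 3 come from src
zeroed = cvt_epi32_pd(a, k=0b0101)           # lanes 1 and 3 become 0.0
```

Here `merged` is `[1.0, 9.5, 3.0, 9.5]` and `zeroed` is `[1.0, 0.0, 3.0, 0.0]`. The same pattern applies to every writemask/zeromask pair in this section.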
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*32
m := j*64
IF k[j]
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ELSE
dst[m+63:m] := src[m+63:m]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*32
m := j*64
IF k[j]
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ELSE
dst[m+63:m] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
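As an illustration of the half-to-single widening above, the sketch below decodes binary16 bit patterns with Python's `struct` module (the `"e"` format). It is only a model: the function name `cvt_ph_ps`, the list-based lanes, and the integer mask are assumptions, and the results are Python floats rather than 32-bit values (the widening itself loses no precision).

```python
import struct


def cvt_ph_ps(a_bits, k, src=None):
    """Model Convert_FP16_To_FP32 per lane.

    a_bits : list of 16-bit integers holding IEEE 754 binary16 bit patterns
    k      : mask integer, bit j controls lane j
    src    : lanes copied when the mask bit is clear (writemask); None zeroes them
    """
    dst = []
    for j, bits in enumerate(a_bits):
        if (k >> j) & 1:
            # Reinterpret the 16 raw bits as a half-precision value.
            (value,) = struct.unpack("<e", struct.pack("<H", bits))
            dst.append(value)
        else:
            dst.append(src[j] if src is not None else 0.0)
    return dst


halves = [0x3C00, 0xC000, 0x3800, 0x7BFF]  # 1.0, -2.0, 0.5, 65504.0
widened = cvt_ph_ps(halves, k=0b1011)       # lane 2 is masked off and zeroed
```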
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 7
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 7
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 7
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
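The single-to-half narrowing above can be sketched the same way. This model makes several assumptions that are not in the entries: it always rounds to nearest-even (the behaviour of Python's `struct` `"e"` format) and ignores the rounding control referred to by [round_imm_note]; it takes Python floats as input rather than true 32-bit values; and the name `cvt_ps_ph` is hypothetical.

```python
import struct


def cvt_ps_ph(a, k, src_bits=None):
    """Model Convert_FP32_To_FP16 per lane, returning 16-bit patterns.

    a        : list of float lane values
    k        : mask integer, bit j controls lane j
    src_bits : 16-bit patterns copied when the mask bit is clear (writemask);
               None zeroes those lanes (zeromask)
    """
    dst = []
    for j, value in enumerate(a):
        if (k >> j) & 1:
            # Encode as binary16, then read back the raw 16 bits.
            (bits,) = struct.unpack("<H", struct.pack("<e", value))
            dst.append(bits)
        else:
            dst.append(src_bits[j] if src_bits is not None else 0)
    return dst


narrowed = cvt_ps_ph([1.0, -2.0, 0.1, 0.5], k=0b0111, src_bits=[0xFFFF] * 4)
```

The value 0.1 is not representable in half precision, so lane 2 holds the nearest pattern, 0x2E66. Inputs too large for binary16 raise `OverflowError` in `struct`, which this sketch does not handle.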
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 7
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 3
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 3
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 3
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
FOR j := 0 to 3
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
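The "with truncation" entries differ from the plain double-to-integer conversions earlier in this section only in how the fractional part is handled. A small Python sketch of that difference follows. It assumes the non-truncating forms round to nearest-even (the default rounding mode; other modes are not modeled) and does not model NaN or out-of-range inputs. The function name `cvt_pd_epi32` and its `truncate` flag are hypothetical.

```python
import math


def cvt_pd_epi32(a, truncate=False):
    """Convert doubles to integers, lane by lane.

    truncate=False : round to nearest, ties to even (Python's round()).
    truncate=True  : drop the fractional part, rounding toward zero.
    """
    convert = math.trunc if truncate else round
    return [convert(x) for x in a]


values = [2.5, 3.5, -2.5, -1.7]
rounded = cvt_pd_epi32(values)                    # ties go to the even neighbour
truncated = cvt_pd_epi32(values, truncate=True)   # fraction simply discarded
```

For these inputs `rounded` is `[2, 4, -2, -2]` while `truncated` is `[2, 3, -2, -1]`.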
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*64
l := j*32
dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
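What distinguishes the unsigned conversion above from the signed one is how the top bit of each 32-bit lane is read. The Python sketch below contrasts the two readings of the same bit patterns. The function names are hypothetical, and lanes are passed as plain integers.

```python
def cvt_epu32_pd(a_bits):
    """Read each 32-bit lane as unsigned and widen it to a double (exact)."""
    return [float(bits & 0xFFFFFFFF) for bits in a_bits]


def cvt_epi32_pd_signed(a_bits):
    """Read the same bit patterns as two's-complement signed values, for contrast."""
    return [float(bits - (1 << 32) if bits & 0x80000000 else bits) for bits in a_bits]


lanes = [0x00000001, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
as_unsigned = cvt_epu32_pd(lanes)
as_signed = cvt_epi32_pd_signed(lanes)
```

The two results agree for the first two lanes and diverge once bit 31 is set: 0xFFFFFFFF becomes 4294967295.0 when read as unsigned and -1.0 when read as signed.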
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := j*64
l := j*32
dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 8*j
dst[k+7:k] := Truncate8(a[i+31:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
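The masked truncating store above writes only the active lanes and leaves the other destination bytes untouched. A minimal Python model of that loop is given below. It assumes memory is a `bytearray` and the mask "k" is an integer; the name `masked_truncate_store` and the `dst_bits` parameter are hypothetical.

```python
def masked_truncate_store(mem, base_addr, k, a, dst_bits=8):
    """Model the masked truncate-and-store loop.

    mem       : bytearray standing in for memory
    base_addr : byte offset of element 0 in mem (no alignment required)
    k         : mask integer, bit j controls lane j
    a         : list of integer lane values
    dst_bits  : width of each stored element (8 for Truncate8, 16 for Truncate16)
    """
    dst_bytes = dst_bits // 8
    for j, lane in enumerate(a):
        if (k >> j) & 1:
            low = lane & ((1 << dst_bits) - 1)   # keep only the low bits
            offset = base_addr + j * dst_bytes
            mem[offset:offset + dst_bytes] = low.to_bytes(dst_bytes, "little")
        # Inactive lanes: the corresponding memory is left unchanged.


mem = bytearray(b"\xEE" * 8)
lanes = [0x12345678, 0x11111111, -1, 0x000001FF, 5, 6, 7, 8]
masked_truncate_store(mem, 0, 0b00001101, lanes)
```

After the call, bytes 0, 2 and 3 hold 0x78, 0xFF and 0xFF, and every other byte still holds the 0xEE fill value.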
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 8*j
dst[k+7:k] := Truncate8(a[i+31:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 16*j
dst[k+15:k] := Truncate16(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 16*j
dst[k+15:k] := Truncate16(a[i+31:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
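The writemask and zeromask variants differ only in what an inactive lane receives. A minimal sketch, assuming the same int-as-vector layout as the pseudocode; the function name masked_truncate and the convention that src=None selects zeromasking are hypothetical choices made for this illustration.

```python
def masked_truncate(a, k, src=None, src_bits=32, dst_bits=16, lanes=4):
    """Scalar model of the masked truncating conversion.

    k   : integer mask, bit j controls lane j
    src : packed fallback vector (writemask) or None (zeromask)
    """
    dst_lane_mask = (1 << dst_bits) - 1
    dst = 0
    for j in range(lanes):
        if (k >> j) & 1:
            # Active lane: truncate the source element.
            lane = (a >> (src_bits * j)) & dst_lane_mask
        elif src is not None:
            # Writemask: inactive lane is copied from src.
            lane = (src >> (dst_bits * j)) & dst_lane_mask
        else:
            # Zeromask: inactive lane is cleared.
            lane = 0
        dst |= lane << (dst_bits * j)
    return dst


def unpack(v, bits, lanes):
    """Split a packed int into a list of lanes, lowest lane first."""
    return [(v >> (bits * j)) & ((1 << bits) - 1) for j in range(lanes)]


a = sum(v << (32 * j) for j, v in enumerate(
    [0x11112222, 0x33334444, 0x55556666, 0x77778888]))
src = sum(v << (16 * j) for j, v in enumerate(
    [0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD]))
k = 0b0101  # lanes 0 and 2 are active

with_writemask = unpack(masked_truncate(a, k, src), 16, 4)
with_zeromask = unpack(masked_truncate(a, k), 16, 4)
```

With k = 0b0101, the writemask form yields 0x2222, 0xBBBB, 0x6666, 0xDDDD and the zeromask form yields 0x2222, 0, 0x6666, 0.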
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 8*j
dst[k+7:k] := Truncate8(a[i+63:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i])
FI
ENDFOR
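The masked store above writes only the active elements and leaves the remaining memory untouched. A minimal sketch under stated assumptions: memory is modelled as a bytearray, base_addr is a byte index (the pseudocode uses bit offsets, l = 8*j), and the name mask_store_truncate8 is hypothetical.

```python
def mask_store_truncate8(mem, base_addr, k, a, lanes=4, src_bits=64):
    """Scalar model of the masked truncating store of 64-bit lanes to bytes.

    mem       : bytearray standing in for memory
    base_addr : byte index of element 0 in mem (no alignment required)
    k         : integer mask, bit j controls lane j
    a         : packed source vector as a Python int
    """
    for j in range(lanes):
        if (k >> j) & 1:
            # Active lane: store the low byte of the 64-bit element.
            mem[base_addr + j] = (a >> (src_bits * j)) & 0xFF
        # Inactive lane: the byte at base_addr + j is not written at all.


mem = bytearray(b"\xEE" * 8)
a = sum(v << (64 * j) for j, v in enumerate([
    0x1122334455667788,
    0x00000000000000AB,
    0xFFFFFFFFFFFFFF01,
    0x0123456789ABCDEF,
]))
mask_store_truncate8(mem, 2, 0b1001, a)  # lanes 0 and 3 are active
```

Unlike the register forms, there is no src or zeroing fallback here: bytes belonging to inactive lanes keep whatever value they held before the store.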
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 8*j
dst[k+7:k] := Truncate8(a[i+63:i])
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 32*j
dst[k+31:k] := Truncate32(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Truncate32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Truncate32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 32*j
dst[k+31:k] := Truncate32(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Truncate32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Truncate32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 16*j
dst[k+15:k] := Truncate16(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 16*j
dst[k+15:k] := Truncate16(a[i+63:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 8*j
dst[k+7:k] := Saturate8(a[i+31:i])
ENDFOR
dst[MAX:64] := 0
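Signed saturation clamps each element to the range of the narrower signed type instead of discarding high bits. A minimal sketch of Saturate8 applied per lane, assuming two's-complement lanes packed into a Python int; the helper names to_signed, saturate_signed and saturate_lanes_signed are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate_signed(value, bits):
    """Clamp a signed integer to the range of a `bits`-wide signed type."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def saturate_lanes_signed(a, src_bits=32, dst_bits=8, lanes=8):
    """Scalar model of the unmasked signed-saturating conversion."""
    dst = 0
    for j in range(lanes):
        lane = to_signed(a >> (src_bits * j), src_bits)
        clamped = saturate_signed(lane, dst_bits)
        # Re-encode the clamped value as dst_bits of two's complement.
        dst |= (clamped & ((1 << dst_bits) - 1)) << (dst_bits * j)
    return dst


values = [0, 127, 128, -1, -128, -129, 100000, -100000]
a = sum((v & 0xFFFFFFFF) << (32 * j) for j, v in enumerate(values))
packed = saturate_lanes_signed(a)
out = [to_signed(packed >> (8 * j), 8) for j in range(8)]
```

Values already inside [-128, 127] pass through unchanged; 128 and 100000 clamp to 127, while -129 and -100000 clamp to -128. Plain truncation would instead have turned 128 into -128.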
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 8*j
dst[k+7:k] := Saturate8(a[i+31:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 16*j
dst[k+15:k] := Saturate16(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 16*j
dst[k+15:k] := Saturate16(a[i+31:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 8*j
dst[k+7:k] := Saturate8(a[i+63:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 8*j
dst[k+7:k] := Saturate8(a[i+63:i])
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 32*j
dst[k+31:k] := Saturate32(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Saturate32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Saturate32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 32*j
dst[k+31:k] := Saturate32(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Saturate32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Saturate32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 16*j
dst[k+15:k] := Saturate16(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 16*j
dst[k+15:k] := Saturate16(a[i+63:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+7:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+7:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
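For the widening conversions the roles of the two indices swap: the destination is indexed by the wide offset i and the source by the narrow offset l. A minimal sketch of SignExtend32 with masking, assuming the same int-as-vector layout; the names sign_extend and mask_sign_extend, and the use of src=None to select zeromasking, are hypothetical.

```python
def sign_extend(value, from_bits, to_bits):
    """Replicate the sign bit of a from_bits value into to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        # Negative: set every bit between from_bits and to_bits.
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def mask_sign_extend(a, k, src=None, from_bits=8, to_bits=32, lanes=8):
    """Scalar model of the masked sign-extending conversion."""
    wide_mask = (1 << to_bits) - 1
    dst = 0
    for j in range(lanes):
        i = to_bits * j    # destination (wide) bit offset
        l = from_bits * j  # source (narrow) bit offset
        if (k >> j) & 1:
            lane = sign_extend(a >> l, from_bits, to_bits)
        elif src is not None:
            lane = (src >> i) & wide_mask  # writemask fallback
        else:
            lane = 0                       # zeromask fallback
        dst |= lane << i
    return dst


# Only the low 8 bytes of "a" are read for eight 32-bit results.
low_bytes = [0x01, 0x7F, 0x80, 0xFF, 0x10, 0xF0, 0x00, 0x81]
a = sum(b << (8 * j) for j, b in enumerate(low_bytes))
packed = mask_sign_extend(a, 0b00001111)
out = [(packed >> (32 * j)) & 0xFFFFFFFF for j in range(8)]
```

Bytes with the top bit set (0x80, 0xFF) extend to 0xFFFFFF80 and 0xFFFFFFFF; lanes 4 to 7 are inactive under this mask and are zeroed.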
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+7:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+7:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+7:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+7:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+7:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+7:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+15:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+15:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+15:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+15:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+15:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+15:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+15:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Sign extend packed 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+15:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 8*j
dst[k+7:k] := SaturateU8(a[i+31:i])
ENDFOR
dst[MAX:64] := 0
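Unsigned saturation treats each source element as a non-negative value and clamps it to the maximum of the narrower unsigned type. A minimal sketch of SaturateU8 applied per lane, assuming lanes packed into a Python int; the name saturate_lanes_unsigned is hypothetical.

```python
def saturate_lanes_unsigned(a, src_bits=32, dst_bits=8, lanes=8):
    """Scalar model of the unmasked unsigned-saturating conversion."""
    src_mask = (1 << src_bits) - 1
    limit = (1 << dst_bits) - 1  # largest value the destination lane can hold
    dst = 0
    for j in range(lanes):
        lane = (a >> (src_bits * j)) & src_mask  # read as unsigned
        dst |= min(lane, limit) << (dst_bits * j)
    return dst


values = [0, 255, 256, 0xFFFFFFFF, 0x80000000, 1, 300, 254]
a = sum(v << (32 * j) for j, v in enumerate(values))
packed = saturate_lanes_unsigned(a)
out = [(packed >> (8 * j)) & 0xFF for j in range(8)]
```

The bit pattern 0x80000000 illustrates the difference from the signed form: read as unsigned it is a large positive value and saturates to 0xFF, whereas signed saturation would read it as the most negative 32-bit integer and produce 0x80.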
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 8*j
dst[k+7:k] := SaturateU8(a[i+31:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 16*j
dst[k+15:k] := SaturateU16(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 16*j
dst[k+15:k] := SaturateU16(a[i+31:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 8*j
dst[k+7:k] := SaturateU8(a[i+63:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 8*j
dst[k+7:k] := SaturateU8(a[i+63:i])
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:16] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 32*j
dst[k+31:k] := SaturateU32(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := SaturateU32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
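In the masked store form above, each active lane is written at its own lane offset (l = 32*j), so inactive lanes leave the corresponding memory untouched rather than being packed together. A scalar sketch in C under that reading; the function names are hypothetical, and `memcpy` is used because "base_addr" may be unaligned:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SaturateU32 from the pseudocode: clamp an unsigned 64-bit value to 32 bits. */
uint32_t saturate_u32(uint64_t x) {
    return x > UINT32_MAX ? UINT32_MAX : (uint32_t)x;
}

/* For each lane j with bit j of k set, store the saturated value at byte
   offset 4*j from base_addr. Memory for inactive lanes is not modified.
   n is the lane count (at most 8 for an 8-bit mask). */
void narrow_u64_to_u32_mask_store(void *base_addr, uint8_t k,
                                  const uint64_t *a, size_t n) {
    unsigned char *p = base_addr;
    for (size_t j = 0; j < n; j++) {
        if ((k >> j) & 1u) {
            uint32_t v = saturate_u32(a[j]);
            memcpy(p + j * sizeof v, &v, sizeof v);
        }
    }
}
```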
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := SaturateU32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 32*j
dst[k+31:k] := SaturateU32(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := SaturateU32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := SaturateU32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 3
i := 64*j
k := 16*j
dst[k+15:k] := SaturateU16(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 16*j
dst[k+15:k] := SaturateU16(a[i+63:i])
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i])
FI
ENDFOR
AVX512F
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+7:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+7:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
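Zero extension widens each byte without propagating its top bit, so 0x80 becomes 0x00000080 rather than 0xFFFFFF80. A scalar sketch of the two masked forms above in C, with hypothetical function names and one array element per lane:

```c
#include <stddef.h>
#include <stdint.h>

/* Writemask form: active lanes take the zero-extended byte, inactive lanes
   keep src[j]. n is the lane count (at most 8 for an 8-bit mask). */
void zext_u8_to_u32_mask(uint32_t *dst, const uint32_t *src, uint8_t k,
                         const uint8_t *a, size_t n) {
    for (size_t j = 0; j < n; j++) {
        dst[j] = ((k >> j) & 1u) ? (uint32_t)a[j] : src[j];
    }
}

/* Zeromask form: inactive lanes are set to 0. */
void zext_u8_to_u32_maskz(uint32_t *dst, uint8_t k,
                          const uint8_t *a, size_t n) {
    for (size_t j = 0; j < n; j++) {
        dst[j] = ((k >> j) & 1u) ? (uint32_t)a[j] : 0u;
    }
}
```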
Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+7:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).

FOR j := 0 to 3
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+7:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+7:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+7:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+7:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+7:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+15:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+15:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+15:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+15:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+15:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+15:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+15:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Zero extend packed unsigned 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+15:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Convert
Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
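The expand-load entries above read memory contiguously but write lanes sparsely: the counter "m" advances only for active lanes, so the k-th set bit receives the k-th element in memory. A scalar sketch of the writemask form in C; the function name and the returned count are illustrative additions, not part of the reference:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand-load with writemask. Consecutive doubles at mem_addr are placed
   into the lanes whose mask bit is set, in ascending lane order. Inactive
   lanes are copied from src. Returns how many elements were read.
   memcpy is used because mem_addr may be unaligned. */
size_t expandloadu_pd_mask(double *dst, const double *src, uint8_t k,
                           const void *mem_addr, size_t n) {
    const unsigned char *p = mem_addr;
    size_t m = 0; /* index of the next unread element in memory */
    for (size_t j = 0; j < n; j++) {
        if ((k >> j) & 1u) {
            memcpy(&dst[j], p + m * sizeof(double), sizeof(double));
            m++;
        } else {
            dst[j] = src[j];
        }
    }
    return m;
}
```

The pseudocode advances "m" by 64 per element while this sketch advances by one element, which is consistent with the pseudocode counting offsets in bits.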
Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
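The gather entries above compute one address per active lane from a sign-extended index and a scale. The trailing `* 8` in the pseudocode appears to convert the byte offset to the bit offsets used by MEM[...]. A scalar sketch in C that works in bytes instead; the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Masked gather of doubles using signed 32-bit indices. For each lane j
   with bit j of k set, load 8 bytes from base_addr + vindex[j] * scale.
   Inactive lanes are copied from src. n is the lane count (at most 8). */
void gather_pd_i32_mask(double *dst, const double *src, uint8_t k,
                        const int32_t *vindex, const void *base_addr,
                        int scale, size_t n) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const unsigned char *base = base_addr;
    for (size_t j = 0; j < n; j++) {
        if ((k >> j) & 1u) {
            /* Sign-extend the index before scaling, as SignExtend64 does. */
            int64_t offset = (int64_t)vindex[j] * scale;
            memcpy(&dst[j], base + offset, sizeof(double));
        } else {
            dst[j] = src[j];
        }
    }
}
```

With `scale` set to 8 the indices act as element indices into an array of doubles, and negative indices address elements before "base_addr".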
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
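The aligned masked loads above add an alignment requirement on "mem_addr": 32 bytes for the four-lane form and 16 bytes for the two-lane form. A scalar sketch of the zeromask form in C, with a hypothetical function name; it treats misalignment as a precondition violation via `assert`, which only stands in for the exception the description mentions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zeromask aligned load of doubles. Active lanes are read from mem_addr,
   inactive lanes are set to 0. The required alignment equals the vector
   size: n * 8 bytes, i.e. 32 for n == 4 and 16 for n == 2. */
void load_pd_maskz(double *dst, uint8_t k, const double *mem_addr, size_t n) {
    assert((uintptr_t)mem_addr % (n * sizeof(double)) == 0);
    for (size_t j = 0; j < n; j++) {
        dst[j] = ((k >> j) & 1u) ? mem_addr[j] : 0.0;
    }
}
```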
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
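The expand-load entries above read only as many elements from memory as there are set mask bits, and place them in the active lanes in order. The following is a rough Python sketch of the counter `m` from the pseudocode, assuming memory is a list of already decoded elements. `expand_load` is a hypothetical name.

```python
def expand_load(mem, k, src=None, lanes=8):
    """Model the expand-load pseudocode: active lanes consume memory contiguously."""
    m = 0                        # index of the next unread element in memory
    dst = []
    for j in range(lanes):
        if (k >> j) & 1:
            dst.append(mem[m])   # active lane: take the next contiguous element
            m += 1
        elif src is not None:
            dst.append(src[j])   # writemask: copy the lane from src
        else:
            dst.append(0)        # zeromask: clear the lane
    return dst


mem = [10, 20, 30]               # only three elements are needed for three set bits
src = [-1] * 8
merged = expand_load(mem, 0b10010010, src)
zeroed = expand_load(mem, 0b10010010)
```

Mask bits 1, 4 and 7 are set, so the three memory elements land in lanes 1, 4 and 7. The memory index advances only when a lane is active.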
Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
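The gather entry above computes one address per lane from a sign-extended 32-bit index and a scale factor. The sketch below models that in Python over a `bytes` buffer. It assumes the trailing `* 8` in the pseudocode converts a byte offset into the bit offset used by the `MEM[...]` notation, so the model works in bytes and omits it. `gather_i32`, `to_signed32` and the use of `base` as an offset into the buffer are illustrative choices, not an API.

```python
import struct


def to_signed32(x):
    """Interpret the low 32 bits of x as a signed value (SignExtend64 in the pseudocode)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def gather_i32(memory, base, vindex, scale, k, src):
    """Gather signed 32-bit elements; lanes with a clear mask bit are copied from src."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    dst = []
    for j, idx in enumerate(vindex):
        if (k >> j) & 1:
            offset = base + to_signed32(idx) * scale   # byte offset into the buffer
            dst.append(int.from_bytes(memory[offset:offset + 4], "little", signed=True))
        else:
            dst.append(src[j])
    return dst


table = [100, 200, 300, 400, 500, 600, 700, 800]
memory = struct.pack("<8i", *table)
src = [0] * 8

low_half = gather_i32(memory, 0, [7, 0, 3, 3, 1, 6, 2, 5], 4, 0b00001111, src)
# A base in the middle of the table combined with index -1 (0xFFFFFFFF) reaches backwards.
backward = gather_i32(memory, 16, [0xFFFFFFFF, 1, 0, 0, 0, 0, 0, 0], 4, 0b00000011, src)
```

The second call shows why the index is sign-extended: index `0xFFFFFFFF` is treated as -1 and selects the element just before `base`.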
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
AVX512VL
Load
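The entry above gathers only two 32-bit elements because a 128-bit index vector holds two 64-bit indices, so the result occupies the low 64 bits and the rest is cleared (`dst[MAX:64] := 0`). The following is a small Python sketch of that layout, assuming non-negative indices and a `bytes` buffer addressed in bytes. `gather_2x32_from_i64` is a hypothetical name.

```python
import struct


def gather_2x32_from_i64(memory, base, vindex, scale, k, src):
    """Gather two 32-bit elements using 64-bit indices; upper lanes stay zero."""
    dst = [0, 0, 0, 0]           # 128-bit register viewed as four 32-bit lanes
    for j in range(2):           # only lanes 0 and 1 are ever written
        if (k >> j) & 1:
            offset = base + vindex[j] * scale
            dst[j] = int.from_bytes(memory[offset:offset + 4], "little", signed=True)
        else:
            dst[j] = src[j]      # writemask: copy the lane from src
    return dst


memory = struct.pack("<4i", 11, 22, 33, 44)
src = [-5, -6, -7, -8]
result = gather_2x32_from_i64(memory, 0, [3, 1], 4, 0b01, src)
```

Lane 0 is gathered from index 3, lane 1 falls back to `src`, and lanes 2 and 3 are zero regardless of `src`.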
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 3
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 1
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load 256 bits (composed of 4 packed 64-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load 256 bits (composed of 8 packed 32-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load 128 bits (composed of 2 packed 64-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load 128 bits (composed of 4 packed 32-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load 256 bits (composed of 4 packed 64-bit integers) from memory into "dst".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
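The aligned load entries state a boundary requirement: 32 bytes for 256-bit loads and 16 bytes for 128-bit loads. The following is a small Python sketch of the check a caller might make before picking an aligned form, assuming addresses are plain integers. `is_aligned` is a hypothetical helper, and the sketch models no exception behaviour.

```python
def is_aligned(mem_addr, boundary_bytes):
    """Return True when mem_addr is a multiple of the required boundary."""
    return mem_addr % boundary_bytes == 0


addr = 0x1010
ok_for_128 = is_aligned(addr, 16)   # 16-byte boundary for 128-bit aligned loads
ok_for_256 = is_aligned(addr, 32)   # 32-byte boundary for 256-bit aligned loads
```

Address `0x1010` meets the 16-byte requirement but not the 32-byte one, so only the 128-bit aligned form or an unaligned form would be safe there.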
Load 256 bits (composed of 8 packed 32-bit integers) from memory into "dst".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512F
AVX512VL
Load
Load 128 bits (composed of 2 packed 64-bit integers) from memory into "dst".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Load 128 bits (composed of 4 packed 32-bit integers) from memory into "dst".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512F
AVX512VL
Load
Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
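The masked move entries use the inclusive bit-range notation `dst[i+63:i]`. As an illustration, the sketch below models registers as Python integers and implements that notation directly. `bits` and `mask_mov_64` are hypothetical helpers, and passing `src = 0` is used here to stand in for the zeromask form.

```python
def bits(value, hi, lo):
    """Return value[hi:lo], the inclusive bit range used by the pseudocode."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def mask_mov_64(src, k, a, lanes):
    """Per 64-bit lane: take the lane from a when the mask bit is set, else from src."""
    dst = 0
    for j in range(lanes):
        i = j * 64
        lane = bits(a, i + 63, i) if (k >> j) & 1 else bits(src, i + 63, i)
        dst |= lane << i
    return dst   # bits above lanes*64 are never set, matching dst[MAX:...] := 0


a = (0x2222 << 64) | 0x1111
src = (0x4444 << 64) | 0x3333
merged = mask_mov_64(src, 0b10, a, 2)   # lane 1 from a, lane 0 from src
zeroed = mask_mov_64(0, 0b01, a, 2)     # lane 0 from a, lane 1 cleared
```

The move copies bit patterns without interpreting them, so the same model applies whether a lane holds a double-precision value or a 64-bit integer.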
Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[63:0] := a[63:0]
tmp[127:64] := a[63:0]
tmp[191:128] := a[191:128]
tmp[255:192] := a[191:128]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[63:0] := a[63:0]
tmp[127:64] := a[63:0]
tmp[191:128] := a[191:128]
tmp[255:192] := a[191:128]
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
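The two entries above first build `tmp` by copying each even-indexed double into the lane above it, and then apply the mask. The following is a short Python sketch of that two-step structure, using lists of floats as lanes. `mask_movedup` is a hypothetical name.

```python
def mask_movedup(a, k, src=None):
    """Duplicate even-indexed lanes of a, then apply a writemask (src) or zeromask."""
    n = len(a)
    tmp = [a[j & ~1] for j in range(n)]   # lane j reads the even lane at or below it
    dst = []
    for j in range(n):
        if (k >> j) & 1:
            dst.append(tmp[j])
        else:
            dst.append(src[j] if src is not None else 0.0)
    return dst


a = [1.5, 2.5, 3.5, 4.5]
merged = mask_movedup(a, 0b1110, [0.25] * 4)
zeroed = mask_movedup(a, 0b0011)
```

The duplication happens before masking, so a masked-off lane never sees the duplicated value.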
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[63:0] := a[63:0]
tmp[127:64] := a[63:0]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[63:0] := a[63:0]
tmp[127:64] := a[63:0]
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[31:0] := a[63:32]
tmp[63:32] := a[63:32]
tmp[95:64] := a[127:96]
tmp[127:96] := a[127:96]
tmp[159:128] := a[191:160]
tmp[191:160] := a[191:160]
tmp[223:192] := a[255:224]
tmp[255:224] := a[255:224]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[31:0] := a[63:32]
tmp[63:32] := a[63:32]
tmp[95:64] := a[127:96]
tmp[127:96] := a[127:96]
tmp[159:128] := a[191:160]
tmp[191:160] := a[191:160]
tmp[223:192] := a[255:224]
tmp[255:224] := a[255:224]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
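For the odd-indexed duplicate above, every pair of lanes receives the value of its upper (odd) lane. The following is a minimal Python sketch with eight single-precision lanes modelled as floats. `mask_movehdup` is a hypothetical name.

```python
def mask_movehdup(a, k, src=None):
    """Duplicate odd-indexed lanes of a, then apply a writemask (src) or zeromask."""
    n = len(a)
    tmp = [a[j | 1] for j in range(n)]    # lane j reads the odd lane of its pair
    return [
        tmp[j] if (k >> j) & 1 else (src[j] if src is not None else 0.0)
        for j in range(n)
    ]


a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
merged = mask_movehdup(a, 0b11110000, [-1.0] * 8)
zeroed = mask_movehdup(a, 0b00000101)
```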
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[31:0] := a[63:32]
tmp[63:32] := a[63:32]
tmp[95:64] := a[127:96]
tmp[127:96] := a[127:96]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[31:0] := a[63:32]
tmp[63:32] := a[63:32]
tmp[95:64] := a[127:96]
tmp[127:96] := a[127:96]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[31:0] := a[31:0]
tmp[63:32] := a[31:0]
tmp[95:64] := a[95:64]
tmp[127:96] := a[95:64]
tmp[159:128] := a[159:128]
tmp[191:160] := a[159:128]
tmp[223:192] := a[223:192]
tmp[255:224] := a[223:192]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[31:0] := a[31:0]
tmp[63:32] := a[31:0]
tmp[95:64] := a[95:64]
tmp[127:96] := a[95:64]
tmp[159:128] := a[159:128]
tmp[191:160] := a[159:128]
tmp[223:192] := a[223:192]
tmp[255:224] := a[223:192]
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Move
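The even-indexed duplicate above mirrors the odd-indexed one, except that each pair of lanes receives the value of its lower (even) lane. The following is a minimal Python sketch under the same modelling assumptions (lanes as a list of floats). `mask_moveldup` is a hypothetical name.

```python
def mask_moveldup(a, k, src=None):
    """Duplicate even-indexed lanes of a, then apply a writemask (src) or zeromask."""
    n = len(a)
    tmp = [a[j & ~1] for j in range(n)]   # clear the low index bit to reach the even lane
    return [
        tmp[j] if (k >> j) & 1 else (src[j] if src is not None else 0.0)
        for j in range(n)
    ]


a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
merged = mask_moveldup(a, 0b11000000, [-1.0] * 8)
zeroed = mask_moveldup(a, 0b00001010)
```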
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[31:0] := a[31:0]
tmp[63:32] := a[31:0]
tmp[95:64] := a[95:64]
tmp[127:96] := a[95:64]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[31:0] := a[31:0]
tmp[63:32] := a[31:0]
tmp[95:64] := a[95:64]
tmp[127:96] := a[95:64]
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Move
Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] AND b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] AND b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
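The masked AND entries above compute the AND only for lanes whose mask bit is set. The following is a short Python sketch with 32-bit lanes held as non-negative integers. `mask_and_epi32_model` is a hypothetical name chosen for this illustration.

```python
MASK32 = 0xFFFFFFFF


def mask_and_epi32_model(a, b, k, src=None):
    """Per-lane AND of 32-bit values with writemask (src) or zeromask behaviour."""
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            dst.append((a[j] & b[j]) & MASK32)
        else:
            dst.append(src[j] if src is not None else 0)
    return dst


a = [0xF0F0F0F0, 0xFFFFFFFF, 0x0000FFFF, 0x12345678]
b = [0x0FF00FF0, 0x0000FFFF, 0xFFFF0000, 0xFFFFFFFF]
merged = mask_and_epi32_model(a, b, 0b1011, [1, 2, 3, 4])
zeroed = mask_and_epi32_model(a, b, 0b1000)
```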
Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] AND b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] AND b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
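In the AND-NOT entries above, operand order matters: "a" is inverted and "b" is not. The sketch below models one 32-bit lane in Python, where `~` yields a negative number and therefore has to be truncated to the lane width. `andnot32` and `mask_andnot_epi32_model` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def andnot32(a, b):
    """Compute (NOT a) AND b within 32 bits."""
    return (~a & MASK32) & b


def mask_andnot_epi32_model(a, b, k, src=None):
    """Per-lane (NOT a) AND b with writemask (src) or zeromask behaviour."""
    return [
        andnot32(a[j], b[j]) if (k >> j) & 1 else (src[j] if src is not None else 0)
        for j in range(len(a))
    ]


x, y = 0x0000FFFF, 0x00FFFF00
forward = andnot32(x, y)     # bits of y that are not set in x
swapped = andnot32(y, x)     # bits of x that are not set in y
lanes = mask_andnot_epi32_model([x, 0xFFFFFFFF], [y, 0x12345678], 0b01, [7, 8])
```

Swapping the operands gives a different result, so the first argument is always the one whose bits are cleared out of the second.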
Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] AND b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
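The pseudocode's bit-range notation (`dst[i+63:i]`, `dst[MAX:256] := 0`) can also be modeled directly by treating a register as one large integer. This is a sketch only; the helper `bits` and the function `mask_and_epi64_256` are hypothetical names, and the register is assumed to be an arbitrary-precision Python int with bit 0 as the least significant bit.

```python
def bits(value, hi, lo):
    """Return value[hi:lo], using the pseudocode's inclusive bit range."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def mask_and_epi64_256(src, k, a, b):
    """Hypothetical model of the 4-lane, 64-bit writemask AND."""
    dst = 0
    for j in range(4):
        i = j * 64
        if (k >> j) & 1:
            lane = bits(a, i + 63, i) & bits(b, i + 63, i)
        else:
            lane = bits(src, i + 63, i)
        dst |= lane << i
    # dst[MAX:256] := 0 holds by construction: only bits 0..255 are written.
    return dst

a = (0xF0F0 << 192) | (0xFFFF << 64) | 0x00FF
b = (0x0FF0 << 192) | (0x0F0F << 64) | 0x0F0F
src = (1 << 300) | (0xAAAA << 128)  # bit 300 sits above the 256-bit result
r = mask_and_epi64_256(src, 0b1011, a, b)  # lane 2 comes from src
```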
Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] AND b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] AND b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] AND b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
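Because the loop above runs over j = 0 to 3, only the four low bits of "k" are consulted in the 128-bit form. A small scalar sketch of that point follows; `mask_or_epi32_128` is a hypothetical name and lanes are modeled as a Python list with lane 0 first.

```python
def mask_or_epi32_128(src, k, a, b):
    """Hypothetical model of the 4-lane, 32-bit writemask OR."""
    return [(a[j] | b[j]) if (k >> j) & 1 else src[j] for j in range(4)]

src = [0, 0, 0, 0]
a = [0x1, 0x2, 0x4, 0x8]
b = [0x10, 0x20, 0x40, 0x80]
low = mask_or_epi32_128(src, 0b0110, a, b)
# Setting mask bits 4..7 changes nothing, since no lane corresponds to them.
high_bits_set = mask_or_epi32_128(src, 0b11110110, a, b)
```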
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "a" when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 7
i := j*32
IF k[j]
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
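The CASE table above abbreviates rows 2 through 253. One way to read it, consistent with the rows that are shown, is that "imm8" is an 8-entry truth table indexed by the three input bits. The sketch below is a scalar model under that reading; `ternary_op` and `mask_ternarylogic_epi32` are hypothetical names, and, following the description, unselected lanes keep the value of "a".

```python
def ternary_op(imm8, a_bit, b_bit, c_bit):
    """Look up one result bit: the inputs form a 3-bit index into imm8."""
    index = (a_bit << 2) | (b_bit << 1) | c_bit
    return (imm8 >> index) & 1

def mask_ternarylogic_epi32(a, k, b, c, imm8, lanes=8):
    """Hypothetical model of the 32-bit writemask form."""
    dst = []
    for j in range(lanes):
        if (k >> j) & 1:
            lane = 0
            for h in range(32):
                bit = ternary_op(imm8, (a[j] >> h) & 1,
                                 (b[j] >> h) & 1, (c[j] >> h) & 1)
                lane |= bit << h
            dst.append(lane)
        else:
            dst.append(a[j])  # unselected lanes keep the first operand
    return dst

a = [0x0000000F] * 8
b = [0x000000F0] * 8
c = [0x00000F00] * 8
or_result = mask_ternarylogic_epi32(a, 0b1, b, c, 0xFE)   # row 254: a OR b OR c
nor_result = mask_ternarylogic_epi32(a, 0b1, b, c, 0x01)  # row 1: NOT (a OR b OR c)
```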
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 7
i := j*32
IF k[j]
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 7
i := j*32
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
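The `LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)` line suggests a practical way to derive "imm8": write the desired Boolean expression over three constants. The sketch below assumes the values 0xF0, 0xCC, and 0xAA for those constants, which matches rows 1 and 254 of the table above but should be checked against your compiler's headers; all Python names here are hypothetical.

```python
TERNLOG_A = 0xF0  # assumed value of _MM_TERNLOG_A
TERNLOG_B = 0xCC  # assumed value of _MM_TERNLOG_B
TERNLOG_C = 0xAA  # assumed value of _MM_TERNLOG_C

# Derive imm8 by evaluating the desired expression on the constants.
xor3 = (TERNLOG_A ^ TERNLOG_B ^ TERNLOG_C) & 0xFF
majority = ((TERNLOG_A & TERNLOG_B) | (TERNLOG_A & TERNLOG_C)
            | (TERNLOG_B & TERNLOG_C)) & 0xFF
select = ((TERNLOG_A & TERNLOG_B) | (~TERNLOG_A & TERNLOG_C)) & 0xFF

def apply_ternlog(imm8, a, b, c, width=32):
    """Apply the imm8 truth table bit by bit to one lane."""
    out = 0
    for h in range(width):
        index = (((a >> h) & 1) << 2) | (((b >> h) & 1) << 1) | ((c >> h) & 1)
        out |= ((imm8 >> index) & 1) << h
    return out

x, y, z = 0x0F0F0F0F, 0x33333333, 0x55555555
xor_lane = apply_ternlog(xor3, x, y, z)
```

With these constants a three-way XOR, a bitwise majority vote, or a bitwise select (a ? b : c) each collapse into a single immediate.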
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "a" when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 3
i := j*32
IF k[j]
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 3
i := j*32
IF k[j]
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 3
i := j*32
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "a" when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 3
i := j*64
IF k[j]
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 3
i := j*64
IF k[j]
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 3
i := j*64
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
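The inner `FOR h` loop above processes one bit at a time, but the same truth table can be evaluated on whole lanes by OR-ing together the minterms that "imm8" selects. The sketch below illustrates that alternative evaluation order for the 64-bit form; it is not taken from the source, and `ternarylogic_lane64` and `ternarylogic_epi64` are hypothetical names.

```python
M64 = (1 << 64) - 1

def ternarylogic_lane64(imm8, a, b, c):
    """Evaluate the imm8 truth table on a 64-bit lane via selected minterms."""
    result = 0
    for index in range(8):
        if (imm8 >> index) & 1:
            ta = a if index & 4 else ~a  # bit 2 of the index corresponds to a
            tb = b if index & 2 else ~b  # bit 1 corresponds to b
            tc = c if index & 1 else ~c  # bit 0 corresponds to c
            result |= ta & tb & tc
    return result & M64  # trim Python's unbounded integers to 64 bits

def ternarylogic_epi64(a, b, c, imm8):
    """Hypothetical model of the unmasked form over a list of lanes."""
    return [ternarylogic_lane64(imm8, x, y, z) for x, y, z in zip(a, b, c)]

a = [0x00000000FFFFFFFF, 0, M64, 0x1234]
b = [0x0000FFFF0000FFFF, 0, 0, 0x00FF]
c = [0x00FF00FF00FF00FF, 1, 0, 0xF000]
or_lanes = ternarylogic_epi64(a, b, c, 0xFE)
nor_lanes = ternarylogic_epi64(a, b, c, 0x01)
```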
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "a" when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 1
i := j*64
IF k[j]
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 1
i := j*64
IF k[j]
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 1
i := j*64
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
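One use the writemask XOR semantics allow is a per-lane conditional bit flip: passing the same vector as both "src" and "a" leaves unselected lanes unchanged and toggles bits in selected ones. The sketch below models that usage in scalar Python; `mask_xor_epi32` is a hypothetical name and the list index is the lane number.

```python
def mask_xor_epi32(src, k, a, b, lanes=8):
    """Hypothetical model: a XOR b where k[j] is set, else the lane from src."""
    return [(a[j] ^ b[j]) if (k >> j) & 1 else src[j] for j in range(lanes)]

values = [0x0000FFFF] * 8
flip = [0xFFFFFFFF] * 8
# Same vector as src and a: lanes 0 and 7 are inverted, the rest pass through.
toggled = mask_xor_epi32(values, 0b10000001, values, flip)
```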
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Logical
Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Set
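The masked broadcast above writes the same scalar into every selected lane. A scalar sketch follows; `mask_set1_epi32` is a hypothetical name, and the `& 0xFFFFFFFF` step models `a[31:0]`, that is, the two's-complement bit pattern of a 32-bit integer argument.

```python
def mask_set1_epi32(src, k, a, lanes=8):
    """Hypothetical model: broadcast a[31:0] into lanes where k[j] is set."""
    value = a & 0xFFFFFFFF  # bit pattern of the 32-bit scalar
    return [value if (k >> j) & 1 else src[j] for j in range(lanes)]

src = list(range(8))
# Broadcasting -1 stores all-ones in lanes 4 and 5; other lanes keep src.
out = mask_set1_epi32(src, 0b00110000, -1)
```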
Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Set
Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Set
Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Set
Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Set
Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Set
Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Set
Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Set
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
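`LEFT_ROTATE_DWORDS` translates almost line for line into Python, with one addition: the result must be masked back to 32 bits because Python integers are unbounded. The names below are hypothetical. Note that when `count` is 0 the pseudocode shifts right by 32; this is harmless here, but in C shifting a 32-bit value by 32 is undefined behavior, so a C port would need to handle that case separately.

```python
M32 = 0xFFFFFFFF

def left_rotate_dwords(src, count_src):
    """Rotate a 32-bit value left; the count is reduced modulo 32."""
    count = count_src % 32
    return ((src << count) | (src >> (32 - count))) & M32

def mask_rol_epi32(src, k, a, imm8, lanes=8):
    """Hypothetical model of the 8-lane writemask rotate by immediate."""
    return [
        left_rotate_dwords(a[j], imm8 & 0xFF) if (k >> j) & 1 else src[j]
        for j in range(lanes)
    ]

a = [0x80000001] * 8
# Lanes 0 and 1 are rotated left by 4; the remaining lanes come from src.
out = mask_rol_epi32([0] * 8, 0b00000011, a, 4)
```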
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
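Since "imm8" can hold values up to 255 while `LEFT_ROTATE_QWORDS` reduces the count modulo 64, several immediates produce the same rotation. The scalar sketch below illustrates this for the unmasked 4-lane form; `left_rotate_qwords` and `rol_epi64` are hypothetical names.

```python
M64 = (1 << 64) - 1

def left_rotate_qwords(src, count_src):
    """Rotate a 64-bit value left; the count is reduced modulo 64."""
    count = count_src % 64
    return ((src << count) | (src >> (64 - count))) & M64

def rol_epi64(a, imm8, lanes=4):
    """Hypothetical model of the unmasked rotate by immediate."""
    return [left_rotate_qwords(a[j], imm8 & 0xFF) for j in range(lanes)]

a = [0x8000000000000001, 0x1, 0x0, M64]
by_1 = rol_epi64(a, 1)
by_65 = rol_epi64(a, 65)  # 65 % 64 == 1, so this matches a rotate by 1
```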
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
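In the variable-count form each lane of "b" supplies its own rotate amount, again reduced modulo 32. The following scalar sketch shows several counts side by side, including ones at and above 32; `mask_rolv_epi32` is a hypothetical name and lanes are modeled as a Python list.

```python
M32 = 0xFFFFFFFF

def left_rotate_dwords(src, count_src):
    """Rotate a 32-bit value left; the count is reduced modulo 32."""
    count = count_src % 32
    return ((src << count) | (src >> (32 - count))) & M32

def mask_rolv_epi32(src, k, a, b, lanes=8):
    """Hypothetical model: lane j of a is rotated by lane j of b when k[j] is set."""
    return [
        left_rotate_dwords(a[j], b[j]) if (k >> j) & 1 else src[j]
        for j in range(lanes)
    ]

a = [0x00000001] * 8
b = [0, 1, 2, 3, 31, 32, 33, 64]
# Mask bit 7 is clear, so the last lane is taken from src.
out = mask_rolv_epi32([0xAA] * 8, 0b01111111, a, b)
```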
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
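A writemask differs from a zeromask only in what an inactive lane receives. The sketch below models the 64-bit, four-lane writemask rotate in Python; `rotl64` and `rotate_left_writemask` are hypothetical names, and lists of unsigned integers stand in for the vector operands.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def rotl64(value, count):
    """Rotate a 64-bit value left; the count is reduced modulo 64."""
    count %= 64
    return ((value << count) | (value >> (64 - count))) & MASK64

def rotate_left_writemask(src, k, a, b):
    """Hypothetical model: lanes whose mask bit is clear keep the src element."""
    out = []
    for j, (s, x, n) in enumerate(zip(src, a, b)):
        out.append(rotl64(x, n) if (k >> j) & 1 else s)
    return out

src = [0x1111, 0x2222, 0x3333, 0x4444]
a = [0x8000000000000000, 0x1, 0x00000000FFFFFFFF, 0x0123456789ABCDEF]
b = [1, 63, 32, 68]
out = rotate_left_writemask(src, 0b1011, a, b)
print([hex(v) for v in out])
```

Lane 2 has its mask bit clear and therefore passes `src` through untouched; lane 3 rotates by 68 mod 64 = 4.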
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
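Because the rotate count is reduced modulo the element width, an immediate of 36 behaves exactly like an immediate of 4 for 32-bit elements. The following Python sketch illustrates this for the unmasked rotate-right form; `rotr32` and `rotate_right_imm` are hypothetical helper names, and the eight-element list is an assumed stand-in for a 256-bit vector.

```python
MASK32 = 0xFFFFFFFF

def rotr32(value, count):
    """Rotate a 32-bit value right; the count is reduced modulo 32."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32

def rotate_right_imm(a, imm8):
    """Hypothetical model: every lane is rotated by the same 8-bit immediate."""
    return [rotr32(x, imm8 & 0xFF) for x in a]

lanes = [0x00000001, 0x12345678, 0xF000000F, 0x00000000,
         0xFFFFFFFF, 0x80000000, 0x0000FF00, 0xDEADBEEF]
by_4 = rotate_right_imm(lanes, 4)
by_36 = rotate_right_imm(lanes, 36)
print([hex(v) for v in by_4])
print(by_4 == by_36)
```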
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
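Unlike the rotates, a logical shift does not wrap its count: any count above 31 clears the element. The Python sketch below models the uniform-count, writemasked left shift under a few assumptions: `shift_left_uniform_writemask` is a hypothetical name, the `count` argument stands for the 64-bit value `count[63:0]`, and vectors are lists of eight unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def shift_left_uniform_writemask(src, k, a, count):
    """Hypothetical model of a writemasked left shift with one shared count."""
    out = []
    for j, (s, x) in enumerate(zip(src, a)):
        if not (k >> j) & 1:
            out.append(s)                     # inactive lane: copy from src
        elif count > 31:
            out.append(0)                     # out-of-range count clears the lane
        else:
            out.append((x << count) & MASK32)
    return out

src = [0xAA] * 8
a = [1, 2, 3, 4, 5, 6, 7, 0xFFFFFFFF]
low_half = shift_left_uniform_writemask(src, 0x0F, a, 4)
all_lanes = shift_left_uniform_writemask(src, 0xFF, a, 4)
cleared = shift_left_uniform_writemask(src, 0xFF, a, 32)
print(low_half, all_lanes, cleared)
```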
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
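In the per-element form each lane reads its own count, so a single out-of-range count clears only that lane. A minimal Python sketch of the zeromasked variant follows; `shift_left_variable_zeromask` is a hypothetical name, and the inputs are assumed to be lists of eight unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def shift_left_variable_zeromask(k, a, count):
    """Hypothetical model: lane j is shifted by count[j]; counts >= 32 give 0."""
    out = []
    for j, (x, n) in enumerate(zip(a, count)):
        if (k >> j) & 1 and n < 32:
            out.append((x << n) & MASK32)
        else:
            out.append(0)   # masked-off lane or out-of-range count
    return out

a = [3] * 8
count = [0, 1, 31, 32, 33, 4, 0xFFFFFFFF, 8]
shifted = shift_left_variable_zeromask(0b01111111, a, count)
print([hex(v) for v in shifted])
```

Lanes 3, 4 and 6 are cleared by their counts, whereas lane 7 is cleared by the mask even though its count of 8 is in range.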
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
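An arithmetic right shift replicates the sign bit, and a count above 31 leaves every bit of the element equal to that sign bit. The sketch below models the writemasked, uniform-count form in Python. The helper names `sra32` and `shift_right_arithmetic_writemask` are hypothetical, and `count` is assumed to hold the value of `count[63:0]`.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def sra32(x, count):
    """Arithmetic right shift of one unsigned-encoded 32-bit element."""
    if count > 31:
        return MASK32 if x & SIGN32 else 0
    signed = x - (1 << 32) if x & SIGN32 else x   # reinterpret as two's complement
    return (signed >> count) & MASK32             # Python's >> keeps the sign

def shift_right_arithmetic_writemask(src, k, a, count):
    """Hypothetical model: inactive lanes are copied from src."""
    return [sra32(x, count) if (k >> j) & 1 else s
            for j, (s, x) in enumerate(zip(src, a))]

src = [0x55555555] * 8
a = [0x80000000, 0x7FFFFFFF, 0xFFFFFF00, 0x00000100] * 2
by_8 = shift_right_arithmetic_writemask(src, 0b00001111, a, 8)
saturated = shift_right_arithmetic_writemask(src, 0xFF, a, 40)
print([hex(v) for v in by_8])
print([hex(v) for v in saturated])
```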
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
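The 64-bit immediate form follows the same rule with a threshold of 63. As a rough illustration, assuming four-element lists of unsigned 64-bit integers and a hypothetical helper named `shift_right_arithmetic_imm`, the unmasked operation can be sketched in Python as:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
SIGN64 = 0x8000000000000000

def shift_right_arithmetic_imm(a, imm8):
    """Hypothetical model: shift every 64-bit lane right by imm8, filling with the sign bit."""
    imm8 &= 0xFF
    out = []
    for x in a:
        if imm8 > 63:
            out.append(MASK64 if x & SIGN64 else 0)
        else:
            signed = x - (1 << 64) if x & SIGN64 else x
            out.append((signed >> imm8) & MASK64)
    return out

a = [0x8000000000000000, 0x7FFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFF0, 0x10]
by_4 = shift_right_arithmetic_imm(a, 4)
by_64 = shift_right_arithmetic_imm(a, 64)
print([hex(v) for v in by_4])
print([hex(v) for v in by_64])
```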
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
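The per-element variant above differs from the uniform-count form in that each lane reads its own shift amount from "count". A minimal scalar sketch follows; the name `srav_epi32_mask_model` and the list and bitmask representations are assumptions for illustration only.

```python
MASK32 = 0xFFFFFFFF  # all-ones pattern for one 32-bit lane


def srav_epi32_mask_model(src, k, a, count):
    """Scalar model: per-lane arithmetic right shift of 32-bit lanes, writemask.

    src, a, count -- lists of equal length holding raw 32-bit lane patterns
    k             -- integer bitmask, bit j controls lane j (hypothetical)
    """
    out = []
    for j, lane in enumerate(a):
        if (k >> j) & 1:
            x = lane & MASK32
            negative = x >> 31
            c = count[j] & MASK32  # count is compared as an unsigned value
            if c < 32:
                signed = x - (1 << 32) if negative else x
                out.append((signed >> c) & MASK32)
            else:
                # Out-of-range counts fill the lane with the sign bit.
                out.append(MASK32 if negative else 0)
        else:
            # Writemask: lanes with a clear mask bit keep the value from src.
            out.append(src[j] & MASK32)
    return out


shifted = srav_epi32_mask_model(
    [9] * 8,
    0b00001111,
    [0x80000000, 0x80000000, 0x40000000, 0xFFFFFFF0, 1, 2, 3, 4],
    [1, 32, 30, 4, 0, 0, 0, 0],
)
```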
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
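For contrast with the arithmetic shifts, the logical right shift above fills vacated bits with zeros and yields 0 for any count greater than 31. A minimal scalar sketch, assuming the low 64 bits of "count" are passed as a plain Python integer; `srl_epi32_mask_model` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def srl_epi32_mask_model(src, k, a, count):
    """Scalar model: logical right shift of 32-bit lanes by one shared count.

    count -- integer standing in for count[63:0]; the same value applies to
             every lane (assumed representation)
    k     -- integer bitmask, bit j controls lane j
    """
    out = []
    for j, lane in enumerate(a):
        if (k >> j) & 1:
            if count > 31:
                out.append(0)
            else:
                # Unsigned shift: zeros enter from the top.
                out.append((lane & MASK32) >> count)
        else:
            out.append(src[j] & MASK32)
    return out


logical = srl_epi32_mask_model([5] * 4, 0b0011, [0x80000000] * 4, 31)
```

With the same input, an arithmetic shift by 31 would give 0xFFFFFFFF in the active lanes instead of 1.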
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Shift
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Elementary Math Functions
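The writemasked square root can be sketched in scalar code as below. This is an illustrative model only: `mask_sqrt_pd_model` is a hypothetical name, Python floats stand in for the double-precision lanes, and negative inputs are mapped to NaN by hand because `math.sqrt` raises an exception rather than returning NaN. Rounding-mode and exception-flag behavior is not modeled.

```python
import math


def mask_sqrt_pd_model(src, k, a):
    """Scalar model: per-lane square root with a writemask.

    src, a -- lists of Python floats (double precision)
    k      -- integer bitmask, bit j controls lane j (hypothetical)
    """
    out = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            # x >= 0 is False for NaN and for negative values, so both
            # produce NaN; -0.0 passes the test and yields -0.0.
            out.append(math.sqrt(x) if x >= 0 else float("nan"))
        else:
            out.append(src[j])
    return out


roots = mask_sqrt_pd_model([1.0, 2.0, 3.0, 4.0], 0b0101, [4.0, 9.0, 16.0, 25.0])
```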
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
AVX512VL
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
AVX512VL
Elementary Math Functions
Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 3
i := j*128
a[i+127:i] := ShiftRows(a[i+127:i])
a[i+127:i] := SubBytes(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
VAES
Cryptography
Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 3
i := j*128
a[i+127:i] := ShiftRows(a[i+127:i])
a[i+127:i] := SubBytes(a[i+127:i])
a[i+127:i] := MixColumns(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
VAES
Cryptography
Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 3
i := j*128
a[i+127:i] := InvShiftRows(a[i+127:i])
a[i+127:i] := InvSubBytes(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
VAES
Cryptography
Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 3
i := j*128
a[i+127:i] := InvShiftRows(a[i+127:i])
a[i+127:i] := InvSubBytes(a[i+127:i])
a[i+127:i] := InvMixColumns(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
VAES
Cryptography
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
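The "low 32 bits of the intermediate" step above amounts to a multiplication modulo 2^32. A minimal scalar sketch, assuming lanes are held as raw 32-bit patterns in a list; `maskz_mullo_epi32_model` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def maskz_mullo_epi32_model(k, a, b):
    """Scalar model: keep the low 32 bits of each 32x32-bit product, zeromask.

    a, b -- lists of raw 32-bit lane patterns
    k    -- integer bitmask, bit j controls lane j (hypothetical)
    """
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            tmp = (x & MASK32) * (y & MASK32)  # intermediate wider product
            out.append(tmp & MASK32)           # truncate to the low 32 bits
        else:
            out.append(0)
    return out


products = maskz_mullo_epi32_model(
    0b0111, [0xFFFFFFFF, 0x10000, 3, 7], [2, 0x10000, 5, 7]
)
```

The low 32 bits of the product are the same whether the inputs are read as signed or unsigned, which is why the model can work on unsigned patterns.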
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := a[63:0] + b[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] + b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] + b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
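In the scalar forms only lane 0 is computed; the mask affects that lane alone, and the upper lane always comes from "a". A minimal sketch with Python floats standing in for the two double-precision lanes; `mask_add_sd_model` is a hypothetical name, and rounding control is not modeled.

```python
def mask_add_sd_model(src, k, a, b):
    """Scalar model: add the lower lanes under mask bit 0, pass through a's upper lane.

    src, a, b -- two-element lists [lower, upper] of Python floats
    k         -- integer bitmask; only bit 0 is consulted
    """
    lower = a[0] + b[0] if k & 1 else src[0]
    return [lower, a[1]]


added = mask_add_sd_model([9.0, 9.0], 0b1, [1.5, 100.0], [2.25, 200.0])
kept = mask_add_sd_model([9.0, 9.0], 0b0, [1.5, 100.0], [2.25, 200.0])
```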
Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] + b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] + b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := a[31:0] + b[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] + b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] + b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] + b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] + b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 7
i := 64*j
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := 64*j
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
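A scalar model of the writemasked division needs one extra step in Python, because float division by zero raises `ZeroDivisionError` instead of returning an IEEE 754 infinity or NaN. The sketch below handles that case by hand; `ieee_div` and `mask_div_pd_model` are hypothetical helper names, and exception flags and rounding modes are not modeled.

```python
import math


def ieee_div(x, y):
    """Divide two Python floats, returning inf or NaN where Python would raise."""
    try:
        return x / y
    except ZeroDivisionError:
        if x == 0.0 or math.isnan(x):
            return float("nan")  # 0/0 and NaN/0 give NaN
        # Nonzero / zero: infinity whose sign is the product of the operand signs.
        sign = math.copysign(1.0, x) * math.copysign(1.0, y)
        return math.copysign(float("inf"), sign)


def mask_div_pd_model(src, k, a, b):
    """Scalar model: per-lane division with a writemask.

    src, a, b -- lists of Python floats (double precision)
    k         -- integer bitmask, bit j controls lane j (hypothetical)
    """
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        out.append(ieee_div(x, y) if (k >> j) & 1 else src[j])
    return out


quotients = mask_div_pd_model(
    [-1.0] * 8,
    0b00000111,
    [1.0, 6.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
)
```

Lanes 3 to 7 have zero divisors but are never evaluated, since their mask bits are clear and the value from "src" is copied instead.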
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := 64*j
IF k[j]
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := a[63:0] / b[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] / b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] / b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
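The scalar writemask form above touches only the lower element. The following is a minimal sketch under stated assumptions: the name `mask_div_sd` is hypothetical, each 128-bit operand is modeled as a two-element list `[lower, upper]` of Python floats, and the divisor is assumed to be nonzero.

```python
def mask_div_sd(src, k, a, b):
    """Hypothetical model of the writemask scalar double-precision division."""
    # Only mask bit 0 matters; when it is clear the lower element comes from src.
    lower = a[0] / b[0] if k & 1 else src[0]
    # The upper element is always copied from a.
    return [lower, a[1]]


src = [9.0, 9.5]
a = [6.0, 7.0]
b = [3.0, 100.0]

taken = mask_div_sd(src, 1, a, b)    # mask bit 0 set
skipped = mask_div_sd(src, 0, a, b)  # mask bit 0 clear
print(taken, skipped)  # [2.0, 7.0] [9.0, 7.0]
```

Note that the upper element of `src` and the upper element of `b` never reach the result in either case.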
Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] / b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] / b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := a[31:0] / b[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] / b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] / b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] / b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] / b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
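A fused multiply-add rounds once, after the addition, rather than once after the multiply and again after the add. The sketch below models the zero-masked packed form with exact rational arithmetic to imitate that single rounding. The names `fused_madd` and `maskz_fmadd_pd` are assumptions for this example, and the model handles finite inputs only and does not preserve the sign of a zero result.

```python
from fractions import Fraction


def fused_madd(x, y, z):
    """(x*y)+z evaluated exactly, then rounded once to double (finite inputs only)."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def maskz_fmadd_pd(k, a, b, c):
    """Hypothetical model: 8 double-precision lanes, zeromask k."""
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            dst.append(fused_madd(a[j], b[j], c[j]))
        else:
            dst.append(0.0)  # lane zeroed when the mask bit is clear
    return dst


x = 1.0 + 2.0 ** -30
y = 1.0 - 2.0 ** -30
a, b, c = [x] * 8, [y] * 8, [-1.0] * 8

fused = maskz_fmadd_pd(0b00000011, a, b, c)
two_step = x * y + (-1.0)  # product is rounded to 1.0 before the add

print(fused[0] == -(2.0 ** -60))  # True
print(two_step)                   # 0.0
```

The exact product is 1 - 2^-60, which is not representable as a double, so the unfused expression loses it entirely while the fused model keeps it.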
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
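The two writemask forms of the scalar multiply-add above differ only in which operand supplies the pass-through values: one falls back to "a", the other to "c". A minimal sketch of that difference follows. The function names are hypothetical, operands are two-element lists `[lower, upper]`, and the arithmetic here rounds after the multiply and again after the add, unlike a fused operation.

```python
def mask_fmadd_sd(a, k, b, c):
    """Hypothetical model: lower element and upper element fall back to a."""
    lower = a[0] * b[0] + c[0] if k & 1 else a[0]
    return [lower, a[1]]


def mask3_fmadd_sd(a, b, c, k):
    """Hypothetical model: lower element and upper element fall back to c."""
    lower = a[0] * b[0] + c[0] if k & 1 else c[0]
    return [lower, c[1]]


a = [2.0, 10.0]
b = [3.0, 20.0]
c = [4.0, 30.0]

print(mask_fmadd_sd(a, 1, b, c), mask_fmadd_sd(a, 0, b, c))    # [10.0, 10.0] [2.0, 10.0]
print(mask3_fmadd_sd(a, b, c, 1), mask3_fmadd_sd(a, b, c, 0))  # [10.0, 30.0] [4.0, 30.0]
```

With the mask bit set both forms compute the same lower element; only the upper element and the masked-off fallback differ.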
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
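The alternating pattern above is easy to get backwards, so here is a minimal scalar sketch of the pseudocode. The name `fmaddsub_pd` and the list-based lanes are assumptions for this example, and the arithmetic is not fused (it rounds after the multiply and after the add or subtract).

```python
def fmaddsub_pd(a, b, c):
    """Hypothetical model: even lanes compute a*b - c, odd lanes compute a*b + c."""
    dst = []
    for j in range(8):
        prod = a[j] * b[j]
        if (j & 1) == 0:
            dst.append(prod - c[j])  # even lane: subtract
        else:
            dst.append(prod + c[j])  # odd lane: add
    return dst


a = [2.0] * 8
b = [3.0] * 8
c = [1.0] * 8
result = fmaddsub_pd(a, b, c)
print(result)  # [5.0, 7.0, 5.0, 7.0, 5.0, 7.0, 5.0, 7.0]
```

Lane 0, the lowest element, is a subtract lane even though the operation name begins with "add".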
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
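When a writemask is combined with the alternating add/subtract pattern, the mask test comes first and the lane parity only matters for lanes whose mask bit is set. A minimal sketch of the single-precision form that copies masked-off lanes from "c" is shown below. The name `mask3_fmaddsub_ps` is hypothetical, and the example uses small values whose results are exactly representable, so no single-precision rounding is modeled.

```python
def mask3_fmaddsub_ps(a, b, c, k):
    """Hypothetical model: 16 lanes; masked-off lanes are copied from c."""
    dst = []
    for j in range(16):
        if (k >> j) & 1:
            prod = a[j] * b[j]
            # Even lanes subtract c, odd lanes add c.
            dst.append(prod - c[j] if (j & 1) == 0 else prod + c[j])
        else:
            dst.append(c[j])
    return dst


a = [2.0] * 16
b = [4.0] * 16
c = [1.0] * 16
result = mask3_fmaddsub_ps(a, b, c, 0x000F)
print(result[:6])  # [7.0, 9.0, 7.0, 9.0, 1.0, 1.0]
```

Lanes 4 through 15 are masked off here and simply carry the corresponding element of `c`.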
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
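The zero-masked multiply-subtract above follows the same shape as the multiply-add forms with the sign of "c" flipped. A minimal scalar sketch is given here. The name `maskz_fmsub_pd` is an assumption for this example, and the arithmetic is not fused, so results can differ from hardware in the last bit for inputs whose product is not exactly representable.

```python
def maskz_fmsub_pd(k, a, b, c):
    """Hypothetical model: 8 double-precision lanes, zeromask k."""
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            dst.append(a[j] * b[j] - c[j])
        else:
            dst.append(0.0)  # lane zeroed when the mask bit is clear
    return dst


a = [1.5] * 8
b = [2.0] * 8
c = [0.5] * 8
result = maskz_fmsub_pd(0b10000001, a, b, c)
print(result)  # [2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5]
```

Bit 0 of the mask corresponds to the lowest element and bit 7 to the highest, which is why only the first and last lanes are computed.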
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
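Comparing this pseudocode with the add-then-subtract variant shows that the two differ only in which lane parity adds and which subtracts. The sketch below models both and checks that subtract/add on "c" matches add/subtract on negated "c". Both function names are hypothetical, lanes are Python lists, and the arithmetic is not fused.

```python
def fmaddsub_pd(a, b, c):
    """Hypothetical model: even lanes a*b - c, odd lanes a*b + c."""
    return [a[j] * b[j] - c[j] if (j & 1) == 0 else a[j] * b[j] + c[j]
            for j in range(8)]


def fmsubadd_pd(a, b, c):
    """Hypothetical model: even lanes a*b + c, odd lanes a*b - c."""
    return [a[j] * b[j] + c[j] if (j & 1) == 0 else a[j] * b[j] - c[j]
            for j in range(8)]


a = [2.0] * 8
b = [3.0] * 8
c = [1.0] * 8

result = fmsubadd_pd(a, b, c)
negated = fmaddsub_pd(a, b, [-v for v in c])
print(result)             # [7.0, 5.0, 7.0, 5.0, 7.0, 5.0, 7.0, 5.0]
print(result == negated)  # True
```

The equivalence holds lane by lane because subtracting a value and adding its negation give the same floating-point result.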
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
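A small sketch may help show why the "intermediate result" matters in the negated multiply-add above. The name `maskz_fnmadd_pd` is hypothetical, vectors are modeled as lists of 8 Python floats, and the code assumes the product is not rounded before "c" is added (emulated with exact rationals, finite inputs only; signed zeros are not modeled).

```python
from fractions import Fraction

def maskz_fnmadd_pd(k, a, b, c):
    """Hypothetical model: -(a*b) + c per lane with one rounding, zeromask k."""
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            exact = -(Fraction(a[j]) * Fraction(b[j])) + Fraction(c[j])
            dst.append(float(exact))   # single rounding to double
        else:
            dst.append(0.0)            # mask bit clear: lane is zeroed
    return dst

# Inputs whose exact product is 1 - 2**-60, which a separate multiply rounds to 1.0.
x = 1.0 + 2.0 ** -30
y = 1.0 - 2.0 ** -30
fused_lane = maskz_fnmadd_pd(0b1, [x] * 8, [y] * 8, [1.0] * 8)[0]
unfused_lane = -(x * y) + 1.0
```

Under these assumptions `fused_lane` keeps the small residual `2**-60`, while the two-step `unfused_lane` collapses to zero.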
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
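The two writemask flavors of the scalar negated multiply-add differ only in which operand supplies the pass-through data. The sketch below contrasts them; the function names are hypothetical, a 128-bit vector is modeled as a two-element list `[lower, upper]`, and the single rounding is an assumption emulated with exact rationals (finite inputs only).

```python
from fractions import Fraction

def fnmadd_lower(a0, b0, c0):
    """-(a0*b0) + c0 with one rounding to double (finite inputs only)."""
    return float(-(Fraction(a0) * Fraction(b0)) + Fraction(c0))

def fnmadd_sd_merge_c(a, b, c, k):
    """Hypothetical model of the form that falls back to "c" and copies the upper element from "c"."""
    lower = fnmadd_lower(a[0], b[0], c[0]) if k & 1 else c[0]
    return [lower, c[1]]

def fnmadd_sd_merge_a(a, k, b, c):
    """Hypothetical model of the form that falls back to "a" and copies the upper element from "a"."""
    lower = fnmadd_lower(a[0], b[0], c[0]) if k & 1 else a[0]
    return [lower, a[1]]
```

Only mask bit 0 is consulted; the upper element never depends on the mask in either form.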
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
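As a rough illustration of the zeromasked form above, the following sketch evaluates -(a*b) - c per lane. The name `maskz_fnmsub_pd` is hypothetical, vectors are lists of 8 Python floats, and the single rounding is an assumption emulated with exact rationals (finite inputs only; signed zeros are not modeled).

```python
from fractions import Fraction

def maskz_fnmsub_pd(k, a, b, c):
    """Hypothetical model: -(a*b) - c per lane with one rounding, zeromask k."""
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            exact = -(Fraction(a[j]) * Fraction(b[j])) - Fraction(c[j])
            dst.append(float(exact))
        else:
            dst.append(0.0)   # mask bit clear: lane is zeroed
    return dst
```

For finite, nonzero results this equals the negation of the plain multiply-add, since -(a*b) - c = -((a*b) + c).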
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst".
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := c[63:0]
FI
dst[127:64] := c[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", subtract the lower element in "c" from the negated intermediate result, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := c[31:0]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
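The zeromasked single-precision multiply above can be modeled in plain Python by rounding through a 32-bit float. This is a sketch under stated assumptions: the names `f32` and `maskz_mul_ps` are hypothetical, vectors are lists of 16 floats, rounding is round-to-nearest-even, and results that overflow the single-precision range raise an error in `struct.pack` rather than producing infinity. The product of two single-precision values is exact in double precision, so rounding it once to 32 bits gives the correctly rounded result.

```python
import struct

def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def maskz_mul_ps(k, a, b):
    """Hypothetical model: 16 single-precision lanes, zeromask k."""
    dst = []
    for j in range(16):
        if (k >> j) & 1:
            # Inputs are first narrowed to float32, then the exact product is rounded once.
            dst.append(f32(f32(a[j]) * f32(b[j])))
        else:
            dst.append(0.0)   # mask bit clear: lane is zeroed
    return dst
```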
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] * b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] * b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
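For the scalar writemask multiply above, three operands play distinct roles: "src" supplies the fallback for the lower element, "a" and "b" supply the product, and "a" alone supplies the upper element. A minimal sketch, assuming a 128-bit vector is modeled as `[lower, upper]` and using the hypothetical name `mask_mul_sd`:

```python
def mask_mul_sd(src, k, a, b):
    """Hypothetical model of the scalar double-precision multiply with writemask k."""
    # Only mask bit 0 is consulted.
    lower = a[0] * b[0] if k & 1 else src[0]
    # The upper element always comes from a, regardless of the mask.
    return [lower, a[1]]
```

Note that the upper element of "src" never reaches "dst" in this model.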
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] * b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] * b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := a[63:0] * b[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] * b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] * b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] * b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] * b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := a[31:0] * b[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
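The writemask integer add above can be sketched with Python integers. The name `mask_add_epi64` is hypothetical, lanes are modeled as unsigned 64-bit bit patterns in a list of 8 ints, and the truncation to 64 bits is an assumption drawn from the `dst[i+63:i]` slice width in the pseudocode.

```python
MASK64 = (1 << 64) - 1

def mask_add_epi64(src, k, a, b):
    """Hypothetical model: 8 lanes of 64-bit integer add, writemask k."""
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            dst.append((a[j] + b[j]) & MASK64)   # sum wraps modulo 2**64
        else:
            dst.append(src[j])                   # mask bit clear: keep src
    return dst
```

Because the sum is reduced modulo 2**64, the same bit pattern results whether the lanes are read as signed or unsigned.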
Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
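The signed widening multiply above reads only the low 32 bits of each 64-bit lane and sign-extends them before multiplying. A small sketch of that behavior, assuming lanes are held as unsigned 64-bit bit patterns in a list of 8 ints; the names `sign_extend32` and `mul_low32_signed` are hypothetical:

```python
MASK64 = (1 << 64) - 1

def sign_extend32(x):
    """Interpret the low 32 bits of x as a signed two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def mul_low32_signed(a, b):
    """Hypothetical model: per-lane signed 32x32 -> 64-bit multiply."""
    # The result is returned as a 64-bit two's-complement bit pattern.
    return [(sign_extend32(x) * sign_extend32(y)) & MASK64 for x, y in zip(a, b)]
```

The upper 32 bits of each input lane are ignored, so a lane holding `0xDEADBEEFFFFFFFFF` behaves as -1.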
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
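For the unsigned form above, the low 32 bits of each lane are zero-extended instead. The sketch below uses the hypothetical name `mul_low32_unsigned` and models lanes as unsigned 64-bit ints in a list of 8. No truncation is needed, because the largest possible product, (2**32 - 1)**2, still fits in 64 bits.

```python
def mul_low32_unsigned(a, b):
    """Hypothetical model: per-lane unsigned 32x32 -> 64-bit multiply."""
    # Only the low 32 bits of each 64-bit lane take part in the product.
    return [(x & 0xFFFFFFFF) * (y & 0xFFFFFFFF) for x, y in zip(a, b)]
```

With both low halves equal to `0xFFFFFFFF`, this model yields `0xFFFFFFFE00000001`, whereas a signed interpretation of the same bits would give (-1) * (-1) = 1.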
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
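The unmasked 64-bit subtraction above can be illustrated the same way. The name `sub_epi64` is hypothetical, lanes are unsigned 64-bit bit patterns in a list of 8 ints, and the wraparound is an assumption drawn from the 64-bit width of each `dst` slice.

```python
MASK64 = (1 << 64) - 1

def sub_epi64(a, b):
    """Hypothetical model: per-lane a - b, reduced modulo 2**64."""
    return [(x - y) & MASK64 for x, y in zip(a, b)]
```

A lane computing 0 - 1 therefore holds the all-ones pattern, which reads as -1 when interpreted as a signed 64-bit value.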
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] - b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] - b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := a[63:0] - b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := a[63:0] - b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
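In the scalar double-precision forms only mask bit 0 is consulted, and the upper element always comes from "a" rather than from "src". A small Python sketch of that behaviour, assuming a 128-bit vector is modelled as a two-element list of floats and that the default round-to-nearest mode applies (the [round_note] variants are not modelled); the function names are illustrative only.

```python
def mask_sub_sd(src, k, a, b):
    """Lower element: a[0] - b[0] if mask bit 0 is set, else src[0].
    Upper element: always copied from a."""
    lower = a[0] - b[0] if k & 1 else src[0]
    return [lower, a[1]]


def maskz_sub_sd(k, a, b):
    """Zeromask form: the lower element becomes 0.0 when bit 0 is clear."""
    return mask_sub_sd([0.0, 0.0], k, a, b)
```

Note that higher mask bits have no effect: a mask of 0b10 behaves exactly like a mask of 0.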
AVX512F
Arithmetic
Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := a[63:0] - b[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] - b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] - b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := a[31:0] - b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := a[31:0] - b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := a[31:0] - b[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Arithmetic
Store 512-bits (composed of 8 packed 64-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store 512-bits (composed of 16 packed 32-bit integers) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store 16-bit mask from "a" into memory.
MEM[mem_addr+15:mem_addr] := a[15:0]
AVX512F
Store
Swizzle
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 64
m := base_addr
FOR j := 0 to 7
i := j*64
IF k[j]
MEM[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
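The contiguous (compress) store can be sketched in Python as follows. This is a simplified model under stated assumptions: memory is a list of element-sized slots rather than bytes, "base" is a slot index, and the returned count is a convenience of the sketch (the pseudocode above returns nothing).

```python
def mask_compressstoreu(memory, base, k, a):
    """Write the active lanes of a to memory[base:], packed with no gaps.

    Returns the number of elements written (sketch-only convenience).
    """
    m = base
    for j, value in enumerate(a):
        if (k >> j) & 1:
            memory[m] = value
            m += 1  # the write cursor advances only for active lanes
    return m - base
```

For example, with mask 0b10100100 only lanes 2, 5 and 7 are active, so three consecutive slots starting at "base" are written and everything after them is left untouched.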
AVX512F
Store
Swizzle
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 32
m := base_addr
FOR j := 0 to 15
i := j*32
IF k[j]
MEM[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
AVX512F
Store
Store packed 32-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
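In a masked store, each lane keeps its own position in memory and inactive lanes leave the destination untouched. A minimal Python sketch, assuming memory is modelled as a list of element-sized slots and "base" is a slot index; the function name is hypothetical.

```python
def mask_storeu(memory, base, k, a):
    """Store lane j of a at memory[base + j] only when mask bit j is set."""
    for j, value in enumerate(a):
        if (k >> j) & 1:
            memory[base + j] = value
```

Slots belonging to inactive lanes keep whatever value they held before the call.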
AVX512F
Store
Store 512-bits of integer data from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store packed 64-bit integers from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
Store 512-bits of integer data from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
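The 64-byte alignment requirement can be checked, or satisfied by rounding an address up, with simple bit arithmetic. The helpers below are an illustrative Python sketch operating on plain integer addresses; their names are not part of any API.

```python
def is_aligned(addr, boundary=64):
    """True when addr is a multiple of boundary (boundary is a power of two)."""
    return (addr & (boundary - 1)) == 0


def align_up(addr, boundary=64):
    """Smallest multiple of boundary that is greater than or equal to addr."""
    return (addr + boundary - 1) & ~(boundary - 1)
```

The same helpers cover the 16-byte case by passing boundary=16.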
AVX512F
Store
Store 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store the lower double-precision (64-bit) floating-point element from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
IF k[0]
MEM[mem_addr+63:mem_addr] := a[63:0]
FI
AVX512F
Store
Store the lower single-precision (32-bit) floating-point element from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
IF k[0]
MEM[mem_addr+31:mem_addr] := a[31:0]
FI
AVX512F
Store
Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
Store 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
Store
Store 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Swizzle
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 32
m := base_addr
FOR j := 0 to 15
i := j*32
IF k[j]
MEM[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
AVX512F
Store
Swizzle
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 64
m := base_addr
FOR j := 0 to 7
i := j*64
IF k[j]
MEM[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
AVX512F
Store
Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
Store
Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
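The scatter address computation can be modelled in Python on a byte buffer. This sketch reads the trailing "* 8" in the pseudocode as a byte-to-bit conversion (an interpretation, since MEM[] ranges are written in bits), so the byte address is base_addr plus the sign-extended index times "scale". Little-endian storage and the helper names are assumptions of the sketch.

```python
import struct


def sign_extend32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mask_i32scatter_epi64(memory, base_addr, k, vindex, a, scale):
    """Store each active 64-bit lane of a at its own computed byte address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    for j in range(8):
        if (k >> j) & 1:
            addr = base_addr + sign_extend32(vindex[j]) * scale  # byte address
            struct.pack_into("<Q", memory, addr, a[j])
```

An index of 0xFFFFFFFF is sign-extended to -1, so with scale 8 it addresses the 64-bit slot immediately below "base_addr".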
AVX512F
Store
Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
Store
Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
Store
Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
Store
Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
Store
Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
Store
Multiply the packed 64-bit integers in "a" and "b", and store the low 64 bits of each product in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Store
Multiply the packed 64-bit integers in "a" and "b", and store the low 64 bits of each product in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
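Keeping only the low 64 bits of each 128-bit product amounts to multiplication modulo 2**64. A short Python sketch, assuming vectors are lists of eight unsigned 64-bit lane values; the function names are illustrative, not the intrinsics.

```python
LANE_MASK = (1 << 64) - 1


def mullox_epi64(a, b):
    """Per-lane product truncated to its low 64 bits."""
    return [(x * y) & LANE_MASK for x, y in zip(a, b)]


def mask_mullox_epi64(src, k, a, b):
    """Writemask form: inactive lanes are copied from src."""
    product = mullox_epi64(a, b)
    return [product[j] if (k >> j) & 1 else src[j] for j in range(len(src))]
```

Because the result is taken modulo 2**64, the same bit pattern is produced whether the inputs are read as signed or unsigned.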
AVX512F
Store
Load 512-bits (composed of 8 packed 64-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load 512-bits (composed of 16 packed 32-bit integers) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load 16-bit mask from memory into "k".
k[15:0] := MEM[mem_addr+15:mem_addr]
AVX512F
Load
Swizzle
Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Swizzle
Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
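The expand load reads consecutive elements from memory and places them in the active lanes only. A minimal Python model, assuming memory is a list of element-sized slots, "base" is a slot index, and vectors are lists of lanes; the helper names are hypothetical.

```python
def mask_expandloadu(src, k, memory, base):
    """Fill active lanes from consecutive memory slots; copy the rest from src."""
    dst = []
    m = base
    for j in range(len(src)):
        if (k >> j) & 1:
            dst.append(memory[m])
            m += 1  # the read cursor advances only for active lanes
        else:
            dst.append(src[j])
    return dst


def maskz_expandloadu(k, memory, base, lanes=8):
    """Zeromask form: inactive lanes are set to zero."""
    return mask_expandloadu([0] * lanes, k, memory, base)
```

With mask 0b10100100, only three memory slots are read, and they land in lanes 2, 5 and 7 in that order.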
AVX512F
Load
Swizzle
Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Swizzle
Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
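A masked gather with 32-bit indices can be modelled in Python on a byte buffer. As a sketch under stated assumptions: the "* 8" in the pseudocode is read as a byte-to-bit conversion, values are stored little-endian, and the function names are illustrative.

```python
import struct


def sign_extend32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mask_i32gather_pd(src, k, vindex, memory, base_addr, scale):
    """Load a double into each active lane; inactive lanes keep src."""
    dst = list(src)
    for j in range(8):
        if (k >> j) & 1:
            addr = base_addr + sign_extend32(vindex[j]) * scale  # byte address
            dst[j] = struct.unpack_from("<d", memory, addr)[0]
    return dst
```

Indices of inactive lanes are never used to form an address in this model, which mirrors the pseudocode computing "addr" only inside the IF branch.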
AVX512F
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:256] := 0
AVX512F
Load
Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load 512-bits of integer data from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load 512-bits of integer data from memory into "dst" using a non-temporal memory hint.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper element of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
IF k[0]
dst[63:0] := MEM[mem_addr+63:mem_addr]
ELSE
dst[63:0] := src[63:0]
FI
dst[MAX:64] := 0
AVX512F
Load
Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper element of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
IF k[0]
dst[63:0] := MEM[mem_addr+63:mem_addr]
ELSE
dst[63:0] := 0
FI
dst[MAX:64] := 0
AVX512F
Load
Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper elements of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
IF k[0]
dst[31:0] := MEM[mem_addr+31:mem_addr]
ELSE
dst[31:0] := src[31:0]
FI
dst[MAX:32] := 0
AVX512F
Load
Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper elements of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
IF k[0]
dst[31:0] := MEM[mem_addr+31:mem_addr]
ELSE
dst[31:0] := 0
FI
dst[MAX:32] := 0
AVX512F
Load
Load 512-bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
"mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Swizzle
Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Swizzle
Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Swizzle
Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Swizzle
Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
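As a rough illustration of the masked gather above, here is a scalar sketch that reads 64-bit little-endian values from a byte buffer. The pseudocode appears to express addresses in bits (hence its trailing `* 8`), whereas this sketch works in byte offsets. The function name, the `memory` buffer, and the rejection of negative addresses are assumptions of the sketch, not part of the original entry.

```python
import struct

def mask_gather_i32_epi64(src, k, vindex, memory, base_addr, scale):
    """Scalar model of a writemasked gather of eight 64-bit integers.

    src       : eight fallback lanes used when the mask bit is clear
    k         : integer mask; bit j selects lane j
    vindex    : eight signed 32-bit indices
    memory    : bytes-like object standing in for addressable memory
    base_addr : byte offset into memory
    scale     : 1, 2, 4 or 8
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            addr = base_addr + vindex[j] * scale  # byte address of the element
            if addr < 0:
                raise IndexError("address falls before the start of the buffer")
            (value,) = struct.unpack_from("<Q", memory, addr)
            dst.append(value)
        else:
            dst.append(src[j])  # lane copied from src when the mask bit is clear
    return dst
```

With `scale` set to 8 the indices act as plain element numbers into an array of 64-bit values.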
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:256] := 0
AVX512F
Load
Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*32
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Load
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 7
i := j*64
m := j*64
IF k[j]
addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Compute the bitwise AND of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := a[15:0] AND b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise NOT of 16-bit mask "a" and then AND with 16-bit mask "b", and store the result in "k".
k[15:0] := (NOT a[15:0]) AND b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise NOT of 16-bit mask "a", and store the result in "k".
k[15:0] := NOT a[15:0]
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise OR of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := a[15:0] OR b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise XNOR of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := NOT (a[15:0] XOR b[15:0])
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise XOR of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := a[15:0] XOR b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Shift the bits of 16-bit mask "a" left by "count" while shifting in zeros, and store the least significant 16 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 15
k[15:0] := a[15:0] << count[7:0]
FI
AVX512F
Mask
Shift the bits of 16-bit mask "a" right by "count" while shifting in zeros, and store the least significant 16 bits of the result in "k".
k[MAX:0] := 0
IF count[7:0] <= 15
k[15:0] := a[15:0] >> count[7:0]
FI
AVX512F
Mask
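The two mask-shift entries above share one subtlety: a count greater than 15 yields zero rather than wrapping. A minimal sketch, with hypothetical function names chosen only for this example:

```python
def kshiftli_mask16(a, count):
    """Shift a 16-bit mask left, shifting in zeros; counts above 15 give 0."""
    count &= 0xFF  # only the low 8 bits of the count are examined
    if count > 15:
        return 0
    return (a << count) & 0xFFFF

def kshiftri_mask16(a, count):
    """Shift a 16-bit mask right, shifting in zeros; counts above 15 give 0."""
    count &= 0xFF
    if count > 15:
        return 0
    return (a & 0xFFFF) >> count
```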
Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones".
tmp[15:0] := a[15:0] OR b[15:0]
IF tmp[15:0] == 0x0
dst := 1
ELSE
dst := 0
FI
IF tmp[15:0] == 0xFFFF
MEM[all_ones+7:all_ones] := 1
ELSE
MEM[all_ones+7:all_ones] := 0
FI
AVX512F
Mask
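The OR-test entry above produces two independent results from one OR. The sketch below returns them as a tuple instead of writing through a pointer; the function name and the tuple return are assumptions for illustration.

```python
def kortest_mask16(a, b):
    """Return (all_zeros, all_ones) for the 16-bit OR of masks a and b."""
    tmp = (a | b) & 0xFFFF
    all_zeros = int(tmp == 0x0000)  # corresponds to the value stored in dst
    all_ones = int(tmp == 0xFFFF)   # corresponds to the value stored at all_ones
    return all_zeros, all_ones
```

Both results can be 0 at the same time (a partially set mask), but they can never both be 1.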
Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst".
tmp[15:0] := a[15:0] OR b[15:0]
IF tmp[15:0] == 0x0
dst := 1
ELSE
dst := 0
FI
AVX512F
Mask
Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst".
tmp[15:0] := a[15:0] OR b[15:0]
IF tmp[15:0] == 0xFFFF
dst := 1
ELSE
dst := 0
FI
AVX512F
Mask
Convert 16-bit mask "a" into an integer value, and store the result in "dst".
dst := ZeroExtend32(a[15:0])
AVX512F
Mask
Convert integer value "a" into a 16-bit mask, and store the result in "k".
k := ZeroExtend16(a[15:0])
AVX512F
Mask
Compute the bitwise NOT of 16-bit mask "a" and then AND with 16-bit mask "b", and store the result in "k".
k[15:0] := (NOT a[15:0]) AND b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise AND of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := a[15:0] AND b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Copy 16-bit mask "a" to "k".
k[15:0] := a[15:0]
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise NOT of 16-bit mask "a", and store the result in "k".
k[15:0] := NOT a[15:0]
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise OR of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := a[15:0] OR b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Unpack and interleave 8 bits from masks "a" and "b", and store the 16-bit result in "k".
k[7:0] := b[7:0]
k[15:8] := a[7:0]
k[MAX:16] := 0
AVX512F
Mask
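The unpack entry above places the low byte of "a" in the high half of the result and the low byte of "b" in the low half. A one-line sketch (the function name is hypothetical):

```python
def kunpackb(a, b):
    """Combine the low 8 bits of a (high half) and of b (low half) into 16 bits."""
    return ((a & 0xFF) << 8) | (b & 0xFF)
```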
Compute the bitwise XNOR of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := NOT (a[15:0] XOR b[15:0])
k[MAX:16] := 0
AVX512F
Mask
Compute the bitwise XOR of 16-bit masks "a" and "b", and store the result in "k".
k[15:0] := a[15:0] XOR b[15:0]
k[MAX:16] := 0
AVX512F
Mask
Performs bitwise OR between "k1" and "k2", storing the result in "dst". ZF flag is set if "dst" is 0.
dst[15:0] := k1[15:0] | k2[15:0]
IF dst == 0
SetZF()
FI
AVX512F
Mask
Performs bitwise OR between "k1" and "k2", storing the result in "dst". CF flag is set if "dst" consists of all 1's.
dst[15:0] := k1[15:0] | k2[15:0]
IF PopCount(dst[15:0]) == 16
SetCF()
FI
AVX512F
Mask
Converts bit mask "k1" into an integer value, storing the results in "dst".
dst := ZeroExtend32(k1)
AVX512F
Mask
Converts integer "mask" into bitmask, storing the result in "dst".
dst := mask[15:0]
AVX512F
Mask
Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
temp[1023:512] := a[511:0]
temp[511:0] := b[511:0]
temp[1023:0] := temp[1023:0] >> (32*imm8[3:0])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := temp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst".
temp[1023:512] := a[511:0]
temp[511:0] := b[511:0]
temp[1023:0] := temp[1023:0] >> (64*imm8[2:0])
dst[511:0] := temp[511:0]
dst[MAX:512] := 0
AVX512F
Miscellaneous
Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
temp[1023:512] := a[511:0]
temp[511:0] := b[511:0]
temp[1023:0] := temp[1023:0] >> (64*imm8[2:0])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := temp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
temp[1023:512] := a[511:0]
temp[511:0] := b[511:0]
temp[1023:0] := temp[1023:0] >> (64*imm8[2:0])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := temp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
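The concatenate-and-shift entries above can be pictured as sliding an 8-lane window over 16 lanes. The sketch below models the zeromask 64-bit form on Python lists, with lane 0 as the least significant element; the function name and list representation are assumptions for illustration.

```python
def maskz_alignr_epi64(k, a, b, imm8):
    """Scalar model: concatenate a:b, shift right by imm8 lanes, zero-mask.

    a, b : lists of eight 64-bit lanes, lane 0 least significant
    k    : integer mask; bit j selects lane j of the result
    imm8 : only the low 3 bits are used as the lane shift count
    """
    temp = list(b) + list(a)           # b fills the low 512 bits, a the high 512 bits
    shift = imm8 & 0x7
    temp = temp[shift:] + [0] * shift  # right shift by whole lanes, zeros shifted in
    return [temp[j] if (k >> j) & 1 else 0 for j in range(8)]
```

With `imm8` equal to 0 the result is simply "b" (subject to the mask); larger values progressively pull in lanes from "a".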
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
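The fix-up operation above has two lookup stages: classify the (possibly denormal-flushed) source element into one of eight tokens, then use the token to pick a 4-bit response from the table element in "c". The sketch below covers only those two stages for double precision. It assumes that both +0 and -0 classify as zero and that only +1.0 classifies as one, which the pseudocode does not spell out; the helper names are hypothetical.

```python
import struct

# Token numbering follows the TOKEN_TYPE enumeration in the entry above.
QNAN, SNAN, ZERO, ONE, NEG_INF, POS_INF, NEG_VALUE, POS_VALUE = range(8)

def bits(x):
    """Return the IEEE 754 binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def classify_token(x_bits, daz=False):
    """Classify a 64-bit pattern into a token (sketch, see assumptions above)."""
    sign = x_bits >> 63
    exp = (x_bits >> 52) & 0x7FF
    frac = x_bits & ((1 << 52) - 1)
    if exp == 0 and daz:
        return ZERO  # denormals (and zeros) are treated as 0.0 when DAZ is set
    if exp == 0x7FF and frac != 0:
        # The top fraction bit distinguishes quiet from signalling NaNs.
        return QNAN if (frac >> 51) & 1 else SNAN
    if exp == 0 and frac == 0:
        return ZERO
    if x_bits == 0x3FF0000000000000:
        return ONE
    if exp == 0x7FF:
        return NEG_INF if sign else POS_INF
    return NEG_VALUE if sign else POS_VALUE

def token_response(table, token):
    """Extract the 4-bit response for a token from the table element."""
    return (table >> (4 * token)) & 0xF
```

For example, a table whose nibble 2 holds the value 8 asks for zero inputs to be replaced by +0, according to the response list in the entry above.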
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst", and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
dst[127:64] := b[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst", and copy the upper element from "b" to the upper element of "dst". "imm8" is used to set the required flags reporting.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
dst[127:64] := b[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
IF k[0]
dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := b[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
IF k[0]
dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
ELSE
dst[63:0] := a[63:0]
FI
dst[127:64] := b[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
IF k[0]
dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := b[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "b" to the upper element of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) {
tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0]
CASE(tsrc[63:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[63:0] := src1[63:0]
1 : dest[63:0] := tsrc[63:0]
2 : dest[63:0] := QNaN(tsrc[63:0])
3 : dest[63:0] := QNAN_Indefinite
4 : dest[63:0] := -INF
5 : dest[63:0] := +INF
6 : dest[63:0] := tsrc.sign? -INF : +INF
7 : dest[63:0] := -0
8 : dest[63:0] := +0
9 : dest[63:0] := -1
10: dest[63:0] := +1
11: dest[63:0] := 1/2
12: dest[63:0] := 90.0
13: dest[63:0] := PI/2
14: dest[63:0] := MAX_FLOAT
15: dest[63:0] := -MAX_FLOAT
ESAC
CASE(tsrc[63:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[63:0]
}
IF k[0]
dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := b[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst", and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
dst[127:32] := b[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
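The table operand "c" packs one 4-bit response code per token, indexed as src3[3+4*j:4*j] in the pseudocode above, so all eight fields fit in the lower 32 bits. The helper below is a hedged Python sketch of how such a table could be assembled and read back; `make_fixup_table` and `lookup_response` are hypothetical names introduced only for illustration.

```python
# Token names in enum order, so the list index equals j in the pseudocode.
TOKENS = ("QNAN", "SNAN", "ZERO", "ONE",
          "NEG_INF", "POS_INF", "NEG_VALUE", "POS_VALUE")


def make_fixup_table(responses):
    """Pack {token name: response code 0..15} into one integer table.

    Tokens that are not listed keep response 0 (pass src1 through).
    """
    table = 0
    for name, code in responses.items():
        if not 0 <= code <= 15:
            raise ValueError("response codes are 4 bits wide")
        j = TOKENS.index(name)
        table |= code << (4 * j)
    return table


def lookup_response(table, name):
    """Read back the 4-bit response code stored for a token."""
    j = TOKENS.index(name)
    return (table >> (4 * j)) & 0xF
```

Building the table from names rather than hand-written hex keeps the token-to-nibble mapping visible at the call site.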
Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst", and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
dst[127:32] := b[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
IF k[0]
dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := b[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
IF k[0]
dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
ELSE
dst[31:0] := a[31:0]
FI
dst[127:32] := b[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
[sae_note]
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
IF k[0]
dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := b[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "b" to the upper elements of "dst". "imm8" selects which exception flags (#ZE or #IE) are reported for particular input classes.
enum TOKEN_TYPE {
QNAN_TOKEN := 0, \
SNAN_TOKEN := 1, \
ZERO_VALUE_TOKEN := 2, \
ONE_VALUE_TOKEN := 3, \
NEG_INF_TOKEN := 4, \
POS_INF_TOKEN := 5, \
NEG_VALUE_TOKEN := 6, \
POS_VALUE_TOKEN := 7
}
DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) {
tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0]
CASE(tsrc[31:0]) OF
QNAN_TOKEN:j := 0
SNAN_TOKEN:j := 1
ZERO_VALUE_TOKEN: j := 2
ONE_VALUE_TOKEN: j := 3
NEG_INF_TOKEN: j := 4
POS_INF_TOKEN: j := 5
NEG_VALUE_TOKEN: j := 6
POS_VALUE_TOKEN: j := 7
ESAC
token_response[3:0] := src3[3+4*j:4*j]
CASE(token_response[3:0]) OF
0 : dest[31:0] := src1[31:0]
1 : dest[31:0] := tsrc[31:0]
2 : dest[31:0] := QNaN(tsrc[31:0])
3 : dest[31:0] := QNAN_Indefinite
4 : dest[31:0] := -INF
5 : dest[31:0] := +INF
6 : dest[31:0] := tsrc.sign? -INF : +INF
7 : dest[31:0] := -0
8 : dest[31:0] := +0
9 : dest[31:0] := -1
10: dest[31:0] := +1
11: dest[31:0] := 1/2
12: dest[31:0] := 90.0
13: dest[31:0] := PI/2
14: dest[31:0] := MAX_FLOAT
15: dest[31:0] := -MAX_FLOAT
ESAC
CASE(tsrc[31:0]) OF
ZERO_VALUE_TOKEN:
IF (imm8[0]) #ZE; FI
ZERO_VALUE_TOKEN:
IF (imm8[1]) #IE; FI
ONE_VALUE_TOKEN:
IF (imm8[2]) #ZE; FI
ONE_VALUE_TOKEN:
IF (imm8[3]) #IE; FI
SNAN_TOKEN:
IF (imm8[4]) #IE; FI
NEG_INF_TOKEN:
IF (imm8[5]) #IE; FI
NEG_VALUE_TOKEN:
IF (imm8[6]) #IE; FI
POS_INF_TOKEN:
IF (imm8[7]) #IE; FI
ESAC
RETURN dest[31:0]
}
IF k[0]
dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := b[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
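The zero-masked loop above can be mirrored in a few lines of scalar code. The Python sketch below models ConvertExpFP64 as floor(log2(|x|)) and applies the zeromask per element. The names `convert_exp` and `maskz_getexp` are hypothetical, and the special-case results (zero gives -infinity, infinity gives +infinity, NaN stays NaN) are assumptions not spelled out in the record above.

```python
import math


def convert_exp(x):
    """Model of ConvertExpFP64: floor(log2(|x|)) returned as a float."""
    if math.isnan(x):
        return x
    if math.isinf(x):
        return math.inf       # assumed special case
    if x == 0.0:
        return -math.inf      # assumed special case
    # frexp gives |x| = m * 2**e with 0.5 <= m < 1,
    # so floor(log2(|x|)) is e - 1.
    return float(math.frexp(abs(x))[1] - 1)


def maskz_getexp(k, a):
    """Apply convert_exp where mask bit j is set, else write 0.0."""
    return [convert_exp(x) if (k >> j) & 1 else 0.0
            for j, x in enumerate(a)]
```

Using `math.frexp` rather than `math.log2` avoids rounding surprises near powers of two, because the exponent is read from the representation instead of being computed.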
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
[sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
[sae_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
[sae_note]
dst[63:0] := ConvertExpFP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
dst[63:0] := ConvertExpFP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
[sae_note]
IF k[0]
dst[63:0] := ConvertExpFP64(b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
IF k[0]
dst[63:0] := ConvertExpFP64(b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
[sae_note]
IF k[0]
dst[63:0] := ConvertExpFP64(b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
IF k[0]
dst[63:0] := ConvertExpFP64(b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
[sae_note]
dst[31:0] := ConvertExpFP32(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
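For normal single-precision inputs, the exponent that ConvertExpFP32 returns can also be read straight from the bit pattern: it is the biased exponent field (bits 30:23) minus 127. The Python sketch below shows that route together with the upper-element copy from "a". The names `getexp_f32_normal` and `getexp_ss` are hypothetical, and zeros, subnormals, infinities and NaNs are deliberately not modelled.

```python
import struct


def f32_bits(x):
    """Round x to binary32 and return its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def getexp_f32_normal(x):
    """floor(log2(|x|)) for a normal, finite, nonzero binary32 value."""
    biased = (f32_bits(x) >> 23) & 0xFF  # the mask also drops the sign bit
    if biased == 0 or biased == 0xFF:
        raise ValueError("zero, subnormal, infinity and NaN are not modelled")
    return float(biased - 127)


def getexp_ss(a, b):
    """Lower element from b[0]; upper three elements copied from a."""
    return [getexp_f32_normal(b[0])] + list(a[1:4])
```

Vectors are represented as plain 4-element lists with element 0 as the lowest lane.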
Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
dst[31:0] := ConvertExpFP32(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
[sae_note]
IF k[0]
dst[31:0] := ConvertExpFP32(b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
IF k[0]
dst[31:0] := ConvertExpFP32(b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
[sae_note]
IF k[0]
dst[31:0] := ConvertExpFP32(b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
IF k[0]
dst[31:0] := ConvertExpFP32(b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
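The per-element GetNormalizedMantissa step can be sketched in scalar form. The Python model below is an approximation under stated assumptions, not Intel's definition: it assumes the interval encodings 0 = [1, 2), 1 = [0.5, 2), 2 = [0.5, 1), 3 = [0.75, 1.5) and the sign controls 0 = source sign, 1 = always positive, 2 = NaN for negative sources, as in the `_MM_MANT_NORM_*` and `_MM_MANT_SIGN_*` enums. It handles finite nonzero inputs only, and `get_normalized_mantissa` is a hypothetical name.

```python
import math


def get_normalized_mantissa(x, sc, interv):
    """Scalar model for finite, nonzero x (other inputs are rejected)."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        raise ValueError("only finite nonzero inputs are modelled")
    negative = x < 0
    if negative and (sc & 2):
        return math.nan            # assumed: sign control 2 rejects negatives
    m, e = math.frexp(abs(x))      # |x| = m * 2**e with 0.5 <= m < 1
    sig = 2.0 * m                  # significand in [1, 2)
    unbiased = e - 1               # exponent that goes with sig
    if interv == 0:                # [1, 2)
        mag = sig
    elif interv == 1:              # [0.5, 2): halve when the exponent is odd
        mag = m if unbiased % 2 else sig
    elif interv == 2:              # [0.5, 1)
        mag = m
    elif interv == 3:              # [0.75, 1.5): halve values in [1.5, 2)
        mag = m if sig >= 1.5 else sig
    else:
        raise ValueError("interv must be 0..3")
    sign = -1.0 if (negative and not (sc & 1)) else 1.0
    return sign * mag
```

With interval 1 the exponent removed from the input is always even, which is convenient when the mantissa feeds a square-root approximation.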
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
IF k[0]
dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
IF k[0]
dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
IF k[0]
dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
IF k[0]
dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv)
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
IF k[0]
dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
IF k[0]
dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
IF k[0]
dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
IF k[0]
dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv)
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
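The RIGHT_ROTATE_DWORDS helper above can be sketched in Python as follows. Python integers are unbounded, so the sketch masks to 32 bits explicitly; the name "rotr32" is an assumption made for illustration and does not appear in the original text.

```python
def rotr32(value, count):
    """Rotate a 32-bit unsigned value right by count bits (count taken mod 32)."""
    count %= 32
    value &= 0xFFFFFFFF
    # Bits shifted out on the right re-enter on the left; the final mask
    # discards anything that the left shift pushed past bit 31.
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF
```

A count of 0 or 32 returns the input unchanged, because the left-shifted copy is masked back onto the original bits.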
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
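To make the RoundScaleFP64 pseudocode above concrete, here is a rough Python sketch of the per-element computation. The mapping of imm8[1:0] to rounding directions is an assumption based on the usual _MM_FROUND_* constants (it is not spelled out in this text), the function name "round_scale" is hypothetical, and the sketch does not preserve the sign of a negative zero result.

```python
import math

# Assumed encoding of imm8[1:0]; not taken from the text above.
_ROUNDERS = {
    0: round,       # to nearest, ties to even
    1: math.floor,  # toward negative infinity
    2: math.ceil,   # toward positive infinity
    3: math.trunc,  # toward zero
}


def round_scale(x, imm8):
    """Round x so that imm8[7:4] fraction bits are kept after the binary point."""
    m = (imm8 >> 4) & 0xF
    if imm8 & 0x4:
        raise ValueError("MXCSR-controlled rounding is outside this sketch")
    rounder = _ROUNDERS[imm8 & 0x3]
    if math.isnan(x) or math.isinf(x):
        return x
    scaled = x * 2.0 ** m
    if math.isinf(scaled):
        return x  # mirrors the IsInf(tmp) fallback to the source value
    return rounder(scaled) / 2.0 ** m
```

With imm8 = 0x10 (one fraction bit, round to nearest), 1.37 becomes 1.5; with imm8 = 0x13 (one fraction bit, truncate), it becomes 1.0.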
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
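The writemask and zeromask variants above differ only in what lands in an element whose mask bit is clear. A small Python sketch of that selection step, with the hypothetical helper name "select_by_mask" and lists standing in for vector registers:

```python
def select_by_mask(computed, k, src=None):
    """Pick per-element results according to mask k.

    With src given, this models a writemask (unselected elements come from src).
    Without src, it models a zeromask (unselected elements become 0.0).
    """
    out = []
    for j, value in enumerate(computed):
        if (k >> j) & 1:
            out.append(value)
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0.0)
    return out
```

Bit j of "k" controls element j, matching the "IF k[j]" test in the pseudocode.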
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
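The scalar forms above all share one data flow: the lower element is computed from "b" (or taken from "src", or zeroed), and the remaining elements are copied from "a". A minimal Python sketch of that flow, where "scalar_lower_op" and its parameters are hypothetical names and "op" stands in for the per-element function such as RoundScaleFP64:

```python
def scalar_lower_op(op, a, b, k=1, src=None):
    """Apply op to the lowest element of b and copy the upper elements from a.

    k is the mask (only bit 0 matters). When bit 0 is clear, the low element
    comes from src if given (writemask) or is 0.0 otherwise (zeromask).
    """
    if k & 1:
        low = op(b[0])
    elif src is not None:
        low = src[0]
    else:
        low = 0.0
    return [low] + list(a[1:])
```

The same helper covers the two-element double-precision case and the four-element single-precision case, since only the list length differs.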
Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
IF k[0]
dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note]
DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) {
m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0])
IF IsInf(tmp[63:0])
tmp[63:0] := src1[63:0]
FI
RETURN tmp[63:0]
}
dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
IF k[0]
dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) {
m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0])
IF IsInf(tmp[31:0])
tmp[31:0] := src1[31:0]
FI
RETURN tmp[31:0]
}
dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
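The core of SCALE above is src1 * 2^FLOOR(src2). A hedged Python sketch of that computation for one double-precision element follows; the name "scale" is an assumption, the MXCSR.DAZ handling and the signaling/quiet NaN distinction are omitted, and infinite exponents are handled by reading POW(2.0, ±INF) literally.

```python
import math


def scale(a, b):
    """Return a * 2**floor(b), propagating NaN and saturating to infinity."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if math.isinf(b):
        # 2**(+inf) is +inf and 2**(-inf) is 0; 0 * inf evaluates to NaN.
        return a * (math.inf if b > 0 else 0.0)
    try:
        return math.ldexp(a, math.floor(b))
    except OverflowError:
        # ldexp raises when the result exceeds the double range.
        return math.copysign(math.inf, a)
```

Note that the exponent is floored rather than truncated, so scale(3.0, -1.5) uses 2^-2 and returns 0.75.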
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
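For the single-precision SCALE above, the result must also fit the binary32 range, which is narrower than that of a Python float. The sketch below emulates this by rounding through "struct"; it assumes finite inputs only, and the helper names "to_f32" and "scale_f32" are hypothetical. Because the product of a binary32 value and a power of two is exact in binary64 over the range that matters, the final conversion is the only rounding step.

```python
import math
import struct


def to_f32(x):
    """Round a finite Python float to the nearest binary32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct refuses values beyond the binary32 range; saturate instead.
        return math.copysign(math.inf, x)


def scale_f32(a, b):
    """Return a * 2**floor(b) with inputs and result rounded to binary32."""
    a32, b32 = to_f32(a), to_f32(b)
    try:
        wide = math.ldexp(a32, math.floor(b32))
    except OverflowError:
        wide = math.copysign(math.inf, a32)
    return to_f32(wide)
```

Scaling 1.0 by 2^128 overflows binary32 and gives infinity, while scaling by 2^-149 lands on the smallest binary32 denormal.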
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
IF k[0]
dst[63:0] := SCALE(a[63:0], b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
IF k[0]
dst[63:0] := SCALE(a[63:0], b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
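The scalar form above differs from the packed one in two ways: only mask bit 0 matters, and the upper element always comes from "a". A minimal Python sketch, assuming finite inputs and modelling each 128-bit operand as a hypothetical `[low, high]` pair of doubles (the function name is illustrative, not the intrinsic name):

```python
import math

def mask_scalef_sd(src, k, a, b):
    # Low lane: scaled result when bit 0 of k is set, otherwise copied from src.
    # NaN, infinity, and DAZ handling from the pseudocode are omitted here.
    low = math.ldexp(a[0], math.floor(b[0])) if k & 1 else src[0]
    # High lane: always taken from a, independent of the mask.
    return [low, a[1]]
```

The upper element of "b" and the upper element of "src" never influence the result.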
Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
IF k[0]
dst[63:0] := SCALE(a[63:0], b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
IF k[0]
dst[63:0] := SCALE(a[63:0], b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
dst[63:0] := SCALE(a[63:0], b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0]))
RETURN dst[63:0]
}
dst[63:0] := SCALE(a[63:0], b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
IF k[0]
dst[31:0] := SCALE(a[31:0], b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
IF k[0]
dst[31:0] := SCALE(a[31:0], b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
IF k[0]
dst[31:0] := SCALE(a[31:0], b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
IF k[0]
dst[31:0] := SCALE(a[31:0], b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
dst[31:0] := SCALE(a[31:0], b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
DEFINE SCALE(src1, src2) {
IF (src2 == NaN)
IF (src2 == SNaN)
RETURN QNAN(src2)
FI
ELSE IF (src1 == NaN)
IF (src1 == SNaN)
RETURN QNAN(src1)
FI
IF (src2 != INF)
RETURN QNAN(src1)
FI
ELSE
tmp_src2 := src2
tmp_src1 := src1
IF (IS_DENORMAL(src2) AND MXCSR.DAZ)
tmp_src2 := 0
FI
IF (IS_DENORMAL(src1) AND MXCSR.DAZ)
tmp_src1 := 0
FI
FI
dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0]))
RETURN dst[31:0]
}
dst[31:0] := SCALE(a[31:0], b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Miscellaneous
Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
n := (j % 4)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
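The index expression n := (j % 4)*32 in the loop above simply repeats the 4-element group across the destination. As an illustrative Python model (lists stand in for vectors, index 0 is the lowest element, and `broadcast_x4` is a hypothetical name):

```python
def broadcast_x4(a, lanes=16):
    # dst[j] = a[j % 4]: the 4-element source group tiles the whole destination.
    return [a[j % 4] for j in range(lanes)]
```

Passing `lanes=8` models the 64-bit element variants, where a 4-element group fills a 512-bit destination twice.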
Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
n := (j % 4)*64
dst[i+63:i] := a[n+63:n]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 4)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 4)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
n := (j % 4)*32
dst[i+31:i] := a[n+31:n]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
n := (j % 4)*32
IF k[j]
dst[i+31:i] := a[n+31:n]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
n := (j % 4)*64
dst[i+63:i] := a[n+63:n]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 4)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
n := (j % 4)*64
IF k[j]
dst[i+63:i] := a[n+63:n]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
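The writemask behaviour described above (take the broadcast value where the mask bit is set, keep "src" elsewhere) can be sketched in Python as follows. This is only a lane-level model: the mask is assumed to be a plain integer with bit j controlling lane j, and `mask_broadcast_low` is a hypothetical name.

```python
def mask_broadcast_low(src, k, a):
    # Every selected lane receives a[0]; unselected lanes keep src[j].
    return [a[0] if (k >> j) & 1 else src[j] for j in range(len(src))]
```

Replacing `src[j]` with `0` would give the zeromask form of the same operation.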
Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 64
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[511:m] := src[511:m]
dst[MAX:512] := 0
AVX512F
Swizzle
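The compress loop above packs the selected elements of "a" into the low end of "dst" and fills the remainder from the same positions of "src". A short Python sketch of that behaviour, assuming lists model the vectors (index 0 is the lowest element) and using the hypothetical name `mask_compress`:

```python
def mask_compress(src, k, a):
    # Collect the elements of a whose mask bit is set, preserving their order.
    active = [x for j, x in enumerate(a) if (k >> j) & 1]
    # dst[511:m] := src[511:m]: the tail comes from src at the same positions.
    return active + src[len(active):]
```

The tail is not the unselected elements of "a"; it is whatever "src" holds in the lanes beyond the packed count.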
Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 64
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[511:m] := 0
dst[MAX:512] := 0
AVX512F
Swizzle
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 32
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[511:m] := src[511:m]
dst[MAX:512] := 0
AVX512F
Swizzle
Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 32
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[511:m] := 0
dst[MAX:512] := 0
AVX512F
Swizzle
Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
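Expand is the inverse pattern of compress: the counter m walks through "a" contiguously, while the mask decides which destination lanes consume the next element. The sketch below models this in Python with lists (index 0 is the lowest element); `mask_expand` is a hypothetical name chosen for illustration.

```python
def mask_expand(src, k, a):
    dst = []
    m = 0  # index of the next unread contiguous element of a
    for j in range(len(src)):
        if (k >> j) & 1:
            dst.append(a[m])   # selected lane takes the next element of a
            m += 1
        else:
            dst.append(src[j])  # unselected lane keeps src
    return dst
```

Only as many low elements of "a" are read as there are set bits in the mask.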
Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[1:0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
2: dst[127:0] := a[383:256]
3: dst[127:0] := a[511:384]
ESAC
dst[MAX:128] := 0
AVX512F
Swizzle
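The CASE statement above selects one of four 128-bit groups using only the low two bits of "imm8". A minimal Python model, assuming a 16-element list represents the 512-bit source (index 0 is the lowest element) and using the hypothetical name `extract_x4`:

```python
def extract_x4(a, imm8):
    # Only imm8[1:0] participates in the selection; higher bits are ignored.
    sel = imm8 & 0b11
    return a[4 * sel:4 * sel + 4]
```

Because the upper bits of "imm8" are ignored in this model, `extract_x4(a, 7)` selects the same group as `extract_x4(a, 3)`.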
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Swizzle
Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Swizzle
Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[255:0] := a[255:0]
1: dst[255:0] := a[511:256]
ESAC
dst[MAX:256] := 0
AVX512F
Swizzle
Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Swizzle
Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Swizzle
Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[1:0] OF
0: dst[127:0] := a[127:0]
1: dst[127:0] := a[255:128]
2: dst[127:0] := a[383:256]
3: dst[127:0] := a[511:384]
ESAC
dst[MAX:128] := 0
AVX512F
Swizzle
Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Swizzle
Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[1:0] OF
0: tmp[127:0] := a[127:0]
1: tmp[127:0] := a[255:128]
2: tmp[127:0] := a[383:256]
3: tmp[127:0] := a[511:384]
ESAC
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Swizzle
Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst".
CASE imm8[0] OF
0: dst[255:0] := a[255:0]
1: dst[255:0] := a[511:256]
ESAC
dst[MAX:256] := 0
AVX512F
Swizzle
Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Swizzle
Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
CASE imm8[0] OF
0: tmp[255:0] := a[255:0]
1: tmp[255:0] := a[511:256]
ESAC
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Swizzle
Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
2: dst[383:256] := b[127:0]
3: dst[511:384] := b[127:0]
ESAC
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
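The masked insert above is a two-stage operation: the insertion is performed on a temporary first, and the writemask is applied to the full 16-lane result afterwards. The following Python sketch illustrates that ordering; lists model the vectors, "b" is assumed to hold exactly 4 elements, and `mask_insert_x4` is a hypothetical name.

```python
def mask_insert_x4(src, k, a, b, imm8):
    tmp = list(a)                   # tmp[511:0] := a[511:0]
    sel = imm8 & 0b11               # imm8[1:0] picks the 128-bit slot
    tmp[4 * sel:4 * sel + 4] = b    # overwrite that slot with the 4 elements of b
    # Writemask applies per 32-bit lane across the whole 512-bit result.
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(16)]
```

A lane outside the inserted slot can therefore still receive a value from "a" if its mask bit is set.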
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "dst", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE (imm8[0]) OF
0: dst[255:0] := b[255:0]
1: dst[511:256] := b[255:0]
ESAC
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "tmp", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "tmp", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "dst", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: dst[127:0] := b[127:0]
1: dst[255:128] := b[127:0]
2: dst[383:256] := b[127:0]
3: dst[511:384] := b[127:0]
ESAC
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[1:0]) OF
0: tmp[127:0] := b[127:0]
1: tmp[255:128] := b[127:0]
2: tmp[383:256] := b[127:0]
3: tmp[511:384] := b[127:0]
ESAC
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "dst", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8".
dst[511:0] := a[511:0]
CASE (imm8[0]) OF
0: dst[255:0] := b[255:0]
1: dst[511:256] := b[255:0]
ESAC
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "tmp", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Copy "a" to "tmp", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[511:0] := a[511:0]
CASE (imm8[0]) OF
0: tmp[255:0] := b[255:0]
1: tmp[511:256] := b[255:0]
ESAC
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low packed 32-bit integer from "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low packed 64-bit integer from "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 32
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[511:m] := src[511:m]
dst[MAX:512] := 0
AVX512F
Swizzle
Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 32
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[m+size-1:m] := a[i+31:i]
m := m + size
FI
ENDFOR
dst[511:m] := 0
dst[MAX:512] := 0
AVX512F
Swizzle
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 64
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[511:m] := src[511:m]
dst[MAX:512] := 0
AVX512F
Swizzle
Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 64
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[m+size-1:m] := a[i+63:i]
m := m + size
FI
ENDFOR
dst[511:m] := 0
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
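The three 32-bit single-source shuffles above use only the low four bits of each index element (`idx[i+3:i]`); higher bits are ignored. A scalar sketch, assuming registers are arrays of sixteen `uint32_t` and that "dst" does not alias "a"; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unmasked form: dst[j] = a[idx[j] mod 16]. */
void ref_permutexvar_d(uint32_t dst[16], const uint32_t idx[16],
                       const uint32_t a[16])
{
    assert(dst != a); /* this model reads a after writing dst */
    for (int j = 0; j < 16; j++)
        dst[j] = a[idx[j] & 15];
}

/* Writemask form: inactive lanes are copied from src. */
void ref_mask_permutexvar_d(uint32_t dst[16], const uint32_t src[16],
                            uint16_t k, const uint32_t idx[16],
                            const uint32_t a[16])
{
    assert(dst != a);
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? a[idx[j] & 15] : src[j];
}
```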
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := idx[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := (idx[i+4]) ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
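The four two-source 32-bit shuffles above share one selection rule: bit 4 of each index element chooses between "a" and "b", and bits 3:0 choose the element. The two writemask forms differ only in the fallback source ("idx" versus "a"). A scalar sketch with hypothetical names, assuming sixteen-element `uint32_t` arrays and a "dst" that aliases neither input table:

```c
#include <assert.h>
#include <stdint.h>

/* Bit 4 of sel picks the table (0: a, 1: b); bits 3:0 pick the element. */
static uint32_t pick2(const uint32_t a[16], const uint32_t b[16], uint32_t sel)
{
    return ((sel >> 4) & 1) ? b[sel & 15] : a[sel & 15];
}

/* Unmasked form. */
void ref_permutex2var_d(uint32_t dst[16], const uint32_t a[16],
                        const uint32_t idx[16], const uint32_t b[16])
{
    assert(dst != a && dst != b);
    for (int j = 0; j < 16; j++)
        dst[j] = pick2(a, b, idx[j]);
}

/* Writemask form whose inactive lanes are copied from a. */
void ref_mask_permutex2var_d(uint32_t dst[16], const uint32_t a[16], uint16_t k,
                             const uint32_t idx[16], const uint32_t b[16])
{
    assert(dst != a && dst != b);
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? pick2(a, b, idx[j]) : a[j];
}

/* Writemask form whose inactive lanes are copied from idx. */
void ref_mask2_permutex2var_d(uint32_t dst[16], const uint32_t a[16],
                              const uint32_t idx[16], uint16_t k,
                              const uint32_t b[16])
{
    assert(dst != a && dst != b);
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? pick2(a, b, idx[j]) : idx[j];
}
```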
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := idx[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := (idx[i+3]) ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := idx[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := (idx[i+4]) ? b[off+31:off] : a[off+31:off]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*32
off := idx[i+3:i]*32
dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := idx[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := (idx[i+3]) ? b[off+63:off] : a[off+63:off]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*64
off := idx[i+2:i]*64
dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
IF (imm8[4] == 0) tmp_dst[319:256] := a[319:256]; FI
IF (imm8[4] == 1) tmp_dst[319:256] := a[383:320]; FI
IF (imm8[5] == 0) tmp_dst[383:320] := a[319:256]; FI
IF (imm8[5] == 1) tmp_dst[383:320] := a[383:320]; FI
IF (imm8[6] == 0) tmp_dst[447:384] := a[447:384]; FI
IF (imm8[6] == 1) tmp_dst[447:384] := a[511:448]; FI
IF (imm8[7] == 0) tmp_dst[511:448] := a[447:384]; FI
IF (imm8[7] == 1) tmp_dst[511:448] := a[511:448]; FI
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
IF (b[257] == 0) tmp_dst[319:256] := a[319:256]; FI
IF (b[257] == 1) tmp_dst[319:256] := a[383:320]; FI
IF (b[321] == 0) tmp_dst[383:320] := a[319:256]; FI
IF (b[321] == 1) tmp_dst[383:320] := a[383:320]; FI
IF (b[385] == 0) tmp_dst[447:384] := a[447:384]; FI
IF (b[385] == 1) tmp_dst[447:384] := a[511:448]; FI
IF (b[449] == 0) tmp_dst[511:448] := a[447:384]; FI
IF (b[449] == 1) tmp_dst[511:448] := a[511:448]; FI
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI
IF (imm8[4] == 0) tmp_dst[319:256] := a[319:256]; FI
IF (imm8[4] == 1) tmp_dst[319:256] := a[383:320]; FI
IF (imm8[5] == 0) tmp_dst[383:320] := a[319:256]; FI
IF (imm8[5] == 1) tmp_dst[383:320] := a[383:320]; FI
IF (imm8[6] == 0) tmp_dst[447:384] := a[447:384]; FI
IF (imm8[6] == 1) tmp_dst[447:384] := a[511:448]; FI
IF (imm8[7] == 0) tmp_dst[511:448] := a[447:384]; FI
IF (imm8[7] == 1) tmp_dst[511:448] := a[511:448]; FI
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI
IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI
IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI
IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI
IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI
IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI
IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI
IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI
IF (b[257] == 0) tmp_dst[319:256] := a[319:256]; FI
IF (b[257] == 1) tmp_dst[319:256] := a[383:320]; FI
IF (b[321] == 0) tmp_dst[383:320] := a[319:256]; FI
IF (b[321] == 1) tmp_dst[383:320] := a[383:320]; FI
IF (b[385] == 0) tmp_dst[447:384] := a[447:384]; FI
IF (b[385] == 1) tmp_dst[447:384] := a[511:448]; FI
IF (b[449] == 0) tmp_dst[511:448] := a[447:384]; FI
IF (b[449] == 1) tmp_dst[511:448] := a[511:448]; FI
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".
IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI
IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI
IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI
IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI
IF (imm8[2] == 0) dst[191:128] := a[191:128]; FI
IF (imm8[2] == 1) dst[191:128] := a[255:192]; FI
IF (imm8[3] == 0) dst[255:192] := a[191:128]; FI
IF (imm8[3] == 1) dst[255:192] := a[255:192]; FI
IF (imm8[4] == 0) dst[319:256] := a[319:256]; FI
IF (imm8[4] == 1) dst[319:256] := a[383:320]; FI
IF (imm8[5] == 0) dst[383:320] := a[319:256]; FI
IF (imm8[5] == 1) dst[383:320] := a[383:320]; FI
IF (imm8[6] == 0) dst[447:384] := a[447:384]; FI
IF (imm8[6] == 1) dst[447:384] := a[511:448]; FI
IF (imm8[7] == 0) dst[511:448] := a[447:384]; FI
IF (imm8[7] == 1) dst[511:448] := a[511:448]; FI
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".
IF (b[1] == 0) dst[63:0] := a[63:0]; FI
IF (b[1] == 1) dst[63:0] := a[127:64]; FI
IF (b[65] == 0) dst[127:64] := a[63:0]; FI
IF (b[65] == 1) dst[127:64] := a[127:64]; FI
IF (b[129] == 0) dst[191:128] := a[191:128]; FI
IF (b[129] == 1) dst[191:128] := a[255:192]; FI
IF (b[193] == 0) dst[255:192] := a[191:128]; FI
IF (b[193] == 1) dst[255:192] := a[255:192]; FI
IF (b[257] == 0) dst[319:256] := a[319:256]; FI
IF (b[257] == 1) dst[319:256] := a[383:320]; FI
IF (b[321] == 0) dst[383:320] := a[319:256]; FI
IF (b[321] == 1) dst[383:320] := a[383:320]; FI
IF (b[385] == 0) dst[447:384] := a[447:384]; FI
IF (b[385] == 1) dst[447:384] := a[511:448]; FI
IF (b[449] == 0) dst[511:448] := a[447:384]; FI
IF (b[449] == 1) dst[511:448] := a[511:448]; FI
dst[MAX:512] := 0
AVX512F
Swizzle
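The in-lane double-precision shuffles above pick, for each output element, either the low or the high element of its own 128-bit lane. With an immediate, bit j of "imm8" controls element j; with a vector control, bit 1 (not bit 0) of control element j is used. A scalar sketch under the assumption that registers are eight-element arrays and "dst" does not alias "a"; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Immediate control: bit j of imm8 selects low (0) or high (1) within the lane. */
void ref_permute_pd(double dst[8], const double a[8], uint8_t imm8)
{
    assert(dst != a);
    for (int j = 0; j < 8; j++) {
        int lane_base = j & ~1; /* first element of this 128-bit lane */
        dst[j] = a[lane_base + ((imm8 >> j) & 1)];
    }
}

/* Vector control: bit 1 of each 64-bit control element is the selector. */
void ref_permutevar_pd(double dst[8], const double a[8], const uint64_t b[8])
{
    assert(dst != a);
    for (int j = 0; j < 8; j++) {
        int lane_base = j & ~1;
        dst[j] = a[lane_base + (int)((b[j] >> 1) & 1)];
    }
}
```

In this model a control element of 1 still selects the low element, because only bit 1 is inspected.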
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
tmp_dst[287:256] := SELECT4(a[383:256], b[257:256])
tmp_dst[319:288] := SELECT4(a[383:256], b[289:288])
tmp_dst[351:320] := SELECT4(a[383:256], b[321:320])
tmp_dst[383:352] := SELECT4(a[383:256], b[353:352])
tmp_dst[415:384] := SELECT4(a[511:384], b[385:384])
tmp_dst[447:416] := SELECT4(a[511:384], b[417:416])
tmp_dst[479:448] := SELECT4(a[511:384], b[449:448])
tmp_dst[511:480] := SELECT4(a[511:384], b[481:480])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], b[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], b[33:32])
tmp_dst[95:64] := SELECT4(a[127:0], b[65:64])
tmp_dst[127:96] := SELECT4(a[127:0], b[97:96])
tmp_dst[159:128] := SELECT4(a[255:128], b[129:128])
tmp_dst[191:160] := SELECT4(a[255:128], b[161:160])
tmp_dst[223:192] := SELECT4(a[255:128], b[193:192])
tmp_dst[255:224] := SELECT4(a[255:128], b[225:224])
tmp_dst[287:256] := SELECT4(a[383:256], b[257:256])
tmp_dst[319:288] := SELECT4(a[383:256], b[289:288])
tmp_dst[351:320] := SELECT4(a[383:256], b[321:320])
tmp_dst[383:352] := SELECT4(a[383:256], b[353:352])
tmp_dst[415:384] := SELECT4(a[511:384], b[385:384])
tmp_dst[447:416] := SELECT4(a[511:384], b[417:416])
tmp_dst[479:448] := SELECT4(a[511:384], b[449:448])
tmp_dst[511:480] := SELECT4(a[511:384], b[481:480])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
dst[159:128] := SELECT4(a[255:128], imm8[1:0])
dst[191:160] := SELECT4(a[255:128], imm8[3:2])
dst[223:192] := SELECT4(a[255:128], imm8[5:4])
dst[255:224] := SELECT4(a[255:128], imm8[7:6])
dst[287:256] := SELECT4(a[383:256], imm8[1:0])
dst[319:288] := SELECT4(a[383:256], imm8[3:2])
dst[351:320] := SELECT4(a[383:256], imm8[5:4])
dst[383:352] := SELECT4(a[383:256], imm8[7:6])
dst[415:384] := SELECT4(a[511:384], imm8[1:0])
dst[447:416] := SELECT4(a[511:384], imm8[3:2])
dst[479:448] := SELECT4(a[511:384], imm8[5:4])
dst[511:480] := SELECT4(a[511:384], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], b[1:0])
dst[63:32] := SELECT4(a[127:0], b[33:32])
dst[95:64] := SELECT4(a[127:0], b[65:64])
dst[127:96] := SELECT4(a[127:0], b[97:96])
dst[159:128] := SELECT4(a[255:128], b[129:128])
dst[191:160] := SELECT4(a[255:128], b[161:160])
dst[223:192] := SELECT4(a[255:128], b[193:192])
dst[255:224] := SELECT4(a[255:128], b[225:224])
dst[287:256] := SELECT4(a[383:256], b[257:256])
dst[319:288] := SELECT4(a[383:256], b[289:288])
dst[351:320] := SELECT4(a[383:256], b[321:320])
dst[383:352] := SELECT4(a[383:256], b[353:352])
dst[415:384] := SELECT4(a[511:384], b[385:384])
dst[447:416] := SELECT4(a[511:384], b[417:416])
dst[479:448] := SELECT4(a[511:384], b[449:448])
dst[511:480] := SELECT4(a[511:384], b[481:480])
dst[MAX:512] := 0
AVX512F
Swizzle
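The SELECT4-based single-precision entries above apply a 2-bit selector per output element, always indexing inside the element's own 128-bit lane. With "imm8" the same four selectors are reused in every lane; with "b" each element supplies its own selector in its low two bits. A scalar sketch with hypothetical names, assuming sixteen-element arrays and a "dst" that does not alias "a":

```c
#include <assert.h>
#include <stdint.h>

/* Immediate control: selector for position p within a lane is imm8[2p+1:2p]. */
void ref_permute_ps(float dst[16], const float a[16], uint8_t imm8)
{
    assert(dst != a);
    for (int j = 0; j < 16; j++) {
        int lane_base = j & ~3;                  /* start of the 128-bit lane */
        int sel = (imm8 >> (2 * (j & 3))) & 3;   /* 2-bit selector for this slot */
        dst[j] = a[lane_base + sel];
    }
}

/* Vector control: selector is the low two bits of control element j. */
void ref_permutevar_ps(float dst[16], const float a[16], const uint32_t b[16])
{
    assert(dst != a);
    for (int j = 0; j < 16; j++)
        dst[j] = a[(j & ~3) + (int)(b[j] & 3)];
}
```

For instance, an immediate of 0x1B reverses the four elements inside every lane.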
Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
id := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
id := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
dst[63:0] := SELECT4(a[255:0], imm8[1:0])
dst[127:64] := SELECT4(a[255:0], imm8[3:2])
dst[191:128] := SELECT4(a[255:0], imm8[5:4])
dst[255:192] := SELECT4(a[255:0], imm8[7:6])
dst[319:256] := SELECT4(a[511:256], imm8[1:0])
dst[383:320] := SELECT4(a[511:256], imm8[3:2])
dst[447:384] := SELECT4(a[511:256], imm8[5:4])
dst[511:448] := SELECT4(a[511:256], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
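The 256-bit-lane entries above reuse the SELECT4 pattern with 64-bit elements: each half of the register is shuffled independently, and both halves use the same four 2-bit selectors from "imm8". A scalar sketch, assuming eight-element `double` arrays and a "dst" that does not alias "a"; `ref_permutex_pd` and `ref_maskz_permutex_pd` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Unmasked form: selector for position p within a 256-bit lane is imm8[2p+1:2p]. */
void ref_permutex_pd(double dst[8], const double a[8], uint8_t imm8)
{
    assert(dst != a);
    for (int j = 0; j < 8; j++) {
        int lane_base = j & ~3;                  /* start of the 256-bit lane */
        int sel = (imm8 >> (2 * (j & 3))) & 3;
        dst[j] = a[lane_base + sel];
    }
}

/* Zeromask form: inactive lanes are zeroed. */
void ref_maskz_permutex_pd(double dst[8], uint8_t k, const double a[8],
                           uint8_t imm8)
{
    assert(dst != a);
    for (int j = 0; j < 8; j++) {
        int sel = (imm8 >> (2 * (j & 3))) & 3;
        dst[j] = ((k >> j) & 1) ? a[(j & ~3) + sel] : 0.0;
    }
}
```

An immediate of 0x4E, for example, swaps the two 128-bit halves inside each 256-bit lane.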
Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*64
id := idx[i+2:i]*64
dst[i+63:i] := a[id+63:id]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
id := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0])
tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2])
tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4])
tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6])
tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0])
tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2])
tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4])
tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
id := idx[i+2:i]*64
IF k[j]
dst[i+63:i] := a[id+63:id]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[63:0] := src[63:0]
1: tmp[63:0] := src[127:64]
2: tmp[63:0] := src[191:128]
3: tmp[63:0] := src[255:192]
ESAC
RETURN tmp[63:0]
}
dst[63:0] := SELECT4(a[255:0], imm8[1:0])
dst[127:64] := SELECT4(a[255:0], imm8[3:2])
dst[191:128] := SELECT4(a[255:0], imm8[5:4])
dst[255:192] := SELECT4(a[255:0], imm8[7:6])
dst[319:256] := SELECT4(a[511:256], imm8[1:0])
dst[383:320] := SELECT4(a[511:256], imm8[3:2])
dst[447:384] := SELECT4(a[511:256], imm8[5:4])
dst[511:448] := SELECT4(a[511:256], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*64
id := idx[i+2:i]*64
dst[i+63:i] := a[id+63:id]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
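For the 64-bit cross-lane shuffles above, only the low three bits of each index element (`idx[i+2:i]`) take part in the selection. A scalar sketch with hypothetical names, assuming eight-element `uint64_t` arrays and a "dst" that does not alias "a":

```c
#include <assert.h>
#include <stdint.h>

/* Unmasked form: dst[j] = a[idx[j] mod 8]. */
void ref_permutexvar_q(uint64_t dst[8], const uint64_t idx[8],
                       const uint64_t a[8])
{
    assert(dst != a); /* this model reads a after writing dst */
    for (int j = 0; j < 8; j++)
        dst[j] = a[idx[j] & 7];
}

/* Zeromask form: inactive lanes are zeroed. */
void ref_maskz_permutexvar_q(uint64_t dst[8], uint8_t k,
                             const uint64_t idx[8], const uint64_t a[8])
{
    assert(dst != a);
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1) ? a[idx[j] & 7] : 0;
}
```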
Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[m+31:m]
m := m + 32
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
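The expand entries above are the inverse of compress: consecutive low elements of "a" are scattered to the positions whose mask bit is set, and the read cursor `m` advances only on active lanes. A scalar sketch assuming sixteen-element `uint32_t` arrays and a "dst" that does not alias "a"; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Writemask form: inactive lanes are copied from src. */
void ref_mask_expand_d(uint32_t dst[16], const uint32_t src[16],
                       uint16_t k, const uint32_t a[16])
{
    assert(dst != a); /* in-place expansion would overwrite unread input */
    int m = 0;        /* index of the next unread element of a */
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1)
            dst[j] = a[m++];
        else
            dst[j] = src[j];
    }
}

/* Zeromask form: inactive lanes are zeroed. */
void ref_maskz_expand_d(uint32_t dst[16], uint16_t k, const uint32_t a[16])
{
    assert(dst != a);
    int m = 0;
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1)
            dst[j] = a[m++];
        else
            dst[j] = 0;
    }
}
```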
Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[m+63:m]
m := m + 64
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
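As a rough illustration of how "imm8" is decoded in the shuffle above, the sketch below models each 128-bit lane as a list of four 32-bit elements. The names `select4` and `shuffle_dwords_zeromask` are illustrative only.

```python
def select4(lane, control):
    """Return the element of a 4-element lane chosen by a 2-bit control."""
    return lane[control & 0b11]


def shuffle_dwords_zeromask(a, imm8, k):
    """Scalar model: the same imm8 is applied to every 128-bit lane of a."""
    tmp = []
    for lane_start in range(0, 16, 4):
        lane = a[lane_start:lane_start + 4]
        for pos in range(4):
            control = (imm8 >> (2 * pos)) & 0b11  # imm8[2*pos+1 : 2*pos]
            tmp.append(select4(lane, control))
    # Zeromask: keep tmp[j] where k[j] is set, otherwise write 0.
    return [tmp[j] if (k >> j) & 1 else 0 for j in range(16)]


a = list(range(16))
# imm8 = 0x1B encodes controls 3, 2, 1, 0, which reverses each lane.
print(shuffle_dwords_zeromask(a, 0x1B, 0xFFFF))
```

Elements never cross a 128-bit lane boundary: each output lane is built only from the matching input lane.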
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
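A minimal scalar sketch of the writemasked high-half interleave above, assuming vectors are lists of sixteen 32-bit elements with element 0 lowest. The function names are illustrative and not part of the reference.

```python
def interleave_high_dwords(x, y):
    """Interleave the upper two elements of two 4-element lanes."""
    return [x[2], y[2], x[3], y[3]]


def unpackhi_dwords_writemask(src, k, a, b):
    """Scalar model: interleave per 128-bit lane, then merge with src."""
    tmp = []
    for s in range(0, 16, 4):
        tmp.extend(interleave_high_dwords(a[s:s + 4], b[s:s + 4]))
    # Writemask: keep tmp[j] where k[j] is set, otherwise copy src[j].
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(16)]


a = list(range(16))
b = list(range(100, 116))
src = [-1] * 16
# Only the lowest lane is written; the rest is taken from src.
print(unpackhi_dwords_writemask(src, 0x000F, a, b))
```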
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
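For the unmasked 64-bit variant above, each 128-bit lane holds only two elements, so the interleave reduces to pairing the upper element of "a" with the upper element of "b". A sketch under the assumption that vectors are lists of eight 64-bit elements (the name `unpackhi_qwords` is illustrative):

```python
def unpackhi_qwords(a, b):
    """Scalar model: per 128-bit lane, emit [a_high, b_high]."""
    dst = []
    for s in range(0, 8, 2):      # s is the first element of each lane
        dst.append(a[s + 1])      # dst[63:0]   := src1[127:64]
        dst.append(b[s + 1])      # dst[127:64] := src2[127:64]
    return dst


a = list(range(8))
b = list(range(10, 18))
print(unpackhi_qwords(a, b))
```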
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
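The bit-range notation used in the pseudocode (for example `dst[127:96]`) maps directly onto shifts and masks. The sketch below models whole 512-bit vectors as Python integers and follows the low-half interleave above range by range. The helper names `bits`, `pack`, and `unpacklo_dwords_512` are assumptions made for this illustration.

```python
def bits(value, hi, lo):
    """Extract value[hi:lo] (inclusive), as in the pseudocode notation."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def interleave_dwords(src1, src2):
    """128-bit lane model of INTERLEAVE_DWORDS."""
    dst = bits(src1, 31, 0)              # dst[31:0]   := src1[31:0]
    dst |= bits(src2, 31, 0) << 32       # dst[63:32]  := src2[31:0]
    dst |= bits(src1, 63, 32) << 64      # dst[95:64]  := src1[63:32]
    dst |= bits(src2, 63, 32) << 96      # dst[127:96] := src2[63:32]
    return dst


def unpacklo_dwords_512(a, b):
    """Apply the lane function to each of the four 128-bit lanes."""
    dst = 0
    for lo in range(0, 512, 128):
        lane = interleave_dwords(bits(a, lo + 127, lo), bits(b, lo + 127, lo))
        dst |= lane << lo
    return dst


def pack(elems, width=32):
    """Pack a sequence of elements into one integer, element 0 lowest."""
    value = 0
    for j, e in enumerate(elems):
        value |= (e & ((1 << width) - 1)) << (j * width)
    return value


a = pack(range(16))
b = pack(range(100, 116))
result = unpacklo_dwords_512(a, b)
print([bits(result, 32 * j + 31, 32 * j) for j in range(16)])
```

Modeling the register as a single integer keeps the code close to the reference text, at the cost of being less readable than a list-of-elements model.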
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[511:0], imm8[1:0])
dst[255:128] := SELECT4(a[511:0], imm8[3:2])
dst[383:256] := SELECT4(b[511:0], imm8[5:4])
dst[511:384] := SELECT4(b[511:0], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
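Unlike the in-lane shuffles, the operation above moves whole 128-bit blocks across the register: the two low result blocks are picked from "a" and the two high result blocks from "b". A hedged scalar sketch, with vectors as lists of sixteen floats and an illustrative function name:

```python
def shuffle_128bit_blocks(a, b, imm8, elems_per_block=4):
    """Scalar model: each 2-bit field of imm8 picks one 128-bit block."""
    def block(src, control):
        start = (control & 0b11) * elems_per_block
        return src[start:start + elems_per_block]

    return (block(a, imm8 >> 0)      # dst[127:0]   from a, imm8[1:0]
            + block(a, imm8 >> 2)    # dst[255:128] from a, imm8[3:2]
            + block(b, imm8 >> 4)    # dst[383:256] from b, imm8[5:4]
            + block(b, imm8 >> 6))   # dst[511:384] from b, imm8[7:6]


a = [float(i) for i in range(16)]
b = [float(100 + i) for i in range(16)]
# imm8 = 0x4E encodes controls 2, 3, 0, 1 (lowest field first).
print(shuffle_128bit_blocks(a, b, 0x4E))
```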
Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
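In the masked form above, the shuffle works on 128-bit blocks while the mask works on individual 64-bit elements, so one block can be partially written. The following sketch illustrates that difference in granularity. It assumes vectors are lists of eight floats, and the function name is hypothetical.

```python
def shuffle_f64_blocks_writemask(src, k, a, b, imm8):
    """Scalar model: block-level select, then element-level merge."""
    def block(v, control):
        start = (control & 0b11) * 2   # two 64-bit elements per block
        return v[start:start + 2]

    tmp = (block(a, imm8) + block(a, imm8 >> 2)
           + block(b, imm8 >> 4) + block(b, imm8 >> 6))
    # One mask bit per 64-bit element, eight bits in total.
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(8)]


a = [float(i) for i in range(8)]
b = [float(10 + i) for i in range(8)]
src = [-1.0] * 8
# imm8 = 0xE4 selects blocks 0, 1 from a and blocks 2, 3 from b;
# the mask then keeps only the odd-numbered elements.
print(shuffle_f64_blocks_writemask(src, 0b10101010, a, b, 0xE4))
```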
Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[511:0], imm8[1:0])
dst[255:128] := SELECT4(a[511:0], imm8[3:2])
dst[383:256] := SELECT4(b[511:0], imm8[5:4])
dst[511:384] := SELECT4(b[511:0], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[511:0], imm8[1:0])
dst[255:128] := SELECT4(a[511:0], imm8[3:2])
dst[383:256] := SELECT4(b[511:0], imm8[5:4])
dst[511:384] := SELECT4(b[511:0], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0])
tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2])
tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4])
tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 128-bit blocks (each composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[127:0] := src[127:0]
1: tmp[127:0] := src[255:128]
2: tmp[127:0] := src[383:256]
3: tmp[127:0] := src[511:384]
ESAC
RETURN tmp[127:0]
}
dst[127:0] := SELECT4(a[511:0], imm8[1:0])
dst[255:128] := SELECT4(a[511:0], imm8[3:2])
dst[383:256] := SELECT4(b[511:0], imm8[5:4])
dst[511:384] := SELECT4(b[511:0], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
tmp_dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320]
tmp_dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320]
tmp_dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448]
tmp_dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
tmp_dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320]
tmp_dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320]
tmp_dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448]
tmp_dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst".
dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192]
dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192]
dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320]
dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320]
dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448]
dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448]
dst[MAX:512] := 0
AVX512F
Swizzle
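In the double-precision shuffle above, every bit of "imm8" is a separate one-bit selector: even bits choose between the two elements of a lane in "a", odd bits choose between the two elements of the same lane in "b". A small sketch of that decoding, with vectors as lists of eight values and an illustrative function name:

```python
def shuffle_doubles(a, b, imm8):
    """Scalar model: one selector bit per 64-bit destination element."""
    dst = []
    for lane in range(4):
        base = 2 * lane                       # first element of this lane
        sel_a = (imm8 >> (2 * lane)) & 1      # imm8[0], imm8[2], ...
        sel_b = (imm8 >> (2 * lane + 1)) & 1  # imm8[1], imm8[3], ...
        dst.append(a[base + sel_a])           # low half of the lane, from a
        dst.append(b[base + sel_b])           # high half of the lane, from b
    return dst


a = list(range(8))
b = list(range(10, 18))
# imm8 = 0x55: take the high element of a and the low element of b per lane.
print(shuffle_doubles(a, b, 0x55))
```

With `imm8 = 0x00` the result matches a low-half interleave of "a" and "b", and with `imm8 = 0xFF` a high-half interleave.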
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
tmp_dst[351:320] := SELECT4(b[383:256], imm8[5:4])
tmp_dst[383:352] := SELECT4(b[383:256], imm8[7:6])
tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
tmp_dst[479:448] := SELECT4(b[511:384], imm8[5:4])
tmp_dst[511:480] := SELECT4(b[511:384], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6])
tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
tmp_dst[351:320] := SELECT4(b[383:256], imm8[5:4])
tmp_dst[383:352] := SELECT4(b[383:256], imm8[7:6])
tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
tmp_dst[479:448] := SELECT4(b[511:384], imm8[5:4])
tmp_dst[511:480] := SELECT4(b[511:384], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle single-precision (32-bit) floating-point elements in "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(b[127:0], imm8[5:4])
dst[127:96] := SELECT4(b[127:0], imm8[7:6])
dst[159:128] := SELECT4(a[255:128], imm8[1:0])
dst[191:160] := SELECT4(a[255:128], imm8[3:2])
dst[223:192] := SELECT4(b[255:128], imm8[5:4])
dst[255:224] := SELECT4(b[255:128], imm8[7:6])
dst[287:256] := SELECT4(a[383:256], imm8[1:0])
dst[319:288] := SELECT4(a[383:256], imm8[3:2])
dst[351:320] := SELECT4(b[383:256], imm8[5:4])
dst[383:352] := SELECT4(b[383:256], imm8[7:6])
dst[415:384] := SELECT4(a[511:384], imm8[1:0])
dst[447:416] := SELECT4(a[511:384], imm8[3:2])
dst[479:448] := SELECT4(b[511:384], imm8[5:4])
dst[511:480] := SELECT4(b[511:384], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
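The pseudocode above reads from both inputs: within each 128-bit lane the two low result elements are selected from "a" and the two high result elements from "b", with the same four 2-bit controls reused in every lane. A sketch under the assumption that vectors are lists of sixteen values (the name `shuffle_singles` is illustrative):

```python
def shuffle_singles(a, b, imm8):
    """Scalar model of the two-source in-lane 32-bit shuffle."""
    c0 = imm8 & 3          # imm8[1:0]
    c1 = (imm8 >> 2) & 3   # imm8[3:2]
    c2 = (imm8 >> 4) & 3   # imm8[5:4]
    c3 = (imm8 >> 6) & 3   # imm8[7:6]
    dst = []
    for s in range(0, 16, 4):
        lane_a, lane_b = a[s:s + 4], b[s:s + 4]
        dst += [lane_a[c0], lane_a[c1], lane_b[c2], lane_b[c3]]
    return dst


a = list(range(16))
b = list(range(100, 116))
# imm8 = 0x8D encodes controls 1, 3, 0, 2 (lowest field first).
print(shuffle_singles(a, b, 0x8D))
```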
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp_dst[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128])
dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256])
dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384])
dst[MAX:512] := 0
AVX512F
Swizzle
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison predicate specified by "imm8", and store the result in mask vector "k". Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter.
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
k[MAX:1] := 0
AVX512F
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
k[MAX:1] := 0
AVX512F
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
IF k1[0]
k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512F
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
IF k1[0]
k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512F
Compare
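The predicate table in the scalar compare entries above can be read as sixteen distinct relations: values 16 to 31 of "imm8" name the same relations as 0 to 15 with the opposite signaling behavior for quiet NaNs. The Python sketch below models only the boolean result, so it indexes by the low four bits and does not model floating-point exceptions at all. The function names are hypothetical.

```python
import math


def _unordered(a, b):
    """True when at least one operand is NaN."""
    return math.isnan(a) or math.isnan(b)


# Indexed by imm8 & 0xF. The O/U suffix says whether the predicate is
# false (ordered) or true (unordered) when an operand is NaN.
_PREDICATES = [
    lambda a, b: a == b,                           # 0  _CMP_EQ_OQ
    lambda a, b: a < b,                            # 1  _CMP_LT_OS
    lambda a, b: a <= b,                           # 2  _CMP_LE_OS
    lambda a, b: _unordered(a, b),                 # 3  _CMP_UNORD_Q
    lambda a, b: a != b,                           # 4  _CMP_NEQ_UQ
    lambda a, b: not (a < b),                      # 5  _CMP_NLT_US
    lambda a, b: not (a <= b),                     # 6  _CMP_NLE_US
    lambda a, b: not _unordered(a, b),             # 7  _CMP_ORD_Q
    lambda a, b: a == b or _unordered(a, b),       # 8  _CMP_EQ_UQ
    lambda a, b: not (a >= b),                     # 9  _CMP_NGE_US
    lambda a, b: not (a > b),                      # 10 _CMP_NGT_US
    lambda a, b: False,                            # 11 _CMP_FALSE_OQ
    lambda a, b: a != b and not _unordered(a, b),  # 12 _CMP_NEQ_OQ
    lambda a, b: a >= b,                           # 13 _CMP_GE_OS
    lambda a, b: a > b,                            # 14 _CMP_GT_OS
    lambda a, b: True,                             # 15 _CMP_TRUE_UQ
]


def cmp_sd_mask(a, b, imm8):
    """Bit 0 of the result mask for the lower elements a and b."""
    return 1 if _PREDICATES[imm8 & 0xF](a, b) else 0


def mask_cmp_sd_mask(k1, a, b, imm8):
    """Zeromask form: the result is 0 when bit 0 of k1 is clear."""
    return cmp_sd_mask(a, b, imm8) if k1 & 1 else 0
```

The NaN cases are where the predicates diverge: with a NaN operand, _CMP_EQ_OQ yields 0 while _CMP_NEQ_UQ and _CMP_NLT_US yield 1.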
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
k[MAX:1] := 0
AVX512F
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
k[MAX:1] := 0
AVX512F
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
IF k1[0]
k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512F
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
IF k1[0]
k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512F
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
RETURN ( a[63:0] OP b[63:0] ) ? 1 : 0
AVX512F
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
RETURN ( a[31:0] OP b[31:0] ) ? 1 : 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
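The signed and unsigned 64-bit compare entries above share the same pseudocode; the difference lies in how each 64-bit pattern is interpreted before the relation is applied. A minimal Python sketch of that idea follows, with each vector modelled as a list of eight 64-bit patterns. The function name, the `signed` flag, and the default `k1=0xFF` (meaning "no masking") are assumptions made for illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _to_signed64(x):
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


# Indexed by imm8[2:0], matching the CASE table in the entries above.
_CMPINT = [
    lambda a, b: a == b,  # 0 _MM_CMPINT_EQ
    lambda a, b: a < b,   # 1 _MM_CMPINT_LT
    lambda a, b: a <= b,  # 2 _MM_CMPINT_LE
    lambda a, b: False,   # 3 _MM_CMPINT_FALSE
    lambda a, b: a != b,  # 4 _MM_CMPINT_NE
    lambda a, b: a >= b,  # 5 _MM_CMPINT_NLT
    lambda a, b: a > b,   # 6 _MM_CMPINT_NLE
    lambda a, b: True,    # 7 _MM_CMPINT_TRUE
]


def cmp_64_mask(a, b, imm8, k1=0xFF, signed=True):
    """Return the 8-bit result mask; bits whose k1 bit is clear stay 0."""
    conv = _to_signed64 if signed else (lambda x: x & MASK64)
    op = _CMPINT[imm8 & 7]
    k = 0
    for j in range(8):
        if (k1 >> j) & 1 and op(conv(a[j]), conv(b[j])):
            k |= 1 << j
    return k
```

For example, the pattern 0xFFFFFFFFFFFFFFFF compares below 1 when read as signed (it is -1) but above 1 when read as unsigned.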
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := j*32
m := j*64
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
m := j*64
IF k[j]
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ELSE
dst[m+63:m] := src[m+63:m]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
m := j*64
IF k[j]
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ELSE
dst[m+63:m] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
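Unlike the 32-bit integer to double-precision conversion, the conversion to single precision in the entries above can lose information, because a 32-bit float carries only 24 significand bits. The sketch below illustrates this in Python, assuming round-to-nearest-even (the entries carrying a rounding note allow other modes, which are not modelled here). Function names are hypothetical.

```python
import struct


def int32_to_fp32(x):
    """Round a 32-bit integer to the nearest binary32 value."""
    # Packing with the "f" format narrows the value to single precision.
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def mask_cvtepi32_ps(src, k, a):
    """Writemask form: elements with a clear mask bit are copied from src."""
    return [int32_to_fp32(a[j]) if (k >> j) & 1 else src[j] for j in range(16)]


def maskz_cvtepi32_ps(k, a):
    """Zeromask form: elements with a clear mask bit become 0.0."""
    return [int32_to_fp32(a[j]) if (k >> j) & 1 else 0.0 for j in range(16)]
```

With this model, 16777217 (2^24 + 1) has no exact single-precision representation and converts to 16777216.0.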
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
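The double-precision to 32-bit integer entries above narrow eight 64-bit elements into eight 32-bit elements, which is why the upper bits are cleared from bit 256 rather than bit 512. The Python sketch below makes two assumptions that the entries do not state: rounding is to nearest, ties to even, and inputs that are NaN or out of range produce the "integer indefinite" value 0x80000000 (shown here as -2**31). Function names are hypothetical.

```python
import math

INT32_INDEFINITE = -2**31  # assumed result for NaN and out-of-range inputs


def fp64_to_int32(x):
    """Convert one double to a signed 32-bit integer, ties to even."""
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    r = round(x)  # Python's round() rounds halfway cases to the even integer
    if not -2**31 <= r <= 2**31 - 1:
        return INT32_INDEFINITE
    return r


def mask_cvtpd_epi32(src, k, a):
    """Writemask form over 8 elements: clear mask bits copy from src."""
    return [fp64_to_int32(a[j]) if (k >> j) & 1 else src[j] for j in range(8)]


def maskz_cvtpd_epi32(k, a):
    """Zeromask form over 8 elements: clear mask bits produce 0."""
    return [fp64_to_int32(a[j]) if (k >> j) & 1 else 0 for j in range(8)]
```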
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
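Narrowing from double to single precision, as in the entries above, rounds each element and may overflow to infinity. The Python sketch below models one element under the assumption of round-to-nearest-even, in which case overflow yields a signed infinity. Other rounding modes permitted by the entries with a rounding note are not modelled, and the function names are hypothetical.

```python
import math
import struct


def fp64_to_fp32(x):
    """Round a double to the nearest binary32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Magnitude too large for binary32: round-to-nearest gives infinity.
        return math.copysign(math.inf, x)


def mask_cvtpd_ps(src, k, a):
    """Writemask form over 8 elements: clear mask bits copy from src."""
    return [fp64_to_fp32(a[j]) if (k >> j) & 1 else src[j] for j in range(8)]


def maskz_cvtpd_ps(k, a):
    """Zeromask form over 8 elements: clear mask bits produce 0.0."""
    return [fp64_to_fp32(a[j]) if (k >> j) & 1 else 0.0 for j in range(8)]
```

Values such as 0.5 survive the round trip unchanged, while 0.1 comes back slightly different because its single-precision neighbour is not the same number.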
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [sae_note]
FOR j := 0 to 15
i := j*32
m := j*16
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 15
i := j*32
m := j*16
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 15
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 15
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
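The half-precision to single-precision entries above widen sixteen 16-bit elements into sixteen 32-bit elements; every half-precision value is exactly representable in single precision, so no rounding is involved. A small Python sketch follows, with the input modelled as a list of 16-bit patterns and the mask as an integer. It relies on the `e` (binary16) format of the standard `struct` module, and the function names are hypothetical.

```python
import struct


def fp16_bits_to_float(bits):
    """Decode one IEEE 754 binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def cvtph_ps(a):
    """Widen every half-precision element."""
    return [fp16_bits_to_float(h) for h in a]


def mask_cvtph_ps(src, k, a):
    """Writemask form: elements with a clear mask bit are copied from src."""
    return [fp16_bits_to_float(a[j]) if (k >> j) & 1 else src[j]
            for j in range(len(a))]


def maskz_cvtph_ps(k, a):
    """Zeromask form: elements with a clear mask bit become 0.0."""
    return [fp16_bits_to_float(a[j]) if (k >> j) & 1 else 0.0
            for j in range(len(a))]
```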
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". [sae_note]
FOR j := 0 to 7
i := 64*j
k := 32*j
dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 32*j
dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [round2_note]
FOR j := 0 to 15
i := 16*j
l := 32*j
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
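The narrowing single-to-half conversion above can be sketched with Python's `struct` module, whose `e` format packs a float as IEEE 754 binary16 using round-half-to-even. The sketch assumes that rounding mode (the real operation takes it as a parameter) and inputs within the half-precision range; `float_to_fp16_bits` and `cvtps_ph` are hypothetical names.

```python
import struct

def float_to_fp16_bits(x):
    """Round x to half precision and return the 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def cvtps_ph(a):
    """Model of the unmasked form: sixteen inputs give sixteen 16-bit results."""
    return [float_to_fp16_bits(x) for x in a]

# 1 + 2**-11 and 1 + 3 * 2**-11 lie exactly halfway between two half-precision
# values, so they show the tie-breaking rule.
a = [1.0, -2.0, 65504.0, 1 + 2**-11, 1 + 3 * 2**-11] + [0.0] * 11
dst = cvtps_ph(a)
```

Sixteen 16-bit results occupy 256 bits, which is why the pseudocode zeroes "dst" from bit 256 upward rather than from bit 512.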
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [round2_note]
FOR j := 0 to 15
i := 16*j
l := 32*j
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round2_note]
FOR j := 0 to 15
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round2_note]
FOR j := 0 to 15
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round2_note]
FOR j := 0 to 15
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round2_note]
FOR j := 0 to 15
i := 16*j
l := 32*j
IF k[j]
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
[round_note]
dst[31:0] := Convert_FP64_To_Int32(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
[round_note]
dst[63:0] := Convert_FP64_To_Int64(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
[round_note]
dst[31:0] := Convert_FP64_To_Int32(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
[round_note]
dst[63:0] := Convert_FP64_To_Int64(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
dst[31:0] := Convert_FP64_To_Int32(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
dst[63:0] := Convert_FP64_To_Int64(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := Convert_FP64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
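The scalar double-to-single conversion above changes only the lowest lane and copies the rest from "a". A minimal Python sketch, assuming round-to-nearest-even and using the hypothetical names `to_fp32` and `cvtsd_ss`, with "a" as four single-precision lanes and "b" as two double-precision lanes:

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value, returned as a float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def cvtsd_ss(a, b):
    """Lane 0 is the narrowed b[0]; lanes 1..3 are copied unchanged from a."""
    return [to_fp32(b[0])] + list(a[1:4])

dst = cvtsd_ss([10.0, 20.0, 30.0, 40.0], [0.1, 99.0])
```

Here `dst[0]` is the single-precision neighbour of 0.1, which differs slightly from the double-precision input, while the upper element of "b" (99.0) is ignored.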
Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := Convert_FP64_To_FP32(b[63:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := Convert_FP64_To_FP32(b[63:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := Convert_FP64_To_FP32(b[63:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := Convert_FP64_To_FP32(b[63:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".
[round_note]
dst[31:0] := Convert_FP64_To_UInt32(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".
[round_note]
dst[63:0] := Convert_FP64_To_UInt64(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".
dst[31:0] := Convert_FP64_To_UInt32(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".
dst[63:0] := Convert_FP64_To_UInt64(a[63:0])
AVX512F
Convert
Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := Convert_Int64_To_FP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := Convert_Int64_To_FP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_Int32_To_FP64(b[31:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_Int64_To_FP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := Convert_Int64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := Convert_Int64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[sae_note]
dst[63:0] := Convert_FP32_To_FP64(b[31:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[sae_note]
IF k[0]
dst[63:0] := Convert_FP32_To_FP64(b[31:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := Convert_FP32_To_FP64(b[31:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
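For the scalar writemask forms above, only mask bit 0 matters. The sketch below models that behaviour in Python under the assumption that "b" already holds single-precision values, so widening to double precision is exact; `cvtss_sd_mask` is a hypothetical name.

```python
def cvtss_sd_mask(src, k, a, b):
    """src, a: two FP64 lanes; b: four FP32 lanes; k: integer mask.

    The lower result is the widened b[0] when bit 0 of k is set, else src[0].
    The upper result always comes from a.
    """
    lower = float(b[0]) if k & 1 else src[0]
    return [lower, a[1]]

taken = cvtss_sd_mask([5.0, 6.0], 0b01, [7.0, 8.0], [1.5, 0.0, 0.0, 0.0])
kept = cvtss_sd_mask([5.0, 6.0], 0b10, [7.0, 8.0], [1.5, 0.0, 0.0, 0.0])
```

In the second call bit 1 is set but bit 0 is clear, so the lower element is taken from "src".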
Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[sae_note]
IF k[0]
dst[63:0] := Convert_FP32_To_FP64(b[31:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := Convert_FP32_To_FP64(b[31:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
[round_note]
dst[31:0] := Convert_FP32_To_Int32(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
[round_note]
dst[63:0] := Convert_FP32_To_Int64(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
[round_note]
dst[31:0] := Convert_FP32_To_Int32(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
[round_note]
dst[63:0] := Convert_FP32_To_Int64(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
dst[31:0] := Convert_FP32_To_Int32(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
dst[63:0] := Convert_FP32_To_Int64(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".
[round_note]
dst[31:0] := Convert_FP32_To_UInt32(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".
[round_note]
dst[63:0] := Convert_FP32_To_UInt64(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".
dst[31:0] := Convert_FP32_To_UInt32(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".
dst[63:0] := Convert_FP32_To_UInt64(a[31:0])
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
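Conversion "with truncation" discards the fractional part, which differs from the rounding conversions earlier in this section. The following Python sketch contrasts the two on eight double-precision inputs. It assumes in-range values and, for the rounding variant, round-to-nearest-even; both function names are hypothetical.

```python
def cvttpd_epi32(a):
    """Truncating model: int() rounds toward zero."""
    return [int(x) for x in a]

def cvtpd_epi32_nearest(a):
    """Rounding model: round() rounds to nearest, ties to even."""
    return [round(x) for x in a]

a = [2.7, -2.7, 2.5, 3.5, -0.9, 1000.0, 0.0, -1.5]
truncated = cvttpd_epi32(a)
rounded = cvtpd_epi32_nearest(a)
```

The two results differ in every lane whose input has a fractional part of one half or more in magnitude, for example 2.7 and -0.9.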
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 32*j
l := 64*j
IF k[j]
dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[31:0] := Convert_FP64_To_UInt32_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[63:0] := Convert_FP64_To_UInt64_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".
dst[31:0] := Convert_FP64_To_UInt32_Truncate(a[63:0])
AVX512F
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".
dst[63:0] := Convert_FP64_To_UInt64_Truncate(a[63:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[31:0] := Convert_FP32_To_UInt32_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".
[sae_note]
dst[63:0] := Convert_FP32_To_UInt64_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".
dst[31:0] := Convert_FP32_To_UInt32_Truncate(a[31:0])
AVX512F
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".
dst[63:0] := Convert_FP32_To_UInt64_Truncate(a[31:0])
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := j*64
l := j*32
dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
ENDFOR
dst[MAX:512] := 0
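The loop above widens each 32-bit lane to a 64-bit floating-point lane, reading the source bits as unsigned. A scalar sketch in C of that lane loop, with a signed reading of the same bits for contrast; both function names are hypothetical and not part of the reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: 8 unsigned 32-bit lanes -> 8 doubles. */
static void cvt_u32x8_to_f64x8(const uint32_t a[8], double dst[8])
{
    assert(a != NULL && dst != NULL);
    for (size_t j = 0; j < 8; j++) {
        /* Every uint32_t value is exactly representable as a double. */
        dst[j] = (double)a[j];
    }
}

/* The same bit pattern read as a two's-complement signed lane. */
static double cvt_i32_bits_to_f64(uint32_t bits)
{
    return bits >= 0x80000000u ? (double)bits - 4294967296.0 : (double)bits;
}
```

The lane 0xFFFFFFFF becomes 4294967295.0 under the unsigned reading and -1.0 under the signed one, which is why the unsigned and signed conversions are distinct entries.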
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
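Unlike the conversion to double precision, a 32-bit integer does not always fit in a single-precision significand (24 bits), so this conversion can round. A scalar sketch in C, assuming an IEEE 754 binary32 `float` and the default round-to-nearest-even mode; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: 16 unsigned 32-bit lanes -> 16 floats.
   Assumes IEEE 754 binary32 and the default rounding mode. */
static void cvt_u32x16_to_f32x16(const uint32_t a[16], float dst[16])
{
    assert(a != NULL && dst != NULL);
    for (size_t j = 0; j < 16; j++) {
        /* Lanes above 2^24 may not be exact and are rounded. */
        dst[j] = (float)a[j];
    }
}
```

Under those assumptions, 16777217 (2^24 + 1) rounds to 16777216.0f and 0xFFFFFFFF rounds up to 4294967296.0f.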
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
IF k[j]
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert the unsigned 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := Convert_Int64_To_FP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the unsigned 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_Int32_To_FP64(b[31:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the unsigned 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_Int64_To_FP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Convert
Convert the unsigned 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the unsigned 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := Convert_Int64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the unsigned 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert the unsigned 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 8*j
dst[k+7:k] := Truncate8(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Store
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i])
FI
ENDFOR
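In the masked store above, lane j always targets byte offset j from "base_addr". Active results are not packed together, and bytes belonging to inactive lanes are not written. A scalar sketch in C; the function name is hypothetical and memory is modeled as a plain byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked truncating store:
   16 x 32-bit lanes, each written as one byte at base_addr[j]. */
static void mask_store_trunc_i32_to_i8(uint8_t *base_addr, uint16_t k,
                                       const uint32_t a[16])
{
    assert(base_addr != NULL && a != NULL);
    for (size_t j = 0; j < 16; j++) {
        if ((k >> j) & 1u)
            base_addr[j] = (uint8_t)(a[j] & 0xFFu); /* Truncate8 */
        /* Inactive lane: the byte in memory keeps its old value. */
    }
}
```

With mask 0x0005 only bytes 0 and 2 of the buffer change; every other byte keeps whatever it held before the call.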
AVX512F
Convert
Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
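The writemask and zeromask forms differ only in what an inactive lane receives: the corresponding element of "src", or zero. A scalar sketch in C of both forms for the 32-bit to 8-bit truncation; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writemask form: inactive lanes are copied from src. */
static void mask_trunc_i32_to_i8(uint8_t dst[16], const uint8_t src[16],
                                 uint16_t k, const uint32_t a[16])
{
    assert(dst != NULL && src != NULL && a != NULL);
    for (size_t j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1u) ? (uint8_t)(a[j] & 0xFFu) : src[j];
}

/* Zeromask form: inactive lanes are set to 0. */
static void maskz_trunc_i32_to_i8(uint8_t dst[16], uint16_t k,
                                  const uint32_t a[16])
{
    assert(dst != NULL && a != NULL);
    for (size_t j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1u) ? (uint8_t)(a[j] & 0xFFu) : (uint8_t)0;
}
```

Bit j of the mask controls lane j, so a mask of 0x0002 activates only lane 1.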
AVX512F
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 16*j
dst[k+15:k] := Truncate16(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Store
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i])
FI
ENDFOR
AVX512F
Convert
Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 8*j
dst[k+7:k] := Truncate8(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Store
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Truncate8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 32*j
dst[k+31:k] := Truncate32(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Truncate32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Store
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Truncate32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 16*j
dst[k+15:k] := Truncate16(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Store
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Truncate16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 8*j
dst[k+7:k] := Saturate8(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
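Signed saturation clamps each lane to the range of the narrower type instead of discarding the upper bits as truncation does. A scalar sketch in C contrasting the two; `saturate8`, `truncate8` and the lane loop are hypothetical helpers that mirror the pseudocode names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturate8: clamp a signed 32-bit value to [-128, 127]. */
static int8_t saturate8(int32_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Truncate8, for comparison: keep the low 8 bits as raw bits. */
static uint8_t truncate8(int32_t x)
{
    return (uint8_t)((uint32_t)x & 0xFFu);
}

/* Hypothetical lane loop: 16 signed 32-bit lanes -> 16 signed bytes. */
static void cvts_i32x16_to_i8x16(const int32_t a[16], int8_t dst[16])
{
    assert(a != NULL && dst != NULL);
    for (size_t j = 0; j < 16; j++)
        dst[j] = saturate8(a[j]);
}
```

For the input 300, saturation gives 127 while truncation gives 0x2C (44). For -200, saturation gives -128 while truncation gives 0x38 (56).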
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Store
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i])
FI
ENDFOR
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 16*j
dst[k+15:k] := Saturate16(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Store
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i])
FI
ENDFOR
AVX512F
Convert
Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 8*j
dst[k+7:k] := Saturate8(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := Saturate8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 32*j
dst[k+31:k] := Saturate32(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Saturate32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := Saturate32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 16*j
dst[k+15:k] := Saturate16(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Store
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := Saturate16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 8*j
dst[i+31:i] := SignExtend32(a[k+7:k])
ENDFOR
dst[MAX:512] := 0
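Sign extension replicates bit 7 of each source byte into the 24 new upper bits, so the two's-complement value is preserved. A bit-level scalar sketch in C; `sign_extend32` and the lane loop are hypothetical helpers, and lanes are carried as raw unsigned bit patterns to make the result visible in hex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SignExtend32 on raw bits: copy bit 7 into bits 31..8. */
static uint32_t sign_extend32(uint8_t bits)
{
    return (bits & 0x80u) ? (0xFFFFFF00u | bits) : (uint32_t)bits;
}

/* Hypothetical lane loop: 16 bytes -> 16 x 32-bit lanes. */
static void cvt_i8x16_to_i32x16(const uint8_t a[16], uint32_t dst[16])
{
    assert(a != NULL && dst != NULL);
    for (size_t j = 0; j < 16; j++)
        dst[j] = sign_extend32(a[j]);
}
```

The byte 0x80 (-128) widens to 0xFFFFFF80, which is still -128 as a 32-bit value, while 0x7F widens to 0x0000007F.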
AVX512F
Convert
Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+7:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+7:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 8*j
dst[i+63:i] := SignExtend64(a[k+7:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+7:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+7:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 32*j
dst[i+63:i] := SignExtend64(a[k+31:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 16*j
dst[i+31:i] := SignExtend32(a[k+15:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
l := j*16
IF k[j]
dst[i+31:i] := SignExtend32(a[l+15:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := SignExtend32(a[l+15:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 16*j
dst[i+63:i] := SignExtend64(a[k+15:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+15:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := SignExtend64(a[l+15:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 8*j
dst[k+7:k] := SaturateU8(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
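Unsigned saturation reads each source lane as unsigned and clamps only at the top of the destination range, since an unsigned value cannot fall below 0. A scalar sketch in C; `saturate_u8` and the lane loop are hypothetical helpers mirroring the pseudocode name `SaturateU8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SaturateU8: clamp an unsigned 32-bit value to [0, 255]. */
static uint8_t saturate_u8(uint32_t x)
{
    return x > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)x;
}

/* Hypothetical lane loop: 16 unsigned 32-bit lanes -> 16 unsigned bytes. */
static void cvtus_u32x16_to_u8x16(const uint32_t a[16], uint8_t dst[16])
{
    assert(a != NULL && dst != NULL);
    for (size_t j = 0; j < 16; j++)
        dst[j] = saturate_u8(a[j]);
}
```

A lane holding 200 passes through unchanged here, whereas the signed-saturation entries would clamp the same value to 127. A lane holding 0x80000000 clamps to 255.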
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+31:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Store
Convert packed unsigned 32-bit integers in "a" to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i])
FI
ENDFOR
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+31:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 16*j
dst[k+15:k] := SaturateU16(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+31:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Store
Convert packed unsigned 32-bit integers in "a" to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i])
FI
ENDFOR
AVX512F
Convert
Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+31:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 8*j
dst[k+7:k] := SaturateU8(a[i+63:i])
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+63:i])
ELSE
dst[l+7:l] := src[l+7:l]
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[l+7:l] := SaturateU8(a[i+63:i])
ELSE
dst[l+7:l] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 32*j
dst[k+31:k] := SaturateU32(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := SaturateU32(a[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[l+31:l] := SaturateU32(a[i+63:i])
ELSE
dst[l+31:l] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 16*j
dst[k+15:k] := SaturateU16(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+63:i])
ELSE
dst[l+15:l] := src[l+15:l]
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Store
Convert packed unsigned 64-bit integers in "a" to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i])
FI
ENDFOR
AVX512F
Convert
Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[l+15:l] := SaturateU16(a[i+63:i])
ELSE
dst[l+15:l] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512F
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 8*j
dst[i+31:i] := ZeroExtend32(a[k+7:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+7:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 8*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+7:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 8*j
dst[i+63:i] := ZeroExtend64(a[k+7:k])
ENDFOR
dst[MAX:512] := 0
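Because each result lane is 64 bits wide, only eight source bytes are consumed and the rest of the source is ignored. A scalar sketch in C, assuming a 16-byte source (an assumption of this sketch; the entry itself only names the low 8 bytes); the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: a holds the 16 bytes of the source;
   only bytes 0..7 are read. */
static void cvt_u8_to_u64x8(const uint8_t a[16], uint64_t dst[8])
{
    assert(a != NULL && dst != NULL);
    for (size_t j = 0; j < 8; j++)
        dst[j] = (uint64_t)a[j]; /* ZeroExtend64: upper 56 bits become 0 */
}
```

The byte 0x80 widens to 0x0000000000000080 (128), in contrast to sign extension, which would fill the upper bits with ones.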
AVX512F
Convert
Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+7:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 8*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+7:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 32*j
dst[i+63:i] := ZeroExtend64(a[k+31:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+31:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 32*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+31:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 15
i := 32*j
k := 16*j
dst[i+31:i] := ZeroExtend32(a[k+15:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+15:l])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := 32*j
l := 16*j
IF k[j]
dst[i+31:i] := ZeroExtend32(a[l+15:l])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := 64*j
k := 16*j
dst[i+63:i] := ZeroExtend64(a[k+15:k])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+15:l])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := 64*j
l := 16*j
IF k[j]
dst[i+63:i] := ZeroExtend64(a[l+15:l])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
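The operation blocks use bit-slice notation such as `a[k+15:k]` and `dst[i+63:i]`. One way to read that notation is as bit-field extraction and insertion on one wide integer. The sketch below follows the unmasked 16-bit to 64-bit zero extension literally, using Python integers as arbitrary-width registers. The helpers `get_bits` and `set_bits` are hypothetical names introduced for this example.

```python
def get_bits(value, hi, lo):
    """Return value[hi:lo] (inclusive bounds) as a non-negative integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def set_bits(value, hi, lo, field):
    """Return value with bits hi..lo replaced by the low bits of field."""
    mask = ((1 << (hi - lo + 1)) - 1) << lo
    return (value & ~mask) | ((field << lo) & mask)


def zero_extend_16_to_64(a):
    """Literal transcription of the pseudocode loop: 8 lanes, 16 -> 64 bits."""
    dst = 0
    for j in range(8):
        i = 64 * j
        k = 16 * j
        dst = set_bits(dst, i + 63, i, get_bits(a, k + 15, k))
    return dst
```

Here `a` is a 128-bit integer holding eight 16-bit elements and the return value is a 512-bit integer, so lane `j` of the result can be read back with `get_bits(dst, 64*j + 63, 64*j)`.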
Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".
dst[31:0] := a[31:0]
AVX512F
Convert
Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".
dst[63:0] := a[63:0]
AVX512F
Convert
Copy the lower 32-bit integer in "a" to "dst".
dst[31:0] := a[31:0]
AVX512F
Convert
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note][max_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note][max_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [sae_note][max_float_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
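The `[max_float_note]` placeholder is not expanded in this listing, so the exact handling of NaN and signed zeros is not stated here. As a hedged sketch, the Python model below assumes the commonly documented behavior of x86 floating-point maximum instructions: the first operand is returned only when `a > b` is true, and the second operand is returned in every other case, including when either input is NaN or both are zeros. The function names are hypothetical.

```python
def x86_style_max(a, b):
    """Assumed semantics: return a only if a > b, otherwise return b.
    Comparisons with NaN are false, so b is returned; +0.0 and -0.0
    compare equal, so b is returned as well."""
    return a if a > b else b


def packed_max(a, b):
    """Apply the lane-wise maximum to two equal-length lists of floats."""
    return [x86_style_max(x, y) for x, y in zip(a, b)]
```

Under this assumption the operation is not commutative: `packed_max([nan], [1.0])` gives 1.0, while `packed_max([1.0], [nan])` gives NaN. That asymmetry is the main reason the result can differ from a library `fmax`.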
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note][max_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note][max_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [sae_note][max_float_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note][max_float_note]
IF k[0]
dst[63:0] := MAX(a[63:0], b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := MAX(a[63:0], b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note][max_float_note]
IF k[0]
dst[63:0] := MAX(a[63:0], b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := MAX(a[63:0], b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [sae_note][max_float_note]
dst[63:0] := MAX(a[63:0], b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
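The scalar forms above operate only on the lower element and pass the upper element of "a" through unchanged. A minimal Python sketch of that merge follows. The name `masked_scalar_max` and the `src=None` convention for zeromask behavior are assumptions for illustration, and the built-in `max` is used on the assumption that inputs are ordinary finite values (NaN and signed-zero handling is out of scope here).

```python
def masked_scalar_max(a, b, k=1, src=None):
    """Hypothetical model of the 2-lane scalar maximum.

    a, b : [lower, upper] pairs of floats
    k    : mask; only bit 0 is consulted
    src  : fallback pair for writemask behavior, or None for zeromask
    """
    if k & 1:
        low = max(a[0], b[0])
    elif src is not None:
        low = src[0]      # writemask: lower element copied from src
    else:
        low = 0.0         # zeromask: lower element cleared
    # The upper element always comes from a, regardless of the mask.
    return [low, a[1]]
```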
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]
IF k[0]
dst[31:0] := MAX(a[31:0], b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := MAX(a[31:0], b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]
IF k[0]
dst[31:0] := MAX(a[31:0], b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := MAX(a[31:0], b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]
dst[31:0] := MAX(a[31:0], b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note][min_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note][min_float_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [sae_note][min_float_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note][min_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note][min_float_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [sae_note][min_float_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note][min_float_note]
IF k[0]
dst[63:0] := MIN(a[63:0], b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := MIN(a[63:0], b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note][min_float_note]
IF k[0]
dst[63:0] := MIN(a[63:0], b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := MIN(a[63:0], b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [sae_note][min_float_note]
dst[63:0] := MIN(a[63:0], b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]
IF k[0]
dst[31:0] := MIN(a[31:0], b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := MIN(a[31:0], b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]
IF k[0]
dst[31:0] := MIN(a[31:0], b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := MIN(a[31:0], b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]
dst[31:0] := MIN(a[31:0], b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Special Math Functions
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ABS(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ABS(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ABS(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ABS(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ABS(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ABS(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
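The phrase "store the unsigned results" matters for one input: the most negative value has no positive counterpart of the same width. The Python sketch below models the 32-bit case on plain lists. The helper names are hypothetical, and the wraparound at 32 bits is modeled with an explicit mask.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def abs_lanes_32(a):
    """Lane-wise absolute value, returned as unsigned 32-bit patterns."""
    return [abs(to_signed32(x)) & 0xFFFFFFFF for x in a]
```

For the input -2147483648 the result pattern is 0x80000000, which is 2147483648 when read as unsigned but is still -2147483648 if the lane is later reinterpreted as signed.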
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
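The signed and unsigned maximum entries share the same pseudocode, so the difference lies entirely in how each lane's bits are interpreted before comparing. A small Python sketch makes this concrete. Lanes are given as raw 64-bit patterns, and the function names are assumptions for this example.

```python
def to_signed(x, bits=64):
    """Interpret a raw bit pattern as a two's-complement signed integer."""
    return x - (1 << bits) if x >> (bits - 1) else x


def max_signed(a, b, bits=64):
    """Lane-wise maximum comparing lanes as signed; returns raw patterns."""
    return [x if to_signed(x, bits) > to_signed(y, bits) else y
            for x, y in zip(a, b)]


def max_unsigned(a, b):
    """Lane-wise maximum comparing lanes as unsigned; returns raw patterns."""
    return [max(x, y) for x, y in zip(a, b)]
```

With `a = [0xFFFFFFFFFFFFFFFF, 1]` and `b = [0, 2]`, the signed form selects 0 in the first lane (the all-ones pattern is -1), while the unsigned form selects the all-ones pattern.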
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[63:0] := a[63:0]
tmp[127:64] := a[63:0]
tmp[191:128] := a[191:128]
tmp[255:192] := a[191:128]
tmp[319:256] := a[319:256]
tmp[383:320] := a[319:256]
tmp[447:384] := a[447:384]
tmp[511:448] := a[447:384]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[63:0] := a[63:0]
tmp[127:64] := a[63:0]
tmp[191:128] := a[191:128]
tmp[255:192] := a[191:128]
tmp[319:256] := a[319:256]
tmp[383:320] := a[319:256]
tmp[447:384] := a[447:384]
tmp[511:448] := a[447:384]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := tmp[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst".
dst[63:0] := a[63:0]
dst[127:64] := a[63:0]
dst[191:128] := a[191:128]
dst[255:192] := a[191:128]
dst[319:256] := a[319:256]
dst[383:320] := a[319:256]
dst[447:384] := a[447:384]
dst[511:448] := a[447:384]
dst[MAX:512] := 0
AVX512F
Move
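The explicit `tmp` assignments in the three entries above follow a simple pattern: output lane `j` takes input lane `j` rounded down to the nearest even index. A minimal Python sketch follows, with hypothetical function names and the assumption that `src=None` stands for zeromask behavior.

```python
def duplicate_even(a):
    """Each output lane j takes the value of the even lane at or below j."""
    return [a[j - (j % 2)] for j in range(len(a))]


def masked_duplicate_even(a, k, src=None):
    """Apply the duplication, then merge with src (writemask) or 0.0 (zeromask)."""
    tmp = duplicate_even(a)
    dst = []
    for j in range(len(a)):
        if (k >> j) & 1:
            dst.append(tmp[j])
        elif src is not None:
            dst.append(src[j])
        else:
            dst.append(0.0)
    return dst
```

Note that the mask is applied to the output lanes after duplication, so a set bit at an odd position `j` still yields the even-indexed input element `a[j-1]`.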
Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
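A zeromasked move is effectively a per-lane select between the source element and zero. The short Python sketch below models it on a list of lanes. The name `maskz_move` is an assumption for this illustration.

```python
def maskz_move(a, k):
    """Keep lane j of a when bit j of k is set; otherwise produce 0."""
    return [x if (k >> j) & 1 else 0 for j, x in enumerate(a)]
```

For example, `maskz_move([10, 20, 30, 40, 50, 60, 70, 80], 0b10000001)` keeps only the first and last lanes and clears the rest.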
Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := b[63:0]
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Move
Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := b[63:0]
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Move
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[31:0] := a[63:32]
tmp[63:32] := a[63:32]
tmp[95:64] := a[127:96]
tmp[127:96] := a[127:96]
tmp[159:128] := a[191:160]
tmp[191:160] := a[191:160]
tmp[223:192] := a[255:224]
tmp[255:224] := a[255:224]
tmp[287:256] := a[319:288]
tmp[319:288] := a[319:288]
tmp[351:320] := a[383:352]
tmp[383:352] := a[383:352]
tmp[415:384] := a[447:416]
tmp[447:416] := a[447:416]
tmp[479:448] := a[511:480]
tmp[511:480] := a[511:480]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[31:0] := a[63:32]
tmp[63:32] := a[63:32]
tmp[95:64] := a[127:96]
tmp[127:96] := a[127:96]
tmp[159:128] := a[191:160]
tmp[191:160] := a[191:160]
tmp[223:192] := a[255:224]
tmp[255:224] := a[255:224]
tmp[287:256] := a[319:288]
tmp[319:288] := a[319:288]
tmp[351:320] := a[383:352]
tmp[383:352] := a[383:352]
tmp[415:384] := a[447:416]
tmp[447:416] := a[447:416]
tmp[479:448] := a[511:480]
tmp[511:480] := a[511:480]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
dst[31:0] := a[63:32]
dst[63:32] := a[63:32]
dst[95:64] := a[127:96]
dst[127:96] := a[127:96]
dst[159:128] := a[191:160]
dst[191:160] := a[191:160]
dst[223:192] := a[255:224]
dst[255:224] := a[255:224]
dst[287:256] := a[319:288]
dst[319:288] := a[319:288]
dst[351:320] := a[383:352]
dst[383:352] := a[383:352]
dst[415:384] := a[447:416]
dst[447:416] := a[447:416]
dst[479:448] := a[511:480]
dst[511:480] := a[511:480]
dst[MAX:512] := 0
AVX512F
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
tmp[31:0] := a[31:0]
tmp[63:32] := a[31:0]
tmp[95:64] := a[95:64]
tmp[127:96] := a[95:64]
tmp[159:128] := a[159:128]
tmp[191:160] := a[159:128]
tmp[223:192] := a[223:192]
tmp[255:224] := a[223:192]
tmp[287:256] := a[287:256]
tmp[319:288] := a[287:256]
tmp[351:320] := a[351:320]
tmp[383:352] := a[351:320]
tmp[415:384] := a[415:384]
tmp[447:416] := a[415:384]
tmp[479:448] := a[479:448]
tmp[511:480] := a[479:448]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
tmp[31:0] := a[31:0]
tmp[63:32] := a[31:0]
tmp[95:64] := a[95:64]
tmp[127:96] := a[95:64]
tmp[159:128] := a[159:128]
tmp[191:160] := a[159:128]
tmp[223:192] := a[223:192]
tmp[255:224] := a[223:192]
tmp[287:256] := a[287:256]
tmp[319:288] := a[287:256]
tmp[351:320] := a[351:320]
tmp[383:352] := a[351:320]
tmp[415:384] := a[415:384]
tmp[447:416] := a[415:384]
tmp[479:448] := a[479:448]
tmp[511:480] := a[479:448]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
dst[31:0] := a[31:0]
dst[63:32] := a[31:0]
dst[95:64] := a[95:64]
dst[127:96] := a[95:64]
dst[159:128] := a[159:128]
dst[191:160] := a[159:128]
dst[223:192] := a[223:192]
dst[255:224] := a[223:192]
dst[287:256] := a[287:256]
dst[319:288] := a[287:256]
dst[351:320] := a[351:320]
dst[383:352] := a[351:320]
dst[415:384] := a[415:384]
dst[447:416] := a[415:384]
dst[479:448] := a[479:448]
dst[511:480] := a[479:448]
dst[MAX:512] := 0
AVX512F
Move
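The sixteen explicit assignments in the odd- and even-indexed duplication entries above reduce to a single index expression per lane. A sketch in C under that reading, with hypothetical function names and plain arrays standing in for the 512-bit vectors:

```c
#include <stdint.h>

enum { LANES = 16 };  /* sixteen 32-bit lanes in a 512-bit vector */

/* Odd-indexed duplication: lanes 2n and 2n+1 both receive a[2n+1]. */
void dup_odd_lanes(const float a[LANES], float dst[LANES]) {
    for (int j = 0; j < LANES; j++)
        dst[j] = a[j | 1];      /* force the low index bit to 1 */
}

/* Even-indexed duplication: lanes 2n and 2n+1 both receive a[2n]. */
void dup_even_lanes(const float a[LANES], float dst[LANES]) {
    for (int j = 0; j < LANES; j++)
        dst[j] = a[j & ~1];     /* clear the low index bit */
}
```

The masked variants would apply the usual per-lane writemask or zeromask selection to the result of either loop.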
Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := b[31:0]
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Move
Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := b[31:0]
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Move
Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] AND b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
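Operand order matters for the NOT-then-AND operation above: only "a" is inverted. A minimal scalar sketch in C of the zeromasked 32-bit form, with a hypothetical function name and arrays standing in for the vectors:

```c
#include <stdint.h>

/* Lane j is (NOT a[j]) AND b[j] when bit j of k is set, otherwise 0. */
void andnot32_zeromask(uint16_t k, const uint32_t a[16],
                       const uint32_t b[16], uint32_t dst[16]) {
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1u) ? (~a[j] & b[j]) : 0u;
}
```

Swapping the two inputs gives a different result in general, so a call that is meant to clear the bits of a mask from a value should pass the mask as "a" and the value as "b".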
Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] AND b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "a" when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 15
i := j*32
IF k[j]
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 15
i := j*32
IF k[j]
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 15
i := j*32
FOR h := 0 to 31
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
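The elided CASE table in the ternary-logic entries above can be read as a truth-table lookup: the three input bits form a 3-bit index into "imm8". The sketch below assumes "a" supplies the most significant index bit, which agrees with rows 0, 1, 254, and 255 of the table; the function names are hypothetical.

```c
#include <stdint.h>

/* One bit position: index = a*4 + b*2 + c selects a bit of imm8. */
unsigned ternlog_bit(uint8_t imm8, unsigned a, unsigned b, unsigned c) {
    unsigned index = ((a & 1u) << 2) | ((b & 1u) << 1) | (c & 1u);
    return (imm8 >> index) & 1u;
}

/* One 32-bit lane: apply the lookup independently at each of the 32 bit positions. */
uint32_t ternlog32(uint8_t imm8, uint32_t a, uint32_t b, uint32_t c) {
    uint32_t dst = 0;
    for (int h = 0; h < 32; h++)
        dst |= (uint32_t)ternlog_bit(imm8, a >> h, b >> h, c >> h) << h;
    return dst;
}
```

Under this model `ternlog32(0x96, a, b, c)` equals `a ^ b ^ c`, since bit `index` of 0x96 is set exactly when `index` has an odd number of set bits.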
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "a" when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 7
i := j*64
IF k[j]
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set).
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 7
i := j*64
IF k[j]
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "a", "b", and "c" are used according to "imm8", and the result is written to the corresponding bit in "dst".
DEFINE TernaryOP(imm8, a, b, c) {
CASE imm8[7:0] OF
0: dst[0] := 0 // imm8[7:0] := 0
1: dst[0] := NOT (a OR b OR c) // imm8[7:0] := NOT (_MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C)
// ...
254: dst[0] := a OR b OR c // imm8[7:0] := _MM_TERNLOG_A OR _MM_TERNLOG_B OR _MM_TERNLOG_C
255: dst[0] := 1 // imm8[7:0] := 1
ESAC
}
imm8[7:0] = LogicExp(_MM_TERNLOG_A, _MM_TERNLOG_B, _MM_TERNLOG_C)
FOR j := 0 to 7
i := j*64
FOR h := 0 to 63
dst[i+h] := TernaryOP(imm8[7:0], a[i+h], b[i+h], c[i+h])
ENDFOR
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
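The "imm8" value for a given function can be derived by applying that function to three truth-table column constants, as the LogicExp line above suggests. The sketch below assumes the values 0xF0, 0xCC, and 0xAA for the columns named _MM_TERNLOG_A, _MM_TERNLOG_B, and _MM_TERNLOG_C (consistent with table rows 1 and 254); `TL_A`, `TL_B`, `TL_C`, and `ternlog64` are hypothetical local names. The lane is evaluated word-parallel by OR-ing the minterms selected by "imm8" rather than looping over bits.

```c
#include <stdint.h>

/* Assumed truth-table columns for the three inputs. */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* One 64-bit lane: OR together every minterm whose imm8 bit is set. */
uint64_t ternlog64(uint8_t imm8, uint64_t a, uint64_t b, uint64_t c) {
    uint64_t dst = 0;
    for (unsigned index = 0; index < 8; index++) {
        if (!((imm8 >> index) & 1u))
            continue;
        uint64_t ta = (index & 4u) ? a : ~a;   /* a or NOT a */
        uint64_t tb = (index & 2u) ? b : ~b;   /* b or NOT b */
        uint64_t tc = (index & 1u) ? c : ~c;   /* c or NOT c */
        dst |= ta & tb & tc;
    }
    return dst;
}
```

For example, the bitwise select "a ? b : c" gives `(TL_A & TL_B) | (~TL_A & TL_C)`, which is 0xCA after truncation to 8 bits, and the three-input majority function gives 0xE8.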
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 7
i := j*64
k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Logical
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Logical
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 15
i := j*32
k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero.
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero.
FOR j := 0 to 7
i := j*64
k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Logical
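The mask-producing tests above share one pattern: the pseudocode ANDs the two lanes and compares the result with zero, and the masked forms clear any result bit whose "k1" bit is clear. A scalar sketch in C for the 64-bit lanes, with hypothetical function names:

```c
#include <stdint.h>

/* Result bit j is set when k1[j] is set and (a[j] AND b[j]) is non-zero. */
uint8_t test64_mask(uint8_t k1, const uint64_t a[8], const uint64_t b[8]) {
    uint8_t k = 0;
    for (int j = 0; j < 8; j++)
        if (((k1 >> j) & 1u) && (a[j] & b[j]) != 0)
            k |= (uint8_t)(1u << j);
    return k;
}

/* Result bit j is set when k1[j] is set and (a[j] AND b[j]) is zero. */
uint8_t testn64_mask(uint8_t k1, const uint64_t a[8], const uint64_t b[8]) {
    uint8_t k = 0;
    for (int j = 0; j < 8; j++)
        if (((k1 >> j) & 1u) && (a[j] & b[j]) == 0)
            k |= (uint8_t)(1u << j);
    return k;
}
```

Passing an all-ones `k1` reproduces the unmasked forms. For any inputs the two results are disjoint and their union equals `k1`.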
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Broadcast 8-bit integer "a" to all elements of "dst".
FOR j := 0 to 63
i := j*8
dst[i+7:i] := a[7:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[31:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast 32-bit integer "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[63:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast 64-bit integer "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast the low packed 16-bit integer from "a" to all elements of "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := a[15:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Set
Set packed 32-bit integers in "dst" with the repeated 4-element sequence.
dst[31:0] := a
dst[63:32] := b
dst[95:64] := c
dst[127:96] := d
dst[159:128] := a
dst[191:160] := b
dst[223:192] := c
dst[255:224] := d
dst[287:256] := a
dst[319:288] := b
dst[351:320] := c
dst[383:352] := d
dst[415:384] := a
dst[447:416] := b
dst[479:448] := c
dst[511:480] := d
dst[MAX:512] := 0
AVX512F
Set
Set packed 64-bit integers in "dst" with the repeated 4-element sequence.
dst[63:0] := a
dst[127:64] := b
dst[191:128] := c
dst[255:192] := d
dst[319:256] := a
dst[383:320] := b
dst[447:384] := c
dst[511:448] := d
dst[MAX:512] := 0
AVX512F
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the repeated 4-element sequence.
dst[63:0] := a
dst[127:64] := b
dst[191:128] := c
dst[255:192] := d
dst[319:256] := a
dst[383:320] := b
dst[447:384] := c
dst[511:448] := d
dst[MAX:512] := 0
AVX512F
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the repeated 4-element sequence.
dst[31:0] := a
dst[63:32] := b
dst[95:64] := c
dst[127:96] := d
dst[159:128] := a
dst[191:160] := b
dst[223:192] := c
dst[255:224] := d
dst[287:256] := a
dst[319:288] := b
dst[351:320] := c
dst[383:352] := d
dst[415:384] := a
dst[447:416] := b
dst[479:448] := c
dst[511:480] := d
dst[MAX:512] := 0
AVX512F
Set
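The repeated-sequence entries above place the four values in lanes 0 to 3 and then repeat that pattern across the register. A minimal sketch in C for the 32-bit integer case; the function name and its parameter order are hypothetical and follow the operand names in the pseudocode, not any particular intrinsic signature:

```c
#include <stdint.h>

/* Lane j receives seq[j % 4], where seq = {a, b, c, d} as named in the pseudocode. */
void fill_repeat4(int32_t a, int32_t b, int32_t c, int32_t d, int32_t dst[16]) {
    const int32_t seq[4] = { a, b, c, d };
    for (int j = 0; j < 16; j++)
        dst[j] = seq[j % 4];
}
```

The 64-bit forms follow the same rule over eight lanes instead of sixteen.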
Set packed 8-bit integers in "dst" with the supplied values.
dst[7:0] := e0
dst[15:8] := e1
dst[23:16] := e2
dst[31:24] := e3
dst[39:32] := e4
dst[47:40] := e5
dst[55:48] := e6
dst[63:56] := e7
dst[71:64] := e8
dst[79:72] := e9
dst[87:80] := e10
dst[95:88] := e11
dst[103:96] := e12
dst[111:104] := e13
dst[119:112] := e14
dst[127:120] := e15
dst[135:128] := e16
dst[143:136] := e17
dst[151:144] := e18
dst[159:152] := e19
dst[167:160] := e20
dst[175:168] := e21
dst[183:176] := e22
dst[191:184] := e23
dst[199:192] := e24
dst[207:200] := e25
dst[215:208] := e26
dst[223:216] := e27
dst[231:224] := e28
dst[239:232] := e29
dst[247:240] := e30
dst[255:248] := e31
dst[263:256] := e32
dst[271:264] := e33
dst[279:272] := e34
dst[287:280] := e35
dst[295:288] := e36
dst[303:296] := e37
dst[311:304] := e38
dst[319:312] := e39
dst[327:320] := e40
dst[335:328] := e41
dst[343:336] := e42
dst[351:344] := e43
dst[359:352] := e44
dst[367:360] := e45
dst[375:368] := e46
dst[383:376] := e47
dst[391:384] := e48
dst[399:392] := e49
dst[407:400] := e50
dst[415:408] := e51
dst[423:416] := e52
dst[431:424] := e53
dst[439:432] := e54
dst[447:440] := e55
dst[455:448] := e56
dst[463:456] := e57
dst[471:464] := e58
dst[479:472] := e59
dst[487:480] := e60
dst[495:488] := e61
dst[503:496] := e62
dst[511:504] := e63
dst[MAX:512] := 0
AVX512F
Set
Set packed 16-bit integers in "dst" with the supplied values.
dst[15:0] := e0
dst[31:16] := e1
dst[47:32] := e2
dst[63:48] := e3
dst[79:64] := e4
dst[95:80] := e5
dst[111:96] := e6
dst[127:112] := e7
dst[143:128] := e8
dst[159:144] := e9
dst[175:160] := e10
dst[191:176] := e11
dst[207:192] := e12
dst[223:208] := e13
dst[239:224] := e14
dst[255:240] := e15
dst[271:256] := e16
dst[287:272] := e17
dst[303:288] := e18
dst[319:304] := e19
dst[335:320] := e20
dst[351:336] := e21
dst[367:352] := e22
dst[383:368] := e23
dst[399:384] := e24
dst[415:400] := e25
dst[431:416] := e26
dst[447:432] := e27
dst[463:448] := e28
dst[479:464] := e29
dst[495:480] := e30
dst[511:496] := e31
dst[MAX:512] := 0
AVX512F
Set
Set packed 32-bit integers in "dst" with the supplied values.
dst[31:0] := e0
dst[63:32] := e1
dst[95:64] := e2
dst[127:96] := e3
dst[159:128] := e4
dst[191:160] := e5
dst[223:192] := e6
dst[255:224] := e7
dst[287:256] := e8
dst[319:288] := e9
dst[351:320] := e10
dst[383:352] := e11
dst[415:384] := e12
dst[447:416] := e13
dst[479:448] := e14
dst[511:480] := e15
dst[MAX:512] := 0
AVX512F
Set
Set packed 64-bit integers in "dst" with the supplied values.
dst[63:0] := e0
dst[127:64] := e1
dst[191:128] := e2
dst[255:192] := e3
dst[319:256] := e4
dst[383:320] := e5
dst[447:384] := e6
dst[511:448] := e7
dst[MAX:512] := 0
AVX512F
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.
dst[63:0] := e0
dst[127:64] := e1
dst[191:128] := e2
dst[255:192] := e3
dst[319:256] := e4
dst[383:320] := e5
dst[447:384] := e6
dst[511:448] := e7
dst[MAX:512] := 0
AVX512F
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.
dst[31:0] := e0
dst[63:32] := e1
dst[95:64] := e2
dst[127:96] := e3
dst[159:128] := e4
dst[191:160] := e5
dst[223:192] := e6
dst[255:224] := e7
dst[287:256] := e8
dst[319:288] := e9
dst[351:320] := e10
dst[383:352] := e11
dst[415:384] := e12
dst[447:416] := e13
dst[479:448] := e14
dst[511:480] := e15
dst[MAX:512] := 0
AVX512F
Set
Set packed 32-bit integers in "dst" with the repeated 4-element sequence in reverse order.
dst[31:0] := d
dst[63:32] := c
dst[95:64] := b
dst[127:96] := a
dst[159:128] := d
dst[191:160] := c
dst[223:192] := b
dst[255:224] := a
dst[287:256] := d
dst[319:288] := c
dst[351:320] := b
dst[383:352] := a
dst[415:384] := d
dst[447:416] := c
dst[479:448] := b
dst[511:480] := a
dst[MAX:512] := 0
AVX512F
Set
Set packed 64-bit integers in "dst" with the repeated 4-element sequence in reverse order.
dst[63:0] := d
dst[127:64] := c
dst[191:128] := b
dst[255:192] := a
dst[319:256] := d
dst[383:320] := c
dst[447:384] := b
dst[511:448] := a
dst[MAX:512] := 0
AVX512F
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the repeated 4-element sequence in reverse order.
dst[63:0] := d
dst[127:64] := c
dst[191:128] := b
dst[255:192] := a
dst[319:256] := d
dst[383:320] := c
dst[447:384] := b
dst[511:448] := a
dst[MAX:512] := 0
AVX512F
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the repeated 4-element sequence in reverse order.
dst[31:0] := d
dst[63:32] := c
dst[95:64] := b
dst[127:96] := a
dst[159:128] := d
dst[191:160] := c
dst[223:192] := b
dst[255:224] := a
dst[287:256] := d
dst[319:288] := c
dst[351:320] := b
dst[383:352] := a
dst[415:384] := d
dst[447:416] := c
dst[479:448] := b
dst[511:480] := a
dst[MAX:512] := 0
AVX512F
Set
Set packed 32-bit integers in "dst" with the supplied values in reverse order.
dst[31:0] := e15
dst[63:32] := e14
dst[95:64] := e13
dst[127:96] := e12
dst[159:128] := e11
dst[191:160] := e10
dst[223:192] := e9
dst[255:224] := e8
dst[287:256] := e7
dst[319:288] := e6
dst[351:320] := e5
dst[383:352] := e4
dst[415:384] := e3
dst[447:416] := e2
dst[479:448] := e1
dst[511:480] := e0
dst[MAX:512] := 0
AVX512F
Set
Set packed 64-bit integers in "dst" with the supplied values in reverse order.
dst[63:0] := e7
dst[127:64] := e6
dst[191:128] := e5
dst[255:192] := e4
dst[319:256] := e3
dst[383:320] := e2
dst[447:384] := e1
dst[511:448] := e0
dst[MAX:512] := 0
AVX512F
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst[63:0] := e7
dst[127:64] := e6
dst[191:128] := e5
dst[255:192] := e4
dst[319:256] := e3
dst[383:320] := e2
dst[447:384] := e1
dst[511:448] := e0
dst[MAX:512] := 0
AVX512F
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst[31:0] := e15
dst[63:32] := e14
dst[95:64] := e13
dst[127:96] := e12
dst[159:128] := e11
dst[191:160] := e10
dst[223:192] := e9
dst[255:224] := e8
dst[287:256] := e7
dst[319:288] := e6
dst[351:320] := e5
dst[383:352] := e4
dst[415:384] := e3
dst[447:416] := e2
dst[479:448] := e1
dst[511:480] := e0
dst[MAX:512] := 0
AVX512F
Set
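The "supplied values" and "reverse order" entries above differ only in how the operand names e0 to e15 map to lanes. A sketch in C for the sixteen-lane case, assuming a hypothetical array `e` indexed by operand name (so `e[3]` holds the value called e3):

```c
#include <stdint.h>

/* "Supplied values" form: lane j receives e_j. */
void set_lanes(const int32_t e[16], int32_t dst[16]) {
    for (int j = 0; j < 16; j++)
        dst[j] = e[j];
}

/* "Reverse order" form: lane j receives e_(15-j), so e15 lands in lane 0. */
void set_lanes_reversed(const int32_t e[16], int32_t dst[16]) {
    for (int j = 0; j < 16; j++)
        dst[j] = e[15 - j];
}
```

The eight-lane 64-bit entries follow the same mapping with 7 in place of 15.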
Return vector of type __m512 with all elements set to zero.
dst[MAX:0] := 0
AVX512F
Set
Return vector of type __m512i with all elements set to zero.
dst[MAX:0] := 0
AVX512F
Set
Return vector of type __m512d with all elements set to zero.
dst[MAX:0] := 0
AVX512F
Set
Return vector of type __m512 with all elements set to zero.
dst[MAX:0] := 0
AVX512F
Set
Return vector of type __m512i with all elements set to zero.
dst[MAX:0] := 0
AVX512F
Set
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
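A direct C transcription of LEFT_ROTATE_DWORDS needs care when the reduced count is 0, because the pseudocode's right shift by (32 - count) would then be a shift by 32, which is undefined for a 32-bit operand in C. A minimal sketch with a hypothetical function name that masks the complementary shift amount:

```c
#include <stdint.h>

/* Rotate a 32-bit value left by count_src modulo 32. */
uint32_t rotl32(uint32_t x, unsigned count_src) {
    unsigned count = count_src % 32u;
    /* (32 - count) & 31 maps count == 0 to a right shift of 0 instead of 32. */
    return (x << count) | (x >> ((32u - count) & 31u));
}
```

For example, `rotl32(0x12345678u, 8)` yields `0x34567812`, and any count that is a multiple of 32 returns the input unchanged.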
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst".
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE LEFT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src << count) OR (src >> (32 - count))
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE LEFT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src << count) OR (src >> (64 - count))
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
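In the variable-count rotates above, each lane of "b" supplies the rotate amount for the matching lane of "a", reduced modulo the lane width. A scalar sketch in C of the 64-bit writemask form, with hypothetical function names and arrays standing in for the vectors:

```c
#include <stdint.h>

/* Rotate a 64-bit value left by count_src modulo 64. */
uint64_t rotl64(uint64_t x, uint64_t count_src) {
    unsigned count = (unsigned)(count_src % 64u);
    /* Masking keeps the complementary shift in range when count == 0. */
    return (x << count) | (x >> ((64u - count) & 63u));
}

/* Lane j is a[j] rotated left by b[j] when bit j of k is set, otherwise src[j]. */
void rotl64_var_writemask(const uint64_t src[8], uint8_t k,
                          const uint64_t a[8], const uint64_t b[8],
                          uint64_t dst[8]) {
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? rotl64(a[j], b[j]) : src[j];
}
```

Replacing `src[j]` with 0 gives the zeromask form, and an all-ones `k` gives the unmasked form.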
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
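The immediate-count right rotates above apply the same "imm8" to every lane. A scalar sketch in C of the 32-bit zeromask form, with hypothetical function names; the complementary shift is masked so that a reduced count of 0 does not become a shift by 32:

```c
#include <stdint.h>

/* Rotate a 32-bit value right by count_src modulo 32. */
uint32_t rotr32(uint32_t x, unsigned count_src) {
    unsigned count = count_src % 32u;
    return (x >> count) | (x << ((32u - count) & 31u));
}

/* Lane j is a[j] rotated right by imm8 when bit j of k is set, otherwise 0. */
void rotr32_imm_zeromask(uint16_t k, const uint32_t a[16],
                         uint8_t imm8, uint32_t dst[16]) {
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1u) ? rotr32(a[j], imm8) : 0u;
}
```

For example, `rotr32(0x12345678u, 8)` yields `0x78123456`.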
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst".
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE RIGHT_ROTATE_DWORDS(src, count_src) {
count := count_src % 32
RETURN (src >> count) OR (src << (32 - count))
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst".
DEFINE RIGHT_ROTATE_QWORDS(src, count_src) {
count := count_src % 64
RETURN (src >> count) OR (src << (64 - count))
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
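The per-element form above takes each rotate count from the matching lane of "b". A minimal Python sketch of that behaviour, assuming 8 lanes held in ordinary lists and a hypothetical function name, might look like this:

```python
MASK64 = (1 << 64) - 1

def variable_rotate_right_64(a, b):
    """Hypothetical model: rotate each 64-bit lane of a right by the count in the same lane of b."""
    out = []
    for value, count_src in zip(a, b):
        count = (count_src & MASK64) % 64   # counts wrap modulo 64
        value &= MASK64
        out.append(((value >> count) | (value << (64 - count))) & MASK64)
    return out
```

Lanes are independent, so a count of 64 or 128 in one lane leaves that lane unchanged while its neighbours rotate by their own amounts.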
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
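Unlike the rotates, the shift above does not wrap its count: any count above 31 clears the element. The sketch below models that rule in Python; the function name and the use of a plain integer for the low 64 bits of "count" are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def masked_shift_left_32(src, k, a, count):
    """Hypothetical model: one shared count for all 16 lanes, writemask k."""
    out = []
    for j in range(16):
        if (k >> j) & 1:
            # Counts above 31 produce zero rather than wrapping around.
            out.append(0 if count > 31 else (a[j] << count) & MASK32)
        else:
            out.append(src[j] & MASK32)
    return out
```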
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
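For the per-lane variant with a zeromask, each lane reads its own count and masked-off lanes become zero. A small Python sketch under the same list-based assumptions (the function name is hypothetical):

```python
MASK32 = 0xFFFFFFFF

def zeromasked_variable_shift_left_32(k, a, count):
    """Hypothetical model: lane j shifts by count[j]; lanes with a clear mask bit are zeroed."""
    out = []
    for j in range(16):
        if (k >> j) & 1:
            c = count[j] & MASK32            # the count is an unsigned 32-bit lane
            out.append((a[j] << c) & MASK32 if c < 32 else 0)
        else:
            out.append(0)
    return out
```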
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
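The arithmetic shift above fills vacated bits with copies of the sign bit, and an oversized count yields all ones or all zeros depending on that bit. The following Python sketch models this on 32-bit patterns; both function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def arithmetic_shift_right_32(value, count):
    """Shift a 32-bit pattern right, replicating the sign bit."""
    value &= MASK32
    # Reinterpret the 32-bit pattern as a signed integer.
    signed = value - (1 << 32) if value & 0x80000000 else value
    if count > 31:
        return MASK32 if signed < 0 else 0
    # Python's >> on a negative int is already an arithmetic shift.
    return (signed >> count) & MASK32

def masked_arithmetic_shift_right_32(src, k, a, count):
    """Hypothetical model: 16 lanes, shared count, writemask k."""
    return [
        arithmetic_shift_right_32(a[j], count) if (k >> j) & 1 else src[j] & MASK32
        for j in range(16)
    ]
```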
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF count[63:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0)
ELSE
dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0)
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
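One consequence of shifting in sign bits is that the result rounds toward negative infinity, not toward zero, when the element is read as a signed integer. The Python sketch below models the per-lane 64-bit form to show this; the helper names are assumptions for the example.

```python
MASK64 = (1 << 64) - 1

def to_signed64(bits):
    """Reinterpret a 64-bit pattern as a signed integer."""
    bits &= MASK64
    return bits - (1 << 64) if bits >> 63 else bits

def variable_arithmetic_shift_right_64(a, count):
    """Hypothetical model: lane j of a is shifted right by count[j], filling with the sign bit."""
    out = []
    for value, c in zip(a, count):
        signed = to_signed64(value)
        if c < 64:
            out.append((signed >> c) & MASK64)
        else:
            out.append(MASK64 if signed < 0 else 0)
    return out
```

With every lane holding -7, shifting by 1, 2 and 3 gives -4, -2 and -1, matching floor(-7 / 2^count) and not truncating division.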
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
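For contrast with the sign-filling shifts, the logical right shift above always fills with zeros. A very small Python sketch, assuming list-based lanes and a hypothetical function name:

```python
MASK32 = 0xFFFFFFFF

def logical_shift_right_32(a, count):
    """Hypothetical model: shift every 32-bit lane right by one shared count, filling with zeros."""
    return [0 if count > 31 else (x & MASK32) >> count for x in a]
```

Shifting 0x80000000 right by 4 gives 0x08000000 here, where the arithmetic form would give 0xF8000000.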
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF count[i+63:i] < 64
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
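The 2^-14 error bound matters when the approximation feeds further arithmetic. One common follow-up, which is not part of this entry, is a Newton-Raphson step that roughly squares the relative error. The Python sketch below simulates a worst-case approximation by injecting a relative error of 2^-14 by hand; it does not reproduce the actual hardware approximation, and all names are illustrative.

```python
def refine_reciprocal(a, x0):
    """One Newton-Raphson step for 1/a: x1 = x0 * (2 - a * x0)."""
    return x0 * (2.0 - a * x0)

def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)

def simulated_approximation(a, error=2.0 ** -14):
    """Stand-in for an approximate reciprocal: the exact value scaled by (1 + error)."""
    return (1.0 / a) * (1.0 + error)
```

If x0 = (1 + e) / a, then x1 = (1 - e^2) / a, so an input error of 2^-14 becomes about 2^-28 after one step, before floating-point rounding is taken into account.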
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (1.0 / a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (1.0 / a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[63:0] := (1.0 / b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
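The scalar form above touches three inputs: the lower element comes from "b" or "src" depending on mask bit 0, and the upper element always comes from "a". A minimal Python sketch of that data flow, using two-element tuples for the vectors and an exact division as a stand-in for the approximation (both are assumptions of this example):

```python
def masked_scalar_reciprocal(src, k, a, b):
    """Hypothetical model: vectors are (lower, upper) tuples of floats."""
    # Only mask bit 0 is consulted; exact 1.0 / b stands in for the approximate result.
    lower = 1.0 / b[0] if k & 1 else src[0]
    return (lower, a[1])
```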
Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[63:0] := (1.0 / b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.
dst[63:0] := (1.0 / b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[31:0] := (1.0 / b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[31:0] := (1.0 / b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.
dst[31:0] := (1.0 / b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
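As with the reciprocal, a result bounded by 2^-14 relative error is often refined with a Newton-Raphson step; this is a general technique and not something this entry specifies. The Python sketch below injects a relative error of 2^-14 by hand to stand in for the approximation, and the function names are hypothetical.

```python
import math

def refine_rsqrt(a, y0):
    """One Newton-Raphson step for 1/sqrt(a): y1 = y0 * (1.5 - 0.5 * a * y0 * y0)."""
    return y0 * (1.5 - 0.5 * a * y0 * y0)

def simulated_rsqrt(a, error=2.0 ** -14):
    """Stand-in for an approximate reciprocal square root with a known relative error."""
    return (1.0 / math.sqrt(a)) * (1.0 + error)
```

Starting from y0 = (1 + e) / sqrt(a), one step leaves a relative error of about 1.5 * e^2, so roughly 2^-27 for e = 2^-14.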
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (1.0 / SQRT(a[i+63:i]))
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[63:0] := (1.0 / SQRT(b[63:0]))
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[63:0] := (1.0 / SQRT(b[63:0]))
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14.
dst[63:0] := (1.0 / SQRT(b[63:0]))
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[31:0] := (1.0 / SQRT(b[31:0]))
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.
IF k[0]
dst[31:0] := (1.0 / SQRT(b[31:0]))
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14.
dst[31:0] := (1.0 / SQRT(b[31:0]))
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
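The masked square root follows the same select-or-copy pattern as the integer entries. In the Python sketch below, which assumes 8 lanes in plain lists, non-negative inputs in the selected lanes, and a hypothetical function name, lanes with a clear mask bit are never passed to the square root at all:

```python
import math

def masked_sqrt_64(src, k, a):
    """Hypothetical model: 8 double-precision lanes; bit j of k selects sqrt(a[j]) or src[j]."""
    # The conditional skips masked-off lanes, so their a[j] values are never evaluated.
    return [math.sqrt(a[j]) if (k >> j) & 1 else src[j] for j in range(8)]
```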
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := SQRT(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SQRT(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := SQRT(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
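A zeromask differs from a writemask only in what fills the unselected lanes. The following sketch models the 16-lane single-precision case under some assumptions: `maskz_sqrt_ps` and `to_f32` are hypothetical helper names, vectors are plain Python lists, and single precision is approximated by rounding a double-precision result through the `struct` module.

```python
import math
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def maskz_sqrt_ps(k, a):
    """Scalar model of the zeromask square root over 16 single-precision lanes."""
    assert len(a) == 16
    # Lanes whose mask bit is clear are zeroed instead of copied from a source.
    return [to_f32(math.sqrt(a[j])) if (k >> j) & 1 else 0.0 for j in range(16)]

a = [2.0] * 16
result = maskz_sqrt_ps(0x0001, a)
```

Here only lane 0 holds a square root (rounded to single precision); the other 15 lanes are zero regardless of the input.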
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := SQRT(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SQRT(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := SQRT(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := SQRT(b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
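For the scalar forms, only mask bit 0 is consulted and the upper element always comes from "a". A minimal sketch of that behaviour, assuming a 128-bit vector is modelled as a two-element list and using the hypothetical name `mask_sqrt_sd`:

```python
import math

def mask_sqrt_sd(src, k, a, b):
    """Scalar model of the masked lower-element square root (2 double lanes)."""
    # Only bit 0 of the mask matters; higher bits are ignored.
    lower = math.sqrt(b[0]) if k & 1 else src[0]
    # The upper lane is copied from a in both cases.
    return [lower, a[1]]

src = [7.0, 70.0]
a = [1.0, 10.0]
b = [9.0, 90.0]
on = mask_sqrt_sd(src, 0b1, a, b)
off = mask_sqrt_sd(src, 0b0, a, b)
```

The zeromask form would differ only in substituting 0 for `src[0]` when bit 0 is clear.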
Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := SQRT(b[63:0])
ELSE
dst[63:0] := src[63:0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
[round_note]
IF k[0]
dst[63:0] := SQRT(b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst[63:0] := SQRT(b[63:0])
ELSE
dst[63:0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := SQRT(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := SQRT(b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := SQRT(b[31:0])
ELSE
dst[31:0] := src[31:0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst[31:0] := SQRT(b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst[31:0] := SQRT(b[31:0])
ELSE
dst[31:0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := SQRT(b[31:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512F
Elementary Math Functions
Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are undefined.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are undefined.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512d to type __m128d.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512 to type __m128.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512d to type __m256d.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m128 to type __m512; the upper 384 bits of the result are undefined.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m256 to type __m512; the upper 256 bits of the result are undefined.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512 to type __m256.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are undefined.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512i to type __m128i.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512i to type __m256i.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
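The widening casts come in two flavours: one leaves the upper bits undefined, the other zeroes them. The sketch below contrasts them for the 128-bit to 512-bit double-precision case. The function names and the `UNDEFINED` marker are assumptions made for illustration; real code must not read the undefined lanes at all.

```python
UNDEFINED = None  # hypothetical marker for lanes whose contents are not specified

def cast_pd128_to_pd512(a):
    """Widening cast: lower 2 lanes preserved, upper 6 lanes undefined."""
    assert len(a) == 2
    return list(a) + [UNDEFINED] * 6

def zext_pd128_to_pd512(a):
    """Zero-extending cast: lower 2 lanes preserved, upper 6 lanes zeroed."""
    assert len(a) == 2
    return list(a) + [0.0] * 6

def cast_pd512_to_pd128(a):
    """Narrowing cast: keep only the lower 2 lanes."""
    assert len(a) == 8
    return list(a[:2])

wide = cast_pd128_to_pd512([1.0, 2.0])
zext = zext_pd128_to_pd512([1.0, 2.0])
narrow = cast_pd512_to_pd128(zext)
```

A round trip through widen then narrow returns the original lanes with either flavour; only the zero-extending form makes the upper lanes safe to use.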
Cast vector of type __m128 to type __m512; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m256 to type __m512; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Return vector of type __m512 with undefined elements.
AVX512F
General Support
Return vector of type __m512i with undefined elements.
AVX512F
General Support
Return vector of type __m512d with undefined elements.
AVX512F
General Support
Return vector of type __m512 with undefined elements.
AVX512F
General Support
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
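One common use of a writemask is to process an array whose length is not a multiple of the vector width, by enabling only the low lanes in the final chunk. The sketch below models that idea with plain lists. `mask_add_pd` and `add_arrays` are hypothetical names, and the zero padding stands in for lanes that real code would simply not load; none of this is taken from the reference itself.

```python
def mask_add_pd(src, k, a, b):
    """Scalar model of the writemask add over 8 double-precision lanes."""
    return [a[j] + b[j] if (k >> j) & 1 else src[j] for j in range(8)]

def add_arrays(x, y):
    """Element-wise x + y in 8-lane chunks; the last chunk uses a partial mask."""
    assert len(x) == len(y)
    n = len(x)
    out = []
    for start in range(0, n, 8):
        live = min(8, n - start)          # number of valid lanes in this chunk
        k = (1 << live) - 1               # mask with the low `live` bits set
        pad = [0.0] * (8 - live)          # placeholder for lanes past the end
        a = x[start:start + live] + pad
        b = y[start:start + live] + pad
        dst = mask_add_pd([0.0] * 8, k, a, b)
        out.extend(dst[:live])
    return out

x = [float(i) for i in range(11)]
y = [10.0] * 11
total = add_arrays(x, y)
```

For 11 elements, the first chunk runs with mask `0xFF` and the second with mask `0b111`.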
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
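The phrase "intermediate result" matters here: in a fused multiply-add the product is not rounded before the addition, so the whole expression is rounded once. The sketch below illustrates the difference from a separate multiply and add, using exact rational arithmetic to emulate the single rounding. The helper names are assumptions for this example, and the model handles finite inputs only.

```python
from fractions import Fraction

def fmadd_lane(a, b, c):
    """Exact a*b + c, rounded once to double precision (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def fmadd_pd(a, b, c):
    """Scalar model of the fused multiply-add over 8 double-precision lanes."""
    return [fmadd_lane(a[j], b[j], c[j]) for j in range(8)]

eps = 2.0 ** -30
a = [1.0 + eps] * 8
b = [1.0 - eps] * 8
c = [-1.0] * 8

fused = fmadd_pd(a, b, c)
# Separate multiply then add: the product 1 - 2**-60 rounds to 1.0 first.
unfused = [a[j] * b[j] + c[j] for j in range(8)]
```

In this case the fused form yields `-(2**-60)` in every lane, while the two-step form loses the small term and yields `0.0`.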
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
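The two writemask forms of the multiply-add differ only in which operand supplies the unselected lanes: "c" in one form, "a" in the other. A small sketch of that distinction, where the function name and the `passthrough` parameter are hypothetical and the arithmetic is an unfused approximation that is exact for the small integers used:

```python
def fmadd_masked(a, b, c, k, passthrough):
    """8-lane model; `passthrough` names the operand copied for clear mask bits."""
    keep = {"a": a, "c": c}[passthrough]
    return [a[j] * b[j] + c[j] if (k >> j) & 1 else keep[j] for j in range(8)]

a = [2.0] * 8
b = [3.0] * 8
c = [1.0] * 8
k = 0b00000101   # lanes 0 and 2 are computed

from_a = fmadd_masked(a, b, c, k, "a")
from_c = fmadd_masked(a, b, c, k, "c")
```

Lanes 0 and 2 hold `7.0` in both results; the remaining lanes hold `2.0` (from "a") or `1.0` (from "c").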
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
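The four multiply-add families in this section differ only in the sign applied to the product and to "c". The sketch below tabulates them for a single lane. The dictionary keys are labels chosen for this example, and the model is unfused, so it matches the fused result only when the product is exactly representable, as it is for the values used here.

```python
def fma_variants(a, b, c):
    """Per-lane results of the four sign combinations (unfused model)."""
    p = a * b
    return {
        "fmadd": p + c,     #  (a * b) + c
        "fmsub": p - c,     #  (a * b) - c
        "fnmadd": -p + c,   # -(a * b) + c
        "fnmsub": -p - c,   # -(a * b) - c
    }

r = fma_variants(2.0, 3.0, 1.0)
```

For these inputs, the negated forms are the sign-flipped counterparts of the plain ones: `fnmadd` equals `-fmsub` and `fnmsub` equals `-fmadd`.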
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := c[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := c[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst".
FOR j := 0 to 15
i := j*32
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
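The "low 32 bits of a 64-bit intermediate" rule above can be sketched in scalar C. The name "mullo_epi32_model" is hypothetical, and the lanes are modeled as unsigned values on the assumption that only the low half of the product is kept, which is the same bit pattern for signed and unsigned inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the pseudocode above (hypothetical helper, not an intrinsic). */
void mullo_epi32_model(uint32_t dst[16], const uint32_t a[16], const uint32_t b[16])
{
    assert(dst && a && b);
    for (int j = 0; j < 16; j++) {
        uint64_t tmp = (uint64_t)a[j] * (uint64_t)b[j];  /* 64-bit intermediate */
        dst[j] = (uint32_t)tmp;                          /* keep tmp[31:0] only */
    }
}
```

For example, 0x10000 * 0x10000 equals 2^32, so the stored low half is 0.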
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
[round_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
[round_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Reduce the packed 32-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[31:0] + src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_ADD(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := 0
FI
ENDFOR
dst[31:0] := REDUCE_ADD(tmp, 16)
AVX512F
Arithmetic
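The masked reduction above can be sketched as a scalar model: inactive lanes are replaced by the additive identity 0, then the upper half is folded onto the lower half until one element remains. The name "mask_reduce_add_epi32_model" is hypothetical, and the recursion of REDUCE_ADD is written as a loop, which is assumed to be equivalent.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the pseudocode above (hypothetical helper, not an intrinsic). */
uint32_t mask_reduce_add_epi32_model(uint16_t k, const uint32_t a[16])
{
    uint32_t tmp[16];
    assert(a);
    for (int j = 0; j < 16; j++)
        tmp[j] = ((k >> j) & 1) ? a[j] : 0;   /* 0 is the identity for addition */
    for (int len = 8; len >= 1; len /= 2)     /* fold the upper half onto the lower half */
        for (int j = 0; j < len; j++)
            tmp[j] += tmp[j + len];           /* wraps modulo 2^32 */
    return tmp[0];
}
```

Integer addition is associative even with wraparound, so the tree order does not affect the result here; for the floating-point variants the order can change rounding.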
Reduce the packed 64-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[63:0] + src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_ADD(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := 0
FI
ENDFOR
dst[63:0] := REDUCE_ADD(tmp, 8)
AVX512F
Arithmetic
Reduce the packed double-precision (64-bit) floating-point elements in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[63:0] + src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_ADD(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := 0
FI
ENDFOR
dst[63:0] := REDUCE_ADD(tmp, 8)
AVX512F
Arithmetic
Reduce the packed single-precision (32-bit) floating-point elements in "a" by addition using mask "k". Returns the sum of all active elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[31:0] + src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_ADD(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := 0
FI
ENDFOR
dst[31:0] := REDUCE_ADD(tmp, 16)
AVX512F
Arithmetic
Reduce the packed 32-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[31:0] * src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_MUL(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := 1
FI
ENDFOR
dst[31:0] := REDUCE_MUL(tmp, 16)
AVX512F
Arithmetic
Reduce the packed 64-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[63:0] * src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_MUL(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := 1
FI
ENDFOR
dst[63:0] := REDUCE_MUL(tmp, 8)
AVX512F
Arithmetic
Reduce the packed double-precision (64-bit) floating-point elements in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[63:0] * src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_MUL(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := FP64(1.0)
FI
ENDFOR
dst[63:0] := REDUCE_MUL(tmp, 8)
AVX512F
Arithmetic
Reduce the packed single-precision (32-bit) floating-point elements in "a" by multiplication using mask "k". Returns the product of all active elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[31:0] * src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_MUL(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := FP32(1.0)
FI
ENDFOR
dst[31:0] := REDUCE_MUL(tmp, 16)
AVX512F
Arithmetic
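The masked product above follows the same pattern as the masked sum, except that inactive lanes are replaced by the multiplicative identity 1.0. A minimal scalar sketch, assuming 16 single-precision lanes and a hypothetical name "mask_reduce_mul_ps_model":

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the pseudocode above (hypothetical helper, not an intrinsic). */
float mask_reduce_mul_ps_model(uint16_t k, const float a[16])
{
    float tmp[16];
    assert(a);
    for (int j = 0; j < 16; j++)
        tmp[j] = ((k >> j) & 1) ? a[j] : 1.0f;  /* 1.0 is the identity for multiplication */
    for (int len = 8; len >= 1; len /= 2)       /* pairwise tree, as in REDUCE_MUL */
        for (int j = 0; j < len; j++)
            tmp[j] *= tmp[j + len];
    return tmp[0];
}
```

An all-zero mask therefore yields 1.0, not 0.0.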
Reduce the packed 32-bit integers in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[31:0] + src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_ADD(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_ADD(a, 16)
AVX512F
Arithmetic
Reduce the packed 64-bit integers in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[63:0] + src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_ADD(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_ADD(a, 8)
AVX512F
Arithmetic
Reduce the packed double-precision (64-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[63:0] + src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] + src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_ADD(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_ADD(a, 8)
AVX512F
Arithmetic
Reduce the packed single-precision (32-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".
DEFINE REDUCE_ADD(src, len) {
IF len == 2
RETURN src[31:0] + src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] + src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_ADD(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_ADD(a, 16)
AVX512F
Arithmetic
Reduce the packed 32-bit integers in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[31:0] * src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_MUL(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MUL(a, 16)
AVX512F
Arithmetic
Reduce the packed 64-bit integers in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[63:0] * src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_MUL(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MUL(a, 8)
AVX512F
Arithmetic
Reduce the packed double-precision (64-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[63:0] * src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] * src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_MUL(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MUL(a, 8)
AVX512F
Arithmetic
Reduce the packed single-precision (32-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".
DEFINE REDUCE_MUL(src, len) {
IF len == 2
RETURN src[31:0] * src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] * src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_MUL(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MUL(a, 16)
AVX512F
Arithmetic
Finds the absolute value of each packed single-precision (32-bit) floating-point element in "v2", storing the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ABS(v2[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Finds the absolute value of each packed single-precision (32-bit) floating-point element in "v2", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ABS(v2[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Finds the absolute value of each packed double-precision (64-bit) floating-point element in "v2", storing the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ABS(v2[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Finds the absolute value of each packed double-precision (64-bit) floating-point element in "v2", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ABS(v2[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Arithmetic
Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst".
temp[1023:512] := a[511:0]
temp[511:0] := b[511:0]
temp[1023:0] := temp[1023:0] >> (32*imm8[3:0])
dst[511:0] := temp[511:0]
dst[MAX:512] := 0
AVX512F
Miscellaneous
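The concatenate-and-shift operation above can be sketched in scalar C as a 32-element window: "b" occupies the low 16 elements, "a" the high 16, and the result is the 16 elements starting at the shift offset. The name "alignr_epi32_model" is hypothetical; as in the pseudocode, only imm8[3:0] is used.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the pseudocode above (hypothetical helper, not an intrinsic). */
void alignr_epi32_model(uint32_t dst[16], const uint32_t a[16],
                        const uint32_t b[16], unsigned imm8)
{
    uint32_t temp[32];
    unsigned shift = imm8 & 0xF;        /* only the low 4 bits select the offset */
    assert(dst && a && b);
    for (int j = 0; j < 16; j++) {
        temp[j]      = b[j];            /* temp[511:0]    := b */
        temp[16 + j] = a[j];            /* temp[1023:512] := a */
    }
    for (int j = 0; j < 16; j++)
        dst[j] = temp[j + shift];       /* right shift by whole 32-bit elements */
}
```

With a shift of 3, the result holds b[3] through b[15] followed by a[0] through a[2].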
Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
temp[1023:512] := a[511:0]
temp[511:0] := b[511:0]
temp[1023:0] := temp[1023:0] >> (32*imm8[3:0])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := temp[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
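For normal, finite inputs, the exponent extraction above can be illustrated by reading the biased exponent field of an IEEE 754 double directly. This is a sketch only: the name "getexp_normal_model" is hypothetical, and zeros, denormals, infinities, and NaNs are deliberately not modeled because the pseudocode does not spell out how ConvertExpFP64 treats them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar sketch for one lane (hypothetical helper, normal finite inputs only). */
double getexp_normal_model(double x)
{
    uint64_t bits;
    int biased;
    memcpy(&bits, &x, sizeof bits);           /* reinterpret the double as raw bits */
    biased = (int)((bits >> 52) & 0x7FF);     /* 11-bit biased exponent field */
    assert(biased != 0 && biased != 0x7FF);   /* reject zero/denormal/inf/NaN */
    return (double)(biased - 1023);           /* remove the bias; the sign bit is ignored */
}
```

Because the sign bit is ignored, the result corresponds to floor(log2(|x|)): both 10.0 and -10.0 give 3.0.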
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
[sae_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
[sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
[sae_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
[sae_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
FOR j := 0 to 7
i := j*64
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv)
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
FOR j := 0 to 15
i := j*32
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign.
[getmant_note][sae_note]
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Miscellaneous
Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the "permutevar" name. This intrinsic is identical to "_mm512_mask_permutexvar_epi32", and it is recommended that you use that intrinsic name.
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
IF k[j]
dst[i+31:i] := a[id+31:id]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the "permutevar" name. This intrinsic is identical to "_mm512_permutexvar_epi32", and it is recommended that you use that intrinsic name.
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
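The cross-lane shuffle above is a plain table lookup: each destination element picks any of the 16 source elements using the low 4 bits of its index. A scalar sketch, with the hypothetical name "permutexvar_epi32_model" and the assumption that "dst" does not alias "a":

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the pseudocode above (hypothetical helper, not an intrinsic). */
void permutexvar_epi32_model(uint32_t dst[16], const uint32_t idx[16],
                             const uint32_t a[16])
{
    assert(dst && idx && a && dst != a);
    for (int j = 0; j < 16; j++)
        dst[j] = a[idx[j] & 0xF];   /* idx[i+3:i]: upper index bits are ignored */
}
```

An index vector of 15, 14, ..., 0 reverses the whole vector, which a within-lane shuffle cannot do.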
Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0])
tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2])
tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4])
tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6])
tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0])
tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2])
tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4])
tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6])
tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0])
tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2])
tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4])
tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6])
tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0])
tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2])
tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4])
tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6])
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := tmp_dst[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Swizzle
Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
dst[159:128] := SELECT4(a[255:128], imm8[1:0])
dst[191:160] := SELECT4(a[255:128], imm8[3:2])
dst[223:192] := SELECT4(a[255:128], imm8[5:4])
dst[255:224] := SELECT4(a[255:128], imm8[7:6])
dst[287:256] := SELECT4(a[383:256], imm8[1:0])
dst[319:288] := SELECT4(a[383:256], imm8[3:2])
dst[351:320] := SELECT4(a[383:256], imm8[5:4])
dst[383:352] := SELECT4(a[383:256], imm8[7:6])
dst[415:384] := SELECT4(a[511:384], imm8[1:0])
dst[447:416] := SELECT4(a[511:384], imm8[3:2])
dst[479:448] := SELECT4(a[511:384], imm8[5:4])
dst[511:480] := SELECT4(a[511:384], imm8[7:6])
dst[MAX:512] := 0
AVX512F
Swizzle
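The sixteen SELECT4 assignments above collapse into two nested loops: the same four 2-bit selectors from "imm8" are applied inside each of the four 128-bit lanes. A scalar sketch, with the hypothetical name "shuffle_epi32_model" and the assumption that "dst" does not alias "a":

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the pseudocode above (hypothetical helper, not an intrinsic). */
void shuffle_epi32_model(uint32_t dst[16], const uint32_t a[16], unsigned imm8)
{
    assert(dst && a && dst != a);
    for (int lane = 0; lane < 4; lane++) {          /* four 128-bit lanes */
        for (int e = 0; e < 4; e++) {               /* four elements per lane */
            unsigned sel = (imm8 >> (2 * e)) & 3;   /* imm8[2e+1:2e] */
            dst[4 * lane + e] = a[4 * lane + sel];  /* never crosses a lane boundary */
        }
    }
}
```

For instance, imm8 = 0x1B (selectors 3, 2, 1, 0) reverses the elements inside every lane while leaving the lane order unchanged.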
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] == b[i+63:i]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] <= b[i+63:i]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] < b[i+63:i]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] != b[i+63:i]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (!(a[i+63:i] <= b[i+63:i])) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (!(a[i+63:i] < b[i+63:i])) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k".
FOR j := 0 to 7
i := j*64
k[j] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
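In the two NaN entries above, `x != NaN` and `x == NaN` are shorthand for "is not a NaN" and "is a NaN". A literal IEEE 754 comparison against NaN would not work, because NaN compares unequal to everything, itself included. A small Python sketch of the ordered and unordered masks, using `math.isnan` for the test (function names are hypothetical):

```python
import math

def cmpord_mask(a, b):
    """Bit j is set when neither a[j] nor b[j] is NaN."""
    k = 0
    for j in range(len(a)):
        if not math.isnan(a[j]) and not math.isnan(b[j]):
            k |= 1 << j
    return k

def cmpunord_mask(a, b):
    """Bit j is set when a[j] or b[j] is NaN."""
    k = 0
    for j in range(len(a)):
        if math.isnan(a[j]) or math.isnan(b[j]):
            k |= 1 << j
    return k

a = [0.0, math.nan, 2.0, math.inf, math.nan, 5.0, 6.0, 7.0]
b = [0.0, 1.0, math.nan, 3.0, math.nan, 5.0, -math.inf, 7.0]
print(bin(cmpunord_mask(a, b)))  # lanes 1, 2, 4 set
```

For 8 lanes the two masks are complements of each other within the low 8 bits; infinities count as ordered.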
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
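The predicate table and the zeromask can be modeled together. In the table above, entries 16 to 31 repeat the names of entries 0 to 15 with the Q/S suffix swapped. The sketch below assumes that suffix only selects quiet versus signaling exception behavior, which is not modeled, so the boolean result depends on `imm8[3:0]` alone. All names are hypothetical:

```python
import math

# Relations for which each base predicate (imm8[3:0]) is true.
# "U" means unordered: at least one operand is NaN.
TRUE_WHEN = [
    {"EQ"},                   # 0  EQ (ordered)
    {"LT"},                   # 1  LT
    {"LT", "EQ"},             # 2  LE
    {"U"},                    # 3  UNORD
    {"LT", "GT", "U"},        # 4  NEQ (unordered)
    {"EQ", "GT", "U"},        # 5  NLT
    {"GT", "U"},              # 6  NLE
    {"LT", "EQ", "GT"},       # 7  ORD
    {"EQ", "U"},              # 8  EQ (unordered)
    {"LT", "U"},              # 9  NGE
    {"LT", "EQ", "U"},        # 10 NGT
    set(),                    # 11 FALSE
    {"LT", "GT"},             # 12 NEQ (ordered)
    {"EQ", "GT"},             # 13 GE
    {"GT"},                   # 14 GT
    {"LT", "EQ", "GT", "U"},  # 15 TRUE
]

def relation(x, y):
    """Classify a pair of floats as LT, EQ, GT or U (unordered)."""
    if math.isnan(x) or math.isnan(y):
        return "U"
    return "LT" if x < y else "EQ" if x == y else "GT"

def mask_cmp(k1, a, b, imm8):
    """Zeromasked compare: bit j is set only if k1[j] and the predicate holds."""
    when = TRUE_WHEN[imm8 & 0xF]
    k = 0
    for j in range(len(a)):
        if (k1 >> j) & 1 and relation(a[j], b[j]) in when:
            k |= 1 << j
    return k
```

With a zeromask, a predicate such as TRUE (15) simply returns `k1` restricted to the lane count, and FALSE (11) returns 0.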
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (a[i+63:i] == b[i+63:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (a[i+63:i] <= b[i+63:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (a[i+63:i] < b[i+63:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (a[i+63:i] != b[i+63:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (!(a[i+63:i] <= b[i+63:i])) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (!(a[i+63:i] < b[i+63:i])) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k1[j]
k[j] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 15
i := j*32
k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 15
i := j*32
k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := (a[i+31:i] == b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
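The single-precision forms produce one mask bit for each of 16 lanes, and each lane holds a 32-bit value. When modeling them with Python floats (which are double precision), the inputs should be rounded to single precision first. A sketch under that assumption, with hypothetical helper names:

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def cmpeq_ps_mask(a, b):
    """16-lane equality mask; bits 16 and above are never set."""
    k = 0
    for j in range(16):
        if to_f32(a[j]) == to_f32(b[j]):
            k |= 1 << j
    return k

a = [float(j) for j in range(16)]
b = [float(j) + (0.5 if j % 2 else 0.0) for j in range(16)]
print(hex(cmpeq_ps_mask(a, b)))  # even lanes match
```

Two values that differ as doubles can land on the same single-precision value, in which case the lane compares equal.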
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := (a[i+31:i] <= b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := (a[i+31:i] < b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := (a[i+31:i] != b[i+31:i]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := (!(a[i+31:i] <= b[i+31:i])) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := (!(a[i+31:i] < b[i+31:i])) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ((a[i+31:i] != NaN) AND (b[i+31:i] != NaN)) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ((a[i+31:i] == NaN) OR (b[i+31:i] == NaN)) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := (a[i+31:i] == b[i+31:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := (a[i+31:i] <= b[i+31:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := (a[i+31:i] < b[i+31:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := (a[i+31:i] != b[i+31:i]) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := (!(a[i+31:i] <= b[i+31:i])) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := (!(a[i+31:i] < b[i+31:i])) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] != NaN) AND (b[i+31:i] != NaN)) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] == NaN) OR (b[i+31:i] == NaN)) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
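For integers there is no unordered case, so `_MM_CMPINT_NLT` is the same as `>=` and `_MM_CMPINT_NLE` is the same as `>`. A minimal Python model of the signed form, treating each lane as a raw 32-bit pattern that is reinterpreted as signed (function names are hypothetical):

```python
import operator

# Index is imm8[2:0], following the CASE table above.
CMPINT = [
    operator.eq,         # 0 _MM_CMPINT_EQ
    operator.lt,         # 1 _MM_CMPINT_LT
    operator.le,         # 2 _MM_CMPINT_LE
    lambda x, y: False,  # 3 _MM_CMPINT_FALSE
    operator.ne,         # 4 _MM_CMPINT_NE
    operator.ge,         # 5 _MM_CMPINT_NLT
    operator.gt,         # 6 _MM_CMPINT_NLE
    lambda x, y: True,   # 7 _MM_CMPINT_TRUE
]

def as_signed32(bits):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits

def cmp_epi32_mask(a, b, imm8):
    """Signed 16-lane compare; only the low three bits of imm8 are used."""
    op = CMPINT[imm8 & 0x7]
    k = 0
    for j in range(16):
        if op(as_signed32(a[j]), as_signed32(b[j])):
            k |= 1 << j
    return k
```

Only `imm8[2:0]` is decoded, so an immediate of 9 selects the same predicate as 1.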
Compare packed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
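The signed and unsigned entries use the same pseudocode; the difference lies in how each 32-bit pattern is interpreted before the comparison. A short Python sketch that makes the interpretation explicit (names are hypothetical):

```python
def as_signed32(bits):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits

def cmplt_mask(a, b, signed):
    """16-lane less-than mask over raw 32-bit patterns."""
    k = 0
    for j in range(16):
        x, y = a[j] & 0xFFFFFFFF, b[j] & 0xFFFFFFFF
        if signed:
            x, y = as_signed32(x), as_signed32(y)
        if x < y:
            k |= 1 << j
    return k

a = [0xFFFFFFFF, 1, 0x80000000, 0] + [0] * 12
b = [1, 0xFFFFFFFF, 0x7FFFFFFF, 0] + [0] * 12
print(bin(cmplt_mask(a, b, signed=False)))  # only lane 1 set
print(bin(cmplt_mask(a, b, signed=True)))   # lanes 0 and 2 set
```

The pattern `0xFFFFFFFF` is the largest unsigned value but equals -1 when read as signed, which flips the result in lanes 0 and 1.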
Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k".
FOR j := 0 to 15
i := j*32
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[2:0]) OF
0: OP := _MM_CMPINT_EQ
1: OP := _MM_CMPINT_LT
2: OP := _MM_CMPINT_LE
3: OP := _MM_CMPINT_FALSE
4: OP := _MM_CMPINT_NE
5: OP := _MM_CMPINT_NLT
6: OP := _MM_CMPINT_NLE
7: OP := _MM_CMPINT_TRUE
ESAC
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Compare
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
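The gather address arithmetic can be sketched with a byte buffer standing in for memory. The sketch reads the trailing `* 8` in the pseudocode as a byte-to-bit conversion, since `MEM[addr+31:addr]` is indexed in bits; that reading is an assumption, as are the function name and the little-endian layout:

```python
import struct

def gather_ps(vindex, memory, base_addr, scale):
    """Gather 16 single-precision values; addresses here are in bytes."""
    assert scale in (1, 2, 4, 8)
    dst = []
    for j in range(16):
        byte_addr = base_addr + vindex[j] * scale
        dst.append(struct.unpack_from("<f", memory, byte_addr)[0])
    return dst

# 32 consecutive floats 0.0 .. 31.0 as the backing store
memory = struct.pack("<32f", *[float(i) for i in range(32)])
reversed_first_16 = gather_ps(list(range(15, -1, -1)), memory, 0, 4)
every_other = gather_ps(list(range(16)), memory, 0, 8)
```

With `scale` equal to the element size (4) the indices act as array subscripts; a `scale` of 8 steps over every other float.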
Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
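A writemasked aligned load merges memory with "src" lane by lane. The Python sketch below models this with a byte buffer and uses a `ValueError` as a stand-in for the general-protection exception that the entry says may be generated on a misaligned address. The function name and the little-endian layout are assumptions:

```python
import struct

def mask_load_pd(src, k, memory, mem_addr):
    """Load 8 doubles where k[j] is set; keep src[j] elsewhere."""
    if mem_addr % 64 != 0:
        raise ValueError("mem_addr must be aligned on a 64-byte boundary")
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            dst.append(struct.unpack_from("<d", memory, mem_addr + 8 * j)[0])
        else:
            dst.append(src[j])
    return dst

memory = struct.pack("<16d", *[float(i) for i in range(16)])
src = [-1.0] * 8
dst = mask_load_pd(src, 0b10100101, memory, 64)
```

Lanes 0, 2, 5 and 7 come from memory (values 8.0, 10.0, 13.0, 15.0); the remaining lanes keep the -1.0 from `src`.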
Load 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load 512 bits (composed of 16 packed 32-bit integers) from memory into "dst".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load 512 bits of integer data from memory into "dst".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Load 512 bits (composed of 8 packed 64-bit integers) from memory into "dst".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512F
Load
Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+31:i] := MEM[addr+31:addr]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
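In the masked gather, the address is computed only inside the `IF k[j]` branch, so a masked-off lane never touches memory. A Python sketch of that behavior with a byte buffer as memory (function name hypothetical, byte addressing and little-endian layout assumed):

```python
import struct

def mask_gather_epi32(src, k, vindex, memory, base_addr, scale):
    """Gather 32-bit integers where k[j] is set; keep src[j] elsewhere."""
    dst = list(src)
    for j in range(16):
        if (k >> j) & 1:
            byte_addr = base_addr + vindex[j] * scale
            dst[j] = struct.unpack_from("<i", memory, byte_addr)[0]
    return dst

memory = struct.pack("<16i", *range(100, 116))
vindex = list(range(16))
vindex[15] = 10**6  # far out of range, but lane 15 is masked off below
dst = mask_gather_epi32([-1] * 16, 0x00FF, vindex, memory, 0, 4)
```

The out-of-range index in lane 15 is harmless in this model because the lane is skipped before any address is formed.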
Loads 8 64-bit integer elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst".
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
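Because the destination has 8 lanes of 64 bits while the indices are 32 bits wide, only the lower 8 of the 16 index lanes are read (`m := j*32` for `j` from 0 to 7). A Python sketch of this form, with a hypothetical function name and byte addressing assumed:

```python
import struct

def gather_epi64_lo_index(vindex, memory, base_addr, scale):
    """Gather 8 signed 64-bit integers using index lanes 0..7 only."""
    dst = []
    for j in range(8):  # index lanes 8..15 are ignored
        byte_addr = base_addr + vindex[j] * scale
        dst.append(struct.unpack_from("<q", memory, byte_addr)[0])
    return dst

memory = struct.pack("<16q", *[i * 100 for i in range(16)])
vindex = [3, 1, 4, 1, 5, 9, 2, 6] + [999] * 8  # upper half is never used
dst = gather_epi64_lo_index(vindex, memory, 0, 8)
```

The same index may appear more than once, as lanes 1 and 3 show.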
Loads 8 64-bit integer elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Loads 8 double-precision (64-bit) floating-point elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst".
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Loads 8 double-precision (64-bit) floating-point elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
dst[i+63:i] := MEM[addr+63:addr]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Load
Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
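A writemasked move is a per-lane select between "a" and "src", and it pairs naturally with a mask produced by one of the compare operations. A minimal Python sketch that replaces negative lanes with zero (helper names are hypothetical):

```python
def mask_mov(src, k, a):
    """Take a[j] where k[j] is set, src[j] otherwise."""
    return [a[j] if (k >> j) & 1 else src[j] for j in range(len(a))]

def cmplt_mask(a, b):
    """Bit j is set when a[j] < b[j]."""
    k = 0
    for j in range(len(a)):
        if a[j] < b[j]:
            k |= 1 << j
    return k

values = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0]
zeros = [0.0] * 8
negative = cmplt_mask(values, zeros)          # lanes holding a negative value
clamped = mask_mov(values, negative, zeros)   # zero where negative, else unchanged
```

Here `values` plays the role of "src" and the zero vector plays the role of "a".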
Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Move
Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
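A writemasked store leaves memory untouched for lanes whose mask bit is clear; there is no ELSE branch in the pseudocode. A Python sketch with a `bytearray` as memory, using a `ValueError` as a stand-in for the alignment fault (function name and little-endian layout are assumptions):

```python
import struct

def mask_store_pd(memory, mem_addr, k, a):
    """Store a[j] as a double at mem_addr + 8*j for each set bit k[j]."""
    if mem_addr % 64 != 0:
        raise ValueError("mem_addr must be aligned on a 64-byte boundary")
    for j in range(8):
        if (k >> j) & 1:
            struct.pack_into("<d", memory, mem_addr + 8 * j, a[j])

memory = bytearray(64)  # eight doubles, all 0.0
mask_store_pd(memory, 0, 0b00001111, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
stored = list(struct.unpack("<8d", memory))
```

Only the first four doubles are written; the last four keep their previous contents.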
Store 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 15
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
Store
Store 512-bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store packed 32-bit integers from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 15
i := j*32
IF k[j]
MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i]
FI
ENDFOR
AVX512F
Store
Store 512-bits (composed of 16 packed 32-bit integers) from "a" into memory.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store 512-bits of integer data from "a" into memory.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Store packed 64-bit integers from "a" into memory using writemask "k".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
FOR j := 0 to 7
i := j*64
IF k[j]
MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
Store 512-bits (composed of 8 packed 64-bit integers) from "a" into memory.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512F
Store
Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
Store
Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
Store
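A scalar sketch of the masked 32-bit scatter may help when reading the address computation. It assumes memory is a Python `bytearray` addressed in bytes, so the trailing `* 8` of the pseudocode (which appears to convert a byte offset into a bit offset) is dropped. The name `scatter_i32_model` is hypothetical.

```python
import struct

def scatter_i32_model(mem, base_addr, k, vindex, a, scale):
    """Scalar model of a masked 32-bit scatter into a byte-addressed buffer."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    for j in range(16):
        if (k >> j) & 1:
            # Indices are sign-extended 32-bit values.
            v = vindex[j] & 0xFFFFFFFF
            idx = v - (1 << 32) if v & 0x80000000 else v
            addr = base_addr + idx * scale
            if addr < 0 or addr + 4 > len(mem):
                raise IndexError("scatter address outside the modelled memory")
            mem[addr:addr + 4] = struct.pack("<I", a[j] & 0xFFFFFFFF)

mem = bytearray(64)
vindex = [3, 0, -1] + [0] * 13
a = [0x11111111, 0x22222222, 0x33333333] + [0] * 13
# Mask 0b101: lanes 0 and 2 are stored, lane 1 is skipped.
scatter_i32_model(mem, 8, 0b101, vindex, a, 4)
```

Lane 0 lands at byte 8 + 3*4 = 20 and lane 2, with index -1, lands at byte 8 - 4 = 4.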
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
ENDFOR
AVX512F
Store
Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8.
FOR j := 0 to 15
i := j*32
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+31:addr] := a[i+31:i]
FI
ENDFOR
AVX512F
Store
Stores 8 packed double-precision (64-bit) floating-point elements in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale".
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
Store
Stores 8 packed double-precision (64-bit) floating-point elements in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". Only those elements whose corresponding mask bit is set in writemask "k" are written to memory.
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] AND b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise AND of 512 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[511:0] := (a[511:0] AND b[511:0])
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise NOT of 512 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
dst[511:0] := ((NOT a[511:0]) AND b[511:0])
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise NOT of 512 bits (composed of packed 64-bit integers) in "a" and then AND with "b", and store the results in "dst".
dst[511:0] := ((NOT a[511:0]) AND b[511:0])
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise AND of 512 bits (composed of packed 64-bit integers) in "a" and "b", and store the results in "dst".
dst[511:0] := (a[511:0] AND b[511:0])
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] AND b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise OR of 512 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[511:0] := (a[511:0] OR b[511:0])
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero.
FOR j := 0 to 15
i := j*32
IF k1[j]
k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512F
Logical
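The masked test operation above produces a bit mask rather than a vector. A minimal scalar sketch, assuming lanes are passed as Python lists and the mask as an integer (the name `mask_test_model` is hypothetical):

```python
def mask_test_model(k1, a, b, lanes=16):
    """Set bit j of the result when lane j is active in k1 and (a[j] AND b[j]) != 0."""
    k = 0
    for j in range(lanes):
        active = (k1 >> j) & 1
        if active and ((a[j] & b[j]) & 0xFFFFFFFF) != 0:
            k |= 1 << j
    return k

a = [0b1100, 0b0011, 0xFF, 0x00] + [0] * 12
b = [0b0100, 0b0100, 0x0F, 0xFF] + [0] * 12
all_lanes = mask_test_model(0xFFFF, a, b)   # lanes 0 and 2 have a non-zero AND
some_lanes = mask_test_model(0b0110, a, b)  # lane 0 is masked off
```

Passing `k1 = 0xFFFF` reproduces the unmasked form of the operation.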
Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero.
FOR j := 0 to 15
i := j*32
k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512F
Logical
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise XOR of 512 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[511:0] := (a[511:0] XOR b[511:0])
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Reduce the packed 32-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[31:0] AND src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] AND src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_AND(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := 0xFFFFFFFF
FI
ENDFOR
dst[31:0] := REDUCE_AND(tmp, 16)
AVX512F
Logical
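The masked reduction first replaces inactive lanes with the identity of AND (all ones) and then halves the vector repeatedly. A scalar sketch of that scheme, with hypothetical helper names and lanes held in a Python list:

```python
def reduce_and(src, length):
    """Pairwise tree reduction mirroring REDUCE_AND in the pseudocode."""
    src = list(src)
    while length > 2:
        length //= 2
        for j in range(length):
            # Fold the upper half onto the lower half.
            src[j] &= src[j + length]
    return src[0] & src[1]

def mask_reduce_and_model(k, a):
    # Inactive lanes become 0xFFFFFFFF so they cannot clear any bit.
    tmp = [a[j] if (k >> j) & 1 else 0xFFFFFFFF for j in range(16)]
    return reduce_and(tmp, 16)

a = [0xF0F0F0F0] * 16
a[3] = 0x0000FFFF
two_lanes = mask_reduce_and_model(0b1001, a)  # lanes 0 and 3 only
no_lanes = mask_reduce_and_model(0, a)        # every lane inactive
```

With an all-zero mask the result is the identity value itself, which is worth keeping in mind when the mask may be empty.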
Reduce the packed 64-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[63:0] AND src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] AND src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_AND(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := 0xFFFFFFFFFFFFFFFF
FI
ENDFOR
dst[63:0] := REDUCE_AND(tmp, 8)
AVX512F
Logical
Reduce the packed 32-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[31:0] OR src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] OR src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_OR(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := 0
FI
ENDFOR
dst[31:0] := REDUCE_OR(tmp, 16)
AVX512F
Logical
Reduce the packed 64-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[63:0] OR src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] OR src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_OR(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := 0
FI
ENDFOR
dst[63:0] := REDUCE_OR(tmp, 8)
AVX512F
Logical
Reduce the packed 32-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[31:0] AND src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] AND src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_AND(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_AND(a, 16)
AVX512F
Logical
Reduce the packed 64-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a".
DEFINE REDUCE_AND(src, len) {
IF len == 2
RETURN src[63:0] AND src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] AND src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_AND(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_AND(a, 8)
AVX512F
Logical
Reduce the packed 32-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[31:0] OR src[63:32]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := src[i+31:i] OR src[i+32*len+31:i+32*len]
ENDFOR
RETURN REDUCE_OR(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_OR(a, 16)
AVX512F
Logical
Reduce the packed 64-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a".
DEFINE REDUCE_OR(src, len) {
IF len == 2
RETURN src[63:0] OR src[127:64]
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := src[i+63:i] OR src[i+64*len+63:i+64*len]
ENDFOR
RETURN REDUCE_OR(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_OR(a, 8)
AVX512F
Logical
Performs element-by-element bitwise AND between packed 32-bit integer elements of "v2" and "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := v2[i+31:i] AND v3[i+31:i]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Logical
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
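The signed and unsigned maximum entries share the same pseudocode; the difference lies in how each 32-bit lane is interpreted. A small scalar sketch, using two lanes for brevity and hypothetical helper names:

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def max_signed32(a, b):
    # Compare lanes as signed values, but return the original bit patterns.
    return [x if to_signed32(x) > to_signed32(y) else y for x, y in zip(a, b)]

def max_unsigned32(a, b):
    # Compare lanes as plain non-negative 32-bit values.
    return [max(x & 0xFFFFFFFF, y & 0xFFFFFFFF) for x, y in zip(a, b)]

a = [0xFFFFFFFF, 5]
b = [0x00000001, 7]
signed_result = max_signed32(a, b)      # 0xFFFFFFFF is -1 when signed
unsigned_result = max_unsigned32(a, b)  # 0xFFFFFFFF is the largest unsigned value
```

The same bit pattern `0xFFFFFFFF` loses the signed comparison and wins the unsigned one.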
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Special Math Functions
Reduce the packed signed 32-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MAX(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := Int32(-0x80000000)
FI
ENDFOR
dst[31:0] := REDUCE_MAX(tmp, 16)
AVX512F
Special Math Functions
Reduce the packed signed 64-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MAX(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := Int64(-0x8000000000000000)
FI
ENDFOR
dst[63:0] := REDUCE_MAX(tmp, 8)
AVX512F
Special Math Functions
Reduce the packed unsigned 32-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MAX(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := 0
FI
ENDFOR
dst[31:0] := REDUCE_MAX(tmp, 16)
AVX512F
Special Math Functions
Reduce the packed unsigned 64-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MAX(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := 0
FI
ENDFOR
dst[63:0] := REDUCE_MAX(tmp, 8)
AVX512F
Special Math Functions
Reduce the packed double-precision (64-bit) floating-point elements in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MAX(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := Cast_FP64(0xFFEFFFFFFFFFFFFF)
FI
ENDFOR
dst[63:0] := REDUCE_MAX(tmp, 8)
AVX512F
Special Math Functions
Reduce the packed single-precision (32-bit) floating-point elements in "a" by maximum using mask "k". Returns the maximum of all active elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MAX(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := Cast_FP32(0xFF7FFFFF)
FI
ENDFOR
dst[31:0] := REDUCE_MAX(tmp, 16)
AVX512F
Special Math Functions
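The fill constants used for inactive lanes in the floating-point maximum reductions are bit patterns, which can be decoded to see why they act as an identity. The sketch below assumes IEEE 754 binary32/binary64 and does not model NaN handling; all function names are hypothetical.

```python
import struct

def bits_to_f32(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bits_to_f64(bits):
    """Reinterpret a 64-bit pattern as a double-precision value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def mask_reduce_max_f32_model(k, a):
    # Inactive lanes are filled with the most negative finite float.
    fill = bits_to_f32(0xFF7FFFFF)
    tmp = [a[j] if (k >> j) & 1 else fill for j in range(16)]
    best = tmp[0]
    for v in tmp[1:]:
        best = best if best > v else v
    return best

fill32 = bits_to_f32(0xFF7FFFFF)
fill64 = bits_to_f64(0xFFEFFFFFFFFFFFFF)
a = [5.0, -2.0, -7.5] + [9.0] * 13
active_max = mask_reduce_max_f32_model(0b110, a)  # lanes 1 and 2 only
```

Both constants decode to the negated largest finite value of their type, so any finite active element compares greater than or equal to them.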
Reduce the packed signed 32-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MIN(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := Int32(0x7FFFFFFF)
FI
ENDFOR
dst[31:0] := REDUCE_MIN(tmp, 16)
AVX512F
Special Math Functions
Reduce the packed signed 64-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MIN(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := Int64(0x7FFFFFFFFFFFFFFF)
FI
ENDFOR
dst[63:0] := REDUCE_MIN(tmp, 8)
AVX512F
Special Math Functions
Reduce the packed unsigned 32-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MIN(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := 0xFFFFFFFF
FI
ENDFOR
dst[31:0] := REDUCE_MIN(tmp, 16)
AVX512F
Special Math Functions
Reduce the packed unsigned 64-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MIN(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := 0xFFFFFFFFFFFFFFFF
FI
ENDFOR
dst[63:0] := REDUCE_MIN(tmp, 8)
AVX512F
Special Math Functions
Reduce the packed double-precision (64-bit) floating-point elements in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". [min_float_note]
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MIN(src[64*len-1:0], len)
}
tmp := a
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[i+63:i] := a[i+63:i]
ELSE
tmp[i+63:i] := Cast_FP64(0x7FEFFFFFFFFFFFFF)
FI
ENDFOR
dst[63:0] := REDUCE_MIN(tmp, 8)
AVX512F
Special Math Functions
Reduce the packed single-precision (32-bit) floating-point elements in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". [min_float_note]
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MIN(src[32*len-1:0], len)
}
tmp := a
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[i+31:i] := a[i+31:i]
ELSE
tmp[i+31:i] := Cast_FP32(0x7F7FFFFF)
FI
ENDFOR
dst[31:0] := REDUCE_MIN(tmp, 16)
AVX512F
Special Math Functions
Reduce the packed signed 32-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MAX(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MAX(a, 16)
AVX512F
Special Math Functions
Reduce the packed signed 64-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MAX(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MAX(a, 8)
AVX512F
Special Math Functions
Reduce the packed unsigned 32-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MAX(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MAX(a, 16)
AVX512F
Special Math Functions
Reduce the packed unsigned 64-bit integers in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MAX(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MAX(a, 8)
AVX512F
Special Math Functions
Reduce the packed double-precision (64-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[63:0] > src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] > src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MAX(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MAX(a, 8)
AVX512F
Special Math Functions
Reduce the packed single-precision (32-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".
DEFINE REDUCE_MAX(src, len) {
IF len == 2
RETURN (src[31:0] > src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] > src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MAX(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MAX(a, 16)
AVX512F
Special Math Functions
Reduce the packed signed 32-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MIN(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MIN(a, 16)
AVX512F
Special Math Functions
Reduce the packed signed 64-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MIN(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MIN(a, 8)
AVX512F
Special Math Functions
Reduce the packed unsigned 32-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MIN(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MIN(a, 16)
AVX512F
Special Math Functions
Reduce the packed unsigned 64-bit integers in "a" by minimum. Returns the minimum of all elements in "a".
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MIN(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MIN(a, 8)
AVX512F
Special Math Functions
Reduce the packed double-precision (64-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". [min_float_note]
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[63:0] < src[127:64] ? src[63:0] : src[127:64])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*64
src[i+63:i] := (src[i+63:i] < src[i+64*len+63:i+64*len] ? src[i+63:i] : src[i+64*len+63:i+64*len])
ENDFOR
RETURN REDUCE_MIN(src[64*len-1:0], len)
}
dst[63:0] := REDUCE_MIN(a, 8)
AVX512F
Special Math Functions
Reduce the packed single-precision (32-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". [min_float_note]
DEFINE REDUCE_MIN(src, len) {
IF len == 2
RETURN (src[31:0] < src[63:32] ? src[31:0] : src[63:32])
FI
len := len / 2
FOR j:= 0 to (len-1)
i := j*32
src[i+31:i] := (src[i+31:i] < src[i+32*len+31:i+32*len] ? src[i+31:i] : src[i+32*len+31:i+32*len])
ENDFOR
RETURN REDUCE_MIN(src[32*len-1:0], len)
}
dst[31:0] := REDUCE_MIN(a, 16)
AVX512F
Special Math Functions
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
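The per-element left shift saturates to zero once the count reaches the element width, which differs from a plain C shift (where shifting by the width or more is undefined). A scalar sketch, with the hypothetical name `sllv_model` and lanes held in Python lists:

```python
def sllv_model(a, count, lanes=16):
    """Shift each 32-bit lane left by its own count; counts of 32 or more give 0."""
    out = []
    for j in range(lanes):
        c = count[j] & 0xFFFFFFFF  # counts are treated as unsigned
        if c < 32:
            out.append((a[j] << c) & 0xFFFFFFFF)  # keep only the low 32 bits
        else:
            out.append(0)
    return out

a = [0x00000001, 0x80000001, 0xFFFFFFFF] + [0] * 13
count = [31, 1, 32] + [0] * 13
shifted = sllv_model(a, count)
```

Lane 1 shows the top bit being shifted out, and lane 2 shows the saturation case.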
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
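For the arithmetic right shift, an oversized count fills the lane with copies of the sign bit instead of zeros. A scalar sketch under the same list-of-lanes assumption (helper names are hypothetical); it relies on Python's `>>` being an arithmetic shift for negative integers:

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def srav_model(a, count, lanes=16):
    """Arithmetic right shift of each 32-bit lane by its own count."""
    out = []
    for j in range(lanes):
        c = count[j] & 0xFFFFFFFF
        if c < 32:
            out.append((to_signed32(a[j]) >> c) & 0xFFFFFFFF)
        else:
            # Every bit becomes a copy of the sign bit.
            out.append(0xFFFFFFFF if a[j] & 0x80000000 else 0)
    return out

a = [0x80000000, 0x7FFFFFFF, 0xFFFFFFF0, 0x00000010] + [0] * 12
count = [4, 4, 40, 40] + [0] * 12
shifted = srav_model(a, count)
```

A masked variant would wrap this in the usual merge step, keeping "src" in lanes whose mask bit is clear.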
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0)
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 15
i := j*32
IF count[i+31:i] < 32
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Shift
Cast vector of type __m512d to type __m512.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512d to type __m512i.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512 to type __m512d.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512 to type __m512i.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512i to type __m512d.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
Cast vector of type __m512i to type __m512.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512F
Cast
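The cast entries above reinterpret the same 512 bits under a different element type; no value conversion takes place. A scalar analogy, assuming IEEE-754 `float` and using a hypothetical 64-byte struct `vec512_model` in place of a real vector register, is to read the same bytes through two different views:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 512-bit register: 64 raw bytes. */
typedef struct { unsigned char bytes[64]; } vec512_model;

/* View 32-bit lane j of the register as a float. */
float lane_as_f32(const vec512_model *v, int j)
{
    float f;
    memcpy(&f, v->bytes + 4 * j, sizeof f);
    return f;
}

/* View the same lane of the same register as an unsigned integer. */
uint32_t lane_as_u32(const vec512_model *v, int j)
{
    uint32_t u;
    memcpy(&u, v->bytes + 4 * j, sizeof u);
    return u;
}
```

A lane holding the float 1.0 reads back as the integer bit pattern 0x3F800000 under the integer view, which is the behavior the casts preserve.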
Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst".
FOR j := 0 to 7
i := j*32
n := j*64
dst[n+63:n] := Convert_FP32_To_FP64(v2[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[l+63:l] := Convert_FP32_To_FP64(v2[i+31:i])
ELSE
dst[l+63:l] := src[l+63:l]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Performs element-by-element conversion of the lower half of packed 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst".
FOR j := 0 to 7
i := j*32
l := j*64
dst[l+63:l] := Convert_Int32_To_FP64(v2[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Performs element-by-element conversion of the lower half of packed 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
n := j*64
IF k[j]
dst[n+63:n] := Convert_Int32_To_FP64(v2[i+31:i])
ELSE
dst[n+63:n] := src[n+63:n]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Performs element-by-element conversion of the lower half of packed 32-bit unsigned integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst".
FOR j := 0 to 7
i := j*32
n := j*64
dst[n+63:n] := Convert_Int32_To_FP64(v2[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
Performs element-by-element conversion of the lower half of packed 32-bit unsigned integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
l := j*64
IF k[j]
dst[l+63:l] := Convert_Int32_To_FP64(v2[i+31:i])
ELSE
dst[l+63:l] := src[l+63:l]
FI
ENDFOR
dst[MAX:512] := 0
AVX512F
Convert
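The masked lower-half conversion above can be sketched in scalar C. The name `cvtepu32lo_pd_model` and the array interface are hypothetical. The sketch follows the description and treats the source lanes as unsigned, even though the pseudocode helper is named `Convert_Int32_To_FP64`; every 32-bit unsigned value is exactly representable as a double, so no rounding occurs.

```c
#include <stdint.h>

/* Scalar model: convert the lower 8 of 16 unsigned 32-bit lanes to double.
   Lanes whose mask bit is clear are copied from src. */
void cvtepu32lo_pd_model(double dst[8], const double src[8], uint8_t k,
                         const uint32_t v2[16])
{
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1) ? (double)v2[j] : src[j];
}
```

With an unsigned reading, the lane 0xFFFFFFFF converts to 4294967295.0; a signed reading of the same bits would give -1.0 instead.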
Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to single-precision (32-bit) floating-point elements and stores them in "dst". The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.
FOR j := 0 to 7
i := j*64
k := j*32
dst[k+31:k] := Convert_FP64_To_FP32(v2[i+63:i])
ENDFOR
dst[511:256] := 0
dst[MAX:512] := 0
AVX512F
Convert
Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to single-precision (32-bit) floating-point elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0.
FOR j := 0 to 7
i := j*64
l := j*32
IF k[j]
dst[l+31:l] := Convert_FP64_To_FP32(v2[i+63:i])
ELSE
dst[l+31:l] := src[l+31:l]
FI
ENDFOR
dst[511:256] := 0
dst[MAX:512] := 0
AVX512F
Convert
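The narrowing conversion above writes eight results into the lower half of the destination and zeroes the upper half. A scalar sketch, with the hypothetical name `cvtpd_pslo_model` and plain arrays standing in for vector registers, might look like this:

```c
#include <stdint.h>

/* Scalar model: narrow 8 doubles to floats in lanes 0..7 and
   zero lanes 8..15. Lanes whose mask bit is clear copy from src. */
void cvtpd_pslo_model(float dst[16], const float src[16], uint8_t k,
                      const double v2[8])
{
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1) ? (float)v2[j] : src[j];
    for (int j = 8; j < 16; j++)
        dst[j] = 0.0f;           /* upper half is set to 0 */
}
```

The `(float)` cast rounds according to the C implementation's current rounding mode; this sketch does not attempt to model how the hardware selects its rounding mode.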
Stores 8 packed 64-bit integer elements located in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale".
FOR j := 0 to 7
i := j*64
m := j*32
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
ENDFOR
AVX512F
Store
Stores 8 packed 64-bit integer elements located in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using writemask "k" (elements whose corresponding mask bit is not set are not written to memory).
FOR j := 0 to 7
i := j*64
m := j*32
IF k[j]
addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
MEM[addr+63:addr] := a[i+63:i]
FI
ENDFOR
AVX512F
Store
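The masked scatter above can be sketched in scalar C with byte addressing. The name `i32loscatter_epi64_model` is hypothetical, and the sketch assumes that the trailing `* 8` in the pseudocode converts a byte offset into a bit offset for the bit-indexed `MEM[]` notation, so it does not appear in byte-addressed C.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the masked scatter of 8 64-bit elements.
   base is the start address, vindex holds signed 32-bit indices
   (only the lower 8 are used), scale is the byte multiplier. */
void i32loscatter_epi64_model(void *base, uint8_t k, const int32_t vindex[16],
                              const uint64_t a[8], int scale)
{
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1) {
            unsigned char *addr =
                (unsigned char *)base + (int64_t)vindex[j] * scale;
            memcpy(addr, &a[j], sizeof a[j]);   /* store one 64-bit element */
        }
        /* mask bit clear: memory is left untouched */
    }
}
```

With `scale = 8` the indices act as element indices into an array of 64-bit values, which is the usual way to scatter into a `uint64_t` buffer.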
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ENDFOR
dst[MAX:256] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ENDFOR
dst[MAX:128] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ENDFOR
dst[MAX:256] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ENDFOR
dst[MAX:128] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512IFMA52
AVX512VL
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ENDFOR
dst[MAX:512] := 0
AVX512IFMA52
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512IFMA52
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512IFMA52
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*64
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ENDFOR
dst[MAX:512] := 0
AVX512IFMA52
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512IFMA52
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i])
dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512IFMA52
Arithmetic
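The 52-bit multiply-add entries above can be modeled per lane in portable C without a 128-bit type. This is a sketch under stated assumptions: the names `madd52lo_lane_model` and `madd52hi_lane_model` are hypothetical, and the high half is computed by splitting each 52-bit operand into two 26-bit pieces so that every partial product fits in 64 bits.

```c
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Low 52 bits of the 104-bit product, added to the 64-bit accumulator. */
uint64_t madd52lo_lane_model(uint64_t a, uint64_t b, uint64_t c)
{
    b &= MASK52;
    c &= MASK52;
    /* A wrapping 64-bit multiply keeps product bits 63:0,
       which include the wanted bits 51:0. */
    return a + ((b * c) & MASK52);
}

/* Bits 103:52 of the product, added to the 64-bit accumulator. */
uint64_t madd52hi_lane_model(uint64_t a, uint64_t b, uint64_t c)
{
    const uint64_t mask26 = (UINT64_C(1) << 26) - 1;
    uint64_t bl = b & mask26, bh = (b & MASK52) >> 26;
    uint64_t cl = c & mask26, ch = (c & MASK52) >> 26;
    /* product = bh*ch*2^52 + (bh*cl + bl*ch)*2^26 + bl*cl */
    uint64_t mid = bh * cl + bl * ch;      /* < 2^53 */
    uint64_t lo  = bl * cl;                /* < 2^52 */
    uint64_t hi  = bh * ch + ((mid + (lo >> 26)) >> 26);
    return a + hi;
}
```

Bits 63:52 of "b" and "c" are ignored, matching the `b[i+51:i]` and `c[i+51:i]` slices in the pseudocode, and the final addition wraps modulo 2^64 like the 64-bit lane does.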
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := POPCNT(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := POPCNT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 3
i := j*64
dst[i+63:i] := POPCNT(a[i+63:i])
ENDFOR
dst[MAX:256] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
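The `POPCNT` helper in these entries translates almost line for line into C. The sketch below uses hypothetical names (`popcnt64_model`, `mask_popcnt_epi64_model`) and plain arrays for the four 64-bit lanes of the 256-bit writemask form.

```c
#include <stdint.h>

/* Direct transcription of the POPCNT helper from the pseudocode. */
unsigned popcnt64_model(uint64_t a)
{
    unsigned count = 0;
    while (a > 0) {
        count += (unsigned)(a & 1);   /* add the lowest bit */
        a >>= 1;
    }
    return count;
}

/* Writemask form over 4 lanes: lanes with a clear mask bit copy src. */
void mask_popcnt_epi64_model(uint64_t dst[4], const uint64_t src[4],
                             uint8_t k, const uint64_t a[4])
{
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1) ? popcnt64_model(a[j]) : src[j];
}
```

The zeromask form differs only in writing 0 instead of `src[j]` for unselected lanes.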
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := POPCNT(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := POPCNT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := POPCNT(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*32
dst[i+31:i] := POPCNT(a[i+31:i])
ENDFOR
dst[MAX:256] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := POPCNT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := POPCNT(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 3
i := j*32
dst[i+31:i] := POPCNT(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := POPCNT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := POPCNT(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512VPOPCNTDQ
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*32
dst[i+31:i] := POPCNT(a[i+31:i])
ENDFOR
dst[MAX:512] := 0
AVX512VPOPCNTDQ
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := POPCNT(a[i+31:i])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512VPOPCNTDQ
Bit Manipulation
Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := POPCNT(a[i+31:i])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512VPOPCNTDQ
Bit Manipulation
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*64
dst[i+63:i] := POPCNT(a[i+63:i])
ENDFOR
dst[MAX:512] := 0
AVX512VPOPCNTDQ
Bit Manipulation
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := POPCNT(a[i+63:i])
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512VPOPCNTDQ
Bit Manipulation
Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := POPCNT(a[i+63:i])
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512VPOPCNTDQ
Bit Manipulation
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 15
i := j*32
m := j*16
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Convert
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 15
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Convert
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 15
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Convert
Convert the BF16 (16-bit) floating-point element in "a" to a single-precision (32-bit) floating-point element, and store the result in "dst". This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
dst[31:0] := Convert_BF16_To_FP32(a[15:0])
AVX512_BF16
AVX512F
Convert
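A BF16 value has the same layout as the upper 16 bits of an IEEE-754 single-precision value, so widening it amounts to placing the 16 bits in the high half of a 32-bit word. A minimal scalar sketch, assuming IEEE-754 `float` and using the hypothetical name `bf16_to_fp32_model`:

```c
#include <stdint.h>
#include <string.h>

/* Widen one BF16 bit pattern to a float by shifting it into bits 31:16.
   No arithmetic is performed on the value, so the bit pattern of a NaN
   input is carried over unchanged. */
float bf16_to_fp32_model(uint16_t x)
{
    uint32_t bits = (uint32_t)x << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, the BF16 pattern 0x3F80 widens to 0x3F800000, which is 1.0 in single precision.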
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst".
FOR j := 0 to 31
IF j < 16
t := b.fp32[j]
ELSE
t := a.fp32[j-16]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Convert
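The lane order of the two-source conversion above (elements of "b" fill the low 16 words, elements of "a" the high 16) can be sketched in scalar C. Everything here is an assumption for illustration: the names are hypothetical, and `Convert_FP32_To_BF16` is modeled as round-to-nearest-even on the bit pattern with NaN inputs kept as NaN. The sketch does not model any denormal handling the hardware may apply.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of FP32 -> BF16 using round-to-nearest-even (assumed rounding). */
uint16_t fp32_to_bf16_rne(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)        /* NaN: force a quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    bits += 0x7FFFu + ((bits >> 16) & 1u);         /* nearest, ties to even */
    return (uint16_t)(bits >> 16);
}

/* Two-source pack: b supplies words 0..15, a supplies words 16..31. */
void cvtne2ps_pbh_model(uint16_t dst[32], const float a[16], const float b[16])
{
    for (int j = 0; j < 32; j++)
        dst[j] = fp32_to_bf16_rne(j < 16 ? b[j] : a[j - 16]);
}
```

The writemask and zeromask forms would wrap the assignment to `dst[j]` in a test of bit `j` of a 32-bit mask.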
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF j < 16
t := b.fp32[j]
ELSE
t := a.fp32[j-16]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF j < 16
t := b.fp32[j]
ELSE
t := a.fp32[j-16]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 15
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512F
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512F
Convert
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst".
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 15
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Arithmetic
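The BF16 dot-product pseudocode above maps onto scalar C fairly directly. In this sketch `dpbf16_ps_model` is a hypothetical name, BF16 inputs are passed as raw 16-bit patterns, and the products and sums use ordinary `float` arithmetic; the hardware's intermediate rounding may differ from what a C compiler produces, so results are only guaranteed to agree when every step is exact.

```c
#include <stdint.h>
#include <string.h>

/* Same role as make_fp32 in the pseudocode: place the BF16 bits
   in the upper half of a 32-bit float. */
static float make_fp32(uint16_t x)
{
    uint32_t bits = (uint32_t)x << 16;
    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Each of the 16 float lanes accumulates one pair of BF16 products. */
void dpbf16_ps_model(float dst[16], const float src[16],
                     const uint16_t a[32], const uint16_t b[32])
{
    for (int j = 0; j < 16; j++) {
        float acc = src[j];
        acc += make_fp32(a[2 * j + 1]) * make_fp32(b[2 * j + 1]);
        acc += make_fp32(a[2 * j + 0]) * make_fp32(b[2 * j + 0]);
        dst[j] = acc;
    }
}
```

For instance, with pairs (1.0, 2.0) in "a", (3.0, 4.0) in "b", and 0.5 in "src", each lane becomes 0.5 + 2.0*4.0 + 1.0*3.0 = 11.5.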
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 15
IF k[j]
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Arithmetic
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 15
IF k[j]
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BF16
AVX512F
Arithmetic
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 3
i := j*32
m := j*16
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 3
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 3
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 7
i := j*32
m := j*16
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 7
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed BF16 (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic neither raises any floating point exceptions nor turns sNAN into qNAN.
FOR j := 0 to 7
i := j*32
m := j*16
IF k[j]
dst[i+31:i] := Convert_BF16_To_FP32(a[m+15:m])
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Convert
Convert the single-precision (32-bit) floating-point element in "a" to a BF16 (16-bit) floating-point element, and store the result in "dst".
dst[15:0] := Convert_FP32_To_BF16(a[31:0])
AVX512_BF16
AVX512VL
Convert
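`Convert_FP32_To_BF16` is likewise not defined in the text. The sketch below assumes round-to-nearest-even on the 16 discarded low bits and forces NaN inputs to a quiet NaN. Both choices are assumptions made for illustration, and any special treatment of denormal inputs is not modeled. The name `fp32_to_bf16_bits` is hypothetical.

```python
import struct

def fp32_to_bf16_bits(x):
    """Round a float to BF16 bits using round-to-nearest-even (assumed)."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits32 & 0x7FFFFFFF) > 0x7F800000:    # NaN: keep the sign, force the quiet bit
        return ((bits32 >> 16) | 0x0040) & 0xFFFF
    lsb = (bits32 >> 16) & 1                  # lowest bit that survives the narrowing
    bits32 += 0x7FFF + lsb                    # ties round toward the even result
    return (bits32 >> 16) & 0xFFFF

print(hex(fp32_to_bf16_bits(1.0)))            # 0x3f80
print(hex(fp32_to_bf16_bits(1.00390625)))     # exact tie, rounds down to 0x3f80
print(hex(fp32_to_bf16_bits(1.01171875)))     # exact tie, rounds up to 0x3f82
```

The two tie cases show why the bias is `0x7FFF + lsb` rather than a fixed `0x8000`: a fixed bias would round every tie upward.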
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst".
FOR j := 0 to 7
IF j < 4
t := b.fp32[j]
ELSE
t := a.fp32[j-4]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
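The lane order of the two-source form is easy to misread: the low half of "dst" comes from "b" and the high half from "a". The sketch below models only that ordering. The helper name `pack_two_sources` is hypothetical, and the `convert` parameter defaults to the identity as a stand-in for `Convert_FP32_To_BF16`.

```python
def pack_two_sources(a, b, convert=lambda t: t):
    """Model of the lane order: dst[0:n] comes from b, dst[n:2n] comes from a."""
    n = len(b)
    dst = []
    for j in range(2 * n):
        t = b[j] if j < n else a[j - n]       # same selection as the IF j < 4 branch
        dst.append(convert(t))
    return dst

a = [10.0, 11.0, 12.0, 13.0]
b = [0.0, 1.0, 2.0, 3.0]
print(pack_two_sources(a, b))
# [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
```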
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF j < 4
t := b.fp32[j]
ELSE
t := a.fp32[j-4]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF j < 4
t := b.fp32[j]
ELSE
t := a.fp32[j-4]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst".
FOR j := 0 to 15
IF j < 8
t := b.fp32[j]
ELSE
t := a.fp32[j-8]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF j < 8
t := b.fp32[j]
ELSE
t := a.fp32[j-8]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF j < 8
t := b.fp32[j]
ELSE
t := a.fp32[j-8]
FI
dst.word[j] := Convert_FP32_To_BF16(t)
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.word[j] := Convert_FP32_To_BF16(a.fp32[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Convert
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst".
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 3
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Arithmetic
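The dot-product pseudocode can be followed step by step with a scalar model. The sketch below mirrors `make_fp32` and the two `+=` statements, and assumes the accumulator is rounded to FP32 after each step; the rounding and denormal handling of real hardware are not stated in the text and may differ. The names `round_fp32` and `dpbf16` are hypothetical.

```python
import struct

def make_fp32(bits16):
    """Mirror of the pseudocode's make_fp32: BF16 bits become the top of FP32."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]

def round_fp32(x):
    """Round a Python float (a double) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dpbf16(src, a, b):
    """src: FP32 accumulators; a, b: BF16 bit patterns, two per accumulator."""
    dst = list(src)
    for j in range(len(dst)):
        # Odd element of the pair first, then the even one, as in the pseudocode.
        dst[j] = round_fp32(dst[j] + make_fp32(a[2*j + 1]) * make_fp32(b[2*j + 1]))
        dst[j] = round_fp32(dst[j] + make_fp32(a[2*j]) * make_fp32(b[2*j]))
    return dst

ONE, TWO, THREE = 0x3F80, 0x4000, 0x4040      # 1.0, 2.0, 3.0 as BF16 bits
print(dpbf16([0.5, 0.0], [ONE, TWO, THREE, THREE], [TWO, THREE, ONE, TWO]))
# [8.5, 9.0]
```

Each FP32 accumulator lane consumes one pair of adjacent BF16 elements from "a" and "b", so a vector with n accumulators reads 2n BF16 elements from each input.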
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 3
IF k[j]
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Arithmetic
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 3
IF k[j]
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BF16
AVX512VL
Arithmetic
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst".
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 7
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Arithmetic
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 7
IF k[j]
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Arithmetic
Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE make_fp32(x[15:0]) {
y.fp32 := 0.0
y[31:16] := x[15:0]
RETURN y
}
dst := src
FOR j := 0 to 7
IF k[j]
dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1])
dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BF16
AVX512VL
Arithmetic
Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 3 //Qword
FOR j := 0 to 7 // Byte
IF k[i*8+j]
m := c.qword[i].byte[j] & 0x3F
dst[i*8+j] := b.qword[i].bit[m]
ELSE
dst[i*8+j] := 0
FI
ENDFOR
ENDFOR
dst[MAX:32] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst".
FOR i := 0 to 3 //Qword
FOR j := 0 to 7 // Byte
m := c.qword[i].byte[j] & 0x3F
dst[i*8+j] := b.qword[i].bit[m]
ENDFOR
ENDFOR
dst[MAX:32] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
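The bit-gather operation above may be easier to follow as a scalar model. In the sketch below, each control byte of "c" selects one bit (index 0 to 63) from the matching 64-bit element of "b", and the selected bits are packed into a result mask. The function name `bitshuffle` and the list-of-integers representation are assumptions made for illustration.

```python
def bitshuffle(b, c):
    """b, c: lists of 64-bit integers (one per qword). Returns the result mask."""
    mask = 0
    for i, (bq, cq) in enumerate(zip(b, c)):
        for j in range(8):
            m = (cq >> (8 * j)) & 0x3F        # low 6 bits of control byte j
            bit = (bq >> m) & 1               # selected bit of the same qword
            mask |= bit << (i * 8 + j)
    return mask

# Control bytes 0, 8, 16, ..., 56 pick the lowest bit of every byte in b.
control = int.from_bytes(bytes(range(0, 64, 8)), "little")
print(hex(bitshuffle([0x0100010001000100], [control])))   # 0xaa
```

Under the same model, the zeromask form is the unmasked result ANDed with "k", since every unselected result bit is set to 0.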
Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 1 //Qword
FOR j := 0 to 7 // Byte
IF k[i*8+j]
m := c.qword[i].byte[j] & 0x3F
dst[i*8+j] := b.qword[i].bit[m]
ELSE
dst[i*8+j] := 0
FI
ENDFOR
ENDFOR
dst[MAX:16] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst".
FOR i := 0 to 1 //Qword
FOR j := 0 to 7 // Byte
m := c.qword[i].byte[j] & 0x3F
dst[i*8+j] := b.qword[i].bit[m]
ENDFOR
ENDFOR
dst[MAX:16] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*16
dst[i+15:i] := POPCNT(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := POPCNT(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := POPCNT(a[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
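The three population-count forms (unmasked, writemask, zeromask) differ only in what happens to lanes whose mask bit is clear. A minimal sketch, with the hypothetical helper `popcnt_lanes` operating on a list of lane values instead of a vector register:

```python
def popcnt_lanes(a, k=None, src=None):
    """Per-lane population count.

    k=None          -> unmasked
    k and src given -> writemask: unselected lanes are copied from src
    k given, no src -> zeromask: unselected lanes become 0
    """
    out = []
    for j, x in enumerate(a):
        if k is None or (k >> j) & 1:
            out.append(bin(x).count("1"))     # same result as the POPCNT loop
        else:
            out.append(src[j] if src is not None else 0)
    return out

a = [0xFFFF, 0x0001, 0x00F0, 0x8001]
print(popcnt_lanes(a))                         # [16, 1, 4, 2]
print(popcnt_lanes(a, k=0b0011, src=[7] * 4))  # [16, 1, 7, 7]
print(popcnt_lanes(a, k=0b0011))               # [16, 1, 0, 0]
```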
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*16
dst[i+15:i] := POPCNT(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := POPCNT(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := POPCNT(a[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 31
i := j*8
dst[i+7:i] := POPCNT(a[i+7:i])
ENDFOR
dst[MAX:256] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := POPCNT(a[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := POPCNT(a[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*8
dst[i+7:i] := POPCNT(a[i+7:i])
ENDFOR
dst[MAX:128] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := POPCNT(a[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := POPCNT(a[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_BITALG
AVX512VL
Bit Manipulation
Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 7 //Qword
FOR j := 0 to 7 // Byte
IF k[i*8+j]
m := c.qword[i].byte[j] & 0x3F
dst[i*8+j] := b.qword[i].bit[m]
ELSE
dst[i*8+j] := 0
FI
ENDFOR
ENDFOR
dst[MAX:64] := 0
AVX512_BITALG
Bit Manipulation
Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from that element at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst".
FOR i := 0 to 7 //Qword
FOR j := 0 to 7 // Byte
m := c.qword[i].byte[j] & 0x3F
dst[i*8+j] := b.qword[i].bit[m]
ENDFOR
ENDFOR
dst[MAX:64] := 0
AVX512_BITALG
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 31
i := j*16
dst[i+15:i] := POPCNT(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_BITALG
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := POPCNT(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BITALG
Bit Manipulation
Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := POPCNT(a[i+15:i])
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BITALG
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst".
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 63
i := j*8
dst[i+7:i] := POPCNT(a[i+7:i])
ENDFOR
dst[MAX:512] := 0
AVX512_BITALG
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := POPCNT(a[i+7:i])
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BITALG
Bit Manipulation
Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE POPCNT(a) {
count := 0
DO WHILE a > 0
count += a[0]
a >>= 1
OD
RETURN count
}
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := POPCNT(a[i+7:i])
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_BITALG
Bit Manipulation
Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ACOS(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ACOSH(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ASIN(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ASINH(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ATAN2(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
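The two-argument inverse tangent differs from `ATAN(a / b)` in that it uses the signs of both inputs to pick the quadrant. The sketch below illustrates this with Python's `math.atan2`, rounding each result to half precision through the `struct` format code `e`. The helper names are hypothetical, and the accuracy of an actual math library is not modeled.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def atan2_fp16(a, b):
    """Elementwise ATAN2(a, b): the angle of the point (x=b, y=a), in radians."""
    return [to_fp16(math.atan2(y, x)) for y, x in zip(a, b)]

res = atan2_fp16([1.0, 1.0, -1.0, 0.0], [1.0, -1.0, -1.0, -1.0])
print(res)  # roughly pi/4, 3*pi/4, -3*pi/4, pi
```

A plain `ATAN(a / b)` would map the first and third lanes to the same angle, because both quotients equal 1.0.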
Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ATAN(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ATANH(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := CubeRoot(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
Probability/Statistics
FOR j := 0 to 15
i := j*16
dst[i+15:i] := CDFNormal(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
Probability/Statistics
FOR j := 0 to 15
i := j*16
dst[i+15:i] := InverseCDFNormal(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := COS(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := COSD(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
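The degree-based cosine can be read as the radian cosine applied after a degrees-to-radians conversion. A minimal sketch under that assumption, with results rounded to half precision; `to_fp16` and `cosd_fp16` are hypothetical names:

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def cosd_fp16(a):
    """Elementwise cosine of angles given in degrees."""
    return [to_fp16(math.cos(math.radians(x))) for x in a]

print(cosd_fp16([0.0, 60.0, 180.0]))  # [1.0, 0.5, -1.0]
```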
Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := COSH(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ERF(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 15
i := j*16
dst[i+15:i] := 1.0 - ERF(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
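The complementary error function is written above as `1.0 - ERF(x)`. The sketch below evaluates that expression per lane and compares it with Python's `math.erfc`; the helper name `erfc_lanes` is hypothetical.

```python
import math

def erfc_lanes(a):
    """Elementwise complementary error function, written as 1.0 - ERF(x)."""
    return [1.0 - math.erf(x) for x in a]

vals = erfc_lanes([0.0, 0.5, 2.0])
print(vals)
print([math.erfc(x) for x in (0.0, 0.5, 2.0)])
```

For large inputs `ERF(x)` approaches 1.0, so the literal subtraction loses significant digits. A dedicated routine such as `math.erfc` avoids that cancellation, which is one reason to treat the pseudocode as a definition, not as an implementation recipe.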
Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 15
i := j*16
dst[i+15:i] := InverseERFC(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 15
i := j*16
dst[i+15:i] := InverseERF(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := POW(FP16(e), a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SQRT(POW(a[i+15:i], 2.0) + POW(b[i+15:i], 2.0))
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
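The hypotenuse formula squares both inputs before taking the root, which matters in half precision: the largest finite FP16 value is 65504, so a square such as 300 * 300 = 90000 is out of range even though the final result (500) is not. The sketch below evaluates the formula in double precision and rounds only the result to FP16. Whether an actual library keeps its intermediates in a wider format is not stated in the text, and the helper names are hypothetical.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def hypot_fp16(a, b):
    """Elementwise SQRT(a^2 + b^2), computed in double and rounded to FP16."""
    return [to_fp16(math.sqrt(x * x + y * y)) for x, y in zip(a, b)]

print(hypot_fp16([3.0, 5.0, 300.0], [4.0, 12.0, 400.0]))  # [5.0, 13.0, 500.0]
```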
Compute the inverse cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := InvCubeRoot(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := InvSQRT(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := LOG(1.0 + a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := LOG(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ConvertExpFP16(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
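The description equates the exponent conversion with "floor(log2(x))". A minimal sketch of that reading for finite, nonzero inputs, using `math.frexp` to read the binary exponent directly. Taking the absolute value of the input is an assumption here, zero, infinity and NaN are not modeled, and the name `convert_exp` is hypothetical.

```python
import math

def convert_exp(x):
    """floor(log2(|x|)) for finite, nonzero x, returned as a float."""
    # frexp gives abs(x) == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(x))
    return float(exponent - 1)

print([convert_exp(x) for x in (1.0, 10.0, 0.25, -8.0)])
# [0.0, 3.0, -2.0, 3.0]
```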
Compute the exponential value of packed half-precision (16-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := POW(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SIN(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SIN(a[i+15:i])
MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
cos_res[MAX:256] := 0
AVX512_FP16
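The combined sine and cosine operation has two outputs: the sines are returned and the cosines are written to a caller-supplied location. The sketch below models that calling pattern with a Python list standing in for the memory at "mem_addr". The function name `sincos_lanes` and the list-based interface are assumptions made for illustration.

```python
import math

def sincos_lanes(a, mem):
    """Return the sines; write the cosines into the caller-provided list mem."""
    dst = []
    for j, x in enumerate(a):
        dst.append(math.sin(x))
        mem[j] = math.cos(x)                  # plays the role of MEM[mem_addr + ...]
    return dst

cos_out = [0.0] * 2
sin_out = sincos_lanes([0.0, math.pi / 2], cos_out)
print(sin_out, cos_out)
```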
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SIND(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SINH(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := CEIL(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := FLOOR(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ROUND(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_ps".
Elementary Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := SQRT(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := TAN(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := TAND(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 15
i := j*16
dst[i+15:i] := TANH(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 15
i := j*16
dst[i+15:i] := TRUNCATE(a[i+15:i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
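CEIL, FLOOR and TRUNCATE agree on whole numbers and differ on fractional inputs, most visibly for negative values. A short sketch using Python's `math` functions as stand-ins for the pseudocode operators; the helper name `round_variants` is hypothetical.

```python
import math

def round_variants(x):
    """Apply the three directed roundings to one value, returning floats."""
    return {
        "ceil": float(math.ceil(x)),          # toward +infinity
        "floor": float(math.floor(x)),        # toward -infinity
        "truncate": float(math.trunc(x)),     # toward zero
    }

for x in (1.5, -1.5):
    print(x, round_variants(x))
```

For positive inputs truncation matches FLOOR, and for negative inputs it matches CEIL.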
Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ACOS(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ACOSH(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ASIN(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ASINH(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ATAN2(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ATAN(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" expressed in radians.
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ATANH(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := CubeRoot(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
Probability/Statistics
FOR j := 0 to 31
i := j*16
dst[i+15:i] := CDFNormal(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
Probability/Statistics
FOR j := 0 to 31
i := j*16
dst[i+15:i] := InverseCDFNormal(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
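As a scalar illustration of the two entries above, the following sketch uses Python's standard `statistics.NormalDist` as a stand-in for CDFNormal and InverseCDFNormal. It assumes the standard normal distribution (mean 0, standard deviation 1), which the descriptions do not state explicitly, and it works in double precision rather than FP16. The helper names are hypothetical.

```python
from statistics import NormalDist

# Assumption: "the normal distribution" means the standard normal (mu=0, sigma=1).
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def cdf_normal(lanes):
    """Apply the normal CDF to every lane (stand-in for CDFNormal)."""
    return [_STD_NORMAL.cdf(x) for x in lanes]

def inverse_cdf_normal(lanes):
    """Apply the inverse normal CDF to every lane; inputs must lie in (0, 1)."""
    return [_STD_NORMAL.inv_cdf(p) for p in lanes]

a = [-1.0, 0.0, 0.5, 2.0]
p = cdf_normal(a)                  # probabilities in (0, 1)
roundtrip = inverse_cdf_normal(p)  # should recover the values in a
```

Feeding the CDF output back through the inverse recovers the original lanes up to rounding, which is the relationship between the two entries.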
Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := CEIL(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := COS(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := COSD(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
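The degree-based entries (COSD, SIND, TAND) differ from their radian counterparts only in the unit of the input. A minimal scalar sketch, assuming COSD(x) is equivalent to converting x from degrees to radians and then taking the cosine; the function name `cosd` is hypothetical and the arithmetic is double precision, not FP16:

```python
import math

def cosd(x_deg):
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(x_deg))

def cosd_lanes(lanes):
    """Apply cosd to every lane, mirroring the FOR j loop in the pseudocode."""
    return [cosd(x) for x in lanes]

dst = cosd_lanes([0.0, 60.0, 90.0, 180.0])
```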
Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := COSH(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ERF(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 31
i := j*16
dst[i+15:i] := 1.0 - ERF(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
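The identity used in the pseudocode above, complementary error function = 1.0 - ERF, can be checked in scalar form against Python's built-in `math.erf` and `math.erfc`. This is only a double-precision illustration; the helper name is hypothetical.

```python
import math

def erfc_lanes(lanes):
    """Per-lane 1.0 - ERF(x), as written in the pseudocode."""
    return [1.0 - math.erf(x) for x in lanes]

a = [-1.0, 0.0, 0.5, 2.0]
dst = erfc_lanes(a)
reference = [math.erfc(x) for x in a]  # library erfc for comparison
```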
Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 31
i := j*16
dst[i+15:i] := InverseERFC(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 31
i := j*16
dst[i+15:i] := InverseERF(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
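Read as a functional inverse (the x for which ERF(x) equals the input) rather than a reciprocal, the inverse error function and its complementary variant can be sketched in scalar form through the inverse normal CDF. The sketch assumes that reading of the descriptions, uses double precision, and introduces the hypothetical helpers `erfinv` and `erfcinv`; it is not the library's implementation.

```python
import math
from statistics import NormalDist

def erfinv(y):
    """Return x such that erf(x) == y, for y in (-1, 1)."""
    # erf(x) = 2*Phi(x*sqrt(2)) - 1, so x = Phi^-1((y + 1) / 2) / sqrt(2).
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)

def erfcinv(y):
    """Return x such that erfc(x) == y, for y in (0, 2)."""
    # erfc(x) = 1 - erf(x), so invert erf at 1 - y.
    return erfinv(1.0 - y)

x_erf = erfinv(0.5)
x_erfc = erfcinv(0.3)
```

Applying `math.erf` to `x_erf` gives back 0.5, and `math.erfc` of `x_erfc` gives back 0.3, up to rounding.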
Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := POW(FP16(e), a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
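The pseudocode above states the mathematical result, POW(e, a) - 1.0; it does not say how the value is evaluated. Evaluating it literally loses precision when the input is close to zero, which is why math libraries usually offer a dedicated routine. A double-precision scalar sketch of the difference, using Python's `math.exp` and `math.expm1`:

```python
import math

x = 1e-10

# Literal evaluation: exp(x) is rounded to a value near 1.0 first,
# so most digits of the small result are lost in the subtraction.
naive = math.exp(x) - 1.0

# Dedicated routine: computes exp(x) - 1 without the cancellation.
accurate = math.expm1(x)

naive_rel_err = abs(naive - x) / x
accurate_rel_err = abs(accurate - x) / x
```

For this input the true result is approximately x itself, so the two relative errors show how much the literal form drifts.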
Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := FLOOR(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := SQRT(POW(a[i+15:i], 2.0) + POW(b[i+15:i], 2.0))
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := InvSQRT(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
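The pseudocode above expresses the base-10 logarithm through the change-of-base identity, LOG(a) / LOG(10.0), where LOG is the natural logarithm. A scalar double-precision sketch with a hypothetical helper name, compared against Python's `math.log10`:

```python
import math

def log10_lanes(lanes):
    """Per-lane LOG(x) / LOG(10.0), as written in the pseudocode."""
    return [math.log(x) / math.log(10.0) for x in lanes]

a = [0.5, 1.0, 10.0, 1000.0]
dst = log10_lanes(a)
reference = [math.log10(x) for x in a]  # direct base-10 logarithm
```

The same identity with LOG(2.0) in the denominator gives the base-2 entries.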
Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := LOG(1.0 + a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := LOG(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ConvertExpFP16(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
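The "floor(log2(x))" reading of ConvertExpFP16 can be illustrated with `math.frexp`, which splits a float into mantissa and exponent. The sketch below assumes finite, nonzero inputs and assumes the sign is ignored (that is, it computes floor(log2(|x|))); zero, infinity, NaN and denormal handling are not modeled. The helper name is hypothetical.

```python
import math

def convert_exp(x):
    """floor(log2(|x|)) for finite, nonzero x, returned as a float."""
    # math.frexp gives x = m * 2**e with 0.5 <= |m| < 1,
    # so floor(log2(|x|)) equals e - 1.
    _, e = math.frexp(x)
    return float(e - 1)

results = [convert_exp(x) for x in (8.0, 0.75, 10.0, -3.0)]
```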
Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ACOS(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
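The writemask pattern in the entry above (IF k[j] ... ELSE copy from "src") recurs in every masked entry that follows. A minimal scalar sketch, with the vector modeled as a Python list and the mask as an integer whose bit j controls lane j; `mask_apply` is a hypothetical helper, not an intrinsic:

```python
import math

def mask_apply(fn, src, k, a):
    """Per lane j: fn(a[j]) if bit j of k is set, otherwise src[j]."""
    return [fn(a[j]) if (k >> j) & 1 else src[j] for j in range(len(a))]

src = [9.0, 9.0, 9.0, 9.0]
a = [1.0, 0.0, -1.0, 0.5]

# Bits 0 and 2 are set, so lanes 0 and 2 are computed; lanes 1 and 3 keep src.
dst = mask_apply(math.acos, src, 0b0101, a)
```

In this sketch the function is only evaluated for lanes whose mask bit is set; unselected lanes take the value from "src" unchanged.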
Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ACOSH(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ASIN(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ASINH(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ATAN(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ATANH(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := CubeRoot(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Probability/Statistics
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := CDFNormal(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Probability/Statistics
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := InverseCDFNormal(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Special Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := CEIL(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := COS(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := COSD(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := COSH(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Probability/Statistics
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ERF(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Probability/Statistics
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := 1.0 - ERF(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Probability/Statistics
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := InverseERFC(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Probability/Statistics
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := InverseERF(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := POW(FP16(e), a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Special Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := FLOOR(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := InvSQRT(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := LOG(1.0 + a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := LOG(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ConvertExpFP16(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Rounds each packed half-precision (16-bit) floating-point element in "a" to the nearest integer value and stores the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Special Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := NearbyInt(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Computes the reciprocal of packed half-precision (16-bit) floating-point elements in "a", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Elementary Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := (1.0 / a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Rounds the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, rounding ties to the even integer, and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Special Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := RoundToNearestEven(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SIN(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SIN(a[i+15:i])
MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
ELSE
dst[i+15:i] := sin_src[i+15:i]
MEM[mem_addr+i+15:mem_addr+i] := cos_src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
cos_res[MAX:512] := 0
AVX512_FP16
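The masked sine/cosine entry above has two destinations and two fallback sources. A scalar sketch of that data flow, with "dst" returned as a list and the memory at "mem_addr" modeled as a caller-supplied list `mem`; the function name and the list-based memory model are assumptions made for illustration:

```python
import math

def mask_sincos(sin_src, cos_src, k, a, mem):
    """Return the sine lanes and write the cosine lanes into mem in place."""
    dst = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            # Mask bit set: compute both results for this lane.
            dst.append(math.sin(x))
            mem[j] = math.cos(x)
        else:
            # Mask bit clear: fall back to the corresponding source lanes.
            dst.append(sin_src[j])
            mem[j] = cos_src[j]
    return dst

mem = [None, None]
dst = mask_sincos([7.0, 7.0], [8.0, 8.0], 0b01, [0.0, 0.0], mem)
```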
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SIND(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := SINH(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Special Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ROUND(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := TAN(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := TAND(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Trigonometry
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := TANH(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
Special Math Functions
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := TRUNCATE(a[i+15:i])
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Rounds each packed half-precision (16-bit) floating-point element in "a" to the nearest integer value and stores the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := NearbyInt(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the packed half-precision (16-bit) floating-point elements in "a" raised to the power of the corresponding packed elements in "b", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := POW(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Computes the reciprocal of packed half-precision (16-bit) floating-point elements in "a", storing the results in "dst".
Elementary Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := (1.0 / a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Rounds the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, rounding ties to the even integer, and stores the results in "dst".
Special Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := RoundToNearestEven(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
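The rounding entries in this group (CEIL, FLOOR, TRUNCATE, RoundToNearestEven) differ only in the direction of rounding. The scalar sketch below maps them onto Python built-ins, relying on the fact that Python's `round()` resolves ties to the even integer. The mapping is an illustrative assumption in double precision; the ROUND and NearbyInt entries are left out because their tie-breaking behavior is not stated in the descriptions.

```python
import math

def round_modes(x):
    """Return the result of each rounding direction for one value."""
    return {
        "CEIL": float(math.ceil(x)),              # toward +infinity
        "FLOOR": float(math.floor(x)),            # toward -infinity
        "TRUNCATE": float(math.trunc(x)),         # toward zero
        "RoundToNearestEven": float(round(x)),    # nearest, ties to even
    }

table = {x: round_modes(x) for x in (2.5, 3.5, -2.5, -2.7)}
```

The inputs 2.5 and 3.5 are the interesting cases: both are ties, and they round to 2 and 4 respectively under ties-to-even.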
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := SIN(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := SIN(a[i+15:i])
MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
cos_res[MAX:512] := 0
AVX512_FP16
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := SIND(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := SINH(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ROUND(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := TAN(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := TAND(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 31
i := j*16
dst[i+15:i] := TANH(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 31
i := j*16
dst[i+15:i] := TRUNCATE(a[i+15:i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Compute the inverse cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ACOS(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
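The loop bounds and bit ranges in these entries encode the lane layout: a vector of W bits holds W/16 half-precision lanes (8, 16 or 32 for 128, 256 or 512 bits), and lane j occupies bits [16j+15 : 16j]. The sketch below models a vector as a Python integer and decodes each lane with the `struct` module's IEEE 754 binary16 format code `e`; the helper name and the integer model are assumptions made for illustration.

```python
import struct

def unpack_fp16_lanes(vec, width_bits):
    """Split an integer bit pattern into width_bits // 16 half-precision lanes."""
    lanes = []
    for j in range(width_bits // 16):      # FOR j := 0 to (width_bits/16 - 1)
        i = j * 16                         # i := j*16
        bits = (vec >> i) & 0xFFFF         # the 16 bits vec[i+15:i]
        # Reinterpret the 16-bit pattern as a binary16 value.
        lanes.append(struct.unpack("<e", struct.pack("<H", bits))[0])
    return lanes

# 0x3C00 encodes 1.0 and 0xC000 encodes -2.0 in IEEE 754 binary16.
vec = (0xC000 << 16) | 0x3C00
lanes = unpack_fp16_lanes(vec, 128)
```

Lane 0 comes from the lowest 16 bits, so the list starts with 1.0, then -2.0, followed by six zero lanes.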
Compute the inverse hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ACOSH(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ASIN(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ASINH(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ATAN2(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ATAN(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ATANH(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := CubeRoot(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
Probability/Statistics
FOR j := 0 to 7
i := j*16
dst[i+15:i] := CDFNormal(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse cumulative distribution function of packed half-precision (16-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
Probability/Statistics
FOR j := 0 to 7
i := j*16
dst[i+15:i] := InverseCDFNormal(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := COS(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := COSD(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the hyperbolic cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := COSH(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ERF(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 7
i := j*16
dst[i+15:i] := 1.0 - ERF(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse complementary error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 7
i := j*16
dst[i+15:i] := InverseERFC(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse error function of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Probability/Statistics
FOR j := 0 to 7
i := j*16
dst[i+15:i] := InverseERF(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the exponential value of 10 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := POW(FP16(10.0), a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the exponential value of 2 raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := POW(FP16(2.0), a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := POW(FP16(e), a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the exponential value of "e" raised to the power of packed half-precision (16-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := POW(FP16(e), a[i+15:i]) - 1.0
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SQRT(POW(a[i+15:i], 2.0) + POW(b[i+15:i], 2.0))
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
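The hypotenuse operation above can be sketched as a scalar model in Python. This is only an illustration: `to_fp16` and `hypot_model` are hypothetical helper names, half-precision rounding is emulated with the standard `struct` module, and the intermediate sum of squares is assumed to be carried in wider precision (the pseudocode does not say how intermediates are rounded).

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def hypot_model(a, b):
    """Per-lane sqrt(a^2 + b^2), rounded to half precision once at the end.

    Assumption: intermediates are kept in double precision, so a^2 + b^2
    may exceed the half-precision range as long as the final result fits.
    """
    return [to_fp16(math.hypot(to_fp16(x), to_fp16(y))) for x, y in zip(a, b)]

# 300^2 + 400^2 = 250000 is far above the largest finite half-precision
# value (65504), yet the result 500 is representable.
result = hypot_model([3.0, 300.0], [4.0, 400.0])
```

Under these assumptions, squaring in half precision would overflow for the second lane, which is why the sketch rounds only the final value.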
Compute the inverse cube root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := InvCubeRoot(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the inverse square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := InvSQRT(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the base-10 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := LOG(a[i+15:i]) / LOG(10.0)
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the natural logarithm of one plus packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := LOG(1.0 + a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the base-2 logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := LOG(a[i+15:i]) / LOG(2.0)
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the natural logarithm of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := LOG(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ConvertExpFP16(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
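The "floor(log2(x))" behaviour described above can be illustrated with a small scalar model in Python. The names `to_fp16` and `getexp_model` are hypothetical, and the sketch assumes finite, nonzero inputs and that the sign is ignored; zero, infinity and NaN handling are not modelled.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def getexp_model(a):
    """Return floor(log2(|x|)) per lane as a float (finite, nonzero x only)."""
    out = []
    for x in a:
        # frexp gives |x| = m * 2**e with 0.5 <= m < 1, so floor(log2|x|) = e - 1.
        _, e = math.frexp(abs(to_fp16(x)))
        out.append(float(e - 1))
    return out

exponents = getexp_model([1.0, 8.0, 0.75, 10.0])
```

Using `math.frexp` instead of `math.log2` avoids rounding trouble right at powers of two.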
Compute the exponential value of packed half-precision (16-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := POW(a[i+15:i], b[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SIN(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the sine and cosine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SIN(a[i+15:i])
MEM[mem_addr+i+15:mem_addr+i] := COS(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
cos_res[MAX:128] := 0
AVX512_FP16
Compute the sine of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SIND(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the hyperbolic sine of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SINH(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" up to an integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := CEIL(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" down to an integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := FLOOR(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Round the packed half-precision (16-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ROUND(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_ps".
Elementary Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SQRT(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := TAN(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := TAND(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Compute the hyperbolic tangent of packed half-precision (16-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
Trigonometry
FOR j := 0 to 7
i := j*16
dst[i+15:i] := TANH(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Truncate the packed half-precision (16-bit) floating-point elements in "a", and store the results as packed half-precision floating-point elements in "dst".
Special Math Functions
FOR j := 0 to 7
i := j*16
dst[i+15:i] := TRUNCATE(a[i+15:i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
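The difference between the writemask and zeromask forms of the 8-lane add above can be sketched in Python. This is a scalar model, not the intrinsic itself: `to_fp16`, `mask_add` and `maskz_add` are hypothetical helper names, "k" is modelled as an integer bitmask, and half-precision rounding is emulated with the standard `struct` module.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mask_add(src, k, a, b):
    """Writemask form: lane j is a[j] + b[j] if bit j of k is set, else src[j]."""
    return [to_fp16(a[j] + b[j]) if (k >> j) & 1 else src[j]
            for j in range(len(a))]

def maskz_add(k, a, b):
    """Zeromask form: lane j is a[j] + b[j] if bit j of k is set, else 0."""
    return [to_fp16(a[j] + b[j]) if (k >> j) & 1 else 0.0
            for j in range(len(a))]

a = [1.0] * 8
b = [0.5] * 8
src = [9.0] * 8
k = 0b00000101  # only lanes 0 and 2 are selected

merged = mask_add(src, k, a, b)   # unselected lanes keep src
zeroed = maskz_add(k, a, b)       # unselected lanes become 0
```

The same pattern applies to the 16-lane variants; only the list length and mask width change.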
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 TO 15
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 7
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 15
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 7
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
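The three masked multiply-add entries above differ only in what an unselected lane receives: the value from "a", the value from "c", or zero. A minimal Python sketch of that choice follows. `to_fp16` and `fmadd_masked` are hypothetical names, and the product and sum are computed in double precision before a final rounding to half precision, which only approximates a true fused operation.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fmadd_masked(a, b, c, k, fallback):
    """Per-lane (a*b)+c under mask k.

    fallback selects the value of unselected lanes:
    "a" (copy from a), "c" (copy from c) or "zero" (zeromask).
    """
    if fallback not in ("a", "c", "zero"):
        raise ValueError("fallback must be 'a', 'c' or 'zero'")
    out = []
    for j in range(len(a)):
        if (k >> j) & 1:
            out.append(to_fp16(a[j] * b[j] + c[j]))
        elif fallback == "a":
            out.append(a[j])
        elif fallback == "c":
            out.append(c[j])
        else:
            out.append(0.0)
    return out

a, b, c = [2.0] * 8, [3.0] * 8, [1.0] * 8
k = 0b00000001  # only lane 0 is computed
```

With these inputs, lane 0 is 7.0 in every form, while lanes 1 to 7 are 2.0, 1.0 or 0.0 depending on the fallback.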
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 15
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 7
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 15
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 7
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 15
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 7
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
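The four multiply-add forms listed in this part of the reference differ only in the signs applied to the product and to "c". The following Python sketch puts them side by side for one lane. The dictionary keys are informal labels chosen for this example, `to_fp16` is a hypothetical helper, and the arithmetic is done in double precision with one final rounding, which is an approximation of fused behaviour.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fma_sign_variants(a: float, b: float, c: float) -> dict:
    """Return the four sign combinations of a*b and c for a single lane."""
    p = a * b
    return {
        "multiply-add": to_fp16(p + c),             # (a*b) + c
        "negated multiply-add": to_fp16(-p + c),    # -(a*b) + c
        "multiply-subtract": to_fp16(p - c),        # (a*b) - c
        "negated multiply-subtract": to_fp16(-p - c),  # -(a*b) - c
    }

variants = fma_sign_variants(2.0, 3.0, 1.0)
```

For a = 2, b = 3, c = 1 the four results are 7, -5, 5 and -7 respectively.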
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 15
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".
FOR j := 0 to 7
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst".
FOR j := 0 to 15
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst".
FOR j := 0 to 7
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
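The two alternating forms in this part of the reference can be compared with a short Python model that follows the pseudocode lane by lane. `to_fp16`, `add_then_sub_model` and `sub_then_add_model` are hypothetical names picked for this sketch; arithmetic is done in double precision and rounded once to half precision, which approximates a fused operation.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def add_then_sub_model(a, b, c):
    """'Add and subtract' form: even lanes compute (a*b)-c, odd lanes (a*b)+c."""
    return [to_fp16(a[j] * b[j] - c[j]) if (j & 1) == 0
            else to_fp16(a[j] * b[j] + c[j])
            for j in range(len(a))]

def sub_then_add_model(a, b, c):
    """'Subtract and add' form: even lanes compute (a*b)+c, odd lanes (a*b)-c."""
    return [to_fp16(a[j] * b[j] + c[j]) if (j & 1) == 0
            else to_fp16(a[j] * b[j] - c[j])
            for j in range(len(a))]

a, b, c = [2.0] * 4, [3.0] * 4, [1.0] * 4
first = add_then_sub_model(a, b, c)
second = sub_then_add_model(a, b, c)
```

Note that, per the pseudocode, the lane parity test is on the element index j, so lane 0 subtracts in the first form and adds in the second.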
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst".
FOR j := 0 to 15
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
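The writemask and zeromask subtraction entries above differ only in what an unselected lane receives. A minimal Python sketch of that distinction follows. The name `sub_ph` and its parameters are hypothetical, the mask is modeled as a plain integer, and fp16 rounding is not modeled.

```python
def sub_ph(a, b, k=None, src=None):
    """Lane-wise a - b with optional masking.

    k:   integer bitmask, bit j selects lane j (None selects every lane).
    src: fallback vector for unselected lanes (writemask form);
         when src is None, unselected lanes become 0.0 (zeromask form).
    Assumption: Python floats stand in for fp16 values.
    """
    n = len(a)
    if k is None:
        k = (1 << n) - 1
    return [
        a[j] - b[j] if (k >> j) & 1 else (0.0 if src is None else src[j])
        for j in range(n)
    ]
```

With `k=0b00001111` on eight lanes, the low four lanes hold the differences and the high four lanes hold either `src` or zero.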
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 TO 15
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR i := 0 TO 7
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 TO 7
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 TO 7
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR i := 0 TO 15
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 TO 15
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 TO 15
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
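The complex multiply entries above operate on an interleaved layout: element 2*i holds the real part and element 2*i+1 the imaginary part of complex number i. The Python sketch below follows that pseudocode on plain lists. The name `mul_pch` is a hypothetical label for this example, and Python floats are used instead of fp16, so rounding is not modeled.

```python
def mul_pch(a, b):
    """Multiply interleaved complex vectors (re0, im0, re1, im1, ...).

    Assumption: Python floats stand in for fp16 values.
    """
    dst = [0.0] * len(a)
    for i in range(len(a) // 2):
        re_a, im_a = a[2 * i], a[2 * i + 1]
        re_b, im_b = b[2 * i], b[2 * i + 1]
        dst[2 * i] = re_a * re_b - im_a * im_b      # real part
        dst[2 * i + 1] = im_a * re_b + re_a * im_b  # imaginary part
    return dst
```

For example, (1 + 2i) * (5 + 6i) = -7 + 16i, so the pair `[1, 2]` times `[5, 6]` yields `[-7, 16]`.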
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
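In the masked complex multiply entries above, mask bit k[i] governs a whole complex number, that is, both fp16 elements 2*i and 2*i+1, rather than a single fp16 element. The following sketch makes that explicit. The name `mask_mul_pch` and its parameters are hypothetical, and fp16 rounding is not modeled.

```python
def mask_mul_pch(a, b, k, src=None):
    """Masked complex multiply on interleaved (re, im) lists.

    Bit i of the integer k selects complex pair i (elements 2*i and 2*i+1).
    Unselected pairs are copied from src, or zeroed when src is None.
    Assumption: Python floats stand in for fp16 values.
    """
    dst = [0.0] * len(a)
    for i in range(len(a) // 2):
        lo, hi = 2 * i, 2 * i + 1
        if (k >> i) & 1:
            dst[lo] = a[lo] * b[lo] - a[hi] * b[hi]
            dst[hi] = a[hi] * b[lo] + a[lo] * b[hi]
        elif src is not None:
            dst[lo], dst[hi] = src[lo], src[hi]
        # else: the pair stays 0.0 (zeromask form)
    return dst
```

A 128-bit vector holds four complex numbers, so only the low four mask bits are meaningful in that case.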
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
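The conjugate multiply entries above differ from the plain complex multiply only in the signs of the cross terms. A short Python sketch, checked against Python's built-in complex type, is given below. The name `cmul_pch` is hypothetical, and Python floats replace fp16, so rounding is not modeled.

```python
def cmul_pch(a, b):
    """Multiply each complex number in a by the conjugate of the one in b.

    Vectors are interleaved as (re0, im0, re1, im1, ...).
    Assumption: Python floats stand in for fp16 values.
    """
    dst = [0.0] * len(a)
    for i in range(len(a) // 2):
        lo, hi = 2 * i, 2 * i + 1
        dst[lo] = a[lo] * b[lo] + a[hi] * b[hi]  # real part
        dst[hi] = a[hi] * b[lo] - a[lo] * b[hi]  # imaginary part
    return dst
```

For example, (1 + 2i) * conj(5 + 6i) = (1 + 2i) * (5 - 6i) = 17 + 4i.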
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
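The four complex multiply-accumulate entries above share one computation and differ only in the pass-through source for unselected pairs: "a", "c", or zero. The sketch below models this with a single hypothetical `passthrough` argument. The function name `fmadd_pch` and all parameter names are assumptions for illustration, and fp16 rounding is not modeled.

```python
def fmadd_pch(a, b, c, k=None, passthrough=None):
    """Complex a*b + c on interleaved (re, im) lists, with optional masking.

    Bit i of the integer k selects complex pair i; None selects every pair.
    passthrough supplies unselected pairs: pass a for the copy-from-"a" form,
    c for the copy-from-"c" form, or None for the zeromask form.
    Assumption: Python floats stand in for fp16 values.
    """
    pairs = len(a) // 2
    if k is None:
        k = (1 << pairs) - 1
    dst = [0.0] * len(a)
    for i in range(pairs):
        lo, hi = 2 * i, 2 * i + 1
        if (k >> i) & 1:
            dst[lo] = a[lo] * b[lo] - a[hi] * b[hi] + c[lo]
            dst[hi] = a[hi] * b[lo] + a[lo] * b[hi] + c[hi]
        elif passthrough is not None:
            dst[lo], dst[hi] = passthrough[lo], passthrough[hi]
    return dst
```

Passing the same inputs with `passthrough=a` and `passthrough=c` shows that the selected pairs are identical and only the unselected pairs change.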
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 3
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 7
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".
tmp := a
FOR i := 0 to 7
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+8]
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+4]
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+2]
ENDFOR
dst.fp16[0] := tmp.fp16[0] + tmp.fp16[1]
AVX512_FP16
AVX512VL
Arithmetic
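The reduction pseudocode above adds the upper half of the vector onto the lower half, then repeats on the halved vector, so lanes are combined pairwise rather than left to right. Because floating-point addition is not associative, that order can change the result. The sketch below reproduces the halving order in Python. The name `reduce_add_ph` is hypothetical, and Python floats (double precision) replace fp16, so the numeric example only illustrates the ordering effect, not fp16 rounding.

```python
def reduce_add_ph(a):
    """Sum a power-of-two-length list using the halving order of the pseudocode.

    For 16 lanes the strides are 8, 4, 2, 1, matching the three loops plus
    the final tmp[0] + tmp[1] step.
    Assumption: Python floats stand in for fp16 values.
    """
    tmp = list(a)
    half = len(tmp) // 2
    while half >= 1:
        for i in range(half):
            tmp[i] = tmp[i] + tmp[i + half]
        half //= 2
    return tmp[0]
```

With `a[0] = 1e16`, `a[1] = 1.0`, `a[8] = -1e16` and zeros elsewhere, the halving order cancels the two large values first and returns 1.0, whereas a left-to-right loop absorbs the 1.0 into 1e16 and returns 0.0.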
Reduce the packed half-precision (16-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".
tmp := a
FOR i := 0 to 7
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+8]
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+4]
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+2]
ENDFOR
dst.fp16[0] := tmp.fp16[0] * tmp.fp16[1]
AVX512_FP16
AVX512VL
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".
tmp := a
FOR i := 0 to 7
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
ENDFOR
dst.fp16[0] := (tmp.fp16[0] > tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
AVX512_FP16
AVX512VL
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a".
tmp := a
FOR i := 0 to 7
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
ENDFOR
dst.fp16[0] := (tmp.fp16[0] < tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
AVX512_FP16
AVX512VL
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".
tmp := a
FOR i := 0 to 3
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+4]
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+2]
ENDFOR
dst.fp16[0] := tmp.fp16[0] + tmp.fp16[1]
AVX512_FP16
AVX512VL
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".
tmp := a
FOR i := 0 to 3
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+4]
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+2]
ENDFOR
dst.fp16[0] := tmp.fp16[0] * tmp.fp16[1]
AVX512_FP16
AVX512VL
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a".
tmp := a
FOR i := 0 to 3
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
ENDFOR
dst.fp16[0] := (tmp.fp16[0] > tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
AVX512_FP16
AVX512VL
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a".
tmp := a
FOR i := 0 to 3
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
ENDFOR
dst.fp16[0] := (tmp.fp16[0] < tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
AVX512_FP16
AVX512VL
Arithmetic
Finds the absolute value of each packed half-precision (16-bit) floating-point element in "v2", storing the results in "dst".
FOR j := 0 to 15
dst.fp16[j] := ABS(v2.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Finds the absolute value of each packed half-precision (16-bit) floating-point element in "v2", storing the results in "dst".
FOR j := 0 to 7
dst.fp16[j] := ABS(v2.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Compute the complex conjugates of complex numbers in "a", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
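The XOR with FP32(-0.0) in the pseudocode above flips only bit 31 of each 32-bit lane, which is the sign bit of the imaginary half-precision element. A small Python sketch of one lane, assuming little-endian packing with the real part in the low 16 bits; `conj_lane` and `SIGN_MASK` are names introduced for this illustration.

```python
import struct

# Bit pattern of the single-precision value -0.0 (only bit 31 set).
SIGN_MASK = struct.unpack("<I", struct.pack("<f", -0.0))[0]


def conj_lane(re, im):
    """Conjugate one complex number stored as two adjacent fp16 values."""
    # Pack real (low half) and imaginary (high half) into one 32-bit lane.
    lane = struct.unpack("<I", struct.pack("<ee", re, im))[0]
    lane ^= SIGN_MASK  # flips the sign of the imaginary part only
    return struct.unpack("<ee", struct.pack("<I", lane))


pair = conj_lane(1.5, -2.0)  # (1.5, 2.0)
```

Because the operation is a pure bit flip, a zero imaginary part becomes negative zero rather than staying positive.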
Compute the complex conjugates of complex numbers in "a", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Arithmetic
Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Arithmetic
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
ENDFOR
k[MAX:8] := 0
AVX512_FP16
AVX512VL
Compare
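The 32 predicates selected by imm8[4:0] above differ in the relation tested and in the result produced when either operand is NaN. The following Python sketch models that decoding, reading the O/U letter of each name as ordered/unordered. It is an assumption-laden illustration: `PREDICATES`, `compare`, and `cmp_ph_mask` are hypothetical names, and the quiet/signaling distinction (Q/S), which affects exceptions rather than the mask, is not modeled.

```python
import math

# Predicate names indexed by imm8[4:0], without the _CMP_ prefix.
PREDICATES = [
    "EQ_OQ", "LT_OS", "LE_OS", "UNORD_Q", "NEQ_UQ", "NLT_US", "NLE_US", "ORD_Q",
    "EQ_UQ", "NGE_US", "NGT_US", "FALSE_OQ", "NEQ_OQ", "GE_OS", "GT_OS", "TRUE_UQ",
    "EQ_OS", "LT_OQ", "LE_OQ", "UNORD_S", "NEQ_US", "NLT_UQ", "NLE_UQ", "ORD_S",
    "EQ_US", "NGE_UQ", "NGT_UQ", "FALSE_OS", "NEQ_OS", "GE_OQ", "GT_OQ", "TRUE_US",
]

# Result of each relation when neither operand is NaN.
ORDERED = {
    "EQ": lambda x, y: x == y, "NEQ": lambda x, y: x != y,
    "LT": lambda x, y: x < y, "NLT": lambda x, y: not x < y,
    "LE": lambda x, y: x <= y, "NLE": lambda x, y: not x <= y,
    "GT": lambda x, y: x > y, "NGT": lambda x, y: not x > y,
    "GE": lambda x, y: x >= y, "NGE": lambda x, y: not x >= y,
    "ORD": lambda x, y: True, "UNORD": lambda x, y: False,
    "TRUE": lambda x, y: True, "FALSE": lambda x, y: False,
}


def compare(x, y, imm8):
    name, flags = PREDICATES[imm8 & 0x1F].split("_")
    if math.isnan(x) or math.isnan(y):
        if name in ("ORD", "UNORD"):
            return name == "UNORD"
        return flags[0] == "U"  # unordered predicates are true on NaN
    return ORDERED[name](x, y)


def cmp_ph_mask(a, b, imm8):
    """Build the result mask k: bit j is set when element j satisfies OP."""
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        if compare(x, y, imm8):
            k |= 1 << j
    return k


a = [1.0, 2.0, float("nan"), 4.0]
b = [1.0, 3.0, 1.0, 0.0]
eq_mask = cmp_ph_mask(a, b, 0)   # _CMP_EQ_OQ  -> 0b0001
neq_mask = cmp_ph_mask(a, b, 4)  # _CMP_NEQ_UQ -> 0b1110
```

The NaN in element 2 is excluded by the ordered predicate and included by the unordered one, which is the practical difference between the O and U variants.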
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 7
IF k1[j]
k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512_FP16
AVX512VL
Compare
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 15
k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
ENDFOR
k[MAX:16] := 0
AVX512_FP16
AVX512VL
Compare
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 15
IF k1[j]
k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512_FP16
AVX512VL
Compare
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
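Not every 16-bit integer is exactly representable in half precision, so the conversion above rounds. A minimal Python sketch of the writemask form, assuming round-to-nearest-even (the usual default rounding mode); `int16_to_fp16` and `mask_cvt_int16` are hypothetical names for this illustration and the lists are shortened to four elements.

```python
import struct


def int16_to_fp16(x):
    """Convert a signed 16-bit integer to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", float(x)))[0]


def mask_cvt_int16(src, k, a):
    """Convert element j when mask bit j is set, otherwise copy it from src."""
    return [
        int16_to_fp16(a[j]) if (k >> j) & 1 else src[j]
        for j in range(len(a))
    ]


src = [9.0, 9.0, 9.0, 9.0]
a = [1, 2049, 2051, -32767]
converted = mask_cvt_int16(src, 0b1110, a)  # [9.0, 2048.0, 2052.0, -32768.0]
```

Integers up to 2048 in magnitude convert exactly; beyond that the spacing between representable values grows, so 2049 and 2051 round to even neighbours and -32767 rounds to -32768.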
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 15
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 15
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 96 bits of "dst" are zeroed out.
FOR j := 0 TO 1
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:32] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 TO 3
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". The upper 64 bits of "dst" are zeroed out.
FOR j := 0 to 3
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 to 3
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The upper 64 bits of "dst" are zeroed out.
FOR j := 0 to 3
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:64] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 TO 3
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 TO 7
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 3
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
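The truncating conversion above differs from the plain conversion only in how fractional values are handled. A short Python sketch of the difference, assuming the non-truncating form rounds to nearest with ties to even (the usual default rounding mode); NaN, infinity, and out-of-range inputs are not modeled, and the function names are hypothetical.

```python
import math


def cvt_round(a):
    """Model of the non-truncating conversion under round-to-nearest-even."""
    return [round(x) for x in a]


def cvt_truncate(a):
    """Model of the truncating conversion: discard the fractional part."""
    return [math.trunc(x) for x in a]


# All inputs are exactly representable in half precision.
values = [2.5, 3.5, -2.5, -0.75, 1.75]
rounded = cvt_round(values)       # [2, 4, -2, -1, 2]
truncated = cvt_truncate(values)  # [2, 3, -2, 0, 1]
```

Truncation always moves toward zero, whereas the rounding form can move away from zero and resolves exact halves to the even integer.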
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 7
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 TO 3
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 TO 7
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 3
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 7
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 TO 1
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
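The loop bound in the entry above follows from the destination width: a 128-bit "dst" holds two 64-bit lanes, so only the two lowest half-precision elements of "a" are read. A small sketch under the same round-to-nearest-even assumption as before (cvtph_epi64 is a hypothetical helper name):

```python
def cvtph_epi64(a):
    """Convert the low FP16 lanes of an 8-lane source to 64-bit integers."""
    lanes = 128 // 64  # destination width / destination element width
    # Assumption: halves round to even, as Python's round() does.
    return [round(a[j]) for j in range(lanes)]

a = [1.5, 2.5, 99.0, 99.0, 99.0, 99.0, 99.0, 99.0]
print(cvtph_epi64(a))  # [2, 2]
```

The six upper source elements do not influence the result.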
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 TO 3
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 1
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 3
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 TO 1
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 TO 3
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 1
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 3
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 TO 7
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 TO 15
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 7
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 15
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst".
FOR j := 0 TO 7
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
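For the 16-bit destinations it can help to check which finite inputs fit at all. The sketch below decodes the largest finite half-precision value and compares it with the unsigned and signed 16-bit ranges. It deliberately does not model what is stored for inputs that do not fit, since the text above does not describe that case; fp16_bits_to_float and fits are hypothetical helpers.

```python
import struct

def fp16_bits_to_float(bits):
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def fits(value, lo, hi):
    """True when `value` lies inside the closed integer range [lo, hi]."""
    return lo <= value <= hi

FP16_MAX = fp16_bits_to_float(0x7BFF)  # largest finite half-precision value

print(FP16_MAX)                          # 65504.0
print(fits(FP16_MAX, 0, 0xFFFF))         # True: within the unsigned 16-bit range
print(fits(FP16_MAX, -0x8000, 0x7FFF))   # False: outside the signed 16-bit range
```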
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst".
FOR j := 0 TO 15
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 7
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 15
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 1
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
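Widening from half to double precision loses no information, because every half-precision value is exactly representable as a double. The sketch below decodes raw 16-bit patterns into Python floats (which are doubles) for the two lanes the entry above reads; fp16_bits_to_float and cvtph_pd are hypothetical helper names.

```python
import struct

def fp16_bits_to_float(bits):
    """Decode a 16-bit pattern as half precision; the result is a Python float (FP64)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def cvtph_pd(a_bits):
    """Widen the low 2 of 8 FP16 lanes, given as raw 16-bit patterns."""
    return [fp16_bits_to_float(a_bits[j]) for j in range(2)]

# 0x3C00 encodes 1.0; 0x0001 is the smallest positive subnormal, 2**-24.
a_bits = [0x3C00, 0x0001, 0x7BFF, 0xC000, 0, 0, 0, 0]
print(cvtph_pd(a_bits))
```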
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := src.fp64[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 1
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 3
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := src.fp64[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 3
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
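The half-to-single conversion above is likewise exact. One way to convince oneself is to push every non-NaN half-precision bit pattern through a single-precision round trip and count mismatches. The sketch uses only the standard library; through_fp32 and fp16_bits_to_float are hypothetical helper names.

```python
import math
import struct

def fp16_bits_to_float(bits):
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def through_fp32(x):
    """Store `x` as single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

mismatches = 0
for bits in range(0x10000):
    x = fp16_bits_to_float(bits)
    if math.isnan(x):
        continue  # NaN never compares equal, so it is skipped here
    if through_fp32(x) != x:
        mismatches += 1

print(mismatches)  # 0
```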
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := src.fp32[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 3
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := src.fp32[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Convert
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 TO 7
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
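The ternary in the pseudocode above is not symmetric in "a" and "b": whenever the comparison is false, including when either operand is NaN or when both are zeros of different sign, the result is "b". The sketch below follows the pseudocode only; what the [max_float_note] placeholder expands to is not shown in this text, and max_ph_lane is a hypothetical name.

```python
import math

def max_ph_lane(a, b):
    """One lane of the pseudocode: (a > b ? a : b)."""
    return a if a > b else b

nan = float("nan")
print(max_ph_lane(nan, 1.0))   # 1.0  (comparison false, b returned)
print(max_ph_lane(1.0, nan))   # nan  (comparison false, b returned)
print(max_ph_lane(0.0, -0.0))  # -0.0 (0.0 > -0.0 is false, b returned)
print(max_ph_lane(-0.0, 0.0))  # 0.0
```

Swapping the operands can therefore change the result for these inputs.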
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 TO 15
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [max_float_note]
dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
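For the scalar form above, only lane 0 is computed; the remaining seven lanes of the 128-bit result are taken from "a". A minimal sketch, with vectors modelled as 8-element Python lists and max_sh as a hypothetical helper name:

```python
def max_sh(a, b):
    """Scalar maximum: lane 0 from the comparison, lanes 1..7 copied from a."""
    low = a[0] if a[0] > b[0] else b[0]
    return [low] + list(a[1:8])

a = [1.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
b = [2.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
print(max_sh(a, b))  # [2.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
```

The upper lanes of "b" are ignored even though they are larger.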
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [max_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [max_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
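In the two masked scalar forms above, only bit 0 of "k" is consulted, and the upper seven lanes always come from "a" (not from "src"). The sketch below models both variants with 8-element Python lists; mask_max_sh and maskz_max_sh are hypothetical helper names.

```python
def mask_max_sh(src, k, a, b):
    """Writemask form: lane 0 falls back to src[0] when bit 0 of k is clear."""
    low = (a[0] if a[0] > b[0] else b[0]) if k & 1 else src[0]
    return [low] + list(a[1:8])

def maskz_max_sh(k, a, b):
    """Zeromask form: lane 0 is zeroed when bit 0 of k is clear."""
    low = (a[0] if a[0] > b[0] else b[0]) if k & 1 else 0.0
    return [low] + list(a[1:8])

a = [1.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
b = [2.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0]
src = [9.0, 90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 96.0]

print(mask_max_sh(src, 1, a, b)[0])  # 2.0
print(mask_max_sh(src, 0, a, b)[0])  # 9.0
print(maskz_max_sh(0, a, b)[0])      # 0.0
```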
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]
dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][max_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] > b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 TO 7
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
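The writemask minimum above can be emulated lane by lane, with k[j] read as bit j of an integer mask. This is a sketch of the pseudocode only, using plain Python floats; mask_min_ph is a hypothetical helper name.

```python
def mask_min_ph(src, k, a, b):
    """Merge-masked minimum over 8 lanes, following (a < b ? a : b)."""
    dst = []
    for j in range(8):
        if (k >> j) & 1:
            dst.append(a[j] if a[j] < b[j] else b[j])
        else:
            dst.append(src[j])  # mask bit clear: keep the src lane
    return dst

a = [1.0, 5.0, 3.0, 7.0, 2.0, 9.0, 4.0, 8.0]
b = [2.0, 4.0, 6.0, 1.0, 2.0, 3.0, 5.0, 0.0]
src = [-1.0] * 8
print(mask_min_ph(src, 0b10100101, a, b))
# [1.0, -1.0, 3.0, -1.0, -1.0, 3.0, -1.0, 0.0]
```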
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
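The masked minimum forms above can be sketched as a scalar reference model. This is only an illustrative Python sketch, not the intrinsic itself: the name `masked_min`, the list-based vectors, and the use of Python floats instead of FP16 are all assumptions made for the example.

```python
def masked_min(a, b, k, src=None):
    """Per-lane minimum following the pseudocode `a < b ? a : b`.

    src=None models the zeromask form (masked-off lanes become 0);
    otherwise masked-off lanes are copied from src (writemask form).
    """
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            # The ternary returns the second operand whenever the
            # comparison is false, which includes any NaN input.
            out.append(x if x < y else y)
        else:
            out.append(0.0 if src is None else src[j])
    return out
```

With `k = 0b101`, lanes 0 and 2 are computed and lane 1 is either zeroed or taken from `src`, depending on which form is modeled.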
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 15
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [min_float_note]
dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [min_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [min_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]
dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [sae_note][min_float_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] < b.fp16[0] ? a.fp16[0] : b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Special Math Functions
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 7
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
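The RoundScaleFP16 helper above can be sketched in scalar Python. This is a hedged model only: it works on Python floats rather than FP16, and it assumes the usual encoding in which imm8[1:0] selects the rounding mode (0 nearest-even, 1 down, 2 up, 3 truncate). The note behind [round_imm_note] is not expanded in this section, so that encoding, along with the function names, is an assumption; imm8[2] (use MXCSR.RC) and imm8[3] (exception suppression) are ignored.

```python
import math

def _round_mode(x, mode):
    # Assumed encoding of imm8[1:0]; not spelled out in this section.
    if mode == 0:
        return float(round(x))       # nearest, ties to even
    if mode == 1:
        return float(math.floor(x))  # toward negative infinity
    if mode == 2:
        return float(math.ceil(x))   # toward positive infinity
    return float(math.trunc(x))      # toward zero

def round_scale(x, imm8):
    """2^-m * ROUND(2^m * x), with m = imm8[7:4] fraction bits kept."""
    m = (imm8 >> 4) & 0xF
    mode = imm8 & 0x3
    return 2.0 ** -m * _round_mode(2.0 ** m * x, mode)
```

For example, `imm8 = 0x10` keeps one fraction bit with round-to-nearest, so 1.3 becomes 1.5; `imm8 = 0x21` keeps two fraction bits and rounds down, so 1.3 becomes 1.25.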
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 15
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 7
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
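The "floor(log2(x))" reading of ConvertExpFP16 can be illustrated with a scalar Python sketch. ConvertExpFP16 is not defined in this section, so the function below is an assumption about its behavior on Python floats, and the handling of zero, infinity, and NaN in particular is assumed rather than taken from the text.

```python
import math

def get_exp(x):
    """Return floor(log2(|x|)) as a float, using the binary exponent."""
    if math.isnan(x):
        return x                 # assumed: NaN propagates
    if math.isinf(x):
        return math.inf          # assumed: +/-Inf maps to +Inf
    if x == 0.0:
        return -math.inf         # assumed: zero maps to -Inf
    # frexp gives x = m * 2**e with 0.5 <= |m| < 1, so the
    # normalized exponent (1 <= significand < 2) is e - 1.
    _, e = math.frexp(x)
    return float(e - 1)
```

The sign of the input does not affect the result, so `get_exp(-10.0)` and `get_exp(10.0)` both give 3.0.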
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 15
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 7
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
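The "±(2^k)*|x.significand|" formula above can be sketched for the two simplest cases. This Python model is an assumption-laden illustration: GetNormalizedMantissaFP16 and [getmant_note] are not expanded in this section, so the sketch covers only a [1, 2) interval and a [0.5, 1) interval, only "keep source sign" versus "force positive", and only finite nonzero inputs. The string labels `"1_2"` and `"p5_1"` and the `keep_sign` flag are hypothetical stand-ins for "norm" and "sign".

```python
import math

def get_mant(x, interval="1_2", keep_sign=True):
    """Normalize the significand of a finite nonzero x into an interval."""
    m, _ = math.frexp(abs(x))    # 0.5 <= m < 1
    if interval == "1_2":
        m *= 2.0                 # shift into [1, 2)
    elif interval != "p5_1":
        raise ValueError("interval not modeled in this sketch")
    return math.copysign(m, x) if keep_sign else m
```

For 12.0 = 1.5 * 2^3, the [1, 2) interval yields 1.5 and the [0.5, 1) interval yields 0.75.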
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 7
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 7
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 15
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 15
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 15
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 7
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
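ReduceArgumentFP16 subtracts the rounded value from the input, leaving the part that rounding discarded. A scalar Python sketch, assuming finite inputs, Python floats instead of FP16, and the same imm8[1:0] rounding-mode encoding as in the rounding intrinsics (0 nearest-even, 1 down, 2 up, 3 truncate); the IsInf guard from the pseudocode is not modeled, and the function name is hypothetical.

```python
import math

def reduce_arg(x, imm8):
    """x - 2^-m * ROUND(2^m * x), with m = imm8[7:4]."""
    m = (imm8 >> 4) & 0xF
    mode = imm8 & 0x3            # assumed rounding-mode encoding
    rounders = [round, math.floor, math.ceil, math.trunc]
    rounded = rounders[mode](2.0 ** m * x)
    return x - 2.0 ** -m * rounded
```

With `imm8 = 0x01` (no fraction bits, round down) the result is the fractional part of a positive input, for example 2.75 gives 0.75.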
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 15
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 7
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
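The core of ScaleFP16 is the final line, tmp1 * 2^floor(tmp2). A minimal scalar Python sketch of that step, assuming finite inputs and Python floats instead of FP16; the MXCSR.DAZ denormal flushing is deliberately left out, and the function name is hypothetical.

```python
import math

def scale(a, b):
    """Return a * 2**floor(b) for finite a and b (DAZ handling omitted)."""
    return math.ldexp(a, math.floor(b))
```

Because the exponent is floored, `scale(3.0, 2.7)` multiplies by 2^2 and `scale(1.0, -1.5)` multiplies by 2^-2.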
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR i := 0 to 7
k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
ENDFOR
k[MAX:8] := 0
AVX512_FP16
AVX512VL
Miscellaneous
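CheckFPClass_FP16 is not defined in this section, so the following Python sketch is an assumption about how such a test could work on a raw 16-bit pattern. It assumes the imm8 bit assignment commonly documented for the FPCLASS family (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +Inf, bit 4 -Inf, bit 5 denormal, bit 6 negative finite, bit 7 SNaN) and ignores MXCSR.DAZ; the function name is hypothetical.

```python
def check_fp_class_fp16(bits, imm8):
    """Return True if the FP16 bit pattern falls in any class enabled by imm8."""
    neg = bool((bits >> 15) & 1)
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    exp_ones = exp == 0x1F
    exp_zero = exp == 0
    frac_zero = frac == 0
    quiet = bool((frac >> 9) & 1)    # top fraction bit marks a quiet NaN
    is_zero = exp_zero and frac_zero
    classes = [
        exp_ones and not frac_zero and quiet,      # bit 0: QNaN
        (not neg) and is_zero,                     # bit 1: +0
        neg and is_zero,                           # bit 2: -0
        (not neg) and exp_ones and frac_zero,      # bit 3: +Inf
        neg and exp_ones and frac_zero,            # bit 4: -Inf
        exp_zero and not frac_zero,                # bit 5: denormal
        neg and not exp_ones and not is_zero,      # bit 6: negative finite
        exp_ones and not frac_zero and not quiet,  # bit 7: SNaN
    ]
    return any(c and bool((imm8 >> i) & 1) for i, c in enumerate(classes))
```

The categories are OR-ed together, so `imm8 = 0x18` would test for either infinity, and 1.0 (0x3C00) matches none of the eight classes.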
Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR i := 0 to 7
IF k1[i]
k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
ELSE
k[i] := 0
FI
ENDFOR
k[MAX:8] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR i := 0 to 15
k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
ENDFOR
k[MAX:16] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR i := 0 to 15
IF k1[i]
k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
ELSE
k[i] := 0
FI
ENDFOR
k[MAX:16] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Shuffle half-precision (16-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*16
off := idx[i+2:i]
dst.fp16[j] := idx[i+3] ? b.fp16[off] : a.fp16[off]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
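The two-source shuffle above reads the low bits of each 16-bit index as an element offset and the next bit as the source selector. A scalar Python sketch, assuming plain lists stand in for the vectors and each entry of `idx` is one 16-bit index word; the function name and argument order are illustrative choices, not taken from this section.

```python
def permutex2var(a, idx, b):
    """Pick each output lane from a or b according to idx.

    For n lanes (8 or 16), the low log2(n) bits of an index are the
    offset and the next bit selects b (1) or a (0).
    """
    n = len(a)
    off_mask = n - 1
    return [(b if i & n else a)[i & off_mask] for i in idx]
```

With 8 lanes, index 8 selects `b[0]` and index 15 selects `b[7]`; higher index bits are ignored.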
Shuffle half-precision (16-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*16
off := idx[i+3:i]
dst.fp16[j] := idx[i+4] ? b.fp16[off] : a.fp16[off]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Blend packed half-precision (16-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 15
IF k[j]
dst.fp16[j] := b.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
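The blend pseudocode above takes lane j from "b" when mask bit j is set and from "a" otherwise. A minimal Python sketch of that selection, assuming lists stand in for the vectors and an integer stands in for the mask register; the function name is hypothetical.

```python
def mask_blend(k, a, b):
    """Lane j comes from b if bit j of k is set, else from a."""
    return [y if (k >> j) & 1 else x for j, (x, y) in enumerate(zip(a, b))]
```

For instance, `k = 0b0101` takes lanes 0 and 2 from `b` and lanes 1 and 3 from `a`.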
Blend packed half-precision (16-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 7
IF k[j]
dst.fp16[j] := b.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Shuffle half-precision (16-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*16
id := idx[i+3:i]
dst.fp16[j] := a.fp16[id]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Shuffle half-precision (16-bit) floating-point elements in "a" using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 7
i := j*16
id := idx[i+2:i]
dst.fp16[j] := a.fp16[id]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Miscellaneous
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 7
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 15
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
FOR i := 0 to 7
dst.fp16[i] := SQRT(a.fp16[i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
FOR i := 0 to 15
dst.fp16[i] := SQRT(a.fp16[i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 7
dst.fp16[i] := (1.0 / a.fp16[i])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := (1.0 / a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 7
IF k[i]
dst.fp16[i] := (1.0 / a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 15
dst.fp16[i] := (1.0 / a.fp16[i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := (1.0 / a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := (1.0 / a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Elementary Math Functions
Load 256-bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Load
Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Load
Load 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[255:0] := MEM[mem_addr+255:mem_addr]
dst[MAX:256] := 0
AVX512_FP16
AVX512VL
Load
Load 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
dst[MAX:128] := 0
AVX512_FP16
AVX512VL
Load
Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512_FP16
AVX512VL
Store
Store 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512_FP16
AVX512VL
Store
Store 256 bits (composed of 16 packed half-precision (16-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+255:mem_addr] := a[255:0]
AVX512_FP16
AVX512VL
Store
Store 128 bits (composed of 8 packed half-precision (16-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
AVX512_FP16
AVX512VL
Store
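The aligned and unaligned load and store entries above differ only in the address requirement. The sketch below models that in Python under stated assumptions: memory is a hypothetical flat `bytearray`, `mem_addr` is a byte offset into it, and a `ValueError` stands in for the general-protection exception that a misaligned access may generate. `load_ph` and `store_ph` are illustrative names, not real API.

```python
import struct

def _check(addr, nbytes, aligned):
    # The required alignment equals the vector size in bytes:
    # 32 for 256-bit vectors, 16 for 128-bit vectors.
    if aligned and addr % nbytes != 0:
        raise ValueError("address %d is not %d-byte aligned" % (addr, nbytes))

def store_ph(mem, addr, values, aligned):
    """MEM[mem_addr+N-1:mem_addr] := a[N-1:0], for len(values) binary16 lanes."""
    _check(addr, 2 * len(values), aligned)
    struct.pack_into('<%de' % len(values), mem, addr, *values)

def load_ph(mem, addr, lanes, aligned):
    """dst[N-1:0] := MEM[mem_addr+N-1:mem_addr], returned as a list of floats."""
    _check(addr, 2 * lanes, aligned)
    return list(struct.unpack_from('<%de' % lanes, mem, addr))

mem = bytearray(96)
# 256-bit store to a 32-byte-aligned offset.
store_ph(mem, 32, [float(i) for i in range(16)], aligned=True)
# 128-bit store to an arbitrary offset; only valid for the unaligned form.
store_ph(mem, 2, [0.5] * 8, aligned=False)
```

In this model, the aligned form fails when given offset 2, while the unaligned form reads the same bytes without complaint.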
Return vector of type __m256h with undefined elements.
AVX512_FP16
AVX512VL
General Support
Return vector of type __m128h with undefined elements.
AVX512_FP16
AVX512VL
General Support
Return vector of type __m256h with all elements set to zero.
dst[MAX:0] := 0
AVX512_FP16
AVX512VL
Set
Return vector of type __m128h with all elements set to zero.
dst[MAX:0] := 0
AVX512_FP16
AVX512VL
Set
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 TO 31
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
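The three packed add entries above differ only in what a lane receives when its mask bit is clear. A minimal Python model of the pseudocode, assuming vectors are 32-element lists and `k` is an integer bitmask, could look like this. The function names are hypothetical, and binary16 rounding is emulated with `struct`.

```python
import struct

def to_fp16(x):
    """Round to the nearest binary16 value (ties to even).

    Note: values too large for binary16 raise OverflowError in this model,
    whereas IEEE 754 round-to-nearest would produce infinity.
    """
    return struct.unpack('<e', struct.pack('<e', x))[0]

def add_ph(a, b):
    """dst.fp16[j] := a.fp16[j] + b.fp16[j]"""
    return [to_fp16(x + y) for x, y in zip(a, b)]

def mask_add_ph(src, k, a, b):
    """Writemask: lane j gets the sum if bit j of k is set, else src[j]."""
    return [to_fp16(x + y) if (k >> j) & 1 else s
            for j, (s, x, y) in enumerate(zip(src, a, b))]

def maskz_add_ph(k, a, b):
    """Zeromask: lane j gets the sum if bit j of k is set, else 0."""
    return [to_fp16(x + y) if (k >> j) & 1 else 0.0
            for j, (x, y) in enumerate(zip(a, b))]

a = [2048.0] * 32
b = [1.0] * 32
src = [-1.0] * 32
k = 0x0000FFFF            # lanes 0..15 active, lanes 16..31 masked off

merged = mask_add_ph(src, k, a, b)
zeroed = maskz_add_ph(k, a, b)
```

The inputs are chosen to show the limited precision as well: 2049 is not representable in binary16, so every active lane rounds back to 2048.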
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
[round_note]
FOR j := 0 TO 31
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Add packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] + b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := a.fp16[0] + b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := a.fp16[0] + b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] + b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] + b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] + b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Add the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] + b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
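For the scalar add entries above, only element 0 is computed and only mask bit 0 is consulted; the upper 7 elements always come from "a". The sketch below restates that pseudocode in Python, assuming 8-element lists for 128-bit vectors. All function names are hypothetical.

```python
import struct

def to_fp16(x):
    """Round to the nearest binary16 value (ties to even)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def add_sh(a, b):
    """Unmasked form: lower lane is the sum, upper lanes copied from a."""
    return [to_fp16(a[0] + b[0])] + list(a[1:8])

def mask_add_sh(src, k, a, b):
    """Writemask form: lower lane falls back to src[0] when k[0] is clear."""
    low = to_fp16(a[0] + b[0]) if k & 1 else src[0]
    return [low] + list(a[1:8])

def maskz_add_sh(k, a, b):
    """Zeromask form: lower lane falls back to 0 when k[0] is clear."""
    low = to_fp16(a[0] + b[0]) if k & 1 else 0.0
    return [low] + list(a[1:8])

a = [1.0] + [float(10 + i) for i in range(1, 8)]   # [1, 11, 12, ..., 17]
b = [0.5] * 8
src = [9.0] * 8
upper = a[1:8]
```

Passing `k = 0b10` shows that higher mask bits have no effect: bit 0 is clear, so the lower lane takes the fallback value even though bit 1 is set.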
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 31
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
[round_note]
FOR j := 0 to 31
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Divide packed half-precision (16-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := a.fp16[j] / b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
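The division pseudocode above leaves zero divisors and overflow to the usual IEEE 754 rules. The Python sketch below spells out one lane of `a.fp16[j] / b.fp16[j]` under that assumption, using default round-to-nearest and ignoring exception flags. `div_fp16` and `to_fp16` are hypothetical helper names.

```python
import math
import struct

def to_fp16(x):
    """Round to the nearest binary16 value; overflow becomes infinity."""
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)

def div_fp16(x, y):
    """One lane of the divide, with the IEEE 754 special cases made explicit."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if y == 0.0:
        if x == 0.0:
            return math.nan                       # 0/0 is an invalid operation
        # Nonzero / zero gives infinity with the product of the two signs.
        return math.copysign(math.inf, x) * math.copysign(1.0, y)
    return to_fp16(x / y)                         # inf/inf yields NaN here too

def div_ph(a, b):
    """dst.fp16[j] := a.fp16[j] / b.fp16[j] for every lane."""
    return [div_fp16(x, y) for x, y in zip(a, b)]

quotients = div_ph([1.0, 10.0, -1.0, 65504.0],
                   [3.0, 4.0, 0.0, 2.0 ** -24])
```

The last lane divides the largest finite binary16 value by the smallest subnormal, so the quotient overflows to infinity in this model.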
Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := a.fp16[0] / b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] / b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] / b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := a.fp16[0] / b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] / b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Divide the lower half-precision (16-bit) floating-point element in "a" by the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] / b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
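The four packed multiply-add entries above share one computation and differ in the fallback for masked-off lanes: "a", "c", or zero. The Python sketch below models them on short lists and also shows why the product is called an intermediate result: it is not rounded before the addition. Function names are hypothetical. The model computes in double precision and rounds once to binary16, which is exact for the inputs used here but is an assumption for arbitrary inputs.

```python
import struct

def to_fp16(x):
    """Round to the nearest binary16 value (ties to even)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def fma16(x, y, z):
    """(x * y) + z with a single rounding to binary16 at the end."""
    return to_fp16(x * y + z)

def fmadd(a, b, c):
    return [fma16(x, y, z) for x, y, z in zip(a, b, c)]

def _masked(a, b, c, k, fallback):
    return [fma16(x, y, z) if (k >> j) & 1 else fallback[j]
            for j, (x, y, z) in enumerate(zip(a, b, c))]

def fmadd_merge_a(a, k, b, c):      # writemask, elements copied from "a"
    return _masked(a, b, c, k, a)

def fmadd_merge_c(a, b, c, k):      # writemask, elements copied from "c"
    return _masked(a, b, c, k, c)

def fmadd_zero(k, a, b, c):         # zeromask
    return _masked(a, b, c, k, [0.0] * len(a))

x = 1.0009765625                    # 1 + 2^-10
a = [x, 2.0, 3.0, 4.0]
b = [x, 2.0, 2.0, 2.0]
c = [-1.001953125, 1.0, 1.0, 1.0]   # lane 0 holds -(1 + 2^-9)
k = 0b0101                          # lanes 0 and 2 active

fused = fma16(a[0], b[0], c[0])               # keeps the 2^-20 term
unfused = to_fp16(to_fp16(a[0] * b[0]) + c[0])  # product rounded first
```

In lane 0 the exact product is 1 + 2^-9 + 2^-20. Rounding it to binary16 before the add discards the 2^-20 term and leaves 0, while the fused form returns 2^-20.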
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
[round_note]
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
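Among the scalar multiply-add entries above, the form that copies the masked element from "c" also takes its upper 7 elements from "c", while every other form takes them from "a". A compact Python restatement of that pseudocode, assuming 8-element lists and a hypothetical `passthrough` selector, is sketched below.

```python
import struct

def to_fp16(x):
    """Round to the nearest binary16 value (ties to even)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def fmadd_sh_model(a, b, c, k=1, passthrough="a"):
    """Scalar multiply-add on lane 0.

    passthrough selects the masked-off behaviour:
      "a"    -> lower lane from a, upper lanes from a
      "c"    -> lower lane from c, upper lanes from c
      "zero" -> lower lane zeroed, upper lanes from a
    Leaving k at 1 models the unmasked form.
    """
    if k & 1:
        low = to_fp16(a[0] * b[0] + c[0])
    elif passthrough == "zero":
        low = 0.0
    else:
        low = (a if passthrough == "a" else c)[0]
    upper = c[1:8] if passthrough == "c" else a[1:8]
    return [low] + list(upper)

a = [2.0] + [10.0] * 7
b = [3.0] * 8
c = [1.0] + [20.0] * 7
```

With these inputs the computed lower lane is 7, and the upper lanes show at a glance whether they came from "a" (10) or "c" (20).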
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 31
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
[round_note]
FOR j := 0 to 31
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) + c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
[round_note]
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
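The masked scalar multiply-subtract entries above differ only in where element 0 and the upper seven elements come from when mask bit 0 is clear. The following is a minimal Python sketch of that selection, not an implementation of the intrinsics. The name `fmsub_sh`, the `fallback` parameter, and the list-of-floats vector representation are hypothetical and chosen for illustration. Rounding is approximated by computing in double precision and rounding once to binary16 through the `struct` format `e`; the rounding-mode control described by `[round_note]` is not modelled.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fmsub_sh(a, b, c, k=1, fallback="a"):
    """Hypothetical model of the scalar multiply-subtract entries.

    a, b, c are lists of 8 floats standing in for one 128-bit vector.
    fallback picks the source used when mask bit 0 is clear:
    "a" (writemask), "c" (writemask copying from c), or "zero" (zeromask).
    """
    # The upper 7 elements come from c only in the copy-from-c form.
    upper_src = c if fallback == "c" else a
    if k & 1:
        low = to_fp16(a[0] * b[0] - c[0])
    elif fallback == "zero":
        low = 0.0
    else:
        low = upper_src[0]
    return [low] + list(upper_src[1:8])


a = [2.0] + [10.0] * 7
b = [3.0] * 8
c = [1.0] + [20.0] * 7
print(fmsub_sh(a, b, c))                          # element 0 is 2*3 - 1
print(fmsub_sh(a, b, c, k=0, fallback="zero"))    # element 0 is zeroed
```

Passing `k=0` with `fallback="c"` returns a copy of `c`, which matches the entry whose upper elements are taken from "c" rather than "a".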
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 31
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
[round_note]
FOR j := 0 to 31
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := -(a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
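The packed negated multiply-subtract entries above all compute `-(a*b) - c` per element and differ only in how unselected elements are filled. A short Python sketch of that per-element rule follows. The function name `fnmsub_ph` and its `k` and `passthrough` parameters are hypothetical, vectors are plain Python lists of any length (the entries above use 32 elements), and FP16 rounding is deliberately left out.

```python
def fnmsub_ph(a, b, c, k=None, passthrough=None):
    """Hypothetical model of the packed negated multiply-subtract entries.

    k=None computes every element (the unmasked form).
    passthrough is the vector copied into unselected elements
    (a or c in the writemask forms); None zeroes them (zeromask form).
    """
    out = []
    for j in range(len(a)):
        if k is None or (k >> j) & 1:
            out.append(-(a[j] * b[j]) - c[j])
        elif passthrough is not None:
            out.append(passthrough[j])
        else:
            out.append(0.0)
    return out


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0] * 4
c = [1.0] * 4
print(fnmsub_ph(a, b, c))                        # [-3.0, -5.0, -7.0, -9.0]
print(fnmsub_ph(a, b, c, k=0b0101, passthrough=c))  # [-3.0, 1.0, -7.0, 1.0]
```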
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := a.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 7 packed elements from "c" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := c.fp16[0]
FI
dst[127:16] := c[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := -(a.fp16[0] * b.fp16[0]) - c.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst".
FOR j := 0 to 31
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst".
[round_note]
FOR j := 0 to 31
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (add for odd-indexed elements, subtract for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
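The pseudocode in the entries above selects the operation by element parity: `(j & 1) == 0` subtracts "c" and every other element adds it. A minimal Python sketch of that rule is shown here. The function name `fmaddsub_ph` is hypothetical, vectors are plain lists of floats of any length, and masking and FP16 rounding are omitted to keep the parity logic visible.

```python
def fmaddsub_ph(a, b, c):
    """Hypothetical model: even-indexed elements compute a*b - c,
    odd-indexed elements compute a*b + c."""
    out = []
    for j, (x, y, z) in enumerate(zip(a, b, c)):
        prod = x * y
        out.append(prod - z if (j & 1) == 0 else prod + z)
    return out


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0] * 4
c = [1.0] * 4
print(fmaddsub_ph(a, b, c))  # [9.0, 21.0, 29.0, 41.0]
```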
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst".
FOR j := 0 to 31
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst".
[round_note]
FOR j := 0 to 31
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := c.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" to/from the intermediate result (subtract for odd-indexed elements, add for even-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 to 31
IF k[j]
IF ((j & 1) == 0)
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) + c.fp16[j]
ELSE
dst.fp16[j] := (a.fp16[j] * b.fp16[j]) - c.fp16[j]
FI
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
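The subtract-and-add entries above use the opposite parity from the add-and-subtract family: even-indexed elements add "c" and odd-indexed elements subtract it. The Python sketch below models both rules and checks that one equals the other with "c" negated. The names `fmsubadd_ph` and `fmaddsub_ph` are hypothetical, vectors are plain lists of floats, and masking and FP16 rounding are not modelled.

```python
def fmsubadd_ph(a, b, c):
    """Hypothetical model: even-indexed elements compute a*b + c,
    odd-indexed elements compute a*b - c."""
    return [x * y + z if (j & 1) == 0 else x * y - z
            for j, (x, y, z) in enumerate(zip(a, b, c))]


def fmaddsub_ph(a, b, c):
    """Hypothetical model of the opposite parity: even-indexed elements
    compute a*b - c, odd-indexed elements compute a*b + c."""
    return [x * y - z if (j & 1) == 0 else x * y + z
            for j, (x, y, z) in enumerate(zip(a, b, c))]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0] * 4
c = [1.0] * 4
print(fmsubadd_ph(a, b, c))  # [11.0, 19.0, 31.0, 39.0]
# Negating c turns one family into the other.
print(fmsubadd_ph(a, b, c) == fmaddsub_ph(a, b, [-z for z in c]))  # True
```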
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 TO 31
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
[round_note]
FOR j := 0 TO 31
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Subtract packed half-precision (16-bit) floating-point elements in "b" from packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := a.fp16[j] - b.fp16[j]
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
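The packed subtract entries above produce half-precision results, so a difference that is not representable in 16 bits is rounded. The Python sketch below illustrates this together with the writemask and zeromask rules. The name `mask_sub_ph` and its argument order are hypothetical, vectors are plain lists of floats, and rounding is limited to round-to-nearest via the `struct` format `e`; the rounding-mode control described by `[round_note]` is not modelled.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mask_sub_ph(src, k, a, b):
    """Hypothetical model of masked packed subtraction.

    Selected elements hold a[j] - b[j] rounded to binary16.
    Unselected elements are copied from src (writemask form),
    or zeroed when src is None (zeromask form).
    """
    out = []
    for j in range(len(a)):
        if (k >> j) & 1:
            out.append(to_fp16(a[j] - b[j]))
        else:
            out.append(0.0 if src is None else src[j])
    return out


a = [2048.0, 5.0, 1.0]
b = [-1.5, 2.0, 0.5]
src = [7.0, 8.0, 9.0]
# 2048 - (-1.5) = 2049.5 is not representable in binary16; it rounds to 2050.
print(mask_sub_ph(src, 0b011, a, b))   # [2050.0, 3.0, 9.0]
print(mask_sub_ph(None, 0b011, a, b))  # [2050.0, 3.0, 0.0]
```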
Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := a.fp16[0] - b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := a.fp16[0] - b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] - b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] - b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] - b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Subtract the lower half-precision (16-bit) floating-point element in "b" from the lower half-precision (16-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] - b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR i := 0 TO 31
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst".
[round_note]
FOR i := 0 TO 31
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed half-precision (16-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := a.fp16[i] * b.fp16[i]
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := a.fp16[0] * b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := a.fp16[0] * b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] * b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] * b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := a.fp16[0] * b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower half-precision (16-bit) floating-point elements in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := a.fp16[0] * b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
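In the packed complex multiply entries above, the loop index `i` runs over complex numbers, so each mask bit governs a pair of adjacent elements (real part at `2*i`, imaginary part at `2*i+1`). The Python sketch below follows that pseudocode. The name `mask_mul_pch` and its argument order are hypothetical, vectors are flat lists of floats laid out as `[re0, im0, re1, im1, ...]`, and FP16 rounding is not modelled.

```python
def mask_mul_pch(src, k, a, b):
    """Hypothetical model of masked packed complex multiplication.

    Bit i of k selects complex number i (two adjacent elements).
    Unselected pairs are copied from src (writemask form),
    or zeroed when src is None (zeromask form).
    """
    out = []
    for i in range(len(a) // 2):
        ar, ai = a[2 * i], a[2 * i + 1]
        br, bi = b[2 * i], b[2 * i + 1]
        if (k >> i) & 1:
            out += [ar * br - ai * bi,   # real part
                    ai * br + ar * bi]   # imaginary part
        elif src is not None:
            out += [src[2 * i], src[2 * i + 1]]
        else:
            out += [0.0, 0.0]
    return out


a = [1.0, 2.0, 3.0, 4.0]   # (1+2i), (3+4i)
b = [5.0, 6.0, 7.0, 8.0]   # (5+6i), (7+8i)
print(mask_mul_pch(None, 0b11, a, b))  # [-7.0, 16.0, -11.0, 52.0]
print(mask_mul_pch(None, 0b01, a, b))  # [-7.0, 16.0, 0.0, 0.0]
```

The first result agrees with Python's built-in complex arithmetic: `complex(1, 2) * complex(5, 6)` is `(-7+16j)`.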
Multiply the lower complex numbers in "a" and "b", store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which define the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
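The scalar form touches only the lowest complex pair and passes the remaining elements of "a" through. A minimal Python model, assuming 8-element lists stand in for the 128-bit vector and using the hypothetical name `mul_sch`:

```python
def mul_sch(a, b):
    """a, b: lists of 8 numbers modelling eight fp16 elements."""
    dst = list(a)  # elements 2..7 (bits 127:32) are copied from a
    dst[0] = a[0] * b[0] - a[1] * b[1]  # real part of a * b
    dst[1] = a[1] * b[0] + a[0] * b[1]  # imaginary part of a * b
    return dst
```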
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
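The sign pattern in the conjugate multiply follows from (a0 + i*a1) * (b0 - i*b1) = (a0*b0 + a1*b1) + i*(a1*b0 - a0*b1). The sketch below, with the hypothetical name `cmul_pch`, applies the pseudocode to interleaved lists; it uses Python floats and does not model half-precision rounding.

```python
def cmul_pch(a, b):
    """Multiply each complex pair in a by the conjugate of the pair in b."""
    dst = []
    for i in range(len(a) // 2):
        ar, ai = a[2 * i], a[2 * i + 1]
        br, bi = b[2 * i], b[2 * i + 1]
        dst.append(ar * br + ai * bi)  # real part
        dst.append(ai * br - ar * bi)  # imaginary part
    return dst
```

For a = 1 + 2i and b = 3 + 4i this agrees with `complex(1, 2) * complex(3, 4).conjugate()`.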
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := src.fp16[2*i+0]
dst.fp16[2*i+1] := src.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1])
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1])
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "src" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := src.fp16[0]
dst.fp16[1] := src.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1])
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1])
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
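The multiply-accumulate form adds the matching pair of "c" to each complex product. A minimal Python sketch, assuming interleaved lists and the hypothetical name `fmadd_pch`; any fused rounding the hardware performs is not modelled here.

```python
def fmadd_pch(a, b, c):
    """Return a * b + c for each interleaved (re, im) pair."""
    dst = []
    for i in range(len(a) // 2):
        ar, ai = a[2 * i], a[2 * i + 1]
        br, bi = b[2 * i], b[2 * i + 1]
        dst.append(ar * br - ai * bi + c[2 * i])      # real part plus c.re
        dst.append(ai * br + ar * bi + c[2 * i + 1])  # imaginary part plus c.im
    return dst
```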
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
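The two writemask multiply-accumulate entries above differ only in which operand fills the masked-off pairs: "a" in one, "c" in the other. The sketch below makes that explicit with a hypothetical `passthrough` argument; the function name is also hypothetical, and Python floats stand in for fp16 values.

```python
def masked_fmadd_pch(a, b, c, k, passthrough):
    """Compute a * b + c per pair; masked-off pairs are copied from passthrough."""
    dst = []
    for i in range(len(a) // 2):
        re, im = 2 * i, 2 * i + 1
        if (k >> i) & 1:
            dst.append(a[re] * b[re] - a[im] * b[im] + c[re])
            dst.append(a[im] * b[re] + a[re] * b[im] + c[im])
        else:
            dst.append(passthrough[re])
            dst.append(passthrough[im])
    return dst
```

Passing `passthrough=a` models the first variant and `passthrough=c` the second.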
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" and "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) - (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) + (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := a.fp16[0]
dst.fp16[1] := a.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := c.fp16[0]
dst.fp16[1] := c.fp16[1]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
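The four scalar complex multiply-accumulate entries above differ only in what the lower pair falls back to and where the upper six elements come from. A minimal sketch, assuming 8-element Python lists stand in for the 128-bit vectors; all function names are hypothetical and FP16 rounding is not modeled.

```python
def _fma_low(a, b, c):
    # Lower complex pair: a[0:2] * b[0:2] + c[0:2]
    re = a[0] * b[0] - a[1] * b[1] + c[0]
    im = a[1] * b[0] + a[0] * b[1] + c[1]
    return [re, im]


def mask_model(a, k, b, c):
    # Writemask: fall back to "a", upper elements from "a".
    low = _fma_low(a, b, c) if k & 1 else a[:2]
    return low + a[2:8]


def mask3_model(a, b, c, k):
    # Writemask (mask3 form): fall back to "c", upper elements from "c".
    low = _fma_low(a, b, c) if k & 1 else c[:2]
    return low + c[2:8]


def maskz_model(k, a, b, c):
    # Zeromask: fall back to zero, upper elements from "a".
    low = _fma_low(a, b, c) if k & 1 else [0.0, 0.0]
    return low + a[2:8]


a = [1.0, 2.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
b = [5.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
c = [0.5, 0.5, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
```

Note that only the form that falls back to "c" also takes its upper elements from "c"; the other two keep the upper elements of "a".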
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := a.fp16[0]
dst.fp16[1] := a.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := c.fp16[0]
dst.fp16[1] := c.fp16[1]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex numbers in "a" and "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) - (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) + (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
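The sign pattern in the conjugate form can be cross-checked against Python's built-in `complex` type. The helper name `fcmadd_pair_model` is hypothetical, and the sketch works on one complex pair in double precision rather than on FP16 vectors.

```python
def fcmadd_pair_model(a, b, c):
    """a * conj(b) + c for one (re, im) pair, following the pseudocode above."""
    a_re, a_im = a
    b_re, b_im = b
    c_re, c_im = c
    re = a_re * b_re + a_im * b_im + c_re
    im = a_im * b_re - a_re * b_im + c_im
    return (re, im)


result = fcmadd_pair_model((1.0, 2.0), (5.0, 6.0), (0.5, 0.5))

# Reference computed with the built-in complex type.
reference = complex(1.0, 2.0) * complex(5.0, 6.0).conjugate() + complex(0.5, 0.5)
```

Both computations give 17.5 + 4.5i for these inputs, which confirms that the real part adds the two products and the imaginary part subtracts them.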
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := a.fp16[2*i+0]
dst.fp16[2*i+1] := a.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := c.fp16[2*i+0]
dst.fp16[2*i+1] := c.fp16[2*i+1]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply packed complex numbers in "a" by the complex conjugates of packed complex numbers in "b", accumulate to the corresponding complex numbers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
FOR i := 0 to 15
IF k[i]
dst.fp16[2*i+0] := (a.fp16[2*i+0] * b.fp16[2*i+0]) + (a.fp16[2*i+1] * b.fp16[2*i+1]) + c.fp16[2*i+0]
dst.fp16[2*i+1] := (a.fp16[2*i+1] * b.fp16[2*i+0]) - (a.fp16[2*i+0] * b.fp16[2*i+1]) + c.fp16[2*i+1]
ELSE
dst.fp16[2*i+0] := 0
dst.fp16[2*i+1] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := a.fp16[0]
dst.fp16[1] := a.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := c.fp16[0]
dst.fp16[1] := c.fp16[1]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst", and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "a" when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := a.fp16[0]
dst.fp16[1] := a.fp16[1]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using writemask "k" (elements are copied from "c" when mask bit 0 is not set), and copy the upper 6 packed elements from "c" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := c.fp16[0]
dst.fp16[1] := c.fp16[1]
FI
dst[127:32] := c[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Multiply the lower complex number in "a" by the complex conjugate of the lower complex number in "b", accumulate to the lower complex number in "c", and store the result in the lower elements of "dst" using zeromask "k" (elements are zeroed out when mask bit 0 is not set), and copy the upper 6 packed elements from "a" to the upper elements of "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
[round_note]
IF k[0]
dst.fp16[0] := (a.fp16[0] * b.fp16[0]) + (a.fp16[1] * b.fp16[1]) + c.fp16[0]
dst.fp16[1] := (a.fp16[1] * b.fp16[0]) - (a.fp16[0] * b.fp16[1]) + c.fp16[1]
ELSE
dst.fp16[0] := 0
dst.fp16[1] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a".
tmp := a
FOR i := 0 to 15
tmp.fp16[i] := tmp.fp16[i] + a.fp16[i+16]
ENDFOR
FOR i := 0 to 7
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+8]
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+4]
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := tmp.fp16[i] + tmp.fp16[i+2]
ENDFOR
dst.fp16[0] := tmp.fp16[0] + tmp.fp16[1]
AVX512_FP16
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a".
tmp := a
FOR i := 0 to 15
tmp.fp16[i] := tmp.fp16[i] * a.fp16[i+16]
ENDFOR
FOR i := 0 to 7
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+8]
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+4]
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := tmp.fp16[i] * tmp.fp16[i+2]
ENDFOR
dst.fp16[0] := tmp.fp16[0] * tmp.fp16[1]
AVX512_FP16
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a". [max_float_note]
tmp := a
FOR i := 0 to 15
tmp.fp16[i] := (a.fp16[i] > a.fp16[i+16] ? a.fp16[i] : a.fp16[i+16])
ENDFOR
FOR i := 0 to 7
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := (tmp.fp16[i] > tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
ENDFOR
dst.fp16[0] := (tmp.fp16[0] > tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
AVX512_FP16
Arithmetic
Reduce the packed half-precision (16-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". [min_float_note]
tmp := a
FOR i := 0 to 15
tmp.fp16[i] := (a.fp16[i] < a.fp16[i+16] ? a.fp16[i] : a.fp16[i+16])
ENDFOR
FOR i := 0 to 7
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+8] ? tmp.fp16[i] : tmp.fp16[i+8])
ENDFOR
FOR i := 0 to 3
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+4] ? tmp.fp16[i] : tmp.fp16[i+4])
ENDFOR
FOR i := 0 to 1
tmp.fp16[i] := (tmp.fp16[i] < tmp.fp16[i+2] ? tmp.fp16[i] : tmp.fp16[i+2])
ENDFOR
dst.fp16[0] := (tmp.fp16[0] < tmp.fp16[1] ? tmp.fp16[0] : tmp.fp16[1])
AVX512_FP16
Arithmetic
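The four reduction entries above share one pairwise (tree) pattern: fold the upper half onto the lower half, then halve the width until one element remains. A minimal sketch of that pattern; `reduce_tree` is a hypothetical helper, it uses Python floats instead of FP16, and it does not model whatever NaN handling [max_float_note] and [min_float_note] describe.

```python
import operator


def reduce_tree(values, op):
    """Pairwise reduction in the same order as the pseudocode above."""
    tmp = list(values)
    n = len(tmp)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    width = n // 2
    while width >= 1:
        for i in range(width):
            tmp[i] = op(tmp[i], tmp[i + width])
        width //= 2
    return tmp[0]


data = [float(i) for i in range(32)]
total = reduce_tree(data, operator.add)
largest = reduce_tree(data, max)
smallest = reduce_tree(data, min)
product = reduce_tree([1.0] * 31 + [3.0], operator.mul)
```

Because floating-point addition and multiplication are not associative, the pairing order can matter for the rounded result, which is why the pseudocode spells the order out.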
Compute the absolute value of each packed half-precision (16-bit) floating-point element in "v2", and store the results in "dst".
FOR j := 0 to 31
dst.fp16[j] := ABS(v2.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Compute the complex conjugates of complex numbers in "a", and store the results in "dst". Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
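The XOR with the bit pattern of FP32(-0.0) works because that pattern is 0x80000000: bit 31 of each 32-bit lane is the sign bit of the upper FP16 element, which holds the imaginary part. A small sketch using Python's `struct` module (format `e` is IEEE half precision); the helper name `conj_pair_bits` is hypothetical and little-endian element order is assumed.

```python
import struct


def conj_pair_bits(re, im):
    """Conjugate one (re, im) FP16 pair by flipping bit 31 of its 32-bit lane."""
    (bits,) = struct.unpack("<I", struct.pack("<2e", re, im))
    bits ^= 0x80000000  # bit pattern of FP32(-0.0)
    return struct.unpack("<2e", struct.pack("<I", bits))


# The mask really is the encoding of single-precision -0.0.
(neg_zero_bits,) = struct.unpack("<I", struct.pack("<f", -0.0))

flipped = conj_pair_bits(1.5, 2.0)
flipped_back = conj_pair_bits(1.5, -0.25)
```

Only the sign of the imaginary element changes; the real element in the low 16 bits is untouched.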
Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Compute the complex conjugates of complex numbers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each complex number is composed of two adjacent half-precision (16-bit) floating-point elements, which defines the complex number "complex = vec.fp16[0] + i * vec.fp16[1]", or the complex conjugate "conjugate = vec.fp16[0] - i * vec.fp16[1]".
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := a[i+31:i] XOR FP32(-0.0)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Arithmetic
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 31
k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512_FP16
Compare
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 31
IF k1[j]
k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512_FP16
Compare
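The 32 predicates in the table above reduce to 16 truth tables: codes that differ only in bit 4 (for example 1 `_CMP_LT_OS` and 17 `_CMP_LT_OQ`) give the same result and differ only in whether quiet NaNs signal. The sketch below models just the result bits, not exception signaling; `cmp_mask_model` and `relation` are hypothetical names, and Python floats stand in for FP16 elements.

```python
import math

# Outcomes for which each base predicate (imm8 & 15) is true.
# "un" means unordered: at least one operand is NaN.
TRUE_ON = [
    {"eq"},                    # 0  EQ (ordered)
    {"lt"},                    # 1  LT
    {"lt", "eq"},              # 2  LE
    {"un"},                    # 3  UNORD
    {"lt", "gt", "un"},        # 4  NEQ (unordered)
    {"eq", "gt", "un"},        # 5  NLT
    {"gt", "un"},              # 6  NLE
    {"lt", "eq", "gt"},        # 7  ORD
    {"eq", "un"},              # 8  EQ (unordered)
    {"lt", "un"},              # 9  NGE
    {"lt", "eq", "un"},        # 10 NGT
    set(),                     # 11 FALSE
    {"lt", "gt"},              # 12 NEQ (ordered)
    {"eq", "gt"},              # 13 GE
    {"gt"},                    # 14 GT
    {"lt", "eq", "gt", "un"},  # 15 TRUE
]


def relation(x, y):
    if math.isnan(x) or math.isnan(y):
        return "un"
    if x < y:
        return "lt"
    if x > y:
        return "gt"
    return "eq"


def cmp_mask_model(a, b, imm8, k1=None):
    """Return the result mask; k1=None models the unmasked form."""
    true_on = TRUE_ON[imm8 & 0xF]
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        if k1 is not None and not (k1 >> j) & 1:
            continue  # zeromask: bit stays 0
        if relation(x, y) in true_on:
            k |= 1 << j
    return k


a = [1.0, 2.0, math.nan, 4.0]
b = [1.0, 3.0, 1.0, 0.0]
```

For these inputs, the ordered predicates never set the bit of the NaN lane, while the unordered ones (such as `_CMP_NEQ_UQ`) always do.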
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 31
k[j] := (a.fp16[j] OP b.fp16[j]) ? 1 : 0
ENDFOR
k[MAX:32] := 0
AVX512_FP16
Compare
Compare packed half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
FOR j := 0 to 31
IF k1[j]
k[j] := ( a.fp16[j] OP b.fp16[j] ) ? 1 : 0
ELSE
k[j] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k".
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
k[0] := (a.fp16[0] OP b.fp16[0]) ? 1 : 0
k[MAX:1] := 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
k[0] := (a.fp16[0] OP b.fp16[0]) ? 1 : 0
k[MAX:1] := 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
IF k1[0]
k[0] := ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
IF k1[0]
k[0] := ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1).
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
RETURN ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). [sae_note]
CASE (imm8[4:0]) OF
0: OP := _CMP_EQ_OQ
1: OP := _CMP_LT_OS
2: OP := _CMP_LE_OS
3: OP := _CMP_UNORD_Q
4: OP := _CMP_NEQ_UQ
5: OP := _CMP_NLT_US
6: OP := _CMP_NLE_US
7: OP := _CMP_ORD_Q
8: OP := _CMP_EQ_UQ
9: OP := _CMP_NGE_US
10: OP := _CMP_NGT_US
11: OP := _CMP_FALSE_OQ
12: OP := _CMP_NEQ_OQ
13: OP := _CMP_GE_OS
14: OP := _CMP_GT_OS
15: OP := _CMP_TRUE_UQ
16: OP := _CMP_EQ_OS
17: OP := _CMP_LT_OQ
18: OP := _CMP_LE_OQ
19: OP := _CMP_UNORD_S
20: OP := _CMP_NEQ_US
21: OP := _CMP_NLT_UQ
22: OP := _CMP_NLE_UQ
23: OP := _CMP_ORD_S
24: OP := _CMP_EQ_US
25: OP := _CMP_NGE_UQ
26: OP := _CMP_NGT_UQ
27: OP := _CMP_FALSE_OS
28: OP := _CMP_NEQ_OS
29: OP := _CMP_GE_OQ
30: OP := _CMP_GT_OQ
31: OP := _CMP_TRUE_US
ESAC
RETURN ( a.fp16[0] OP b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for equality, and return the boolean result (0 or 1).
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] == b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than, and return the boolean result (0 or 1).
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] < b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] <= b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than, and return the boolean result (0 or 1).
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] > b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] >= b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for not-equal, and return the boolean result (0 or 1).
RETURN ( a.fp16[0] ==NaN OR b.fp16[0] ==NaN OR a.fp16[0] != b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] == b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] < b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] <= b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] > b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a.fp16[0] !=NaN AND b.fp16[0] !=NaN AND a.fp16[0] >= b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
Compare the lower half-precision (16-bit) floating-point elements in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a.fp16[0] ==NaN OR b.fp16[0] ==NaN OR a.fp16[0] != b.fp16[0] ) ? 1 : 0
AVX512_FP16
Compare
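The twelve fixed-relation comparisons above share one NaN rule: every relation except not-equal returns 0 when either input is NaN, and not-equal returns 1. A minimal Python sketch of that rule follows; `fp16_from_bits` and `compare_lower` are hypothetical helper names, and the sketch does not model exception signaling, which is the only difference between the two groups of entries.

```python
import math
import struct


def fp16_from_bits(bits):
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def compare_lower(a, b, relation):
    """Model the RETURN expressions above for one pair of fp16 values."""
    unordered = math.isnan(a) or math.isnan(b)
    if relation == "neq":
        return int(unordered or a != b)
    if unordered:
        return 0
    results = {
        "eq": a == b,
        "lt": a < b,
        "le": a <= b,
        "gt": a > b,
        "ge": a >= b,
    }
    return int(results[relation])


ONE = fp16_from_bits(0x3C00)   # 1.0
TWO = fp16_from_bits(0x4000)   # 2.0
QNAN = fp16_from_bits(0x7E00)  # a quiet NaN bit pattern

print(compare_lower(ONE, TWO, "lt"))
print(compare_lower(QNAN, ONE, "eq"))
print(compare_lower(QNAN, ONE, "neq"))
```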
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 31
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
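Half precision carries an 11-bit significand, so signed 16-bit integers with magnitude above 2048 are not all exactly representable and the conversion has to round. The following Python sketch illustrates the per-lane loop above. It assumes round-to-nearest-even, which is what Python's `struct` "e" format implements; the actual instruction follows the current rounding mode. `to_fp16` and `cvt_int16_to_fp16` are hypothetical names.

```python
import struct


def to_fp16(value):
    """Round a number to the nearest fp16 value (ties to even), returned as a float."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


def cvt_int16_to_fp16(a):
    """a: 32 signed 16-bit integers. Returns the 32 converted lanes."""
    assert len(a) == 32
    return [to_fp16(x) for x in a]


# 2049 lies halfway between the fp16 neighbors 2048 and 2050.
lanes = [2047, 2048, 2049, 2051, -2049, 32767, -32768] + [0] * 25
print(cvt_int16_to_fp16(lanes)[:7])
```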
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 31
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
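The writemask and zeromask variants differ only in what a lane receives when its mask bit is clear. A minimal Python sketch of that selection step, with the mask held as an integer whose bit j controls lane j; `apply_mask` is a hypothetical helper and the lane values stand in for already converted results.

```python
def apply_mask(converted, k, src=None):
    """Select per-lane results.

    converted: lane values produced by the conversion
    k:         integer mask, bit j controls lane j
    src:       fallback lanes for the writemask form, or None for the zeromask form
    """
    out = []
    for j, value in enumerate(converted):
        if (k >> j) & 1:
            out.append(value)      # mask bit set: take the converted value
        elif src is not None:
            out.append(src[j])     # writemask: copy the lane from src
        else:
            out.append(0.0)        # zeromask: zero the lane
    return out


converted = [1.0, 2.0, 3.0, 4.0]
print(apply_mask(converted, 0b0101, src=[9.0] * 4))
print(apply_mask(converted, 0b0101))
```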
Convert packed signed 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_Int16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 31
dst.fp16[j] := Convert_UInt16_To_FP16(a.word[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
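For the unsigned form, the same 16-bit pattern yields a different result than it would under a signed interpretation once the top bit is set. The Python sketch below contrasts the two readings. It assumes round-to-nearest-even, and `to_fp16` and `as_signed16` are hypothetical helpers. The sample values are chosen to be exactly representable in fp16.

```python
import struct


def to_fp16(value):
    """Round a number to the nearest fp16 value (ties to even), returned as a float."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


def as_signed16(bits):
    """Reinterpret a 16-bit pattern as a two's-complement signed integer."""
    return bits - 0x10000 if bits & 0x8000 else bits


for bits in (0x0001, 0x8000, 0xFF00):
    unsigned_result = to_fp16(bits)
    signed_result = to_fp16(as_signed16(bits))
    print(hex(bits), unsigned_result, signed_result)
```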
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 31
dst.fp16[j] := Convert_UInt16_To_FP16(a.word[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_UInt16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_UInt16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_UInt16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed unsigned 16-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.fp16[j] := Convert_UInt16_To_FP16(a.word[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 15
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
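A 32-bit integer can lie far outside the fp16 range, whose largest finite value is 65504. The Python sketch below models one lane of the loop above under two stated assumptions: rounding is round-to-nearest-even, and a value too large for fp16 becomes an infinity of the same sign, which is the IEEE 754 result for overflow in that rounding mode. `int_to_fp16` is a hypothetical name.

```python
import math
import struct


def int_to_fp16(value):
    """Convert an integer to the nearest fp16 value, returned as a float."""
    try:
        return struct.unpack("<e", struct.pack("<e", float(value)))[0]
    except OverflowError:
        # Assumption: overflow under round-to-nearest yields a signed infinity.
        return math.copysign(math.inf, value)


for value in (65504, 65519, 70000, -2**31):
    print(value, int_to_fp16(value))
```

Since 16 lanes of 16 bits fill only 256 bits, the pseudocode zeroes everything above bit 255 of "dst".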
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 15
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed signed 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_Int32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 15
dst.fp16[j] := Convert_UInt32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 15
dst.fp16[j] := Convert_UInt32_To_FP16(a.dword[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_UInt32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_UInt32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_UInt32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed unsigned 32-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_UInt32_To_FP16(a.dword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 7
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed signed 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_Int64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := Convert_UInt64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 7
dst.fp16[j] := Convert_UInt64_To_FP16(a.qword[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_UInt64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_UInt64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_UInt64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed unsigned 64-bit integers in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_UInt64_To_FP16(a.qword[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 7
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.fp16[j] := Convert_FP64_To_FP16(a.fp64[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
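In the scalar form only lane 0 is converted; lanes 1 through 7 of the 128-bit result come unchanged from "a". A minimal Python sketch of that lane layout follows, with vectors modeled as lists of floats and round-to-nearest-even assumed. `to_fp16` and `convert_lower_fp64` are hypothetical names.

```python
import struct


def to_fp16(value):
    """Round a number to the nearest fp16 value (ties to even), returned as a float."""
    return struct.unpack("<e", struct.pack("<e", float(value)))[0]


def convert_lower_fp64(a, b):
    """a: 8 fp16 lanes, b: 2 fp64 lanes. Returns the 8 fp16 lanes of dst."""
    assert len(a) == 8 and len(b) == 2
    # Lane 0 is the converted b[0]; lanes 1..7 mirror dst[127:16] := a[127:16].
    return [to_fp16(b[0])] + list(a[1:8])


a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
b = [0.1, 99.0]
print(convert_lower_fp64(a, b))
```

The value 0.1 is not representable in half precision, so lane 0 holds the nearest fp16 value instead, while `b[1]` plays no part in the result.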
Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := Convert_FP64_To_FP16(b.fp64[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 15
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
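Narrowing from single to half precision drops 13 significand bits, so most fractional values change slightly. The Python sketch below models one lane, assuming round-to-nearest-even; `fp32` and `fp32_to_fp16` are hypothetical helpers. Inputs are first rounded to single precision so that the starting point matches an fp32 element.

```python
import struct


def fp32(value):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def fp32_to_fp16(value):
    """Round a single-precision value to the nearest fp16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


for x in (0.1, 1.0 + 2.0**-10, 1.0 + 2.0**-11):
    single = fp32(x)
    print(single, fp32_to_fp16(single))
```

The last input sits exactly halfway between 1.0 and the next fp16 value, 1.0 + 2^-10, and ties-to-even resolves it to 1.0.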
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_note]
FOR j := 0 TO 15
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.fp16[j] := Convert_FP32_To_FP16(a.fp32[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := Convert_FP32_To_FP16(b.fp32[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
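The truncating conversions differ from the rounding ones only for inputs with a fractional part. The Python sketch below contrasts the two for finite inputs. It assumes the rounding form uses round-to-nearest-even, which matches Python's built-in `round`, while truncation always rounds toward zero. `cvt_round` and `cvt_truncate` are hypothetical names, and NaN or infinite inputs are outside the scope of this sketch.

```python
import math


def cvt_round(x):
    """Finite fp16 value to integer, rounding to nearest with ties to even."""
    return round(x)


def cvt_truncate(x):
    """Finite fp16 value to integer, discarding the fractional part."""
    return math.trunc(x)


# All sample inputs are exactly representable in half precision.
samples = (2.5, 3.5, -1.5, -0.75)
print([cvt_round(v) for v in samples])
print([cvt_truncate(v) for v in samples])
```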
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_Int32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
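The difference between the plain and the "with truncation" conversions shows up only for inputs with a fractional part. The sketch below is an illustration in Python, assuming the non-truncating form rounds to nearest with ties to even (the default rounding mode); the function names are hypothetical stand-ins for the pseudocode helpers, and only non-negative in-range values are used.

```python
import math


def convert_nearest_even(x):
    # Python's round() resolves ties toward the even integer,
    # which is the behaviour assumed here for the non-truncating form.
    return round(x)


def convert_truncate(x):
    # Truncation drops the fractional part (rounds toward zero).
    return math.trunc(x)


values = [0.5, 1.5, 2.5, 2.75]  # all exactly representable in FP16
rounded = [convert_nearest_even(v) for v in values]  # [0, 2, 2, 3]
truncated = [convert_truncate(v) for v in values]    # [0, 1, 2, 2]
```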
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 TO 15
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 15
IF k[j]
dst.dword[j] := Convert_FP16_To_UInt32_Truncate(a.fp16[j])
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
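Because each result lane is 64 bits wide, only 8 half-precision lanes (128 bits of "a") are consumed to fill the 512-bit destination. A rough Python sketch of that lane layout follows; `cvtph_epi64_sketch` is a hypothetical name, and rounding to nearest-even is an assumption about the active rounding mode.

```python
import struct


def cvtph_epi64_sketch(a_bytes):
    # Read 8 little-endian FP16 lanes from the low 16 bytes of the source.
    halves = struct.unpack("<8e", a_bytes[:16])
    # Convert each lane to a signed integer (ties to even assumed).
    return [round(h) for h in halves]


a_bytes = struct.pack("<8e", 1.0, -2.0, 3.5, -4.5, 100.0, 0.25, 1024.0, -0.75)
result = cvtph_epi64_sketch(a_bytes)
# result == [1, -2, 4, -4, 100, 0, 1024, -1]

# Stored as 8 signed 64-bit lanes, the result occupies 64 bytes (512 bits).
packed = struct.pack("<8q", *result)
```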
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_Int64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 TO 7
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := src.qword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 7
IF k[j]
dst.qword[j] := Convert_FP16_To_UInt64_Truncate(a.fp16[j])
ELSE
dst.qword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst".
[round_note]
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_Int16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst".
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst". [round_note]
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst".
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst". [sae_note]
FOR j := 0 TO 31
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := src.word[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed unsigned 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 31
IF k[j]
dst.word[j] := Convert_FP16_To_UInt16_Truncate(a.fp16[j])
ELSE
dst.word[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 7
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
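Widening a half-precision value to double precision is exact, since every finite FP16 value is representable in FP64. As an illustration, the following Python sketch decodes an IEEE 754 binary16 bit pattern by hand; `fp16_bits_to_float` is a hypothetical helper standing in for the pseudocode's Convert_FP16_To_FP64, and it ignores NaN payloads.

```python
import struct


def fp16_bits_to_float(bits):
    sign = -1.0 if (bits >> 15) & 1 else 1.0
    exponent = (bits >> 10) & 0x1F   # 5-bit biased exponent (bias 15)
    fraction = bits & 0x3FF          # 10-bit fraction field
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2^-24.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        return sign * float("inf") if fraction == 0 else float("nan")
    # Normal number: implicit leading 1 plus fraction, scaled by 2^(e-15).
    return sign * (1 + fraction / 1024.0) * 2.0 ** (exponent - 15)


one = fp16_bits_to_float(0x3C00)       # 1.0
minus_two = fp16_bits_to_float(0xC000)  # -2.0
largest = fp16_bits_to_float(0x7BFF)    # 65504.0
```

Python's `struct` format `"e"` performs the same decoding and can serve as a cross-check.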
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". [sae_note]
FOR j := 0 TO 7
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := src.fp64[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 7
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := src.fp64[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 7
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 7
IF k[j]
dst.fp64[j] := Convert_FP16_To_FP64(a.fp16[j])
ELSE
dst.fp64[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 TO 15
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [sae_note]
FOR j := 0 TO 15
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := src.fp32[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 15
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := src.fp32[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 TO 15
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note]
FOR j := 0 TO 15
IF k[j]
dst.fp32[j] := Convert_FP16_To_FP32(a.fp16[j])
ELSE
dst.fp32[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [sae_note]
dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
ELSE
dst.fp64[0] := src.fp64[0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512_FP16
Convert
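In the scalar forms, only mask bit 0 is consulted and the upper element always comes from "a". A small Python sketch of the writemask case above follows; `mask_cvtsh_sd_sketch` is a hypothetical name, vectors are modelled as Python lists with element 0 as the lowest lane, and the FP16 input is represented by an ordinary float that is assumed to be exactly representable in half precision.

```python
def mask_cvtsh_sd_sketch(src, k, a, b):
    # Lower lane: converted value if mask bit 0 is set, else copied from src.
    lower = float(b[0]) if k & 1 else src[0]
    # Upper lane: always passed through from a.
    return [lower, a[1]]


src = [9.0, 9.5]
a = [10.0, 11.0]
b = [1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # 8 FP16 lanes, only b[0] is used

active = mask_cvtsh_sd_sketch(src, 0b01, a, b)    # [1.5, 11.0]
inactive = mask_cvtsh_sd_sketch(src, 0b10, a, b)  # [9.0, 11.0]
```

The second call sets only mask bit 1, which has no effect, so the lower lane falls back to `src[0]`.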
Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note]
IF k[0]
dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
ELSE
dst.fp64[0] := src.fp64[0]
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst".
IF k[0]
dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
ELSE
dst.fp64[0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note]
IF k[0]
dst.fp64[0] := Convert_FP16_To_FP64(b.fp16[0])
ELSE
dst.fp64[0] := 0
FI
dst[127:64] := a[127:64]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note]
dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
ELSE
dst.fp32[0] := src.fp32[0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note]
IF k[0]
dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
ELSE
dst.fp32[0] := src.fp32[0]
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
ELSE
dst.fp32[0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note]
IF k[0]
dst.fp32[0] := Convert_FP16_To_FP32(b.fp16[0])
ELSE
dst.fp32[0] := 0
FI
dst[127:32] := a[127:32]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
dst.dword := Convert_FP16_To_Int32(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
[round_note]
dst.dword := Convert_FP16_To_Int32(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
dst.qword := Convert_FP16_To_Int64(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
[round_note]
dst.qword := Convert_FP16_To_Int64(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
dst.dword := Convert_FP16_To_Int32_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". [sae_note]
dst.dword := Convert_FP16_To_Int32_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
dst.qword := Convert_FP16_To_Int64_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". [sae_note]
dst.qword := Convert_FP16_To_Int64_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst".
dst.dword := Convert_FP16_To_UInt32(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst". [round_note]
dst.dword := Convert_FP16_To_UInt32(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst".
dst.qword := Convert_FP16_To_UInt64(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst". [round_note]
dst.qword := Convert_FP16_To_UInt64(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst".
dst.dword := Convert_FP16_To_UInt32_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst". [sae_note]
dst.dword := Convert_FP16_To_UInt32_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst".
dst.qword := Convert_FP16_To_UInt64_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the lower half-precision (16-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst". [sae_note]
dst.qword := Convert_FP16_To_UInt64_Truncate(a.fp16[0])
AVX512_FP16
Convert
Convert the signed 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := Convert_Int32_To_FP16(b.fp32[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
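The integer-to-fp16 entries follow one pattern: the converted value goes to lane 0 and lanes 1 to 7 come from "a". A minimal sketch of that pattern follows, modeling the 128-bit vector as a list of 8 Python floats (an assumption for illustration) and assuming round-to-nearest-even. Values beyond the fp16 range raise OverflowError in this sketch; hardware overflow behavior is not modeled. The function names are hypothetical.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def cvt_int_to_sh(a, b):
    """Lane 0 := fp16(b); lanes 1..7 are copied from a (a list of 8 fp16 values)."""
    if len(a) != 8:
        raise ValueError("a must hold 8 half-precision lanes")
    return [to_fp16(float(b))] + list(a[1:])
```

Because fp16 carries an 11-bit significand, integers above 2048 are not all representable: 2049 rounds to 2048 and 2051 rounds to 2052 under ties-to-even.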
Convert the signed 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := Convert_Int32_To_FP16(b.fp32[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the unsigned 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := Convert_UInt32_To_FP16(b[31:0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the unsigned 32-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := Convert_UInt32_To_FP16(b[31:0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the signed 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := Convert_Int64_To_FP16(b.fp64[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the signed 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := Convert_Int64_To_FP16(b.fp64[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the unsigned 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := Convert_UInt64_To_FP16(b[63:0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Convert the unsigned 64-bit integer "b" to a half-precision (16-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := Convert_UInt64_To_FP16(b[63:0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Convert
Copy the 16-bit integer "a" to the lower element of "dst", and zero the upper elements of "dst".
dst.fp16[0] := a.fp16[0]
dst[MAX:16] := 0
AVX512_FP16
Convert
Copy the lower 16-bit integer in "a" to "dst".
dst.fp16[0] := a.fp16[0]
dst[MAX:16] := 0
AVX512_FP16
Convert
Copy the lower half-precision (16-bit) floating-point element of "a" to "dst".
dst[15:0] := a.fp16[0]
AVX512_FP16
Convert
Copy the lower half-precision (16-bit) floating-point element of "a" to "dst".
dst[15:0] := a.fp16[0]
AVX512_FP16
Convert
Copy the lower half-precision (16-bit) floating-point element of "a" to "dst".
dst[15:0] := a.fp16[0]
AVX512_FP16
Convert
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [max_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
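The three packed-maximum variants above differ only in what happens to lanes whose mask bit is clear. A minimal sketch of that logic follows, modeling vectors as Python lists and the mask "k" as an integer bit field (both assumptions for illustration). The function names are hypothetical and work for any lane count.

```python
def max_ph(a, b):
    """Per-lane (a > b ? a : b), exactly as written in the pseudocode."""
    return [x if x > y else y for x, y in zip(a, b)]

def mask_max_ph(src, k, a, b):
    """Writemask form: lanes with a clear mask bit keep the value from src."""
    full = max_ph(a, b)
    return [full[j] if (k >> j) & 1 else src[j] for j in range(len(full))]

def maskz_max_ph(k, a, b):
    """Zeromask form: lanes with a clear mask bit are set to zero."""
    full = max_ph(a, b)
    return [full[j] if (k >> j) & 1 else 0.0 for j in range(len(full))]
```

One consequence of the ternary as written is that the operation is not symmetric when a lane holds NaN: the comparison is false, so the lane from "b" is returned whichever operand is NaN.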
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [sae_note][max_float_note]
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note][max_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note][max_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] > b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [min_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [sae_note][min_float_note]
FOR j := 0 to 31
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note][min_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := src.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Compare packed half-precision (16-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note][min_float_note]
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := (a.fp16[j] < b.fp16[j] ? a.fp16[j] : b.fp16[j])
ELSE
dst.fp16[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Special Math Functions
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Special Math Functions
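The ReduceArgumentFP16 helper above computes the remainder left after rounding the input to "m" fraction bits. A minimal sketch for finite inputs follows. It passes the rounding step as a Python callable instead of decoding "imm8[3:0]", which is an assumption made to keep the example independent of the immediate encoding; "m" corresponds to "imm8[7:4]". The function names are hypothetical.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def reduce_argument_fp16(x, m, rounding=round):
    """x - 2^-m * ROUND(2^m * x) for a finite x; rounding defaults to ties-to-even."""
    x = to_fp16(x)
    if not math.isfinite(x):
        raise ValueError("only finite inputs are modeled in this sketch")
    rounded = 2.0 ** -m * rounding(2.0 ** m * x)
    return to_fp16(x - rounded)
```

With m = 1, the input 2.75 scales to 5.5, which rounds to 6 under ties-to-even, so the result is 2.75 - 3.0 = -0.25. Passing "math.floor" instead gives 2.75 - 2.5 = 0.25.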
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Special Math Functions
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
IF k[0]
dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Special Math Functions
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
IF k[0]
dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Special Math Functions
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
IF k[0]
dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Special Math Functions
Extract the reduced argument of the lower half-precision (16-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
IF k[0]
dst.fp16[0] := ReduceArgumentFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Special Math Functions
Load a half-precision (16-bit) floating-point element from memory into the lower element of "dst", and zero the upper elements.
dst.fp16[0] := MEM[mem_addr].fp16[0]
dst[MAX:16] := 0
AVX512_FP16
Load
Load a half-precision (16-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper elements of "dst" to zero.
IF k[0]
dst.fp16[0] := MEM[mem_addr].fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[MAX:16] := 0
AVX512_FP16
Load
Load a half-precision (16-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper elements of "dst" to zero.
IF k[0]
dst.fp16[0] := MEM[mem_addr].fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[MAX:16] := 0
AVX512_FP16
Load
Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512_FP16
Load
Load 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[511:0] := MEM[mem_addr+511:mem_addr]
dst[MAX:512] := 0
AVX512_FP16
Load
Store the lower half-precision (16-bit) floating-point element from "a" into memory.
MEM[mem_addr].fp16[0] := a.fp16[0]
AVX512_FP16
Store
Store the lower half-precision (16-bit) floating-point element from "a" into memory using writemask "k".
IF k[0]
MEM[mem_addr].fp16[0] := a.fp16[0]
FI
AVX512_FP16
Store
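The scalar store entries above write only 16 bits, and the writemask form leaves memory untouched when mask bit 0 is clear. A minimal sketch follows, using a "bytearray" as a stand-in for memory and a byte offset as "mem_addr" (both assumptions for illustration); little-endian byte order is assumed as on x86. The function names are hypothetical.

```python
import struct

def store_sh(mem, addr, a):
    """Write lane 0 of a to mem[addr:addr+2] as a little-endian fp16 value."""
    struct.pack_into("<e", mem, addr, a[0])

def mask_store_sh(mem, addr, k, a):
    """Writemask form: store lane 0 only when bit 0 of k is set."""
    if k & 1:
        store_sh(mem, addr, a)

def load_sh(mem, addr):
    """Read one fp16 value into lane 0 and zero the remaining 7 lanes."""
    return [struct.unpack_from("<e", mem, addr)[0]] + [0.0] * 7
```

Storing 1.5 at offset 2 with the mask bit set writes the bytes 00 3E there and leaves the bytes at offsets 0 and 1 unchanged; with the mask bit clear, the buffer is not modified at all.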
Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512_FP16
Store
Store 512-bits (composed of 32 packed half-precision (16-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+511:mem_addr] := a[511:0]
AVX512_FP16
Store
Move the lower half-precision (16-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := b.fp16[0]
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Move
Move the lower half-precision (16-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := b.fp16[0]
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Move
Move the lower half-precision (16-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := b.fp16[0]
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Move
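The three scalar move variants above share the same upper-lane behavior and differ only in how lane 0 is chosen. A minimal sketch follows, modeling each 128-bit vector as a list of 8 values and "k" as an integer mask (assumptions for illustration). The function names are hypothetical.

```python
def move_sh(a, b):
    """Lane 0 from b, lanes 1..7 from a."""
    return [b[0]] + list(a[1:])

def mask_move_sh(src, k, a, b):
    """Writemask form: lane 0 falls back to src[0] when bit 0 of k is clear."""
    return [b[0] if k & 1 else src[0]] + list(a[1:])

def maskz_move_sh(k, a, b):
    """Zeromask form: lane 0 is zeroed when bit 0 of k is clear."""
    return [b[0] if k & 1 else 0.0] + list(a[1:])
```

Only bit 0 of the mask matters for these scalar forms; higher mask bits do not affect the upper lanes, which always come from "a".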
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 31
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
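The RoundScaleFP16 helper above rounds a value to a multiple of 2^-m. A minimal scalar sketch for finite inputs follows. As an assumption for illustration, the rounding step is passed as a Python callable rather than decoded from "imm8[3:0]", and "m" corresponds to "imm8[7:4]". The function names are hypothetical.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def roundscale_fp16(x, m, rounding=round):
    """2^-m * ROUND(2^m * x) for a finite x; rounding defaults to ties-to-even."""
    x = to_fp16(x)
    if not math.isfinite(x):
        raise ValueError("only finite inputs are modeled in this sketch")
    return to_fp16(2.0 ** -m * rounding(2.0 ** m * x))
```

With m = 2 the result is a multiple of 0.25: the fp16 value nearest 1.3 becomes 1.25 under round-to-nearest and 1.5 when "math.ceil" is supplied. With m = 0 the operation reduces to rounding to an integer.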
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 31
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Round packed half-precision (16-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := RoundScaleFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
IF k[0]
dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
IF k[0]
dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
IF k[0]
dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Round the lower half-precision (16-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note]
DEFINE RoundScaleFP16(src.fp16, imm8[7:0]) {
m.fp16 := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp.fp16 := POW(FP16(2.0), -m) * ROUND(POW(FP16(2.0), m) * src.fp16, imm8[3:0])
RETURN tmp.fp16
}
IF k[0]
dst.fp16[0] := RoundScaleFP16(b.fp16[0], imm8)
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 31
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
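The "floor(log2(x))" reading of ConvertExpFP16 above can be checked with a small scalar model. This is a minimal sketch for finite, non-zero inputs only; zero, infinity, and NaN are deliberately left unmodeled, and the sign of the input is ignored on the assumption that only the exponent field contributes. The function name is hypothetical.

```python
import math

def getexp_fp16(x):
    """Return floor(log2(|x|)) as a float for a finite, non-zero x."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("special inputs are not modeled in this sketch")
    # math.frexp gives |x| = m * 2**e with 0.5 <= m < 1, so floor(log2|x|) = e - 1.
    _, e = math.frexp(abs(x))
    return float(e - 1)
```

For example, 10.0 lies in [8, 16) and yields 3.0, while 0.75 lies in [0.5, 1) and yields -1.0. The result is itself a floating-point number, matching the description that the exponent is stored as an fp16 value.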
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]
FOR i := 0 to 31
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of each packed half-precision (16-bit) floating-point element in "a" to a half-precision (16-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note]
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ConvertExpFP16(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
dst.fp16[0] := ConvertExpFP16(b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note]
dst.fp16[0] := ConvertExpFP16(b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
IF k[0]
dst.fp16[0] := ConvertExpFP16(b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note]
IF k[0]
dst.fp16[0] := ConvertExpFP16(b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element.
IF k[0]
dst.fp16[0] := ConvertExpFP16(b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Convert the exponent of the lower half-precision (16-bit) floating-point element in "b" to a half-precision (16-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note]
IF k[0]
dst.fp16[0] := ConvertExpFP16(b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 31
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
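The "±(2^k)*|x.significand|" description above can be made concrete with a scalar model. This is a minimal sketch under stated assumptions: it covers the normalization intervals [1, 2), [0.5, 1), and [0.75, 1.5) and three sign behaviors (keep the source sign, force positive, return NaN for negative input), handles finite non-zero inputs only, and uses hypothetical string labels in place of the "norm" and "sign" enumeration values.

```python
import math

def getmant_fp16(x, norm="1_2", sign="src"):
    """Normalized mantissa of a finite, non-zero x (labels are illustrative)."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("special inputs are not modeled in this sketch")
    m, _ = math.frexp(abs(x))        # 0.5 <= m < 1
    if norm == "1_2":                # interval [1, 2)
        m *= 2.0
    elif norm == "p5_1":             # interval [0.5, 1)
        pass
    elif norm == "p75_1p5":          # interval [0.75, 1.5)
        if m < 0.75:
            m *= 2.0
    else:
        raise ValueError("unsupported norm label")
    if sign == "src":                # keep the sign of the source
        return math.copysign(m, x)
    if sign == "zero":               # always positive
        return m
    if sign == "nan":                # NaN when the source is negative
        return math.nan if x < 0 else m
    raise ValueError("unsupported sign label")
```

For x = 10.0 = 1.25 * 2^3, the [1, 2) interval returns 1.25 and the [0.5, 1) interval returns 0.625; only the power-of-two scale changes, never the significand bits.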
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note][sae_note]
FOR i := 0 TO 31
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note][sae_note]
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissas of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note][sae_note]
FOR i := 0 TO 31
IF k[i]
dst.fp16[i] := GetNormalizedMantissaFP16(a.fp16[i], norm, sign)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note][sae_note]
dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
IF k[0]
dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note][sae_note]
IF k[0]
dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note]
IF k[0]
dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Normalize the mantissa of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "norm" and the sign depends on "sign" and the source sign.
[getmant_note][sae_note]
IF k[0]
dst.fp16[0] := GetNormalizedMantissaFP16(b.fp16[0], norm, sign)
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 31
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
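To make the ReduceArgumentFP16 definition above more concrete, here is a scalar sketch in Python. It is an illustration under stated assumptions, not the instruction itself: the arithmetic runs in double precision instead of FP16, only finite inputs are handled (so the IsInf branch is omitted), and the hypothetical `rounder` parameter stands in for the rounding control carried in imm8[3:0]. The parameter `m` plays the role of imm8[7:4].

```python
import math

def reduce_argument(x: float, m: int, rounder=round) -> float:
    """Return x minus x rounded to m fraction bits (finite x only).

    `rounder` is a hypothetical stand-in for the imm8[3:0] rounding control;
    the default, Python's round(), rounds halfway cases to even.
    """
    scaled = math.ldexp(x, m)                   # x * 2**m
    rounded = math.ldexp(rounder(scaled), -m)   # 2**-m * ROUND(2**m * x)
    return x - rounded                          # the part that rounding removed
```

With m = 1 and round-to-nearest, 1.75 rounds to 2.0 and the reduced argument is -0.25; passing `math.floor` as the rounder gives 1.75 - 1.5 = 0.25 instead.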
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 31
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Extract the reduced argument of packed half-precision (16-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note]
DEFINE ReduceArgumentFP16(src[15:0], imm8[7:0]) {
m[15:0] := FP16(imm8[7:4]) // number of fraction bits after the binary point to be preserved
tmp[15:0] := POW(2.0, FP16(-m)) * ROUND(POW(2.0, FP16(m)) * src[15:0], imm8[3:0])
tmp[15:0] := src[15:0] - tmp[15:0]
IF IsInf(tmp[15:0])
tmp[15:0] := FP16(0.0)
FI
RETURN tmp[15:0]
}
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := ReduceArgumentFP16(a.fp16[i], imm8)
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Miscellaneous
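The core of ScaleFP16 is the expression tmp1 * POW(2.0, FLOOR(tmp2)). A minimal scalar sketch of that expression in Python follows. It assumes double-precision arithmetic rather than FP16, ignores the MXCSR.DAZ handling of denormal inputs, and handles finite inputs only; the function name `scale` is hypothetical.

```python
import math

def scale(a: float, b: float) -> float:
    """Return a * 2**floor(b) for finite a and b (no DAZ handling, no FP16 rounding)."""
    return math.ldexp(a, math.floor(b))  # ldexp(a, n) computes a * 2**n exactly
```

Note that the exponent is floored, not truncated: a scale value of -1.5 gives an exponent of -2, so 3.0 is scaled to 0.75.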
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst".
[round_note]
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Miscellaneous
Scale the packed half-precision (16-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
FOR i := 0 to 15
IF k[i]
dst.fp16[i] := ScaleFP16(a.fp16[i], b.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Miscellaneous
Scale the lower half-precision (16-bit) floating-point element in "a" using the value from "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Scale the lower half-precision (16-bit) floating-point element in "a" using the value from "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Scale the lower half-precision (16-bit) floating-point element in "a" using the value from "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
IF k[0]
dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Scale the lower half-precision (16-bit) floating-point element in "a" using the value from "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
IF k[0]
dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Scale the lower half-precision (16-bit) floating-point element in "a" using the value from "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
IF k[0]
dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Scale the lower half-precision (16-bit) floating-point element in "a" using the value from "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
DEFINE ScaleFP16(src1, src2) {
denormal1 := (src1.exp == 0) and (src1.fraction != 0)
denormal2 := (src2.exp == 0) and (src2.fraction != 0)
tmp1 := src1
tmp2 := src2
IF MXCSR.DAZ
IF denormal1
tmp1 := 0
FI
IF denormal2
tmp2 := 0
FI
FI
RETURN tmp1 * POW(2.0, FLOOR(tmp2))
}
IF k[0]
dst.fp16[0] := ScaleFP16(a.fp16[0], b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Miscellaneous
Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k".
[fpclass_note]
FOR i := 0 to 31
k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
ENDFOR
k[MAX:32] := 0
AVX512_FP16
Miscellaneous
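The category test performed by CheckFPClass_FP16 can be sketched on raw 16-bit patterns. The Python below is an illustration only. The text above does not spell out the meaning of each "imm8" bit, so the assignment used here (bit 0 QNaN, bit 1 +0, bit 2 -0, bit 3 +Inf, bit 4 -Inf, bit 5 denormal, bit 6 negative finite, bit 7 SNaN) is an assumption that should be checked against the instruction reference. The constant names and `check_fp_class_fp16` are hypothetical, and MXCSR.DAZ is ignored.

```python
# Assumed imm8 category bits (not spelled out in the text above; verify before relying on them).
QNAN, POS_ZERO, NEG_ZERO, POS_INF, NEG_INF, DENORMAL, NEG_FINITE, SNAN = (
    1 << i for i in range(8)
)

def check_fp_class_fp16(bits: int, imm8: int) -> int:
    """Return 1 if the FP16 bit pattern belongs to any category selected by imm8."""
    sign = (bits >> 15) & 1
    exp = (bits >> 10) & 0x1F          # 5-bit biased exponent
    frac = bits & 0x3FF                # 10-bit fraction
    exp_ones = exp == 0x1F
    exp_zero = exp == 0
    quiet = bool(frac & 0x200)         # top fraction bit distinguishes QNaN from SNaN
    found = 0
    if exp_ones and frac != 0:
        found |= QNAN if quiet else SNAN
    if exp_ones and frac == 0:
        found |= NEG_INF if sign else POS_INF
    if exp_zero and frac == 0:
        found |= NEG_ZERO if sign else POS_ZERO
    if exp_zero and frac != 0:
        found |= DENORMAL
    if sign and not exp_ones and not (exp_zero and frac == 0):
        found |= NEG_FINITE            # negative, excluding infinities, NaNs and -0
    return 1 if (found & imm8 & 0xFF) else 0
```

Under these assumptions, an "imm8" of 0x18 selects either infinity, so the pattern 0x7C00 (+Inf) yields 1 and 0x0000 (+0) yields 0.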
Test packed half-precision (16-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set).
[fpclass_note]
FOR i := 0 to 31
IF k1[i]
k[i] := CheckFPClass_FP16(a.fp16[i], imm8[7:0])
ELSE
k[i] := 0
FI
ENDFOR
k[MAX:32] := 0
AVX512_FP16
Miscellaneous
Test the lower half-precision (16-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k".
[fpclass_note]
k[0] := CheckFPClass_FP16(a.fp16[0], imm8[7:0])
k[MAX:1] := 0
AVX512_FP16
Miscellaneous
Test the lower half-precision (16-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set).
[fpclass_note]
IF k1[0]
k[0] := CheckFPClass_FP16(a.fp16[0], imm8[7:0])
ELSE
k[0] := 0
FI
k[MAX:1] := 0
AVX512_FP16
Miscellaneous
Shuffle half-precision (16-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 31
i := j*16
off := idx[i+4:i]
dst.fp16[j] := idx[i+5] ? b.fp16[off] : a.fp16[off]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
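The two-source shuffle above reads, for each destination element, a 5-bit offset (bits 4:0 of the 16-bit index) and a selector bit (bit 5) that chooses between "a" and "b". A small Python model, using plain lists in place of vectors, might look as follows; the function name `permutex2var` is hypothetical and the elements can be any values.

```python
def permutex2var(a, idx, b):
    """Model of the 32-element two-source shuffle: idx bit 5 picks b, bits 4:0 pick the element."""
    assert len(a) == len(b) == len(idx) == 32
    out = []
    for sel in idx:
        off = sel & 0x1F                   # idx[i+4:i]
        src = b if (sel >> 5) & 1 else a   # idx[i+5]
        out.append(src[off])
    return out
```

An index of 31 therefore selects the last element of "a", while 0x20 | 1 selects element 1 of "b".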
Blend packed half-precision (16-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst".
FOR j := 0 to 31
IF k[j]
dst.fp16[j] := b.fp16[j]
ELSE
dst.fp16[j] := a.fp16[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
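The blend operation above is a per-element select driven by the mask. As a rough Python model (lists stand in for vectors, an integer stands in for the mask register, and the name `mask_blend` is hypothetical):

```python
def mask_blend(k: int, a, b):
    """Take b[j] where bit j of k is set, otherwise a[j]."""
    return [b[j] if (k >> j) & 1 else a[j] for j in range(len(a))]
```

With k = 0b0101 and four-element inputs, elements 0 and 2 come from "b" and elements 1 and 3 from "a".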
Shuffle half-precision (16-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 31
i := j*16
id := idx[i+4:i]
dst.fp16[j] := a.fp16[id]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Miscellaneous
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 31
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := (1.0 / SQRT(a.fp16[i]))
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
dst.fp16[0] := (1.0 / SQRT(b.fp16[0]))
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
IF k[0]
dst.fp16[0] := (1.0 / SQRT(b.fp16[0]))
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
IF k[0]
dst.fp16[0] := (1.0 / SQRT(b.fp16[0]))
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
FOR i := 0 to 31
dst.fp16[i] := SQRT(a.fp16[i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst".
[round_note]
FOR i := 0 to 31
dst.fp16[i] := SQRT(a.fp16[i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
[round_note]
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
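The difference between the writemask and zeromask forms of the square root above is only in what an unselected element receives. The Python sketch below models both on plain lists; it computes in double precision rather than FP16, and the names `mask_sqrt` and `maskz_sqrt` are hypothetical.

```python
import math

def mask_sqrt(src, k: int, a):
    """Writemask form: unselected elements are copied from src."""
    return [math.sqrt(a[i]) if (k >> i) & 1 else src[i] for i in range(len(a))]

def maskz_sqrt(k: int, a):
    """Zeromask form: unselected elements are set to zero."""
    return [math.sqrt(a[i]) if (k >> i) & 1 else 0.0 for i in range(len(a))]
```

In this model an element whose mask bit is clear is never passed to the square root, so a negative value in a masked-off position has no effect on the result.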
Compute the square root of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
[round_note]
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := SQRT(a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
dst.fp16[0] := SQRT(b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
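The scalar form above mixes its two inputs: only the lowest element is computed, and it is computed from "b", while elements 1 to 7 pass through from "a". A minimal Python model of that data flow, assuming 8-element lists and double-precision arithmetic (the name `sqrt_sh` is hypothetical):

```python
import math

def sqrt_sh(a, b):
    """Lower element is sqrt(b[0]); upper 7 elements are copied from a."""
    assert len(a) == 8 and len(b) == 8
    return [math.sqrt(b[0])] + list(a[1:8])
```

The lowest element of "a" and the upper elements of "b" do not contribute to the result.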
Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
dst.fp16[0] := SQRT(b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := SQRT(b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := SQRT(b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
IF k[0]
dst.fp16[0] := SQRT(b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the square root of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst".
[round_note]
IF k[0]
dst.fp16[0] := SQRT(b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 31
dst.fp16[i] := (1.0 / a.fp16[i])
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := (1.0 / a.fp16[i])
ELSE
dst.fp16[i] := src.fp16[i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal of packed half-precision (16-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 1.5*2^-12.
FOR i := 0 to 31
IF k[i]
dst.fp16[i] := (1.0 / a.fp16[i])
ELSE
dst.fp16[i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
dst.fp16[0] := (1.0 / b.fp16[0])
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
IF k[0]
dst.fp16[0] := (1.0 / b.fp16[0])
ELSE
dst.fp16[0] := src.fp16[0]
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Compute the approximate reciprocal of the lower half-precision (16-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 7 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
IF k[0]
dst.fp16[0] := (1.0 / b.fp16[0])
ELSE
dst.fp16[0] := 0
FI
dst[127:16] := a[127:16]
dst[MAX:128] := 0
AVX512_FP16
Elementary Math Functions
Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values.
dst.fp16[0] := e0
dst.fp16[1] := e1
dst.fp16[2] := e2
dst.fp16[3] := e3
dst.fp16[4] := e4
dst.fp16[5] := e5
dst.fp16[6] := e6
dst.fp16[7] := e7
AVX512_FP16
Set
Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values.
dst.fp16[0] := e0
dst.fp16[1] := e1
dst.fp16[2] := e2
dst.fp16[3] := e3
dst.fp16[4] := e4
dst.fp16[5] := e5
dst.fp16[6] := e6
dst.fp16[7] := e7
dst.fp16[8] := e8
dst.fp16[9] := e9
dst.fp16[10] := e10
dst.fp16[11] := e11
dst.fp16[12] := e12
dst.fp16[13] := e13
dst.fp16[14] := e14
dst.fp16[15] := e15
AVX512_FP16
Set
Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values.
dst.fp16[0] := e0
dst.fp16[1] := e1
dst.fp16[2] := e2
dst.fp16[3] := e3
dst.fp16[4] := e4
dst.fp16[5] := e5
dst.fp16[6] := e6
dst.fp16[7] := e7
dst.fp16[8] := e8
dst.fp16[9] := e9
dst.fp16[10] := e10
dst.fp16[11] := e11
dst.fp16[12] := e12
dst.fp16[13] := e13
dst.fp16[14] := e14
dst.fp16[15] := e15
dst.fp16[16] := e16
dst.fp16[17] := e17
dst.fp16[18] := e18
dst.fp16[19] := e19
dst.fp16[20] := e20
dst.fp16[21] := e21
dst.fp16[22] := e22
dst.fp16[23] := e23
dst.fp16[24] := e24
dst.fp16[25] := e25
dst.fp16[26] := e26
dst.fp16[27] := e27
dst.fp16[28] := e28
dst.fp16[29] := e29
dst.fp16[30] := e30
dst.fp16[31] := e31
AVX512_FP16
Set
Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst.fp16[0] := e7
dst.fp16[1] := e6
dst.fp16[2] := e5
dst.fp16[3] := e4
dst.fp16[4] := e3
dst.fp16[5] := e2
dst.fp16[6] := e1
dst.fp16[7] := e0
AVX512_FP16
Set
Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst.fp16[0] := e15
dst.fp16[1] := e14
dst.fp16[2] := e13
dst.fp16[3] := e12
dst.fp16[4] := e11
dst.fp16[5] := e10
dst.fp16[6] := e9
dst.fp16[7] := e8
dst.fp16[8] := e7
dst.fp16[9] := e6
dst.fp16[10] := e5
dst.fp16[11] := e4
dst.fp16[12] := e3
dst.fp16[13] := e2
dst.fp16[14] := e1
dst.fp16[15] := e0
AVX512_FP16
Set
Set packed half-precision (16-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst.fp16[0] := e31
dst.fp16[1] := e30
dst.fp16[2] := e29
dst.fp16[3] := e28
dst.fp16[4] := e27
dst.fp16[5] := e26
dst.fp16[6] := e25
dst.fp16[7] := e24
dst.fp16[8] := e23
dst.fp16[9] := e22
dst.fp16[10] := e21
dst.fp16[11] := e20
dst.fp16[12] := e19
dst.fp16[13] := e18
dst.fp16[14] := e17
dst.fp16[15] := e16
dst.fp16[16] := e15
dst.fp16[17] := e14
dst.fp16[18] := e13
dst.fp16[19] := e12
dst.fp16[20] := e11
dst.fp16[21] := e10
dst.fp16[22] := e9
dst.fp16[23] := e8
dst.fp16[24] := e7
dst.fp16[25] := e6
dst.fp16[26] := e5
dst.fp16[27] := e4
dst.fp16[28] := e3
dst.fp16[29] := e2
dst.fp16[30] := e1
dst.fp16[31] := e0
AVX512_FP16
Set
Broadcast half-precision (16-bit) floating-point value "a" to all elements of "dst".
FOR i := 0 to 7
dst.fp16[i] := a[15:0]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Set
Broadcast half-precision (16-bit) floating-point value "a" to all elements of "dst".
FOR i := 0 to 15
dst.fp16[i] := a[15:0]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Set
Broadcast half-precision (16-bit) floating-point value "a" to all elements of "dst".
FOR i := 0 to 31
dst.fp16[i] := a[15:0]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Set
Broadcast half-precision (16-bit) complex floating-point value "a" to all elements of "dst".
FOR i := 0 to 3
dst.fp16[2*i+0] := a[15:0]
dst.fp16[2*i+1] := a[31:16]
ENDFOR
dst[MAX:128] := 0
AVX512_FP16
Set
Broadcast half-precision (16-bit) complex floating-point value "a" to all elements of "dst".
FOR i := 0 to 7
dst.fp16[2*i+0] := a[15:0]
dst.fp16[2*i+1] := a[31:16]
ENDFOR
dst[MAX:256] := 0
AVX512_FP16
Set
Broadcast half-precision (16-bit) complex floating-point value "a" to all elements of "dst".
FOR i := 0 to 15
dst.fp16[2*i+0] := a[15:0]
dst.fp16[2*i+1] := a[31:16]
ENDFOR
dst[MAX:512] := 0
AVX512_FP16
Set
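As an illustration of the complex broadcast above, the following Python sketch models the pseudocode on plain integers. The names `fp16_bits` and `broadcast_pch` are hypothetical helpers, not part of any intrinsic API, and the example assumes the 32-bit input carries one half-precision value in bits 15:0 and the other in bits 31:16, as the pseudocode's `a[15:0]` and `a[31:16]` indicate.

```python
import struct


def fp16_bits(x):
    """Return the IEEE 754 binary16 bit pattern of x as an integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def broadcast_pch(a, num_pairs):
    """Replicate the 32-bit value a into num_pairs pairs of 16-bit lanes."""
    low = a & 0xFFFF           # a[15:0]
    high = (a >> 16) & 0xFFFF  # a[31:16]
    lanes = []
    for _ in range(num_pairs):
        lanes.append(low)   # dst.fp16[2*i+0]
        lanes.append(high)  # dst.fp16[2*i+1]
    return lanes


# Pack the illustrative pair (1.0, 2.0), then fill a 128-bit vector (4 pairs).
packed = fp16_bits(1.0) | (fp16_bits(2.0) << 16)
lanes_128 = broadcast_pch(packed, 4)
```

Passing 8 or 16 as `num_pairs` models the 256-bit and 512-bit forms, which differ only in the loop bound.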
Copy half-precision (16-bit) floating-point element "a" to the lower element of "dst", and zero the upper 7 elements.
dst.fp16[0] := a[15:0]
dst[127:16] := 0
AVX512_FP16
Set
Return vector of type __m512h with all elements set to zero.
dst[MAX:0] := 0
AVX512_FP16
Set
Cast vector of type "__m128h" to type "__m128". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
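The cast entries reinterpret the same bits under a different element type; no value conversion takes place. A minimal Python sketch of that idea, using `struct` to stand in for a 128-bit register (the helper name `cast_ph_to_ps` is hypothetical, and little-endian lane order is assumed):

```python
import struct


def cast_ph_to_ps(halves):
    """Reinterpret eight 16-bit patterns as four 32-bit floats (bits unchanged)."""
    raw = struct.pack("<8H", *halves)       # 16 bytes, like a 128-bit register
    return list(struct.unpack("<4f", raw))  # same bytes, viewed as floats


# Each pair (0x0000, 0x3F80) forms the 32-bit pattern 0x3F800000, which is 1.0.
floats = cast_ph_to_ps([0x0000, 0x3F80] * 4)
```

The result is determined purely by the bit layout, which is why a cast differs from a numeric conversion of the half-precision values.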
Cast vector of type "__m256h" to type "__m256". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512h" to type "__m512". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128h" to type "__m128d". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256h" to type "__m256d". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512h" to type "__m512d". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128h" to type "__m128i". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256h" to type "__m256i". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512h" to type "__m512i". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512" to type "__m512h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128d" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256d" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512d" to type "__m512h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128i" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256i" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512i" to type "__m512h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256h" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512h" to type "__m128h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m512h" to type "__m256h". This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128h" to type "__m256h"; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128h" to type "__m512h"; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256h" to type "__m512h"; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128h" to type "__m256h"; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m128h" to type "__m512h"; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Cast vector of type "__m256h" to type "__m512h"; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
AVX512_FP16
Cast
Return vector of type __m512h with undefined elements.
AVX512_FP16
General Support
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst".
FOR i := 0 to 3
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ENDFOR
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Bit Manipulation
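The multishift pseudocode above can be modelled on a single 64-bit element. The sketch below is a scalar Python reference written for illustration only; `multishift_qword` is a hypothetical name, and each control byte is read as a 6-bit bit offset into the data qword, wrapping modulo 64, as the `& 63` terms in the pseudocode indicate.

```python
def multishift_qword(ctrl_qword, data_qword):
    """Model one 64-bit element: ctrl_qword plays the role of a, data_qword of b."""
    out = 0
    for j in range(8):
        ctrl = (ctrl_qword >> (8 * j)) & 63  # low 6 bits of control byte j
        byte = 0
        for l in range(8):
            # Pick bit (ctrl + l) mod 64 of the data qword.
            bit = (data_qword >> ((ctrl + l) & 63)) & 1
            byte |= bit << l
        out |= byte << (8 * j)
    return out


data = 0x0123456789ABCDEF
# Control bytes 0, 8, 16, ..., 56 select the aligned bytes, so data is unchanged.
identity = multishift_qword(0x3830282018100800, data)
# A control of 4 in every byte extracts bits 11:4 of data into every result byte.
nibble_shifted = multishift_qword(0x0404040404040404, data)
```

A control value near 63 wraps around: with control 60, the low nibble of the result byte comes from bits 63:60 and the high nibble from bits 3:0.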
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 to 3
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
IF k[i*8+j]
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ELSE
dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8]
FI
ENDFOR
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Bit Manipulation
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 3
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
IF k[i*8+j]
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ELSE
dst[q+j*8+7:q+j*8] := 0
FI
ENDFOR
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Bit Manipulation
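The writemask and zeromask variants above differ only in what is stored when a mask bit is clear. A small Python sketch of that per-element merge, assuming elements are held in a list and the mask in an integer (the function name `apply_mask` and its `src=None` convention are hypothetical):

```python
def apply_mask(result, k, src=None):
    """Merge computed elements under mask k.

    If src is None, masked-off elements are zeroed (zeromask behaviour);
    otherwise they are copied from src (writemask behaviour).
    """
    out = []
    for j, value in enumerate(result):
        if (k >> j) & 1:
            out.append(value)
        elif src is None:
            out.append(0)
        else:
            out.append(src[j])
    return out


zeroed = apply_mask([1, 2, 3, 4], 0b0101)
merged = apply_mask([1, 2, 3, 4], 0b0101, src=[9, 9, 9, 9])
```

For the multishift entries the mask is indexed per byte (`k[i*8+j]`), so a 256-bit vector consumes 32 mask bits.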
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst".
FOR i := 0 to 1
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ENDFOR
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Bit Manipulation
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 to 1
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
IF k[i*8+j]
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ELSE
dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8]
FI
ENDFOR
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Bit Manipulation
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 1
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
IF k[i*8+j]
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ELSE
dst[q+j*8+7:q+j*8] := 0
FI
ENDFOR
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Bit Manipulation
Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 31
i := j*8
id := idx[i+4:i]*8
dst[i+7:i] := a[id+7:id]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Swizzle
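The single-source byte shuffle above reduces to a table lookup in which only the low bits of each index byte are used. A scalar Python sketch, for illustration (`permutexvar_epi8` here is a hypothetical model operating on lists of bytes, not the intrinsic itself):

```python
def permutexvar_epi8(idx, a):
    """Model the byte shuffle: dst[j] = a[idx[j] mod n] for n = 16, 32 or 64 bytes."""
    n = len(a)
    if n not in (16, 32, 64) or len(idx) != n:
        raise ValueError("expected two byte lists of length 16, 32 or 64")
    # For n = 32 this keeps idx bits 4:0, matching idx[i+4:i] in the pseudocode.
    return [a[i & (n - 1)] for i in idx]


a = list(range(100, 132))                       # 32 source bytes
reversed_bytes = permutexvar_epi8(list(range(31, -1, -1)), a)
wrapped = permutexvar_epi8([33] * 32, a)        # 33 & 31 == 1
```

Index bits above the ones listed in the pseudocode are ignored, so an index of 33 in the 256-bit form selects the same byte as an index of 1.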
Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
id := idx[i+4:i]*8
IF k[j]
dst[i+7:i] := a[id+7:id]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
id := idx[i+4:i]*8
IF k[j]
dst[i+7:i] := a[id+7:id]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*8
id := idx[i+3:i]*8
dst[i+7:i] := a[id+7:id]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
id := idx[i+3:i]*8
IF k[j]
dst[i+7:i] := a[id+7:id]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
id := idx[i+3:i]*8
IF k[j]
dst[i+7:i] := a[id+7:id]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 31
i := j*8
off := 8*idx[i+4:i]
dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Swizzle
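In the two-source shuffle above, each index byte carries both a byte offset and a one-bit source selector. The following Python sketch models that split for any of the three vector lengths; `permutex2var_epi8` is a hypothetical list-based model written for illustration.

```python
def permutex2var_epi8(a, idx, b):
    """Model the two-source byte shuffle for n = 16, 32 or 64 bytes."""
    n = len(a)
    if n not in (16, 32, 64) or len(b) != n or len(idx) != n:
        raise ValueError("expected three byte lists of length 16, 32 or 64")
    out = []
    for sel in idx:
        off = sel & (n - 1)        # idx[i+4:i] when n == 32
        source = b if sel & n else a  # idx[i+5] when n == 32
        out.append(source[off])
    return out


a = list(range(32))
b = list(range(100, 132))
idx = [0, 32, 31, 63, 64] + [0] * 27
picked = permutex2var_epi8(a, idx, b)
```

Here index 32 selects byte 0 of `b`, index 63 selects byte 31 of `b`, and index 64 falls back to byte 0 of `a` because only the selector bit and the offset bits are examined.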
Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
off := 8*idx[i+4:i]
dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
off := 8*idx[i+4:i]
dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := idx[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Swizzle
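The ELSE branch of the pseudocode above passes the index byte through when the mask bit is clear. A hedged Python sketch of this variant follows; the name `mask2_permutex2var_epi8` is used here only as a label for a list-based model.

```python
def mask2_permutex2var_epi8(a, idx, k, b):
    """Model the masked two-source shuffle whose fallback value is idx[j]."""
    n = len(a)
    if n not in (16, 32, 64) or len(b) != n or len(idx) != n:
        raise ValueError("expected three byte lists of length 16, 32 or 64")
    out = []
    for j, sel in enumerate(idx):
        if (k >> j) & 1:
            source = b if sel & n else a
            out.append(source[sel & (n - 1)])
        else:
            out.append(sel)  # mask bit clear: keep the index byte itself
    return out


a = list(range(10, 26))   # 16 bytes
b = list(range(50, 66))   # 16 bytes
mixed = mask2_permutex2var_epi8(a, [17] * 16, 0x00FF, b)
```

With index 17 in the 128-bit form, the selector bit (bit 4) is set and the offset is 1, so active elements read `b[1]`; inactive elements keep the value 17.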
Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*8
IF k[j]
off := 8*idx[i+4:i]
dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 15
i := j*8
off := 8*idx[i+3:i]
dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
off := 8*idx[i+3:i]
dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
off := 8*idx[i+3:i]
dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := idx[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Swizzle
Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*8
IF k[j]
off := 8*idx[i+3:i]
dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI
AVX512VL
Swizzle
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst".
FOR i := 0 to 7
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ENDFOR
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Bit Manipulation
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR i := 0 to 7
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
IF k[i*8+j]
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ELSE
dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8]
FI
ENDFOR
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Bit Manipulation
For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR i := 0 to 7
q := i * 64
FOR j := 0 to 7
tmp8 := 0
ctrl := a[q+j*8+7:q+j*8] & 63
FOR l := 0 to 7
tmp8[l] := b[q+((ctrl+l) & 63)]
ENDFOR
IF k[i*8+j]
dst[q+j*8+7:q+j*8] := tmp8[7:0]
ELSE
dst[q+j*8+7:q+j*8] := 0
FI
ENDFOR
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Bit Manipulation
Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst".
FOR j := 0 to 63
i := j*8
id := idx[i+5:i]*8
dst[i+7:i] := a[id+7:id]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Swizzle
Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
id := idx[i+5:i]*8
IF k[j]
dst[i+7:i] := a[id+7:id]
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Swizzle
Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
id := idx[i+5:i]*8
IF k[j]
dst[i+7:i] := a[id+7:id]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Swizzle
Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst".
FOR j := 0 to 63
i := j*8
off := 8*idx[i+5:i]
dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Swizzle
Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
off := 8*idx[i+5:i]
dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Swizzle
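Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set) is not what the pseudocode does: masked-off elements are copied from "idx". Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set).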
FOR j := 0 to 63
i := j*8
IF k[j]
off := 8*idx[i+5:i]
dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := idx[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Swizzle
Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 63
i := j*8
IF k[j]
off := 8*idx[i+5:i]
dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off]
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI
Swizzle
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
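The concatenate-and-shift-right operation above is a funnel shift. The Python sketch below models one element at an arbitrary width, relying on Python's unbounded integers for the double-width intermediate; `shrdv` is a hypothetical scalar helper, and the shift count is reduced modulo the element width as in the pseudocode.

```python
def shrdv(a, b, c, width):
    """Model one element: low `width` bits of ((b:a) >> (c mod width))."""
    mask = (1 << width) - 1
    concat = ((b & mask) << width) | (a & mask)  # b is the upper half
    return (concat >> (c & (width - 1))) & mask


# The low nibble of b is shifted in at the top of the 64-bit result.
r64 = shrdv(0x1234567890ABCDEF, 0x0F, 4, 64)
# 16-bit element: 0x1234ABCD >> 8 keeps 0x34AB in the low half.
r16 = shrdv(0xABCD, 0x1234, 8, 16)
```

Because the count is masked, a count equal to the element width behaves like a count of zero and returns "a" unchanged.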
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
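One consequence of the immediate form above is that passing the same value as both "a" and "b" yields a rotate right. The sketch below demonstrates this on scalars; `shrdi` and `rotate_right` are hypothetical helper names, and the immediate is masked to the element width as `imm8[5:0]`, `imm8[4:0]` and `imm8[3:0]` do in the pseudocode.

```python
def shrdi(a, b, imm8, width):
    """Model one element of the immediate-count funnel shift right."""
    mask = (1 << width) - 1
    concat = ((b & mask) << width) | (a & mask)
    return (concat >> (imm8 & (width - 1))) & mask


def rotate_right(x, n, width):
    """Rotate by feeding the same value into both halves of the funnel."""
    return shrdi(x, x, n, width)


rotated = rotate_right(0x80000001, 1, 32)
```

Bit 0 of the input re-enters at bit 31, so the example produces `0xC0000000`.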
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst".
FOR j := 0 to 15
i := j*16
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
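For the variable-count left shifts above, each lane takes its own count from "c", and the writemask form falls back to "a" rather than a separate "src". The following is a minimal scalar sketch of that behavior for 64-bit lanes; `shldv_epi64_model` and its `zero` flag are hypothetical names introduced for illustration.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def shldv_epi64_model(a, b, c, k=None, zero=False):
    """Scalar model of the 64-bit concatenate-and-shift-left operation
    with per-lane counts. Lists hold one int per lane, lane 0 first."""
    out = []
    for j in range(len(a)):
        if k is not None and not (k >> j) & 1:
            # Inactive lane: zeromask gives 0, writemask keeps a[j].
            out.append(0 if zero else a[j])
            continue
        count = c[j] & 63  # only the low six bits of the count are used
        wide = ((a[j] & MASK64) << 64) | (b[j] & MASK64)
        tmp = (wide << count) & MASK128  # keep the 128-bit intermediate
        out.append(tmp >> 64)            # upper 64 bits, tmp[127:64]
    return out
```

A quick check of the intermediate: with a = 1, b = 2^63 and a count of 1, the top bit of "b" moves into the bottom bit of the upper half, so the lane result is 3.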
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst".
FOR j := 0 to 7
i := j*32
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst".
FOR j := 0 to 3
i := j*32
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst".
FOR j := 0 to 15
i := j*16
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst".
FOR j := 0 to 7
i := j*16
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 1
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst".
FOR j := 0 to 7
i := j*32
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst".
FOR j := 0 to 3
i := j*32
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
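One consequence of the left-shift pseudocode above is that passing the same value as both "a" and "b" yields a rotate left. The sketch below models the unmasked 32-bit form and compares it with a plain rotate; `shldi_epi32_model` and `rotl32` are hypothetical helper names used only for this illustration.

```python
def shldi_epi32_model(a, b, imm8):
    """Scalar model of the unmasked 32-bit concatenate-and-shift-left
    operation. a and b are lists of 32-bit lane values."""
    count = imm8 & 31  # imm8[4:0]
    out = []
    for x, y in zip(a, b):
        tmp = ((x & 0xFFFFFFFF) << 32) | (y & 0xFFFFFFFF)
        tmp = (tmp << count) & 0xFFFFFFFFFFFFFFFF  # tmp[63:0]
        out.append(tmp >> 32)                      # tmp[63:32]
    return out

def rotl32(x, n):
    """Rotate a 32-bit value left by n (mod 32) bits."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF
```

With distinct inputs, the bits shifted out of the top of "b" fill the bottom of the result, for example a = 0x12345678, b = 0x9ABCDEF0 and a count of 8 give 0x3456789A.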
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst".
FOR j := 0 to 15
i := j*16
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst".
FOR j := 0 to 7
i := j*16
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Shift
Swizzle
Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
m := m + 16
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Load
Swizzle
Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
m := m + 16
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Load
Swizzle
Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
m := m + 16
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Load
Swizzle
Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
m := m + 16
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Load
Swizzle
Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
m := m + 8
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Load
Swizzle
Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
m := m + 8
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Load
Swizzle
Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
m := m + 8
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Load
Swizzle
Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
m := m + 8
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Load
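The expand-load entries above read elements from memory back to back and scatter them into the lanes whose mask bit is set. A minimal scalar sketch for 16-bit elements follows, assuming little-endian packing in a `bytes` buffer; the name `expandloadu_epi16_model` and the `lanes`/`src` parameters are hypothetical. Note that the model counts `m` in elements, whereas the pseudocode advances it by the element width.

```python
def expandloadu_epi16_model(mem, k, lanes=8, src=None):
    """Scalar model of the 16-bit expand-load operation.

    mem   : bytes-like buffer of packed little-endian 16-bit values.
    k     : mask as an int; bit j selects lane j.
    lanes : number of 16-bit lanes (8 for 128-bit, 16 for 256-bit).
    src   : pass-through lanes for the writemask form; None means
            inactive lanes are zeroed (zeromask form).
    """
    out = []
    m = 0  # index of the next contiguous element in memory
    for j in range(lanes):
        if (k >> j) & 1:
            out.append(int.from_bytes(mem[2 * m:2 * m + 2], "little"))
            m += 1
        else:
            out.append(0 if src is None else src[j])
    return out
```

Only as many elements as there are set mask bits are read in this model, so the buffer needs to hold just that many values.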
Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[m+15:m]
m := m + 16
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*16
IF k[j]
dst[i+15:i] := a[m+15:m]
m := m + 16
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[m+15:m]
m := m + 16
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 7
i := j*16
IF k[j]
dst[i+15:i] := a[m+15:m]
m := m + 16
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[m+7:m]
m := m + 8
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*8
IF k[j]
dst[i+7:i] := a[m+7:m]
m := m + 8
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[m+7:m]
m := m + 8
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 15
i := j*8
IF k[j]
dst[i+7:i] := a[m+7:m]
m := m + 8
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
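The register form of expand described above has the same shape as the memory form, except that the contiguous source is the low lanes of "a". This sketch models it for 8-bit lanes; `expand_epi8_model` is a hypothetical name and the lists stand in for vector registers.

```python
def expand_epi8_model(a, k, src=None):
    """Scalar model of the 8-bit register expand operation.

    a   : list of lane values; the low lanes are consumed in order.
    k   : mask as an int; bit j selects lane j of the result.
    src : pass-through lanes (writemask form); None zeroes inactive
          lanes (zeromask form).
    """
    out = []
    m = 0  # next lane of "a" to consume
    for j in range(len(a)):
        if (k >> j) & 1:
            out.append(a[m])
            m += 1
        else:
            out.append(0 if src is None else src[j])
    return out
```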
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 16
m := 0
FOR j := 0 to 15
i := j*16
IF k[j]
dst[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
dst[255:m] := 0
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 16
m := 0
FOR j := 0 to 15
i := j*16
IF k[j]
dst[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
dst[255:m] := src[255:m]
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 16
m := 0
FOR j := 0 to 7
i := j*16
IF k[j]
dst[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
dst[127:m] := 0
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 16
m := 0
FOR j := 0 to 7
i := j*16
IF k[j]
dst[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
dst[127:m] := src[127:m]
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 8
m := 0
FOR j := 0 to 31
i := j*8
IF k[j]
dst[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
dst[255:m] := 0
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 8
m := 0
FOR j := 0 to 31
i := j*8
IF k[j]
dst[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
dst[255:m] := src[255:m]
dst[MAX:256] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 8
m := 0
FOR j := 0 to 15
i := j*8
IF k[j]
dst[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
dst[127:m] := 0
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 8
m := 0
FOR j := 0 to 15
i := j*8
IF k[j]
dst[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
dst[127:m] := src[127:m]
dst[MAX:128] := 0
AVX512_VBMI2
AVX512VL
Swizzle
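The compress entries above are the inverse arrangement: active lanes of "a" are packed into the low lanes of "dst", and the lanes above the packed run are either zeroed or taken from "src" at the same positions. A minimal scalar sketch, with the hypothetical name `compress_epi8_model`:

```python
def compress_epi8_model(a, k, src=None):
    """Scalar model of the 8-bit register compress operation.

    a   : list of lane values, lane 0 first.
    k   : mask as an int; bit j marks lane j of "a" as active.
    src : pass-through lanes (writemask form); None zeroes the
          remaining upper lanes (zeromask form).
    """
    packed = [a[j] for j in range(len(a)) if (k >> j) & 1]
    m = len(packed)  # first lane not written by the packed run
    if src is None:
        tail = [0] * (len(a) - m)
    else:
        tail = list(src[m:])  # dst[top:m] := src[top:m]
    return packed + tail
```

In the writemask form the pass-through lanes are chosen by position after the packed run, not by the mask bits themselves, which is why lanes 2 and 3 of "src" survive when two elements are packed.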
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 16
m := base_addr
FOR j := 0 to 15
i := j*16
IF k[j]
MEM[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
AVX512_VBMI2
AVX512VL
Store
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 16
m := base_addr
FOR j := 0 to 7
i := j*16
IF k[j]
MEM[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
AVX512_VBMI2
AVX512VL
Store
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 8
m := base_addr
FOR j := 0 to 31
i := j*8
IF k[j]
MEM[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
AVX512_VBMI2
AVX512VL
Store
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 8
m := base_addr
FOR j := 0 to 15
i := j*8
IF k[j]
MEM[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
AVX512_VBMI2
AVX512VL
Store
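The compress-store entries above write only the active elements, so memory past the packed run is left untouched. The sketch below models the 16-bit form on a `bytearray`, assuming little-endian storage; `compressstoreu_epi16_model` is a hypothetical name, and returning the number of bytes written is an addition for this example rather than something the reference describes.

```python
def compressstoreu_epi16_model(mem, base, k, a):
    """Scalar model of the 16-bit compress-store operation.

    mem  : bytearray standing in for memory.
    base : byte offset of the first store (plays the role of base_addr).
    k    : mask as an int; bit j marks lane j of "a" as active.
    a    : list of 16-bit lane values, lane 0 first.
    Returns the number of bytes written (illustrative extra).
    """
    m = base
    for j, value in enumerate(a):
        if (k >> j) & 1:
            mem[m:m + 2] = (value & 0xFFFF).to_bytes(2, "little")
            m += 2
    return m - base
```

The returned byte count is convenient when appending successive compressed runs to one buffer, since it tells the caller where the next run should start.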
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63)
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15)
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
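For the variable-count right shifts above, the count is taken per lane from "c" and masked to the lane width, and the writemask form passes through "a". A minimal scalar sketch for 16-bit lanes follows; `shrdv_epi16_model` and its `zero` flag are hypothetical names for illustration.

```python
def shrdv_epi16_model(a, b, c, k=None, zero=False):
    """Scalar model of the 16-bit concatenate-and-shift-right operation
    with per-lane counts. Lists hold one int per lane, lane 0 first."""
    out = []
    for j in range(len(a)):
        if k is not None and not (k >> j) & 1:
            # Inactive lane: zeromask gives 0, writemask keeps a[j].
            out.append(0 if zero else a[j])
            continue
        count = c[j] & 15  # only the low four bits of the count are used
        wide = ((b[j] & 0xFFFF) << 16) | (a[j] & 0xFFFF)
        out.append((wide >> count) & 0xFFFF)  # lower 16 bits
    return out
```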
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst".
FOR j := 0 to 7
i := j*64
dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
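The immediate-count form above applies the same shift to every lane. The following is a rough scalar model under the assumption that vectors are lists of 8 unsigned 64-bit values; `shrdi_epi64` is a hypothetical name used only here.

```python
MASK64 = (1 << 64) - 1

def shrdi_epi64(a, b, imm8):
    """Each lane: concatenate b:a (128 bits), shift right by imm8[5:0], keep the low 64 bits."""
    shift = imm8 & 0x3F  # only bits 5:0 of the immediate are used
    dst = []
    for x, y in zip(a, b):
        concat = ((y & MASK64) << 64) | (x & MASK64)
        dst.append((concat >> shift) & MASK64)
    return dst

# Passing the same vector as "a" and "b" gives a per-lane rotate right.
print(hex(shrdi_epi64([1] * 8, [1] * 8, 4)[0]))  # 0x1000000000000000
```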
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst".
FOR j := 0 to 15
i := j*32
dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst".
FOR j := 0 to 31
i := j*16
dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst".
FOR j := 0 to 7
i := j*64
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63)
dst[i+63:i] := tmp[127:64]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
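For the left-shifting form, "a" supplies the upper half of the intermediate value and the result is taken from the upper half after the shift. A scalar sketch of that pseudocode, with hypothetical names `shldv_epi64_lane` and `shldv_epi64` and lists of 8 unsigned integers standing in for vectors:

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def shldv_epi64_lane(a, b, c):
    """One 64-bit lane: concatenate a:b, shift left by (c & 63), keep the upper 64 bits."""
    concat = ((a & MASK64) << 64) | (b & MASK64)  # "a" forms the upper half
    tmp = (concat << (c & 63)) & MASK128          # tmp[127:0]
    return tmp >> 64                              # tmp[127:64]

def shldv_epi64(a, b, c):
    """Apply the lane model to 8 lanes, one shift count per lane."""
    return [shldv_epi64_lane(x, y, z) for x, y, z in zip(a, b, c)]

# The top bit of "b" is shifted into the bottom of the result.
print(shldv_epi64_lane(1, 1 << 63, 1))  # 3
```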
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst".
FOR j := 0 to 15
i := j*32
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31)
dst[i+31:i] := tmp[63:32]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst".
FOR j := 0 to 31
i := j*16
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15)
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
i := j*64
IF k[j]
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ELSE
dst[i+63:i] := src[i+63:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst".
FOR j := 0 to 7
i := j*64
tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0]
dst[i+63:i] := tmp[127:64]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
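A scalar sketch of the zeromask form above, assuming lists of 16 unsigned 32-bit values and an integer mask "k". The name `maskz_shldi_epi32` is hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def maskz_shldi_epi32(k, a, b, imm8):
    """Zeromask model: active lanes get the upper 32 bits of (a:b) << imm8[4:0]; others are 0."""
    shift = imm8 & 0x1F  # only bits 4:0 of the immediate are used
    dst = []
    for j in range(16):
        if (k >> j) & 1:
            concat = ((a[j] & MASK32) << 32) | (b[j] & MASK32)
            tmp = (concat << shift) & MASK64  # tmp[63:0]
            dst.append(tmp >> 32)             # tmp[63:32]
        else:
            dst.append(0)                     # inactive lane: zeroed
    return dst

print(maskz_shldi_epi32(0x0003, [1] * 16, [0x80000000] * 16, 1)[:4])  # [3, 3, 0, 0]
```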
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
i := j*32
IF k[j]
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ELSE
dst[i+31:i] := src[i+31:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst".
FOR j := 0 to 15
i := j*32
tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0]
dst[i+31:i] := tmp[63:32]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 31
i := j*16
IF k[j]
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst".
FOR j := 0 to 31
i := j*16
tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0]
dst[i+15:i] := tmp[31:16]
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Shift
Swizzle
Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
m := m + 16
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Load
Swizzle
Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m]
m := m + 16
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Load
Swizzle
Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
m := m + 8
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Load
Swizzle
Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m]
m := m + 8
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Load
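The expand-load pseudocode above reads memory sequentially and scatters the values into the active lanes. A minimal model, assuming memory is a Python bytes-like object indexed from "mem_addr" and the counter advances by one byte (the pseudocode counts in bits). The name `mask_expandloadu_epi8` is hypothetical.

```python
def mask_expandloadu_epi8(src, k, mem):
    """Writemask model: active lanes take the next unread byte of mem; others keep src."""
    dst = []
    m = 0  # index of the next unread byte
    for j in range(64):
        if (k >> j) & 1:
            dst.append(mem[m])
            m += 1
        else:
            dst.append(src[j] & 0xFF)  # inactive lane: copied from "src"
    return dst

out = mask_expandloadu_epi8([0xFF] * 64, 0b1010, bytes([1, 2]))
print(out[:4])  # [255, 1, 255, 2]
```

In this model only as many bytes as there are set bits in "k" are read, which is why a two-byte buffer suffices in the usage line.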
Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[m+15:m]
m := m + 16
ELSE
dst[i+15:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
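The register form of expand works the same way, except that the contiguous source elements come from the low end of "a". A scalar sketch with the hypothetical name `maskz_expand_epi16`, using a list of 32 values for the vector:

```python
def maskz_expand_epi16(k, a):
    """Zeromask model: active lanes take consecutive low elements of "a"; others are 0."""
    dst = []
    m = 0  # index of the next unused element of "a"
    for j in range(32):
        if (k >> j) & 1:
            dst.append(a[m] & 0xFFFF)
            m += 1
        else:
            dst.append(0)  # inactive lane: zeroed
    return dst

print(maskz_expand_epi16(0b10110, list(range(100, 132)))[:5])  # [0, 100, 101, 0, 102]
```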
Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 31
i := j*16
IF k[j]
dst[i+15:i] := a[m+15:m]
m := m + 16
ELSE
dst[i+15:i] := src[i+15:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[m+7:m]
m := m + 8
ELSE
dst[i+7:i] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
m := 0
FOR j := 0 to 63
i := j*8
IF k[j]
dst[i+7:i] := a[m+7:m]
m := m + 8
ELSE
dst[i+7:i] := src[i+7:i]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 16
m := 0
FOR j := 0 to 31
i := j*16
IF k[j]
dst[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
dst[511:m] := 0
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 16
m := 0
FOR j := 0 to 31
i := j*16
IF k[j]
dst[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
dst[511:m] := src[511:m]
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
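Compress is the inverse packing step: active elements of "a" are moved to the low end of "dst", and the remaining upper positions are filled from the same positions of "src". A short model of the writemask pseudocode above, with the hypothetical name `mask_compress_epi16`:

```python
def mask_compress_epi16(src, k, a):
    """Writemask model: pack active lanes of "a" low, then pass through src above them."""
    packed = [a[j] & 0xFFFF for j in range(32) if (k >> j) & 1]
    # dst[511:m] := src[511:m], i.e. src elements at the same positions
    tail = [x & 0xFFFF for x in src[len(packed):32]]
    return packed + tail

out = mask_compress_epi16([0xAAAA] * 32, 0b10110, list(range(32)))
print(out[:4])  # [1, 2, 4, 43690]
```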
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero.
size := 8
m := 0
FOR j := 0 to 63
i := j*8
IF k[j]
dst[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
dst[511:m] := 0
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src".
size := 8
m := 0
FOR j := 0 to 63
i := j*8
IF k[j]
dst[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
dst[511:m] := src[511:m]
dst[MAX:512] := 0
AVX512_VBMI2
Swizzle
Swizzle
Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 16
m := base_addr
FOR j := 0 to 31
i := j*16
IF k[j]
MEM[m+size-1:m] := a[i+15:i]
m := m + size
FI
ENDFOR
AVX512_VBMI2
Store
Swizzle
Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr".
size := 8
m := base_addr
FOR j := 0 to 63
i := j*8
IF k[j]
MEM[m+size-1:m] := a[i+7:i]
m := m + size
FI
ENDFOR
AVX512_VBMI2
Store
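The compress-store pseudocode above can be modelled with a bytearray standing in for memory. This is a sketch under assumptions: `mask_compressstoreu_epi8` is a hypothetical name, "base" is an index into the bytearray rather than a real address, and the returned byte count is an addition for illustration that the pseudocode does not define.

```python
def mask_compressstoreu_epi8(mem, base, k, a):
    """Write the active bytes of "a" contiguously into mem starting at index base."""
    m = base
    for j in range(64):
        if (k >> j) & 1:
            mem[m] = a[j] & 0xFF
            m += 1
    return m - base  # number of bytes written (illustrative extra)

mem = bytearray(8)
written = mask_compressstoreu_epi8(mem, 2, 0b1001, list(range(10, 74)))
print(written, list(mem))  # 2 [0, 0, 10, 13, 0, 0, 0, 0]
```

Bytes outside the written range are left untouched in this model, matching the absence of any other MEM assignment in the pseudocode.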
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
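The saturating word dot-product above can be checked with a small scalar model. The sketch below follows the pseudocode literally, using exact Python integers for the sum before clamping; the names `to_signed`, `saturate32` and `dpwssds_epi32` are hypothetical helpers, and vectors are assumed to be lists (8 dwords for "src", 16 words for "a" and "b").

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def saturate32(x):
    """Clamp x to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))

def dpwssds_epi32(src, a, b):
    """Per dword lane: src + a[2j]*b[2j] + a[2j+1]*b[2j+1], with signed saturation."""
    dst = []
    for j in range(8):
        tmp1 = to_signed(a[2 * j], 16) * to_signed(b[2 * j], 16)
        tmp2 = to_signed(a[2 * j + 1], 16) * to_signed(b[2 * j + 1], 16)
        dst.append(saturate32(to_signed(src[j], 32) + tmp1 + tmp2))
    return dst

print(dpwssds_epi32([INT32_MAX] + [0] * 7, [2, 3] * 8, [4, 5] * 8)[:2])  # [2147483647, 23]
```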
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
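Without saturation, the sum is simply stored into a 32-bit field. The model below assumes this means two's-complement wraparound, which is an interpretation of the pseudocode rather than something it states explicitly. The names `to_signed` and `dpwssd_epi32` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def dpwssd_epi32(src, a, b):
    """Per dword lane: src + a[2j]*b[2j] + a[2j+1]*b[2j+1], truncated to 32 bits."""
    dst = []
    for j in range(8):
        tmp1 = to_signed(a[2 * j], 16) * to_signed(b[2 * j], 16)
        tmp2 = to_signed(a[2 * j + 1], 16) * to_signed(b[2 * j + 1], 16)
        dst.append(to_signed(src[j] + tmp1 + tmp2, 32))  # assumed wraparound
    return dst

# An accumulator at INT32_MAX wraps to a negative value instead of clamping.
print(dpwssd_epi32([0x7FFFFFFF] + [0] * 7, [2, 3] * 8, [4, 5] * 8)[:2])  # [-2147483626, 23]
```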
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
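The byte form mixes signedness: bytes of "a" are zero-extended and bytes of "b" are sign-extended before multiplying. A scalar sketch of the saturating pseudocode above, with hypothetical helper names and lists standing in for vectors (8 dwords for "src", 32 bytes for "a" and "b"):

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def dpbusds_epi32(src, a, b):
    """Per dword lane: src + sum of 4 (unsigned a byte * signed b byte), saturated."""
    dst = []
    for j in range(8):
        acc = to_signed(src[j], 32)
        for t in range(4):
            # unsigned 8-bit times signed 8-bit; each product fits in signed 16 bits
            acc += (a[4 * j + t] & 0xFF) * to_signed(b[4 * j + t], 8)
        dst.append(max(INT32_MIN, min(INT32_MAX, acc)))
    return dst

print(dpbusds_epi32([0] * 8, [255] * 32, [0x80] * 32)[0])  # -130560
```

The extreme products are 255 * 127 = 32385 and 255 * -128 = -32640, both inside the signed 16-bit range, so the intermediate word results in the pseudocode never overflow.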
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 7
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:256] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 3
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:128] := 0
AVX512_VNNI
AVX512VL
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
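The saturating, zero-masked 16-bit dot product above can be modelled in scalar form. This is a sketch under stated assumptions: `saturate32` and `dpwssds_maskz` are hypothetical names, lanes are Python lists, and the sum is formed at full precision before it is clamped, which is how `Saturate32` is read here.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def saturate32(x):
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def dpwssds_maskz(k, src, a, b):
    """Hypothetical scalar model of the zeromask record above.

    k:   integer bitmask, bit j controls lane j
    src: signed 32-bit accumulators, one per lane
    a,b: signed 16-bit values, two per lane
    """
    dst = []
    for j, acc in enumerate(src):
        if (k >> j) & 1:
            tmp1 = a[2 * j] * b[2 * j]
            tmp2 = a[2 * j + 1] * b[2 * j + 1]
            dst.append(saturate32(acc + tmp1 + tmp2))
        else:
            dst.append(0)  # zeromask: lane cleared
    return dst
```

The non-saturating records differ only in the last step: the sum would wrap to 32 bits instead of being clamped.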
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 15
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 15
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 15
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ELSE
dst.dword[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
FOR j := 0 to 15
IF k[j]
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ELSE
dst.dword[j] := src.dword[j]
FI
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 15
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:512] := 0
AVX512_VNNI
Arithmetic
Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.
MEM[k1+15:k1] := 0
MEM[k2+15:k2] := 0
FOR i := 0 TO 15
FOR j := 0 TO 15
match := (a.dword[i] == b.dword[j] ? 1 : 0)
MEM[k1+15:k1].bit[i] |= match
MEM[k2+15:k2].bit[j] |= match
ENDFOR
ENDFOR
AVX512_VP2INTERSECT
AVX512F
Mask
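The intersection record above amounts to an all-pairs equality test that fills two bitmasks. A minimal scalar sketch follows; the name `vp2intersect` and the list-based lanes are assumptions made for illustration, and the masks are returned as plain integers rather than written through pointers.

```python
def vp2intersect(a, b):
    """Hypothetical scalar model: return (k1, k2) as integer bitmasks.

    Bit i of k1 is set when a[i] equals any element of b.
    Bit j of k2 is set when b[j] equals any element of a.
    """
    k1 = 0
    k2 = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                k1 |= 1 << i
                k2 |= 1 << j
    return k1, k2
```

For `a = [1, 2, 3, 4]` and `b = [4, 4, 9, 2]`, elements 1 and 3 of `a` have matches, and elements 0, 1 and 3 of `b` have matches, so the masks are `0b1010` and `0b1011`.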
Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.
MEM[k1+7:k1] := 0
MEM[k2+7:k2] := 0
FOR i := 0 TO 7
FOR j := 0 TO 7
match := (a.qword[i] == b.qword[j] ? 1 : 0)
MEM[k1+7:k1].bit[i] |= match
MEM[k2+7:k2].bit[j] |= match
ENDFOR
ENDFOR
AVX512_VP2INTERSECT
AVX512F
Mask
Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.
MEM[k1+7:k1] := 0
MEM[k2+7:k2] := 0
FOR i := 0 TO 3
FOR j := 0 TO 3
match := (a.dword[i] == b.dword[j] ? 1 : 0)
MEM[k1+7:k1].bit[i] |= match
MEM[k2+7:k2].bit[j] |= match
ENDFOR
ENDFOR
AVX512_VP2INTERSECT
AVX512VL
Mask
Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.
MEM[k1+7:k1] := 0
MEM[k2+7:k2] := 0
FOR i := 0 TO 7
FOR j := 0 TO 7
match := (a.dword[i] == b.dword[j] ? 1 : 0)
MEM[k1+7:k1].bit[i] |= match
MEM[k2+7:k2].bit[j] |= match
ENDFOR
ENDFOR
AVX512_VP2INTERSECT
AVX512VL
Mask
Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.
MEM[k1+7:k1] := 0
MEM[k2+7:k2] := 0
FOR i := 0 TO 1
FOR j := 0 TO 1
match := (a.qword[i] == b.qword[j] ? 1 : 0)
MEM[k1+7:k1].bit[i] |= match
MEM[k2+7:k2].bit[j] |= match
ENDFOR
ENDFOR
AVX512_VP2INTERSECT
AVX512VL
Mask
Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers.
MEM[k1+7:k1] := 0
MEM[k2+7:k2] := 0
FOR i := 0 TO 3
FOR j := 0 TO 3
match := (a.qword[i] == b.qword[j] ? 1 : 0)
MEM[k1+7:k1].bit[i] |= match
MEM[k2+7:k2].bit[j] |= match
ENDFOR
ENDFOR
AVX512_VP2INTERSECT
AVX512VL
Mask
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
ENDFOR
dst[MAX:256] := 0
AVX_IFMA
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
ENDFOR
dst[MAX:256] := 0
AVX_IFMA
Arithmetic
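The two 52-bit multiply-add records above (high half and low half) can be modelled with Python integers. This is a sketch only: `madd52` and its `high` flag are hypothetical names, and the 64-bit addition is assumed to wrap, which matches the fixed-width `dst[i+63:i]` assignment in the pseudocode.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def madd52(x, y, z, high):
    """Hypothetical scalar model of the 52-bit multiply-add records above.

    x: unsigned 64-bit accumulators, one per lane
    y, z: 64-bit lanes whose low 52 bits are multiplied
    high: True selects bits 103:52 of the product, False selects bits 51:0
    """
    dst = []
    for xj, yj, zj in zip(x, y, z):
        prod = (yj & MASK52) * (zj & MASK52)  # up to 104 bits
        part = (prod >> 52) & MASK52 if high else prod & MASK52
        dst.append((xj + part) & MASK64)  # assumed 64-bit wrap-around
    return dst
```

Bits 63:52 of each `y` and `z` lane are ignored, so setting them does not change the result.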
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
ENDFOR
dst[MAX:128] := 0
AVX_IFMA
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
ENDFOR
dst[MAX:128] := 0
AVX_IFMA
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
ENDFOR
dst[MAX:256] := 0
AVX_IFMA
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 3
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
ENDFOR
dst[MAX:256] := 0
AVX_IFMA
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[103:52])
ENDFOR
dst[MAX:128] := 0
AVX_IFMA
Arithmetic
Multiply packed unsigned 52-bit integers in each 64-bit element of "__Y" and "__Z" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "__X", and store the results in "dst".
FOR j := 0 to 1
i := j*64
tmp[127:0] := ZeroExtend64(__Y[i+51:i]) * ZeroExtend64(__Z[i+51:i])
dst[i+63:i] := __X[i+63:i] + ZeroExtend64(tmp[51:0])
ENDFOR
dst[MAX:128] := 0
AVX_IFMA
Arithmetic
Convert scalar BF16 (16-bit) floating-point element stored at memory locations starting at location "__A" to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
b := Convert_BF16_To_FP32(MEM[__A+15:__A])
FOR j := 0 to 7
m := j*32
dst[m+31:m] := b
ENDFOR
dst[MAX:256] := 0
AVX_NE_CONVERT
Convert
Convert scalar half-precision (16-bit) floating-point element stored at memory locations starting at location "__A" to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
b := Convert_FP16_To_FP32(MEM[__A+15:__A])
FOR j := 0 to 7
m := j*32
dst[m+31:m] := b
ENDFOR
dst[MAX:256] := 0
AVX_NE_CONVERT
Convert
Convert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
m := j*32
dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+15:__A+m])
ENDFOR
dst[MAX:256] := 0
AVX_NE_CONVERT
Convert
Convert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
m := j*32
dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+15:__A+m])
ENDFOR
dst[MAX:256] := 0
AVX_NE_CONVERT
Convert
Convert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
m := j*32
dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+31:__A+m+16])
ENDFOR
dst[MAX:256] := 0
AVX_NE_CONVERT
Convert
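The even-indexed and odd-indexed BF16 conversions above differ only in which 16-bit word of each 32-bit pair is read. A minimal sketch, assuming that `Convert_BF16_To_FP32` places the 16 BF16 bits in the upper half of an FP32 bit pattern; `bf16_to_fp32` and `cvt_bf16_lanes` are hypothetical names, and memory is represented as a list of 16-bit words.

```python
import struct


def bf16_to_fp32(bits16):
    """Widen a BF16 bit pattern to a Python float via the FP32 bit layout."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def cvt_bf16_lanes(words, odd):
    """Hypothetical model: convert the even- or odd-indexed 16-bit words.

    words: BF16 bit patterns in memory order
    odd:   False reads words 0, 2, 4, ...; True reads words 1, 3, 5, ...
    """
    start = 1 if odd else 0
    return [bf16_to_fp32(w) for w in words[start::2]]
```

Because BF16 shares the FP32 exponent width, the widening is exact and needs no rounding.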
Convert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
m := j*32
dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+31:__A+m+16])
ENDFOR
dst[MAX:256] := 0
AVX_NE_CONVERT
Convert
Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
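The helper `Convert_FP32_To_BF16` is not defined in this section. One plausible reading, given the "NE" in the feature name, is round-to-nearest-even on the upper 16 bits of the FP32 pattern. The sketch below implements that reading only; `fp32_to_bf16` is a hypothetical name, and hardware details such as denormal handling are not modelled.

```python
import struct


def fp32_to_bf16(value):
    """Assumed round-to-nearest-even narrowing of an FP32 value to BF16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Keep the result a NaN by forcing the quiet bit after truncation.
        return (bits >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit so that ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF
```

A value exactly halfway between two BF16 neighbours, such as 1 + 2^-8, rounds to the neighbour whose lowest mantissa bit is zero.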
Convert scalar BF16 (16-bit) floating-point element stored at memory locations starting at location "__A" to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
b := Convert_BF16_To_FP32(MEM[__A+15:__A])
FOR j := 0 to 3
m := j*32
dst[m+31:m] := b
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
Convert scalar half-precision (16-bit) floating-point element stored at memory locations starting at location "__A" to a single-precision (32-bit) floating-point, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
b := Convert_FP16_To_FP32(MEM[__A+15:__A])
FOR j := 0 to 3
m := j*32
dst[m+31:m] := b
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
Convert packed BF16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
m := j*32
dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+15:__A+m])
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
Convert packed half-precision (16-bit) floating-point even-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
m := j*32
dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+15:__A+m])
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
Convert packed BF16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
m := j*32
dst[m+31:m] := Convert_BF16_To_FP32(MEM[__A+m+31:__A+m+16])
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
Convert packed half-precision (16-bit) floating-point odd-indexed elements stored at memory locations starting at location "__A" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
m := j*32
dst[m+31:m] := Convert_FP16_To_FP32(MEM[__A+m+31:__A+m+16])
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
ENDFOR
dst[MAX:64] := 0
AVX_NE_CONVERT
Convert
Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
ENDFOR
dst[MAX:128] := 0
AVX_NE_CONVERT
Convert
Convert packed single-precision (32-bit) floating-point elements in "__A" to packed BF16 (16-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
dst.word[j] := Convert_FP32_To_BF16(__A.fp32[j])
ENDFOR
dst[MAX:64] := 0
AVX_NE_CONVERT
Convert
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j]))
tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1]))
tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2]))
tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3]))
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding signed 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := src.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j])
tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1])
dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of signed 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding signed 16-bit integers in "__B", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT16
Arithmetic
Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in "__A" with corresponding unsigned 16-bit integers in "__B", producing 2 intermediate unsigned 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT16
Arithmetic
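For the unsigned-by-unsigned word entries, the pseudocode applies UNSIGNED_DWORD_SATURATE to the accumulated sum. A minimal single-lane sketch follows, assuming "__W" is read as an unsigned 32-bit value; the names `unsigned_dword_saturate` and `dpwuuds_lane` are hypothetical and used only for illustration.

```python
def unsigned_dword_saturate(x):
    """Clamp an integer to the unsigned 32-bit range 0..0xFFFFFFFF."""
    return max(0, min(0xFFFFFFFF, x))

def dpwuuds_lane(w, a_words, b_words):
    """One lane: w plus two u16*u16 products, with unsigned saturation.

    a_words and b_words each hold the 2 unsigned 16-bit values of the lane.
    """
    total = w + sum(ua * ub for ua, ub in zip(a_words, b_words))
    return unsigned_dword_saturate(total)
```

Two maximal products alone (2 * 65535 * 65535) already exceed 32 bits, so the saturating form clamps where the non-saturating form would wrap.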
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 7
tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:256] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding signed 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := SignExtend16(__A.byte[4*j]) * SignExtend16(__B.byte[4*j])
tmp2.word := SignExtend16(__A.byte[4*j+1]) * SignExtend16(__B.byte[4*j+1])
tmp3.word := SignExtend16(__A.byte[4*j+2]) * SignExtend16(__B.byte[4*j+2])
tmp4.word := SignExtend16(__A.byte[4*j+3]) * SignExtend16(__B.byte[4*j+3])
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of signed 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with signed saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := Signed(SignExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j]))
tmp2.word := Signed(SignExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1]))
tmp3.word := Signed(SignExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2]))
tmp4.word := Signed(SignExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3]))
dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W", and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
dst.dword[j] := __W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT8
Arithmetic
Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "__A" with corresponding unsigned 8-bit integers in "__B", producing 4 intermediate unsigned 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "__W" with unsigned saturation, and store the packed 32-bit results in "dst".
FOR j := 0 to 3
tmp1.word := ZeroExtend16(__A.byte[4*j]) * ZeroExtend16(__B.byte[4*j])
tmp2.word := ZeroExtend16(__A.byte[4*j+1]) * ZeroExtend16(__B.byte[4*j+1])
tmp3.word := ZeroExtend16(__A.byte[4*j+2]) * ZeroExtend16(__B.byte[4*j+2])
tmp4.word := ZeroExtend16(__A.byte[4*j+3]) * ZeroExtend16(__B.byte[4*j+3])
dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2 + tmp3 + tmp4)
ENDFOR
dst[MAX:128] := 0
AVX_VNNI_INT8
Arithmetic
Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start".
tmp[511:0] := a
dst[31:0] := ZeroExtend32(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
BMI1
Bit Manipulation
Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control".
start := control[7:0]
len := control[15:8]
tmp[511:0] := a
dst[31:0] := ZeroExtend32(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
BMI1
Bit Manipulation
Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start".
tmp[511:0] := a
dst[63:0] := ZeroExtend64(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
BMI1
Bit Manipulation
Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control".
start := control[7:0]
len := control[15:8]
tmp[511:0] := a
dst[63:0] := ZeroExtend64(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
BMI1
Bit Manipulation
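The bit-field extraction above can be sketched as a scalar model. The pseudocode widens "a" to 512 bits so that any bit position past the operand width reads as zero; the sketch gets the same effect with a shift and a mask. The names `bextr` and `bextr_control` and the `width` parameter are hypothetical, chosen for illustration.

```python
def bextr(a, start, length, width=32):
    """Extract `length` bits of `a` beginning at bit `start`.

    Only the low 8 bits of start and length are used, as in the pseudocode.
    Bits beyond `width` read as zero.
    """
    start &= 0xFF
    length &= 0xFF
    field = (a & ((1 << width) - 1)) >> start
    return field & ((1 << length) - 1)

def bextr_control(a, control, width=32):
    """Variant taking start in control[7:0] and length in control[15:8]."""
    return bextr(a, control & 0xFF, (control >> 8) & 0xFF, width)
```

A length of 0 yields 0, and a start at or beyond the operand width also yields 0.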
Extract the lowest set bit from unsigned 32-bit integer "a" and set the corresponding bit in "dst". All other bits in "dst" are zeroed, and all bits are zeroed if no bits are set in "a".
dst := (-a) AND a
BMI1
Bit Manipulation
Extract the lowest set bit from unsigned 64-bit integer "a" and set the corresponding bit in "dst". All other bits in "dst" are zeroed, and all bits are zeroed if no bits are set in "a".
dst := (-a) AND a
BMI1
Bit Manipulation
Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 32-bit integer "a".
dst := (a - 1) XOR a
BMI1
Bit Manipulation
Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 64-bit integer "a".
dst := (a - 1) XOR a
BMI1
Bit Manipulation
Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a".
dst := (a - 1) AND a
BMI1
Bit Manipulation
Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a".
dst := (a - 1) AND a
BMI1
Bit Manipulation
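The three lowest-set-bit operations above reduce to one-line expressions. A sketch with explicit masking to the operand width, since Python integers are unbounded; the names `blsi`, `blsmsk`, `blsr` and the `width` parameter are illustrative helpers, not the intrinsic signatures.

```python
def blsi(a, width=32):
    """Isolate the lowest set bit: (-a) AND a."""
    mask = (1 << width) - 1
    return (-a) & a & mask

def blsmsk(a, width=32):
    """Mask up to and including the lowest set bit: (a - 1) XOR a."""
    mask = (1 << width) - 1
    return ((a - 1) ^ a) & mask

def blsr(a, width=32):
    """Reset the lowest set bit: (a - 1) AND a."""
    mask = (1 << width) - 1
    return (a - 1) & a & mask
```

For an input of 0, `blsi` and `blsr` return 0, while `blsmsk` returns all ones because `a - 1` borrows through every bit.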
Compute the bitwise NOT of 32-bit integer "a" and then AND with "b", and store the result in "dst".
dst[31:0] := ((NOT a[31:0]) AND b[31:0])
BMI1
Bit Manipulation
Compute the bitwise NOT of 64-bit integer "a" and then AND with "b", and store the result in "dst".
dst[63:0] := ((NOT a[63:0]) AND b[63:0])
BMI1
Bit Manipulation
Count the number of trailing zero bits in unsigned 16-bit integer "a", and return that count in "dst".
tmp := 0
dst := 0
DO WHILE ((tmp < 16) AND a[tmp] == 0)
tmp := tmp + 1
dst := dst + 1
OD
BMI1
Bit Manipulation
Count the number of trailing zero bits in unsigned 32-bit integer "a", and return that count in "dst".
tmp := 0
dst := 0
DO WHILE ((tmp < 32) AND a[tmp] == 0)
tmp := tmp + 1
dst := dst + 1
OD
BMI1
Bit Manipulation
Count the number of trailing zero bits in unsigned 64-bit integer "a", and return that count in "dst".
tmp := 0
dst := 0
DO WHILE ((tmp < 64) AND a[tmp] == 0)
tmp := tmp + 1
dst := dst + 1
OD
BMI1
Bit Manipulation
Count the number of trailing zero bits in unsigned 32-bit integer "a", and return that count in "dst".
tmp := 0
dst := 0
DO WHILE ((tmp < 32) AND a[tmp] == 0)
tmp := tmp + 1
dst := dst + 1
OD
BMI1
Bit Manipulation
Count the number of trailing zero bits in unsigned 64-bit integer "a", and return that count in "dst".
tmp := 0
dst := 0
DO WHILE ((tmp < 64) AND a[tmp] == 0)
tmp := tmp + 1
dst := dst + 1
OD
BMI1
Bit Manipulation
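The trailing-zero count loops above translate directly into a scalar sketch. The name `tzcnt` and the `width` parameter are hypothetical; the loop bound means an all-zero input returns the operand width.

```python
def tzcnt(a, width=32):
    """Count trailing zero bits of `a`, scanning at most `width` bits."""
    count = 0
    while count < width and ((a >> count) & 1) == 0:
        count += 1
    return count
```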
Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index".
n := index[7:0]
dst := a
IF (n < 32)
dst[31:n] := 0
FI
BMI2
Bit Manipulation
Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index".
n := index[7:0]
dst := a
IF (n < 64)
dst[63:n] := 0
FI
BMI2
Bit Manipulation
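A minimal sketch of the zero-high-bits operation above, assuming only the low 8 bits of "index" are used, as the pseudocode states. The name `bzhi` and the `width` parameter are illustrative.

```python
def bzhi(a, index, width=32):
    """Copy `a` and clear all bits at position index[7:0] and above."""
    n = index & 0xFF
    a &= (1 << width) - 1
    if n < width:
        a &= (1 << n) - 1  # keep only bits n-1..0
    return a
```

When the index is at or beyond the operand width, the value is returned unchanged.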
Deposit contiguous low bits from unsigned 32-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero.
tmp := a
dst := 0
m := 0
k := 0
DO WHILE m < 32
IF mask[m] == 1
dst[m] := tmp[k]
k := k + 1
FI
m := m + 1
OD
BMI2
Bit Manipulation
Deposit contiguous low bits from unsigned 64-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero.
tmp := a
dst := 0
m := 0
k := 0
DO WHILE m < 64
IF mask[m] == 1
dst[m] := tmp[k]
k := k + 1
FI
m := m + 1
OD
BMI2
Bit Manipulation
Extract bits from unsigned 32-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero.
tmp := a
dst := 0
m := 0
k := 0
DO WHILE m < 32
IF mask[m] == 1
dst[k] := tmp[m]
k := k + 1
FI
m := m + 1
OD
BMI2
Bit Manipulation
Extract bits from unsigned 64-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero.
tmp := a
dst := 0
m := 0
k := 0
DO WHILE m < 64
IF mask[m] == 1
dst[k] := tmp[m]
k := k + 1
FI
m := m + 1
OD
BMI2
Bit Manipulation
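The deposit and extract loops above are mirror images of each other: both walk the mask from bit 0 upward and advance a second counter only on set mask bits. A scalar sketch, with `pdep`, `pext`, and `width` as hypothetical names:

```python
def pdep(a, mask, width=32):
    """Deposit the low bits of `a` into the positions of set bits in `mask`."""
    dst = 0
    k = 0  # index of the next source bit
    for m in range(width):
        if (mask >> m) & 1:
            dst |= ((a >> k) & 1) << m
            k += 1
    return dst

def pext(a, mask, width=32):
    """Gather the bits of `a` selected by `mask` into contiguous low bits."""
    dst = 0
    k = 0  # index of the next destination bit
    for m in range(width):
        if (mask >> m) & 1:
            dst |= ((a >> m) & 1) << k
            k += 1
    return dst
```

Extracting with the same mask undoes a deposit for the bits that were deposited.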
Multiply unsigned 32-bit integers "a" and "b", store the low 32 bits of the result in "dst", and store the high 32 bits in "hi". This does not read or write arithmetic flags.
dst[31:0] := (a * b)[31:0]
MEM[hi+31:hi] := (a * b)[63:32]
BMI2
Arithmetic
Multiply unsigned 64-bit integers "a" and "b", store the low 64 bits of the result in "dst", and store the high 64 bits in "hi". This does not read or write arithmetic flags.
dst[63:0] := (a * b)[63:0]
MEM[hi+63:hi] := (a * b)[127:64]
BMI2
Arithmetic
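The widening multiply above splits a double-width product into two halves. A sketch that returns both halves as a tuple instead of writing the high half through a pointer; `mulx` and `width` are illustrative names.

```python
def mulx(a, b, width=32):
    """Return (low, high) halves of the unsigned product a * b."""
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)
    return product & mask, product >> width
```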
Increment the shadow stack pointer by 4 times the value specified in bits [7:0] of "a".
SSP := SSP + a[7:0] * 4
CET_SS
Miscellaneous
Increment the shadow stack pointer by 8 times the value specified in bits [7:0] of "a".
SSP := SSP + a[7:0] * 8
CET_SS
Miscellaneous
Read the low 32 bits of the current shadow stack pointer, and store the result in "dst".
dst := SSP[31:0]
CET_SS
Miscellaneous
Read the current shadow stack pointer, and store the result in "dst".
dst := SSP[63:0]
CET_SS
Miscellaneous
Save the previous shadow stack pointer context.
CET_SS
Miscellaneous
Restore the saved shadow stack pointer from the shadow stack restore token previously created on the shadow stack by saveprevssp.
CET_SS
Miscellaneous
Write 32-bit value in "val" to a shadow stack page in memory specified by "p".
CET_SS
Miscellaneous
Write 64-bit value in "val" to a shadow stack page in memory specified by "p".
CET_SS
Miscellaneous
Write 32-bit value in "val" to a user shadow stack page in memory specified by "p".
CET_SS
Miscellaneous
Write 64-bit value in "val" to a user shadow stack page in memory specified by "p".
CET_SS
Miscellaneous
Mark shadow stack pointed to by IA32_PL0_SSP as busy.
CET_SS
Miscellaneous
Mark shadow stack pointed to by "p" as not busy.
CET_SS
Miscellaneous
If CET is enabled, read the low 32 bits of the current shadow stack pointer, and store the result in "dst". Otherwise return 0.
dst := SSP[31:0]
CET_SS
Miscellaneous
If CET is enabled, read the current shadow stack pointer, and store the result in "dst". Otherwise return 0.
dst := SSP[63:0]
CET_SS
Miscellaneous
Increment the shadow stack pointer by 4 times the value specified in bits [7:0] of "a".
SSP := SSP + a[7:0] * 4
CET_SS
Miscellaneous
Hint to hardware that the cache line that contains "p" should be demoted from the cache closest to the processor core to a level more distant from the processor core.
CLDEMOTE
Miscellaneous
Invalidate and flush the cache line that contains "p" from all levels of the cache hierarchy.
CLFLUSHOPT
General Support
Write back to memory the cache line that contains "p" from any level of the cache hierarchy in the cache coherence domain.
CLWB
General Support
Compare the value in memory at "__A" with the value of "__B". If the condition specified by "__D" is met, add the third operand "__C" to the value at "__A" and write the sum back to "__A"; otherwise the value at "__A" is unchanged. The return value is the original value at "__A".
CASE (__D[3:0]) OF
0: OP := _CMPCCX_O
1: OP := _CMPCCX_NO
2: OP := _CMPCCX_B
3: OP := _CMPCCX_NB
4: OP := _CMPCCX_Z
5: OP := _CMPCCX_NZ
6: OP := _CMPCCX_BE
7: OP := _CMPCCX_NBE
8: OP := _CMPCCX_S
9: OP := _CMPCCX_NS
10: OP := _CMPCCX_P
11: OP := _CMPCCX_NP
12: OP := _CMPCCX_L
13: OP := _CMPCCX_NL
14: OP := _CMPCCX_LE
15: OP := _CMPCCX_NLE
ESAC
tmp1 := LOAD_LOCK(__A)
tmp2 := tmp1 + __C
IF (tmp1[31:0] OP __B[31:0])
STORE_UNLOCK(__A, tmp2)
ELSE
STORE_UNLOCK(__A, tmp1)
FI
dst[31:0] := tmp1[31:0]
CMPCCXADD
Arithmetic
Compare the value in memory at "__A" with the value of "__B". If the condition specified by "__D" is met, add the third operand "__C" to the value at "__A" and write the sum back to "__A"; otherwise the value at "__A" is unchanged. The return value is the original value at "__A".
CASE (__D[3:0]) OF
0: OP := _CMPCCX_O
1: OP := _CMPCCX_NO
2: OP := _CMPCCX_B
3: OP := _CMPCCX_NB
4: OP := _CMPCCX_Z
5: OP := _CMPCCX_NZ
6: OP := _CMPCCX_BE
7: OP := _CMPCCX_NBE
8: OP := _CMPCCX_S
9: OP := _CMPCCX_NS
10: OP := _CMPCCX_P
11: OP := _CMPCCX_NP
12: OP := _CMPCCX_L
13: OP := _CMPCCX_NL
14: OP := _CMPCCX_LE
15: OP := _CMPCCX_NLE
ESAC
tmp1 := LOAD_LOCK(__A)
tmp2 := tmp1 + __C
IF (tmp1[63:0] OP __B[63:0])
STORE_UNLOCK(__A, tmp2)
ELSE
STORE_UNLOCK(__A, tmp1)
FI
dst[63:0] := tmp1[63:0]
CMPCCXADD
Arithmetic
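The compare-and-conditionally-add flow above can be sketched as a scalar model. This sketch assumes the condition is evaluated on the flags of the subtraction tmp1 - __B, in the way a CMP instruction sets them; that reading is an assumption, since the pseudocode only writes `tmp1 OP __B`. Memory is modeled as a one-element list, atomicity (LOAD_LOCK / STORE_UNLOCK) is not modeled, and the name `cmpccxadd` is a hypothetical helper.

```python
def cmpccxadd(mem, b, c, cond, width=32):
    """Model: compare mem[0] with b; if condition `cond` holds, add c to mem[0].

    Returns the original value of mem[0]. `cond` uses the 0..15 encoding
    from the CASE table (0=O, 1=NO, 2=B, 3=NB, 4=Z, ..., 15=NLE).
    """
    mask = (1 << width) - 1
    x, y = mem[0] & mask, b & mask
    r = (x - y) & mask
    cf = x < y                                   # unsigned borrow
    zf = r == 0
    sf = bool(r >> (width - 1))                  # sign of the difference
    of = bool(((x ^ y) & (x ^ r)) >> (width - 1))  # signed overflow
    pf = bin(r & 0xFF).count("1") % 2 == 0       # even parity of low byte
    # Even codes select a base condition; odd codes are its negation.
    base = [of, cf, zf, cf or zf, sf, pf, sf != of, zf or sf != of]
    met = base[(cond & 0xF) >> 1] != bool(cond & 1)
    if met:
        mem[0] = (x + c) & mask
    return x
```

For example, with condition 12 (L) the addition happens only when the memory value is less than "__B" as signed integers, while condition 2 (B) uses the unsigned ordering.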
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 8-bit integer "v", and stores the result in "dst".
tmp1[7:0] := v[0:7] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[39:0] := tmp1[7:0] << 32
tmp4[39:0] := tmp2[31:0] << 8
tmp5[39:0] := tmp3[39:0] XOR tmp4[39:0]
tmp6[31:0] := MOD2(tmp5[39:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
CRC32
Cryptography
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 16-bit integer "v", and stores the result in "dst".
tmp1[15:0] := v[0:15] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[47:0] := tmp1[15:0] << 32
tmp4[47:0] := tmp2[31:0] << 16
tmp5[47:0] := tmp3[47:0] XOR tmp4[47:0]
tmp6[31:0] := MOD2(tmp5[47:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
CRC32
Cryptography
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 32-bit integer "v", and stores the result in "dst".
tmp1[31:0] := v[0:31] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[63:0] := tmp1[31:0] << 32
tmp4[63:0] := tmp2[31:0] << 32
tmp5[63:0] := tmp3[63:0] XOR tmp4[63:0]
tmp6[31:0] := MOD2(tmp5[63:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
CRC32
Cryptography
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 64-bit integer "v", and stores the result in "dst".
tmp1[63:0] := v[0:63] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[95:0] := tmp1[63:0] << 32
tmp4[95:0] := tmp2[31:0] << 64
tmp5[95:0] := tmp3[95:0] XOR tmp4[95:0]
tmp6[31:0] := MOD2(tmp5[95:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
CRC32
Cryptography
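The CRC32 pseudocode above (reflect, shift, XOR, reduce modulo the polynomial 0x11EDC6F41, reflect back) can be followed step by step in a scalar sketch. The helper names `reflect`, `mod2`, `crc32c_step`, and `crc32c` are hypothetical. The `crc32c` wrapper additionally assumes the common CRC-32C convention of an initial value of 0xFFFFFFFF and a final XOR with 0xFFFFFFFF, which is not part of the entries above.

```python
POLY = 0x11EDC6F41  # polynomial from the pseudocode (33 bits)

def reflect(x, nbits):
    """Reverse the bit order of an nbits-wide value."""
    return int(format(x, f"0{nbits}b")[::-1], 2)

def mod2(value, poly=POLY):
    """Remainder of carry-less (GF(2)) polynomial division."""
    while value.bit_length() >= poly.bit_length():
        value ^= poly << (value.bit_length() - poly.bit_length())
    return value

def crc32c_step(crc, v, nbits):
    """Accumulate an nbits-wide value v (8, 16, 32, or 64) into crc."""
    tmp1 = reflect(v & ((1 << nbits) - 1), nbits)
    tmp2 = reflect(crc & 0xFFFFFFFF, 32)
    tmp5 = (tmp1 << 32) ^ (tmp2 << nbits)
    return reflect(mod2(tmp5), 32)

def crc32c(data):
    """CRC-32C of a bytes object, one byte per step (assumed init/final XOR)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32c_step(crc, byte, 8)
    return crc ^ 0xFFFFFFFF
```

In this model, feeding a little-endian 32-bit word in one step gives the same result as feeding its four bytes in order, lowest byte first.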
Reads the 64-byte command pointed to by "__src", formats 64-byte enqueue store data, and performs a 64-byte enqueue store to the memory pointed to by "__dst". This intrinsic may only be used in User mode.
ENQCMD
Unknown
Reads the 64-byte command pointed to by "__src", formats 64-byte enqueue store data, and performs a 64-byte enqueue store to the memory pointed to by "__dst". This intrinsic may only be used in Privileged mode.
ENQCMD
Unknown
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 7
i := j*32
m := j*16
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ENDFOR
dst[MAX:256] := 0
F16C
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_imm_note]
FOR j := 0 to 7
i := 16*j
l := 32*j
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ENDFOR
dst[MAX:128] := 0
F16C
Convert
Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*32
m := j*16
dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m])
ENDFOR
dst[MAX:128] := 0
F16C
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst".
[round_imm_note]
FOR j := 0 to 3
i := 16*j
l := 32*j
dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l])
ENDFOR
dst[MAX:64] := 0
F16C
Convert
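The half-precision conversions above can be approximated element by element with Python's `struct` module, whose `e` format code is IEEE 754 binary16. This is a sketch under stated assumptions: packing rounds to nearest-even only, so the selectable rounding modes are not modeled, and Python floats are doubles rather than single-precision values. The function names are hypothetical.

```python
import struct

def half_bits_to_float(bits):
    """Interpret a 16-bit pattern as a half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]

def float_to_half_bits(value):
    """Round a float to half precision and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def cvtph_ps(a_halves):
    """Convert a list of half-precision bit patterns to floats."""
    return [half_bits_to_float(h) for h in a_halves]

def cvtps_ph(a_floats):
    """Convert a list of floats to half-precision bit patterns."""
    return [float_to_half_bits(f) for f in a_floats]
```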
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] * b[63:0]) + c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
FMA
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (subtract in even-indexed elements, add in odd-indexed elements), and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (odd-indexed elements are added, even-indexed elements are subtracted), and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (odd-indexed elements are added, even-indexed elements are subtracted), and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] * b[63:0]) - c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
FMA
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := (a[31:0] * b[31:0]) - c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (odd-indexed elements are subtracted, even-indexed elements are added), and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (odd-indexed elements are subtracted, even-indexed elements are added), and store the results in "dst".
FOR j := 0 to 3
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i]
ELSE
dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i]
FI
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (odd-indexed elements are subtracted, even-indexed elements are added), and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (odd-indexed elements are subtracted, even-indexed elements are added), and store the results in "dst".
FOR j := 0 to 7
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
ELSE
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i]
FI
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
FMA
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 3
i := j*64
dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:128] := 0
FMA
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst".
FOR j := 0 to 7
i := j*32
dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i]
ENDFOR
dst[MAX:256] := 0
FMA
Arithmetic
Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0]
dst[127:64] := a[127:64]
dst[MAX:128] := 0
FMA
Arithmetic
Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
FMA
Arithmetic
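The four non-alternating forms listed above differ only in two signs: whether the product is negated and whether "c" is added or subtracted. The Python sketch below makes that explicit for a single element. The function `fma_variant` and its keyword arguments are hypothetical names chosen for illustration, and the result is rounded twice, unlike a fused operation.

```python
def fma_variant(a, b, c, negate_product=False, subtract_c=False):
    """One element of the multiply-add family, selected by two sign flags."""
    product = -(a * b) if negate_product else (a * b)
    return product - c if subtract_c else product + c


a, b, c = 2.0, 3.0, 1.0
print(fma_variant(a, b, c))                                        #  (a*b) + c
print(fma_variant(a, b, c, subtract_c=True))                       #  (a*b) - c
print(fma_variant(a, b, c, negate_product=True))                   # -(a*b) + c
print(fma_variant(a, b, c, negate_product=True, subtract_c=True))  # -(a*b) - c
```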
Read the FS segment base register and store the 32-bit result in "dst".
dst[31:0] := FS_Segment_Base_Register
dst[63:32] := 0
FSGSBASE
General Support
Read the FS segment base register and store the 64-bit result in "dst".
dst[63:0] := FS_Segment_Base_Register
FSGSBASE
General Support
Read the GS segment base register and store the 32-bit result in "dst".
dst[31:0] := GS_Segment_Base_Register
dst[63:32] := 0
FSGSBASE
General Support
Read the GS segment base register and store the 64-bit result in "dst".
dst[63:0] := GS_Segment_Base_Register
FSGSBASE
General Support
Write the unsigned 32-bit integer "a" to the FS segment base register.
FS_Segment_Base_Register[31:0] := a[31:0]
FS_Segment_Base_Register[63:32] := 0
FSGSBASE
General Support
Write the unsigned 64-bit integer "a" to the FS segment base register.
FS_Segment_Base_Register[63:0] := a[63:0]
FSGSBASE
General Support
Write the unsigned 32-bit integer "a" to the GS segment base register.
GS_Segment_Base_Register[31:0] := a[31:0]
GS_Segment_Base_Register[63:32] := 0
FSGSBASE
General Support
Write the unsigned 64-bit integer "a" to the GS segment base register.
GS_Segment_Base_Register[63:0] := a[63:0]
FSGSBASE
General Support
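The 32-bit forms above do not preserve the upper half of the 64-bit value: a 32-bit read returns the low 32 bits zero-extended, and a 32-bit write clears bits 63:32 of the base. A minimal Python sketch of that bit handling, with the base register modeled as a plain integer (the function names are illustrative assumptions, not part of any API):

```python
MASK32 = 0xFFFFFFFF


def read_base_u32(base):
    """dst[31:0] := base[31:0]; dst[63:32] := 0."""
    return base & MASK32


def write_base_u32(a):
    """New 64-bit base after a 32-bit write: low half is a, high half is zero."""
    return a & MASK32


base = 0x1122334455667788
print(hex(read_base_u32(base)))         # low 32 bits only
print(hex(write_base_u32(0xDEADBEEF)))  # upper 32 bits of the base become zero
```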
Reload the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image at "mem_addr". This data should have been written to memory previously using the FXSAVE instruction, and in the same format as required by the operating mode. "mem_addr" must be aligned on a 16-byte boundary.
state_x87_fpu_mmx_sse := fxrstor(MEM[mem_addr+512*8-1:mem_addr])
FXSR
OS-Targeted
Reload the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image at "mem_addr". This data should have been written to memory previously using the FXSAVE64 instruction, and in the same format as required by the operating mode. "mem_addr" must be aligned on a 16-byte boundary.
state_x87_fpu_mmx_sse := fxrstor64(MEM[mem_addr+512*8-1:mem_addr])
FXSR
OS-Targeted
Save the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location at "mem_addr". The layout of the 512-byte region depends on the operating mode. Bytes [511:464] are available for software use and will not be overwritten by the processor.
MEM[mem_addr+512*8-1:mem_addr] := fxsave(state_x87_fpu_mmx_sse)
FXSR
OS-Targeted
Save the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location at "mem_addr". The layout of the 512-byte region depends on the operating mode. Bytes [511:464] are available for software use and will not be overwritten by the processor.
MEM[mem_addr+512*8-1:mem_addr] := fxsave64(state_x87_fpu_mmx_sse)
FXSR
OS-Targeted
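The save and reload entries above fix three numbers: a 512-byte image, a 16-byte alignment requirement on "mem_addr", and a software-available range of bytes 464 through 511. The Python sketch below only does the address arithmetic a caller might need when carving such a region out of a larger buffer. All names are hypothetical, and addresses are plain integers.

```python
FXSAVE_AREA_BYTES = 512
FXSAVE_ALIGNMENT = 16
SOFTWARE_BYTES = range(464, 512)  # bytes [511:464], not overwritten by the processor


def is_valid_fxsave_address(mem_addr):
    """True when mem_addr sits on a 16-byte boundary."""
    return mem_addr % FXSAVE_ALIGNMENT == 0


def aligned_offset(buffer_addr):
    """Smallest offset >= 0 that makes buffer_addr + offset 16-byte aligned."""
    return (-buffer_addr) % FXSAVE_ALIGNMENT


start = 0x1003
offset = aligned_offset(start)
print(offset, is_valid_fxsave_address(start + offset))
# A buffer of FXSAVE_AREA_BYTES + FXSAVE_ALIGNMENT - 1 bytes always has room.
```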
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 63
IF k[j]
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ELSE
dst.byte[j] := 0
FI
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
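The gf2p8mul_byte helper in the pseudocode above translates almost line for line into Python, which makes it convenient for checking test vectors. The sketch below is a direct transcription: a carry-less multiply into a 15-bit intermediate, then reduction by 0x11B (x^8 + x^4 + x^3 + x + 1) from bit 14 down to bit 8.

```python
def gf2p8mul_byte(src1byte, src2byte):
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11B."""
    tword = 0
    # Carry-less (XOR) multiplication: add a shifted copy per set bit of src2byte.
    for i in range(8):
        if (src2byte >> i) & 1:
            tword ^= src1byte << i
    # Reduce the up-to-15-bit product back into 8 bits.
    for i in range(14, 7, -1):
        if (tword >> i) & 1:
            tword ^= 0x11B << (i - 8)
    return tword & 0xFF


print(hex(gf2p8mul_byte(0x57, 0x83)))
```

Multiplying by 0x01 returns the other operand unchanged, and multiplying 0x80 by 0x02 overflows into bit 8 and is reduced to 0x1B.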
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 63
IF k[j]
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ELSE
dst.byte[j] := src.byte[j]
FI
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
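The zeromask and writemask forms above differ only in what lands in an element whose mask bit is clear. A minimal Python sketch of that selection step, independent of the operation that produced the per-element results (the helper name `apply_mask` and the list-based vectors are illustrative assumptions):

```python
def apply_mask(results, k, src=None):
    """Select per element: computed result if bit j of k is set, otherwise
    0 (zeromask, src is None) or src[j] (writemask)."""
    out = []
    for j, value in enumerate(results):
        if (k >> j) & 1:
            out.append(value)
        else:
            out.append(0 if src is None else src[j])
    return out


computed = [0x11, 0x22, 0x33, 0x44]
print(apply_mask(computed, 0b0101))                    # zeromask
print(apply_mask(computed, 0b0101, src=[9, 9, 9, 9]))  # writemask
```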
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 63
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 7
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := 0
FI
ENDFOR
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
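The affine_byte helper above treats the eight bytes of a qword as the rows of an 8 by 8 bit matrix, with byte[7-i] producing output bit i. A Python transcription follows. It assumes byte[0] is the least significant byte of the qword, under which the constant 0x0102040810204080 acts as the identity matrix.

```python
def parity(x):
    """XOR of the low eight bits of x."""
    return bin(x & 0xFF).count("1") & 1


def affine_byte(tsrc2qw, src1byte, imm8):
    """Compute A*x + b over GF(2): A is the qword, x the byte, b the immediate."""
    retbyte = 0
    for i in range(8):
        row = (tsrc2qw >> (8 * (7 - i))) & 0xFF  # tsrc2qw.byte[7-i]
        bit = parity(row & src1byte) ^ ((imm8 >> i) & 1)
        retbyte |= bit << i
    return retbyte


IDENTITY = 0x0102040810204080  # row for bit i has only bit i set

print(hex(affine_byte(IDENTITY, 0x3C, 0x00)))  # unchanged input
print(hex(affine_byte(IDENTITY, 0x3C, 0xFF)))  # input XOR 0xFF
```

Because every byte of a qword of "x" is transformed with the same qword of "A", one matrix applies to eight input bytes at a time.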
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 7
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := src.qword[j].byte[i]
FI
ENDFOR
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst".
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 7
FOR i := 0 to 7
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ENDFOR
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 7
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := 0
FI
ENDFOR
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
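The inverse form above first replaces each input byte by its multiplicative inverse in GF(2^8) and then applies the same matrix-plus-constant step. The Python sketch below transcribes affine_inverse_byte under two assumptions: byte[0] is the least significant byte of the qword, and inverse(0) is 0. With the matrix 0xF1E3C78F1F3E7CF8 and the constant 0x63 it reproduces the AES S-box, which gives convenient test values.

```python
def gf_mul(a, b):
    """GF(2^8) multiply with reduction polynomial 0x11B."""
    t = 0
    for i in range(8):
        if (b >> i) & 1:
            t ^= a << i
    for i in range(14, 7, -1):
        if (t >> i) & 1:
            t ^= 0x11B << (i - 8)
    return t & 0xFF


def gf_inverse(x):
    """x^254, which equals x^-1 for nonzero x and maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


def affine_inverse_byte(tsrc2qw, src1byte, imm8):
    inv = gf_inverse(src1byte)
    retbyte = 0
    for i in range(8):
        row = (tsrc2qw >> (8 * (7 - i))) & 0xFF  # tsrc2qw.byte[7-i]
        bit = (bin(row & inv).count("1") & 1) ^ ((imm8 >> i) & 1)
        retbyte |= bit << i
    return retbyte


AES_MATRIX = 0xF1E3C78F1F3E7CF8  # rows of the AES S-box affine map
AES_CONST = 0x63

print(hex(affine_inverse_byte(AES_MATRIX, 0x53, AES_CONST)))
```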
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 7
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := src.qword[j].byte[i]
FI
ENDFOR
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst".
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 7
FOR i := 0 to 7
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ENDFOR
ENDFOR
dst[MAX:512] := 0
GFNI
AVX512F
Arithmetic
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 31
IF k[j]
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ELSE
dst.byte[j] := 0
FI
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 31
IF k[j]
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ELSE
dst.byte[j] := src.byte[j]
FI
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 31
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 15
IF k[j]
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ELSE
dst.byte[j] := 0
FI
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 15
IF k[j]
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ELSE
dst.byte[j] := src.byte[j]
FI
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1.
DEFINE gf2p8mul_byte(src1byte, src2byte) {
tword := 0
FOR i := 0 to 7
IF src2byte.bit[i]
tword := tword XOR (src1byte << i)
FI
ENDFOR
FOR i := 14 downto 8
p := 0x11B << (i-8)
IF tword.bit[i]
tword := tword XOR p
FI
ENDFOR
RETURN tword.byte[0]
}
FOR j := 0 TO 15
dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j])
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 3
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := 0
FI
ENDFOR
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 3
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := src.qword[j].byte[i]
FI
ENDFOR
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst".
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 3
FOR i := 0 to 7
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ENDFOR
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 1
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := 0
FI
ENDFOR
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 1
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := src.qword[j].byte[i]
FI
ENDFOR
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst".
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 1
FOR i := 0 to 7
dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b)
ENDFOR
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 3
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := 0
FI
ENDFOR
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 3
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := src.qword[j].byte[i]
FI
ENDFOR
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst".
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 3
FOR i := 0 to 7
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ENDFOR
ENDFOR
dst[MAX:256] := 0
GFNI
AVX512VL
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 1
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := 0
FI
ENDFOR
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set).
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 1
FOR i := 0 to 7
IF k[j*8+i]
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ELSE
dst.qword[j].byte[i] := src.qword[j].byte[i]
FI
ENDFOR
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst".
DEFINE parity(x) {
t := 0
FOR i := 0 to 7
t := t XOR x.bit[i]
ENDFOR
RETURN t
}
DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) {
FOR i := 0 to 7
retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i]
ENDFOR
RETURN retbyte
}
FOR j := 0 TO 1
FOR i := 0 to 7
dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b)
ENDFOR
ENDFOR
dst[MAX:128] := 0
GFNI
AVX512VL
Arithmetic
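The per-byte helper in the pseudocode above can be modelled in portable C. This is a minimal scalar sketch, not the vector instruction: the function names are hypothetical, and the field inverse is found by exhaustive search for clarity rather than speed.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B). */
static inline uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) p ^= a;
        uint8_t carry = a & 0x80;
        a = (uint8_t)(a << 1);
        if (carry) a ^= 0x1B; /* reduce by the low byte of the polynomial */
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse by exhaustive search; inverse(0) is taken as 0. */
static inline uint8_t gf256_inverse(uint8_t x) {
    for (int y = 1; y < 256; y++)
        if (gf256_mul(x, (uint8_t)y) == 1) return (uint8_t)y;
    return 0;
}

/* XOR of all eight bits of x, as in parity() above. */
static inline unsigned parity8(uint8_t x) {
    unsigned t = 0;
    for (int i = 0; i < 8; i++) t ^= (x >> i) & 1u;
    return t;
}

/* Result bit i uses matrix row A.byte[7-i], as in affine_inverse_byte(). */
static inline uint8_t affine_inverse_byte_model(uint64_t A, uint8_t x, uint8_t b) {
    uint8_t inv = gf256_inverse(x);
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(A >> (8 * (7 - i)));
        unsigned bit = parity8(row & inv) ^ ((b >> i) & 1u);
        r |= (uint8_t)(bit << i);
    }
    return r;
}
```

With the identity matrix 0x0102040810204080 and b = 0 the model reduces to the plain field inverse. With the matrix 0xF1E3C78F1F3E7CF8 and b = 0x63 it reproduces AES S-box entries such as 0x53 -> 0xED.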
Provide a hint to the processor to selectively reset the prediction history of the current logical processor. The components to reset are specified by the signed 32-bit integer "__eax".
HRESET
General Support
Invalidate mappings in the Translation Lookaside Buffers (TLBs) and paging-structure caches for the processor context identifier (PCID) specified by "descriptor" based on the invalidation type specified in "type".
The PCID "descriptor" is specified as a 16-byte memory operand (with no alignment restrictions) where bits [11:0] specify the PCID, and bits [127:64] specify the linear address; bits [63:12] are reserved.
The types supported are:
0) Individual-address invalidation: If "type" is 0, the logical processor invalidates mappings for the single linear address specified in "descriptor" that are tagged with the PCID specified in "descriptor", except global translations. The instruction may also invalidate global translations, mappings for other linear addresses, or mappings tagged with other PCIDs.
1) Single-context invalidation: If "type" is 1, the logical processor invalidates all mappings tagged with the PCID specified in "descriptor" except global translations. In some cases, it may invalidate mappings for other PCIDs as well.
2) All-context invalidation: If "type" is 2, the logical processor invalidates all mappings tagged with any PCID, including global translations.
3) All-context invalidation, retaining global translations: If "type" is 3, the logical processor invalidates all mappings tagged with any PCID except global translations, ignoring "descriptor". The instruction may also invalidate global translations.
CASE type[1:0] OF
0: // individual-address invalidation retaining global translations
OP_PCID := MEM[descriptor+11:descriptor]
ADDR := MEM[descriptor+127:descriptor+64]
// invalidate mappings for ADDR tagged with OP_PCID except global translations
BREAK
1: // single PCID invalidation retaining globals
OP_PCID := MEM[descriptor+11:descriptor]
// invalidate all mappings tagged with OP_PCID except global translations
BREAK
2: // all PCID invalidation
// invalidate all mappings tagged with any PCID
BREAK
3: // all PCID invalidation retaining global translations
// invalidate all mappings tagged with any PCID except global translations
BREAK
ESAC
INVPCID
OS-Targeted
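The descriptor layout described above can be sketched as a C struct. This is an illustrative sketch only: the struct and function names are hypothetical, the layout assumes a little-endian target such as x86, and the code builds the operand without executing the privileged instruction.

```c
#include <stdint.h>

/* 16-byte descriptor: low qword holds the PCID, high qword the address. */
typedef struct {
    uint64_t pcid_and_reserved; /* bits [11:0] = PCID, bits [63:12] reserved */
    uint64_t linear_address;    /* bits [127:64] = linear address */
} invpcid_descriptor;

/* Build a descriptor, masking the PCID so the reserved bits stay zero. */
static inline invpcid_descriptor make_invpcid_descriptor(uint16_t pcid,
                                                         uint64_t addr) {
    invpcid_descriptor d;
    d.pcid_and_reserved = (uint64_t)(pcid & 0x0FFFu);
    d.linear_address = addr;
    return d;
}
```

The linear address field only matters for type 0, and the PCID field only for types 0 and 1.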
Flag
Decrypt 10 rounds of unsigned 8-bit integers in "__idata" using the 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".
MEM[__odata+127:__odata] := AES128Decrypt (__idata[127:0], __h[383:0])
dst := ZF
KEYLOCKER
Cryptography
Flag
Decrypt 14 rounds of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".
MEM[__odata+127:__odata] := AES256Decrypt (__idata[127:0], __h[511:0])
dst := ZF
KEYLOCKER
Cryptography
Flag
Encrypt 10 rounds of unsigned 8-bit integers in "__idata" using 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status.
MEM[__odata+127:__odata] := AES128Encrypt (__idata[127:0], __h[383:0])
dst := ZF
KEYLOCKER
Cryptography
Flag
Encrypt 14 rounds of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".
MEM[__odata+127:__odata] := AES256Encrypt (__idata[127:0], __h[511:0])
dst := ZF
KEYLOCKER
Cryptography
Flag
Wrap a 128-bit AES key from "__key" into a 384-bit key handle stored in "__h" and set IWKey's NoBackup and KeySource bits in "dst". The explicit source operand "__htype" specifies the handle restrictions.
__h[383:0] := WrapKey128(__key[127:0], __htype)
dst[0] := IWKey.NoBackup
dst[4:1] := IWKey.KeySource[3:0]
KEYLOCKER
Cryptography
Flag
Wrap a 256-bit AES key from "__key_hi" and "__key_lo" into a 512-bit key handle stored in "__h" and set IWKey's NoBackup and KeySource bits in "dst". The 32-bit "__htype" specifies the handle restrictions.
__h[511:0] := WrapKey256(__key_lo[127:0], __key_hi[127:0], __htype)
dst[0] := IWKey.NoBackup
dst[4:1] := IWKey.KeySource[3:0]
KEYLOCKER
Cryptography
Flag
Load internal wrapping key (IWKey). The 32-bit unsigned integer "__ctl" specifies IWKey's KeySource and whether backing up the key is permitted. IWKey's 256-bit encryption key is loaded from "__enkey_lo" and "__enkey_hi". IWKey's 128-bit integrity key is loaded from "__intkey".
KEYLOCKER
Cryptography
Flag
Decrypt 10 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".
FOR i := 0 to 7
__odata[i] := AES128Decrypt (__idata[i], __h[383:0])
ENDFOR
dst := ZF
KEYLOCKER_WIDE
Cryptography
Flag
Decrypt 14 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".
FOR i := 0 to 7
__odata[i] := AES256Decrypt (__idata[i], __h[511:0])
ENDFOR
dst := ZF
KEYLOCKER_WIDE
Cryptography
Flag
Encrypt 10 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 128-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".
FOR i := 0 to 7
__odata[i] := AES128Encrypt (__idata[i], __h[383:0])
ENDFOR
dst := ZF
KEYLOCKER_WIDE
Cryptography
Flag
Encrypt 14 rounds of 8 groups of unsigned 8-bit integers in "__idata" using the 256-bit AES key specified in "__h", store the resulting unsigned 8-bit integers into the corresponding elements of "__odata", and set "dst" to the ZF flag status. If an exception occurs, set the ZF flag to 1 and zero-initialize "__odata".
FOR i := 0 to 7
__odata[i] := AES256Encrypt (__idata[i], __h[511:0])
ENDFOR
dst := ZF
KEYLOCKER_WIDE
Cryptography
Count the number of leading zero bits in unsigned 32-bit integer "a", and return that count in "dst".
tmp := 31
dst := 0
DO WHILE (tmp >= 0 AND a[tmp] == 0)
tmp := tmp - 1
dst := dst + 1
OD
LZCNT
Bit Manipulation
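The loop above translates directly into portable C. This is a minimal sketch of the 32-bit case; the function name is hypothetical and the code is a reference model, not the hardware instruction.

```c
#include <stdint.h>

/* Count leading zero bits by scanning down from bit 31, as in the pseudocode. */
static inline unsigned lzcnt32_model(uint32_t a) {
    unsigned count = 0;
    for (int bit = 31; bit >= 0 && ((a >> bit) & 1u) == 0; bit--)
        count++;
    return count;
}
```

The result for a zero input is 32, because the loop runs until the bit index goes below zero. This differs from GCC's __builtin_clz, whose result is undefined for 0.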
Count the number of leading zero bits in unsigned 64-bit integer "a", and return that count in "dst".
tmp := 63
dst := 0
DO WHILE (tmp >= 0 AND a[tmp] == 0)
tmp := tmp - 1
dst := dst + 1
OD
LZCNT
Bit Manipulation
Copy 64-bit integer "a" to "dst".
dst[63:0] := a[63:0]
MMX
Convert
Copy 64-bit integer "a" to "dst".
dst[63:0] := a[63:0]
MMX
Convert
Copy 32-bit integer "a" to the lower elements of "dst", and zero the upper element of "dst".
dst[31:0] := a[31:0]
dst[63:32] := 0
MMX
Convert
Copy the lower 32-bit integer in "a" to "dst".
dst[31:0] := a[31:0]
MMX
Convert
Copy 32-bit integer "a" to the lower elements of "dst", and zero the upper element of "dst".
dst[31:0] := a[31:0]
dst[63:32] := 0
MMX
Convert
Copy the lower 32-bit integer in "a" to "dst".
dst[31:0] := a[31:0]
MMX
Convert
Copy 64-bit integer "a" to "dst".
dst[63:0] := a[63:0]
MMX
Convert
Copy 64-bit integer "a" to "dst".
dst[63:0] := a[63:0]
MMX
Convert
Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.
MMX
General Support
Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.
MMX
General Support
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
dst[7:0] := Saturate8(a[15:0])
dst[15:8] := Saturate8(a[31:16])
dst[23:16] := Saturate8(a[47:32])
dst[31:24] := Saturate8(a[63:48])
dst[39:32] := Saturate8(b[15:0])
dst[47:40] := Saturate8(b[31:16])
dst[55:48] := Saturate8(b[47:32])
dst[63:56] := Saturate8(b[63:48])
MMX
Miscellaneous
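The signed-saturating pack above can be modelled on a plain 64-bit integer. This is a sketch under assumptions: the register is represented as a uint64_t, the function names are hypothetical, and the narrowing casts rely on two's-complement conversion as provided by GCC, Clang, and MSVC.

```c
#include <stdint.h>

/* Clamp a signed 16-bit value to [-128, 127] and return its byte pattern. */
static inline uint8_t saturate_s16_to_s8(int16_t v) {
    if (v > 127) v = 127;
    if (v < -128) v = -128;
    return (uint8_t)(int8_t)v;
}

/* Words of a fill the low four result bytes, words of b the high four. */
static inline uint64_t packs_pi16_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int16_t wa = (int16_t)(uint16_t)(a >> (16 * j));
        int16_t wb = (int16_t)(uint16_t)(b >> (16 * j));
        dst |= (uint64_t)saturate_s16_to_s8(wa) << (8 * j);
        dst |= (uint64_t)saturate_s16_to_s8(wb) << (8 * (j + 4));
    }
    return dst;
}
```

For example, the words 300 and -300 become 0x7F and 0x80, while 1 and -1 pass through as 0x01 and 0xFF.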
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
dst[15:0] := Saturate16(a[31:0])
dst[31:16] := Saturate16(a[63:32])
dst[47:32] := Saturate16(b[31:0])
dst[63:48] := Saturate16(b[63:32])
MMX
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
dst[7:0] := SaturateU8(a[15:0])
dst[15:8] := SaturateU8(a[31:16])
dst[23:16] := SaturateU8(a[47:32])
dst[31:24] := SaturateU8(a[63:48])
dst[39:32] := SaturateU8(b[15:0])
dst[47:40] := SaturateU8(b[31:16])
dst[55:48] := SaturateU8(b[47:32])
dst[63:56] := SaturateU8(b[63:48])
MMX
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
dst[7:0] := Saturate8(a[15:0])
dst[15:8] := Saturate8(a[31:16])
dst[23:16] := Saturate8(a[47:32])
dst[31:24] := Saturate8(a[63:48])
dst[39:32] := Saturate8(b[15:0])
dst[47:40] := Saturate8(b[31:16])
dst[55:48] := Saturate8(b[47:32])
dst[63:56] := Saturate8(b[63:48])
MMX
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
dst[15:0] := Saturate16(a[31:0])
dst[31:16] := Saturate16(a[63:32])
dst[47:32] := Saturate16(b[31:0])
dst[63:48] := Saturate16(b[63:32])
MMX
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
dst[7:0] := SaturateU8(a[15:0])
dst[15:8] := SaturateU8(a[31:16])
dst[23:16] := SaturateU8(a[47:32])
dst[31:24] := SaturateU8(a[63:48])
dst[39:32] := SaturateU8(b[15:0])
dst[47:40] := SaturateU8(b[31:16])
dst[55:48] := SaturateU8(b[47:32])
dst[63:56] := SaturateU8(b[63:48])
MMX
Miscellaneous
Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_BYTES(src1[63:0], src2[63:0]) {
dst[7:0] := src1[39:32]
dst[15:8] := src2[39:32]
dst[23:16] := src1[47:40]
dst[31:24] := src2[47:40]
dst[39:32] := src1[55:48]
dst[47:40] := src2[55:48]
dst[55:48] := src1[63:56]
dst[63:56] := src2[63:56]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_HIGH_BYTES(a[63:0], b[63:0])
MMX
Swizzle
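The byte interleave above can be written as a short loop. This is a minimal sketch that models the 64-bit register as a uint64_t; the function name is hypothetical.

```c
#include <stdint.h>

/* Interleave the four high bytes of a and b: a4, b4, a5, b5, a6, b6, a7, b7. */
static inline uint64_t unpackhi_pi8_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        uint64_t byte_a = (a >> (32 + 8 * j)) & 0xFF;
        uint64_t byte_b = (b >> (32 + 8 * j)) & 0xFF;
        dst |= byte_a << (16 * j);     /* even result bytes come from a */
        dst |= byte_b << (16 * j + 8); /* odd result bytes come from b */
    }
    return dst;
}
```

Passing zero as the second operand zero-extends the four high bytes of the first operand to 16-bit words.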
Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_WORDS(src1[63:0], src2[63:0]) {
dst[15:0] := src1[47:32]
dst[31:16] := src2[47:32]
dst[47:32] := src1[63:48]
dst[63:48] := src2[63:48]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_HIGH_WORDS(a[63:0], b[63:0])
MMX
Swizzle
Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".
dst[31:0] := a[63:32]
dst[63:32] := b[63:32]
MMX
Swizzle
Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_BYTES(src1[63:0], src2[63:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_BYTES(a[63:0], b[63:0])
MMX
Swizzle
Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_WORDS(src1[63:0], src2[63:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_WORDS(a[63:0], b[63:0])
MMX
Swizzle
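A scalar model of the low-half word interleave above, assuming the 64-bit register is represented as a uint64_t. The function name is hypothetical.

```c
#include <stdint.h>

/* Interleave the two low words of a and b: a0, b0, a1, b1. */
static inline uint64_t unpacklo_pi16_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 2; j++) {
        uint64_t word_a = (a >> (16 * j)) & 0xFFFF;
        uint64_t word_b = (b >> (16 * j)) & 0xFFFF;
        dst |= word_a << (32 * j);
        dst |= word_b << (32 * j + 16);
    }
    return dst;
}
```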
Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".
dst[31:0] := a[31:0]
dst[63:32] := b[31:0]
MMX
Swizzle
Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_BYTES(src1[63:0], src2[63:0]) {
dst[7:0] := src1[39:32]
dst[15:8] := src2[39:32]
dst[23:16] := src1[47:40]
dst[31:24] := src2[47:40]
dst[39:32] := src1[55:48]
dst[47:40] := src2[55:48]
dst[55:48] := src1[63:56]
dst[63:56] := src2[63:56]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_HIGH_BYTES(a[63:0], b[63:0])
MMX
Swizzle
Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_WORDS(src1[63:0], src2[63:0]) {
dst[15:0] := src1[47:32]
dst[31:16] := src2[47:32]
dst[47:32] := src1[63:48]
dst[63:48] := src2[63:48]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_HIGH_WORDS(a[63:0], b[63:0])
MMX
Swizzle
Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".
dst[31:0] := a[63:32]
dst[63:32] := b[63:32]
MMX
Swizzle
Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_BYTES(src1[63:0], src2[63:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_BYTES(a[63:0], b[63:0])
MMX
Swizzle
Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_WORDS(src1[63:0], src2[63:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
RETURN dst[63:0]
}
dst[63:0] := INTERLEAVE_WORDS(a[63:0], b[63:0])
MMX
Swizzle
Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".
dst[31:0] := a[31:0]
dst[63:32] := b[31:0]
MMX
Swizzle
Add packed 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ENDFOR
MMX
Arithmetic
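The lane-wise addition above wraps within each byte and never carries into the neighbouring lane. A minimal scalar sketch, with a hypothetical function name and the register modelled as a uint64_t:

```c
#include <stdint.h>

/* Add eight independent 8-bit lanes; each sum is truncated to 8 bits. */
static inline uint64_t add_pi8_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        uint8_t sum = (uint8_t)((a >> (8 * j)) + (b >> (8 * j)));
        dst |= (uint64_t)sum << (8 * j);
    }
    return dst;
}
```

Adding 0x01 to a lane holding 0xFF gives 0x00 in that lane and leaves the next lane untouched, whereas an ordinary 64-bit addition would carry into it.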
Add packed 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ENDFOR
MMX
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
MMX
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ENDFOR
MMX
Arithmetic
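Signed saturation clamps each lane to the range [-128, 127] instead of wrapping. The following is a sketch of the operation above in portable C. The function name is hypothetical, and the int8_t casts assume two's-complement conversion.

```c
#include <stdint.h>

/* Add eight signed 8-bit lanes, clamping each sum to [-128, 127]. */
static inline uint64_t adds_pi8_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        int lane_a = (int8_t)(uint8_t)(a >> (8 * j));
        int lane_b = (int8_t)(uint8_t)(b >> (8 * j));
        int sum = lane_a + lane_b; /* computed in int, so it cannot overflow */
        if (sum > 127) sum = 127;
        if (sum < -128) sum = -128;
        dst |= (uint64_t)(uint8_t)sum << (8 * j);
    }
    return dst;
}
```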
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ENDFOR
MMX
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ENDFOR
MMX
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ENDFOR
MMX
Arithmetic
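Unsigned saturation clamps each 16-bit lane at 0xFFFF. A minimal scalar sketch of the operation above, with a hypothetical function name:

```c
#include <stdint.h>

/* Add four unsigned 16-bit lanes, clamping each sum at 0xFFFF. */
static inline uint64_t adds_pu16_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        uint32_t sum = (uint32_t)((a >> (16 * j)) & 0xFFFF)
                     + (uint32_t)((b >> (16 * j)) & 0xFFFF);
        if (sum > 0xFFFF) sum = 0xFFFF;
        dst |= (uint64_t)sum << (16 * j);
    }
    return dst;
}
```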
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ENDFOR
MMX
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ENDFOR
MMX
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
MMX
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ENDFOR
MMX
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ENDFOR
MMX
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ENDFOR
MMX
Arithmetic
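With unsigned saturation a negative difference clamps to zero rather than wrapping. A minimal sketch of the 8-bit case above, with a hypothetical function name:

```c
#include <stdint.h>

/* Subtract eight unsigned 8-bit lanes, clamping negative results to 0. */
static inline uint64_t subs_pu8_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        int diff = (int)((a >> (8 * j)) & 0xFF) - (int)((b >> (8 * j)) & 0xFF);
        if (diff < 0) diff = 0;
        dst |= (uint64_t)diff << (8 * j);
    }
    return dst;
}
```

This makes the operation a convenient building block for max(a - b, 0) on byte data such as pixel values.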
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ENDFOR
MMX
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ENDFOR
MMX
Arithmetic
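The multiply-and-add above can be sketched in scalar C as follows. The function name is hypothetical, the int16_t casts assume two's-complement conversion, and the final addition is done in unsigned arithmetic so that the one overflowing case wraps instead of invoking undefined behaviour.

```c
#include <stdint.h>

/* Multiply four signed 16-bit pairs and add adjacent 32-bit products. */
static inline uint64_t madd_pi16_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 2; j++) {
        int32_t a_lo = (int16_t)(uint16_t)(a >> (32 * j));
        int32_t a_hi = (int16_t)(uint16_t)(a >> (32 * j + 16));
        int32_t b_lo = (int16_t)(uint16_t)(b >> (32 * j));
        int32_t b_hi = (int16_t)(uint16_t)(b >> (32 * j + 16));
        /* Each product fits in 32 bits; only the sum can wrap. */
        uint32_t sum = (uint32_t)(a_hi * b_hi) + (uint32_t)(a_lo * b_lo);
        dst |= (uint64_t)sum << (32 * j);
    }
    return dst;
}
```

The sum wraps only when all four inputs of a pair are -32768, which yields 0x80000000.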
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ENDFOR
MMX
Arithmetic
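A scalar model of the high-half multiply above. This is a sketch with a hypothetical function name; the int16_t casts assume two's-complement conversion.

```c
#include <stdint.h>

/* Multiply four signed 16-bit lanes and keep bits [31:16] of each product. */
static inline uint64_t mulhi_pi16_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int32_t lane_a = (int16_t)(uint16_t)(a >> (16 * j));
        int32_t lane_b = (int16_t)(uint16_t)(b >> (16 * j));
        int32_t product = lane_a * lane_b; /* always fits in 32 bits */
        uint16_t high = (uint16_t)((uint32_t)product >> 16);
        dst |= (uint64_t)high << (16 * j);
    }
    return dst;
}
```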
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*16
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[15:0]
ENDFOR
MMX
Arithmetic
Add packed 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ENDFOR
MMX
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ENDFOR
MMX
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
MMX
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ENDFOR
MMX
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ENDFOR
MMX
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ENDFOR
MMX
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ENDFOR
MMX
Arithmetic
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ENDFOR
MMX
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ENDFOR
MMX
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
MMX
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ENDFOR
MMX
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ENDFOR
MMX
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ENDFOR
MMX
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ENDFOR
MMX
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ENDFOR
MMX
Arithmetic
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ENDFOR
MMX
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*16
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[15:0]
ENDFOR
MMX
Arithmetic
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ENDFOR
MMX
Shift
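The packed left shift above applies one shift count to every lane, and any count above 15 clears the result. A minimal sketch in portable C, with a hypothetical function name:

```c
#include <stdint.h>

/* Shift four 16-bit lanes left by the same count; counts above 15 give 0. */
static inline uint64_t sll_pi16_model(uint64_t a, uint64_t count) {
    if (count > 15) return 0;
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        uint16_t lane = (uint16_t)(a >> (16 * j));
        uint16_t shifted = (uint16_t)((uint32_t)lane << count);
        dst |= (uint64_t)shifted << (16 * j);
    }
    return dst;
}
```

Bits shifted out of a lane are discarded and do not spill into the next lane.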
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst".
IF count[63:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] << count[63:0])
FI
MMX
Shift
Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst".
IF imm8[7:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] << imm8[7:0])
FI
MMX
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
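The arithmetic right shift above fills vacated bits with copies of each lane's sign bit. This sketch builds the fill explicitly instead of relying on C's implementation-defined right shift of negative values; the function name is hypothetical.

```c
#include <stdint.h>

/* Shift four 16-bit lanes right, replicating each lane's sign bit. */
static inline uint64_t sra_pi16_model(uint64_t a, uint64_t count) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        uint16_t lane = (uint16_t)(a >> (16 * j));
        uint16_t sign_fill = (lane & 0x8000) ? 0xFFFF : 0x0000;
        uint16_t result;
        if (count > 15) {
            result = sign_fill; /* every bit becomes the sign bit */
        } else {
            /* The cast to uint16_t drops fill bits above bit 15. */
            result = (uint16_t)((lane >> count)
                     | ((uint32_t)sign_fill << (16 - count)));
        }
        dst |= (uint64_t)result << (16 * j);
    }
    return dst;
}
```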
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst".
IF count[63:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] >> count[63:0])
FI
MMX
Shift
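In C, shifting a 64-bit value by 64 or more is undefined, so a scalar model of the operation above needs the explicit range check. A minimal sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Logical right shift of a 64-bit value; counts above 63 give 0. */
static inline uint64_t srl_si64_model(uint64_t a, uint64_t count) {
    return count > 63 ? 0 : a >> count;
}
```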
Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst".
IF imm8[7:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] >> imm8[7:0])
FI
MMX
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst".
IF count[63:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] << count[63:0])
FI
MMX
Shift
Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst".
IF imm8[7:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] << imm8[7:0])
FI
MMX
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
MMX
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
MMX
Shift
Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst".
IF count[63:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] >> count[63:0])
FI
MMX
Shift
Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst".
IF imm8[7:0] > 63
dst[63:0] := 0
ELSE
dst[63:0] := ZeroExtend64(a[63:0] >> imm8[7:0])
FI
MMX
Shift
Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[63:0] := (a[63:0] AND b[63:0])
MMX
Logical
Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
dst[63:0] := ((NOT a[63:0]) AND b[63:0])
MMX
Logical
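The operand order above is easy to get backwards: the first operand is inverted, not the second. A minimal scalar sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Bitwise (NOT a) AND b: clears in b every bit that is set in a. */
static inline uint64_t andnot_si64_model(uint64_t a, uint64_t b) {
    return ~a & b;
}
```

A typical use is passing a mask as the first operand to clear the masked bits of the second.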
Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[63:0] := (a[63:0] OR b[63:0])
MMX
Logical
Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[63:0] := (a[63:0] XOR b[63:0])
MMX
Logical
Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[63:0] := (a[63:0] AND b[63:0])
MMX
Logical
Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
dst[63:0] := ((NOT a[63:0]) AND b[63:0])
MMX
Logical
Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[63:0] := (a[63:0] OR b[63:0])
MMX
Logical
Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[63:0] := (a[63:0] XOR b[63:0])
MMX
Logical
Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
ENDFOR
MMX
Compare
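The comparison above yields a per-lane mask of all ones or all zeros rather than a boolean. A minimal sketch in portable C, with a hypothetical function name:

```c
#include <stdint.h>

/* Compare eight 8-bit lanes; equal lanes become 0xFF, others 0x00. */
static inline uint64_t cmpeq_pi8_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        uint8_t lane_a = (uint8_t)(a >> (8 * j));
        uint8_t lane_b = (uint8_t)(b >> (8 * j));
        if (lane_a == lane_b) dst |= 0xFFULL << (8 * j);
    }
    return dst;
}
```

The mask can be combined with the bitwise AND and AND-NOT operations to select between two values lane by lane.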
Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
MMX
Compare
Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
MMX
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
ENDFOR
MMX
Compare
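The greater-than comparison treats each lane as a signed value, so 0xFF (-1) is less than 0x01. A scalar sketch of the 8-bit case above; the function name is hypothetical and the int8_t casts assume two's-complement conversion.

```c
#include <stdint.h>

/* Signed greater-than on eight 8-bit lanes; true lanes become 0xFF. */
static inline uint64_t cmpgt_pi8_model(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        int8_t lane_a = (int8_t)(uint8_t)(a >> (8 * j));
        int8_t lane_b = (int8_t)(uint8_t)(b >> (8 * j));
        if (lane_a > lane_b) dst |= 0xFFULL << (8 * j);
    }
    return dst;
}
```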
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
MMX
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
MMX
Compare
Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
ENDFOR
MMX
Compare
Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
MMX
Compare
Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
MMX
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
ENDFOR
MMX
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
MMX
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
MMX
Compare
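The packed compares above can be modelled in portable C to make the per-lane masks concrete. This is a minimal sketch, not the intrinsics themselves: a uint64_t stands in for the __m64 register, and the names lane_s8, model_cmpeq_pi8 and model_cmpgt_pi8 are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the 8-bit lane that starts at bit i of v. */
static int lane_s8(uint64_t v, int i) {
    int x = (int)((v >> i) & 0xFF);
    return x >= 128 ? x - 256 : x;
}

/* Equality: a lane becomes 0xFF when the two bytes match, 0x00 otherwise. */
static uint64_t model_cmpeq_pi8(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        int i = j * 8;
        if (((a >> i) & 0xFF) == ((b >> i) & 0xFF))
            dst |= (uint64_t)0xFF << i;
    }
    return dst;
}

/* Greater-than: lanes are compared as signed 8-bit values. */
static uint64_t model_cmpgt_pi8(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 8; j++) {
        int i = j * 8;
        if (lane_s8(a, i) > lane_s8(b, i))
            dst |= (uint64_t)0xFF << i;
    }
    return dst;
}
```

The resulting mask is typically combined with AND / ANDNOT / OR to select between two vectors without branching.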
Return vector of type __m64 with all elements set to zero.
dst[MAX:0] := 0
MMX
Set
Set packed 32-bit integers in "dst" with the supplied values.
dst[31:0] := e0
dst[63:32] := e1
MMX
Set
Set packed 16-bit integers in "dst" with the supplied values.
dst[15:0] := e0
dst[31:16] := e1
dst[47:32] := e2
dst[63:48] := e3
MMX
Set
Set packed 8-bit integers in "dst" with the supplied values.
dst[7:0] := e0
dst[15:8] := e1
dst[23:16] := e2
dst[31:24] := e3
dst[39:32] := e4
dst[47:40] := e5
dst[55:48] := e6
dst[63:56] := e7
MMX
Set
Broadcast 32-bit integer "a" to all elements of "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
MMX
Set
Broadcast 16-bit integer "a" to all elements of "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := a[15:0]
ENDFOR
MMX
Set
Broadcast 8-bit integer "a" to all elements of "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := a[7:0]
ENDFOR
MMX
Set
Set packed 32-bit integers in "dst" with the supplied values in reverse order.
dst[31:0] := e1
dst[63:32] := e0
MMX
Set
Set packed 16-bit integers in "dst" with the supplied values in reverse order.
dst[15:0] := e3
dst[31:16] := e2
dst[47:32] := e1
dst[63:48] := e0
MMX
Set
Set packed 8-bit integers in "dst" with the supplied values in reverse order.
dst[7:0] := e7
dst[15:8] := e6
dst[23:16] := e5
dst[31:24] := e4
dst[39:32] := e3
dst[47:40] := e2
dst[55:48] := e1
dst[63:56] := e0
MMX
Set
Arm address monitoring hardware using the address specified in "p". A store to an address within the specified address range triggers the monitoring hardware. Specify optional extensions in "extensions", and optional hints in "hints".
MONITOR
General Support
Hint to the processor that it can enter an implementation-dependent-optimized state while waiting for an event or store operation to the address range specified by MONITOR.
MONITOR
General Support
Load 16 bits from memory, perform a byte swap operation, and store the result in "dst".
FOR j := 0 to 1
i := j*8
dst[i+7:i] := MEM[ptr+15-i:ptr+8-i]
ENDFOR
MOVBE
Load
Load 32 bits from memory, perform a byte swap operation, and store the result in "dst".
FOR j := 0 to 3
i := j*8
dst[i+7:i] := MEM[ptr+31-i:ptr+24-i]
ENDFOR
MOVBE
Load
Load 64 bits from memory, perform a byte swap operation, and store the result in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := MEM[ptr+63-i:ptr+56-i]
ENDFOR
MOVBE
Load
Perform a byte swap operation of the 16 bits in "data", and store the results to memory.
FOR j := 0 to 1
i := j*8
MEM[ptr+i+7:ptr+i] := data[15-i:8-i]
ENDFOR
MOVBE
Store
Perform a byte swap operation of the 32 bits in "data", and store the results to memory.
addr := MEM[ptr]
FOR j := 0 to 3
i := j*8
MEM[ptr+i+7:ptr+i] := data[31-i:24-i]
ENDFOR
MOVBE
Store
Perform a byte swap operation of the 64 bits in "data", and store the results to memory.
addr := MEM[ptr]
FOR j := 0 to 7
i := j*8
MEM[ptr+i+7:ptr+i] := data[63-i:56-i]
ENDFOR
MOVBE
Store
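On a little-endian host, the byte-swapping loads and stores above amount to reading and writing big-endian values in memory. The sketch below shows the 32-bit case in portable C; model_loadbe_i32 and model_storebe_i32 are hypothetical names and do not use the MOVBE instruction.

```c
#include <stdint.h>

/* Read 4 bytes at ptr as a big-endian 32-bit value:
   the byte at the lowest address becomes the most significant byte. */
static uint32_t model_loadbe_i32(const unsigned char *ptr) {
    return ((uint32_t)ptr[0] << 24) | ((uint32_t)ptr[1] << 16) |
           ((uint32_t)ptr[2] << 8)  |  (uint32_t)ptr[3];
}

/* Write data to ptr in big-endian byte order. */
static void model_storebe_i32(unsigned char *ptr, uint32_t data) {
    ptr[0] = (unsigned char)(data >> 24);
    ptr[1] = (unsigned char)(data >> 16);
    ptr[2] = (unsigned char)(data >> 8);
    ptr[3] = (unsigned char)data;
}
```

Because the model works byte by byte, it gives the same result regardless of host endianness, which is the usual reason for parsing network or file formats this way.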
Move 64-byte (512-bit) value using direct store from source memory address "src" to destination memory address "dst".
MEM[dst+511:dst] := MEM[src+511:src]
MOVDIR64B
Store
Store 64-bit integer from "val" into memory using direct store.
MEM[dst+63:dst] := val[63:0]
MOVDIRI
Store
Store 32-bit integer from "val" into memory using direct store.
MEM[dst+31:dst] := val[31:0]
MOVDIRI
Store
Make a pointer with the value of "srcmem" and bounds set to ["srcmem", "srcmem" + "size" - 1], and store the result in "dst".
dst := srcmem
dst.LB := srcmem
dst.UB := srcmem + size - 1
MPX
Miscellaneous
Narrow the bounds for pointer "q" to the intersection of the bounds of "r" and the bounds ["q", "q" + "size" - 1], and store the result in "dst".
dst := q
IF r.LB > (q + size - 1) OR r.UB < q
dst.LB := 1
dst.UB := 0
ELSE
dst.LB := MAX(r.LB, q)
dst.UB := MIN(r.UB, (q + size - 1))
FI
MPX
Miscellaneous
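The bounds-narrowing rule above is an interval intersection, with LB = 1, UB = 0 used as the empty interval. A minimal sketch in portable C, assuming plain 64-bit addresses; the type model_bounds and the functions model_narrow and model_in_bounds are hypothetical and do not touch MPX hardware.

```c
#include <stdint.h>

/* Inclusive address interval [lb, ub]; lb > ub means "empty". */
typedef struct { uint64_t lb, ub; } model_bounds;

/* Intersect the bounds r with [q, q + size - 1]. */
static model_bounds model_narrow(model_bounds r, uint64_t q, uint64_t size) {
    uint64_t last = q + size - 1;
    model_bounds d;
    if (r.lb > last || r.ub < q) {
        d.lb = 1;   /* disjoint: no address satisfies lb <= p <= ub */
        d.ub = 0;
    } else {
        d.lb = r.lb > q ? r.lb : q;
        d.ub = r.ub < last ? r.ub : last;
    }
    return d;
}

/* Returns 1 when p lies inside b; a real check would raise #BR instead. */
static int model_in_bounds(model_bounds b, uint64_t p) {
    return p >= b.lb && p <= b.ub;
}
```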
Make a pointer with the value of "q" and bounds set to the bounds of "r" (e.g. copy the bounds of "r" to pointer "q"), and store the result in "dst".
dst := q
dst.LB := r.LB
dst.UB := r.UB
MPX
Miscellaneous
Make a pointer with the value of "q" and open bounds, which allow the pointer to access the entire virtual address space, and store the result in "dst".
dst := q
dst.LB := 0
dst.UB := 0
MPX
Miscellaneous
Stores the bounds of "ptr_val" pointer in memory at address "ptr_addr".
MEM[ptr_addr].LB := ptr_val.LB
MEM[ptr_addr].UB := ptr_val.UB
MPX
Miscellaneous
Checks if "q" is within its lower bound, and throws a #BR if not.
IF q < q.LB
#BR
FI
MPX
Miscellaneous
Checks if "q" is within its upper bound, and throws a #BR if not.
IF q > q.UB
#BR
FI
MPX
Miscellaneous
Checks if ["q", "q" + "size" - 1] is within the lower and upper bounds of "q" and throws a #BR if not.
IF (q + size - 1) < q.LB OR (q + size - 1) > q.UB
#BR
FI
MPX
Miscellaneous
Return the lower bound of "q".
dst := q.LB
MPX
Miscellaneous
Return the upper bound of "q".
dst := q.UB
MPX
Miscellaneous
Set "dst" to the index of the lowest set bit in 32-bit integer "a". If no bits are set in "a" then "dst" is undefined.
tmp := 0
IF a == 0
// dst is undefined
ELSE
DO WHILE ((tmp < 32) AND a[tmp] == 0)
tmp := tmp + 1
OD
FI
dst := tmp
Bit Manipulation
Set "dst" to the index of the highest set bit in 32-bit integer "a". If no bits are set in "a" then "dst" is undefined.
tmp := 31
IF a == 0
// dst is undefined
ELSE
DO WHILE ((tmp > 0) AND a[tmp] == 0)
tmp := tmp - 1
OD
FI
dst := tmp
Bit Manipulation
Set "index" to the index of the lowest set bit in 32-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.
tmp := 0
IF a == 0
// MEM[index+31:index] is undefined
dst := 0
ELSE
DO WHILE ((tmp < 32) AND a[tmp] == 0)
tmp := tmp + 1
OD
MEM[index+31:index] := tmp
dst := 1
FI
Bit Manipulation
Set "index" to the index of the highest set bit in 32-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.
tmp := 31
IF a == 0
// MEM[index+31:index] is undefined
dst := 0
ELSE
DO WHILE ((tmp > 0) AND a[tmp] == 0)
tmp := tmp - 1
OD
MEM[index+31:index] := tmp
dst := 1
FI
Bit Manipulation
Set "index" to the index of the lowest set bit in 64-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.
tmp := 0
IF a == 0
// MEM[index+31:index] is undefined
dst := 0
ELSE
DO WHILE ((tmp < 64) AND a[tmp] == 0)
tmp := tmp + 1
OD
MEM[index+31:index] := tmp
dst := 1
FI
Bit Manipulation
Set "index" to the index of the highest set bit in 64-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1.
tmp := 63
IF a == 0
// MEM[index+31:index] is undefined
dst := 0
ELSE
DO WHILE ((tmp > 0) AND a[tmp] == 0)
tmp := tmp - 1
OD
MEM[index+31:index] := tmp
dst := 1
FI
Bit Manipulation
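The bit-scan entries that report success separately from the index can be sketched in portable C as follows. This follows the prose descriptions: the return value is 1 whenever any bit of the input is set, and the index is left untouched when the input is zero. The names model_bit_scan_forward and model_bit_scan_reverse are hypothetical.

```c
#include <stdint.h>

/* Index of the lowest set bit; returns 0 (and leaves *index alone) if a == 0. */
static unsigned char model_bit_scan_forward(uint32_t *index, uint32_t a) {
    if (a == 0)
        return 0;
    uint32_t tmp = 0;
    while ((a & ((uint32_t)1 << tmp)) == 0)
        tmp++;
    *index = tmp;
    return 1;
}

/* Index of the highest set bit; returns 0 (and leaves *index alone) if a == 0. */
static unsigned char model_bit_scan_reverse(uint32_t *index, uint32_t a) {
    if (a == 0)
        return 0;
    uint32_t tmp = 31;
    while ((a & ((uint32_t)1 << tmp)) == 0)
        tmp--;
    *index = tmp;
    return 1;
}
```

Callers should test the return value before reading the index, since the index is undefined for a zero input.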
Return the bit at index "b" of 32-bit integer "a".
addr := a + ZeroExtend64(b)
dst[0] := MEM[addr]
Bit Manipulation
Return the bit at index "b" of 32-bit integer "a", and set that bit to its complement.
addr := a + ZeroExtend64(b)
dst[0] := MEM[addr]
MEM[addr] := ~dst[0]
Bit Manipulation
Return the bit at index "b" of 32-bit integer "a", and set that bit to zero.
addr := a + ZeroExtend64(b)
dst[0] := MEM[addr]
MEM[addr] := 0
Bit Manipulation
Return the bit at index "b" of 32-bit integer "a", and set that bit to one.
addr := a + ZeroExtend64(b)
dst[0] := MEM[addr]
MEM[addr] := 1
Bit Manipulation
Return the bit at index "b" of 64-bit integer "a".
addr := a + b
dst[0] := MEM[addr]
Bit Manipulation
Return the bit at index "b" of 64-bit integer "a", and set that bit to its complement.
addr := a + b
dst[0] := MEM[addr]
MEM[addr] := ~dst[0]
Bit Manipulation
Return the bit at index "b" of 64-bit integer "a", and set that bit to zero.
addr := a + b
dst[0] := MEM[addr]
MEM[addr] := 0
Bit Manipulation
Return the bit at index "b" of 64-bit integer "a", and set that bit to one.
addr := a + b
dst[0] := MEM[addr]
MEM[addr] := 1
Bit Manipulation
Reverse the byte order of 32-bit integer "a", and store the result in "dst". This intrinsic is provided for conversion between little and big endian values.
dst[7:0] := a[31:24]
dst[15:8] := a[23:16]
dst[23:16] := a[15:8]
dst[31:24] := a[7:0]
Bit Manipulation
Reverse the byte order of 64-bit integer "a", and store the result in "dst". This intrinsic is provided for conversion between little and big endian values.
dst[7:0] := a[63:56]
dst[15:8] := a[55:48]
dst[23:16] := a[47:40]
dst[31:24] := a[39:32]
dst[39:32] := a[31:24]
dst[47:40] := a[23:16]
dst[55:48] := a[15:8]
dst[63:56] := a[7:0]
Bit Manipulation
Cast from type float to type unsigned __int32 without conversion.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Cast
Cast from type double to type unsigned __int64 without conversion.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Cast
Cast from type unsigned __int32 to type float without conversion.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Cast
Cast from type unsigned __int64 to type double without conversion.
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Cast
Shift the bits of unsigned long integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".
// size := 32 or 64
dst := a
count := shift AND (size - 1)
DO WHILE (count > 0)
tmp[0] := dst[size - 1]
dst := (dst << 1) OR tmp[0]
count := count - 1
OD
Shift
Shift the bits of unsigned long integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".
// size := 32 or 64
dst := a
count := shift AND (size - 1)
DO WHILE (count > 0)
tmp[size - 1] := dst[0]
dst := (dst >> 1) OR tmp[size - 1]
count := count - 1
OD
Shift
Shift the bits of unsigned 32-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".
dst := a
count := shift AND 31
DO WHILE (count > 0)
tmp[0] := dst[31]
dst := (dst << 1) OR tmp[0]
count := count - 1
OD
Shift
Shift the bits of unsigned 32-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".
dst := a
count := shift AND 31
DO WHILE (count > 0)
tmp[31] := dst[0]
dst := (dst >> 1) OR tmp
count := count - 1
OD
Shift
Shift the bits of unsigned 16-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".
dst := a
count := shift AND 15
DO WHILE (count > 0)
tmp[0] := dst[15]
dst := (dst << 1) OR tmp[0]
count := count - 1
OD
Shift
Shift the bits of unsigned 16-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".
dst := a
count := shift AND 15
DO WHILE (count > 0)
tmp[15] := dst[0]
dst := (dst >> 1) OR tmp
count := count - 1
OD
Shift
Shift the bits of unsigned 64-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst".
dst := a
count := shift AND 63
DO WHILE (count > 0)
tmp[0] := dst[63]
dst := (dst << 1) OR tmp[0]
count := count - 1
OD
Shift
Shift the bits of unsigned 64-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst".
dst := a
count := shift AND 63
DO WHILE (count > 0)
tmp[63] := dst[0]
dst := (dst >> 1) OR tmp[63]
count := count - 1
OD
Shift
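The rotate loops above move one bit per iteration; the same result can be computed with two shifts. A minimal sketch for the 32-bit case, with hypothetical names model_rotl32 and model_rotr32. The count is masked to 0..31 as in the pseudocode, and a zero count is handled separately because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Rotate a left by (shift AND 31) bits. */
static uint32_t model_rotl32(uint32_t a, int shift) {
    unsigned count = (unsigned)shift & 31u;
    if (count == 0)
        return a;
    return (a << count) | (a >> (32u - count));
}

/* Rotate a right by (shift AND 31) bits. */
static uint32_t model_rotr32(uint32_t a, int shift) {
    unsigned count = (unsigned)shift & 31u;
    if (count == 0)
        return a;
    return (a >> count) | (a << (32u - count));
}
```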
Treat the processor-specific feature(s) specified in "a" as available. Multiple features may be OR'd together. See the valid feature flags below:
_FEATURE_GENERIC_IA32
_FEATURE_FPU
_FEATURE_CMOV
_FEATURE_MMX
_FEATURE_FXSAVE
_FEATURE_SSE
_FEATURE_SSE2
_FEATURE_SSE3
_FEATURE_SSSE3
_FEATURE_SSE4_1
_FEATURE_SSE4_2
_FEATURE_MOVBE
_FEATURE_POPCNT
_FEATURE_PCLMULQDQ
_FEATURE_AES
_FEATURE_F16C
_FEATURE_AVX
_FEATURE_RDRND
_FEATURE_FMA
_FEATURE_BMI
_FEATURE_LZCNT
_FEATURE_HLE
_FEATURE_RTM
_FEATURE_AVX2
_FEATURE_KNCNI
_FEATURE_AVX512F
_FEATURE_ADX
_FEATURE_RDSEED
_FEATURE_AVX512ER
_FEATURE_AVX512PF
_FEATURE_AVX512CD
_FEATURE_SHA
_FEATURE_MPX
_FEATURE_AVX512BW
_FEATURE_AVX512VL
_FEATURE_AVX512VBMI
_FEATURE_AVX512_4FMAPS
_FEATURE_AVX512_4VNNIW
_FEATURE_AVX512_VPOPCNTDQ
_FEATURE_AVX512_BITALG
_FEATURE_AVX512_VBMI2
_FEATURE_GFNI
_FEATURE_VAES
_FEATURE_VPCLMULQDQ
_FEATURE_AVX512_VNNI
_FEATURE_CLWB
_FEATURE_RDPID
_FEATURE_IBT
_FEATURE_SHSTK
_FEATURE_SGX
_FEATURE_WBNOINVD
_FEATURE_PCONFIG
_FEATURE_AXV512_4VNNIB
_FEATURE_AXV512_4FMAPH
_FEATURE_AXV512_BITALG2
_FEATURE_AXV512_VP2INTERSECT
General Support
Dynamically query the processor to determine if the processor-specific feature(s) specified in "a" are available, and return 1 (true) if the entire set of features is available, or 0 (false) otherwise. Multiple features may be OR'd together. This function is limited to bitmask values in the first 'page' of the libirc cpu-id information. This intrinsic does not check the processor vendor. See the valid feature flags below:
_FEATURE_GENERIC_IA32
_FEATURE_FPU
_FEATURE_CMOV
_FEATURE_MMX
_FEATURE_FXSAVE
_FEATURE_SSE
_FEATURE_SSE2
_FEATURE_SSE3
_FEATURE_SSSE3
_FEATURE_SSE4_1
_FEATURE_SSE4_2
_FEATURE_MOVBE
_FEATURE_POPCNT
_FEATURE_PCLMULQDQ
_FEATURE_AES
_FEATURE_F16C
_FEATURE_AVX
_FEATURE_RDRND
_FEATURE_FMA
_FEATURE_BMI
_FEATURE_LZCNT
_FEATURE_HLE
_FEATURE_RTM
_FEATURE_AVX2
_FEATURE_KNCNI
_FEATURE_AVX512F
_FEATURE_ADX
_FEATURE_RDSEED
_FEATURE_AVX512ER
_FEATURE_AVX512PF
_FEATURE_AVX512CD
_FEATURE_SHA
_FEATURE_MPX
_FEATURE_AVX512BW
_FEATURE_AVX512VL
_FEATURE_AVX512VBMI
_FEATURE_AVX512_4FMAPS
_FEATURE_AVX512_4VNNIW
_FEATURE_AVX512_VPOPCNTDQ
_FEATURE_AVX512_BITALG
_FEATURE_AVX512_VBMI2
_FEATURE_GFNI
_FEATURE_VAES
_FEATURE_VPCLMULQDQ
_FEATURE_AVX512_VNNI
_FEATURE_CLWB
_FEATURE_RDPID
_FEATURE_IBT
_FEATURE_SHSTK
_FEATURE_SGX
_FEATURE_WBNOINVD
_FEATURE_PCONFIG
_FEATURE_AXV512_4VNNIB
_FEATURE_AXV512_4FMAPH
_FEATURE_AXV512_BITALG2
_FEATURE_AXV512_VP2INTERSECT
_FEATURE_AXV512_FP16
General Support
Dynamically query the processor to determine if the processor-specific feature(s) specified in "a" are available, and return 1 (true) if the entire set of features is available, or 0 (false) otherwise. Multiple features may be OR'd together. This works identically to the previous variant, except it also accepts a 'page' index that permits checking features on the 2nd page of the libirc information. When provided with a '0' in the 'page' parameter, this works identically to _may_i_use_cpu_feature. This intrinsic does not check the processor vendor. See the valid feature flags on the 2nd page (selected with a '1' in the 'page' parameter) below:
_FEATURE_CLDEMOTE
_FEATURE_MOVDIRI
_FEATURE_MOVDIR64B
_FEATURE_WAITPKG
_FEATURE_AVX512_Bf16
_FEATURE_ENQCMD
_FEATURE_AVX_VNNI
_FEATURE_AMX_TILE
_FEATURE_AMX_INT8
_FEATURE_AMX_BF16
_FEATURE_KL
_FEATURE_WIDE_KL
_FEATURE_HRESET
_FEATURE_UINTR
_FEATURE_PREFETCHI
_FEATURE_AVXVNNIINT8
_FEATURE_CMPCCXADD
_FEATURE_AVXIFMA
_FEATURE_AVXNECONVERT
_FEATURE_RAOINT
_FEATURE_AMX_FP16
_FEATURE_AMX_COMPLEX
_FEATURE_SHA512
_FEATURE_SM3
_FEATURE_SM4
_FEATURE_AVXVNNIINT16
_FEATURE_USERMSR
_FEATURE_AVX10_1_256
_FEATURE_AVX10_1_512
_FEATURE_APXF
_FEATURE_MSRLIST
_FEATURE_WRMSRNS
_FEATURE_PBNDKB
General Support
Dynamically query the processor to determine if the processor-specific feature(s) specified as a series of compile-time string literals in "feature, ..." are available, and return 1 (true) if the entire set of features is available, or 0 (false) otherwise. These feature names are converted to a bitmask and use the same infrastructure as _may_i_use_cpu_feature_ext to validate it. The behavior is the same as the previous variants. This intrinsic does not check the processor vendor. The supported string literals correspond one-to-one to the feature flags listed in the "Operation" sections of _may_i_use_cpu_feature and _may_i_use_cpu_feature_ext. Example string literals are "avx2", "bmi", "avx512fp16", "amx-int8"...
General Support
Read the Performance Monitor Counter (PMC) specified by "a", and store up to 64 bits in "dst". The width of performance counters is implementation specific.
dst[63:0] := ReadPMC(a)
General Support
Add unsigned 32-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry flag), and store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag).
tmp[32:0] := a[31:0] + b[31:0] + (c_in > 0 ? 1 : 0)
MEM[out+31:out] := tmp[31:0]
dst[0] := tmp[32]
dst[7:1] := 0
Arithmetic
Add unsigned 64-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry flag), and store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag).
tmp[64:0] := a[63:0] + b[63:0] + (c_in > 0 ? 1 : 0)
MEM[out+63:out] := tmp[63:0]
dst[0] := tmp[64]
dst[7:1] := 0
Arithmetic
Add unsigned 8-bit borrow "c_in" (carry flag) to unsigned 32-bit integer "b", and subtract the result from unsigned 32-bit integer "a". Store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag).
tmp[32:0] := a[31:0] - (b[31:0] + (c_in > 0 ? 1 : 0))
MEM[out+31:out] := tmp[31:0]
dst[0] := tmp[32]
dst[7:1] := 0
Arithmetic
Add unsigned 8-bit borrow "c_in" (carry flag) to unsigned 64-bit integer "b", and subtract the result from unsigned 64-bit integer "a". Store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag).
tmp[64:0] := a[63:0] - (b[63:0] + (c_in > 0 ? 1 : 0))
MEM[out+63:out] := tmp[63:0]
dst[0] := tmp[64]
dst[7:1] := 0
Arithmetic
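The add-with-carry and subtract-with-borrow entries above are meant to be chained limb by limb for multi-word arithmetic. The sketch below models the 64-bit forms in portable C and detects the carry by comparison instead of a 65-bit temporary; model_addcarry_u64 and model_subborrow_u64 are hypothetical names, not the intrinsics.

```c
#include <stdint.h>

/* out = a + b + (c_in != 0); returns the carry out of bit 63. */
static unsigned char model_addcarry_u64(unsigned char c_in, uint64_t a,
                                        uint64_t b, uint64_t *out) {
    uint64_t cin = c_in ? 1u : 0u;
    uint64_t sum = a + b;                 /* wraps modulo 2^64 */
    unsigned char carry = (unsigned char)(sum < a);
    uint64_t total = sum + cin;
    carry = (unsigned char)(carry | (total < sum));
    *out = total;
    return carry;
}

/* out = a - (b + (b_in != 0)); returns 1 when a borrow was needed. */
static unsigned char model_subborrow_u64(unsigned char b_in, uint64_t a,
                                         uint64_t b, uint64_t *out) {
    uint64_t bin = b_in ? 1u : 0u;
    uint64_t diff = a - b;                /* wraps modulo 2^64 */
    unsigned char borrow = (unsigned char)(a < b);
    borrow = (unsigned char)(borrow | (diff < bin));
    *out = diff - bin;
    return borrow;
}
```

A 128-bit addition is then two calls: the low limbs with a carry-in of 0, followed by the high limbs with the carry returned from the first call.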
Insert the 32-bit data from "a" into a Processor Trace stream via a PTW packet. The PTW packet will be inserted if tracing is currently enabled and ptwrite is currently enabled. The current IP will also be inserted via a FUP packet if FUPonPTW is enabled.
Miscellaneous
Insert the 64-bit data from "a" into a Processor Trace stream via a PTW packet. The PTW packet will be inserted if tracing is currently enabled and ptwrite is currently enabled. The current IP will also be inserted via a FUP packet if FUPonPTW is enabled.
Miscellaneous
Invoke the Intel SGX enclave user (non-privileged) leaf function specified by "a", and return the error code. The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.
Miscellaneous
Invoke the Intel SGX enclave system (privileged) leaf function specified by "a", and return the error code. The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.
Miscellaneous
Invoke the Intel SGX enclave virtualized (VMM) leaf function specified by "a", and return the error code. The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.
Miscellaneous
Write back and flush internal caches.
Initiate writing-back and flushing of external caches.
Miscellaneous
Convert the half-precision (16-bit) floating-point value "a" to a single-precision (32-bit) floating-point value, and store the result in "dst".
dst[31:0] := Convert_FP16_To_FP32(a[15:0])
Convert
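The operation above relies on an unspecified Convert_FP16_To_FP32 helper. One way to realise it in portable C, assuming IEEE 754 binary16 input and a binary32 float, is sketched below; model_fp16_to_fp32 is a hypothetical name and the code does not use the F16C instructions.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 half-precision bit pattern to a float. */
static float model_fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;   /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3FFu;          /* 10-bit fraction */
    uint32_t bits;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                 /* signed zero */
        } else {
            /* Subnormal: shift until the implicit leading 1 appears. */
            int e = -1;
            do {
                e++;
                man <<= 1;
            } while ((man & 0x400u) == 0);
            bits = sign | ((uint32_t)(112 - e) << 23) | ((man & 0x3FFu) << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (man << 13);   /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    }

    float f;
    memcpy(&f, &bits, sizeof f);         /* reinterpret without aliasing issues */
    return f;
}
```

Every half-precision value is exactly representable in single precision, so this direction needs no rounding.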
Convert the single-precision (32-bit) floating-point value "a" to a half-precision (16-bit) floating-point value, and store the result in "dst".
[round_note]
dst[15:0] := Convert_FP32_To_FP16(a[31:0])
Convert
Perform a carry-less multiplication of two 64-bit integers, selected from "a" and "b" according to "imm8", and store the results in "dst".
IF (imm8[0] == 0)
TEMP1 := a[63:0]
ELSE
TEMP1 := a[127:64]
FI
IF (imm8[4] == 0)
TEMP2 := b[63:0]
ELSE
TEMP2 := b[127:64]
FI
FOR i := 0 to 63
TEMP[i] := (TEMP1[0] AND TEMP2[i])
FOR j := 1 to i
TEMP[i] := TEMP[i] XOR (TEMP1[j] AND TEMP2[i-j])
ENDFOR
dst[i] := TEMP[i]
ENDFOR
FOR i := 64 to 127
TEMP[i] := 0
FOR j := (i - 63) to 63
TEMP[i] := TEMP[i] XOR (TEMP1[j] AND TEMP2[i-j])
ENDFOR
dst[i] := TEMP[i]
ENDFOR
dst[127] := 0
PCLMULQDQ
Application-Targeted
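Carry-less multiplication is polynomial multiplication over GF(2): partial products are combined with XOR instead of addition. The shift-and-XOR sketch below models the 64 x 64 -> 128-bit product in portable C; the type clmul128 and the function model_clmul64 are hypothetical, and operand selection via "imm8" is left to the caller.

```c
#include <stdint.h>

/* 128-bit result split into two 64-bit halves. */
typedef struct { uint64_t lo, hi; } clmul128;

/* XOR together x << i for every set bit i of y. */
static clmul128 model_clmul64(uint64_t x, uint64_t y) {
    clmul128 r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((y >> i) & 1u) {
            r.lo ^= x << i;
            if (i > 0)
                r.hi ^= x >> (64 - i);   /* bits shifted past bit 63 */
        }
    }
    return r;
}
```

For example, 3 times 3 gives 5 here rather than 9, because (x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2.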
Invoke the PCONFIG leaf function specified by "a". The "__data" array contains 3 32- or 64-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx. May return the value in eax, depending on the semantics of the specified leaf function.
PCONFIG
Miscellaneous
Count the number of bits set to 1 in unsigned 32-bit integer "a", and return that count in "dst".
dst := 0
FOR i := 0 to 31
IF a[i]
dst := dst + 1
FI
ENDFOR
POPCNT
Bit Manipulation
Count the number of bits set to 1 in unsigned 64-bit integer "a", and return that count in "dst".
dst := 0
FOR i := 0 to 63
IF a[i]
dst := dst + 1
FI
ENDFOR
POPCNT
Bit Manipulation
Count the number of bits set to 1 in 32-bit integer "a", and return that count in "dst".
dst := 0
FOR i := 0 to 31
IF a[i]
dst := dst + 1
FI
ENDFOR
POPCNT
Bit Manipulation
Count the number of bits set to 1 in 64-bit integer "a", and return that count in "dst".
dst := 0
FOR i := 0 to 63
IF a[i]
dst := dst + 1
FI
ENDFOR
POPCNT
Bit Manipulation
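The population-count loops above translate directly into portable C. This sketch, with the hypothetical name model_popcnt64, is a reference model for checking results and is not a substitute for the POPCNT instruction.

```c
#include <stdint.h>

/* Count the bits set to 1 in a, one bit position at a time. */
static int model_popcnt64(uint64_t a) {
    int dst = 0;
    for (int i = 0; i < 64; i++) {
        if ((a >> i) & 1u)
            dst = dst + 1;
    }
    return dst;
}
```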
Loads an instruction sequence containing the specified memory address into all levels of the cache hierarchy.
PREFETCHI
General Support
Loads an instruction sequence containing the specified memory address into all but the first-level cache.
PREFETCHI
General Support
Fetch the line of data from memory that contains address "p" to a location in the cache hierarchy specified by the locality hint "i", which can be one of:<ul>
<li>_MM_HINT_ET0 // 7, move data using the ET0 hint. The PREFETCHW instruction will be generated.</li>
<li>_MM_HINT_T0 // 3, move data using the T0 hint. The PREFETCHT0 instruction will be generated.</li>
<li>_MM_HINT_T1 // 2, move data using the T1 hint. The PREFETCHT1 instruction will be generated.</li>
<li>_MM_HINT_T2 // 1, move data using the T2 hint. The PREFETCHT2 instruction will be generated.</li>
<li>_MM_HINT_NTA // 0, move data using the non-temporal access (NTA) hint. The PREFETCHNTA instruction will be generated.</li></ul>
PRFCHW
General Support
Atomically add the 32-bit value "__B" to the 32-bit value at memory operand "__A", and store the result to the same memory location.
MEM[__A+31:__A] := MEM[__A+31:__A] + __B[31:0]
RAO_INT
Arithmetic
Atomically add the 64-bit value "__B" to the 64-bit value at memory operand "__A", and store the result to the same memory location.
MEM[__A+63:__A] := MEM[__A+63:__A] + __B[63:0]
RAO_INT
Arithmetic
Atomically compute the bitwise AND of the 32-bit value at memory operand "__A" and the 32-bit value "__B", and store the result to the same memory location.
MEM[__A+31:__A] := MEM[__A+31:__A] AND __B[31:0]
RAO_INT
Arithmetic
Atomically compute the bitwise AND of the 64-bit value at memory operand "__A" and the 64-bit value "__B", and store the result to the same memory location.
MEM[__A+63:__A] := MEM[__A+63:__A] AND __B[63:0]
RAO_INT
Arithmetic
Atomically compute the bitwise OR of the 32-bit value at memory operand "__A" and the 32-bit value "__B", and store the result to the same memory location.
MEM[__A+31:__A] := MEM[__A+31:__A] OR __B[31:0]
RAO_INT
Arithmetic
Atomically compute the bitwise OR of the 64-bit value at memory operand "__A" and the 64-bit value "__B", and store the result to the same memory location.
MEM[__A+63:__A] := MEM[__A+63:__A] OR __B[63:0]
RAO_INT
Arithmetic
Atomically compute the bitwise XOR of the 32-bit value at memory operand "__A" and the 32-bit value "__B", and store the result to the same memory location.
MEM[__A+31:__A] := MEM[__A+31:__A] XOR __B[31:0]
RAO_INT
Arithmetic
Atomically compute the bitwise XOR of the 64-bit value at memory operand "__A" and the 64-bit value "__B", and store the result to the same memory location.
MEM[__A+63:__A] := MEM[__A+63:__A] XOR __B[63:0]
RAO_INT
Arithmetic
Copy the IA32_TSC_AUX MSR (signature value) into "dst".
dst[31:0] := IA32_TSC_AUX[31:0]
RDPID
General Support
Read a hardware generated 16-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise.
IF HW_RND_GEN.ready == 1
val[15:0] := HW_RND_GEN.data
dst := 1
ELSE
val[15:0] := 0
dst := 0
FI
RDRAND
Random
Read a hardware generated 32-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise.
IF HW_RND_GEN.ready == 1
val[31:0] := HW_RND_GEN.data
dst := 1
ELSE
val[31:0] := 0
dst := 0
FI
RDRAND
Random
Read a hardware generated 64-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise.
IF HW_RND_GEN.ready == 1
val[63:0] := HW_RND_GEN.data
dst := 1
ELSE
val[63:0] := 0
dst := 0
FI
RDRAND
Random
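Because the random-number entries above return 0 when no value was available, callers usually retry a bounded number of times before giving up. The sketch below shows that pattern against a function pointer so it can run without the hardware. The names rand32_step_fn, rand32_with_retry and fake_step are hypothetical; fake_step is a stand-in that fails twice and then succeeds, and the retry limit is the caller's choice.

```c
/* Shape of a step function: writes *val and returns 1 on success, 0 otherwise. */
typedef int (*rand32_step_fn)(unsigned int *val);

/* Call step up to max_tries times; return 1 as soon as a value is produced. */
static int rand32_with_retry(rand32_step_fn step, unsigned int *val,
                             int max_tries) {
    for (int i = 0; i < max_tries; i++) {
        if (step(val))
            return 1;
    }
    return 0;   /* caller must treat *val as unusable */
}

/* Demo stand-in for the hardware: not random, fails on the first two calls. */
static int fake_calls = 0;
static int fake_step(unsigned int *val) {
    fake_calls++;
    if (fake_calls < 3) {
        *val = 0;
        return 0;
    }
    *val = 0xC0FFEEu;
    return 1;
}
```

In real code the step argument would be a thin wrapper around the hardware intrinsic, and a failure after all retries should be handled as an error, not by using the zeroed value.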
Read a 16-bit NIST SP800-90B and SP800-90C compliant random value and store in "val". Return 1 if a random value was generated, and 0 otherwise.
IF HW_NRND_GEN.ready == 1
val[15:0] := HW_NRND_GEN.data
dst := 1
ELSE
val[15:0] := 0
dst := 0
FI
RDSEED
Random
Read a 32-bit NIST SP800-90B and SP800-90C compliant random value and store in "val". Return 1 if a random value was generated, and 0 otherwise.
IF HW_NRND_GEN.ready == 1
val[31:0] := HW_NRND_GEN.data
dst := 1
ELSE
val[31:0] := 0
dst := 0
FI
RDSEED
Random
Read a 64-bit NIST SP800-90B and SP800-90C compliant random value and store in "val". Return 1 if a random value was generated, and 0 otherwise.
IF HW_NRND_GEN.ready == 1
val[63:0] := HW_NRND_GEN.data
dst := 1
ELSE
val[63:0] := 0
dst := 0
FI
RDSEED
Random
Copy the current 64-bit value of the processor's time-stamp counter into "dst", and store the IA32_TSC_AUX MSR (signature value) into memory at "mem_addr".
dst[63:0] := TimeStampCounter
MEM[mem_addr+31:mem_addr] := IA32_TSC_AUX[31:0]
RDTSCP
General Support
Force an RTM abort. The EAX register is updated to reflect an XABORT instruction caused the abort, and the "imm8" parameter will be provided in bits [31:24] of EAX.
Following an RTM abort, the logical processor resumes execution at the fallback address computed through the outermost XBEGIN instruction.
IF RTM_ACTIVE == 0
// nop
ELSE
// restore architectural register state
// discard memory updates performed in transaction
// update EAX with status and imm8 value
eax[31:24] := imm8[7:0]
RTM_NEST_COUNT := 0
RTM_ACTIVE := 0
IF _64_BIT_MODE
RIP := fallbackRIP
ELSE
EIP := fallbackEIP
FI
FI
RTM
General Support
Specify the start of an RTM code region.
If the logical processor was not already in transactional execution, then this call causes the logical processor to transition into transactional execution.
On an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution, restores architectural state, and starts execution beginning at the fallback address computed from the outermost XBEGIN instruction. Return status of ~0 (0xFFFFFFFF) if continuing inside transaction; all other codes are aborts.
IF RTM_NEST_COUNT < MAX_RTM_NEST_COUNT
RTM_NEST_COUNT := RTM_NEST_COUNT + 1
IF RTM_NEST_COUNT == 1
IF _64_BIT_MODE
fallbackRIP := RIP
ELSE IF _32_BIT_MODE
fallbackEIP := EIP
FI
RTM_ACTIVE := 1
// enter RTM execution, record register state, start tracking memory state
FI
ELSE
// RTM abort (see _xabort)
FI
RTM
General Support
Specify the end of an RTM code region.
If this corresponds to the outermost scope, the logical processor will attempt to commit the logical processor state atomically.
If the commit fails, the logical processor will perform an RTM abort.
IF RTM_ACTIVE == 1
RTM_NEST_COUNT := RTM_NEST_COUNT - 1
IF RTM_NEST_COUNT == 0
// try to commit transaction
IF FAIL_TO_COMMIT_TRANSACTION
// RTM abort (see _xabort)
ELSE
RTM_ACTIVE := 0
FI
FI
FI
RTM
General Support
Query the transactional execution status, return 1 if inside a transactionally executing RTM or HLE region, and return 0 otherwise.
IF (RTM_ACTIVE == 1 OR HLE_ACTIVE == 1)
dst := 1
ELSE
dst := 0
FI
RTM
General Support
Serialize instruction execution, ensuring all modifications to flags, registers, and memory by previous instructions are completed before the next instruction is fetched.
SERIALIZE
General Support
Perform an intermediate calculation for the next four SHA1 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst".
W0 := a[127:96]
W1 := a[95:64]
W2 := a[63:32]
W3 := a[31:0]
W4 := b[127:96]
W5 := b[95:64]
dst[127:96] := W2 XOR W0
dst[95:64] := W3 XOR W1
dst[63:32] := W4 XOR W2
dst[31:0] := W5 XOR W3
SHA
Cryptography
Perform the final calculation for the next four SHA1 message values (unsigned 32-bit integers) using the intermediate result in "a" and the previous message values in "b", and store the result in "dst".
W13 := b[95:64]
W14 := b[63:32]
W15 := b[31:0]
W16 := (a[127:96] XOR W13) <<< 1
W17 := (a[95:64] XOR W14) <<< 1
W18 := (a[63:32] XOR W15) <<< 1
W19 := (a[31:0] XOR W16) <<< 1
dst[127:96] := W16
dst[95:64] := W17
dst[63:32] := W18
dst[31:0] := W19
SHA
Cryptography
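The two SHA1 message-schedule steps above can be modelled lane by lane in portable C. In this sketch a 128-bit operand is an array of four 32-bit words where index 3 holds bits [127:96] and index 0 holds bits [31:0]; that layout and the names rotl1, model_sha1msg1 and model_sha1msg2 are assumptions made for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by one bit (the "<<< 1" in the pseudocode). */
static uint32_t rotl1(uint32_t x) {
    return (x << 1) | (x >> 31);
}

/* Intermediate step: XOR message words that are two positions apart. */
static void model_sha1msg1(const uint32_t a[4], const uint32_t b[4],
                           uint32_t dst[4]) {
    uint32_t w0 = a[3], w1 = a[2], w2 = a[1], w3 = a[0];
    uint32_t w4 = b[3], w5 = b[2];
    dst[3] = w2 ^ w0;
    dst[2] = w3 ^ w1;
    dst[1] = w4 ^ w2;
    dst[0] = w5 ^ w3;
}

/* Final step: fold in W13..W15 and rotate; W19 depends on the new W16. */
static void model_sha1msg2(const uint32_t a[4], const uint32_t b[4],
                           uint32_t dst[4]) {
    uint32_t w13 = b[2], w14 = b[1], w15 = b[0];
    uint32_t w16 = rotl1(a[3] ^ w13);
    uint32_t w17 = rotl1(a[2] ^ w14);
    uint32_t w18 = rotl1(a[1] ^ w15);
    uint32_t w19 = rotl1(a[0] ^ w16);
    dst[3] = w16;
    dst[2] = w17;
    dst[1] = w18;
    dst[0] = w19;
}
```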
Calculate SHA1 state variable E after four rounds of operation from the current SHA1 state variable "a", add that value to the scheduled values (unsigned 32-bit integers) in "b", and store the result in "dst".
tmp := (a[127:96] <<< 30)
dst[127:96] := b[127:96] + tmp
dst[95:64] := b[95:64]
dst[63:32] := b[63:32]
dst[31:0] := b[31:0]
SHA
Cryptography
Perform four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D) from "a" and some pre-computed sum of the next 4 round message values (unsigned 32-bit integers), and state variable E from "b", and store the updated SHA1 state (A,B,C,D) in "dst". "func" contains the logic functions and round constants.
IF (func[1:0] == 0)
f := f0()
K := K0
ELSE IF (func[1:0] == 1)
f := f1()
K := K1
ELSE IF (func[1:0] == 2)
f := f2()
K := K2
ELSE IF (func[1:0] == 3)
f := f3()
K := K3
FI
A := a[127:96]
B := a[95:64]
C := a[63:32]
D := a[31:0]
W[0] := b[127:96]
W[1] := b[95:64]
W[2] := b[63:32]
W[3] := b[31:0]
A[1] := f(B, C, D) + (A <<< 5) + W[0] + K
B[1] := A
C[1] := B <<< 30
D[1] := C
E[1] := D
FOR i := 1 to 3
A[i+1] := f(B[i], C[i], D[i]) + (A[i] <<< 5) + W[i] + E[i] + K
B[i+1] := A[i]
C[i+1] := B[i] <<< 30
D[i+1] := C[i]
E[i+1] := D[i]
ENDFOR
dst[127:96] := A[4]
dst[95:64] := B[4]
dst[63:32] := C[4]
dst[31:0] := D[4]
SHA
Cryptography
Perform an intermediate calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst".
W4 := b[31:0]
W3 := a[127:96]
W2 := a[95:64]
W1 := a[63:32]
W0 := a[31:0]
dst[127:96] := W3 + sigma0(W4)
dst[95:64] := W2 + sigma0(W3)
dst[63:32] := W1 + sigma0(W2)
dst[31:0] := W0 + sigma0(W1)
SHA
Cryptography
Perform the final calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst".
W14 := b[95:64]
W15 := b[127:96]
W16 := a[31:0] + sigma1(W14)
W17 := a[63:32] + sigma1(W15)
W18 := a[95:64] + sigma1(W16)
W19 := a[127:96] + sigma1(W17)
dst[127:96] := W19
dst[95:64] := W18
dst[63:32] := W17
dst[31:0] := W16
SHA
Cryptography
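One detail of the pseudocode above is that W18 and W19 depend on the freshly computed W16 and W17, so the four results cannot be computed independently. A minimal Python sketch of that dependency follows, assuming the FIPS 180-4 definition of the SHA-256 small sigma1 (the source does not define it) and modelling operands as lists of four 32-bit words with index 0 as bits [31:0]. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def ror32(x, n):
    """Rotate the 32-bit value x right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def sigma1(x):
    # Assumed definition: SHA-256 small sigma1 from FIPS 180-4.
    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10)

def sha256_msg2(a, b):
    w14, w15 = b[2], b[3]
    w16 = (a[0] + sigma1(w14)) & MASK32
    w17 = (a[1] + sigma1(w15)) & MASK32
    w18 = (a[2] + sigma1(w16)) & MASK32   # uses the new W16
    w19 = (a[3] + sigma1(w17)) & MASK32   # uses the new W17
    return [w16, w17, w18, w19]
```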
Perform 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) from "a", an initial SHA256 state (A,B,E,F) from "b", and a pre-computed sum of the next 2 round message values (unsigned 32-bit integers) and the corresponding round constants from "k", and store the updated SHA256 state (A,B,E,F) in "dst".
A[0] := b[127:96]
B[0] := b[95:64]
C[0] := a[127:96]
D[0] := a[95:64]
E[0] := b[63:32]
F[0] := b[31:0]
G[0] := a[63:32]
H[0] := a[31:0]
W_K[0] := k[31:0]
W_K[1] := k[63:32]
FOR i := 0 to 1
A[i+1] := Ch(E[i], F[i], G[i]) + sum1(E[i]) + W_K[i] + H[i] + Maj(A[i], B[i], C[i]) + sum0(A[i])
B[i+1] := A[i]
C[i+1] := B[i]
D[i+1] := C[i]
E[i+1] := Ch(E[i], F[i], G[i]) + sum1(E[i]) + W_K[i] + H[i] + D[i]
F[i+1] := E[i]
G[i+1] := F[i]
H[i+1] := G[i]
ENDFOR
dst[127:96] := A[2]
dst[95:64] := B[2]
dst[63:32] := E[2]
dst[31:0] := F[2]
SHA
Cryptography
This intrinsic is one of the two SHA512 message scheduling instructions. The intrinsic performs an intermediate calculation for the next four SHA512 message qwords. The calculated results are stored in "dst".
DEFINE ROR64(qword, n) {
count := n % 64
dest := (qword >> count) | (qword << (64 - count))
RETURN dest
}
DEFINE SHR64(qword, n) {
RETURN qword >> n
}
DEFINE s0(qword) {
RETURN ROR64(qword,1) ^ ROR64(qword, 8) ^ SHR64(qword, 7)
}
W.qword[4] := __B.qword[0]
W.qword[3] := __A.qword[3]
W.qword[2] := __A.qword[2]
W.qword[1] := __A.qword[1]
W.qword[0] := __A.qword[0]
dst.qword[3] := W.qword[3] + s0(W.qword[4])
dst.qword[2] := W.qword[2] + s0(W.qword[3])
dst.qword[1] := W.qword[1] + s0(W.qword[2])
dst.qword[0] := W.qword[0] + s0(W.qword[1])
SHA512
AVX
Cryptography
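The ROR64, SHR64 and s0 helpers above translate almost directly into Python. In this sketch the explicit 64-bit mask is an addition that Python's unbounded integers require, and each 256-bit operand is modelled as a list of four qwords with index 0 as the lowest. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def ror64(qword, n):
    count = n % 64
    return ((qword >> count) | (qword << (64 - count))) & MASK64

def s0(qword):
    return ror64(qword, 1) ^ ror64(qword, 8) ^ (qword >> 7)

def sha512_msg1(a, b):
    """dst.qword[i] = W[i] + s0(W[i+1]), with W0..W3 from a and W4 = b[0]."""
    w = list(a) + [b[0]]
    return [(w[i] + s0(w[i + 1])) & MASK64 for i in range(4)]
```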
This intrinsic is one of the two SHA512 message scheduling instructions. The intrinsic performs the final calculation for the next four SHA512 message qwords. The calculated results are stored in "dst".
DEFINE ROR64(qword, n) {
count := n % 64
dest := (qword >> count) | (qword << (64 - count))
RETURN dest
}
DEFINE SHR64(qword, n) {
RETURN qword >> n
}
DEFINE s1(qword) {
RETURN ROR64(qword,19) ^ ROR64(qword, 61) ^ SHR64(qword, 6)
}
W.qword[14] := __B.qword[2]
W.qword[15] := __B.qword[3]
W.qword[16] := __A.qword[0] + s1(W.qword[14])
W.qword[17] := __A.qword[1] + s1(W.qword[15])
W.qword[18] := __A.qword[2] + s1(W.qword[16])
W.qword[19] := __A.qword[3] + s1(W.qword[17])
dst.qword[3] := W.qword[19]
dst.qword[2] := W.qword[18]
dst.qword[1] := W.qword[17]
dst.qword[0] := W.qword[16]
SHA512
AVX
Cryptography
This intrinsic performs two rounds of SHA512 operation using an initial SHA512 state (C,D,G,H) from "__A", an initial SHA512 state (A,B,E,F) from "__B", and a pre-computed sum of the next two round message qwords and the corresponding round constants from "__C" (only the two lower qwords of the third operand). The updated SHA512 state (A,B,E,F) is written to "dst", and "dst" can be used as the updated state (C,D,G,H) in later rounds.
DEFINE ROR64(qword, n) {
count := n % 64
dest := (qword >> count) | (qword << (64 - count))
RETURN dest
}
DEFINE SHR64(qword, n) {
RETURN qword >> n
}
DEFINE cap_sigma0(qword) {
RETURN ROR64(qword, 28) ^ ROR64(qword, 34) ^ ROR64(qword, 39)
}
DEFINE cap_sigma1(qword) {
RETURN ROR64(qword, 14) ^ ROR64(qword, 18) ^ ROR64(qword, 41)
}
DEFINE MAJ(a,b,c) {
RETURN (a & b) ^ (a & c) ^ (b & c)
}
DEFINE CH(a,b,c) {
RETURN (a & b) ^ (c & ~a)
}
A.qword[0] := __B.qword[3]
B.qword[0] := __B.qword[2]
C.qword[0] := __A.qword[3]
D.qword[0] := __A.qword[2]
E.qword[0] := __B.qword[1]
F.qword[0] := __B.qword[0]
G.qword[0] := __A.qword[1]
H.qword[0] := __A.qword[0]
WK.qword[0]:= __C.qword[0]
WK.qword[1]:= __C.qword[1]
FOR i := 0 to 1
A.qword[i+1] := CH(E.qword[i], F.qword[i], G.qword[i]) + cap_sigma1(E.qword[i]) + WK.qword[i] + H.qword[i] + MAJ(A.qword[i], B.qword[i], C.qword[i]) + cap_sigma0(A.qword[i])
B.qword[i+1] := A.qword[i]
C.qword[i+1] := B.qword[i]
D.qword[i+1] := C.qword[i]
E.qword[i+1] := CH(E.qword[i], F.qword[i], G.qword[i]) + cap_sigma1(E.qword[i]) + WK.qword[i] + H.qword[i] + D.qword[i]
F.qword[i+1] := E.qword[i]
G.qword[i+1] := F.qword[i]
H.qword[i+1] := G.qword[i]
ENDFOR
dst.qword[3] := A.qword[2]
dst.qword[2] := B.qword[2]
dst.qword[1] := E.qword[2]
dst.qword[0] := F.qword[2]
SHA512
AVX
Cryptography
The VSM3MSG1 intrinsic is one of the two SM3 message scheduling intrinsics. The intrinsic performs an initial calculation for the next four SM3 message words. The calculated results are stored in "dst".
DEFINE ROL32(dword, n) {
count := n % 32
dest := (dword << count) | (dword >> (32 - count))
RETURN dest
}
DEFINE P1(x) {
RETURN x ^ ROL32(x, 15) ^ ROL32(x, 23)
}
W.dword[0] := __C.dword[0]
W.dword[1] := __C.dword[1]
W.dword[2] := __C.dword[2]
W.dword[3] := __C.dword[3]
W.dword[7] := __A.dword[0]
W.dword[8] := __A.dword[1]
W.dword[9] := __A.dword[2]
W.dword[10] := __A.dword[3]
W.dword[13] := __B.dword[0]
W.dword[14] := __B.dword[1]
W.dword[15] := __B.dword[2]
TMP0 := W.dword[7] ^ W.dword[0] ^ ROL32(W.dword[13], 15)
TMP1 := W.dword[8] ^ W.dword[1] ^ ROL32(W.dword[14], 15)
TMP2 := W.dword[9] ^ W.dword[2] ^ ROL32(W.dword[15], 15)
TMP3 := W.dword[10] ^ W.dword[3]
dst.dword[0] := P1(TMP0)
dst.dword[1] := P1(TMP1)
dst.dword[2] := P1(TMP2)
dst.dword[3] := P1(TMP3)
SM3
AVX
Cryptography
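The following Python sketch mirrors the pseudocode above for one call, with each operand modelled as a list of four dwords (index 0 lowest). It is meant to show that the fourth dword of "__B" is never read and that the top lane omits the rotate term. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rol32(dword, n):
    count = n % 32
    return ((dword << count) | (dword >> (32 - count))) & MASK32

def p1(x):
    return x ^ rol32(x, 15) ^ rol32(x, 23)

def sm3_msg1(a, b, c):
    """a holds W7..W10, b holds W13..W15 (b[3] unused), c holds W0..W3."""
    tmp = []
    for i in range(4):
        t = a[i] ^ c[i]
        if i < 3:                      # TMP3 has no rotate term
            t ^= rol32(b[i], 15)
        tmp.append(t)
    return [p1(t) for t in tmp]
```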
The VSM3MSG2 intrinsic is one of the two SM3 message scheduling intrinsics. The intrinsic performs the final calculation for the next four SM3 message words. The calculated results are stored in "dst".
DEFINE ROL32(dword, n) {
count := n % 32
dest := (dword << count) | (dword >> (32-count))
RETURN dest
}
WTMP.dword[0] := __A.dword[0]
WTMP.dword[1] := __A.dword[1]
WTMP.dword[2] := __A.dword[2]
WTMP.dword[3] := __A.dword[3]
W.dword[3] := __B.dword[0]
W.dword[4] := __B.dword[1]
W.dword[5] := __B.dword[2]
W.dword[6] := __B.dword[3]
W.dword[10] := __C.dword[0]
W.dword[11] := __C.dword[1]
W.dword[12] := __C.dword[2]
W.dword[13] := __C.dword[3]
W.dword[16] := ROL32(W.dword[3], 7) ^ W.dword[10] ^ WTMP.dword[0]
W.dword[17] := ROL32(W.dword[4], 7) ^ W.dword[11] ^ WTMP.dword[1]
W.dword[18] := ROL32(W.dword[5], 7) ^ W.dword[12] ^ WTMP.dword[2]
W.dword[19] := ROL32(W.dword[6], 7) ^ W.dword[13] ^ WTMP.dword[3]
W.dword[19] := W.dword[19] ^ ROL32(W.dword[16], 6) ^ ROL32(W.dword[16], 15) ^ ROL32(W.dword[16], 30)
dst.dword[0] := W.dword[16]
dst.dword[1] := W.dword[17]
dst.dword[2] := W.dword[18]
dst.dword[3] := W.dword[19]
SM3
AVX
Cryptography
The intrinsic performs two rounds of SM3 operation using an initial SM3 state (C, D, G, H) from "__A", an initial SM3 state (A, B, E, F) from "__B", and pre-computed words from "__C". The initial state (C, D, G, H) in "__A" is assumed to hold the non-rotated-left variables from the previous state. "imm8" should contain the even round number for the first of the two rounds computed by this instruction. The computation masks "imm8" by ANDing it with 0x3E so that only even round numbers from 0 through 62 are used. The updated SM3 state (A, B, E, F) is stored in "dst".
DEFINE ROL32(dword, n) {
count := n % 32
dest := (dword << count) | (dword >> (32-count))
RETURN dest
}
DEFINE P0(x) {
RETURN x ^ ROL32(x, 9) ^ ROL32(x, 17)
}
DEFINE FF(x, y, z, round) {
IF round < 16
RETURN (x ^ y ^ z)
ELSE
RETURN (x & y) | (x & z) | (y & z)
FI
}
DEFINE GG(x, y, z, round){
IF round < 16
RETURN (x ^ y ^ z)
ELSE
RETURN (x & y) | (~x & z)
FI
}
A.dword[0] := __B.dword[3]
B.dword[0] := __B.dword[2]
C.dword[0] := __A.dword[3]
D.dword[0] := __A.dword[2]
E.dword[0] := __B.dword[1]
F.dword[0] := __B.dword[0]
G.dword[0] := __A.dword[1]
H.dword[0] := __A.dword[0]
W.dword[0] := __C.dword[0]
W.dword[1] := __C.dword[1]
W.dword[4] := __C.dword[2]
W.dword[5] := __C.dword[3]
C.dword[0] := ROL32(C.dword[0], 9)
D.dword[0] := ROL32(D.dword[0], 9)
G.dword[0] := ROL32(G.dword[0], 19)
H.dword[0] := ROL32(H.dword[0], 19)
ROUND := imm8 & 0x3E
IF ROUND < 16
CONST.dword[0] := 0x79CC4519
ELSE
CONST.dword[0] := 0x7A879D8A
FI
CONST.dword[0] := ROL32(CONST.dword[0], ROUND)
FOR i:= 0 to 1
temp.dword[0] := ROL32(A.dword[i], 12) + E.dword[i] + CONST.dword[0]
S1.dword[0] := ROL32(temp.dword[0], 7)
S2.dword[0] := S1.dword[0] ^ ROL32(A.dword[i], 12)
T1.dword[0] := FF(A.dword[i], B.dword[i], C.dword[i], ROUND) + D.dword[i] + S2.dword[0] + (W.dword[i] ^ W.dword[i+4])
T2.dword[0] := GG(E.dword[i], F.dword[i], G.dword[i], ROUND) + H.dword[i] + S1.dword[0] + W.dword[i]
D.dword[i+1] := C.dword[i]
C.dword[i+1] := ROL32(B.dword[i], 9)
B.dword[i+1] := A.dword[i]
A.dword[i+1] := T1.dword[0]
H.dword[i+1] := G.dword[i]
G.dword[i+1] := ROL32(F.dword[i], 19)
F.dword[i+1] := E.dword[i]
E.dword[i+1] := P0(T2.dword[0])
CONST.dword[0] := ROL32(CONST.dword[0], 1)
ENDFOR
dst.dword[3] := A.dword[2]
dst.dword[2] := B.dword[2]
dst.dword[1] := E.dword[2]
dst.dword[0] := F.dword[2]
SM3
AVX
Cryptography
This intrinsic performs four rounds of SM4 key expansion. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst".
BYTE sbox[256] = {
0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
}
DEFINE ROL32(dword, n) {
count := n % 32
dest := (dword << count) | (dword >> (32-count))
RETURN dest
}
DEFINE SBOX_BYTE(dword, i) {
RETURN sbox[dword.byte[i]]
}
DEFINE lower_t(dword) {
tmp.byte[0] := SBOX_BYTE(dword, 0)
tmp.byte[1] := SBOX_BYTE(dword, 1)
tmp.byte[2] := SBOX_BYTE(dword, 2)
tmp.byte[3] := SBOX_BYTE(dword, 3)
RETURN tmp
}
DEFINE L_KEY(dword) {
RETURN dword ^ ROL32(dword, 13) ^ ROL32(dword, 23)
}
DEFINE T_KEY(dword) {
RETURN L_KEY(lower_t(dword))
}
DEFINE F_KEY(X0, X1, X2, X3, round_key) {
RETURN X0 ^ T_KEY(X1 ^ X2 ^ X3 ^ round_key)
}
FOR i:= 0 to 1
P.dword[0] := __A.dword[4*i]
P.dword[1] := __A.dword[4*i+1]
P.dword[2] := __A.dword[4*i+2]
P.dword[3] := __A.dword[4*i+3]
C.dword[0] := F_KEY(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[4*i])
C.dword[1] := F_KEY(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[4*i+1])
C.dword[2] := F_KEY(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[4*i+2])
C.dword[3] := F_KEY(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[4*i+3])
dst.dword[4*i] := C.dword[0]
dst.dword[4*i+1] := C.dword[1]
dst.dword[4*i+2] := C.dword[2]
dst.dword[4*i+3] := C.dword[3]
ENDFOR
dst[MAX:256] := 0
SM4
AVX
Cryptography
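For readers who want to trace the key-expansion pseudocode above by hand, here is a minimal Python sketch of a single 128-bit lane (the intrinsic repeats the same steps for each lane). The S-box bytes are copied from the table above; the function names and the list-of-dwords operand model are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF

# S-box copied from the pseudocode above, one 16-byte row per string.
SBOX = bytes.fromhex(
    "D690E9FECCE13DB716B614C228FB2C05"
    "2B679A762ABE04C3AA44132649860699"
    "9C4250F491EF987A33540B43EDCFAC62"
    "E4B31CA9C908E89580DF94FA758F3FA6"
    "4707A7FCF37317BA83593C19E6854FA8"
    "686B81B27164DA8BF8EB0F4B70569D35"
    "1E240E5E6358D1A225227C3B01217887"
    "D40046579FD327524C3602E7A0C4C89E"
    "EABF8AD240C738B5A3F7F2CEF96115A1"
    "E0AE5DA49B341A55AD933230F58CB1E3"
    "1DF6E22E8266CA60C02923AB0D534E6F"
    "D5DB3745DEFD8E2F03FF6A726D6C5B51"
    "8D1BAF92BBDDBC7F11D95C411F105AD8"
    "0AC13188A5CD7BBD2D74D012B8E5B4B0"
    "8969974A0C96777E65B9F109C56EC684"
    "18F07DEC3ADC4D2079EE5F3ED7CB3948"
)

def rol32(dword, n):
    count = n % 32
    return ((dword << count) | (dword >> (32 - count))) & MASK32

def lower_t(dword):
    """Apply the S-box to each of the four bytes of a dword."""
    raw = dword.to_bytes(4, "little")
    return int.from_bytes(bytes(SBOX[x] for x in raw), "little")

def l_key(dword):
    return dword ^ rol32(dword, 13) ^ rol32(dword, 23)

def f_key(x0, x1, x2, x3, round_key):
    return x0 ^ l_key(lower_t(x1 ^ x2 ^ x3 ^ round_key))

def sm4_key4_lane(p, b):
    """Four key-expansion rounds on one lane; p and b are lists of four dwords."""
    c = []
    for i in range(4):
        x = (list(p) + c)[i:i + 4]   # sliding window over P0..P3, C0..C2
        c.append(f_key(x[0], x[1], x[2], x[3], b[i]))
    return c

# Sample inputs: an initial key state and four round constants.
state = [0xA292FFA1, 0xDF01FEBF, 0x99A12B0F, 0xC42410CC]
consts = [0x00070E15, 0x1C232A31, 0x383F464D, 0x545B6269]
print([hex(w) for w in sm4_key4_lane(state, consts)])
```

The sliding window makes explicit that each new dword feeds the next round, which is why the four results must be computed in order.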
This intrinsic performs four rounds of SM4 encryption. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst".
BYTE sbox[256] = {
0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
}
DEFINE ROL32(dword, n) {
count := n % 32
dest := (dword << count) | (dword >> (32-count))
RETURN dest
}
DEFINE SBOX_BYTE(dword, i) {
RETURN sbox[dword.byte[i]]
}
DEFINE lower_t(dword) {
tmp.byte[0] := SBOX_BYTE(dword, 0)
tmp.byte[1] := SBOX_BYTE(dword, 1)
tmp.byte[2] := SBOX_BYTE(dword, 2)
tmp.byte[3] := SBOX_BYTE(dword, 3)
RETURN tmp
}
DEFINE L_RND(dword) {
tmp := dword
tmp := tmp ^ ROL32(dword, 2)
tmp := tmp ^ ROL32(dword, 10)
tmp := tmp ^ ROL32(dword, 18)
tmp := tmp ^ ROL32(dword, 24)
RETURN tmp
}
DEFINE T_RND(dword) {
RETURN L_RND(lower_t(dword))
}
DEFINE F_RND(X0, X1, X2, X3, round_key) {
RETURN X0 ^ T_RND(X1 ^ X2 ^ X3 ^ round_key)
}
FOR i:= 0 to 1
P.dword[0] := __A.dword[4*i]
P.dword[1] := __A.dword[4*i+1]
P.dword[2] := __A.dword[4*i+2]
P.dword[3] := __A.dword[4*i+3]
C.dword[0] := F_RND(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[4*i])
C.dword[1] := F_RND(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[4*i+1])
C.dword[2] := F_RND(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[4*i+2])
C.dword[3] := F_RND(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[4*i+3])
dst.dword[4*i] := C.dword[0]
dst.dword[4*i+1] := C.dword[1]
dst.dword[4*i+2] := C.dword[2]
dst.dword[4*i+3] := C.dword[3]
ENDFOR
dst[MAX:256] := 0
SM4
AVX
Cryptography
This intrinsic performs four rounds of SM4 key expansion. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst".
BYTE sbox[256] = {
0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
}
DEFINE ROL32(dword, n) {
count := n % 32
dest := (dword << count) | (dword >> (32-count))
RETURN dest
}
DEFINE SBOX_BYTE(dword, i) {
RETURN sbox[dword.byte[i]]
}
DEFINE lower_t(dword) {
tmp.byte[0] := SBOX_BYTE(dword, 0)
tmp.byte[1] := SBOX_BYTE(dword, 1)
tmp.byte[2] := SBOX_BYTE(dword, 2)
tmp.byte[3] := SBOX_BYTE(dword, 3)
RETURN tmp
}
DEFINE L_KEY(dword) {
RETURN dword ^ ROL32(dword, 13) ^ ROL32(dword, 23)
}
DEFINE T_KEY(dword) {
RETURN L_KEY(lower_t(dword))
}
DEFINE F_KEY(X0, X1, X2, X3, round_key) {
RETURN X0 ^ T_KEY(X1 ^ X2 ^ X3 ^ round_key)
}
P.dword[0] := __A.dword[0]
P.dword[1] := __A.dword[1]
P.dword[2] := __A.dword[2]
P.dword[3] := __A.dword[3]
C.dword[0] := F_KEY(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[0])
C.dword[1] := F_KEY(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[1])
C.dword[2] := F_KEY(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[2])
C.dword[3] := F_KEY(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[3])
dst.dword[0] := C.dword[0]
dst.dword[1] := C.dword[1]
dst.dword[2] := C.dword[2]
dst.dword[3] := C.dword[3]
dst[MAX:128] := 0
SM4
AVX
Cryptography
This intrinsic performs four rounds of SM4 encryption. The intrinsic operates on independent 128-bit lanes. The calculated results are stored in "dst".
BYTE sbox[256] = {
0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28, 0xFB, 0x2C, 0x05,
0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44, 0x13, 0x26, 0x49, 0x86, 0x06, 0x99,
0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98, 0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62,
0xE4, 0xB3, 0x1C, 0xA9, 0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6,
0x47, 0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85, 0x4F, 0xA8,
0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F, 0x4B, 0x70, 0x56, 0x9D, 0x35,
0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2, 0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87,
0xD4, 0x00, 0x46, 0x57, 0x9F, 0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E,
0xEA, 0xBF, 0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15, 0xA1,
0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30, 0xF5, 0x8C, 0xB1, 0xE3,
0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0, 0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F,
0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD, 0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51,
0x8D, 0x1B, 0xAF, 0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8, 0xE5, 0xB4, 0xB0,
0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9, 0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84,
0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D, 0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
}
DEFINE ROL32(dword, n) {
count := n % 32
dest := (dword << count) | (dword >> (32-count))
RETURN dest
}
DEFINE SBOX_BYTE(dword, i) {
RETURN sbox[dword.byte[i]]
}
DEFINE lower_t(dword) {
tmp.byte[0] := SBOX_BYTE(dword, 0)
tmp.byte[1] := SBOX_BYTE(dword, 1)
tmp.byte[2] := SBOX_BYTE(dword, 2)
tmp.byte[3] := SBOX_BYTE(dword, 3)
RETURN tmp
}
DEFINE L_RND(dword) {
tmp := dword
tmp := tmp ^ ROL32(dword, 2)
tmp := tmp ^ ROL32(dword, 10)
tmp := tmp ^ ROL32(dword, 18)
tmp := tmp ^ ROL32(dword, 24)
RETURN tmp
}
DEFINE T_RND(dword) {
RETURN L_RND(lower_t(dword))
}
DEFINE F_RND(X0, X1, X2, X3, round_key) {
RETURN X0 ^ T_RND(X1 ^ X2 ^ X3 ^ round_key)
}
P.dword[0] := __A.dword[0]
P.dword[1] := __A.dword[1]
P.dword[2] := __A.dword[2]
P.dword[3] := __A.dword[3]
C.dword[0] := F_RND(P.dword[0], P.dword[1], P.dword[2], P.dword[3], __B.dword[0])
C.dword[1] := F_RND(P.dword[1], P.dword[2], P.dword[3], C.dword[0], __B.dword[1])
C.dword[2] := F_RND(P.dword[2], P.dword[3], C.dword[0], C.dword[1], __B.dword[2])
C.dword[3] := F_RND(P.dword[3], C.dword[0], C.dword[1], C.dword[2], __B.dword[3])
dst.dword[0] := C.dword[0]
dst.dword[1] := C.dword[1]
dst.dword[2] := C.dword[2]
dst.dword[3] := C.dword[3]
dst[MAX:128] := 0
SM4
AVX
Cryptography
Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ACOS(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
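The bit ranges in the loop above (`i := j*64`, `dst[i+63:i]`) simply select each 64-bit lane in turn. A small Python sketch of that lane structure follows, assuming the 128-bit vector is represented as 16 little-endian bytes. The name `acos_pd` is a hypothetical stand-in, and Python's `math.acos` is used only as a scalar reference.

```python
import math
import struct

def acos_pd(a):
    """a: 16 bytes holding two doubles (bits [63:0], then bits [127:64])."""
    lanes = struct.unpack("<2d", a)                    # split into two lanes
    return struct.pack("<2d", *(math.acos(x) for x in lanes))

packed = struct.pack("<2d", 1.0, 0.0)
result = struct.unpack("<2d", acos_pd(packed))
print(result)
```

Note that `math.acos` raises `ValueError` for inputs outside [-1, 1]; how a vector implementation reports such inputs is not covered by the pseudocode above.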
Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ACOS(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ACOSH(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ACOSH(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ASIN(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ASIN(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ASINH(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ASINH(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ATAN(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ATAN(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
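The phrase "divided by" above can hide the fact that ATAN2 takes both signs into account, which a plain quotient loses. The Python sketch below illustrates this on four lanes; `to_f32` rounds each result to single precision through `struct`, and both helper names are hypothetical.

```python
import math
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def atan2_ps(a, b):
    """Per-lane ATAN2(a, b) over four lanes, rounded to single precision."""
    return [to_f32(math.atan2(y, x)) for y, x in zip(a, b)]

r = atan2_ps([1.0, -1.0, 0.0, 1.0], [1.0, -1.0, -1.0, 0.0])
print(r)
```

In the second lane both inputs are negative, so the result lies in the third quadrant, whereas `math.atan(-1.0 / -1.0)` would return a first-quadrant angle. The last lane has a zero in "b", which a literal division could not handle.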
Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ATANH(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ATANH(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := COS(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := COS(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := COSD(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := COSD(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := COSH(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := COSH(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0))
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
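A literal Python transcription of the formula above is shown below, modelling the two lanes as list elements. The name `hypot_pd` is hypothetical, and the code follows the pseudocode rather than any particular library implementation.

```python
import math

def hypot_pd(a, b):
    """Per-lane SQRT(a^2 + b^2), written exactly as in the pseudocode."""
    return [math.sqrt(x * x + y * y) for x, y in zip(a, b)]

print(hypot_pd([3.0, 5.0], [4.0, 12.0]))
```

Evaluated literally, the intermediate squares can overflow to infinity for very large inputs (for example 1e200 in both lanes), whereas Python's `math.hypot` returns a finite value there. Whether a given vector implementation avoids that intermediate overflow is not stated in the source.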
Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0))
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SIN(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SIN(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SIN(a[i+63:i])
MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
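This operation has two outputs: the sines are returned and the cosines go to memory. A minimal Python sketch of that calling shape follows, where a caller-supplied list stands in for the memory at "mem_addr". The name `sincos_pd` and the list-based out-parameter are assumptions made for this example.

```python
import math

def sincos_pd(a, cos_out):
    """Return the per-lane sines; write the per-lane cosines into cos_out."""
    for j, x in enumerate(a):
        cos_out[j] = math.cos(x)       # models MEM[mem_addr + lane] := COS(...)
    return [math.sin(x) for x in a]    # models dst

cos_out = [None, None]
sin_out = sincos_pd([0.0, math.pi / 2], cos_out)
print(sin_out, cos_out)
```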
Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SIN(a[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SIND(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SIND(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SINH(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SINH(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := TAN(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := TAN(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := TAND(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := TAND(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := TANH(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := TANH(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Trigonometry
Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := CubeRoot(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := CubeRoot(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".
DEFINE CEXP(a[31:0], b[31:0]) {
result[31:0] := POW(FP32(e), a[31:0]) * COS(b[31:0])
result[63:32] := POW(FP32(e), a[31:0]) * SIN(b[31:0])
RETURN result
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := CEXP(a[i+31:i], a[i+63:i+32])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
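The CEXP helper above is Euler's formula applied to each (real, imaginary) pair. The Python sketch below follows the same two-lane layout, with the vector modelled as `[re0, im0, re1, im1]`, and compares one lane against `cmath.exp` as a reference. The function names are hypothetical, and results are left in double precision rather than rounded to 32 bits.

```python
import cmath
import math

def cexp(re, im):
    mag = math.exp(re)                       # POW(e, a)
    return (mag * math.cos(im), mag * math.sin(im))

def cexp_ps(a):
    """a: [re0, im0, re1, im1]; returns the results in the same layout."""
    out = []
    for j in range(2):
        out.extend(cexp(a[2 * j], a[2 * j + 1]))
    return out

r = cexp_ps([0.0, math.pi, 1.0, 0.0])
ref = cmath.exp(complex(0.0, math.pi))
print(r, ref)
```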
Compute the natural logarithm of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".
DEFINE CLOG(a[31:0], b[31:0]) {
result[31:0] := LOG(SQRT(POW(a, 2.0) + POW(b, 2.0)))
result[63:32] := ATAN2(b, a)
RETURN result
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := CLOG(a[i+31:i], a[i+63:i+32])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
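The CLOG helper above returns the logarithm of the magnitude as the real part and the argument (via ATAN2) as the imaginary part. A small Python sketch of one complex element follows, checked against `cmath.log`. The name `clog` mirrors the pseudocode, and double precision is used throughout as a simplification.

```python
import cmath
import math

def clog(re, im):
    magnitude = math.sqrt(re * re + im * im)
    return (math.log(magnitude), math.atan2(im, re))

print(clog(0.0, 1.0))
print(clog(-1.0, 0.0), cmath.log(complex(-1.0, 0.0)))
```

In Python, `math.log(0.0)` raises `ValueError`, so this sketch does not model the result for a zero input.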
Compute the square root of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]".
DEFINE CSQRT(a[31:0], b[31:0]) {
sign[31:0] := (b < 0.0) ? -FP32(1.0) : FP32(1.0)
result[31:0] := SQRT((a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
result[63:32] := sign * SQRT((-a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0)
RETURN result
}
FOR j := 0 to 1
i := j*64
dst[i+63:i] := CSQRT(a[i+31:i], a[i+63:i+32])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := POW(e, a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := POW(FP32(e), a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := POW(10.0, a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := POW(FP32(10.0), a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := POW(2.0, a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := POW(FP32(2.0), a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := POW(e, a[i+63:i]) - 1.0
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the inverse cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := InvCubeRoot(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the inverse cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := InvCubeRoot(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := InvSQRT(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := InvSQRT(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := LOG(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := LOG(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0)
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0)
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := LOG(1.0 + a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := LOG(1.0 + a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0)
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0)
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ConvertExpFP64(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ConvertExpFP32(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
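The exponent conversion above can be sketched per element in scalar C by reading the biased exponent field directly. This is a minimal model, not the library implementation: the name "getexp_f32" is hypothetical, "float" is assumed to be IEEE 754 binary32, the sign bit is ignored, and zero, denormal, infinite, and NaN inputs are deliberately out of scope.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the exponent extraction for one normal, finite element.
   Assumption: zero, denormal, infinite and NaN inputs are not handled. */
static float getexp_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);           /* reinterpret the bits safely */
    uint32_t biased = (bits >> 23) & 0xFFu;   /* 8-bit biased exponent field */
    assert(biased != 0u && biased != 0xFFu);  /* reject unsupported inputs */
    return (float)((int)biased - 127);        /* remove the binary32 bias */
}
```

For example, an input of 8.0 yields 3.0 and an input of 0.75 yields -1.0, matching "floor(log2(x))" for positive normal values.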
Compute packed double-precision (64-bit) floating-point elements in "a" raised to the power of the corresponding packed elements in "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := POW(a[i+63:i], b[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute packed single-precision (32-bit) floating-point elements in "a" raised to the power of the corresponding packed elements in "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := POW(a[i+31:i], b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_pd".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SQRT(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_ps".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SQRT(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Elementary Math Functions
Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := CDFNormal(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := CDFNormal(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := InverseCDFNormal(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := InverseCDFNormal(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ERF(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := 1.0 - ERF(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := 1.0 - ERF(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := InverseERFC(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := InverseERFC(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := InverseERF(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := InverseERF(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 15
i := 8*j
IF b[i+7:i] == 0
#DE
FI
dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 16*j
IF b[i+15:i] == 0
#DE
FI
dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 3
i := 32*j
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 1
i := 64*j
IF b[i+63:i] == 0
#DE
FI
dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 15
i := 8*j
IF b[i+7:i] == 0
#DE
FI
dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
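The unsigned 8-bit division above can be modelled lane by lane in scalar C. In this sketch the hypothetical function "div_u8x16" returns false where the pseudocode raises #DE; checking all divisors before writing any result is a design choice of the sketch, not something the source specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the 16-lane unsigned 8-bit division.
   Returns false (standing in for #DE) if any divisor is zero;
   in that case dst is left untouched. */
static bool div_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t dst[16])
{
    assert(a != NULL && b != NULL && dst != NULL);
    for (size_t j = 0; j < 16; j++) {
        if (b[j] == 0) {
            return false;
        }
    }
    for (size_t j = 0; j < 16; j++) {
        dst[j] = (uint8_t)(a[j] / b[j]);   /* unsigned division truncates */
    }
    return true;
}
```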
Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 7
i := 16*j
IF b[i+15:i] == 0
#DE
FI
dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 3
i := 32*j
IF b[i+31:i] == 0
#DE
FI
dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 1
i := 64*j
IF b[i+63:i] == 0
#DE
FI
dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ERF(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Probability/Statistics
Divide packed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed 32-bit integers into memory at "mem_addr".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst".
FOR j := 0 to 15
i := 8*j
dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst".
FOR j := 0 to 7
i := 16*j
dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst".
FOR j := 0 to 1
i := 64*j
dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst".
FOR j := 0 to 15
i := 8*j
dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst".
FOR j := 0 to 7
i := 16*j
dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst".
FOR j := 0 to 1
i := 64*j
dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed unsigned 32-bit integers into memory at "mem_addr".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i])
MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
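The combined quotient-and-remainder operation above can be sketched in scalar C as follows. The name "udivrem_u32x4" and the "rem_out" array (standing in for the memory at "mem_addr") are hypothetical, and the sketch assumes the caller guarantees nonzero divisors, since the pseudocode for this entry does not describe the zero case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: quotients go to dst, remainders go to rem_out. */
static void udivrem_u32x4(const uint32_t a[4], const uint32_t b[4],
                          uint32_t dst[4], uint32_t rem_out[4])
{
    for (size_t j = 0; j < 4; j++) {
        assert(b[j] != 0u);          /* assumption: divisors are nonzero */
        dst[j] = a[j] / b[j];        /* truncated quotient */
        rem_out[j] = a[j] % b[j];    /* remainder, always less than b[j] */
    }
}
```

Each lane satisfies a = dst * b + remainder, which gives a quick way to check a result.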
Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Arithmetic
Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := CEIL(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := CEIL(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := FLOOR(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := FLOOR(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ROUND(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ROUND(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Special Math Functions
Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction.
FOR j := 0 to 1
i := j*64
dst[i+63:i] := TRUNCATE(a[i+63:i])
ENDFOR
dst[MAX:128] := 0
SSE
Miscellaneous
Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := TRUNCATE(a[i+31:i])
ENDFOR
dst[MAX:128] := 0
SSE
Miscellaneous
Macro: Transpose the 4x4 matrix formed by the 4 rows of single-precision (32-bit) floating-point elements in "row0", "row1", "row2", and "row3", and store the transposed matrix in these vectors ("row0" now contains column 0, etc.).
__m128 tmp3, tmp2, tmp1, tmp0;
tmp0 := _mm_unpacklo_ps(row0, row1);
tmp2 := _mm_unpacklo_ps(row2, row3);
tmp1 := _mm_unpackhi_ps(row0, row1);
tmp3 := _mm_unpackhi_ps(row2, row3);
row0 := _mm_movelh_ps(tmp0, tmp2);
row1 := _mm_movehl_ps(tmp2, tmp0);
row2 := _mm_movelh_ps(tmp1, tmp3);
row3 := _mm_movehl_ps(tmp3, tmp1);
SSE
Swizzle
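To see why the unpack and move steps in the macro above produce a transpose, the sequence can be replayed on plain arrays. In this sketch "vec4" is a hypothetical stand-in for __m128, and the four helpers are scalar models of the intrinsics named in the macro, not the intrinsics themselves.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float f[4]; } vec4;   /* hypothetical stand-in for __m128 */

/* Interleave the low halves: a0, b0, a1, b1. */
static vec4 unpacklo(vec4 a, vec4 b) { vec4 r = {{ a.f[0], b.f[0], a.f[1], b.f[1] }}; return r; }
/* Interleave the high halves: a2, b2, a3, b3. */
static vec4 unpackhi(vec4 a, vec4 b) { vec4 r = {{ a.f[2], b.f[2], a.f[3], b.f[3] }}; return r; }
/* Low half of a followed by low half of b. */
static vec4 movelh(vec4 a, vec4 b) { vec4 r = {{ a.f[0], a.f[1], b.f[0], b.f[1] }}; return r; }
/* High half of b followed by high half of a. */
static vec4 movehl(vec4 a, vec4 b) { vec4 r = {{ b.f[2], b.f[3], a.f[2], a.f[3] }}; return r; }

/* Transpose a 4x4 matrix in place, following the macro step by step. */
static void transpose4(vec4 rows[4])
{
    assert(rows != NULL);
    vec4 tmp0 = unpacklo(rows[0], rows[1]);
    vec4 tmp2 = unpacklo(rows[2], rows[3]);
    vec4 tmp1 = unpackhi(rows[0], rows[1]);
    vec4 tmp3 = unpackhi(rows[2], rows[3]);
    rows[0] = movelh(tmp0, tmp2);
    rows[1] = movehl(tmp2, tmp0);
    rows[2] = movelh(tmp1, tmp3);
    rows[3] = movehl(tmp3, tmp1);
}
```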
Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".
dst[15:0] := (a[63:0] >> (imm8[1:0] * 16))[15:0]
dst[31:16] := 0
SSE
Swizzle
Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".
dst[15:0] := (a[63:0] >> (imm8[1:0] * 16))[15:0]
dst[31:16] := 0
SSE
Swizzle
Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8".
dst[63:0] := a[63:0]
sel := imm8[1:0]*16
dst[sel+15:sel] := i[15:0]
SSE
Swizzle
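The insert operation above amounts to clearing one 16-bit lane and writing the new value into it. A minimal scalar sketch, with the 64-bit vector held in a uint64_t and the hypothetical name "insert16":

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: copy a, then replace the 16-bit lane chosen by imm8[1:0]. */
static uint64_t insert16(uint64_t a, uint16_t i, unsigned imm8)
{
    assert(imm8 < 256u);                          /* imm8 is an 8-bit immediate */
    unsigned sel = (imm8 & 3u) * 16u;             /* bit offset of the lane */
    uint64_t cleared = a & ~((uint64_t)0xFFFFu << sel);
    return cleared | ((uint64_t)i << sel);
}
```

Only the low two bits of "imm8" take part in the selection, so a value of 7 selects the same lane as 3.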
Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8".
dst[63:0] := a[63:0]
sel := imm8[1:0]*16
dst[sel+15:sel] := i[15:0]
SSE
Swizzle
Shuffle 16-bit integers in "a" using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[15:0] := src[15:0]
1: tmp[15:0] := src[31:16]
2: tmp[15:0] := src[47:32]
3: tmp[15:0] := src[63:48]
ESAC
RETURN tmp[15:0]
}
dst[15:0] := SELECT4(a[63:0], imm8[1:0])
dst[31:16] := SELECT4(a[63:0], imm8[3:2])
dst[47:32] := SELECT4(a[63:0], imm8[5:4])
dst[63:48] := SELECT4(a[63:0], imm8[7:6])
SSE
Swizzle
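The SELECT4 pseudocode above maps directly onto shifts and masks. The following scalar sketch uses the hypothetical names "select4" and "shuffle16" and holds the 64-bit vector in a uint64_t; each destination lane consumes two bits of "imm8".

```c
#include <assert.h>
#include <stdint.h>

/* Pick one of the four 16-bit lanes of src using the low two control bits. */
static uint16_t select4(uint64_t src, unsigned control)
{
    return (uint16_t)(src >> ((control & 3u) * 16u));
}

/* Scalar model of the 16-bit shuffle: lane k of dst is chosen by imm8[2k+1:2k]. */
static uint64_t shuffle16(uint64_t a, unsigned imm8)
{
    assert(imm8 < 256u);                          /* imm8 is an 8-bit immediate */
    uint64_t dst = 0;
    for (unsigned lane = 0; lane < 4u; lane++) {
        unsigned control = imm8 >> (2u * lane);
        dst |= (uint64_t)select4(a, control) << (16u * lane);
    }
    return dst;
}
```

With this model a control of 0x1B reverses the four lanes, and 0xE4 leaves them unchanged.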
Shuffle 16-bit integers in "a" using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[15:0] := src[15:0]
1: tmp[15:0] := src[31:16]
2: tmp[15:0] := src[47:32]
3: tmp[15:0] := src[63:48]
ESAC
RETURN tmp[15:0]
}
dst[15:0] := SELECT4(a[63:0], imm8[1:0])
dst[31:16] := SELECT4(a[63:0], imm8[3:2])
dst[47:32] := SELECT4(a[63:0], imm8[5:4])
dst[63:48] := SELECT4(a[63:0], imm8[7:6])
SSE
Swizzle
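Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the control in "imm8", and store the results in "dst". The lower two elements of "dst" are selected from "a", and the upper two elements are selected from "b".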
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(b[127:0], imm8[5:4])
dst[127:96] := SELECT4(b[127:0], imm8[7:6])
SSE
Swizzle
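A scalar sketch of the shuffle pseudocode above, using plain float arrays in place of vector registers (the name "shuffle_f32x4" is hypothetical). It makes visible that the two low result lanes read from "a" and the two high result lanes read from "b".

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of the 4-lane float shuffle driven by an 8-bit control. */
static void shuffle_f32x4(const float a[4], const float b[4],
                          unsigned imm8, float dst[4])
{
    assert(a != NULL && b != NULL && dst != NULL);
    assert(imm8 < 256u);                     /* imm8 is an 8-bit immediate */
    /* Read all sources first so dst may alias a or b. */
    float r0 = a[imm8 & 3u];
    float r1 = a[(imm8 >> 2) & 3u];
    float r2 = b[(imm8 >> 4) & 3u];
    float r3 = b[(imm8 >> 6) & 3u];
    dst[0] = r0;
    dst[1] = r1;
    dst[2] = r2;
    dst[3] = r3;
}
```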
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
SSE
Swizzle
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
SSE
Swizzle
Get the unsigned 32-bit value of the MXCSR control and status register.
dst[31:0] := MXCSR
SSE
General Support
Set the MXCSR control and status register with the value in unsigned 32-bit integer "a".
MXCSR := a[31:0]
SSE
General Support
Macro: Get the exception state bits from the MXCSR control and status register. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT
dst[31:0] := MXCSR & _MM_EXCEPT_MASK
SSE
General Support
Macro: Set the exception state bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT
MXCSR := (MXCSR AND ~_MM_EXCEPT_MASK) OR a[31:0]
SSE
General Support
Macro: Get the exception mask bits from the MXCSR control and status register. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT
dst[31:0] := MXCSR & _MM_MASK_MASK
SSE
General Support
Macro: Set the exception mask bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT
MXCSR := (MXCSR AND ~_MM_MASK_MASK) OR a[31:0]
SSE
General Support
Macro: Get the rounding mode bits from the MXCSR control and status register. The rounding mode is one of the following flags: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO
dst[31:0] := MXCSR & _MM_ROUND_MASK
SSE
General Support
Macro: Set the rounding mode bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The rounding mode is one of the following flags: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO
MXCSR := (MXCSR AND ~_MM_ROUND_MASK) OR a[31:0]
SSE
General Support
Macro: Get the flush-to-zero bits from the MXCSR control and status register. The flush-to-zero mode is one of the following flags: _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF
dst[31:0] := MXCSR & _MM_FLUSH_MASK
SSE
General Support
Macro: Set the flush-to-zero bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a". The flush-to-zero mode is one of the following flags: _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF
MXCSR := (MXCSR AND ~_MM_FLUSH_MASK) OR a[31:0]
SSE
General Support
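The MXCSR get and set macros above all operate on one bit field of the register. The sketch below models the register as a plain uint32_t so it runs anywhere; the enum names and both helper functions are hypothetical, and the mask values are the ones commonly defined in xmmintrin.h headers, stated here as an assumption. A setter has to preserve the bits outside its field, so it is written as a read-modify-write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local names; values assumed to match common xmmintrin.h masks. */
enum {
    CSR_EXCEPT_MASK = 0x003F,   /* exception state flags */
    CSR_MASK_MASK   = 0x1F80,   /* exception mask bits */
    CSR_ROUND_MASK  = 0x6000,   /* rounding mode */
    CSR_FLUSH_MASK  = 0x8000    /* flush-to-zero */
};

/* Read one field of the modelled register. */
static uint32_t csr_get_field(uint32_t csr, uint32_t mask)
{
    return csr & mask;
}

/* Replace one field and keep every bit outside the mask unchanged. */
static uint32_t csr_set_field(uint32_t csr, uint32_t mask, uint32_t value)
{
    assert((value & ~mask) == 0u);   /* the new value must fit inside the field */
    return (csr & ~mask) | value;
}
```

Starting from a register value of 0x1F80, setting the rounding field to 0x4000 gives 0x5F80 under this model, with the exception mask bits left intact.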
Fetch the line of data from memory that contains address "p" to a location in the cache hierarchy specified by the locality hint "i", which can be one of:
_MM_HINT_T0 (value 3): move data using the T0 hint. The PREFETCHT0 instruction will be generated.
_MM_HINT_T1 (value 2): move data using the T1 hint. The PREFETCHT1 instruction will be generated.
_MM_HINT_T2 (value 1): move data using the T2 hint. The PREFETCHT2 instruction will be generated.
_MM_HINT_NTA (value 0): move data using the non-temporal access (NTA) hint. The PREFETCHNTA instruction will be generated.
SSE
General Support
Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. Guarantees that every store instruction that precedes the fence in program order is globally visible before any store instruction that follows the fence in program order.
SSE
General Support
Allocate "size" bytes of memory, aligned to the alignment specified in "align", and return a pointer to the allocated memory. "_mm_free" should be used to free memory that is allocated with "_mm_malloc".
SSE
General Support
Free aligned memory that was allocated with "_mm_malloc".
SSE
General Support
Return vector of type __m128 with undefined elements.
SSE
General Support
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
SSE
Special Math Functions
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
SSE
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
SSE
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
SSE
Special Math Functions
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
SSE
Special Math Functions
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
SSE
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
SSE
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
SSE
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [min_float_note]
dst[31:0] := MIN(a[31:0], b[31:0])
dst[127:32] := a[127:32]
SSE
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 3
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
SSE
Special Math Functions
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [max_float_note]
dst[31:0] := MAX(a[31:0], b[31:0])
dst[127:32] := a[127:32]
SSE
Special Math Functions
Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 to 3
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
SSE
Special Math Functions
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*16
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ENDFOR
SSE
Arithmetic
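The high-half multiply above can be sketched in scalar C as shown below (the name "mulhi_u16x4" is hypothetical). The casts to uint32_t matter: without them, C promotes both 16-bit operands to signed int, and a product such as 0xFFFF * 0xFFFF would overflow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: keep the high 16 bits of each 32-bit unsigned product. */
static void mulhi_u16x4(const uint16_t a[4], const uint16_t b[4], uint16_t dst[4])
{
    assert(a != NULL && b != NULL && dst != NULL);
    for (size_t j = 0; j < 4; j++) {
        uint32_t tmp = (uint32_t)a[j] * (uint32_t)b[j];   /* full 32-bit product */
        dst[j] = (uint16_t)(tmp >> 16);                   /* tmp[31:16] */
    }
}
```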
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*16
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ENDFOR
SSE
Arithmetic
Miscellaneous
Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum the 8 differences to produce a single unsigned 16-bit integer, and store it in the low 16 bits of "dst". The remaining bits of "dst" are set to zero.
FOR j := 0 to 7
i := j*8
tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
ENDFOR
dst[15:0] := tmp[7:0] + tmp[15:8] + tmp[23:16] + tmp[31:24] + tmp[39:32] + tmp[47:40] + tmp[55:48] + tmp[63:56]
dst[63:16] := 0
SSE
Arithmetic
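A scalar sketch of the sum of absolute differences above, with the hypothetical name "sad_u8x8". The return value models the full 64-bit "dst": the sum is at most 8 * 255 = 2040, so it always fits in the low 16 bits and the upper bits stay zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: sum of |a[j] - b[j]| over the eight unsigned bytes. */
static uint64_t sad_u8x8(const uint8_t a[8], const uint8_t b[8])
{
    assert(a != NULL && b != NULL);
    uint64_t sum = 0;
    for (size_t j = 0; j < 8; j++) {
        int diff = (int)a[j] - (int)b[j];          /* widen before subtracting */
        sum += (uint64_t)(diff < 0 ? -diff : diff);
    }
    return sum;
}
```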
Miscellaneous
Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum the 8 differences to produce a single unsigned 16-bit integer, and store it in the low 16 bits of "dst". The remaining bits of "dst" are set to zero.
FOR j := 0 to 7
i := j*8
tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
ENDFOR
dst[15:0] := tmp[7:0] + tmp[15:8] + tmp[23:16] + tmp[31:24] + tmp[39:32] + tmp[47:40] + tmp[55:48] + tmp[63:56]
dst[63:16] := 0
SSE
Arithmetic
Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := a[31:0] + b[31:0]
dst[127:32] := a[127:32]
SSE
Arithmetic
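The scalar ("lower element") entries in this group all share one pattern: operate on lane 0 and pass lanes 1 to 3 of "a" through unchanged. A minimal sketch for the addition case, using float arrays and the hypothetical name "add_lower_f32":

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model: dst[0] = a[0] + b[0]; the upper three lanes are copied from a. */
static void add_lower_f32(const float a[4], const float b[4], float dst[4])
{
    assert(a != NULL && b != NULL && dst != NULL);
    float low = a[0] + b[0];     /* compute first so dst may alias a or b */
    dst[1] = a[1];
    dst[2] = a[2];
    dst[3] = a[3];
    dst[0] = low;
}
```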
Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
SSE
Arithmetic
Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := a[31:0] - b[31:0]
dst[127:32] := a[127:32]
SSE
Arithmetic
Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
SSE
Arithmetic
Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := a[31:0] * b[31:0]
dst[127:32] := a[127:32]
SSE
Arithmetic
Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] * b[i+31:i]
ENDFOR
SSE
Arithmetic
Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := a[31:0] / b[31:0]
dst[127:32] := a[127:32]
SSE
Arithmetic
Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := a[i+31:i] / b[i+31:i]
ENDFOR
SSE
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ENDFOR
SSE
Probability/Statistics
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ENDFOR
SSE
Probability/Statistics
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ENDFOR
SSE
Probability/Statistics
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ENDFOR
SSE
Probability/Statistics
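The rounding average `(a + b + 1) >> 1` used by the entries above is evaluated without intermediate overflow. A scalar sketch of one 8-bit lane, with a hypothetical helper name, might look like this:

```c
#include <stdint.h>

/* One lane of the averaging pseudocode. The operands are widened first
   so that a + b + 1 (at most 511) cannot wrap at 8 bits. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}
```

The `+ 1` makes ties round upward, so averaging 0 and 1 gives 1 rather than 0.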
Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[127:32] := a[127:32]
SSE
Convert
Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[127:32] := a[127:32]
SSE
Convert
Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
SSE
Convert
Convert packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[63:32] := Convert_Int32_To_FP32(b[63:32])
dst[95:64] := a[95:64]
dst[127:96] := a[127:96]
SSE
Convert
Convert packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_Int32_To_FP32(b[31:0])
dst[63:32] := Convert_Int32_To_FP32(b[63:32])
dst[95:64] := a[95:64]
dst[127:96] := a[127:96]
SSE
Convert
Convert packed 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*16
m := j*32
dst[m+31:m] := Convert_Int16_To_FP32(a[i+15:i])
ENDFOR
SSE
Convert
Convert packed unsigned 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*16
m := j*32
dst[m+31:m] := Convert_Int16_To_FP32(a[i+15:i])
ENDFOR
SSE
Convert
Convert the lower packed 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*8
m := j*32
dst[m+31:m] := Convert_Int8_To_FP32(a[i+7:i])
ENDFOR
SSE
Convert
Convert the lower packed unsigned 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := j*8
m := j*32
dst[m+31:m] := Convert_Int8_To_FP32(a[i+7:i])
ENDFOR
SSE
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", then convert the packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, and store the results in the upper 2 elements of "dst".
dst[31:0] := Convert_Int32_To_FP32(a[31:0])
dst[63:32] := Convert_Int32_To_FP32(a[63:32])
dst[95:64] := Convert_Int32_To_FP32(b[31:0])
dst[127:96] := Convert_Int32_To_FP32(b[63:32])
SSE
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
dst[31:0] := Convert_FP32_To_Int32(a[31:0])
SSE
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
dst[31:0] := Convert_FP32_To_Int32(a[31:0])
SSE
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
dst[63:0] := Convert_FP32_To_Int64(a[31:0])
SSE
Convert
Copy the lower single-precision (32-bit) floating-point element of "a" to "dst".
dst[31:0] := a[31:0]
SSE
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ENDFOR
SSE
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ENDFOR
SSE
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
SSE
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
SSE
Convert
Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
SSE
Convert
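The plain and "with truncation" scalar conversions above differ only in rounding. The sketch below assumes the corresponding intrinsics are `_mm_cvtss_si32` and `_mm_cvttss_si32` (names not stated in this text) and that the MXCSR rounding mode is left at its default, round-to-nearest-even. The wrapper names are hypothetical.

```c
#include <xmmintrin.h>

/* Rounds according to the current MXCSR rounding mode. */
static int to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Always rounds toward zero, regardless of the MXCSR rounding mode. */
static int to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

With the default mode, 2.7f converts to 3 with the first helper and to 2 with the second; -2.7f converts to -3 and -2 respectively.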
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ENDFOR
SSE
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ENDFOR
SSE
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst". Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF.
FOR j := 0 to 3
i := 16*j
k := 32*j
IF a[k+31:k] >= FP32(0x7FFF) && a[k+31:k] <= FP32(0x7FFFFFFF)
dst[i+15:i] := 0x7FFF
ELSE
dst[i+15:i] := Convert_FP32_To_Int16(a[k+31:k])
FI
ENDFOR
SSE
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 8-bit integers, and store the results in the lower 4 elements of "dst". Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF.
FOR j := 0 to 3
i := 8*j
k := 32*j
IF a[k+31:k] >= FP32(0x7F) && a[k+31:k] <= FP32(0x7FFFFFFF)
dst[i+7:i] := 0x7F
ELSE
dst[i+7:i] := Convert_FP32_To_Int8(a[k+31:k])
FI
ENDFOR
SSE
Convert
Store 64-bits of integer data from "a" into memory using a non-temporal memory hint.
MEM[mem_addr+63:mem_addr] := a[63:0]
SSE
Store
Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint.
FOR j := 0 to 7
i := j*8
IF mask[i+7]
MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
FI
ENDFOR
SSE
Store
Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element).
FOR j := 0 to 7
i := j*8
IF mask[i+7]
MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
FI
ENDFOR
SSE
Store
Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE
Store
Store the upper 2 single-precision (32-bit) floating-point elements from "a" into memory.
MEM[mem_addr+31:mem_addr] := a[95:64]
MEM[mem_addr+63:mem_addr+32] := a[127:96]
SSE
Store
Store the lower 2 single-precision (32-bit) floating-point elements from "a" into memory.
MEM[mem_addr+31:mem_addr] := a[31:0]
MEM[mem_addr+63:mem_addr+32] := a[63:32]
SSE
Store
Store the lower single-precision (32-bit) floating-point element from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+31:mem_addr] := a[31:0]
SSE
Store
Store the lower single-precision (32-bit) floating-point element from "a" into 4 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+31:mem_addr] := a[31:0]
MEM[mem_addr+63:mem_addr+32] := a[31:0]
MEM[mem_addr+95:mem_addr+64] := a[31:0]
MEM[mem_addr+127:mem_addr+96] := a[31:0]
SSE
Store
Store the lower single-precision (32-bit) floating-point element from "a" into 4 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+31:mem_addr] := a[31:0]
MEM[mem_addr+63:mem_addr+32] := a[31:0]
MEM[mem_addr+95:mem_addr+64] := a[31:0]
MEM[mem_addr+127:mem_addr+96] := a[31:0]
SSE
Store
Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE
Store
Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE
Store
Store 4 single-precision (32-bit) floating-point elements from "a" into memory in reverse order.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+31:mem_addr] := a[127:96]
MEM[mem_addr+63:mem_addr+32] := a[95:64]
MEM[mem_addr+95:mem_addr+64] := a[63:32]
MEM[mem_addr+127:mem_addr+96] := a[31:0]
SSE
Store
Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".
FOR j := 0 to 7
i := j*8
dst[j] := a[i+7]
ENDFOR
dst[MAX:8] := 0
SSE
Miscellaneous
Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".
FOR j := 0 to 7
i := j*8
dst[j] := a[i+7]
ENDFOR
dst[MAX:8] := 0
SSE
Miscellaneous
Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".
FOR j := 0 to 3
i := j*32
IF a[i+31]
dst[j] := 1
ELSE
dst[j] := 0
FI
ENDFOR
dst[MAX:4] := 0
SSE
Miscellaneous
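As an illustration of the sign-bit mask described above, the following sketch assumes the intrinsic is `_mm_movemask_ps` (the name is not given in this text). The wrapper name is hypothetical.

```c
#include <xmmintrin.h>

/* Bit j of the result is the sign bit of v[j]; bits 4 and above are zero. */
static int sign_mask4(const float v[4])
{
    return _mm_movemask_ps(_mm_loadu_ps(v));
}
```

For the input {4, -3, 2, -1}, elements 1 and 3 are negative, so the mask is binary 1010. Because only the sign bit is inspected, -0.0f also sets its bit.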
Compute the square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := SQRT(a[31:0])
dst[127:32] := a[127:32]
SSE
Elementary Math Functions
Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SQRT(a[i+31:i])
ENDFOR
SSE
Elementary Math Functions
Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
dst[31:0] := (1.0 / a[31:0])
dst[127:32] := a[127:32]
SSE
Elementary Math Functions
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (1.0 / a[i+31:i])
ENDFOR
SSE
Elementary Math Functions
Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
dst[31:0] := (1.0 / SQRT(a[31:0]))
dst[127:32] := a[127:32]
SSE
Elementary Math Functions
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (1.0 / SQRT(a[i+31:i]))
ENDFOR
SSE
Elementary Math Functions
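The error bound quoted for the approximate reciprocal and reciprocal square root can be checked empirically. This sketch assumes the scalar intrinsics are `_mm_rcp_ss` and `_mm_rsqrt_ss` (names not stated in this text); the helper names are hypothetical, and the exact approximate value may differ between processors.

```c
#include <xmmintrin.h>

/* Approximate 1/x on the lower element. */
static float approx_recip(float x)
{
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}

/* Approximate 1/sqrt(x) on the lower element. */
static float approx_rsqrt(float x)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Relative error |approx - exact| / |exact| for a nonzero exact value. */
static float rel_error(float approx, float exact)
{
    float e = (approx - exact) / exact;
    return e < 0.0f ? -e : e;
}
```

A relative error below 1.5*2^-12 corresponds to roughly 11 to 12 correct bits, so callers needing full single precision would use the exact divide or square-root entries instead.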
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (a[i+31:i] AND b[i+31:i])
ENDFOR
SSE
Logical
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i])
ENDFOR
SSE
Logical
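One common use of the NOT-then-AND operation above is clearing the sign bit to obtain absolute values. The sketch assumes the intrinsic is `_mm_andnot_ps` (name not stated in this text); abs4 is a hypothetical helper.

```c
#include <xmmintrin.h>

/* out[j] = |in[j]|, computed as (NOT sign_bit) AND in[j]. */
static void abs4(const float in[4], float out[4])
{
    const __m128 sign_bit = _mm_set1_ps(-0.0f); /* only bit 31 set in each element */
    _mm_storeu_ps(out, _mm_andnot_ps(sign_bit, _mm_loadu_ps(in)));
}
```

Note the operand order: the first argument is the one that is inverted, matching `(NOT a) AND b` in the pseudocode.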
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] OR b[i+31:i]
ENDFOR
SSE
Logical
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] XOR b[i+31:i]
ENDFOR
SSE
Logical
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] == b[31:0] ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] < b[31:0] ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] < b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
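Because the packed compares produce all-ones or all-zeros per element, the result can serve as a selection mask. The sketch below assumes the intrinsics `_mm_cmplt_ps`, `_mm_and_ps`, `_mm_andnot_ps`, and `_mm_or_ps` (names not stated in this text) and builds an element-wise minimum for non-NaN inputs. The helper name is hypothetical.

```c
#include <xmmintrin.h>

/* out[j] = (a[j] < b[j]) ? a[j] : b[j], using the compare result as a mask. */
static void min4_via_mask(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 lt = _mm_cmplt_ps(va, vb);              /* 0xFFFFFFFF where a < b */
    __m128 r  = _mm_or_ps(_mm_and_ps(lt, va),      /* keep a where the mask is set */
                          _mm_andnot_ps(lt, vb));  /* keep b everywhere else */
    _mm_storeu_ps(out, r);
}
```

The same and/andnot/or pattern works with any of the compare entries in this group.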
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] <= b[31:0] ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] <= b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] > b[31:0] ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] >= b[31:0] ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] >= b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] != b[31:0] ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] != b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := (!( a[31:0] < b[31:0] )) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (!( a[i+31:i] < b[i+31:i] )) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := (!( a[31:0] <= b[31:0] )) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (!( a[i+31:i] <= b[i+31:i] )) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := (!( a[31:0] > b[31:0] )) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (!( a[i+31:i] > b[i+31:i] )) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := (!( a[31:0] >= b[31:0] )) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := (!( a[i+31:i] >= b[i+31:i] )) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] != NaN AND b[31:0] != NaN ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] != NaN AND b[i+31:i] != NaN ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := ( a[31:0] == NaN OR b[31:0] == NaN ) ? 0xFFFFFFFF : 0
dst[127:32] := a[127:32]
SSE
Compare
Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] == NaN OR b[i+31:i] == NaN ) ? 0xFFFFFFFF : 0
ENDFOR
SSE
Compare
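The "either is NaN" compare can be used to locate NaN elements by comparing a vector with itself. This sketch assumes the intrinsics `_mm_cmpunord_ps` and `_mm_movemask_ps` (names not stated in this text); nan_lanes is a hypothetical helper.

```c
#include <xmmintrin.h>

/* Returns a 4-bit mask with bit j set when v[j] is NaN. */
static int nan_lanes(const float v[4])
{
    __m128 x = _mm_loadu_ps(v);
    /* x compared with itself is unordered only in elements that hold NaN. */
    return _mm_movemask_ps(_mm_cmpunord_ps(x, x));
}
```

A result of 0 means the vector contains no NaN.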
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1).
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1).
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] < b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] <= b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1).
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] > b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] >= b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1).
RETURN ( a[31:0] == NaN OR b[31:0] == NaN OR a[31:0] != b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] < b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] <= b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] > b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] >= b[31:0] ) ? 1 : 0
SSE
Compare
Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[31:0] == NaN OR b[31:0] == NaN OR a[31:0] != b[31:0] ) ? 1 : 0
SSE
Compare
Copy single-precision (32-bit) floating-point element "a" to the lower element of "dst", and zero the upper 3 elements.
dst[31:0] := a[31:0]
dst[127:32] := 0
SSE
Set
Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
SSE
Set
Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
SSE
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values.
dst[31:0] := e0
dst[63:32] := e1
dst[95:64] := e2
dst[127:96] := e3
SSE
Set
Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst[31:0] := e3
dst[63:32] := e2
dst[95:64] := e1
dst[127:96] := e0
SSE
Set
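The two "Set" forms above differ only in argument order. The sketch assumes they are `_mm_set_ps` (arguments e3, e2, e1, e0) and `_mm_setr_ps` (arguments in memory order); the names are not stated in this text and the wrappers are hypothetical.

```c
#include <xmmintrin.h>

/* Highest element first: the last argument lands in element 0. */
static void store_set(float out[4])
{
    _mm_storeu_ps(out, _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f));
}

/* Reverse order: the first argument lands in element 0. */
static void store_setr(float out[4])
{
    _mm_storeu_ps(out, _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f));
}
```

Both helpers write {0, 1, 2, 3} to memory.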
Return vector of type __m128 with all elements set to zero.
dst[MAX:0] := 0
SSE
Set
Load 2 single-precision (32-bit) floating-point elements from memory into the upper 2 elements of "dst", and copy the lower 2 elements from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.
dst[31:0] := a[31:0]
dst[63:32] := a[63:32]
dst[95:64] := MEM[mem_addr+31:mem_addr]
dst[127:96] := MEM[mem_addr+63:mem_addr+32]
SSE
Load
Load 2 single-precision (32-bit) floating-point elements from memory into the lower 2 elements of "dst", and copy the upper 2 elements from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.
dst[31:0] := MEM[mem_addr+31:mem_addr]
dst[63:32] := MEM[mem_addr+63:mem_addr+32]
dst[95:64] := a[95:64]
dst[127:96] := a[127:96]
SSE
Load
Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst", and zero the upper 3 elements. "mem_addr" does not need to be aligned on any particular boundary.
dst[31:0] := MEM[mem_addr+31:mem_addr]
dst[127:32] := 0
SSE
Load
Load a single-precision (32-bit) floating-point element from memory into all elements of "dst".
dst[31:0] := MEM[mem_addr+31:mem_addr]
dst[63:32] := MEM[mem_addr+31:mem_addr]
dst[95:64] := MEM[mem_addr+31:mem_addr]
dst[127:96] := MEM[mem_addr+31:mem_addr]
SSE
Load
Load a single-precision (32-bit) floating-point element from memory into all elements of "dst".
dst[31:0] := MEM[mem_addr+31:mem_addr]
dst[63:32] := MEM[mem_addr+31:mem_addr]
dst[95:64] := MEM[mem_addr+31:mem_addr]
dst[127:96] := MEM[mem_addr+31:mem_addr]
SSE
Load
Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE
Load
Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE
Load
Load 4 single-precision (32-bit) floating-point elements from memory into "dst" in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[31:0] := MEM[mem_addr+127:mem_addr+96]
dst[63:32] := MEM[mem_addr+95:mem_addr+64]
dst[95:64] := MEM[mem_addr+63:mem_addr+32]
dst[127:96] := MEM[mem_addr+31:mem_addr]
SSE
Load
Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := b[31:0]
dst[127:32] := a[127:32]
SSE
Move
Move the upper 2 single-precision (32-bit) floating-point elements from "b" to the lower 2 elements of "dst", and copy the upper 2 elements from "a" to the upper 2 elements of "dst".
dst[31:0] := b[95:64]
dst[63:32] := b[127:96]
dst[95:64] := a[95:64]
dst[127:96] := a[127:96]
SSE
Move
Move the lower 2 single-precision (32-bit) floating-point elements from "b" to the upper 2 elements of "dst", and copy the lower 2 elements from "a" to the lower 2 elements of "dst".
dst[31:0] := a[31:0]
dst[63:32] := a[63:32]
dst[95:64] := b[31:0]
dst[127:96] := b[63:32]
SSE
Move
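The two half-vector moves above are easy to confuse, so a small sketch may help. It assumes the intrinsics are `_mm_movehl_ps` and `_mm_movelh_ps` (names not stated in this text); the wrapper names are hypothetical.

```c
#include <xmmintrin.h>

/* out = { b[2], b[3], a[2], a[3] }: upper half of b moves to the lower half. */
static void move_high_to_low(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_movehl_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* out = { a[0], a[1], b[0], b[1] }: lower half of b moves to the upper half. */
static void move_low_to_high(const float a[4], const float b[4], float out[4])
{
    _mm_storeu_ps(out, _mm_movelh_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

In both cases the half of the result that does not come from "b" is taken unchanged from the same position in "a".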
Return vector of type __m128d with undefined elements.
SSE2
General Support
Return vector of type __m128i with undefined elements.
SSE2
General Support
Provide a hint to the processor that the code sequence is a spin-wait loop. This can help improve the performance and power consumption of spin-wait loops.
SSE2
General Support
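A minimal sketch of a bounded spin-wait loop using the hint above, assuming the intrinsic is `_mm_pause` (name not stated in this text) and using C11 atomics for the flag. The function name and the max_spins limit are hypothetical choices for illustration.

```c
#include <emmintrin.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Polls *flag up to max_spins times; returns true once it reads nonzero. */
static bool spin_until_set(atomic_int *flag, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return true;
        _mm_pause();  /* tell the processor this is a spin-wait iteration */
    }
    return false;     /* gave up; the caller may yield or block instead */
}
```

Bounding the loop lets the caller fall back to a blocking wait rather than spinning indefinitely.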
Invalidate and flush the cache line that contains "p" from all levels of the cache hierarchy.
SSE2
General Support
Perform a serializing operation on all load-from-memory instructions that were issued prior to this instruction. Guarantees that every load instruction that precedes, in program order, the load fence instruction is globally visible before any load instruction which follows the fence in program order.
SSE2
General Support
Perform a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction. Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction which follows the fence in program order.
SSE2
General Support
Load unaligned 64-bit integer from memory into the first element of "dst".
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[MAX:64] := 0
SSE2
Load
Load unaligned 16-bit integer from memory into the first element of "dst".
dst[15:0] := MEM[mem_addr+15:mem_addr]
dst[MAX:16] := 0
SSE2
Load
Load unaligned 32-bit integer from memory into the first element of "dst".
dst[31:0] := MEM[mem_addr+31:mem_addr]
dst[MAX:32] := 0
SSE2
Load
Load 64-bit integer from memory into the first element of "dst".
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[MAX:64] := 0
SSE2
Load
Load 128-bits of integer data from memory into "dst".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE2
Load
Load 128-bits of integer data from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE2
Load
Load 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into "dst".
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE2
Load
Load a double-precision (64-bit) floating-point element from memory into both elements of "dst".
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[127:64] := MEM[mem_addr+63:mem_addr]
SSE2
Load
Load a double-precision (64-bit) floating-point element from memory into both elements of "dst".
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[127:64] := MEM[mem_addr+63:mem_addr]
SSE2
Load
Load 2 double-precision (64-bit) floating-point elements from memory into "dst" in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[63:0] := MEM[mem_addr+127:mem_addr+64]
dst[127:64] := MEM[mem_addr+63:mem_addr]
SSE2
Load
Load 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into "dst".
"mem_addr" does not need to be aligned on any particular boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE2
Load
Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst", and zero the upper element. "mem_addr" does not need to be aligned on any particular boundary.
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[127:64] := 0
SSE2
Load
Load a double-precision (64-bit) floating-point element from memory into the upper element of "dst", and copy the lower element from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.
dst[63:0] := a[63:0]
dst[127:64] := MEM[mem_addr+63:mem_addr]
SSE2
Load
Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst", and copy the upper element from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary.
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[127:64] := a[127:64]
SSE2
Load
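The double-precision load variants above differ only in which 64-bit element receives the memory value and what fills the other element. A minimal Python sketch of that placement, assuming little-endian memory held in a bytes object and a vector modelled as a [low, high] list; the function names are illustrative (they are not intrinsic names), and alignment faults are not modelled:

```python
import struct

def read_f64(mem, addr):
    """Read one little-endian double from a bytes-like object."""
    return struct.unpack_from("<d", mem, addr)[0]

def load_low_zero_high(mem, addr):
    # dst[63:0] := MEM[addr+63:addr], dst[127:64] := 0
    return [read_f64(mem, addr), 0.0]

def load_broadcast(mem, addr):
    # Same memory value goes into both elements.
    x = read_f64(mem, addr)
    return [x, x]

def load_reversed(mem, addr):
    # dst[63:0] := MEM[addr+127:addr+64], dst[127:64] := MEM[addr+63:addr]
    return [read_f64(mem, addr + 8), read_f64(mem, addr)]

def load_high(a, mem, addr):
    # Lower element copied from a, upper element loaded from memory.
    return [a[0], read_f64(mem, addr)]

def load_low(a, mem, addr):
    # Lower element loaded from memory, upper element copied from a.
    return [read_f64(mem, addr), a[1]]
```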
Store 16-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+15:mem_addr] := a[15:0]
SSE2
Store
Store 64-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+63:mem_addr] := a[63:0]
SSE2
Store
Store 32-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+31:mem_addr] := a[31:0]
SSE2
Store
Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint. "mem_addr" does not need to be aligned on any particular boundary.
FOR j := 0 to 15
i := j*8
IF mask[i+7]
MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i]
FI
ENDFOR
SSE2
Store
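The conditional byte store above can be sketched in Python as follows. This is a behavioural model only: memory is assumed to be a bytearray, "a" and "mask" are 16-byte sequences, the function name is hypothetical, and the non-temporal hint has no equivalent here.

```python
def masked_byte_store(memory, mem_addr, a, mask):
    """Store a[j] to memory[mem_addr + j] only where mask byte j has its top bit set."""
    for j in range(16):
        if mask[j] & 0x80:  # mask[i+7] in the pseudocode is the highest bit of byte j
            memory[mem_addr + j] = a[j]
    return memory
```

Bytes whose mask element has the highest bit clear leave the destination untouched, so existing memory contents survive in those positions.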
Store 128 bits of integer data from "a" into memory.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE2
Store
Store 128 bits of integer data from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE2
Store
Store 64-bit integer from the first element of "a" into memory.
MEM[mem_addr+63:mem_addr] := a[63:0]
SSE2
Store
Store 128 bits of integer data from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE2
Store
Store 32-bit integer "a" into memory using a non-temporal hint to minimize cache pollution. If the cache line containing address "mem_addr" is already in the cache, the cache will be updated.
MEM[mem_addr+31:mem_addr] := a[31:0]
SSE2
Store
Store 64-bit integer "a" into memory using a non-temporal hint to minimize cache pollution. If the cache line containing address "mem_addr" is already in the cache, the cache will be updated.
MEM[mem_addr+63:mem_addr] := a[63:0]
SSE2
Store
Store 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE2
Store
Store the lower double-precision (64-bit) floating-point element from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+63:mem_addr] := a[63:0]
SSE2
Store
Store the lower double-precision (64-bit) floating-point element from "a" into 2 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+63:mem_addr] := a[63:0]
MEM[mem_addr+127:mem_addr+64] := a[63:0]
SSE2
Store
Store the lower double-precision (64-bit) floating-point element from "a" into 2 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+63:mem_addr] := a[63:0]
MEM[mem_addr+127:mem_addr+64] := a[63:0]
SSE2
Store
Store 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE2
Store
Store 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory.
"mem_addr" does not need to be aligned on any particular boundary.
MEM[mem_addr+127:mem_addr] := a[127:0]
SSE2
Store
Store 2 double-precision (64-bit) floating-point elements from "a" into memory in reverse order.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
MEM[mem_addr+63:mem_addr] := a[127:64]
MEM[mem_addr+127:mem_addr+64] := a[63:0]
SSE2
Store
Store the upper double-precision (64-bit) floating-point element from "a" into memory.
MEM[mem_addr+63:mem_addr] := a[127:64]
SSE2
Store
Store the lower double-precision (64-bit) floating-point element from "a" into memory.
MEM[mem_addr+63:mem_addr] := a[63:0]
SSE2
Store
Add packed 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := a[i+7:i] + b[i+7:i]
ENDFOR
SSE2
Arithmetic
Add packed 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := a[i+15:i] + b[i+15:i]
ENDFOR
SSE2
Arithmetic
Add packed 32-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR
SSE2
Arithmetic
Add 64-bit integers "a" and "b", and store the result in "dst".
dst[63:0] := a[63:0] + b[63:0]
SSE2
Arithmetic
Add packed 64-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
SSE2
Arithmetic
Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] )
ENDFOR
SSE2
Arithmetic
Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] )
ENDFOR
SSE2
Arithmetic
Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] )
ENDFOR
SSE2
Arithmetic
Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] )
ENDFOR
SSE2
Arithmetic
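The difference between the plain packed additions and the saturating ones above is easiest to see on values that overflow a lane. A minimal Python sketch, assuming lanes are given as Python integers already in the lane's range (signed or unsigned as appropriate); the helper names are illustrative:

```python
def saturate(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))

def add_wrapping(a, b, bits=8):
    # Plain packed add: the carry out of each lane is discarded.
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]

def add_saturate_signed(a, b, bits=8):
    # Saturate8 / Saturate16: clamp to the signed range of the lane.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [saturate(x + y, lo, hi) for x, y in zip(a, b)]

def add_saturate_unsigned(a, b, bits=8):
    # SaturateU8 / SaturateU16: clamp to the unsigned range of the lane.
    return [saturate(x + y, 0, (1 << bits) - 1) for x, y in zip(a, b)]
```

For example, adding the unsigned bytes 200 and 100 wraps to 44 in the plain form and clamps to 255 in the saturating form.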
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i])
ENDFOR
SSE2
Arithmetic
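The multiply-and-horizontally-add operation above can be modelled in Python as shown below. The sketch assumes "a" and "b" are lists of eight signed 16-bit values and that the 32-bit sum wraps in the one input combination that overflows (all four contributing inputs equal to -32768); the wrap is an assumption about hardware behaviour that the pseudocode does not spell out, and the function name is illustrative.

```python
def multiply_add_pairs(a, b):
    """Return four signed 32-bit sums of adjacent 16-bit products."""
    assert len(a) == len(b) == 8
    out = []
    for j in range(4):
        s = a[2 * j] * b[2 * j] + a[2 * j + 1] * b[2 * j + 1]
        s &= 0xFFFFFFFF  # keep 32 bits (assumed wrap on overflow)
        out.append(s - (1 << 32) if s & 0x80000000 else s)
    return out
```

This is the usual building block for 16-bit dot products, since each output lane already holds the sum of two products.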
Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 7
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[31:16]
ENDFOR
SSE2
Arithmetic
Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst".
FOR j := 0 to 7
i := j*16
tmp[31:0] := a[i+15:i] * b[i+15:i]
dst[i+15:i] := tmp[31:16]
ENDFOR
SSE2
Arithmetic
Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst".
FOR j := 0 to 7
i := j*16
tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])
dst[i+15:i] := tmp[15:0]
ENDFOR
SSE2
Arithmetic
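The three 16-bit multiplies above keep different halves of the same 32-bit intermediate product. A small Python model, assuming each lane is passed as a raw 16-bit pattern (0 to 0xFFFF) and returned the same way; the function names are illustrative:

```python
def to_signed16(x):
    """Interpret a raw 16-bit pattern as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def mul_high_signed(a, b):
    # tmp := SignExtend32(a) * SignExtend32(b); keep tmp[31:16]
    return [((to_signed16(x) * to_signed16(y)) >> 16) & 0xFFFF for x, y in zip(a, b)]

def mul_high_unsigned(a, b):
    # tmp := a * b (zero-extended); keep tmp[31:16]
    return [(((x & 0xFFFF) * (y & 0xFFFF)) >> 16) & 0xFFFF for x, y in zip(a, b)]

def mul_low(a, b):
    # tmp[15:0] is the same whether the inputs are treated as signed or unsigned
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]
```

With the pattern 0xFFFF multiplied by 2, the signed high half is 0xFFFF (the product is -2), the unsigned high half is 1, and the low half is 0xFFFE in both interpretations.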
Multiply the low unsigned 32-bit integers from "a" and "b", and store the unsigned 64-bit result in "dst".
dst[63:0] := a[31:0] * b[31:0]
SSE2
Arithmetic
Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+31:i] * b[i+31:i]
ENDFOR
SSE2
Arithmetic
Miscellaneous
Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst".
FOR j := 0 to 15
i := j*8
tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i])
ENDFOR
FOR j := 0 to 1
i := j*64
dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + \
tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56]
dst[i+63:i+16] := 0
ENDFOR
SSE2
Arithmetic
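The sum-of-absolute-differences operation above reduces sixteen byte differences to two sums, one per 64-bit element. A Python sketch, assuming "a" and "b" are lists of sixteen unsigned bytes and returning the two 64-bit element values; the function name is illustrative:

```python
def sum_abs_diff(a, b):
    """Return [low_sum, high_sum]; each sum occupies the low 16 bits of its 64-bit element."""
    assert len(a) == len(b) == 16
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # Each sum is at most 8 * 255 = 2040, so the upper 48 bits are always zero.
    return [sum(diffs[0:8]), sum(diffs[8:16])]
```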
Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := a[i+7:i] - b[i+7:i]
ENDFOR
SSE2
Arithmetic
Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := a[i+15:i] - b[i+15:i]
ENDFOR
SSE2
Arithmetic
Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ENDFOR
SSE2
Arithmetic
Subtract 64-bit integer "b" from 64-bit integer "a", and store the result in "dst".
dst[63:0] := a[63:0] - b[63:0]
SSE2
Arithmetic
Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ENDFOR
SSE2
Arithmetic
Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i])
ENDFOR
SSE2
Arithmetic
Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i])
ENDFOR
SSE2
Arithmetic
Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i])
ENDFOR
SSE2
Arithmetic
Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i])
ENDFOR
SSE2
Arithmetic
Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := a[63:0] + b[63:0]
dst[127:64] := a[127:64]
SSE2
Arithmetic
Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] + b[i+63:i]
ENDFOR
SSE2
Arithmetic
Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := a[63:0] / b[63:0]
dst[127:64] := a[127:64]
SSE2
Arithmetic
Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] / b[i+63:i]
ENDFOR
SSE2
Arithmetic
Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := a[63:0] * b[63:0]
dst[127:64] := a[127:64]
SSE2
Arithmetic
Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] * b[i+63:i]
ENDFOR
SSE2
Arithmetic
Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := a[63:0] - b[63:0]
dst[127:64] := a[127:64]
SSE2
Arithmetic
Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ENDFOR
SSE2
Arithmetic
Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1
ENDFOR
SSE2
Probability/Statistics
Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1
ENDFOR
SSE2
Probability/Statistics
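The "+ 1" in the averaging pseudocode above makes the result round half up rather than truncate. A short Python illustration, assuming unsigned lane values as Python integers (so the intermediate sum is never truncated to the lane width); the function name is illustrative:

```python
def average_round_up(a, b):
    """Per-lane (x + y + 1) >> 1 on unsigned values."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]
```

Averaging 1 and 2 therefore gives 2, where a truncating (x + y) >> 1 would give 1, and averaging 255 with 255 gives 255 because the sum is formed before any narrowing.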
Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
SSE2
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
SSE2
Special Math Functions
Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
SSE2
Special Math Functions
Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
SSE2
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [max_float_note]
dst[63:0] := MAX(a[63:0], b[63:0])
dst[127:64] := a[127:64]
SSE2
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note]
FOR j := 0 to 1
i := j*64
dst[i+63:i] := MAX(a[i+63:i], b[i+63:i])
ENDFOR
SSE2
Special Math Functions
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [min_float_note]
dst[63:0] := MIN(a[63:0], b[63:0])
dst[127:64] := a[127:64]
SSE2
Special Math Functions
Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note]
FOR j := 0 to 1
i := j*64
dst[i+63:i] := MIN(a[i+63:i], b[i+63:i])
ENDFOR
SSE2
Special Math Functions
Shift "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] << (tmp*8)
SSE2
Shift
Shift "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] << (tmp*8)
SSE2
Shift
Shift "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] >> (tmp*8)
SSE2
Shift
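The whole-register byte shifts above clamp the byte count at 16, so any larger immediate clears the register. A Python sketch that treats the 128-bit value as a single integer; the function names are illustrative:

```python
MASK128 = (1 << 128) - 1

def shift_left_bytes(a, imm8):
    # tmp := imm8[7:0]; counts above 15 behave like 16
    n = min(imm8 & 0xFF, 16)
    return (a << (n * 8)) & MASK128

def shift_right_bytes(a, imm8):
    n = min(imm8 & 0xFF, 16)
    return (a & MASK128) >> (n * 8)
```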
Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0])
FI
ENDFOR
SSE2
Shift
Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0])
FI
ENDFOR
SSE2
Shift
Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0])
FI
ENDFOR
SSE2
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF count[63:0] > 15
dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0)
ELSE
dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
SSE2
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF count[63:0] > 31
dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0)
ELSE
dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
SSE2
Shift
Shift "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst".
tmp := imm8[7:0]
IF tmp > 15
tmp := 16
FI
dst[127:0] := a[127:0] >> (tmp*8)
SSE2
Shift
Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0])
FI
ENDFOR
SSE2
Shift
Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF imm8[7:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF count[63:0] > 31
dst[i+31:i] := 0
ELSE
dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0])
FI
ENDFOR
SSE2
Shift
Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF imm8[7:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0])
FI
ENDFOR
SSE2
Shift
Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF count[63:0] > 63
dst[i+63:i] := 0
ELSE
dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0])
FI
ENDFOR
SSE2
Shift
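The per-lane right shifts above come in two forms: the arithmetic form replicates the sign bit, and the logical form shifts in zeros. Out-of-range counts fill the lane with the sign bit or with zero respectively. A Python model, assuming lanes are raw bit patterns of the given width; the function names are illustrative:

```python
def shift_right_arithmetic(lanes, count, bits=16):
    mask = (1 << bits) - 1
    out = []
    for x in lanes:
        x &= mask
        signed = x - (1 << bits) if x >> (bits - 1) else x
        # Shifting by bits-1 already yields all sign bits, so larger counts clamp to it.
        out.append((signed >> min(count, bits - 1)) & mask)
    return out

def shift_right_logical(lanes, count, bits=16):
    mask = (1 << bits) - 1
    # Counts above bits-1 produce zero, as in the IF branch of the pseudocode.
    return [0 if count > bits - 1 else (x & mask) >> count for x in lanes]
```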
Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[127:0] := (a[127:0] AND b[127:0])
SSE2
Logical
Compute the bitwise NOT of 128 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst".
dst[127:0] := ((NOT a[127:0]) AND b[127:0])
SSE2
Logical
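Note the operand order in the NOT-then-AND operation above: the inversion applies to "a", not to "b". A one-line Python model on 128-bit integers, with an illustrative function name:

```python
MASK128 = (1 << 128) - 1

def and_not(a, b):
    """dst := (NOT a) AND b, restricted to 128 bits."""
    return (~a & b) & MASK128
```

In other words, "a" acts as the set of bits to clear in "b".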
Compute the bitwise OR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[127:0] := (a[127:0] OR b[127:0])
SSE2
Logical
Compute the bitwise XOR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst".
dst[127:0] := (a[127:0] XOR b[127:0])
SSE2
Logical
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] AND b[i+63:i])
ENDFOR
SSE2
Logical
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i])
ENDFOR
SSE2
Logical
Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] OR b[i+63:i]
ENDFOR
SSE2
Logical
Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[i+63:i] XOR b[i+63:i]
ENDFOR
SSE2
Logical
Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0
ENDFOR
SSE2
Compare
Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
SSE2
Compare
Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0
ENDFOR
SSE2
Compare
Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
SSE2
Compare
Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtb instruction with the order of the operands switched.
FOR j := 0 to 15
i := j*8
dst[i+7:i] := ( a[i+7:i] < b[i+7:i] ) ? 0xFF : 0
ENDFOR
SSE2
Compare
Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtw instruction with the order of the operands switched.
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ( a[i+15:i] < b[i+15:i] ) ? 0xFFFF : 0
ENDFOR
SSE2
Compare
Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtd instruction with the order of the operands switched.
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ( a[i+31:i] < b[i+31:i] ) ? 0xFFFFFFFF : 0
ENDFOR
SSE2
Compare
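The integer comparisons above produce all-ones or all-zeros lanes so that the result can be used directly as a mask with the AND, AND-NOT and OR operations described earlier. A Python sketch for 32-bit lanes given as raw bit patterns; the function names are illustrative, and the less-than form is written as greater-than with swapped operands, as the notes above describe:

```python
ONES32 = 0xFFFFFFFF

def to_signed32(x):
    x &= ONES32
    return x - (1 << 32) if x & 0x80000000 else x

def compare_greater(a, b):
    return [ONES32 if to_signed32(x) > to_signed32(y) else 0 for x, y in zip(a, b)]

def compare_less(a, b):
    # Same comparison with the operands switched.
    return compare_greater(b, a)

def select(mask, x, y):
    # (mask AND x) OR ((NOT mask) AND y), lane by lane
    return [(m & p) | ((m ^ ONES32) & q) for m, p, q in zip(mask, x, y)]
```

Selecting with the greater-than mask yields a per-lane signed maximum, for example.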
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] == b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] < b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] <= b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] > b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] >= b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] != NaN AND b[63:0] != NaN) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] == NaN OR b[63:0] == NaN) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (a[63:0] != b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (!(a[63:0] < b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (!(a[63:0] <= b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (!(a[63:0] > b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := (!(a[63:0] >= b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0
dst[127:64] := a[127:64]
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] == b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] < b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] <= b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] > b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] >= b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (a[i+63:i] != b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (!(a[i+63:i] < b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (!(a[i+63:i] <= b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (!(a[i+63:i] > b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := (!(a[i+63:i] >= b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE2
Compare
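The negated predicates above are not the same as their positive counterparts when a NaN is involved: "not-less-than" is true for unordered inputs, while "greater-than-or-equal" is false. A Python sketch on single double-precision values, relying on the fact that Python floats follow the same IEEE 754 comparison rules; the function names are illustrative, and the pseudocode's "!= NaN" tests are written with math.isnan because a NaN never compares equal to anything:

```python
import math

ALL_ONES = 0xFFFFFFFFFFFFFFFF

def mask(pred):
    """Turn a boolean into a 64-bit all-ones or all-zeros element."""
    return ALL_ONES if pred else 0

def cmp_ge(a, b):
    return mask(a >= b)

def cmp_not_lt(a, b):
    return mask(not (a < b))

def cmp_ordered(a, b):
    return mask(not math.isnan(a) and not math.isnan(b))

def cmp_unordered(a, b):
    return mask(math.isnan(a) or math.isnan(b))
```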
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1).
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] == b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1).
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] < b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1).
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] <= b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1).
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] > b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1).
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] >= b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1).
RETURN ( a[63:0] == NaN OR b[63:0] == NaN OR a[63:0] != b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] == b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] < b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] <= b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] > b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[63:0] != NaN AND b[63:0] != NaN AND a[63:0] >= b[63:0] ) ? 1 : 0
SSE2
Compare
Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.
RETURN ( a[63:0] == NaN OR b[63:0] == NaN OR a[63:0] != b[63:0] ) ? 1 : 0
SSE2
Compare
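The NaN handling in the scalar comparison records above can be sketched in portable C. This is a minimal scalar model of the pseudocode, not the intrinsics themselves; the `model_*` names and the `is_nan` helper are hypothetical and exist only for illustration.

```c
#include <math.h>   /* NAN macro, handy for callers that want a quiet NaN */

/* Hypothetical helper: a NaN is the only value that differs from itself. */
static int is_nan(double x) { return x != x; }

/* less-than: any NaN operand forces the result to 0 */
static int model_comilt_sd(double a, double b) {
    return (!is_nan(a) && !is_nan(b) && a < b) ? 1 : 0;
}

/* equality: any NaN operand forces the result to 0 */
static int model_comieq_sd(double a, double b) {
    return (!is_nan(a) && !is_nan(b) && a == b) ? 1 : 0;
}

/* not-equal: the only predicate where a NaN operand forces the result to 1 */
static int model_comineq_sd(double a, double b) {
    return (is_nan(a) || is_nan(b) || a != b) ? 1 : 0;
}
```

The model covers only the returned boolean; whether an exception is signaled for QNaNs is outside what this sketch represents.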
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := j*32
m := j*64
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ENDFOR
SSE2
Convert
Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_Int32_To_FP64(b[31:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
SSE2
Convert
Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_Int64_To_FP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
SSE2
Convert
Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_Int64_To_FP64(b[63:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
SSE2
Convert
Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i])
ENDFOR
SSE2
Convert
Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := j*32
m := j*64
dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i])
ENDFOR
SSE2
Convert
Copy 32-bit integer "a" to the lower element of "dst", and zero the upper elements of "dst".
dst[31:0] := a[31:0]
dst[127:32] := 0
SSE2
Convert
Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element.
dst[63:0] := a[63:0]
dst[127:64] := 0
SSE2
Convert
Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element.
dst[63:0] := a[63:0]
dst[127:64] := 0
SSE2
Convert
Copy the lower 32-bit integer in "a" to "dst".
dst[31:0] := a[31:0]
SSE2
Convert
Copy the lower 64-bit integer in "a" to "dst".
dst[63:0] := a[63:0]
SSE2
Convert
Copy the lower 64-bit integer in "a" to "dst".
dst[63:0] := a[63:0]
SSE2
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k])
ENDFOR
dst[127:64] := 0
SSE2
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 32*j
dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k])
ENDFOR
SSE2
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
ENDFOR
SSE2
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst".
dst[31:0] := Convert_FP64_To_Int32(a[63:0])
SSE2
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
dst[63:0] := Convert_FP64_To_Int64(a[63:0])
SSE2
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst".
dst[63:0] := Convert_FP64_To_Int64(a[63:0])
SSE2
Convert
Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := Convert_FP64_To_FP32(b[63:0])
dst[127:32] := a[127:32]
dst[MAX:128] := 0
SSE2
Convert
Copy the lower double-precision (64-bit) floating-point element of "a" to "dst".
dst[63:0] := a[63:0]
SSE2
Convert
Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := Convert_FP32_To_FP64(b[31:0])
dst[127:64] := a[127:64]
dst[MAX:128] := 0
SSE2
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
ENDFOR
SSE2
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst".
dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
SSE2
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
SSE2
Convert
Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst".
dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
SSE2
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i])
ENDFOR
SSE2
Convert
Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i])
ENDFOR
SSE2
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
ENDFOR
SSE2
Convert
Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst".
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k])
ENDFOR
SSE2
Convert
Set packed 64-bit integers in "dst" with the supplied values.
dst[63:0] := e0
dst[127:64] := e1
SSE2
Set
Set packed 64-bit integers in "dst" with the supplied values.
dst[63:0] := e0
dst[127:64] := e1
SSE2
Set
Set packed 32-bit integers in "dst" with the supplied values.
dst[31:0] := e0
dst[63:32] := e1
dst[95:64] := e2
dst[127:96] := e3
SSE2
Set
Set packed 16-bit integers in "dst" with the supplied values.
dst[15:0] := e0
dst[31:16] := e1
dst[47:32] := e2
dst[63:48] := e3
dst[79:64] := e4
dst[95:80] := e5
dst[111:96] := e6
dst[127:112] := e7
SSE2
Set
Set packed 8-bit integers in "dst" with the supplied values.
dst[7:0] := e0
dst[15:8] := e1
dst[23:16] := e2
dst[31:24] := e3
dst[39:32] := e4
dst[47:40] := e5
dst[55:48] := e6
dst[63:56] := e7
dst[71:64] := e8
dst[79:72] := e9
dst[87:80] := e10
dst[95:88] := e11
dst[103:96] := e12
dst[111:104] := e13
dst[119:112] := e14
dst[127:120] := e15
SSE2
Set
Broadcast 64-bit integer "a" to all elements of "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
SSE2
Set
Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastq".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
SSE2
Set
Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastd".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := a[31:0]
ENDFOR
SSE2
Set
Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastw".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := a[15:0]
ENDFOR
SSE2
Set
Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastb".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := a[7:0]
ENDFOR
SSE2
Set
Set packed 64-bit integers in "dst" with the supplied values in reverse order.
dst[63:0] := e1
dst[127:64] := e0
SSE2
Set
Set packed 32-bit integers in "dst" with the supplied values in reverse order.
dst[31:0] := e3
dst[63:32] := e2
dst[95:64] := e1
dst[127:96] := e0
SSE2
Set
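The difference between the plain and reverse-order "Set" records can be sketched as a scalar model. The `vec4i32` type and `model_*` names are hypothetical; the sketch also assumes both functions take their arguments in the order (e3, e2, e1, e0), which is not stated in the records above.

```c
#include <stdint.h>

/* Hypothetical container: lanes[0] models dst[31:0], lanes[3] models dst[127:96]. */
typedef struct { int32_t lanes[4]; } vec4i32;

/* Plain order: e0 lands in the lowest lane. */
static vec4i32 model_set_epi32(int32_t e3, int32_t e2, int32_t e1, int32_t e0) {
    vec4i32 v = {{ e0, e1, e2, e3 }};
    return v;
}

/* Reverse order: e3 lands in the lowest lane. */
static vec4i32 model_setr_epi32(int32_t e3, int32_t e2, int32_t e1, int32_t e0) {
    vec4i32 v = {{ e3, e2, e1, e0 }};
    return v;
}
```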
Set packed 16-bit integers in "dst" with the supplied values in reverse order.
dst[15:0] := e7
dst[31:16] := e6
dst[47:32] := e5
dst[63:48] := e4
dst[79:64] := e3
dst[95:80] := e2
dst[111:96] := e1
dst[127:112] := e0
SSE2
Set
Set packed 8-bit integers in "dst" with the supplied values in reverse order.
dst[7:0] := e15
dst[15:8] := e14
dst[23:16] := e13
dst[31:24] := e12
dst[39:32] := e11
dst[47:40] := e10
dst[55:48] := e9
dst[63:56] := e8
dst[71:64] := e7
dst[79:72] := e6
dst[87:80] := e5
dst[95:88] := e4
dst[103:96] := e3
dst[111:104] := e2
dst[119:112] := e1
dst[127:120] := e0
SSE2
Set
Return vector of type __m128i with all elements set to zero.
dst[MAX:0] := 0
SSE2
Set
Copy double-precision (64-bit) floating-point element "a" to the lower element of "dst", and zero the upper element.
dst[63:0] := a[63:0]
dst[127:64] := 0
SSE2
Set
Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
SSE2
Set
Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := a[63:0]
ENDFOR
SSE2
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values.
dst[63:0] := e0
dst[127:64] := e1
SSE2
Set
Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order.
dst[63:0] := e1
dst[127:64] := e0
SSE2
Set
Return vector of type __m128d with all elements set to zero.
dst[MAX:0] := 0
SSE2
Set
Copy the lower 64-bit integer in "a" to "dst".
dst[63:0] := a[63:0]
SSE2
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
dst[7:0] := Saturate8(a[15:0])
dst[15:8] := Saturate8(a[31:16])
dst[23:16] := Saturate8(a[47:32])
dst[31:24] := Saturate8(a[63:48])
dst[39:32] := Saturate8(a[79:64])
dst[47:40] := Saturate8(a[95:80])
dst[55:48] := Saturate8(a[111:96])
dst[63:56] := Saturate8(a[127:112])
dst[71:64] := Saturate8(b[15:0])
dst[79:72] := Saturate8(b[31:16])
dst[87:80] := Saturate8(b[47:32])
dst[95:88] := Saturate8(b[63:48])
dst[103:96] := Saturate8(b[79:64])
dst[111:104] := Saturate8(b[95:80])
dst[119:112] := Saturate8(b[111:96])
dst[127:120] := Saturate8(b[127:112])
SSE2
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst".
dst[15:0] := Saturate16(a[31:0])
dst[31:16] := Saturate16(a[63:32])
dst[47:32] := Saturate16(a[95:64])
dst[63:48] := Saturate16(a[127:96])
dst[79:64] := Saturate16(b[31:0])
dst[95:80] := Saturate16(b[63:32])
dst[111:96] := Saturate16(b[95:64])
dst[127:112] := Saturate16(b[127:96])
SSE2
Miscellaneous
Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst".
dst[7:0] := SaturateU8(a[15:0])
dst[15:8] := SaturateU8(a[31:16])
dst[23:16] := SaturateU8(a[47:32])
dst[31:24] := SaturateU8(a[63:48])
dst[39:32] := SaturateU8(a[79:64])
dst[47:40] := SaturateU8(a[95:80])
dst[55:48] := SaturateU8(a[111:96])
dst[63:56] := SaturateU8(a[127:112])
dst[71:64] := SaturateU8(b[15:0])
dst[79:72] := SaturateU8(b[31:16])
dst[87:80] := SaturateU8(b[47:32])
dst[95:88] := SaturateU8(b[63:48])
dst[103:96] := SaturateU8(b[79:64])
dst[111:104] := SaturateU8(b[95:80])
dst[119:112] := SaturateU8(b[111:96])
dst[127:120] := SaturateU8(b[127:112])
SSE2
Miscellaneous
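The signed and unsigned saturating packs above can be sketched in scalar C. The pseudocode does not define Saturate8 or SaturateU8, so this sketch assumes the usual meaning (clamp to the destination range); all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed meaning of Saturate8: clamp to [-128, 127]. */
static int8_t saturate8(int16_t x) {
    return x > 127 ? 127 : (x < -128 ? -128 : (int8_t)x);
}

/* Assumed meaning of SaturateU8: clamp to [0, 255]. */
static uint8_t saturate_u8(int16_t x) {
    return x > 255 ? 255 : (x < 0 ? 0 : (uint8_t)x);
}

/* Lanes 0..7 of dst come from a, lanes 8..15 from b. */
static void model_packs_epi16(const int16_t a[8], const int16_t b[8], int8_t dst[16]) {
    for (int j = 0; j < 8; j++) {
        dst[j]     = saturate8(a[j]);
        dst[j + 8] = saturate8(b[j]);
    }
}

static void model_packus_epi16(const int16_t a[8], const int16_t b[8], uint8_t dst[16]) {
    for (int j = 0; j < 8; j++) {
        dst[j]     = saturate_u8(a[j]);
        dst[j + 8] = saturate_u8(b[j]);
    }
}
```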
Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst".
FOR j := 0 to 15
i := j*8
dst[j] := a[i+7]
ENDFOR
dst[MAX:16] := 0
SSE2
Miscellaneous
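The byte-mask record above can be sketched as a scalar loop: bit j of the result is the most significant bit of byte j. The function name is hypothetical, and the 128-bit input is modeled as an array of 16 bytes with index 0 as the lowest byte.

```c
#include <stdint.h>

/* Collect the top bit of each of the 16 bytes into bits 0..15 of the result. */
static int model_movemask_epi8(const uint8_t a[16]) {
    int mask = 0;
    for (int j = 0; j < 16; j++) {
        mask |= ((a[j] >> 7) & 1) << j;
    }
    return mask;  /* bits 16 and above stay zero */
}
```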
Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a".
FOR j := 0 to 1
i := j*64
IF a[i+63]
dst[j] := 1
ELSE
dst[j] := 0
FI
ENDFOR
dst[MAX:2] := 0
SSE2
Miscellaneous
Copy the 64-bit integer "a" to the lower element of "dst", and zero the upper element.
dst[63:0] := a[63:0]
dst[127:64] := 0
SSE2
Move
Copy the lower 64-bit integer in "a" to the lower element of "dst", and zero the upper element.
dst[63:0] := a[63:0]
dst[127:64] := 0
SSE2
Move
Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := b[63:0]
dst[127:64] := a[127:64]
SSE2
Move
Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".
dst[15:0] := (a[127:0] >> (imm8[2:0] * 16))[15:0]
dst[31:16] := 0
SSE2
Swizzle
Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8".
dst[127:0] := a[127:0]
sel := imm8[2:0]*16
dst[sel+15:sel] := i[15:0]
SSE2
Swizzle
Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst".
DEFINE SELECT4(src, control) {
CASE(control[1:0]) OF
0: tmp[31:0] := src[31:0]
1: tmp[31:0] := src[63:32]
2: tmp[31:0] := src[95:64]
3: tmp[31:0] := src[127:96]
ESAC
RETURN tmp[31:0]
}
dst[31:0] := SELECT4(a[127:0], imm8[1:0])
dst[63:32] := SELECT4(a[127:0], imm8[3:2])
dst[95:64] := SELECT4(a[127:0], imm8[5:4])
dst[127:96] := SELECT4(a[127:0], imm8[7:6])
SSE2
Swizzle
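The SELECT4-based shuffle above can be sketched as a scalar model in which each 2-bit field of "imm8" picks the source lane for one destination lane. The function name is hypothetical; lane 0 models bits [31:0].

```c
#include <stdint.h>

/* dst lane j takes the source lane named by imm8 bits [2j+1:2j]. */
static void model_shuffle_epi32(const uint32_t a[4], uint8_t imm8, uint32_t dst[4]) {
    for (int j = 0; j < 4; j++) {
        dst[j] = a[(imm8 >> (2 * j)) & 3];
    }
}
```

With this model, an "imm8" of 0x1B reverses the four lanes and an "imm8" of 0x00 broadcasts lane 0.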
Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst".
dst[63:0] := a[63:0]
dst[79:64] := (a >> (imm8[1:0] * 16))[79:64]
dst[95:80] := (a >> (imm8[3:2] * 16))[79:64]
dst[111:96] := (a >> (imm8[5:4] * 16))[79:64]
dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
SSE2
Swizzle
Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst".
dst[15:0] := (a >> (imm8[1:0] * 16))[15:0]
dst[31:16] := (a >> (imm8[3:2] * 16))[15:0]
dst[47:32] := (a >> (imm8[5:4] * 16))[15:0]
dst[63:48] := (a >> (imm8[7:6] * 16))[15:0]
dst[127:64] := a[127:64]
SSE2
Swizzle
Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[71:64]
dst[15:8] := src2[71:64]
dst[23:16] := src1[79:72]
dst[31:24] := src2[79:72]
dst[39:32] := src1[87:80]
dst[47:40] := src2[87:80]
dst[55:48] := src1[95:88]
dst[63:56] := src2[95:88]
dst[71:64] := src1[103:96]
dst[79:72] := src2[103:96]
dst[87:80] := src1[111:104]
dst[95:88] := src2[111:104]
dst[103:96] := src1[119:112]
dst[111:104] := src2[119:112]
dst[119:112] := src1[127:120]
dst[127:120] := src2[127:120]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[79:64]
dst[31:16] := src2[79:64]
dst[47:32] := src1[95:80]
dst[63:48] := src2[95:80]
dst[79:64] := src1[111:96]
dst[95:80] := src2[111:96]
dst[111:96] := src1[127:112]
dst[127:112] := src2[127:112]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[95:64]
dst[63:32] := src2[95:64]
dst[95:64] := src1[127:96]
dst[127:96] := src2[127:96]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) {
dst[7:0] := src1[7:0]
dst[15:8] := src2[7:0]
dst[23:16] := src1[15:8]
dst[31:24] := src2[15:8]
dst[39:32] := src1[23:16]
dst[47:40] := src2[23:16]
dst[55:48] := src1[31:24]
dst[63:56] := src2[31:24]
dst[71:64] := src1[39:32]
dst[79:72] := src2[39:32]
dst[87:80] := src1[47:40]
dst[95:88] := src2[47:40]
dst[103:96] := src1[55:48]
dst[111:104] := src2[55:48]
dst[119:112] := src1[63:56]
dst[127:120] := src2[63:56]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
SSE2
Swizzle
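The byte interleaves spelled out in INTERLEAVE_BYTES and INTERLEAVE_HIGH_BYTES reduce to a short loop. The sketch below is a scalar model with hypothetical function names; index 0 is the lowest byte of each 128-bit value.

```c
#include <stdint.h>

/* Low half: bytes 0..7 of a and b alternate, a first. */
static void model_unpacklo_epi8(const uint8_t a[16], const uint8_t b[16], uint8_t dst[16]) {
    for (int j = 0; j < 8; j++) {
        dst[2 * j]     = a[j];
        dst[2 * j + 1] = b[j];
    }
}

/* High half: bytes 8..15 of a and b alternate, a first. */
static void model_unpackhi_epi8(const uint8_t a[16], const uint8_t b[16], uint8_t dst[16]) {
    for (int j = 0; j < 8; j++) {
        dst[2 * j]     = a[j + 8];
        dst[2 * j + 1] = b[j + 8];
    }
}
```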
Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) {
dst[15:0] := src1[15:0]
dst[31:16] := src2[15:0]
dst[47:32] := src1[31:16]
dst[63:48] := src2[31:16]
dst[79:64] := src1[47:32]
dst[95:80] := src2[47:32]
dst[111:96] := src1[63:48]
dst[127:112] := src2[63:48]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) {
dst[31:0] := src1[31:0]
dst[63:32] := src2[31:0]
dst[95:64] := src1[63:32]
dst[127:96] := src2[63:32]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
SSE2
Swizzle
Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst".
DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[63:0]
dst[127:64] := src2[63:0]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
SSE2
Swizzle
Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst".
dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64]
dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
SSE2
Swizzle
Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := SQRT(b[63:0])
dst[127:64] := a[127:64]
SSE2
Elementary Math Functions
Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SQRT(a[i+63:i])
ENDFOR
SSE2
Elementary Math Functions
Cast vector of type __m128d to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
SSE2
Cast
Cast vector of type __m128d to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
SSE2
Cast
Cast vector of type __m128 to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
SSE2
Cast
Cast vector of type __m128 to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
SSE2
Cast
Cast vector of type __m128i to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
SSE2
Cast
Cast vector of type __m128i to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
SSE2
Cast
Alternately add and subtract packed single-precision (32-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF ((j & 1) == 0)
dst[i+31:i] := a[i+31:i] - b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i] + b[i+31:i]
FI
ENDFOR
SSE3
Arithmetic
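Which lanes subtract and which add is easy to misread, so the pseudocode above is sketched here as a scalar loop: even-indexed lanes compute a - b and odd-indexed lanes compute a + b. The function name is hypothetical.

```c
/* Even lanes subtract, odd lanes add, following the (j & 1) test in the pseudocode. */
static void model_addsub_ps(const float a[4], const float b[4], float dst[4]) {
    for (int j = 0; j < 4; j++) {
        dst[j] = ((j & 1) == 0) ? a[j] - b[j] : a[j] + b[j];
    }
}
```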
Alternately add and subtract packed double-precision (64-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF ((j & 1) == 0)
dst[i+63:i] := a[i+63:i] - b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i] + b[i+63:i]
FI
ENDFOR
SSE3
Arithmetic
Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[63:0] := a[127:64] + a[63:0]
dst[127:64] := b[127:64] + b[63:0]
SSE3
Arithmetic
Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[31:0] := a[63:32] + a[31:0]
dst[63:32] := a[127:96] + a[95:64]
dst[95:64] := b[63:32] + b[31:0]
dst[127:96] := b[127:96] + b[95:64]
SSE3
Arithmetic
Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[63:0] := a[63:0] - a[127:64]
dst[127:64] := b[63:0] - b[127:64]
SSE3
Arithmetic
Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
dst[31:0] := a[31:0] - a[63:32]
dst[63:32] := a[95:64] - a[127:96]
dst[95:64] := b[31:0] - b[63:32]
dst[127:96] := b[95:64] - b[127:96]
SSE3
Arithmetic
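The single-precision horizontal operations above can be sketched as scalar models that follow the pseudocode line by line. The function names are hypothetical. Note the operand order in the subtracting form: the lower element of each pair is the minuend.

```c
/* Pairwise sums: two results from a, then two from b. */
static void model_hadd_ps(const float a[4], const float b[4], float dst[4]) {
    dst[0] = a[1] + a[0];
    dst[1] = a[3] + a[2];
    dst[2] = b[1] + b[0];
    dst[3] = b[3] + b[2];
}

/* Pairwise differences: lower element minus upper element of each pair. */
static void model_hsub_ps(const float a[4], const float b[4], float dst[4]) {
    dst[0] = a[0] - a[1];
    dst[1] = a[2] - a[3];
    dst[2] = b[0] - b[1];
    dst[3] = b[2] - b[3];
}
```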
Load 128 bits of integer data from unaligned memory into "dst". This intrinsic may perform better than "_mm_loadu_si128" when the data crosses a cache line boundary.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE3
Load
Load a double-precision (64-bit) floating-point element from memory into both elements of "dst".
dst[63:0] := MEM[mem_addr+63:mem_addr]
dst[127:64] := MEM[mem_addr+63:mem_addr]
SSE3
Load
Duplicate the low double-precision (64-bit) floating-point element from "a", and store the results in "dst".
dst[63:0] := a[63:0]
dst[127:64] := a[63:0]
SSE3
Move
Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
dst[31:0] := a[63:32]
dst[63:32] := a[63:32]
dst[95:64] := a[127:96]
dst[127:96] := a[127:96]
SSE3
Move
Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst".
dst[31:0] := a[31:0]
dst[63:32] := a[31:0]
dst[95:64] := a[95:64]
dst[127:96] := a[95:64]
SSE3
Move
Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF imm8[j]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
SSE4.1
Swizzle
Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF imm8[j]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
SSE4.1
Swizzle
Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".
FOR j := 0 to 1
i := j*64
IF mask[i+63]
dst[i+63:i] := b[i+63:i]
ELSE
dst[i+63:i] := a[i+63:i]
FI
ENDFOR
SSE4.1
Swizzle
Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst".
FOR j := 0 to 3
i := j*32
IF mask[i+31]
dst[i+31:i] := b[i+31:i]
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
SSE4.1
Swizzle
Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst".
FOR j := 0 to 15
i := j*8
IF mask[i+7]
dst[i+7:i] := b[i+7:i]
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
SSE4.1
Swizzle
Blend packed 16-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst".
FOR j := 0 to 7
i := j*16
IF imm8[j]
dst[i+15:i] := b[i+15:i]
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
SSE4.1
Swizzle
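The blend records above select lanes in two different ways: the "imm8" forms use one immediate bit per lane, while the "mask" forms use the most significant bit of each mask element. The following scalar sketch contrasts the two for the 16-bit and 8-bit integer cases; the function names are hypothetical.

```c
#include <stdint.h>

/* Immediate form: bit j of imm8 chooses b (1) or a (0) for lane j. */
static void model_blend_epi16(const uint16_t a[8], const uint16_t b[8],
                              uint8_t imm8, uint16_t dst[8]) {
    for (int j = 0; j < 8; j++) {
        dst[j] = ((imm8 >> j) & 1) ? b[j] : a[j];
    }
}

/* Variable form: the top bit of mask byte j chooses b (1) or a (0) for byte j. */
static void model_blendv_epi8(const uint8_t a[16], const uint8_t b[16],
                              const uint8_t mask[16], uint8_t dst[16]) {
    for (int j = 0; j < 16; j++) {
        dst[j] = (mask[j] & 0x80) ? b[j] : a[j];
    }
}
```

Only the top bit of each mask byte matters in the variable form; a mask byte of 0x7F still selects "a".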
Extract a single-precision (32-bit) floating-point element from "a", selected with "imm8", and store the result in "dst".
dst[31:0] := (a[127:0] >> (imm8[1:0] * 32))[31:0]
SSE4.1
Swizzle
Extract an 8-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst".
dst[7:0] := (a[127:0] >> (imm8[3:0] * 8))[7:0]
dst[31:8] := 0
SSE4.1
Swizzle
Extract a 32-bit integer from "a", selected with "imm8", and store the result in "dst".
dst[31:0] := (a[127:0] >> (imm8[1:0] * 32))[31:0]
SSE4.1
Swizzle
Extract a 64-bit integer from "a", selected with "imm8", and store the result in "dst".
dst[63:0] := (a[127:0] >> (imm8[0] * 64))[63:0]
SSE4.1
Swizzle
Copy "a" to "tmp", then insert a single-precision (32-bit) floating-point element from "b" into "tmp" using the control in "imm8". Store "tmp" to "dst" using the mask in "imm8" (elements are zeroed out when the corresponding bit is set).
tmp2[127:0] := a[127:0]
CASE (imm8[7:6]) OF
0: tmp1[31:0] := b[31:0]
1: tmp1[31:0] := b[63:32]
2: tmp1[31:0] := b[95:64]
3: tmp1[31:0] := b[127:96]
ESAC
CASE (imm8[5:4]) OF
0: tmp2[31:0] := tmp1[31:0]
1: tmp2[63:32] := tmp1[31:0]
2: tmp2[95:64] := tmp1[31:0]
3: tmp2[127:96] := tmp1[31:0]
ESAC
FOR j := 0 to 3
i := j*32
IF imm8[j%8]
dst[i+31:i] := 0
ELSE
dst[i+31:i] := tmp2[i+31:i]
FI
ENDFOR
SSE4.1
Swizzle
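The three fields of "imm8" in the single-precision insert above (source lane, destination lane, zero mask) can be sketched in scalar C. The function name is hypothetical; the local array `tmp` plays the role of "tmp2" in the pseudocode.

```c
#include <stdint.h>

static void model_insert_ps(const float a[4], const float b[4],
                            uint8_t imm8, float dst[4]) {
    float tmp[4];
    for (int j = 0; j < 4; j++) tmp[j] = a[j];

    /* imm8[7:6] picks the lane of b, imm8[5:4] picks where it lands. */
    tmp[(imm8 >> 4) & 3] = b[(imm8 >> 6) & 3];

    /* imm8[3:0] zeroes the lanes whose bit is set. */
    for (int j = 0; j < 4; j++) {
        dst[j] = ((imm8 >> j) & 1) ? 0.0f : tmp[j];
    }
}
```

For example, an "imm8" of 0x91 copies lane 2 of "b" into lane 1 and then zeroes lane 0.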
Copy "a" to "dst", and insert the lower 8-bit integer from "i" into "dst" at the location specified by "imm8".
dst[127:0] := a[127:0]
sel := imm8[3:0]*8
dst[sel+7:sel] := i[7:0]
SSE4.1
Swizzle
Copy "a" to "dst", and insert the 32-bit integer "i" into "dst" at the location specified by "imm8".
dst[127:0] := a[127:0]
sel := imm8[1:0]*32
dst[sel+31:sel] := i[31:0]
SSE4.1
Swizzle
Copy "a" to "dst", and insert the 64-bit integer "i" into "dst" at the location specified by "imm8".
dst[127:0] := a[127:0]
sel := imm8[0]*64
dst[sel+63:sel] := i[63:0]
SSE4.1
Swizzle
Conditionally multiply the packed double-precision (64-bit) floating-point elements in "a" and "b" using bits 5:4 of "imm8", sum the two products, and conditionally store the sum in "dst" using bits 1:0 of "imm8".
DEFINE DP(a[127:0], b[127:0], imm8[7:0]) {
FOR j := 0 to 1
i := j*64
IF imm8[(4+j)%8]
temp[i+63:i] := a[i+63:i] * b[i+63:i]
ELSE
temp[i+63:i] := 0.0
FI
ENDFOR
sum[63:0] := temp[127:64] + temp[63:0]
FOR j := 0 to 1
i := j*64
IF imm8[j%8]
tmpdst[i+63:i] := sum[63:0]
ELSE
tmpdst[i+63:i] := 0.0
FI
ENDFOR
RETURN tmpdst[127:0]
}
dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
SSE4.1
Arithmetic
Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the four products, and conditionally store the sum in "dst" using the low 4 bits of "imm8".
DEFINE DP(a[127:0], b[127:0], imm8[7:0]) {
FOR j := 0 to 3
i := j*32
IF imm8[(4+j)%8]
temp[i+31:i] := a[i+31:i] * b[i+31:i]
ELSE
temp[i+31:i] := 0
FI
ENDFOR
sum[31:0] := (temp[127:96] + temp[95:64]) + (temp[63:32] + temp[31:0])
FOR j := 0 to 3
i := j*32
IF imm8[j%8]
tmpdst[i+31:i] := sum[31:0]
ELSE
tmpdst[i+31:i] := 0
FI
ENDFOR
RETURN tmpdst[127:0]
}
dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
SSE4.1
Arithmetic
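Both DP definitions above follow the same pattern: upper "imm8" bits gate the products, lower bits gate where the sum is written. The scalar sketch below models both widths with hypothetical function names. In the double-precision case the loops run over two lanes only, so only "imm8" bits 4, 5, 0, and 1 are read.

```c
#include <stdint.h>

/* Double precision: two lanes, product gates at bits 4..5, store gates at bits 0..1. */
static void model_dp_pd(const double a[2], const double b[2], uint8_t imm8, double dst[2]) {
    double temp[2];
    for (int j = 0; j < 2; j++) {
        temp[j] = ((imm8 >> (4 + j)) & 1) ? a[j] * b[j] : 0.0;
    }
    double sum = temp[1] + temp[0];
    for (int j = 0; j < 2; j++) {
        dst[j] = ((imm8 >> j) & 1) ? sum : 0.0;
    }
}

/* Single precision: four lanes, product gates at bits 4..7, store gates at bits 0..3. */
static void model_dp_ps(const float a[4], const float b[4], uint8_t imm8, float dst[4]) {
    float temp[4];
    for (int j = 0; j < 4; j++) {
        temp[j] = ((imm8 >> (4 + j)) & 1) ? a[j] * b[j] : 0.0f;
    }
    /* Same association order as the pseudocode. */
    float sum = (temp[3] + temp[2]) + (temp[1] + temp[0]);
    for (int j = 0; j < 4; j++) {
        dst[j] = ((imm8 >> j) & 1) ? sum : 0.0f;
    }
}
```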
Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i])
ENDFOR
SSE4.1
Arithmetic
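The widening multiply above reads only the low 32 bits of each 64-bit element, which corresponds to 32-bit lanes 0 and 2 of the input. A scalar sketch, with a hypothetical function name and the inputs modeled as four 32-bit lanes:

```c
#include <stdint.h>

/* Lanes 1 and 3 of a and b are ignored; each product is a full signed 64-bit value. */
static void model_mul_epi32(const int32_t a[4], const int32_t b[4], int64_t dst[2]) {
    for (int j = 0; j < 2; j++) {
        dst[j] = (int64_t)a[2 * j] * (int64_t)b[2 * j];
    }
}
```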
Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst".
FOR j := 0 to 3
i := j*32
tmp[63:0] := a[i+31:i] * b[i+31:i]
dst[i+31:i] := tmp[31:0]
ENDFOR
SSE4.1
Arithmetic
Miscellaneous
Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst".
Eight SADs are performed using one quadruplet from "b" and eight quadruplets from "a". One quadruplet is selected from "b" starting at the offset specified in "imm8". Eight quadruplets are formed from sequential 8-bit integers selected from "a" starting at the offset specified in "imm8".
DEFINE MPSADBW(a[127:0], b[127:0], imm8[2:0]) {
a_offset := imm8[2]*32
b_offset := imm8[1:0]*32
FOR j := 0 to 7
i := j*8
k := a_offset+i
l := b_offset
tmp[i*2+15:i*2] := ABS(Signed(a[k+7:k] - b[l+7:l])) + ABS(Signed(a[k+15:k+8] - b[l+15:l+8])) + \
ABS(Signed(a[k+23:k+16] - b[l+23:l+16])) + ABS(Signed(a[k+31:k+24] - b[l+31:l+24]))
ENDFOR
RETURN tmp[127:0]
}
dst[127:0] := MPSADBW(a[127:0], b[127:0], imm8[2:0])
SSE4.1
Arithmetic
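The MPSADBW definition above is easier to follow with byte indices instead of bit offsets. The scalar sketch below uses a hypothetical function name and converts the pseudocode's bit offsets (multiples of 32) into byte offsets (multiples of 4).

```c
#include <stdint.h>

static void model_mpsadbw_epu8(const uint8_t a[16], const uint8_t b[16],
                               uint8_t imm8, uint16_t dst[8]) {
    int a_off = ((imm8 >> 2) & 1) * 4;  /* imm8[2] * 32 bits, expressed in bytes */
    int b_off = (imm8 & 3) * 4;         /* imm8[1:0] * 32 bits, expressed in bytes */

    for (int j = 0; j < 8; j++) {
        unsigned sad = 0;
        /* Window of 4 bytes from a slides by one byte per result; b's window is fixed. */
        for (int n = 0; n < 4; n++) {
            int d = (int)a[a_off + j + n] - (int)b[b_off + n];
            sad += (unsigned)(d < 0 ? -d : d);
        }
        dst[j] = (uint16_t)sad;
    }
}
```

The highest byte read from "a" is index 14 (offset 4, result 7, byte 3), so the sliding window never leaves the 128-bit input.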
Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := MAX(a[i+7:i], b[i+7:i])
ENDFOR
SSE4.1
Special Math Functions
Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
SSE4.1
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := MAX(a[i+31:i], b[i+31:i])
ENDFOR
SSE4.1
Special Math Functions
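The signed and unsigned maximum records above share identical pseudocode; the difference lies only in how each lane's bits are interpreted. A one-lane scalar sketch (hypothetical function names) makes the distinction concrete:

```c
#include <stdint.h>

/* Signed interpretation: 0xFFFFFFFF is -1 and loses to 1. */
static int32_t model_max_epi32_lane(int32_t a, int32_t b) {
    return a > b ? a : b;
}

/* Unsigned interpretation: 0xFFFFFFFF is the largest value and wins. */
static uint32_t model_max_epu32_lane(uint32_t a, uint32_t b) {
    return a > b ? a : b;
}
```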
Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := MAX(a[i+15:i], b[i+15:i])
ENDFOR
SSE4.1
Special Math Functions
Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := MIN(a[i+7:i], b[i+7:i])
ENDFOR
SSE4.1
Special Math Functions
Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
SSE4.1
Special Math Functions
Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := MIN(a[i+31:i], b[i+31:i])
ENDFOR
SSE4.1
Special Math Functions
Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := MIN(a[i+15:i], b[i+15:i])
ENDFOR
SSE4.1
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed double-precision floating-point elements in "dst".
[round_note]
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ROUND(a[i+63:i], rounding)
ENDFOR
SSE4.1
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := FLOOR(a[i+63:i])
ENDFOR
SSE4.1
Special Math Functions
Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := CEIL(a[i+63:i])
ENDFOR
SSE4.1
Special Math Functions
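As a rough illustration of the packed FLOOR and CEIL entries above, the sketch below applies both to a two-element list standing in for a vector of doubles. The function names are hypothetical. The sketch assumes finite inputs and does not try to model NaN, infinity, or signed-zero behavior.

```python
import math


def floor_pd(a):
    """Round each element down to an integer value, kept as a float."""
    return [float(math.floor(x)) for x in a]


def ceil_pd(a):
    """Round each element up to an integer value, kept as a float."""
    return [float(math.ceil(x)) for x in a]


floored = floor_pd([-1.5, 2.5])
ceiled = ceil_pd([-1.5, 2.5])
```

The results stay floating-point values, matching the "store the results as packed double-precision floating-point elements" wording.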
Round the packed single-precision (32-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed single-precision floating-point elements in "dst".
[round_note]
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ROUND(a[i+31:i], rounding)
ENDFOR
SSE4.1
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := FLOOR(a[i+31:i])
ENDFOR
SSE4.1
Special Math Functions
Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := CEIL(a[i+31:i])
ENDFOR
SSE4.1
Special Math Functions
Round the lower double-precision (64-bit) floating-point element in "b" using the "rounding" parameter, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
[round_note]
dst[63:0] := ROUND(b[63:0], rounding)
dst[127:64] := a[127:64]
SSE4.1
Special Math Functions
Round the lower double-precision (64-bit) floating-point element in "b" down to an integer value, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := FLOOR(b[63:0])
dst[127:64] := a[127:64]
SSE4.1
Special Math Functions
Round the lower double-precision (64-bit) floating-point element in "b" up to an integer value, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
dst[63:0] := CEIL(b[63:0])
dst[127:64] := a[127:64]
SSE4.1
Special Math Functions
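The scalar double-precision forms above take the lower element from "b" and the upper element from "a". A small Python sketch of that merge follows. The function names are hypothetical, vectors are assumed to be two-element lists with the lower element first, and only finite inputs are handled.

```python
import math


def floor_sd(a, b):
    """Lower element: FLOOR of b's lower element. Upper element: copied from a."""
    return [float(math.floor(b[0])), a[1]]


def ceil_sd(a, b):
    """Lower element: CEIL of b's lower element. Upper element: copied from a."""
    return [float(math.ceil(b[0])), a[1]]


a = [10.25, 20.75]
b = [-3.5, 99.0]
floor_result = floor_sd(a, b)
ceil_result = ceil_sd(a, b)
```

Neither a's lower element nor b's upper element influences the result.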
Round the lower single-precision (32-bit) floating-point element in "b" using the "rounding" parameter, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
[round_note]
dst[31:0] := ROUND(b[31:0], rounding)
dst[127:32] := a[127:32]
SSE4.1
Special Math Functions
Round the lower single-precision (32-bit) floating-point element in "b" down to an integer value, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := FLOOR(b[31:0])
dst[127:32] := a[127:32]
SSE4.1
Special Math Functions
Round the lower single-precision (32-bit) floating-point element in "b" up to an integer value, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst".
dst[31:0] := CEIL(b[31:0])
dst[127:32] := a[127:32]
SSE4.1
Special Math Functions
Miscellaneous
Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst".
dst[15:0] := SaturateU16(a[31:0])
dst[31:16] := SaturateU16(a[63:32])
dst[47:32] := SaturateU16(a[95:64])
dst[63:48] := SaturateU16(a[127:96])
dst[79:64] := SaturateU16(b[31:0])
dst[95:80] := SaturateU16(b[63:32])
dst[111:96] := SaturateU16(b[95:64])
dst[127:112] := SaturateU16(b[127:96])
SSE4.1
Convert
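The unsigned-saturating pack above can be sketched in Python as follows. The names saturate_u16 and packus_epi32 are hypothetical stand-ins, with SaturateU16 assumed to clamp a signed value to the range 0..65535. Inputs are lists of four signed 32-bit values, lowest lane first.

```python
def saturate_u16(x):
    """Clamp a signed integer to the unsigned 16-bit range."""
    return max(0, min(0xFFFF, x))


def packus_epi32(a, b):
    """Lanes 0..3 of the result come from a, lanes 4..7 from b."""
    return [saturate_u16(x) for x in a] + [saturate_u16(x) for x in b]


packed = packus_epi32([-1, 0, 65535, 65536], [70000, 123, -50000, 1])
```

Negative inputs clamp to 0 and anything above 65535 clamps to 65535. In-range values pass through unchanged.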
Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := j*8
l := j*16
dst[l+15:l] := SignExtend16(a[i+7:i])
ENDFOR
SSE4.1
Convert
Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 8*j
dst[i+31:i] := SignExtend32(a[k+7:k])
ENDFOR
SSE4.1
Convert
Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 8*j
dst[i+63:i] := SignExtend64(a[k+7:k])
ENDFOR
SSE4.1
Convert
Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 16*j
dst[i+31:i] := SignExtend32(a[k+15:k])
ENDFOR
SSE4.1
Convert
Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 16*j
dst[i+63:i] := SignExtend64(a[k+15:k])
ENDFOR
SSE4.1
Convert
Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 32*j
dst[i+63:i] := SignExtend64(a[k+31:k])
ENDFOR
SSE4.1
Convert
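The sign-extension loops above differ only in source and destination widths. A generic scalar sketch in Python is given below. The function name is hypothetical, and the vector is assumed to be a 128-bit Python integer.

```python
def sign_extend_lanes(a, from_bits, to_bits):
    """Sign extend the low lanes of a 128-bit integer to wider lanes."""
    count = 128 // to_bits              # number of destination lanes
    dst = 0
    for j in range(count):
        x = (a >> (j * from_bits)) & ((1 << from_bits) - 1)
        if x >> (from_bits - 1):        # source sign bit is set
            x -= 1 << from_bits
        dst |= (x & ((1 << to_bits) - 1)) << (j * to_bits)
    return dst


a = 0x7F80  # byte 0 = 0x80 (-128), byte 1 = 0x7F (127)
to_16 = sign_extend_lanes(a, 8, 16)
to_32 = sign_extend_lanes(a, 8, 32)
```

Only the low 128/to_bits source elements are read. Higher source elements do not reach the result.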
Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst".
FOR j := 0 to 7
i := j*8
l := j*16
dst[l+15:l] := ZeroExtend16(a[i+7:i])
ENDFOR
SSE4.1
Convert
Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 8*j
dst[i+31:i] := ZeroExtend32(a[k+7:k])
ENDFOR
SSE4.1
Convert
Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 8*j
dst[i+63:i] := ZeroExtend64(a[k+7:k])
ENDFOR
SSE4.1
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst".
FOR j := 0 to 3
i := 32*j
k := 16*j
dst[i+31:i] := ZeroExtend32(a[k+15:k])
ENDFOR
SSE4.1
Convert
Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 16*j
dst[i+63:i] := ZeroExtend64(a[k+15:k])
ENDFOR
SSE4.1
Convert
Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst".
FOR j := 0 to 1
i := 64*j
k := 32*j
dst[i+63:i] := ZeroExtend64(a[k+31:k])
ENDFOR
SSE4.1
Convert
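A matching scalar sketch for the zero-extension loops above is shown below. The function name is hypothetical and the vector is assumed to be a 128-bit Python integer. It follows the loop bounds as written, so the 8-bit to 64-bit case reads only the two lowest bytes of the source.

```python
def zero_extend_lanes(a, from_bits, to_bits):
    """Zero extend the low lanes of a 128-bit integer to wider lanes."""
    count = 128 // to_bits              # number of destination lanes
    src_mask = (1 << from_bits) - 1
    dst = 0
    for j in range(count):
        dst |= ((a >> (j * from_bits)) & src_mask) << (j * to_bits)
    return dst


a = 0xAABBCCDD
widened = zero_extend_lanes(a, 8, 64)
```

Bytes 0xBB and 0xAA sit above the two source elements that the loop visits, so they do not appear in the output.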
Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ( a[i+63:i] == b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE4.1
Compare
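The 64-bit equality compare above produces an all-ones or all-zeros lane rather than a boolean. A short Python sketch follows, with a hypothetical function name and vectors assumed to be lists of two 64-bit lane values, lower lane first.

```python
ALL_ONES_64 = 0xFFFFFFFFFFFFFFFF


def cmpeq_epi64(a, b):
    """Each output lane is all ones where the inputs match, otherwise zero."""
    return [ALL_ONES_64 if x == y else 0 for x, y in zip(a, b)]


mask = cmpeq_epi64([1, 2], [1, 3])
```

A result in this form can be used directly as a lane mask in later bitwise operations.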
Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value.
IF ((a[127:0] AND b[127:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
IF (((NOT a[127:0]) AND b[127:0]) == 0)
CF := 1
ELSE
CF := 0
FI
RETURN ZF
SSE4.1
Logical
Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value.
IF ((a[127:0] AND b[127:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
IF (((NOT a[127:0]) AND b[127:0]) == 0)
CF := 1
ELSE
CF := 0
FI
RETURN CF
SSE4.1
Logical
Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
IF ((a[127:0] AND b[127:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
IF (((NOT a[127:0]) AND b[127:0]) == 0)
CF := 1
ELSE
CF := 0
FI
IF (ZF == 0 && CF == 0)
dst := 1
ELSE
dst := 0
FI
SSE4.1
Logical
Compute the bitwise AND of 128 bits (representing integer data) in "a" and "mask", and return 1 if the result is zero, otherwise return 0.
IF ((a[127:0] AND mask[127:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
dst := ZF
SSE4.1
Logical
Compute the bitwise AND of 128 bits (representing integer data) in "a" and "mask", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "mask", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0.
IF ((a[127:0] AND mask[127:0]) == 0)
ZF := 1
ELSE
ZF := 0
FI
IF (((NOT a[127:0]) AND mask[127:0]) == 0)
CF := 1
ELSE
CF := 0
FI
IF (ZF == 0 && CF == 0)
dst := 1
ELSE
dst := 0
FI
SSE4.1
Logical
Compute the bitwise NOT of "a" and then AND with a 128-bit vector containing all 1's, and return 1 if the result is zero, otherwise return 0.
FOR j := 0 to 127
tmp[j] := 1
ENDFOR
IF (((NOT a[127:0]) AND tmp[127:0]) == 0)
CF := 1
ELSE
CF := 0
FI
dst := CF
SSE4.1
Logical
Horizontally compute the minimum amongst the packed unsigned 16-bit integers in "a", store the minimum and index in "dst", and zero the remaining bits in "dst".
index[2:0] := 0
min[15:0] := a[15:0]
FOR j := 0 to 7
i := j*16
IF a[i+15:i] < min[15:0]
index[2:0] := j
min[15:0] := a[i+15:i]
FI
ENDFOR
dst[15:0] := min[15:0]
dst[18:16] := index[2:0]
dst[127:19] := 0
SSE4.1
Miscellaneous
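The horizontal minimum above can be followed step by step in Python. The function name is hypothetical, and the input is assumed to be a list of eight unsigned 16-bit values with lane 0 first. The packing of minimum and index into the low bits of "dst" mirrors the last three pseudocode lines.

```python
def minpos_epu16(a):
    """Return (minimum, index, dst) for eight unsigned 16-bit lanes."""
    index, minimum = 0, a[0]
    for j in range(8):
        if a[j] < minimum:          # strict comparison keeps the first minimum
            index, minimum = j, a[j]
    dst = minimum | (index << 16)   # bits 15:0 = min, bits 18:16 = index
    return minimum, index, dst


minimum, index, dst = minpos_epu16([9, 3, 7, 3, 8, 12, 5, 6])
```

Because the comparison is strict, ties resolve to the lowest lane index.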
Load 128 bits of integer data from memory into "dst" using a non-temporal memory hint.
"mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated.
dst[127:0] := MEM[mem_addr+127:mem_addr]
SSE4.1
Load
Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and store the generated mask in "dst".
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF a[m+size-1:m] == 0
aInvalid := 1
FI
IF b[n+size-1:n] == 0
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
bInvalid := 0
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
n := i*size
IF b[n+size-1:n] == 0
bInvalid := 1
FI
IF bInvalid // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
IF imm8[6] // byte / word mask
FOR i := 0 to UpperBound
j := i*size
IF IntRes2[i]
dst[j+size-1:j] := (imm8[0] ? 0xFFFF : 0xFF)
ELSE
dst[j+size-1:j] := 0
FI
ENDFOR
ELSE // bit mask
dst[UpperBound:0] := IntRes2[UpperBound:0]
dst[127:UpperBound+1] := 0
FI
SSE4.2
String Compare
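The output stage of the mask-generating compare above turns the per-element result bits into either a compact bit mask or a byte/word mask, selected by imm8[6]. The Python sketch below covers only that stage. The function and parameter names are hypothetical, and it assumes each selected element is filled with all ones across its full width (8 or 16 bits).

```python
def expand_mask(int_res2, wide, element_mask):
    """int_res2: result bits; wide: imm8[0]; element_mask: imm8[6]."""
    size = 16 if wide else 8
    upper_bound = 128 // size - 1
    if not element_mask:
        # bit mask: one bit per element, upper bits of dst are zero
        return int_res2 & ((1 << (upper_bound + 1)) - 1)
    dst = 0
    for i in range(upper_bound + 1):
        if (int_res2 >> i) & 1:
            dst |= ((1 << size) - 1) << (i * size)  # all ones for element i
    return dst


bit_mask = expand_mask(0b101, wide=False, element_mask=False)
byte_mask = expand_mask(0b101, wide=False, element_mask=True)
word_mask = expand_mask(0b101, wide=True, element_mask=True)
```

The same two result bits therefore yield 5 as a bit mask, two 0xFF bytes as a byte mask, or two 0xFFFF words as a word mask.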
Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and store the generated index in "dst".
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF a[m+size-1:m] == 0
aInvalid := 1
FI
IF b[n+size-1:n] == 0
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
bInvalid := 0
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
n := i*size
IF b[n+size-1:n] == 0
bInvalid := 1
FI
IF bInvalid // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
IF imm8[6] // most significant bit
tmp := UpperBound
dst := tmp
DO WHILE ((tmp >= 0) AND IntRes2[tmp] == 0)
tmp := tmp - 1
dst := tmp
OD
ELSE // least significant bit
tmp := 0
dst := tmp
DO WHILE ((tmp <= UpperBound) AND IntRes2[tmp] == 0)
tmp := tmp + 1
dst := tmp
OD
FI
SSE4.2
String Compare
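The index-generating output stage above scans for the most or least significant set bit, selected by imm8[6]. The Python sketch below models only that scan. The function and parameter names are hypothetical, and it assumes the scan runs over the bits of IntRes2, as the phrase "generated index" suggests.

```python
def select_index(int_res2, upper_bound, most_significant):
    """Return the index chosen by the output stage for a given result mask."""
    if most_significant:
        tmp = upper_bound
        while tmp >= 0 and ((int_res2 >> tmp) & 1) == 0:
            tmp -= 1
    else:
        tmp = 0
        while tmp <= upper_bound and ((int_res2 >> tmp) & 1) == 0:
            tmp += 1
    return tmp


lowest = select_index(0b00101000, 15, most_significant=False)
highest = select_index(0b00101000, 15, most_significant=True)
none_low = select_index(0, 15, most_significant=False)
```

With no bits set, the least-significant scan ends at UpperBound+1. Followed literally, the most-significant scan ends at -1 in that case, so the instruction reference is the place to confirm what hardware reports for an empty mask.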
Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if any character in "b" was null, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
bInvalid := 0
FOR j := 0 to UpperBound
n := j*size
IF b[n+size-1:n] == 0
bInvalid := 1
FI
ENDFOR
dst := bInvalid
SSE4.2
String Compare
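The null check above depends on the character width chosen by imm8[0]. The Python sketch below makes that visible. The function name is hypothetical, the vector is assumed to be a 16-byte bytes object, and characters are assumed to be stored in little-endian order.

```python
def has_null(vec, wide):
    """Return 1 if any 8-bit (or, when wide, 16-bit) character in vec is zero."""
    size = 2 if wide else 1              # bytes per character
    for n in range(0, 16, size):
        if int.from_bytes(vec[n:n + size], "little") == 0:
            return 1
    return 0


short_text = b"hello, world!\0\0\0"            # 13 characters plus padding
full_text = b"0123456789abcdef"                # no terminator in 16 bytes
utf16_text = "abcdefgh".encode("utf-16-le")    # eight 16-bit characters
```

The UTF-16 buffer has no null character when read as 16-bit elements, but its zero high bytes count as terminators when the same data is read as 8-bit characters.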
Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if the resulting mask was non-zero, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF a[m+size-1:m] == 0
aInvalid := 1
FI
IF b[n+size-1:n] == 0
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
bInvalid := 0
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
n := i*size
IF b[n+size-1:n] == 0
bInvalid := 1
FI
IF bInvalid // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
dst := (IntRes2 != 0)
SSE4.2
String Compare
Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if any character in "a" was null, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
aInvalid := 0
FOR i := 0 to UpperBound
m := i*size
IF a[m+size-1:m] == 0
aInvalid := 1
FI
ENDFOR
dst := aInvalid
SSE4.2
String Compare
Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return bit 0 of the resulting bit mask.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF a[m+size-1:m] == 0
aInvalid := 1
FI
IF b[n+size-1:n] == 0
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
bInvalid := 0
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
n := i*size
IF b[n+size-1:n] == 0
bInvalid := 1
FI
IF bInvalid // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
dst := IntRes2[0]
SSE4.2
String Compare
Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if "b" did not contain a null character and the resulting mask was zero, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF a[m+size-1:m] == 0
aInvalid := 1
FI
IF b[n+size-1:n] == 0
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
bInvalid := 0
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
n := i*size
IF b[n+size-1:n] == 0
bInvalid := 1
FI
IF bInvalid // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
dst := (IntRes2 == 0) AND bInvalid
SSE4.2
String Compare
Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and store the generated mask in "dst".
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF i == la
aInvalid := 1
FI
IF j == lb
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
IF i >= lb // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
IF imm8[6] // byte / word mask
FOR i := 0 to UpperBound
j := i*size
IF IntRes2[i]
dst[j+size-1:j] := (imm8[0] ? 0xFFFF : 0xFF)
ELSE
dst[j+size-1:j] := 0
FI
ENDFOR
ELSE // bit mask
dst[UpperBound:0] := IntRes2[UpperBound:0]
dst[127:UpperBound+1] := 0
FI
SSE4.2
String Compare
Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and store the generated index in "dst".
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF i == la
aInvalid := 1
FI
IF j == lb
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
IF i >= lb // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
IF imm8[6] // most significant bit
tmp := UpperBound
dst := tmp
DO WHILE ((tmp >= 0) AND IntRes2[tmp] == 0)
tmp := tmp - 1
dst := tmp
OD
ELSE // least significant bit
tmp := 0
dst := tmp
DO WHILE ((tmp <= UpperBound) AND IntRes2[tmp] == 0)
tmp := tmp + 1
dst := tmp
OD
FI
SSE4.2
String Compare
Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if any character in "b" was null, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
dst := (lb <= UpperBound)
SSE4.2
String Compare
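With explicit lengths, the check above reduces to comparing "lb" against the element count and never inspects the data. A Python sketch follows, with a hypothetical function name and "lb" assumed to be a non-negative element count.

```python
def explicit_length_has_end(lb, wide):
    """Return 1 if a string of lb elements ends inside the 128-bit vector."""
    size = 16 if wide else 8
    upper_bound = (128 // size) - 1
    return 1 if lb <= upper_bound else 0


full_bytes = explicit_length_has_end(16, wide=False)
short_bytes = explicit_length_has_end(15, wide=False)
full_words = explicit_length_has_end(8, wide=True)
```

A length equal to the full element count (16 bytes or 8 words) yields 0, since the string fills the vector without ending inside it.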
Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if the resulting mask was non-zero, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF i == la
aInvalid := 1
FI
IF j == lb
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 2
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
IF i >= lb // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
dst := (IntRes2 != 0)
SSE4.2
String Compare
Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if any character in "a" was null, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
dst := (la <= UpperBound)
SSE4.2
String Compare
Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return bit 0 of the resulting bit mask.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF i == la
aInvalid := 1
FI
IF j == lb
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 1 // the FOR loop also increments j, so j advances by 2 to the next range pair
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
IF i >= lb // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
dst := IntRes2[0]
SSE4.2
String Compare
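The "optionally negate results" step shared by these string-compare entries can be sketched in Python. This is a minimal model, assuming IntRes1 is held as a plain integer bit mask; the name `apply_polarity` is hypothetical and not part of any intrinsic API.

```python
def apply_polarity(int_res1, imm8, lb, upper_bound):
    """Model of the polarity step: returns IntRes2 as an integer bit mask."""
    int_res2 = 0
    for i in range(upper_bound + 1):
        bit = (int_res1 >> i) & 1
        negate = bool(imm8 & 0x10)              # imm8[4]: negation requested
        if negate and (imm8 & 0x20) and i >= lb:
            negate = False                      # imm8[5]: leave invalid positions alone
        int_res2 |= (bit ^ negate) << i
    return int_res2
```

With `imm8[5:4] == 0b11`, only positions below `lb` are flipped, which is why the masked variants do not report matches past the end of the string.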
Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and returns 1 if "b" did not contain a null character and the resulting mask was zero, and 0 otherwise.
[strcmp_note]
size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters
UpperBound := (128 / size) - 1
BoolRes := 0
// compare all characters
aInvalid := 0
bInvalid := 0
FOR i := 0 to UpperBound
m := i*size
FOR j := 0 to UpperBound
n := j*size
BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0
// invalidate characters after EOS
IF i == la
aInvalid := 1
FI
IF j == lb
bInvalid := 1
FI
// override comparisons for invalid characters
CASE (imm8[3:2]) OF
0: // equal any
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
1: // ranges
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
FI
2: // equal each
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
3: // equal ordered
IF (!aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 0
ELSE IF (aInvalid && !bInvalid)
BoolRes.word[i].bit[j] := 1
ELSE IF (aInvalid && bInvalid)
BoolRes.word[i].bit[j] := 1
FI
ESAC
ENDFOR
ENDFOR
// aggregate results
CASE (imm8[3:2]) OF
0: // equal any
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j]
ENDFOR
ENDFOR
1: // ranges
IntRes1 := 0
FOR i := 0 to UpperBound
FOR j := 0 to UpperBound
IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1])
j += 1 // the FOR loop also increments j, so j advances by 2 to the next range pair
ENDFOR
ENDFOR
2: // equal each
IntRes1 := 0
FOR i := 0 to UpperBound
IntRes1[i] := BoolRes.word[i].bit[i]
ENDFOR
3: // equal ordered
IntRes1 := (imm8[0] ? 0xFF : 0xFFFF)
FOR i := 0 to UpperBound
k := i
FOR j := 0 to UpperBound-i
IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j]
k := k+1
ENDFOR
ENDFOR
ESAC
// optionally negate results
FOR i := 0 to UpperBound
IF imm8[4]
IF imm8[5] // only negate valid
IF i >= lb // invalid, don't negate
IntRes2[i] := IntRes1[i]
ELSE // valid, negate
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // negate all
IntRes2[i] := -1 XOR IntRes1[i]
FI
ELSE // don't negate
IntRes2[i] := IntRes1[i]
FI
ENDFOR
// output
dst := (IntRes2 == 0) AND (lb > UpperBound)
SSE4.2
String Compare
Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst".
FOR j := 0 to 1
i := j*64
dst[i+63:i] := ( a[i+63:i] > b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0
ENDFOR
SSE4.2
Compare
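The signed 64-bit greater-than comparison above can be modelled on Python integers. This is a sketch only: the 128-bit operands are assumed to be passed as non-negative integers, and the function names are hypothetical.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def cmpgt_epi64(a, b):
    """Per-lane signed compare; each true lane becomes all ones."""
    dst = 0
    for j in range(2):
        i = j * 64
        if to_signed(a >> i, 64) > to_signed(b >> i, 64):
            dst |= 0xFFFFFFFFFFFFFFFF << i
    return dst
```

Because the lanes are signed, a lane holding 0xFFFFFFFFFFFFFFFF compares as -1 and is not greater than 0.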
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 8-bit integer "v", and stores the result in "dst".
tmp1[7:0] := v[0:7] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[39:0] := tmp1[7:0] << 32
tmp4[39:0] := tmp2[31:0] << 8
tmp5[39:0] := tmp3[39:0] XOR tmp4[39:0]
tmp6[31:0] := MOD2(tmp5[39:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
SSE4.2
Cryptography
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 16-bit integer "v", and stores the result in "dst".
tmp1[15:0] := v[0:15] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[47:0] := tmp1[15:0] << 32
tmp4[47:0] := tmp2[31:0] << 16
tmp5[47:0] := tmp3[47:0] XOR tmp4[47:0]
tmp6[31:0] := MOD2(tmp5[47:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
SSE4.2
Cryptography
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 32-bit integer "v", and stores the result in "dst".
tmp1[31:0] := v[0:31] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[63:0] := tmp1[31:0] << 32
tmp4[63:0] := tmp2[31:0] << 32
tmp5[63:0] := tmp3[63:0] XOR tmp4[63:0]
tmp6[31:0] := MOD2(tmp5[63:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
SSE4.2
Cryptography
Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 64-bit integer "v", and stores the result in "dst".
tmp1[63:0] := v[0:63] // bit reflection
tmp2[31:0] := crc[0:31] // bit reflection
tmp3[95:0] := tmp1[63:0] << 32
tmp4[95:0] := tmp2[31:0] << 64
tmp5[95:0] := tmp3[95:0] XOR tmp4[95:0]
tmp6[31:0] := MOD2(tmp5[95:0], 0x11EDC6F41) // remainder from polynomial division modulus 2
dst[31:0] := tmp6[0:31] // bit reflection
SSE4.2
Cryptography
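The four CRC32 entries share one pattern: bit-reflect both inputs, shift the data term left by 32 and the crc term left by the operand width, XOR them, reduce modulo the polynomial, and reflect the result. A minimal Python sketch of that pattern follows; the helper names are hypothetical, and the initial value and final XOR in `crc32c` are the conventional CRC-32C framing rather than anything stated in the entries.

```python
CRC32C_POLY = 0x11EDC6F41  # polynomial used in the MOD2 step

def reflect(value, width):
    """Reverse the bit order of a width-bit value."""
    return int(format(value, f"0{width}b")[::-1], 2)

def mod2(value, poly):
    """Remainder of polynomial division over GF(2)."""
    plen = poly.bit_length()
    while value.bit_length() >= plen:
        value ^= poly << (value.bit_length() - plen)
    return value

def crc32c_step(crc, v, width):
    """One accumulate step for an operand of `width` bits (8, 16, 32 or 64)."""
    tmp1 = reflect(v & ((1 << width) - 1), width)
    tmp2 = reflect(crc & 0xFFFFFFFF, 32)
    tmp5 = (tmp1 << 32) ^ (tmp2 << width)
    return reflect(mod2(tmp5, CRC32C_POLY), 32)

def crc32c(data):
    """CRC-32C of a byte string, assuming the usual all-ones init and final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32c_step(crc, byte, 8)
    return crc ^ 0xFFFFFFFF
```

A wider step gives the same result as feeding the operand's bytes one at a time in little-endian order, so the operand width affects throughput, not the checksum.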
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 7
i := j*8
dst[i+7:i] := ABS(Int(a[i+7:i]))
ENDFOR
SSSE3
Special Math Functions
Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 15
i := j*8
dst[i+7:i] := ABS(a[i+7:i])
ENDFOR
SSSE3
Special Math Functions
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := ABS(Int(a[i+15:i]))
ENDFOR
SSSE3
Special Math Functions
Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := ABS(a[i+15:i])
ENDFOR
SSSE3
Special Math Functions
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 1
i := j*32
dst[i+31:i] := ABS(a[i+31:i])
ENDFOR
SSSE3
Special Math Functions
Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
FOR j := 0 to 3
i := j*32
dst[i+31:i] := ABS(a[i+31:i])
ENDFOR
SSSE3
Special Math Functions
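The absolute-value entries store an unsigned result, which matters for the most negative input. The Python sketch below models the 8-bit case; `abs_epi8` is a hypothetical name, and the vector is assumed to be passed as a non-negative integer.

```python
def abs_epi8(a, lanes=16):
    """Per-lane absolute value of signed 8-bit elements (lanes=8 for 64-bit vectors)."""
    dst = 0
    for j in range(lanes):
        i = j * 8
        lane = (a >> i) & 0xFF
        signed = lane - 256 if lane & 0x80 else lane
        dst |= (abs(signed) & 0xFF) << i   # abs(-128) = 128 is stored as 0x80
    return dst
```

An input lane of 0x80 (-128) yields 0x80, which is only correct when read as unsigned 128.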
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
FOR j := 0 to 15
i := j*8
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[3:0] := b[i+3:i]
dst[i+7:i] := a[index*8+7:index*8]
FI
ENDFOR
SSSE3
Swizzle
Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
FOR j := 0 to 7
i := j*8
IF b[i+7] == 1
dst[i+7:i] := 0
ELSE
index[2:0] := b[i+2:i]
dst[i+7:i] := a[index*8+7:index*8]
FI
ENDFOR
SSSE3
Swizzle
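The byte shuffle above can be sketched in Python as follows. This is a model only, with vectors passed as non-negative integers; `shuffle_epi8` is a hypothetical name.

```python
def shuffle_epi8(a, b, lanes=16):
    """Byte shuffle: bit 7 of each control byte zeroes the lane, low bits pick the source."""
    index_mask = lanes - 1          # 4 index bits for 16 lanes, 3 for 8 lanes
    dst = 0
    for j in range(lanes):
        i = j * 8
        control = (b >> i) & 0xFF
        if control & 0x80:
            continue                # lane stays zero
        index = control & index_mask
        dst |= ((a >> (index * 8)) & 0xFF) << i
    return dst
```

Control bits between the index field and bit 7 are ignored, so a control byte of 0x1F selects byte 15 in the 16-lane form.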
Concatenate 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst".
tmp[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8)
dst[127:0] := tmp[127:0]
SSSE3
Miscellaneous
Concatenate 8-byte blocks in "a" and "b" into a 16-byte temporary result, shift the result right by "imm8" bytes, and store the low 8 bytes in "dst".
tmp[127:0] := ((a[63:0] << 64)[127:0] OR b[63:0]) >> (imm8*8)
dst[63:0] := tmp[63:0]
SSSE3
Miscellaneous
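The concatenate-and-shift operation above maps directly onto integer arithmetic. A minimal Python sketch, with the hypothetical name `alignr_epi8` and vectors passed as non-negative integers:

```python
def alignr_epi8(a, b, imm8, width_bits=128):
    """Concatenate a:b, shift right by imm8 bytes, keep the low width_bits bits."""
    mask = (1 << width_bits) - 1
    tmp = ((a & mask) << width_bits) | (b & mask)
    return (tmp >> (imm8 * 8)) & mask
```

A shift of 0 returns "b", a shift equal to the vector width in bytes returns "a", and a shift of twice the width or more returns zero.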
Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
dst[15:0] := a[31:16] + a[15:0]
dst[31:16] := a[63:48] + a[47:32]
dst[47:32] := a[95:80] + a[79:64]
dst[63:48] := a[127:112] + a[111:96]
dst[79:64] := b[31:16] + b[15:0]
dst[95:80] := b[63:48] + b[47:32]
dst[111:96] := b[95:80] + b[79:64]
dst[127:112] := b[127:112] + b[111:96]
SSSE3
Arithmetic
Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
dst[15:0] := Saturate16(a[31:16] + a[15:0])
dst[31:16] := Saturate16(a[63:48] + a[47:32])
dst[47:32] := Saturate16(a[95:80] + a[79:64])
dst[63:48] := Saturate16(a[127:112] + a[111:96])
dst[79:64] := Saturate16(b[31:16] + b[15:0])
dst[95:80] := Saturate16(b[63:48] + b[47:32])
dst[111:96] := Saturate16(b[95:80] + b[79:64])
dst[127:112] := Saturate16(b[127:112] + b[111:96])
SSSE3
Arithmetic
Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
dst[31:0] := a[63:32] + a[31:0]
dst[63:32] := a[127:96] + a[95:64]
dst[95:64] := b[63:32] + b[31:0]
dst[127:96] := b[127:96] + b[95:64]
SSSE3
Arithmetic
Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
dst[15:0] := a[31:16] + a[15:0]
dst[31:16] := a[63:48] + a[47:32]
dst[47:32] := b[31:16] + b[15:0]
dst[63:48] := b[63:48] + b[47:32]
SSSE3
Arithmetic
Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
dst[31:0] := a[63:32] + a[31:0]
dst[63:32] := b[63:32] + b[31:0]
SSSE3
Arithmetic
Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
dst[15:0] := Saturate16(a[31:16] + a[15:0])
dst[31:16] := Saturate16(a[63:48] + a[47:32])
dst[47:32] := Saturate16(b[31:16] + b[15:0])
dst[63:48] := Saturate16(b[63:48] + b[47:32])
SSSE3
Arithmetic
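The horizontal-add entries differ only in element width, vector width, and whether the sum saturates. The Python sketch below models the 16-bit forms; the names are hypothetical, and `lanes_per_source` is an assumed parameter (4 for 128-bit vectors, 2 for 64-bit vectors).

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def saturate16(x):
    """Clamp to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def hadd_epi16(a, b, saturate=False, lanes_per_source=4):
    """Sum adjacent 16-bit pairs of a into the low half of dst, pairs of b into the high half."""
    dst = 0
    out = 0
    for src in (a, b):
        for k in range(lanes_per_source):
            lo = to_signed16(src >> (32 * k))
            hi = to_signed16(src >> (32 * k + 16))
            total = saturate16(hi + lo) if saturate else hi + lo
            dst |= (total & 0xFFFF) << (16 * out)
            out += 1
    return dst
```

Without saturation, 32767 + 32767 wraps to 0xFFFE; with saturation it clamps to 0x7FFF.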
Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
dst[15:0] := a[15:0] - a[31:16]
dst[31:16] := a[47:32] - a[63:48]
dst[47:32] := a[79:64] - a[95:80]
dst[63:48] := a[111:96] - a[127:112]
dst[79:64] := b[15:0] - b[31:16]
dst[95:80] := b[47:32] - b[63:48]
dst[111:96] := b[79:64] - b[95:80]
dst[127:112] := b[111:96] - b[127:112]
SSSE3
Arithmetic
Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
dst[15:0] := Saturate16(a[15:0] - a[31:16])
dst[31:16] := Saturate16(a[47:32] - a[63:48])
dst[47:32] := Saturate16(a[79:64] - a[95:80])
dst[63:48] := Saturate16(a[111:96] - a[127:112])
dst[79:64] := Saturate16(b[15:0] - b[31:16])
dst[95:80] := Saturate16(b[47:32] - b[63:48])
dst[111:96] := Saturate16(b[79:64] - b[95:80])
dst[127:112] := Saturate16(b[111:96] - b[127:112])
SSSE3
Arithmetic
Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
dst[31:0] := a[31:0] - a[63:32]
dst[63:32] := a[95:64] - a[127:96]
dst[95:64] := b[31:0] - b[63:32]
dst[127:96] := b[95:64] - b[127:96]
SSSE3
Arithmetic
Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
dst[15:0] := a[15:0] - a[31:16]
dst[31:16] := a[47:32] - a[63:48]
dst[47:32] := b[15:0] - b[31:16]
dst[63:48] := b[47:32] - b[63:48]
SSSE3
Arithmetic
Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
dst[31:0] := a[31:0] - a[63:32]
dst[63:32] := b[31:0] - b[63:32]
SSSE3
Arithmetic
Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
dst[15:0] := Saturate16(a[15:0] - a[31:16])
dst[31:16] := Saturate16(a[47:32] - a[63:48])
dst[47:32] := Saturate16(b[15:0] - b[31:16])
dst[63:48] := Saturate16(b[47:32] - b[63:48])
SSSE3
Arithmetic
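In the horizontal-subtract entries the operand order is easy to get backwards: each result is the lower element minus the upper element of the pair. A Python sketch of the non-saturating 32-bit form, with the hypothetical name `hsub_epi32` and an assumed `pairs_per_source` parameter (2 for 128-bit vectors, 1 for 64-bit vectors):

```python
def hsub_epi32(a, b, pairs_per_source=2):
    """Lower element minus upper element for each adjacent 32-bit pair."""
    mask = 0xFFFFFFFF
    dst = 0
    out = 0
    for src in (a, b):
        for k in range(pairs_per_source):
            lo = (src >> (64 * k)) & mask
            hi = (src >> (64 * k + 32)) & mask
            dst |= ((lo - hi) & mask) << (32 * out)   # wraps modulo 2**32
            out += 1
    return dst
```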
Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
FOR j := 0 to 7
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ENDFOR
SSSE3
Arithmetic
Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
FOR j := 0 to 3
i := j*16
dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] )
ENDFOR
SSSE3
Arithmetic
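The mixed-sign multiply-add above treats bytes of "a" as unsigned and bytes of "b" as signed, which the pseudocode does not spell out. A Python sketch under that reading of the description; `maddubs_epi16` is a hypothetical name, and `lanes` is 8 for 128-bit vectors or 4 for 64-bit vectors.

```python
def maddubs_epi16(a, b, lanes=8):
    """Unsigned-by-signed byte products, summed in pairs with signed 16-bit saturation."""
    dst = 0
    for j in range(lanes):
        i = j * 16
        total = 0
        for offset in (0, 8):
            ua = (a >> (i + offset)) & 0xFF      # unsigned byte from a
            sb = (b >> (i + offset)) & 0xFF
            if sb & 0x80:
                sb -= 256                        # signed byte from b
            total += ua * sb
        total = max(-32768, min(32767, total))   # Saturate16
        dst |= (total & 0xFFFF) << i
    return dst
```

Saturation is reachable in both directions: 255*127*2 exceeds 32767 and 255*(-128)*2 is below -32768.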
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
FOR j := 0 to 7
i := j*16
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ENDFOR
SSSE3
Arithmetic
Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
FOR j := 0 to 3
i := j*16
tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ENDFOR
SSSE3
Arithmetic
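The rounding multiply above is a Q15 fixed-point product: shifting right by 14, adding 1, and taking bits [16:1] rounds the 32-bit product to the nearest 16-bit value. A Python sketch, with the hypothetical name `mulhrs_epi16`:

```python
def mulhrs_epi16(a, b, lanes=8):
    """Rounded high-half multiply of signed 16-bit lanes (lanes=4 for 64-bit vectors)."""
    def signed16(x):
        x &= 0xFFFF
        return x - 0x10000 if x & 0x8000 else x

    dst = 0
    for j in range(lanes):
        i = j * 16
        tmp = ((signed16(a >> i) * signed16(b >> i)) >> 14) + 1
        dst |= ((tmp >> 1) & 0xFFFF) << i        # bits [16:1] of tmp
    return dst
```

In Q15 terms 0x4000 is 0.5, so 0x4000 times 0x4000 gives 0x2000 (0.25). The one input pair that overflows is -1.0 times -1.0 (0x8000 times 0x8000), which returns 0x8000.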
Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 15
i := j*8
IF b[i+7:i] < 0
dst[i+7:i] := -(a[i+7:i])
ELSE IF b[i+7:i] == 0
dst[i+7:i] := 0
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
SSSE3
Arithmetic
Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 7
i := j*16
IF b[i+15:i] < 0
dst[i+15:i] := -(a[i+15:i])
ELSE IF b[i+15:i] == 0
dst[i+15:i] := 0
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
SSSE3
Arithmetic
Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 3
i := j*32
IF b[i+31:i] < 0
dst[i+31:i] := -(a[i+31:i])
ELSE IF b[i+31:i] == 0
dst[i+31:i] := 0
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
SSSE3
Arithmetic
Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 7
i := j*8
IF b[i+7:i] < 0
dst[i+7:i] := -(a[i+7:i])
ELSE IF b[i+7:i] == 0
dst[i+7:i] := 0
ELSE
dst[i+7:i] := a[i+7:i]
FI
ENDFOR
SSSE3
Arithmetic
Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 3
i := j*16
IF b[i+15:i] < 0
dst[i+15:i] := -(a[i+15:i])
ELSE IF b[i+15:i] == 0
dst[i+15:i] := 0
ELSE
dst[i+15:i] := a[i+15:i]
FI
ENDFOR
SSSE3
Arithmetic
Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
FOR j := 0 to 1
i := j*32
IF b[i+31:i] < 0
dst[i+31:i] := -(a[i+31:i])
ELSE IF b[i+31:i] == 0
dst[i+31:i] := 0
ELSE
dst[i+31:i] := a[i+31:i]
FI
ENDFOR
SSSE3
Arithmetic
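The sign-transfer entries above follow a three-way rule per element: negate, zero, or pass through. A Python sketch of the 8-bit form, with the hypothetical name `sign_epi8` and vectors passed as non-negative integers:

```python
def sign_epi8(a, b, lanes=16):
    """Apply the sign of each b element to the matching a element (lanes=8 for 64-bit vectors)."""
    dst = 0
    for j in range(lanes):
        i = j * 8
        va = (a >> i) & 0xFF
        vb = (b >> i) & 0xFF
        if vb & 0x80:               # b element negative
            result = (-va) & 0xFF   # two's-complement negate, wraps for -128
        elif vb == 0:
            result = 0
        else:
            result = va
        dst |= result << i
    return dst
```

Negating -128 wraps back to -128 (0x80), since the result stays in 8 bits.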
Copy the current 64-bit value of the processor's time-stamp counter into "dst".
dst[63:0] := TimeStampCounter
TSC
General Support
Mark the start of a TSX (HLE/RTM) suspend load address tracking region. If this is used inside a transactional region, subsequent loads are not added to the read set of the transaction. If this is used inside a suspend load address tracking region it will cause transaction abort. If this is used outside of a transactional region it behaves like a NOP.
TSXLDTRK
Miscellaneous
Mark the end of a TSX (HLE/RTM) suspend load address tracking region. If this is used inside a suspend load address tracking region it will end the suspend region and all following load addresses will be added to the transaction read set. If this is used inside an active transaction but not in a suspend region it will cause transaction abort. If this is used outside of a transactional region it behaves like a NOP.
TSXLDTRK
Miscellaneous
Clear the user interrupt flag (UIF).
UINTR
General Support
Send user interprocessor interrupts specified in unsigned 64-bit integer "__a".
UINTR
General Support
Set the user interrupt flag (UIF).
UINTR
General Support
Store the current user interrupt flag (UIF) in unsigned 8-bit integer "dst".
UINTR
General Support
Reads the contents of a 64-bit MSR specified in "__A" into "dst".
dst := MSR[__A]
USER_MSR
General Support
Writes the contents of "__B" into the 64-bit MSR specified in "__A".
MSR[__A] := __B
USER_MSR
General Support
Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 1
i := j*128
a[i+127:i] := ShiftRows(a[i+127:i])
a[i+127:i] := SubBytes(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:256] := 0
VAES
AVX512VL
Cryptography
Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 1
i := j*128
a[i+127:i] := ShiftRows(a[i+127:i])
a[i+127:i] := SubBytes(a[i+127:i])
a[i+127:i] := MixColumns(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:256] := 0
VAES
AVX512VL
Cryptography
Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 1
i := j*128
a[i+127:i] := InvShiftRows(a[i+127:i])
a[i+127:i] := InvSubBytes(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:256] := 0
VAES
AVX512VL
Cryptography
Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst".
FOR j := 0 to 1
i := j*128
a[i+127:i] := InvShiftRows(a[i+127:i])
a[i+127:i] := InvSubBytes(a[i+127:i])
a[i+127:i] := InvMixColumns(a[i+127:i])
dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i]
ENDFOR
dst[MAX:256] := 0
VAES
AVX512VL
Cryptography
Perform a carry-less multiplication of one quadword of "b" by one quadword of "c" in each 128-bit lane, and store the 128-bit results in "dst". The immediate "Imm8" selects which quadwords of "b" and "c" are used.
DEFINE PCLMUL128(X,Y) {
FOR i := 0 to 63
TMP[i] := X[ 0 ] and Y[ i ]
FOR j := 1 to i
TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
ENDFOR
DEST[ i ] := TMP[ i ]
ENDFOR
FOR i := 64 to 126
TMP[i] := 0
FOR j := i - 63 to 63
TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
ENDFOR
DEST[ i ] := TMP[ i ]
ENDFOR
DEST[127] := 0
RETURN DEST // 128b vector
}
FOR i := 0 to 1
IF Imm8[0] == 0
TEMP1 := b.m128[i].qword[0]
ELSE
TEMP1 := b.m128[i].qword[1]
FI
IF Imm8[4] == 0
TEMP2 := c.m128[i].qword[0]
ELSE
TEMP2 := c.m128[i].qword[1]
FI
dst.m128[i] := PCLMUL128(TEMP1, TEMP2)
ENDFOR
dst[MAX:256] := 0
VPCLMULQDQ
AVX512VL
Application-Targeted
Perform a carry-less multiplication of one quadword of "b" by one quadword of "c" in each 128-bit lane, and store the 128-bit results in "dst". The immediate "Imm8" selects which quadwords of "b" and "c" are used.
DEFINE PCLMUL128(X,Y) {
FOR i := 0 to 63
TMP[i] := X[ 0 ] and Y[ i ]
FOR j := 1 to i
TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
ENDFOR
DEST[ i ] := TMP[ i ]
ENDFOR
FOR i := 64 to 126
TMP[i] := 0
FOR j := i - 63 to 63
TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ])
ENDFOR
DEST[ i ] := TMP[ i ]
ENDFOR
DEST[127] := 0
RETURN DEST // 128b vector
}
FOR i := 0 to 3
IF Imm8[0] == 0
TEMP1 := b.m128[i].qword[0]
ELSE
TEMP1 := b.m128[i].qword[1]
FI
IF Imm8[4] == 0
TEMP2 := c.m128[i].qword[0]
ELSE
TEMP2 := c.m128[i].qword[1]
FI
dst.m128[i] := PCLMUL128(TEMP1, TEMP2)
ENDFOR
dst[MAX:512] := 0
VPCLMULQDQ
Application-Targeted
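The PCLMUL128 helper above is a polynomial multiplication over GF(2): partial products are combined with XOR instead of addition, so no carries propagate. A Python sketch of the helper and the per-lane quadword selection; the function names are hypothetical, and `lanes` is an assumed parameter (2 for 256-bit vectors, 4 for 512-bit vectors).

```python
MASK64 = (1 << 64) - 1

def pclmul128(x, y):
    """Carry-less product of two 64-bit values; the result fits in 127 bits."""
    x &= MASK64
    y &= MASK64
    dest = 0
    for j in range(64):
        if (x >> j) & 1:
            dest ^= y << j          # XOR in the shifted partial product
    return dest

def clmulepi64_epi128(b, c, imm8, lanes=2):
    """Per 128-bit lane: imm8[0] picks the quadword of b, imm8[4] the quadword of c."""
    dst = 0
    for i in range(lanes):
        temp1 = (b >> (128 * i + (64 if imm8 & 0x01 else 0))) & MASK64
        temp2 = (c >> (128 * i + (64 if imm8 & 0x10 else 0))) & MASK64
        dst |= pclmul128(temp1, temp2) << (128 * i)
    return dst
```

For example, 3 times 3 is 5 rather than 9, because (x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2.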
Directs the processor to enter an implementation-dependent optimized state until the TSC reaches or exceeds the value specified in "counter". Bit 0 of "ctrl" selects between a lower power (cleared) or faster wakeup (set) optimized state. Returns the carry flag (CF). If the processor wakes due to the expiration of the operating system time limit, the instruction sets RFLAGS.CF; otherwise, that flag is cleared.
WAITPKG
Miscellaneous
Directs the processor to enter an implementation-dependent optimized state while monitoring a range of addresses. The instruction wakes up when the TSC reaches or exceeds the value specified in "counter" (if the monitoring hardware did not trigger beforehand). Bit 0 of "ctrl" selects between a lower power (cleared) or faster wakeup (set) optimized state. Returns the carry flag (CF). If the processor that executed a UMWAIT instruction wakes due to the expiration of the operating system time limit, the instruction sets RFLAGS.CF; otherwise, that flag is cleared.
WAITPKG
Miscellaneous
Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type. The address is contained in "a".
WAITPKG
Miscellaneous
Write back and do not flush internal caches. Initiate writing-back without flushing of external caches.
WBNOINVD
Miscellaneous
Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsavec differs from xsave in that it uses compaction and that it may use init optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
XSAVEC
OS-Targeted
Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsavec differs from xsave in that it uses compaction and that it may use init optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
XSAVEC
OS-Targeted
Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. The hardware may optimize the manner in which data is saved. The performance of this instruction will be equal to or better than using the XSAVE instruction.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
2: mem_addr.EXT_SAVE_Area2[YMM] := ProcessorState[YMM]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
XSAVEOPT
OS-Targeted
Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. The hardware may optimize the manner in which data is saved. The performance of this instruction will be equal to or better than using the XSAVE64 instruction.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
2: mem_addr.EXT_SAVE_Area2[YMM] := ProcessorState[YMM]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
XSAVEOPT
OS-Targeted
Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsaves differs from xsave in that it can save state components corresponding to bits set in IA32_XSS MSR and that it may use the modified optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
XSS
OS-Targeted
Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsaves differs from xsave in that it can save state components corresponding to bits set in IA32_XSS MSR and that it may use the modified optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
XSS
OS-Targeted
Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". xrstors differs from xrstor in that it can restore state components corresponding to bits set in the IA32_XSS MSR; xrstors cannot restore from an xsave area in which the extended region is in the standard form. State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.
st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
FOR i := 0 to 62
IF (rs_mask[i] AND XCR0[i])
IF st_mask[i]
CASE (i) OF
0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE]
DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
ESAC
ELSE
// ProcessorExtendedState := Processor Supplied Values
CASE (i) OF
1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
ESAC
FI
FI
ENDFOR
XSAVE
XSS
OS-Targeted
Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". xrstors differs from xrstor in that it can restore state components corresponding to bits set in the IA32_XSS MSR; xrstors cannot restore from an xsave area in which the extended region is in the standard form. State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.
st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
FOR i := 0 to 62
IF (rs_mask[i] AND XCR0[i])
IF st_mask[i]
CASE (i) OF
0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE]
DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
ESAC
ELSE
// ProcessorExtendedState := Processor Supplied Values
CASE (i) OF
1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
ESAC
FI
FI
ENDFOR
XSAVE
XSS
OS-Targeted
Copy up to 64 bits from the value of the extended control register (XCR) specified by "a" into "dst". Currently only XFEATURE_ENABLED_MASK XCR is supported.
dst[63:0] := XCR[a]
XSAVE
OS-Targeted
Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.
st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
FOR i := 0 to 62
IF (rs_mask[i] AND XCR0[i])
IF st_mask[i]
CASE (i) OF
0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE]
DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
ESAC
ELSE
// ProcessorExtendedState := Processor Supplied Values
CASE (i) OF
1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
ESAC
FI
FI
ENDFOR
XSAVE
OS-Targeted
Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary.
st_mask := mem_addr.HEADER.XSTATE_BV[62:0]
FOR i := 0 to 62
IF (rs_mask[i] AND XCR0[i])
IF st_mask[i]
CASE (i) OF
0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU]
1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE]
DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i]
ESAC
ELSE
// ProcessorExtendedState := Processor Supplied Values
CASE (i) OF
1: MXCSR := mem_addr.FPUSSESave_Area[SSE]
ESAC
FI
FI
ENDFOR
XSAVE
OS-Targeted
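The restore decision above depends on three masks: a component is touched only if it is requested in "rs_mask" and enabled in "XCR0", and the saved XSTATE_BV bit then decides whether it is loaded from memory or set to processor-supplied values. A Python sketch of that decision; `restore_plan` and the labels "load" and "init" are hypothetical, and the MXCSR special case in the pseudocode is not modelled.

```python
def restore_plan(rs_mask, xcr0, xstate_bv):
    """Map each affected state component index to 'load' or 'init'."""
    plan = {}
    for i in range(63):                         # bits [62:0]
        if (rs_mask >> i) & (xcr0 >> i) & 1:
            plan[i] = "load" if (xstate_bv >> i) & 1 else "init"
    return plan
```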
Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
OS-Targeted
Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary.
mask[62:0] := save_mask[62:0] AND XCR0[62:0]
FOR i := 0 to 62
IF mask[i]
CASE (i) OF
0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU]
1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE]
DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i]
ESAC
mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i]
FI
ENDFOR
XSAVE
OS-Targeted
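In the save entries above, the set of components written is the intersection of the requested mask and the features enabled in "XCR0", limited to bits [62:0]. A Python sketch of that selection, with the hypothetical name `components_to_save`:

```python
def components_to_save(save_mask, xcr0):
    """Return the state component indices selected for saving."""
    mask = save_mask & xcr0 & ((1 << 63) - 1)   # bits [62:0] only
    return [i for i in range(63) if (mask >> i) & 1]
```

Requesting a component that is not enabled in "XCR0" is not an error in this model; the component is skipped.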
Copy 64 bits from "val" to the extended control register (XCR) specified by "a". Currently only XFEATURE_ENABLED_MASK XCR is supported.
XCR[a] := val[63:0]
XSAVE
OS-Targeted