internal/simdgen: add dot products
diff --git a/internal/simdgen/categories.yaml b/internal/simdgen/categories.yaml
index ad5325f..6afbd9d 100644
--- a/internal/simdgen/categories.yaml
+++ b/internal/simdgen/categories.yaml
@@ -261,6 +261,20 @@
commutative: "false"
masked: "true"
extension: "AVX512.*" # VPOPCNT instructions are AVX512 (BITALG or VPOPCNTDQ)
+- go: PairDotProd
+ commutative: "true"
+ extension: "AVX.*"
+ documentation: "Multiply the elements and add the pairs together"
+- go: MaskedPairDotProd
+ masked: "true"
+ commutative: "true"
+ extension: "AVX.*"
+ documentation: "Multiply the elements and add the pairs together"
+# QuadDotProd, i.e. VPDPBUSD(S) are operations with src/dst on the same register, we are not supporting this as of now.
+- go: DotProdBroadcast
+ commutative: "true"
+ extension: "AVX.*"
+ documentation: "Multiply the elements and add the pairs together; the result is a broadcast of the dot product; imm8 = 127;"
- go: Max
commutative: "true"
extension: "AVX.*"
diff --git a/internal/simdgen/go.yaml b/internal/simdgen/go.yaml
index a6e70e2..4e88cc6 100644
--- a/internal/simdgen/go.yaml
+++ b/internal/simdgen/go.yaml
@@ -472,6 +472,36 @@
go: $t
out:
- *any
+- go: PairDotProd
+ asm: VPMADDWD
+ in:
+ - &int
+ go: $t
+ base: int
+ - *int
+ out:
+ - &int2 # The elemBits are different
+ go: $t2
+ base: int
+- go: MaskedPairDotProd
+ asm: VPMADDWD
+ in:
+ - class: mask
+ - *int
+ - *int
+ out:
+ - *int2
+- go: DotProdBroadcast
+ asm: VDPPD
+ in:
+ - &float
+ go: $t
+ base: float
+ - *float
+ - class: immediate
+ const: 127 # make sure the control bits [4:5] are all 1
+ out:
+ - *float
- go: Max
asm: "V?PMAXS[BWDQ]"
in: &2int
diff --git a/internal/simdgen/ops/MLOps/categories.yaml b/internal/simdgen/ops/MLOps/categories.yaml
new file mode 100644
index 0000000..30376cb
--- /dev/null
+++ b/internal/simdgen/ops/MLOps/categories.yaml
@@ -0,0 +1,15 @@
+!sum
+- go: PairDotProd
+ commutative: "true"
+ extension: "AVX.*"
+ documentation: "Multiply the elements and add the pairs together"
+- go: MaskedPairDotProd
+ masked: "true"
+ commutative: "true"
+ extension: "AVX.*"
+ documentation: "Multiply the elements and add the pairs together"
+# QuadDotProd, i.e. VPDPBUSD(S) are operations with src/dst on the same register, we are not supporting this as of now.
+- go: DotProdBroadcast
+ commutative: "true"
+ extension: "AVX.*"
+ documentation: "Multiply the elements and add the pairs together; the result is a broadcast of the dot product; imm8 = 127;"
diff --git a/internal/simdgen/ops/MLOps/go.yaml b/internal/simdgen/ops/MLOps/go.yaml
new file mode 100644
index 0000000..a126bba
--- /dev/null
+++ b/internal/simdgen/ops/MLOps/go.yaml
@@ -0,0 +1,31 @@
+!sum
+- go: PairDotProd
+ asm: VPMADDWD
+ in:
+ - &int
+ go: $t
+ base: int
+ - *int
+ out:
+ - &int2 # The elemBits are different
+ go: $t2
+ base: int
+- go: MaskedPairDotProd
+ asm: VPMADDWD
+ in:
+ - class: mask
+ - *int
+ - *int
+ out:
+ - *int2
+- go: DotProdBroadcast
+ asm: VDPPD
+ in:
+ - &float
+ go: $t
+ base: float
+ - *float
+ - class: immediate
+ const: 127 # make sure the control bits [4:5] are all 1
+ out:
+ - *float
\ No newline at end of file
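For readers checking the `const: 127` immediate on the DotProdBroadcast entries, here is a minimal scalar sketch in Go of how the 128-bit VDPPD form uses imm8, as I understand the instruction reference: bits 4 and 5 select which lane products enter the sum, and bits 0 and 1 select which destination lanes receive it. The function name `dppd` is hypothetical and is not part of simdgen or of this CL. With 127 (0b0111_1111) all four of those bits are set, so both products are summed and the result lands in both lanes; this sketch assumes the remaining bits are not consulted for the double-precision form.

```go
package main

import "fmt"

// dppd is a hypothetical scalar model of the 128-bit VDPPD instruction.
// imm8 bits 4 and 5 gate the per-lane products; bits 0 and 1 gate which
// destination lanes receive the sum (unselected lanes are zeroed).
func dppd(a, b [2]float64, imm8 uint8) [2]float64 {
	var sum float64
	for i := 0; i < 2; i++ {
		if (imm8>>uint(4+i))&1 == 1 {
			sum += a[i] * b[i]
		}
	}
	var dst [2]float64
	for i := 0; i < 2; i++ {
		if (imm8>>uint(i))&1 == 1 {
			dst[i] = sum
		}
	}
	return dst
}

func main() {
	a := [2]float64{1, 2}
	b := [2]float64{3, 4}
	fmt.Println(dppd(a, b, 127))  // both lanes hold 1*3 + 2*4 = 11
	fmt.Println(dppd(a, b, 0x31)) // only lane 0 receives the sum
}
```

Under this reading, the comment on the constant could mention bits 0 and 1 as well, since those are what make the result a broadcast.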
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
documentation: "Multiply the elements and add the pairs together"
", yielding a vector of half as many elements with twice the input element size"
documentation: "Multiply the elements and add the pairs together with saturation"
"Multiply the elements and add the signed-unsigned pairs together with saturation, yielding a vector of half as many elements with twice the input element size."
I went looking at the documentation for this instruction, and it is detailed and weird, so we probably need an uglier name: specifically, one that reflects the unsigned-signed operands and the fact that it operates 2-by-2 instead of 4-by-4 (SVE is different, because of course it is). And because it is signed-unsigned, it is also NOT commutative.
documentation: "Multiply the elements and add the pairs together"
", yielding a vector of half as many elements with twice the input element size"
Done
documentation: "Multiply the elements and add the pairs together with saturation"
"Multiply the elements and add the signed-unsigned pairs together with saturation, yielding a vector of half as many elements with twice the input element size."
I went looking at the documentation for this instruction, and it is detailed and weird, so we probably need an uglier name: specifically, one that reflects the unsigned-signed operands and the fact that it operates 2-by-2 instead of 4-by-4 (SVE is different, because of course it is). And because it is signed-unsigned, it is also NOT commutative.
Thanks for noticing the commutativity issue. I have also updated the name; the 2-by-2 grouping is covered by the "Pair" part of the name, which should distinguish it from SVE.
Also updated the documentation.
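To make the semantics discussed in this thread concrete, here is a minimal scalar sketch in Go of the two pairwise operations: the plain signed 16-bit form (VPMADDWD) and the saturating unsigned-by-signed 8-bit form that the review comment describes (VPMADDUBSW, as I read the instruction reference). The function names are hypothetical and are not the names used in this CL. Both produce half as many elements at twice the input width, and the second one shows why swapping the operands changes the result.

```go
package main

import (
	"fmt"
	"math"
)

// pairDotProd is a hypothetical model of VPMADDWD: signed 16-bit lanes are
// multiplied and each adjacent pair of products is summed into a 32-bit lane.
func pairDotProd(a, b []int16) []int32 {
	out := make([]int32, len(a)/2)
	for i := range out {
		out[i] = int32(a[2*i])*int32(b[2*i]) + int32(a[2*i+1])*int32(b[2*i+1])
	}
	return out
}

// saturatedPairDotProd is a hypothetical model of VPMADDUBSW: the first
// operand is read as unsigned bytes, the second as signed bytes, and each
// adjacent pair of products is summed with signed saturation to 16 bits.
func saturatedPairDotProd(u []uint8, s []int8) []int16 {
	out := make([]int16, len(u)/2)
	for i := range out {
		sum := int32(u[2*i])*int32(s[2*i]) + int32(u[2*i+1])*int32(s[2*i+1])
		if sum > math.MaxInt16 {
			sum = math.MaxInt16
		}
		if sum < math.MinInt16 {
			sum = math.MinInt16
		}
		out[i] = int16(sum)
	}
	return out
}

func main() {
	fmt.Println(pairDotProd([]int16{1, 2, 3, 4}, []int16{5, 6, 7, 8})) // [17 53]

	// Saturation: 255*127 + 255*127 exceeds the int16 range.
	fmt.Println(saturatedPairDotProd([]uint8{255, 255}, []int8{127, 127})) // [32767]

	// Non-commutativity: the byte 0xC8 is 200 when unsigned but -56 when signed.
	fmt.Println(saturatedPairDotProd([]uint8{200, 0}, []int8{1, 0}))   // [200]
	fmt.Println(saturatedPairDotProd([]uint8{1, 0}, []int8{-56, 0}))   // [-56]
}
```

The last two calls pass the same bit patterns in swapped operand positions and get different answers, which is the commutativity issue raised above.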
Code-Review | +2 |
Commit-Queue | +1 |
testdata/
.gemini*
Ideally, this small unrelated change would be in its own small CL. Once we get to a more stable world, if there are bugs that creep in, it's easier to debug and rollback CLs if they are small (if possible) and single-purpose.