Support multiplication of a vector against one lane (broadcasted) of another vector #227
Suppose you have two vectors u and v, and you want to multiply all elements of the vector u by a single lane of the vector v, e.g. v[0]. This is a very common thing to do, particularly in float matrix multiplication kernels.
Example.
This should be available for all multiplication instructions, both float and integer, including any multiply-add instructions if they are added to the spec. It maps directly to the corresponding instructions on ARM, and can be implemented on x86 by broadcasting the selected lane into a temporary vector and then doing an ordinary multiply.
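To make the requested semantics concrete, here is a minimal scalar model in C. It is only a sketch: the `f32x4` type and the names `f32x4_mul_lane`, `f32x4_splat_lane`, `f32x4_mul`, and `f32x4_mul_lane_via_splat` are hypothetical and are not proposed opcode names or existing intrinsics. It shows the direct multiply-by-lane form next to the broadcast-then-multiply form that an x86 lowering would presumably use, and the two give the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-lane float vector; stands in for a 128-bit SIMD value. */
typedef struct { float lane[4]; } f32x4;

/* Ordinary lane-wise multiply: r[i] = a[i] * b[i]. */
static f32x4 f32x4_mul(f32x4 a, f32x4 b) {
    f32x4 r;
    for (size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Broadcast lane k of v into every lane of a temporary vector. */
static f32x4 f32x4_splat_lane(f32x4 v, size_t k) {
    assert(k < 4);
    f32x4 r;
    for (size_t i = 0; i < 4; ++i) r.lane[i] = v.lane[k];
    return r;
}

/* Requested operation (hypothetical name): r[i] = u[i] * v[k]. */
static f32x4 f32x4_mul_lane(f32x4 u, f32x4 v, size_t k) {
    assert(k < 4);
    f32x4 r;
    for (size_t i = 0; i < 4; ++i) r.lane[i] = u.lane[i] * v.lane[k];
    return r;
}

/* Assumed fallback lowering: broadcast first, then a plain multiply. */
static f32x4 f32x4_mul_lane_via_splat(f32x4 u, f32x4 v, size_t k) {
    return f32x4_mul(u, f32x4_splat_lane(v, k));
}
```

With `u = {1, 2, 3, 4}` and `v = {10, 20, 30, 40}`, both `f32x4_mul_lane(u, v, 0)` and `f32x4_mul_lane_via_splat(u, v, 0)` yield `{10, 20, 30, 40}`. The difference is only in how many instructions the target needs: one where a by-lane multiply exists, two where the broadcast has to be materialized.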
Rationale for this programming model in WebAssembly SIMD:
See ARM benchmarks in this spreadsheet.
Row 30, NEON_64bit_GEMM_Float32_WithVectorDuplicatingScalar, is the float kernel that one can write without such instructions.
Row 31, NEON_64bit_GEMM_Float32_WithScalar, is the faster float kernel that one can write with such instructions.
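For readers who have not seen why matrix multiplication kernels want this, the following is a rough scalar sketch in C of a 4x4 float kernel built around a multiply-accumulate-by-lane step. It is not the benchmarked NEON kernel from the spreadsheet; the `f32x4` type and the names `f32x4_mla_lane` and `gemm4x4` are hypothetical, and the column-major layout is an assumption made for illustration.

```c
/* Hypothetical 4-lane float vector; stands in for a 128-bit SIMD value. */
typedef struct { float lane[4]; } f32x4;

/* Hypothetical multiply-accumulate by lane: acc[i] += u[i] * v[k]. */
static f32x4 f32x4_mla_lane(f32x4 acc, f32x4 u, f32x4 v, int k) {
    for (int i = 0; i < 4; ++i) acc.lane[i] += u.lane[i] * v.lane[k];
    return acc;
}

/*
 * C = A * B for 4x4 matrices, each stored as four column vectors
 * (column-major layout is an assumption of this sketch).
 * Column j of C is the sum over k of (column k of A) * B[k][j],
 * and B[k][j] is exactly lane k of column j of B.
 */
static void gemm4x4(const f32x4 a_cols[4], const f32x4 b_cols[4],
                    f32x4 c_cols[4]) {
    for (int j = 0; j < 4; ++j) {
        f32x4 acc = {{0.0f, 0.0f, 0.0f, 0.0f}};
        for (int k = 0; k < 4; ++k) {
            /* One by-lane multiply-accumulate per (j, k) pair. */
            acc = f32x4_mla_lane(acc, a_cols[k], b_cols[j], k);
        }
        c_cols[j] = acc;
    }
}
```

Every inner-loop step multiplies a whole column of A by a single lane of a column of B. Without a by-lane instruction, each of those steps needs an extra broadcast into a temporary vector first, which is the distinction the two benchmark rows above are drawing.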