
Support multiplication of a vector against one lane (broadcasted) of another vector #227

Open
bjacob opened this issue May 11, 2020 · 0 comments


@bjacob bjacob commented May 11, 2020

Suppose you have two vectors u and v, and you want to multiply all elements of the vector u by a single lane of the vector v, e.g. v[0]. This is a very common thing to do, particularly in float matrix multiplication kernels.

Example.

This should be available for all multiplication instructions, float and integer, including any multiply-add instructions if they are added to the spec. It will map directly to the corresponding instructions on ARM, and will be implemented on x86 by broadcasting the selected lane into a temporary vector before the multiply.
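To make the requested semantics concrete, here is a minimal scalar sketch in C. The `f32x4` type and all function names are hypothetical stand-ins chosen for illustration; they are not part of the WebAssembly SIMD proposal or of any existing intrinsics header. The first function models the proposed operation, and the last one models the broadcast-then-multiply fallback described above.

```c
#include <stddef.h>

/* Hypothetical 4-lane float vector standing in for a 128-bit SIMD value. */
typedef struct { float lane[4]; } f32x4;

/* Proposed semantics: multiply every lane of u by one selected lane of v. */
static f32x4 f32x4_mul_lane(f32x4 u, f32x4 v, size_t k) {
    f32x4 r;
    for (size_t i = 0; i < 4; ++i) {
        r.lane[i] = u.lane[i] * v.lane[k];
    }
    return r;
}

/* Copy lane k of v into all four lanes of a temporary vector. */
static f32x4 f32x4_broadcast_lane(f32x4 v, size_t k) {
    f32x4 r;
    for (size_t i = 0; i < 4; ++i) {
        r.lane[i] = v.lane[k];
    }
    return r;
}

/* Ordinary lane-wise multiply. */
static f32x4 f32x4_mul(f32x4 a, f32x4 b) {
    f32x4 r;
    for (size_t i = 0; i < 4; ++i) {
        r.lane[i] = a.lane[i] * b.lane[i];
    }
    return r;
}

/* Fallback lowering: broadcast into a temporary, then multiply lane-wise. */
static f32x4 f32x4_mul_lane_fallback(f32x4 u, f32x4 v, size_t k) {
    return f32x4_mul(u, f32x4_broadcast_lane(v, k));
}
```

Both paths compute the same result; the difference is only whether the broadcast is a separate operation in the source or something the engine is free to fuse or schedule itself.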

Rationale for this programming model in WebAsm SIMD:

  • It's more expressive w.r.t. what many applications need to do.
  • The fallback is efficient as long as the generated code orders its instructions well. By contrast, the current lack of this instruction forces the WebAsm source to use separate broadcast instructions, which makes it essentially impossible for the generated code to be efficient.

See ARM benchmarks in this spreadsheet.
Row 30, NEON_64bit_GEMM_Float32_WithVectorDuplicatingScalar, is the float kernel that one can write without such instructions.
Row 31, NEON_64bit_GEMM_Float32_WithScalar, is the faster float kernel that one can write with such instructions.
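For readers unfamiliar with why this pattern shows up in matrix multiplication, the following is a rough scalar sketch of a 4x4 float kernel. It is not the benchmarked NEON code; the function names, the packed layouts of `a` and `b`, and the column-major layout of `c` are all assumptions made for illustration. At each depth step, one vector of the left-hand side is multiplied by each of the four lanes of one vector of the right-hand side, which is exactly the multiply-by-lane operation requested here.

```c
#include <stddef.h>

/* Hypothetical helper: acc += u * v[k], i.e. multiply-add by one lane of v. */
static void madd_lane(float acc[4], const float u[4], const float v[4], size_t k) {
    for (size_t i = 0; i < 4; ++i) {
        acc[i] += u[i] * v[k];
    }
}

/*
 * Accumulate a 4x4 tile: c += A * B.
 * Assumed layouts (illustrative only):
 *   a: 4 floats per depth step, holding one column of A
 *   b: 4 floats per depth step, holding one row of B
 *   c: column-major 4x4 tile, so c + 4*j is column j
 */
void gemm_4x4_sketch(const float *a, const float *b, float *c, size_t depth) {
    for (size_t d = 0; d < depth; ++d) {
        const float *a_col = a + 4 * d;
        const float *b_row = b + 4 * d;
        /* One loaded vector of B feeds four multiply-adds, one per lane. */
        for (size_t j = 0; j < 4; ++j) {
            madd_lane(c + 4 * j, a_col, b_row, j);
        }
    }
}
```

Without a by-lane multiply, each of the four inner steps would first need its own broadcast of `b_row[j]` into a temporary vector, which is the variant corresponding to Row 30.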
