Pinned
- NervanaSystems/neon (Public)
  Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
- flame/fmm-gen (Public)
  Generating Families of Practical Fast Matrix Multiplication Algorithms
- pytorch/pytorch (Public)
  Tensors and Dynamic neural networks in Python with strong GPU acceleration
- pytorch/FBGEMM (Public)
  FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
246 contributions in the last year
Contribution activity
April 2022
Created 4 commits in 3 repositories
Created 1 repository
- jianyuh/cutlass C++
Created a pull request in pytorch/FBGEMM that received 2 comments
Add launch_bounds constraint for CUDA kernel to avoid register overuse
Summary: Address the "CUDA error: too many resources requested for launch" error and avoid register overuse.
Differential Revision: D35344926
Opened 3 other pull requests in 3 repositories
- pytorch/pytorch: 1 open
- pytorch/FBGEMM: 1 open
- NVIDIA/cutlass: 1 merged
Created an issue in NVIDIA/cutlass that received 1 comment
[FEA] FP8 GEMM implementation
Is your feature request related to a problem? Please describe. Recently, more details about Nvidia's latest H100 GPU were released in https://develop…