Highlights
- Arctic Code Vault Contributor
- Pro
2,550 contributions in the last year
Contribution activity
September 2020
Created a pull request in pytorch/pytorch that received 17 comments
Add Profile template argument to dispatcher.
Stack from ghstack: #44033 DeviceGuard dispatch key #44053 Add Profile template argument to dispatcher. The profile template argument can be used…
+62 −48 · 17 comments
- Make cudaHostRegister actually useful on cudart.
- Add NativeFunction.signature and kind.
- Switch all Sequences in tools.codegen.model to Tuple
- Gitignore cachegrind and callgrind files
- Workaround nvcc miscompilation of two 16-bit fields together.
- [DO NOT MERGE] Nuclear option for nvcc Device miscompilation
- Conjugate view tensor
- Vectorize complex copy.
- Add TORCH_SELECTIVE_NAME to AMP definitions
- Revert "Allow Tensor-likes in torch.autograd.gradcheck (#43877)"
- [TESTING] [skip ci] ghexport smoketest 2
- [TESTING] [skip ci] ghexport smoketest 2
- [TESTING] [skip ci] ghexport smoketest
- [TESTING] [skip ci] ghexport smoketest
- [TESTING] [skip ci] ghexport bidirectional smoketest
- Make c10::Device::validate() debug-only assert.
- Don't register a fallback for private use to let extensions do it themselves
- DeviceGuard dispatch key
- Move xla codegen to aten.
- [RFC] Switch over gen.py to using the selective build abstraction instead of directly querying op_registration_whitelist
- [RFC] Add method SelectiveBuildOperator.combine()
- [RFC] Helpers for various selective build codegen code-paths
- [RFC] Remove per-op-registration related code in caffe2/tools/codegen/gen.py
- Grammatically updated the tech docs
- Fix mistake in norm documentation
- [ONNX] Correct a minor typo in warning
- Enable torch.tensor typechecks
- Add torch.Assert, which is symbolically traceable
- Filter `strtod_l` is undeclared errors from sccache log
- Add callgrind collection to Timer
- fast TypeMeta/ScalarType conversion
- Vectorize bitwise_not
- Relax CUDA architecture check
- Profiling allocator for mobile.
- Simple caching allocator for CPU.
- faster TensorOptions merging
- track Half/ComplexHalf default dtype
- pass TypeMeta by value
- [pytorch] refine dispatch keys in native_functions.yaml (1/N)
- Align casing in test_dispatch with dispatch keys.
- Byte-for-byte compatibility fixes in codegen
- Remove hacky_wrapper from BackendSelect kernels
- Add foreach APIs for binary ops with ScalarList
Created an issue in pytorch/pytorch that received 4 comments
Immutable (read-only) tensors
Previously: #30458
An immutable tensor is a tensor which cannot be mutated, e.g., via inplace operations or out= operations. Due to the reduced API…
4 comments
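The issue summary above names the two mutation paths an immutable tensor would have to reject: in-place operations (trailing-underscore methods such as `add_`) and `out=` operations, which write into a caller-supplied tensor. A minimal pure-Python sketch of that distinction, using a toy `Tensor` class and a hypothetical `immutable` flag rather than PyTorch's actual internals:

```python
class Tensor:
    """Toy stand-in for a tensor; not PyTorch's implementation."""

    def __init__(self, data, immutable=False):
        self.data = list(data)
        self.immutable = immutable  # hypothetical read-only flag

    def _check_writable(self):
        if self.immutable:
            raise RuntimeError("cannot mutate an immutable tensor")

    def add_(self, x):
        # In-place op: mutates self, so it is rejected when self is immutable.
        self._check_writable()
        self.data = [v + x for v in self.data]
        return self

    def add(self, x, out=None):
        # Out-of-place op: allocates a fresh result, so an immutable self is fine.
        # With out=, the write lands in `out`, so the check applies to `out`.
        result = [v + x for v in self.data]
        if out is not None:
            out._check_writable()
            out.data = result
            return out
        return Tensor(result)


t = Tensor([1, 2, 3], immutable=True)
fresh = t.add(1)  # allowed: does not touch t's storage
try:
    t.add_(1)     # rejected: in-place mutation of immutable storage
except RuntimeError as e:
    print(e)
```

Note that the check is attached to whichever tensor's storage is written, not to the receiver of the call: `t.add(1, out=o)` must check `o`, not `t`.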
- Staged backend fallback (per-operator precomputation)
- test_cudart_register doesn't work on ROCm
- Add a test to detect dangling impl registrations with no corresponding defs
- Add cudaHostUnregister to torch.cuda.cudart()
- Don't query current device on stream construction
- ComplexHelper.h contains non-inline functions
- Simple functions shouldn't go through dispatcher
- cblas_gemv is not being used for gemv on complex on CPU
- [tools.codegen] Remove byte-for-byte compatibility code
- Get rid of copy_from
- [tools.codegen] Rename api.legacy_dispatcher to api.native
- Make it so that leading underscore operators are truly private and can be changed without worry for BC