1,401 contributions in the last year
Contribution activity
March 2023
Created 71 commits in 4 repositories
Created a pull request in coreylowman/cudarc that received 1 comment
Opened 44 other pull requests in 2 repositories
coreylowman/dfdx
33 merged, 1 closed
- matrixmultiply optional. Adds cpu-seq-matmul, cpu-par-matmul, cpu-mkl-matmul features
- Adds trait Trace and generic training example
- Fixing no-std support
- Updating 01-tensor
- Adding features to cargo doc on ci
- Removes .trace_into(), .trace() now requires Gradients object
- Querying nvidia-smi for compute capability instead of native
- Removing double computation of mean in normalize
- Reshape skip kernels with a contiguous tensor
- Moving transformers to stable, and accepting dyn dimensions for transformer input
- Letting batch & seq dimensions of matmul be dyn
- Adds ReduceShape<Self::LastAxis> to Shape
- Bump to cudarc 0.9.0
- Docs update
- Moving src/unique_id & src/gradients into src/tensor
- Finalizing nn exports
- Moving Reshape to use stable compile time asserts
- Adding nice error message when MHA num heads doesn't divide K/H
- WIP Improving generic training loops
- Easier preprocessing
- Changing stack to be method of array/vec instead of device
- Adds Tensor::concat
- Fixing bool tests with safetensors (serde compatibility)
- Adding no-std feature flag, matrixmultiply/threading behind feature flag. numpy no longer default
- Adding model.alloc_grads(), removing Default for Gradients
Some pull requests not shown.
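Several of these PRs converge on one API shift: rather than .trace() implicitly allocating gradients (Gradients previously implemented Default), the caller now allocates them up front with model.alloc_grads() and passes them into .trace(). A minimal sketch of that calling convention, using toy stand-in types rather than dfdx's real signatures:

```rust
// Toy stand-ins that model only the calling convention,
// not dfdx's actual types or trait bounds.
#[derive(Debug)]
pub struct Gradients {
    pub entries: usize,
}

pub struct Model;

impl Model {
    /// The caller explicitly allocates gradients for the model's
    /// parameters, instead of relying on a Default impl for Gradients.
    pub fn alloc_grads(&self) -> Gradients {
        Gradients { entries: 0 }
    }
}

pub struct Tensor;

pub struct Traced {
    pub grads: Gradients,
}

impl Tensor {
    /// trace() now takes ownership of a caller-provided Gradients
    /// object rather than allocating one internally.
    pub fn trace(self, grads: Gradients) -> Traced {
        Traced { grads }
    }
}
```

The design choice being illustrated: making the Gradients allocation explicit lets a training loop reuse one allocation across steps instead of paying for a fresh one inside every forward pass.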
coreylowman/cudarc
10 merged
- More sound CudaStream
- Reverting free stream, putting free_async calls on default stream
- Adds feature flags for each of the parts of cudarc
- Removing dep on find-cuda-helper. Moving no-std behind feature flag
- Simplifying Ptx functions
- Removing CudaDeviceBuilder
- Using DevicePtrMut and DevicePtr for copies
- Consistent & clear naming convention
- Adding DeviceRepr, removing AsKernelParam
- Reorganizing cudarc::driver::safe
Reviewed 13 pull requests in 1 repository
coreylowman/dfdx
13 pull requests
- Allow Modules to be constructed with the TensorCollection trait
- Removes .trace_into(), .trace() now requires Gradients object
- feat: add realize shape
- Handle path for TensorVisitors using a TensorViewer
- WIP Improving generic training loops
- Hotfixing the safetensors impl.
- Safetensors support.
- feat: adds BatchNorm1D
- Adding model.alloc_grads(), removing Default for Gradients
- Adding axpy tensor op & ModelEMA module walker
- Adding UnbiasedLinear (linear without bias).
- Making K dimension of matmul dynamic.
- Allowing nn::Embedding to be dynamic in shape.
Created an issue in coreylowman/dfdx that received 16 comments
Out of memory issues with gradient accumulation
This is because the temporary gradients are kept around in the gradients object. Is it possible to have gradients object clear temporary gradients? W…
16 comments
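The issue describes intermediate ("temporary") gradients piling up in the Gradients object across accumulation steps, while only the leaf (parameter) gradients need to persist. A self-contained toy sketch of the proposed fix, clearing non-leaf gradients between steps (the names Gradients, leaf ids, and drop_non_leafs here are illustrative, not dfdx's actual API):

```rust
use std::collections::{HashMap, HashSet};

/// Toy gradient store: one buffer per tensor id. Leaf ids are model
/// parameters; every other id is a temporary gradient belonging to an
/// intermediate tensor of the backward pass.
pub struct Gradients {
    grads: HashMap<usize, Vec<f32>>,
    leafs: HashSet<usize>,
}

impl Gradients {
    pub fn new(leaf_ids: &[usize]) -> Self {
        Self {
            grads: HashMap::new(),
            leafs: leaf_ids.iter().copied().collect(),
        }
    }

    /// Accumulate a gradient for a tensor (leaf or temporary),
    /// allocating a zeroed buffer on first use.
    pub fn accumulate(&mut self, id: usize, grad: &[f32]) {
        let buf = self
            .grads
            .entry(id)
            .or_insert_with(|| vec![0.0; grad.len()]);
        for (b, g) in buf.iter_mut().zip(grad) {
            *b += *g;
        }
    }

    /// Drop temporary gradients, retaining only leaf (parameter)
    /// gradients -- this is what keeps memory flat when accumulating
    /// gradients over several micro-batches.
    pub fn drop_non_leafs(&mut self) {
        let leafs = &self.leafs;
        self.grads.retain(|id, _| leafs.contains(id));
    }

    /// Number of gradient buffers currently held.
    pub fn len(&self) -> usize {
        self.grads.len()
    }
}
```

Calling drop_non_leafs() after each micro-batch's backward pass would bound the store to one buffer per parameter, instead of growing with every intermediate tensor touched during accumulation.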
Opened 18 other issues in 3 repositories
coreylowman/dfdx
8 open, 7 closed
- Optimize conv2d folding kernels
- Next release tracking issue
- Users should allocate gradients instead of allocating them with trace()
- Add convenience AutoDevice type alias based on features
- Make matrixmultiply dependency optional
- Add workspace for Cuda & Cpu devices
- Combine "intel-mkl" and "cblas" feature flag & remove "cblas"
- Use cuda memory pools to reduce allocs/frees
- Weird errors when heads is wrong with transformers
- Weird errors when forgetting to specify #![feature(generic_const_exprs)] with transformers
- Ability to one_hot_encode 2d arrays/vecs of usizes
- Add a way to convert between const/dyn shapes
- Use the winograd convolution algorithm when possible
- Figure out a better number of threads to launch kernels with
- Move matrixmultiply threaded feature behind a new "threaded-cpu" feature




