#

cuda

Here are 2,519 public repositories matching this topic...

numba
gmarkall commented Nov 3, 2020

PR #6447 adds a public API to get the maximum number of registers per thread (numba.cuda.Dispatcher.get_regs_per_thread()). Other attributes would also be useful to expose: shared memory per block, local memory per thread, constant memory usage, and maximum block size.

These are all available in the FuncAttr named tuple: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/drive

vuule commented Nov 4, 2020

The current default value for the rows_per_chunk parameter of the CSV writer is 8, which means that by default the input table is broken into many small slices that are written out sequentially. In some cases this reduces performance by an order of magnitude.

In the Python layer, the default is the number of rows (i.e. the table is written out in a single pass). We can follow this by setting rows_per_chunk
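
To make the effect of the chunk size concrete, here is a minimal sketch in plain Python using only the standard library. It is not the cuDF CSV writer: the helper `write_csv_chunked` and its return value are hypothetical, introduced only to count how many slices are written for a given `rows_per_chunk`.

```python
import csv
import io


def write_csv_chunked(rows, rows_per_chunk, out):
    """Write rows to out in slices of rows_per_chunk rows; return the slice count.

    Hypothetical helper for illustration only, not part of any library.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    writer = csv.writer(out)
    chunks = 0
    for start in range(0, len(rows), rows_per_chunk):
        # Each slice is written sequentially, one write call per slice.
        writer.writerows(rows[start:start + rows_per_chunk])
        chunks += 1
    return chunks


rows = [[i, i * i] for i in range(1000)]

small = io.StringIO()
single = io.StringIO()

# A chunk size of 8 splits 1000 rows into many small slices.
small_chunks = write_csv_chunked(rows, 8, small)
# A chunk size equal to the row count writes everything in one pass.
single_chunks = write_csv_chunked(rows, len(rows), single)

print(small_chunks, single_chunks)
```

Both calls produce the same CSV text; only the number of sequential writes differs. On a GPU writer each slice would presumably carry its own fixed overhead, which is the cost the issue describes, but the sketch does not measure that.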

futhark
