gpu
Here are 2,117 public repositories matching this topic...
At the moment the relu_layer op doesn't allow threshold configuration, while the legacy RELU op does. We should add a threshold configuration option to relu_layer.
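For context, here is a minimal NumPy sketch of what a configurable threshold would mean for ReLU; the `threshold` parameter name is illustrative, not the op's actual attribute:

```python
import numpy as np

def relu(x, threshold=0.0):
    # Standard ReLU is max(x, 0); with a configurable threshold, everything
    # at or below `threshold` is zeroed out instead of everything below 0.
    return np.where(x > threshold, x, 0.0)

x = np.array([-1.0, 0.2, 0.7, 1.5])
print(relu(x))                 # [0.  0.2 0.7 1.5]
print(relu(x, threshold=0.5))  # [0.  0.  0.7 1.5]
```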
Problem: the approximate method can still be slow for many trees
catboost version: master
Operating System: ubuntu 18.04
CPU: i9
GPU: RTX2080
It would be good to be able to specify how many trees to use for the Shapley value computation. The model.predict and prediction_type versions allow this, and LightGBM/XGBoost allow it as well.
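For reference, a sketch of the asymmetry being described; the model and data are placeholders, and the ntree_end argument on get_feature_importance is the requested, hypothetical addition, not part of the current API:

```python
import numpy as np
from catboost import CatBoostRegressor, Pool

X, y = np.random.rand(100, 5), np.random.rand(100)
model = CatBoostRegressor(iterations=200, verbose=False).fit(X, y)

# Existing: predictions can be limited to the first N trees.
preds_50 = model.predict(X, ntree_end=50)

# Existing: SHAP values always use the full ensemble.
shap_full = model.get_feature_importance(Pool(X, y), type='ShapValues')

# Requested (hypothetical): limit the SHAP computation to N trees as well.
# shap_50 = model.get_feature_importance(Pool(X, y), type='ShapValues', ntree_end=50)
```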
Our users are often confused by the output from programs such as zip2john sometimes being very large (multi-gigabyte). Maybe we should identify and enhance these programs to print a message to stderr explaining that very large output is normal - either always, or only when the output size is above a threshold (e.g., 1 million bytes).
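As a schematic of the proposed behaviour (zip2john itself is written in C; this Python sketch only illustrates the warning logic, and the 1 MB threshold is the example value mentioned above):

```python
import sys

SIZE_WARN_THRESHOLD = 1_000_000  # bytes; example threshold from the discussion

def emit_hash(hash_line):
    if len(hash_line) > SIZE_WARN_THRESHOLD:
        # Warn on stderr so the note does not pollute the hash output on stdout.
        sys.stderr.write(
            "note: the generated hash is %d bytes; very large output is "
            "normal for this format\n" % len(hash_line))
    sys.stdout.write(hash_line + "\n")
```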
Hi,
I have tried out both loss.backward() and model_engine.backward(loss) for my code. There are several subtle differences that I have observed; for one, retain_graph=True does not work with model_engine.backward(loss). This is creating a problem, since for some reason buffers are not being retained each time I run the code.
Please look into this if you could.
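For illustration, a minimal sketch of the two code paths being compared; the model, data, and DeepSpeed config are placeholders, and a real run would normally go through the deepspeed launcher:

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(4, 10), torch.randn(4, 1)

# Plain PyTorch: the graph can be kept alive for a second backward pass.
loss = loss_fn(model(x), y)
loss.backward(retain_graph=True)

# DeepSpeed: backward and step go through the engine instead.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={"train_batch_size": 4,  # placeholder config
            "optimizer": {"type": "Adam", "params": {"lr": 1e-3}}},
)
loss = loss_fn(model_engine(x.to(model_engine.device)),
               y.to(model_engine.device))
model_engine.backward(loss)  # passing retain_graph=True here is what reportedly does not work
model_engine.step()
```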
Describe the bug
After applying the unstack function, the variable names change to numeric format.
Steps/Code to reproduce bug
import cudf
import cupy

def get_df(length, num_cols, num_months, acc_offset):
    cols = ['var_{}'.format(i) for i in range(num_cols)]
    df = cudf.DataFrame({col: cupy.random.rand(length * num_months) for col in cols})
    df['acc_id'] = cupy.repeat(cupy.arange(length), num_months)  # truncated in the original snippet; num_months and the return below are reconstructed
    return df
The current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.
This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
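As a rough analogy (NumPy on the host, not ArrayFire code), the difference is between joining pairwise, which re-copies earlier data on every step, and a single multi-way join, which corresponds to one backend kernel call:

```python
import numpy as np

arrays = [np.random.rand(1000, 4) for _ in range(16)]

# Pairwise joins: one copy per step (analogous to one kernel call per join),
# and data joined early gets copied again and again.
out = arrays[0]
for a in arrays[1:]:
    out = np.concatenate([out, a])

# Multi-way join: one allocation, each input copied exactly once
# (analogous to a single backend kernel call).
out_single = np.concatenate(arrays)

assert np.array_equal(out, out_single)
```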
We would like to forward a particular 'key' column, which is part of the features, so that it appears alongside the predictions - this is to be able to identify which set of features a particular prediction belongs to. Here is an example of the predictions output using tensorflow.contrib.estimator.multi_class_head:
{"classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
"scores": [0.068196
Our Doxygen comments have a lot of references to the old SGI STL docs, which are outdated and no longer available. For example, in for_each:
* \see http://www.sgi.com/tech/stl/for_each.html
All of these links should be updated to the corresponding cppreference.com links (e.g., https://en.cppreference.com/w/cpp/algorithm/for_each for the example above).
It's probably
I stumbled upon excessive CPU usage for my training code running on the GPU. After some investigation I found the culprit.
It basically was
To Reproduce
This is quick and loads a single CPU core.
This is 3 times slower