gpu

🚀 Feature

Add support for torch.max with:

CUDA bfloat16
CPU float16 and bfloat16

Motivation

Currently, torch.max has support for CUDA float16:

>>> torch.rand(10, dtype=torch.float16, device='cuda').max()
tensor(0.8530, device='cuda:0', dtype=torch.float16)

But the other three combinations of CPU/CUDA and float16/bfloat16 are not supported:

>>> torch.ra

At the moment, the relu_layer op does not allow threshold configuration, whereas the legacy RELU op does.
We should add a threshold configuration option to relu_layer.

Problem: the approximate method can still be slow for many trees
catboost version: master
Operating System: ubuntu 18.04
CPU: i9
GPU: RTX2080

It would be good to be able to specify how many trees to use for Shapley values. The model.predict and prediction_type versions allow this. LightGBM and XGBoost also allow this.

As seen in openwall/john#4530 (comment):

Benchmarking: sspr-opencl, NetIQ SSPR / Adobe AEM [MD5/SHA1/SHA2 OpenCL]... Warning: binary() returned misaligned pointer
DONE

This is because opencl_sspr_fmt_plug.c wrongly has:

#define BINARY_ALIGN            MEM_ALIGN_WORD

whereas the code only guarantees alignment appropriate for

Hi,

I have tried both loss.backward() and model_engine.backward(loss) in my code. I have observed several subtle differences between them; for one, retain_graph=True does not work with model_engine.backward(loss). This is a problem because, for some reason, buffers are not retained each time I run the code.

Please look into this if you could.

Is your feature request related to a problem? Please describe.
It might be useful to have a single clean, performant way to check whether all the columns of a dataframe have the same dtype, such as a DataFrame property _is_homogeneous. This comes up in a lot of places, such as where we might want to dispatch to a cupy matrix implementation (Transpose, some row-wise reductions I believe
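A minimal sketch of the homogeneity check described above, written against a plain iterable of dtypes so it does not depend on any particular dataframe library. The function name `is_homogeneous` and the choice to treat an empty frame as homogeneous are assumptions made for illustration, not the project's actual implementation.

```python
from typing import Hashable, Iterable


def is_homogeneous(dtypes: Iterable[Hashable]) -> bool:
    """Return True when every column dtype is identical.

    Assumption: a frame with no columns counts as homogeneous.
    """
    # Collapsing to a set leaves one element per distinct dtype.
    return len(set(dtypes)) <= 1


# Strings stand in for real dtype objects here.
print(is_homogeneous(["float32", "float32", "float32"]))
print(is_homogeneous(["float32", "int64"]))
```

With a dataframe object that exposes a `dtypes` attribute, the same helper could presumably be called as `is_homogeneous(df.dtypes)`, provided the dtype objects are hashable.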

The current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.

This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
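The difference between the two strategies can be sketched in plain Python. This is only an illustration of the idea, not ArrayFire code: the function names are hypothetical, and lists stand in for device arrays joined along one dimension.

```python
from typing import List, Sequence


def join_pairwise(arrays: Sequence[List[float]]) -> List[float]:
    """Join by repeated two-input concatenation (one call per extra input)."""
    result = list(arrays[0]) if arrays else []
    for arr in arrays[1:]:
        # Each step allocates a new buffer and copies the accumulated result again.
        result = result + list(arr)
    return result


def join_single_pass(arrays: Sequence[List[float]]) -> List[float]:
    """Join in one pass: compute offsets, allocate once, copy each input once."""
    offsets = []
    total = 0
    for arr in arrays:
        offsets.append(total)
        total += len(arr)
    out = [0.0] * total
    for offset, arr in zip(offsets, arrays):
        out[offset:offset + len(arr)] = arr
    return out


inputs = [[1.0, 2.0], [3.0], [4.0, 5.0]]
print(join_single_pass(inputs) == join_pairwise(inputs))
```

In the pairwise version, earlier inputs are copied once per subsequent join, whereas the single-pass version writes each element exactly once; a single backend kernel would follow the second pattern.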

We would like to forward a particular 'key' column, which is part of the features, so that it appears alongside the predictions. This would let us identify which set of features a particular prediction belongs to. Here is an example of predictions output using the tensorflow.contrib.estimator.multi_class_head:

{"classes": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
 "scores": [0.068196

The names map and input are mistakenly swapped. Judging by the Preconditions paragraph, I suppose they should be exchanged, because there is no problem when map and result coincide (in the current context).

Here are 2,025 public repositories matching this topic...

pytorch / pytorch

alacritty / alacritty

fastai / fastai

NVIDIA / nvidia-docker

gpujs / gpu.js

eclipse / deeplearning4j

PavelDoGreat / WebGL-Fluid-Simulation

apache / tvm

OlafenwaMoses / ImageAI

catboost / catboost

chainer / chainer

h2oai / h2o-3

cupy / cupy

MVIG-SJTU / AlphaPose

gfx-rs / gfx

openwall / john

microsoft / DeepSpeed

halide / Halide

PipelineAI / pipeline

NVIDIA / DIGITS

intel-isl / Open3D

rapidsai / cudf

arrayfire / arrayfire

exelban / stats

tensorflow / adanet

NVIDIA / thrust

ultralight-ux / Ultralight

NVIDIA / DALI

Syllo / nvtop

pycaret / pycaret
