Add min, max and clamp arithmetic ops #2298
Conversation
def min(left, right):
    return _arithm_op("min", left, right)


def max(left, right):
    return _arithm_op("max", left, right)


def clamp(value, lo, hi):
    return _arithm_op("clamp", value, lo, hi)
mzient
Sep 24, 2020
Contributor
Maybe they should land in dali.fn? Both the usage mode and lowercase naming would play better with dali.fn.
Besides the module:
Suggested change:
- def min(left, right):
-     return _arithm_op("min", left, right)
- def max(left, right):
-     return _arithm_op("max", left, right)
- def clamp(value, lo, hi):
-     return _arithm_op("clamp", value, lo, hi)
+ def min(left, right):
+     """Fills the output with minima of corresponding values in ``left`` and ``right``."""
+     return _arithm_op("min", left, right)
+ def max(left, right):
+     """Fills the output with maxima of corresponding values in ``left`` and ``right``."""
+     return _arithm_op("max", left, right)
+ def clamp(value, lo, hi):
+     """Produces a tensor of values from ``value`` clamped to the range ``lo``..``hi``."""
+     return _arithm_op("clamp", value, lo, hi)
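For illustration, the scalar behaviour those docstrings describe can be sketched in plain Python. This is not the DALI implementation (the real functions build expression nodes via `_arithm_op` and operate element-wise on tensors); the `scalar_*` names are made up for this sketch:

```python
# Plain-Python sketch of the scalar semantics behind the proposed ops.

def scalar_min(left, right):
    # Elementwise minimum for a single pair of values.
    return left if left < right else right

def scalar_max(left, right):
    # Elementwise maximum for a single pair of values.
    return left if left > right else right

def scalar_clamp(value, lo, hi):
    # Clamp value into the closed range [lo, hi].
    return scalar_max(lo, scalar_min(value, hi))

print(scalar_clamp(5, 0, 3))   # value above the range -> hi
print(scalar_clamp(-2, 0, 3))  # value below the range -> lo
print(scalar_clamp(1, 0, 3))   # value inside the range -> unchanged
```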
@@ -115,8 +115,8 @@ DLL_PUBLIC DALIDataType PropagateTypes(ExprNode &expr, const workspace_t<Backend
   }
   auto &func = dynamic_cast<ExprFunc &>(expr);
   int subexpression_count = func.GetSubexpressionCount();
- DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
-              "Only unary and binary expressions are supported");
+ DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
JanuszL
Sep 24, 2020
Contributor
Suggested change:
- DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
+ DALI_ENFORCE(0 < subexpression_count && subexpression_count <= kMaxArity,
To be consistent with L340.
klecki
Oct 5, 2020
Author
Contributor
Done
@@ -193,8 +193,8 @@ DLL_PUBLIC inline const TensorListShape<> &PropagateShapes(ExprNode &expr,
   }
   auto &func = dynamic_cast<ExprFunc &>(expr);
   int subexpression_count = expr.GetSubexpressionCount();
- DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
-              "Only unary and binary expressions are supported");
+ DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
JanuszL
Sep 24, 2020
Contributor
As above
klecki
Oct 5, 2020
Author
Contributor
Done
  } else {
    DALI_FAIL("Expression cannot have three scalar operands");
  }
), DALI_FAIL("No suitable type found");); // NOLINT(whitespace/parens)
JanuszL
Sep 24, 2020
Contributor
Could you print type as well and tell which argument is the faulty one?
.../math/expressions/expression_factory_instances/expression_impl_factory.h (outdated; resolved)
auto v_ = static_cast<result_t<T, Min, Max>>(v);
auto lo_ = static_cast<result_t<T, Min, Max>>(lo);
auto hi_ = static_cast<result_t<T, Min, Max>>(hi);
auto lo_clamp_ = v_ <= lo_ ? lo_ : v_;
return lo_clamp_ >= hi_ ? hi_ : lo_clamp_;
mzient
Sep 25, 2020
Contributor
Suggested change:
- auto v_ = static_cast<result_t<T, Min, Max>>(v);
- auto lo_ = static_cast<result_t<T, Min, Max>>(lo);
- auto hi_ = static_cast<result_t<T, Min, Max>>(hi);
- auto lo_clamp_ = v_ <= lo_ ? lo_ : v_;
- return lo_clamp_ >= hi_ ? hi_ : lo_clamp_;
+ return clamp<result_t<T, Min, Max>>(v, lo, hi);
dali/core/math_util.h
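A quick plain-Python sanity check (with Python `float` standing in for `result_t<T, Min, Max>`, and `clamp` a stand-in for the helper in dali/core/math_util.h) suggests the two forms agree for valid ranges:

```python
def clamp(v, lo, hi):
    # Single-call form; a stand-in for the clamp helper from math_util.h.
    return hi if v >= hi else (lo if v <= lo else v)

def clamp_manual(v, lo, hi):
    # The original step-by-step form from the reviewed code.
    v_, lo_, hi_ = float(v), float(lo), float(hi)
    lo_clamp = lo_ if v_ <= lo_ else v_   # max(v, lo)
    return hi_ if lo_clamp >= hi_ else lo_clamp  # min(..., hi)

for v in (-5, 0, 2, 7):
    assert clamp_manual(v, 0, 4) == clamp(float(v), 0.0, 4.0)
print("forms agree")
```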
klecki
Oct 2, 2020
Author
Contributor
done
CUDA_CALL(cudaEventElapsedTime(&time, start, end));
std::cerr << "Elapsed Time: " << time << " s\n";

// time *= (1e+6f / kIters); // convert to nanoseconds / 100 samples
JanuszL
Sep 30, 2020
Contributor
?
klecki
Oct 2, 2020
Author
Contributor
Will remove all of the profiling before posting final version.
@@ -195,6 +258,18 @@ using ExprImplGpuCT = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, fa
template <ArithmeticOp op, typename Result, typename Left, typename Right>
using ExprImplGpuTC = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, true, false>>;

// template <ArithmeticOp op, typename Result, typename First, typename Second, typename Third,
JanuszL
Sep 30, 2020
Contributor
?
Force-pushed from 2a3a457 to 2d2ecac
!build
CI MESSAGE: [1680060]: BUILD STARTED
!build
CI MESSAGE: [1680100]: BUILD STARTED
import nvidia.dali.ops
# Fully circular imports don't work. We need to import _arithm_op late and
# replace this trampoline function.
setattr(sys.modules[__name__], "_arithm_op", nvidia.dali.ops._arithm_op)
mzient
Oct 6, 2020
Contributor
I find the following simpler - but it's a matter of taste, I guess.
Suggested change:
- setattr(sys.modules[__name__], "_arithm_op", nvidia.dali.ops._arithm_op)
+ global _arithm_op
+ _arithm_op = nvidia.dali.ops._arithm_op
klecki
Oct 6, 2020
Author
Contributor
I just copied your code from data_node.py :)
numpy_in = get_numpy_input(dali_in, kinds[i], dali_in.dtype.type, target_type if target_type is not None else dali_in.dtype.type)
inputs.append(numpy_in)
out = as_cpu(pipe_out[arity]).at(sample_id)
return tuple(inputs) + (out,)
mzient
Oct 6, 2020
Contributor
More Pythonic? ;)
Suggested change:
- return tuple(inputs) + (out,)
+ return (*inputs, out)
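Both spellings build the same tuple; a minimal check:

```python
inputs = [1, 2, 3]
out = 4

legacy = tuple(inputs) + (out,)  # original form: concatenate two tuples
pythonic = (*inputs, out)        # suggested form: unpack into one literal

assert legacy == pythonic == (1, 2, 3, 4)
print(pythonic)
```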
auto output = static_cast<Result *>(tile.output);
const void *first = tile.args[0];
const void *second = tile.args[1];
const void *third = tile.args[2];
mzient
Oct 6, 2020
Contributor
Suggested change:
- auto output = static_cast<Result *>(tile.output);
- const void *first = tile.args[0];
- const void *second = tile.args[1];
- const void *third = tile.args[2];
+ auto *__restrict__ output = static_cast<Result *>(tile.output);
+ const void *__restrict__ first = tile.args[0];
+ const void *__restrict__ second = tile.args[1];
+ const void *__restrict__ third = tile.args[2];
Just today I saw adding __restrict__ outperform caching in shared memory (in another kernel). The speedup was 1.7x
CI MESSAGE: [1680100]: BUILD FAILED
 * based on `as_ptr`
 */
template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
mzient
Oct 6, 2020
Contributor
Suggested change:
- DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
+ DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void *__restrict__ ptr, DALIDataType type_id) {
template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
  TYPE_SWITCH(type_id, type2id, AccessType, ARITHMETIC_ALLOWED_TYPES, (
    const auto *access = reinterpret_cast<const AccessType*>(ptr);
mzient
Oct 6, 2020
Contributor
Suggested change:
- const auto *access = reinterpret_cast<const AccessType*>(ptr);
+ const auto *__restrict__ access = reinterpret_cast<const AccessType*>(ptr);
}

template <typename T>
DALI_HOST_DEV T Access(const T* ptr, int64_t idx) {
mzient
Oct 6, 2020
Contributor
Suggested change:
- DALI_HOST_DEV T Access(const T* ptr, int64_t idx) {
+ DALI_HOST_DEV T Access(const T* __restrict__ ptr, int64_t idx) {
}

template <typename T>
DALI_HOST_DEV T Access(const void* ptr, int64_t idx, DALIDataType type_id) {
mzient
Oct 6, 2020
Contributor
Suggested change:
- DALI_HOST_DEV T Access(const void* ptr, int64_t idx, DALIDataType type_id) {
+ DALI_HOST_DEV T Access(const void* __restrict__ ptr, int64_t idx, DALIDataType type_id) {
I think there could be some gain from doing this:
You can try restrict, but it's optional.
@@ -0,0 +1,57 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- // Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+ // Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
klecki
Oct 7, 2020
Author
Contributor
done
  }
}
DALIDataType result = BinaryTypePromotion(types[0], types[1]);
if (types.size() > 2) {
jantonguirao
Oct 7, 2020
Contributor
this if is redundant
klecki
Oct 7, 2020
Author
Contributor
Yes, removed 👍
DALI_HOST_DEV static constexpr result_t<L, R> impl(L l, R r) {
  auto l_ = static_cast<result_t<L, R>>(l);
  auto r_ = static_cast<result_t<L, R>>(r);
  return l_ <= r_ ? l_ : r_;
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- return l_ <= r_ ? l_ : r_;
+ return l_ < r_ ? l_ : r_;
This would work as well, right?
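For a quick intuition: on ties the two predicates return different operands, but since both operands were already cast to the common result type, the returned values compare equal either way. A Python sketch (names invented for this sketch; note that for floating point the picked operand can still differ bitwise, e.g. `-0.0` vs `0.0`, and NaN handling differs from `std::fmin`):

```python
def min_le(l, r):
    # Original form: keeps the left operand on ties.
    return l if l <= r else r

def min_lt(l, r):
    # Suggested form: keeps the right operand on ties.
    return l if l < r else r

for l, r in [(1, 2), (2, 1), (3, 3), (-1.5, 0.0), (0.0, 0.0)]:
    assert min_le(l, r) == min_lt(l, r)
print("equal results on all sampled pairs")
```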
klecki
Oct 7, 2020
Author
Contributor
Yes, will change
@@ -0,0 +1,24 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- // Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+ // Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
klecki
Oct 7, 2020
Author
Contributor
done
@@ -0,0 +1,25 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- // Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+ // Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
klecki
Oct 7, 2020
Author
Contributor
done
.. note::
  Type promotion is commutative.

For more than two arguments, the resulting type is calculated as reduction from left to right
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- For more than two arguments, the resulting type is calculated as reduction from left to right
+ For more than two arguments, the resulting type is calculated as a reduction from left to right
klecki
Oct 7, 2020
Author
Contributor
done
For more than two arguments, the resulting type is calculated as reduction from left to right
- first calculating the result of operating on first two arguments, next between that intermediate
result and thirs argument and so on, untill we have only the result type left.
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- result and thirs argument and so on, untill we have only the result type left.
+ result and the third argument and so on, until we have only the result type left.
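The left-to-right reduction the documentation describes can be sketched with a toy promotion table (the table and function names here are illustrative; DALI's actual `BinaryTypePromotion` rules are richer, covering signed/unsigned and float widths):

```python
from functools import reduce

# Toy promotion lattice; the higher rank wins. Stand-in for BinaryTypePromotion.
RANK = {"bool": 0, "int32": 1, "float32": 2}

def binary_promote(a, b):
    return a if RANK[a] >= RANK[b] else b

def promote_types(types):
    # Left-to-right reduction: promote(promote(types[0], types[1]), types[2]), ...
    return reduce(binary_promote, types)

print(promote_types(["bool", "int32", "float32"]))  # -> float32
print(promote_types(["int32", "bool"]))             # -> int32
```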
klecki
Oct 7, 2020
Author
Contributor
done
Similarly to arithmetic expressions, one can use selected mathematical functions in the Pipeline
graph definition. They also accept :class:`nvidia.dali.pipeline.DataNode`,
:meth:`nvidia.dali.types.Constant` or regular Python value of type ``bool``, ``int``, or ``float``
as arguments. At least one of the inputs must be output of other DALI Operator.
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- as arguments. At least one of the inputs must be output of other DALI Operator.
+ as arguments. At least one of the inputs must be the output of other DALI Operator.
klecki
Oct 7, 2020
Author
Contributor
done
from invoking other operators. Full documentation can be found in the dedicated documentation
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- from invoking other operators. Full documentation can be found in the dedicated documentation
+ from invoking other operators. Full documentation can be found in the dedicated section of the documentation
klecki
Oct 7, 2020
Author
Contributor
Done
from invoking other operators. Full documentation can be found in the dedicated documentation
for :ref:`mathematical expressions`.
jantonguirao
Oct 7, 2020
Contributor
Suggested change:
- for :ref:`mathematical expressions`.
+ :ref:`mathematical expressions`.
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
.../math/expressions/expression_factory_instances/expression_impl_factory.h (resolved)
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build
CI MESSAGE: [1682623]: BUILD STARTED
The same behaviour for invalid ranges
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
!build
CI MESSAGE: [1682793]: BUILD STARTED
CI MESSAGE: [1682623]: BUILD FAILED
CI MESSAGE: [1682793]: BUILD PASSED
Why do we need this PR?
Adds min(a, b), max(a, b) and clamp(v, lo, hi) as arithmetic ops.

What happened in this PR?
Added support for ternary operators and their type/value switching.

TODO (Arithm Ops):
- Decide where to put the named arithm op functions in the DALI Python package.
- Nosetest that doesn't end in L1; probably should limit that.

JIRA TASK: [DALI-1628]