Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add min, max and clamp arithmetic ops #2298

Merged
merged 21 commits into from Oct 7, 2020
Merged

Add min, max and clamp arithmetic ops #2298

merged 21 commits into from Oct 7, 2020

Conversation

@klecki
Copy link
Contributor

@klecki klecki commented Sep 24, 2020

Why we need this PR?

Add min(a, b), max(a, b), clamp(v, lo, hi) as arithmetic ops.

Todo:

  • add tests with 0-dim inputs
  • docs

What happened in this PR?

  • What solution was applied:
    Added support for ternary operators + their type/value switching.
  • Affected modules and functionalities:
    Arithm Ops
  • Key points relevant for the review:
    Where to put named arithm op functions in DALI Python package?
  • Validation and testing:
    Nosetest that doesn't end in L1, probably should limit that
  • Documentation (including examples):
    TODO

JIRA TASK: [DALI-1628]

def min(left, right):
return _arithm_op("min", left, right)

def max(left, right):
return _arithm_op("max", left, right)

def clamp(value, lo, hi):
return _arithm_op("clamp", value, lo, hi)
Comment on lines 955 to 962

This comment has been minimized.

@mzient

mzient Sep 24, 2020
Contributor

Maybe they should land in dali.fn? Both the usage mode and lowercase naming would play better with dali.fn.

Besides the module:

Suggested change
def min(left, right):
return _arithm_op("min", left, right)
def max(left, right):
return _arithm_op("max", left, right)
def clamp(value, lo, hi):
return _arithm_op("clamp", value, lo, hi)
def min(left, right):
"""Fills the output with minima of corresponding values in ``left`` and ``rigt``"""
return _arithm_op("min", left, right)
def max(left, right):
"""Fills the output with maxima of corresponding values in ``left`` and ``rigt``"""
return _arithm_op("max", left, right)
def clamp(value, lo, hi):
"""Produces a tensor of values from ``value`` clamped to the range ``lo``..``hi``."""
return _arithm_op("clamp", value, lo, hi)
@@ -115,8 +115,8 @@ DLL_PUBLIC DALIDataType PropagateTypes(ExprNode &expr, const workspace_t<Backend
}
auto &func = dynamic_cast<ExprFunc &>(expr);
int subexpression_count = func.GetSubexpressionCount();
DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
"Only unary and binary expressions are supported");
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,

This comment has been minimized.

@JanuszL

JanuszL Sep 24, 2020
Contributor

Suggested change
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,
DALI_ENFORCE(0 < subexpression_count && subexpression_count <= kMaxArity,

To be consistent with L340.

This comment has been minimized.

@klecki

klecki Oct 5, 2020
Author Contributor

Done

@@ -193,8 +193,8 @@ DLL_PUBLIC inline const TensorListShape<> &PropagateShapes(ExprNode &expr,
}
auto &func = dynamic_cast<ExprFunc &>(expr);
int subexpression_count = expr.GetSubexpressionCount();
DALI_ENFORCE(subexpression_count == 1 || subexpression_count == 2,
"Only unary and binary expressions are supported");
DALI_ENFORCE(1 <= subexpression_count && subexpression_count <= kMaxArity,

This comment has been minimized.

@JanuszL

JanuszL Sep 24, 2020
Contributor

As above

This comment has been minimized.

@klecki

klecki Oct 5, 2020
Author Contributor

Done

} else {
DALI_FAIL("Expression cannot have three scalar operands");
}
), DALI_FAIL("No suitable type found");); // NOLINT(whitespace/parens)

This comment has been minimized.

@JanuszL

JanuszL Sep 24, 2020
Contributor

Could you print type as well and tell which argument is the faulty one?

auto v_ = static_cast<result_t<T, Min, Max>>(v);
auto lo_ = static_cast<result_t<T, Min, Max>>(lo);
auto hi_ = static_cast<result_t<T, Min, Max>>(hi);
auto lo_clamp_ = v_ <= lo_ ? lo_ : v_;
return lo_clamp_ >= hi_ ? hi_ : lo_clamp_;
Comment on lines 429 to 433

This comment has been minimized.

@mzient

mzient Sep 25, 2020
Contributor

Suggested change
auto v_ = static_cast<result_t<T, Min, Max>>(v);
auto lo_ = static_cast<result_t<T, Min, Max>>(lo);
auto hi_ = static_cast<result_t<T, Min, Max>>(hi);
auto lo_clamp_ = v_ <= lo_ ? lo_ : v_;
return lo_clamp_ >= hi_ ? hi_ : lo_clamp_;
return clamp<result_t<T, Min, Max>>(v, lo, hi);

dali/core/math_util.h

This comment has been minimized.

@klecki

klecki Oct 2, 2020
Author Contributor

done

CUDA_CALL(cudaEventElapsedTime(&time, start, end));
std::cerr << "Elapsed Time: " << time << " s\n";

// time *= (1e+6f / kIters); // convert to nanoseconds / 100 samples

This comment has been minimized.

@JanuszL

JanuszL Sep 30, 2020
Contributor

?

This comment has been minimized.

@klecki

klecki Oct 2, 2020
Author Contributor

Will remove all of the profiling before posting final version.

@@ -195,6 +258,18 @@ using ExprImplGpuCT = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, fa
template <ArithmeticOp op, typename Result, typename Left, typename Right>
using ExprImplGpuTC = ExprImplGPUInvoke<InvokerBinOp<op, Result, Left, Right, true, false>>;

// template <ArithmeticOp op, typename Result, typename First, typename Second, typename Third,

This comment has been minimized.

@JanuszL

JanuszL Sep 30, 2020
Contributor

?

@klecki klecki force-pushed the klecki:ternary-ops branch 2 times, most recently from 2a3a457 to 2d2ecac Oct 6, 2020
@klecki
Copy link
Contributor Author

@klecki klecki commented Oct 6, 2020

!build

@dali-automaton
Copy link

@dali-automaton dali-automaton commented Oct 6, 2020

CI MESSAGE: [1680060]: BUILD STARTED

@klecki
Copy link
Contributor Author

@klecki klecki commented Oct 6, 2020

!build

@dali-automaton
Copy link

@dali-automaton dali-automaton commented Oct 6, 2020

CI MESSAGE: [1680100]: BUILD STARTED

import nvidia.dali.ops
# Fully circular imports don't work. We need to import _arithm_op late and
# replace this trampoline function.
setattr(sys.modules[__name__], "_arithm_op", nvidia.dali.ops._arithm_op)

This comment has been minimized.

@mzient

mzient Oct 6, 2020
Contributor

I find the following simpler - but it's a matter of taste, I guess.

Suggested change
setattr(sys.modules[__name__], "_arithm_op", nvidia.dali.ops._arithm_op)
global _arithm_op
_arithm_op = nvidia.dali.ops._arithm_op

This comment has been minimized.

@klecki

klecki Oct 6, 2020
Author Contributor

I just copied your code from data_node.py :)

numpy_in = get_numpy_input(dali_in, kinds[i], dali_in.dtype.type, target_type if target_type is not None else dali_in.dtype.type)
inputs.append(numpy_in)
out = as_cpu(pipe_out[arity]).at(sample_id)
return tuple(inputs) + (out,)

This comment has been minimized.

@mzient

mzient Oct 6, 2020
Contributor

More Pythonic? ;)

Suggested change
return tuple(inputs) + (out,)
return (*inputs, out)
auto output = static_cast<Result *>(tile.output);
const void *first = tile.args[0];
const void *second = tile.args[1];
const void *third = tile.args[2];
Comment on lines +136 to +139

This comment has been minimized.

@mzient

mzient Oct 6, 2020
Contributor

Suggested change
auto output = static_cast<Result *>(tile.output);
const void *first = tile.args[0];
const void *second = tile.args[1];
const void *third = tile.args[2];
auto *__restrict__ output = static_cast<Result *>(tile.output);
const void *__restrict__ first = tile.args[0];
const void *__restrict__ second = tile.args[1];
const void *__restrict__ third = tile.args[2];

Just today I saw adding __restrict__ outperform caching in shared memory (in another kernel). The speedup was 1.7x

@dali-automaton
Copy link

@dali-automaton dali-automaton commented Oct 6, 2020

CI MESSAGE: [1680100]: BUILD FAILED

* based on `as_ptr`
*/
template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {

This comment has been minimized.

@mzient

mzient Oct 6, 2020
Contributor

Suggested change
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void *__restrict__ ptr, DALIDataType type_id) {
template <bool as_ptr, typename T>
DALI_HOST_DEV std::enable_if_t<!as_ptr, T> Pass(const void* ptr, DALIDataType type_id) {
TYPE_SWITCH(type_id, type2id, AccessType, ARITHMETIC_ALLOWED_TYPES, (
const auto *access = reinterpret_cast<const AccessType*>(ptr);

This comment has been minimized.

@mzient

mzient Oct 6, 2020
Contributor

Suggested change
const auto *access = reinterpret_cast<const AccessType*>(ptr);
const auto *__restrict__ access = reinterpret_cast<const AccessType*>(ptr);
}

template <typename T>
DALI_HOST_DEV T Access(const T* ptr, int64_t idx) {

This comment has been minimized.

@mzient

mzient Oct 6, 2020
Contributor

Suggested change
DALI_HOST_DEV T Access(const T* ptr, int64_t idx) {
DALI_HOST_DEV T Access(const T* __restrict__ ptr, int64_t idx) {
}

template <typename T>
DALI_HOST_DEV T Access(const void* ptr, int64_t idx, DALIDataType type_id) {

This comment has been minimized.

@mzient

mzient Oct 6, 2020
Contributor

Suggested change
DALI_HOST_DEV T Access(const void* ptr, int64_t idx, DALIDataType type_id) {
DALI_HOST_DEV T Access(const void* __restrict__ ptr, int64_t idx, DALIDataType type_id) {
Copy link
Contributor

@mzient mzient left a comment

I think there could be some gain from doing this:
const ExtendedTileDesc *__restrict__ tiles
the tile is read many times, so making it cacheable is desireable. There is some type-punning going on, so the compiler might be quite conservative here.

@mzient
mzient approved these changes Oct 6, 2020
Copy link
Contributor

@mzient mzient left a comment

You can try restrict, but it's optional.

@@ -0,0 +1,57 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

done

}
}
DALIDataType result = BinaryTypePromotion(types[0], types[1]);
if (types.size() > 2) {

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

this if is redundant

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

Yes, removed 👍

DALI_HOST_DEV static constexpr result_t<L, R> impl(L l, R r) {
auto l_ = static_cast<result_t<L, R>>(l);
auto r_ = static_cast<result_t<L, R>>(r);
return l_ <= r_ ? l_ : r_;

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
return l_ <= r_ ? l_ : r_;
return l_ < r_ ? l_ : r_;

This would work as well, right?

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

Yes, will change

@@ -0,0 +1,24 @@

// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

done

@@ -0,0 +1,25 @@

// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
// Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

done

.. note::
Type promotion is commutative.

For more than two arguments, the resulting type is calculated as reduction from left to right

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
For more than two arguments, the resulting type is calculated as reduction from left to right
For more than two arguments, the resulting type is calculated as a reduction from left to right

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

done


For more than two arguments, the resulting type is calculated as reduction from left to right
- first calculating the result of operating on first two arguments, next between that intermediate
result and thirs argument and so on, untill we have only the result type left.

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
result and thirs argument and so on, untill we have only the result type left.
result and the third argument and so on, until we have only the result type left.

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

done

Similarly to arithmetic expressions, one can use selected mathematical functions in the Pipeline
graph definition. They also accept :class:`nvidia.dali.pipeline.DataNode`,
:meth:`nvidia.dali.types.Constant` or regular Python value of type ``bool``, ``int``, or ``float``
as arguments. At least one of the inputs must be output of other DALI Operator.

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
as arguments. At least one of the inputs must be output of other DALI Operator.
as arguments. At least one of the inputs must be the output of other DALI Operator.

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

done

from invoking other operators. Full documentation can be found in the dedicated documentation

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
from invoking other operators. Full documentation can be found in the dedicated documentation
from invoking other operators. Full documentation can be found in the dedicated section of the documentation

This comment has been minimized.

@klecki

klecki Oct 7, 2020
Author Contributor

Done

from invoking other operators. Full documentation can be found in the dedicated documentation
for :ref:`mathematical expressions`.

This comment has been minimized.

@jantonguirao

jantonguirao Oct 7, 2020
Contributor

Suggested change
for :ref:`mathematical expressions`.
:ref:`mathematical expressions`.
klecki added 4 commits Sep 24, 2020
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
klecki added 11 commits Oct 1, 2020
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
There appears to be an issue with the clamp tests

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
docs/supported_ops.rst Outdated Show resolved Hide resolved
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki klecki force-pushed the klecki:ternary-ops branch from a7351e7 to 1701b8d Oct 7, 2020
@klecki
Copy link
Contributor Author

@klecki klecki commented Oct 7, 2020

!build

@dali-automaton
Copy link

@dali-automaton dali-automaton commented Oct 7, 2020

CI MESSAGE: [1682623]: BUILD STARTED

@JanuszL
JanuszL approved these changes Oct 7, 2020
The same behaviour for invalid ranges

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Copy link
Contributor Author

@klecki klecki commented Oct 7, 2020

!build

@dali-automaton
Copy link

@dali-automaton dali-automaton commented Oct 7, 2020

CI MESSAGE: [1682793]: BUILD STARTED

@dali-automaton
Copy link

@dali-automaton dali-automaton commented Oct 7, 2020

CI MESSAGE: [1682623]: BUILD FAILED

@dali-automaton
Copy link

@dali-automaton dali-automaton commented Oct 7, 2020

CI MESSAGE: [1682793]: BUILD PASSED

@klecki klecki merged commit 6c5d611 into NVIDIA:master Oct 7, 2020
4 checks passed
4 checks passed
DCO DCO
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/gitlab-master/nvidia/dali [1682793]: BUILD PASSED
ci/jenkins/nvidia/dali BUILD SCHEDULED TO gitlab-master
Details
@klecki klecki deleted the klecki:ternary-ops branch Oct 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants