
Doc updates to readme.md and howitworks.md #283

Merged (7 commits, Sep 9, 2020)

Conversation

@EvenOldridge (Collaborator) commented Sep 9, 2020:

Updated docs to better reflect 0.2 and our reliance on Dask-cuDF.

EvenOldridge added 4 commits Sep 9, 2020
Removed out of date benchmarks and updated the description
Update README.md to better describe 0.2 and to remove out of date benchmarks.
Minor edits
Added dask details
@EvenOldridge EvenOldridge requested review from benfred and rjzamora Sep 9, 2020
@EvenOldridge EvenOldridge added this to In progress in v0.2 Release via automation Sep 9, 2020
@nvidia-merlin-bot (Collaborator) commented Sep 9, 2020:

CI Results:
GitHub pull request #283 of commit 378af815d379213bcd862314fa91c561c5c0deec, no merge conflicts.
Running as SYSTEM
Setting status of 378af815d379213bcd862314fa91c561c5c0deec to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/802/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse 378af815d379213bcd862314fa91c561c5c0deec^{commit} # timeout=10
Checking out Revision 378af815d379213bcd862314fa91c561c5c0deec (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 378af815d379213bcd862314fa91c561c5c0deec # timeout=10
Commit message: "Updated How it works to reflect the changes in 0.2"
 > git rev-list --no-walk 466a298b205957900a66d5ceda43431d709fa910 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3915493954289684944.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33549 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29568 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:97: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 94->95, 95, 107->108, 108, 116->117, 117, 125->137, 130->135, 135-137, 212->213, 213, 227->228, 228-229, 247->248, 248
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 143->144, 144, 152->157, 157, 167->168, 168, 212->213, 213, 256->257, 257, 260->266, 336->337, 337-339, 341->342, 342, 343->344, 344, 362->365, 365, 376->377, 377, 383->386, 409->410, 410-411, 413->414, 414-415, 417->418, 418-434, 436->440, 440, 444->445, 445, 446->447, 447, 454->455, 455, 456->457, 457, 463->464, 464, 473->482, 482-483, 487->488, 488, 501->502, 502, 504->507, 509->526, 526-529, 552->553, 553, 556->557, 557, 558->559, 559, 566->567, 567, 568->571, 571, 678->679, 679, 680->681, 681, 702->717, 742->747, 745->746, 746, 756->753, 761->753
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 53->54, 54, 82->83, 83
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 31->32, 32-34, 35->38, 38
nvtabular/ops/join_external.py 66 4 26 5 90% 80->81, 81, 82->83, 83, 97->100, 100, 113->117, 153->154, 154
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 46->47, 47, 48->49, 49
nvtabular/ops/logop.py 17 1 4 1 90% 45->46, 46
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 53->54, 54, 61->60, 98->99, 99, 108->110, 110-111
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 17 3 6 3 74% 22->23, 23, 25->26, 26, 33->34, 34
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2646 223 1014 148 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.77%
================= 431 passed, 17 warnings in 453.26s (0:07:33) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins3717350163533120138.sh

@benfred (Collaborator) left a comment:

Thanks for this! Aside from one minor thing, this looks great.

README.md (outdated review thread)

```diff
-docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.1 /bin/bash
+docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.2 /bin/bash
```

@benfred (Collaborator) commented Sep 9, 2020:
This container hasn't been published yet, but I think we should update the README now anyway in anticipation of it.
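For readers unfamiliar with the flags in the suggested command, here is an annotated version (a sketch only; the `nvtabular:0.2` tag is assumed pending publication of the container):

```shell
# --runtime=nvidia      expose NVIDIA GPUs to the container
# --rm -it              remove the container on exit; interactive terminal
# -p 8888:8888          Jupyter notebook server
# -p 8797:8787          Dask diagnostic dashboard (container port 8787)
# -p 8796:8786          Dask scheduler (container port 8786)
# --ipc=host            share the host IPC namespace (larger shared-memory segments)
# --cap-add SYS_PTRACE  allow ptrace, e.g. for debugging/profiling inside the container
docker run --runtime=nvidia --rm -it \
  -p 8888:8888 -p 8797:8787 -p 8796:8786 \
  --ipc=host --cap-add SYS_PTRACE \
  nvcr.io/nvidia/nvtabular:0.2 /bin/bash
```

The host-side ports (8797/8796) are remapped from Dask's defaults (8787/8786) so the container does not collide with a Dask cluster already running on the host.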

@benfred (Collaborator) commented Sep 9, 2020:

rerun tests

benfred added 2 commits Sep 9, 2020
@nvidia-merlin-bot (Collaborator) commented Sep 9, 2020:

CI Results:
GitHub pull request #283 of commit c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7, no merge conflicts.
Running as SYSTEM
Setting status of c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/806/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7^{commit} # timeout=10
Checking out Revision c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk 887a853f27a3d789d628acda72bba145204ec59b # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins711813519864152018.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41825 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29568 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:182: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 179->180, 180, 192->193, 193, 201->202, 202, 210->222, 215->220, 220-222, 297->298, 298, 312->313, 313-314, 332->333, 333
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 155->156, 156, 164->169, 169, 179->180, 180, 224->225, 225, 268->269, 269, 272->278, 348->349, 349-351, 353->354, 354, 355->356, 356, 374->377, 377, 388->389, 389, 395->398, 421->422, 422-423, 425->426, 426-427, 429->430, 430-446, 448->452, 452, 456->457, 457, 458->459, 459, 466->467, 467, 468->469, 469, 475->476, 476, 485->494, 494-495, 499->500, 500, 513->514, 514, 516->519, 521->538, 538-541, 564->565, 565, 568->569, 569, 570->571, 571, 578->579, 579, 580->583, 583, 690->691, 691, 692->693, 693, 714->729, 754->759, 757->758, 758, 768->765, 773->765
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 66->67, 67, 107->108, 108
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 94->95, 95-97, 98->101, 101
nvtabular/ops/join_external.py 66 4 26 5 90% 105->106, 106, 107->108, 108, 122->125, 125, 138->142, 178->179, 179
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 82->83, 83, 84->85, 85
nvtabular/ops/logop.py 17 1 4 1 90% 57->58, 58
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 65->66, 66, 73->72, 122->123, 123, 132->134, 134-135
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 25 5 10 5 71% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2654 225 1018 150 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.70%
================= 431 passed, 17 warnings in 471.65s (0:07:51) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5629388077575832059.sh

@nvidia-merlin-bot (Collaborator) commented Sep 9, 2020:

CI Results:
GitHub pull request #283 of commit 652e93d2b581aabd3af46175ce95aa544d0679c5, no merge conflicts.
Running as SYSTEM
Setting status of 652e93d2b581aabd3af46175ce95aa544d0679c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/807/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse 652e93d2b581aabd3af46175ce95aa544d0679c5^{commit} # timeout=10
Checking out Revision 652e93d2b581aabd3af46175ce95aa544d0679c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 652e93d2b581aabd3af46175ce95aa544d0679c5 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1392405149694813284.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39803 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 30520 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29568 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:182: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 179->180, 180, 192->193, 193, 201->202, 202, 210->222, 215->220, 220-222, 297->298, 298, 312->313, 313-314, 332->333, 333
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 56->64, 59-64
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 155->156, 156, 164->169, 169, 179->180, 180, 224->225, 225, 268->269, 269, 272->278, 348->349, 349-351, 353->354, 354, 355->356, 356, 374->377, 377, 388->389, 389, 395->398, 421->422, 422-423, 425->426, 426-427, 429->430, 430-446, 448->452, 452, 456->457, 457, 458->459, 459, 466->467, 467, 468->469, 469, 475->476, 476, 485->494, 494-495, 499->500, 500, 513->514, 514, 516->519, 521->538, 538-541, 564->565, 565, 568->569, 569, 570->571, 571, 578->579, 579, 580->583, 583, 690->691, 691, 692->693, 693, 714->729, 754->759, 757->758, 758, 768->765, 773->765
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 66->67, 67, 107->108, 108
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 94->95, 95-97, 98->101, 101
nvtabular/ops/join_external.py 66 4 26 5 90% 105->106, 106, 107->108, 108, 122->125, 125, 138->142, 178->179, 179
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 82->83, 83, 84->85, 85
nvtabular/ops/logop.py 17 1 4 1 90% 57->58, 58
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 65->66, 66, 73->72, 122->123, 123, 132->134, 134-135
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 25 5 10 5 71% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2654 225 1018 150 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.70%
================= 431 passed, 17 warnings in 498.79s (0:08:18) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4210699040404548045.sh

@nvidia-merlin-bot
Collaborator

@nvidia-merlin-bot nvidia-merlin-bot commented Sep 9, 2020

Click to view CI Results
GitHub pull request #283 of commit 85333ae754c0512f7b213a4e98117a1501500dda, no merge conflicts.
Running as SYSTEM
Setting status of 85333ae754c0512f7b213a4e98117a1501500dda to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/808/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse 85333ae754c0512f7b213a4e98117a1501500dda^{commit} # timeout=10
Checking out Revision 85333ae754c0512f7b213a4e98117a1501500dda (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 85333ae754c0512f7b213a4e98117a1501500dda # timeout=10
Commit message: "Update README.md"
 > git rev-list --no-walk 652e93d2b581aabd3af46175ce95aa544d0679c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1141379037389914386.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43099 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 30912 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:182: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 179->180, 180, 192->193, 193, 201->202, 202, 210->222, 215->220, 220-222, 297->298, 298, 312->313, 313-314, 332->333, 333
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 56->64, 59-64
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 155->156, 156, 164->169, 169, 179->180, 180, 224->225, 225, 268->269, 269, 272->278, 348->349, 349-351, 353->354, 354, 355->356, 356, 374->377, 377, 388->389, 389, 395->398, 421->422, 422-423, 425->426, 426-427, 429->430, 430-446, 448->452, 452, 456->457, 457, 458->459, 459, 466->467, 467, 468->469, 469, 475->476, 476, 485->494, 494-495, 499->500, 500, 513->514, 514, 516->519, 521->538, 538-541, 564->565, 565, 568->569, 569, 570->571, 571, 578->579, 579, 580->583, 583, 690->691, 691, 692->693, 693, 714->729, 754->759, 757->758, 758, 768->765, 773->765
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 66->67, 67, 107->108, 108
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 94->95, 95-97, 98->101, 101
nvtabular/ops/join_external.py 66 4 26 5 90% 105->106, 106, 107->108, 108, 122->125, 125, 138->142, 178->179, 179
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 82->83, 83, 84->85, 85
nvtabular/ops/logop.py 17 1 4 1 90% 57->58, 58
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 65->66, 66, 73->72, 122->123, 123, 132->134, 134-135
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 25 5 10 5 71% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2654 225 1018 150 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.70%
================= 431 passed, 17 warnings in 456.01s (0:07:36) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins3263314540322387925.sh

@benfred
benfred approved these changes Sep 9, 2020
@benfred benfred merged commit 1707046 into NVIDIA:main Sep 9, 2020
1 check passed
Jenkins Unit Test Run Success
Details
v0.2 Release automation moved this from In progress to Done Sep 9, 2020
@@ -3,11 +3,9 @@ How it Works

![NVTabular Workflow](./images/nvt_workflow.png)

-NVTabular wraps the RAPIDS cuDF library which provides the bulk of the functionality, accelerating dataframe operations on the GPU. We found in our internal usage of cuDF on massive datasets like [Criteo](https://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/) or [RecSys 2020](https://recsys-twitter.com/) that it wasn’t straightforward to use once the dataset had scaled past GPU memory. The same design pattern kept emerging for us and we decided to package it up as NVTabular in order to make tabular data workflows simpler.
+With the transition to v0.2 the NVTabular engine uses the [RAPIDS](http://www.rapids.ai) [Dask-cuDF library](https://github.com/rapidsai/dask-cuda) which provides the bulk of the functionality, accelerating dataframe operations on the GPU, and scaling across multiple GPUs. NVTabular provides functionality commonly found in deep learning recommendation workflows, allowing you to focus on what you want to do with your data, not how you need to do it. We also provide a template for our core compute mechanism, Operations, or ‘ops’ allowing you to build your own custom ops from cuDF and other libraries.
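The fit-then-transform "op" pattern the added paragraph refers to can be sketched, very loosely, in plain Python. Every name below (`Op`, `Normalize`, `Workflow`) is a hypothetical illustration of the pattern, not the actual NVTabular API, which operates on cuDF/Dask-cuDF dataframes rather than Python lists:

```python
# Hypothetical sketch of the op pattern: each op first gathers the
# statistics it needs (fit), then applies its transform. The real
# NVTabular ops work on cuDF/Dask-cuDF columns, not lists.

class Op:
    def fit(self, column):
        """Collect any statistics the transform needs."""
        return self

    def transform(self, column):
        raise NotImplementedError


class Normalize(Op):
    """Shift a numeric column to zero mean and unit variance."""

    def fit(self, column):
        n = len(column)
        self.mean = sum(column) / n
        var = sum((x - self.mean) ** 2 for x in column) / n
        self.std = var ** 0.5 or 1.0  # guard against a constant column
        return self

    def transform(self, column):
        return [(x - self.mean) / self.std for x in column]


class Workflow:
    """Chain ops: fit each op, then feed its output to the next."""

    def __init__(self, ops):
        self.ops = ops

    def fit_transform(self, column):
        for op in self.ops:
            column = op.fit(column).transform(column)
        return column


out = Workflow([Normalize()]).fit_transform([1.0, 2.0, 3.0])
```

A custom op in this scheme is just another `Op` subclass dropped into the `Workflow` list, which mirrors the extensibility the paragraph describes.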

This comment has been minimized.

@rjzamora

rjzamora Sep 9, 2020
Collaborator

The Dask-CuDF link is actually pointing to the Dask-CUDA library. Since Dask-CuDF is actually a part of the CuDF repository, there is not a great landing page at the moment. For now, it may be best to point to: https://github.com/rapidsai/cudf/tree/main/python/dask_cudf

I'll submit a small PR with the change - but wanted to make a quick note here in case I got pulled away.
