
Doc updates to readme.md and howitworks.md #283

Merged (7 commits, Sep 9, 2020)

Conversation

@EvenOldridge (Collaborator) commented Sep 9, 2020:

Updated docs to better reflect 0.2 and our reliance on Dask-cuDF.

EvenOldridge added 4 commits Sep 9, 2020
Removed out of date benchmarks and updated the description
Update README.md to better describe 0.2 and to remove out of date benchmarks.
Minor edits
Added dask details
@EvenOldridge EvenOldridge requested review from benfred and rjzamora Sep 9, 2020
@EvenOldridge EvenOldridge added this to In progress in v0.2 Release via automation Sep 9, 2020
@nvidia-merlin-bot (Collaborator) commented Sep 9, 2020:

CI Results:
GitHub pull request #283 of commit 378af815d379213bcd862314fa91c561c5c0deec, no merge conflicts.
Running as SYSTEM
Setting status of 378af815d379213bcd862314fa91c561c5c0deec to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/802/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse 378af815d379213bcd862314fa91c561c5c0deec^{commit} # timeout=10
Checking out Revision 378af815d379213bcd862314fa91c561c5c0deec (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 378af815d379213bcd862314fa91c561c5c0deec # timeout=10
Commit message: "Updated How it works to reflect the changes in 0.2"
 > git rev-list --no-walk 466a298b205957900a66d5ceda43431d709fa910 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3915493954289684944.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33549 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29568 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:97: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 94->95, 95, 107->108, 108, 116->117, 117, 125->137, 130->135, 135-137, 212->213, 213, 227->228, 228-229, 247->248, 248
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 143->144, 144, 152->157, 157, 167->168, 168, 212->213, 213, 256->257, 257, 260->266, 336->337, 337-339, 341->342, 342, 343->344, 344, 362->365, 365, 376->377, 377, 383->386, 409->410, 410-411, 413->414, 414-415, 417->418, 418-434, 436->440, 440, 444->445, 445, 446->447, 447, 454->455, 455, 456->457, 457, 463->464, 464, 473->482, 482-483, 487->488, 488, 501->502, 502, 504->507, 509->526, 526-529, 552->553, 553, 556->557, 557, 558->559, 559, 566->567, 567, 568->571, 571, 678->679, 679, 680->681, 681, 702->717, 742->747, 745->746, 746, 756->753, 761->753
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 53->54, 54, 82->83, 83
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 31->32, 32-34, 35->38, 38
nvtabular/ops/join_external.py 66 4 26 5 90% 80->81, 81, 82->83, 83, 97->100, 100, 113->117, 153->154, 154
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 46->47, 47, 48->49, 49
nvtabular/ops/logop.py 17 1 4 1 90% 45->46, 46
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 53->54, 54, 61->60, 98->99, 99, 108->110, 110-111
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 17 3 6 3 74% 22->23, 23, 25->26, 26, 33->34, 34
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2646 223 1014 148 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.77%
================= 431 passed, 17 warnings in 453.26s (0:07:33) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins3717350163533120138.sh

@benfred (Collaborator) left a comment:

Thanks for this! Aside from one minor thing, this looks great.

README.md (outdated review thread)

```diff
-docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.1 /bin/bash
+docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.2 /bin/bash
```

@benfred (Collaborator) commented Sep 9, 2020:
This container hasn't been published yet, but I think we should update the README now anyway in anticipation of it.
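For readers unfamiliar with the flags in the suggested command, here is an annotated version (a sketch only; the `nvtabular:0.2` tag is assumed pending publication of the container):

```shell
# --runtime=nvidia      expose NVIDIA GPUs to the container
# --rm -it              remove the container on exit; interactive terminal
# -p 8888:8888          Jupyter notebook server
# -p 8797:8787          Dask diagnostic dashboard (container port 8787)
# -p 8796:8786          Dask scheduler (container port 8786)
# --ipc=host            share the host IPC namespace (larger shared-memory segments)
# --cap-add SYS_PTRACE  allow ptrace, e.g. for debugging/profiling inside the container
docker run --runtime=nvidia --rm -it \
  -p 8888:8888 -p 8797:8787 -p 8796:8786 \
  --ipc=host --cap-add SYS_PTRACE \
  nvcr.io/nvidia/nvtabular:0.2 /bin/bash
```

The host-side ports (8797/8796) are remapped from Dask's defaults (8787/8786) so the container does not collide with a Dask cluster already running on the host.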

@benfred (Collaborator) commented Sep 9, 2020:

rerun tests

benfred added 2 commits Sep 9, 2020
@nvidia-merlin-bot (Collaborator) commented Sep 9, 2020:

CI Results:
GitHub pull request #283 of commit c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7, no merge conflicts.
Running as SYSTEM
Setting status of c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/806/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7^{commit} # timeout=10
Checking out Revision c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk 887a853f27a3d789d628acda72bba145204ec59b # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins711813519864152018.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41825 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29568 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:182: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 179->180, 180, 192->193, 193, 201->202, 202, 210->222, 215->220, 220-222, 297->298, 298, 312->313, 313-314, 332->333, 333
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 13->16, 16->18, 23->25, 26->27, 27, 34-35, 40->48, 43-48
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 155->156, 156, 164->169, 169, 179->180, 180, 224->225, 225, 268->269, 269, 272->278, 348->349, 349-351, 353->354, 354, 355->356, 356, 374->377, 377, 388->389, 389, 395->398, 421->422, 422-423, 425->426, 426-427, 429->430, 430-446, 448->452, 452, 456->457, 457, 458->459, 459, 466->467, 467, 468->469, 469, 475->476, 476, 485->494, 494-495, 499->500, 500, 513->514, 514, 516->519, 521->538, 538-541, 564->565, 565, 568->569, 569, 570->571, 571, 578->579, 579, 580->583, 583, 690->691, 691, 692->693, 693, 714->729, 754->759, 757->758, 758, 768->765, 773->765
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 66->67, 67, 107->108, 108
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 94->95, 95-97, 98->101, 101
nvtabular/ops/join_external.py 66 4 26 5 90% 105->106, 106, 107->108, 108, 122->125, 125, 138->142, 178->179, 179
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 82->83, 83, 84->85, 85
nvtabular/ops/logop.py 17 1 4 1 90% 57->58, 58
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 65->66, 66, 73->72, 122->123, 123, 132->134, 134-135
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 25 5 10 5 71% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2654 225 1018 150 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.70%
================= 431 passed, 17 warnings in 471.65s (0:07:51) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.github.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5629388077575832059.sh

@nvidia-merlin-bot (Collaborator) commented Sep 9, 2020:

CI Results:
GitHub pull request #283 of commit 652e93d2b581aabd3af46175ce95aa544d0679c5, no merge conflicts.
Running as SYSTEM
Setting status of 652e93d2b581aabd3af46175ce95aa544d0679c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/807/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse 652e93d2b581aabd3af46175ce95aa544d0679c5^{commit} # timeout=10
Checking out Revision 652e93d2b581aabd3af46175ce95aa544d0679c5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 652e93d2b581aabd3af46175ce95aa544d0679c5 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1392405149694813284.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39803 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 30520 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29568 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:182: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 179->180, 180, 192->193, 193, 201->202, 202, 210->222, 215->220, 220-222, 297->298, 298, 312->313, 313-314, 332->333, 333
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 56->64, 59-64
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 155->156, 156, 164->169, 169, 179->180, 180, 224->225, 225, 268->269, 269, 272->278, 348->349, 349-351, 353->354, 354, 355->356, 356, 374->377, 377, 388->389, 389, 395->398, 421->422, 422-423, 425->426, 426-427, 429->430, 430-446, 448->452, 452, 456->457, 457, 458->459, 459, 466->467, 467, 468->469, 469, 475->476, 476, 485->494, 494-495, 499->500, 500, 513->514, 514, 516->519, 521->538, 538-541, 564->565, 565, 568->569, 569, 570->571, 571, 578->579, 579, 580->583, 583, 690->691, 691, 692->693, 693, 714->729, 754->759, 757->758, 758, 768->765, 773->765
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 66->67, 67, 107->108, 108
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 94->95, 95-97, 98->101, 101
nvtabular/ops/join_external.py 66 4 26 5 90% 105->106, 106, 107->108, 108, 122->125, 125, 138->142, 178->179, 179
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 82->83, 83, 84->85, 85
nvtabular/ops/logop.py 17 1 4 1 90% 57->58, 58
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 65->66, 66, 73->72, 122->123, 123, 132->134, 134-135
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 25 5 10 5 71% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2654 225 1018 150 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.70%
================= 431 passed, 17 warnings in 498.79s (0:08:18) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4210699040404548045.sh

@nvidia-merlin-bot
Collaborator

@nvidia-merlin-bot nvidia-merlin-bot commented Sep 9, 2020

Click to view CI Results
GitHub pull request #283 of commit 85333ae754c0512f7b213a4e98117a1501500dda, no merge conflicts.
Running as SYSTEM
Setting status of 85333ae754c0512f7b213a4e98117a1501500dda to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/808/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
 > git rev-parse 85333ae754c0512f7b213a4e98117a1501500dda^{commit} # timeout=10
Checking out Revision 85333ae754c0512f7b213a4e98117a1501500dda (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 85333ae754c0512f7b213a4e98117a1501500dda # timeout=10
Commit message: "Update README.md"
 > git rev-list --no-walk 652e93d2b581aabd3af46175ce95aa544d0679c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1141379037389914386.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
  Attempting uninstall: nvtabular
    Found existing installation: nvtabular 0.1.1
    Uninstalling nvtabular-0.1.1:
      Successfully uninstalled nvtabular-0.1.1
  Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
  warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items

tests/unit/test_column_similarity.py ...... [ 1%]
tests/unit/test_dask_nvt.py ............................................ [ 11%]
.......... [ 13%]
tests/unit/test_io.py .................................................. [ 25%]
............................ [ 32%]
tests/unit/test_notebooks.py .... [ 32%]
tests/unit/test_ops.py ................................................. [ 44%]
........................................................................ [ 61%]
........................................ [ 70%]
tests/unit/test_s3.py .. [ 70%]
tests/unit/test_tf_dataloader.py ............ [ 73%]
tests/unit/test_torch_dataloader.py ............... [ 77%]
tests/unit/test_workflow.py ............................................ [ 87%]
....................................................... [100%]

=============================== warnings summary ===============================
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12
/opt/conda/envs/rapids/lib/python3.7/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
import pandas.util.testing

tests/unit/test_io.py::test_mulifile_parquet[True-0-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-0-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-1-2-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-0-csv]
tests/unit/test_io.py::test_mulifile_parquet[True-2-2-csv]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/shuffle.py:42: DeprecationWarning: shuffle=True is deprecated. Using PER_WORKER.
warnings.warn("shuffle=True is deprecated. Using PER_WORKER.", DeprecationWarning)

tests/unit/test_notebooks.py::test_multigpu_dask_example
/opt/conda/envs/rapids/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43099 instead
http_address["port"], self.http_server.port

tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
tests/unit/test_torch_dataloader.py::test_empty_cols[parquet]
/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/dataframe.py:660: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
mask = pd.Series(mask)

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-1-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 32088 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-10-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 29960 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_gpu_dl[None-parquet-100-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 30912 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_torch_dataloader.py::test_kill_dl[parquet-1e-06]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/parquet.py:75: UserWarning: Row group size 60480 is bigger than requested part_size 17069
f"Row group size {rg_byte_size_0} is bigger than requested part_size "

tests/unit/test_workflow.py::test_chaining_3
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/io/dataset.py:182: UserWarning: part_mem_fraction is ignored for DataFrame input.
warnings.warn("part_mem_fraction is ignored for DataFrame input.")

-- Docs: https://docs.pytest.org/en/stable/warnings.html

----------- coverage: platform linux, python 3.7.8-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 8 0 0 0 100%
nvtabular/io/__init__.py 4 0 0 0 100%
nvtabular/io/csv.py 14 1 4 1 89% 35->36, 36
nvtabular/io/dask.py 80 3 32 6 92% 154->157, 164->165, 165, 169->171, 171->167, 175->176, 176, 177->178, 178
nvtabular/io/dataframe_engine.py 12 2 4 1 81% 31->32, 32, 37
nvtabular/io/dataset.py 99 9 46 8 88% 179->180, 180, 192->193, 193, 201->202, 202, 210->222, 215->220, 220-222, 297->298, 298, 312->313, 313-314, 332->333, 333
nvtabular/io/dataset_engine.py 12 0 0 0 100%
nvtabular/io/hugectr.py 42 1 18 1 97% 64->87, 91
nvtabular/io/parquet.py 153 1 50 3 98% 139->140, 140, 235->237, 243->248
nvtabular/io/shuffle.py 25 2 10 2 89% 38->39, 39, 43->46, 46
nvtabular/io/writer.py 119 9 42 2 92% 29, 46, 70->71, 71, 109, 112, 173->174, 174, 195-197
nvtabular/io/writer_factory.py 16 2 6 2 82% 31->32, 32, 49->52, 52
nvtabular/loader/__init__.py 0 0 0 0 100%
nvtabular/loader/backend.py 207 15 56 4 92% 84, 96-104, 132->133, 133, 179, 192, 267->269, 282->283, 283, 306->307, 307-308
nvtabular/loader/tensorflow.py 109 16 46 10 82% 39->40, 40-41, 51->52, 52, 59->60, 60-63, 72->73, 73, 78->83, 83, 244-253, 268->269, 269, 288->289, 289, 296->297, 297, 298->301, 301, 306->307, 307
nvtabular/loader/tf_utils.py 51 7 20 5 83% 29->32, 32->34, 39->41, 42->43, 43, 50-51, 56->64, 59-64
nvtabular/loader/torch.py 33 0 4 0 100%
nvtabular/ops/__init__.py 20 0 0 0 100%
nvtabular/ops/categorify.py 365 54 192 38 82% 155->156, 156, 164->169, 169, 179->180, 180, 224->225, 225, 268->269, 269, 272->278, 348->349, 349-351, 353->354, 354, 355->356, 356, 374->377, 377, 388->389, 389, 395->398, 421->422, 422-423, 425->426, 426-427, 429->430, 430-446, 448->452, 452, 456->457, 457, 458->459, 459, 466->467, 467, 468->469, 469, 475->476, 476, 485->494, 494-495, 499->500, 500, 513->514, 514, 516->519, 521->538, 538-541, 564->565, 565, 568->569, 569, 570->571, 571, 578->579, 579, 580->583, 583, 690->691, 691, 692->693, 693, 714->729, 754->759, 757->758, 758, 768->765, 773->765
nvtabular/ops/clip.py 25 3 10 4 80% 52->53, 53, 61->62, 62, 66->68, 68->69, 69
nvtabular/ops/column_similarity.py 89 21 28 4 70% 171-172, 181-183, 191-207, 222->232, 224->227, 227->228, 228, 237->238, 238
nvtabular/ops/difference_lag.py 21 1 4 1 92% 73->74, 74
nvtabular/ops/dropna.py 14 0 0 0 100%
nvtabular/ops/fill.py 36 2 10 2 91% 66->67, 67, 107->108, 108
nvtabular/ops/filter.py 17 1 2 1 89% 44->45, 45
nvtabular/ops/groupby_statistics.py 80 3 30 3 95% 146->147, 147, 151->176, 183->184, 184, 208
nvtabular/ops/hash_bucket.py 30 4 16 2 83% 94->95, 95-97, 98->101, 101
nvtabular/ops/join_external.py 66 4 26 5 90% 105->106, 106, 107->108, 108, 122->125, 125, 138->142, 178->179, 179
nvtabular/ops/join_groupby.py 56 0 18 0 100%
nvtabular/ops/lambdaop.py 24 2 8 2 88% 82->83, 83, 84->85, 85
nvtabular/ops/logop.py 17 1 4 1 90% 57->58, 58
nvtabular/ops/median.py 24 1 2 0 96% 52
nvtabular/ops/minmax.py 30 1 2 0 97% 56
nvtabular/ops/moments.py 33 1 2 0 97% 60
nvtabular/ops/normalize.py 49 4 14 4 84% 65->66, 66, 73->72, 122->123, 123, 132->134, 134-135
nvtabular/ops/operator.py 19 1 8 2 89% 43->42, 45->46, 46
nvtabular/ops/stat_operator.py 10 0 0 0 100%
nvtabular/ops/target_encoding.py 92 1 22 3 96% 144->146, 173->174, 174, 225->228
nvtabular/ops/transform_operator.py 41 6 10 2 80% 42-46, 68->69, 69-71, 88->89, 89
nvtabular/utils.py 25 5 10 5 71% 26->27, 27, 28->31, 31, 37->38, 38, 40->41, 41, 45->47, 47
nvtabular/worker.py 65 1 30 2 97% 80->92, 118->121, 121
nvtabular/workflow.py 420 38 232 24 89% 99->103, 103, 109->110, 110-114, 144->exit, 160->exit, 176->exit, 192->exit, 245->247, 295->296, 296, 375->378, 378, 403->404, 404, 410->413, 413, 476->477, 477, 495->497, 497-506, 517->516, 566->571, 571, 574->575, 575, 610->611, 611, 660->651, 726->737, 737, 759-789, 816->817, 817, 830->833, 863->864, 864-866, 870->871, 871, 904->905, 905
setup.py 2 2 0 0 0% 18-20

TOTAL 2654 225 1018 150 89%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 88.70%
================= 431 passed, 17 warnings in 456.01s (0:07:36) =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
source activate rapids
cd /var/jenkins_home/
python test_res_push.py "https://api.GitHub.com/repos/NVIDIA/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins3263314540322387925.sh

@benfred
benfred approved these changes Sep 9, 2020
@benfred benfred merged commit 1707046 into NVIDIA:main Sep 9, 2020
1 check passed
Jenkins Unit Test Run Success
Details
v0.2 Release automation moved this from In progress to Done Sep 9, 2020
@@ -3,11 +3,9 @@ How it Works

![NVTabular Workflow](./images/nvt_workflow.png)

-NVTabular wraps the RAPIDS cuDF library which provides the bulk of the functionality, accelerating dataframe operations on the GPU. We found in our internal usage of cuDF on massive datasets like [Criteo](https://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/) or [RecSys 2020](https://recsys-twitter.com/) that it wasn’t straightforward to use once the dataset had scaled past GPU memory. The same design pattern kept emerging for us and we decided to package it up as NVTabular in order to make tabular data workflows simpler.
+With the transition to v0.2 the NVTabular engine uses the [RAPIDS](http://www.rapids.ai) [Dask-cuDF library](https://github.com/rapidsai/dask-cuda) which provides the bulk of the functionality, accelerating dataframe operations on the GPU, and scaling across multiple GPUs. NVTabular provides functionality commonly found in deep learning recommendation workflows, allowing you to focus on what you want to do with your data, not how you need to do it. We also provide a template for our core compute mechanism, Operations, or ‘ops’ allowing you to build your own custom ops from cuDF and other libraries.
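The fit-then-transform "op" pattern the added paragraph refers to can be sketched, very loosely, in plain Python. Every name below (`Op`, `Normalize`, `Workflow`) is a hypothetical illustration of the pattern, not the actual NVTabular API, which operates on cuDF/Dask-cuDF dataframes rather than Python lists:

```python
# Hypothetical sketch of the op pattern: each op first gathers the
# statistics it needs (fit), then applies its transform. The real
# NVTabular ops work on cuDF/Dask-cuDF columns, not lists.

class Op:
    def fit(self, column):
        """Collect any statistics the transform needs."""
        return self

    def transform(self, column):
        raise NotImplementedError


class Normalize(Op):
    """Shift a numeric column to zero mean and unit variance."""

    def fit(self, column):
        n = len(column)
        self.mean = sum(column) / n
        var = sum((x - self.mean) ** 2 for x in column) / n
        self.std = var ** 0.5 or 1.0  # guard against a constant column
        return self

    def transform(self, column):
        return [(x - self.mean) / self.std for x in column]


class Workflow:
    """Chain ops: fit each op, then feed its output to the next."""

    def __init__(self, ops):
        self.ops = ops

    def fit_transform(self, column):
        for op in self.ops:
            column = op.fit(column).transform(column)
        return column


out = Workflow([Normalize()]).fit_transform([1.0, 2.0, 3.0])
```

A custom op in this scheme is just another `Op` subclass dropped into the `Workflow` list, which mirrors the extensibility the paragraph describes.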

This comment has been minimized.

@rjzamora

rjzamora Sep 9, 2020
Collaborator

The Dask-CuDF link is actually pointing to the Dask-CUDA library. Since Dask-CuDF is actually a part of the CuDF repository, there is not a great landing page at the moment. For now, it may be best to point to: https://github.com/rapidsai/cudf/tree/main/python/dask_cudf

I'll submit a small PR with the change - but wanted to make a quick note here in case I got pulled away.
