Doc updates to readme.md and howitworks.md #283
GitHub pull request #283 of commit 378af815d379213bcd862314fa91c561c5c0deec, no merge conflicts.
Running as SYSTEM
Setting status of 378af815d379213bcd862314fa91c561c5c0deec to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/802/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
> git rev-parse 378af815d379213bcd862314fa91c561c5c0deec^{commit} # timeout=10
Checking out Revision 378af815d379213bcd862314fa91c561c5c0deec (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 378af815d379213bcd862314fa91c561c5c0deec # timeout=10
Commit message: "Updated How it works to reflect the changes in 0.2"
> git rev-list --no-walk 466a298b205957900a66d5ceda43431d709fa910 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3915493954289684944.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
Attempting uninstall: nvtabular
Found existing installation: nvtabular 0.1.1
Uninstalling nvtabular-0.1.1:
Successfully uninstalled nvtabular-0.1.1
Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items
Thanks for this! Aside from one minor thing this looks great.
```diff
-docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.1 /bin/bash
+docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.2 /bin/bash
```
benfred
Sep 9, 2020
Collaborator
This container hasn't been published yet - but I think we should update the README now anyways in anticipation of this
rerun tests
GitHub pull request #283 of commit c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7, no merge conflicts.
Running as SYSTEM
Setting status of c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/806/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
> git rev-parse c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7^{commit} # timeout=10
Checking out Revision c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk 887a853f27a3d789d628acda72bba145204ec59b # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins711813519864152018.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
Attempting uninstall: nvtabular
Found existing installation: nvtabular 0.1.1
Uninstalling nvtabular-0.1.1:
Successfully uninstalled nvtabular-0.1.1
Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items
GitHub pull request #283 of commit 652e93d2b581aabd3af46175ce95aa544d0679c5, no merge conflicts.
Running as SYSTEM
Setting status of 652e93d2b581aabd3af46175ce95aa544d0679c5 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/807/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
> git rev-parse 652e93d2b581aabd3af46175ce95aa544d0679c5^{commit} # timeout=10
Checking out Revision 652e93d2b581aabd3af46175ce95aa544d0679c5 (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 652e93d2b581aabd3af46175ce95aa544d0679c5 # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk c5c74ee3ebcc8f1b19ef2302e1f8f66a12f992e7 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1392405149694813284.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
Attempting uninstall: nvtabular
Found existing installation: nvtabular 0.1.1
Uninstalling nvtabular-0.1.1:
Successfully uninstalled nvtabular-0.1.1
Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items
GitHub pull request #283 of commit 85333ae754c0512f7b213a4e98117a1501500dda, no merge conflicts.
Running as SYSTEM
Setting status of 85333ae754c0512f7b213a4e98117a1501500dda to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/808/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --progress -- https://github.com/NVIDIA/NVTabular.git +refs/pull/283/*:refs/remotes/origin/pr/283/* # timeout=10
> git rev-parse 85333ae754c0512f7b213a4e98117a1501500dda^{commit} # timeout=10
Checking out Revision 85333ae754c0512f7b213a4e98117a1501500dda (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 85333ae754c0512f7b213a4e98117a1501500dda # timeout=10
Commit message: "Update README.md"
> git rev-list --no-walk 652e93d2b581aabd3af46175ce95aa544d0679c5 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins1141379037389914386.sh
Obtaining file:///var/jenkins_home/workspace/nvtabular_tests/nvtabular
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing wheel metadata: started
Preparing wheel metadata: finished with status 'done'
Installing collected packages: nvtabular
Attempting uninstall: nvtabular
Found existing installation: nvtabular 0.1.1
Uninstalling nvtabular-0.1.1:
Successfully uninstalled nvtabular-0.1.1
Running setup.py develop for nvtabular
Successfully installed nvtabular
All done! ✨ 🍰 ✨
61 files would be left unchanged.
/var/jenkins_home/.local/lib/python3.7/site-packages/isort/main.py:125: UserWarning: Likely recursive symlink detected to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/images
warn(f"Likely recursive symlink detected to {resolved_path}")
Skipped 1 files
============================= test session starts ==============================
platform linux -- Python 3.7.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
benchmark: 3.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: setup.cfg
plugins: benchmark-3.2.3, hypothesis-5.28.0, asyncio-0.12.0, timeout-1.4.2, cov-2.10.1, forked-1.3.0, xdist-2.1.0
collected 431 items
@@ -3,11 +3,9 @@ How it Works

 ![NVTabular Workflow](./images/nvt_workflow.png)

-NVTabular wraps the RAPIDS cuDF library which provides the bulk of the functionality, accelerating dataframe operations on the GPU. We found in our internal usage of cuDF on massive datasets like [Criteo](https://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/) or [RecSys 2020](https://recsys-twitter.com/) that it wasn’t straightforward to use once the dataset had scaled past GPU memory. The same design pattern kept emerging for us and we decided to package it up as NVTabular in order to make tabular data workflows simpler.
+With the transition to v0.2 the NVTabular engine uses the [RAPIDS](http://www.rapids.ai) [Dask-cuDF library](https://github.com/rapidsai/dask-cuda) which provides the bulk of the functionality, accelerating dataframe operations on the GPU, and scaling across multiple GPUs. NVTabular provides functionality commonly found in deep learning recommendation workflows, allowing you to focus on what you want to do with your data, not how you need to do it. We also provide a template for our core compute mechanism, Operations, or ‘ops’ allowing you to build your own custom ops from cuDF and other libraries.
rjzamora
Sep 9, 2020
Collaborator
The Dask-CuDF link is actually pointing to the Dask-CUDA library. Since Dask-CuDF is actually a part of the CuDF repository, there is not a great landing page at the moment. For now, it may be best to point to: https://github.com/rapidsai/cudf/tree/main/python/dask_cudf
I'll submit a small PR with the change - but wanted to make a quick note here in case I got pulled away.
Updated docs to better reflect 0.2 and our reliance on Dask-cuDF.
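
The link mix-up flagged in review (a Dask-cuDF label pointing at the dask-cuda repository) is the sort of thing a small script might surface before merge. The following is only a rough sketch, not part of this PR or of NVTabular: `extract_links`, `suspicious_links`, and the keyword heuristic are hypothetical names and choices made for illustration. It pulls `[label](url)` pairs out of Markdown text and flags any link whose label mentions a keyword that the URL does not contain.

```python
import re

# Matches inline Markdown links of the form [label](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def extract_links(markdown_text):
    """Return (label, url) pairs for every inline link in the text."""
    return LINK_RE.findall(markdown_text)


def normalize(text):
    """Lowercase and drop punctuation so 'Dask-cuDF' matches 'dask_cudf'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def suspicious_links(markdown_text, keywords):
    """Flag links whose label mentions a keyword that the URL lacks.

    This is a heuristic (an assumption of this sketch): a mismatch only
    suggests the link deserves a second look, it does not prove an error.
    """
    flagged = []
    for label, url in extract_links(markdown_text):
        for keyword in keywords:
            key = normalize(keyword)
            if key in normalize(label) and key not in normalize(url):
                flagged.append((label, url, keyword))
    return flagged


# The sentence fragment from the diff under review.
sample = (
    "uses the [RAPIDS](http://www.rapids.ai) "
    "[Dask-cuDF library](https://github.com/rapidsai/dask-cuda)"
)

# The same fragment with the URL suggested in the review comment.
fixed = (
    "uses the [RAPIDS](http://www.rapids.ai) "
    "[Dask-cuDF library]"
    "(https://github.com/rapidsai/cudf/tree/main/python/dask_cudf)"
)

if __name__ == "__main__":
    for label, url, keyword in suspicious_links(sample, ["Dask-cuDF"]):
        print(f"check link: label {label!r} mentions {keyword!r} but URL is {url}")
```

Run against the original sentence, the check reports the Dask-cuDF link; with the URL proposed in the review comment it reports nothing. The keyword list has to be maintained by hand, so this is a reviewer aid and not a substitute for reading the rendered docs.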