A command-line utility to provision infrastructure for ML workflows
Documentation | Issues | Twitter | Slack
dstack is a lightweight command-line utility to provision infrastructure for ML workflows.
## Features
- Define your ML workflows declaratively, incl. their dependencies, environment, and required compute resources
- Run workflows via the `dstack` CLI. Have infrastructure provisioned automatically in a configured cloud account.
- Save output artifacts, such as data and models, and reuse them in other ML workflows
- Use `dstack` to process data, train models, host apps, and launch dev environments
## Installation
Use `pip` to install `dstack` locally:
```shell
pip install dstack
```

The `dstack` CLI needs your AWS account credentials to be configured locally
(e.g. in `~/.aws/credentials` or the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables).
Before you can use the `dstack` CLI, you need to configure it:
```shell
dstack config
```

It will prompt you to select the AWS region where `dstack` will provision compute resources, and the S3 bucket where `dstack` will save data:

```
Region name (eu-west-1):
S3 bucket name (dstack-142421590066-eu-west-1):
```

Support for GCP and Azure is on the roadmap.
## How does it work?
- Install `dstack` locally
- Define ML workflows in `.dstack/workflows.yaml` (within your existing Git repository)
- Run ML workflows via the `dstack run` CLI command
- Use other `dstack` CLI commands to manage runs, artifacts, etc.
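For illustration, a workflow definition in `.dstack/workflows.yaml` might look roughly like this. The workflow name, script, artifact path, and resource values below are hypothetical, and the exact schema may differ — consult the documentation for the authoritative reference:

```yaml
workflows:
  # A hypothetical training workflow: the name, commands,
  # and paths are placeholders, not part of this README.
  - name: train
    commands:
      - pip install -r requirements.txt
      - python train.py
    artifacts:
      # Save the contents of this folder as an output artifact
      - path: model
    resources:
      gpu: 1
```

Because the file lives inside your Git repository, workflow definitions are versioned together with the code they run.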
When you run an ML workflow via the `dstack` CLI, it provisions the required compute resources (in a configured cloud account), sets up the environment (such as Python, Conda, CUDA, etc.), fetches your code, downloads dependencies, saves artifacts, and tears down compute resources.
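Assuming a workflow named `train` is defined in `.dstack/workflows.yaml` (a hypothetical name used here only for illustration), launching it from the repository root would look like:

```shell
# Run the workflow by name; dstack provisions the resources,
# executes the commands, and saves the declared artifacts
dstack run train
```

All of the provisioning and teardown described above happens automatically around this single command.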