- Sky Computing Lab, UC Berkeley
- Berkeley, CA
Highlights
- Pro
Pinned
- skypilot-org/skypilot Public
SkyPilot is a framework for easily running machine learning workloads on any cloud through a unified interface.
- lm-sys/FastChat Public
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and FastChat-T5.
- ucbrise/graphtrans Public
Representing Long-Range Context for Graph Neural Networks with Global Attention
- mit-han-lab/lite-transformer Public
[ICLR 2020] Lite Transformer with Long-Short Range Attention
- facebookresearch/fairseq Public
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
1,925 contributions in the last year
Contribution activity
June 2023
Created 27 commits in 2 repositories
Created 1 repository
- Michaelvll/instruct-eval Python
Created a pull request in skypilot-org/skypilot that received 17 comments
[Core] User ray cluster causes SkyPilot cluster in INIT state
Fixes #2019 Tested (run the relevant ones): Any manual or new tests for this PR (please specify below) Reproducible script in #2019 All smo…
Opened 20 other pull requests in 2 repositories
skypilot-org/skypilot
4 open, 15 merged
- Reorder CLIs
- [Refactor] Move status query into the cloud class
- [UX] Fix the message for the spot jobs in sky status
- [catalog] Make the price fetching more robust
- [ray] Fix the api called for placement group
- [test] Default to terminate on failure
- [Core] Fix log buffering issue
- [UX/minor] Remove unnecessary spot setup log
- [Dependency] Avoid buggy grpcio version
- [Spot] Fix spot pending status
- [Spot] Make the controller resources configurable
- [UI] Add cloud logos to Readme and docs
- [OCI] Add instructions for OCI
- [Identity] Make the identity loading more robust
- [Core] Avoid deduplication of the logs for multi-node job
- [Storage] Fix default storage selection
- [Storage] Fix the storage cloud checking before sky.check is called.
- [SCP] Format the scp check
- [GCP] Remove unsupported GPUs from the list_accelerators
skypilot-org/skypilot-catalog
1 merged
Reviewed 43 pull requests in 2 repositories
skypilot-org/skypilot
25 pull requests
- [Docs] Onprem docs merge fix
- [Refactor] Move status query into the cloud class
- UX: if a cluster becomes INIT, warn about autostop reset.
- UX: drop image_id warning, and print a hint for a corner case.
- [Docs] Mark onprem as experimental
- [Docs] Add permission setup page for the clouds
- [ray] Fix the api called for placement group
- [OCI] Support configurable boot volume size (disk_size) and performance (disk_tier)
- Speed up refresh: delay the slower ray status call & use cached IPs.
- [OCI example]: Update the OCI example task files
- [OCI fix] Nodes are not reusable if launch config changed
- UX: don't print refresh hint on status -r.
- [OCI docs] Update quota.rst
- Add docker support for SkyPilot
- [Spot] Spot job pipeline support
- [Core] Fix log buffering issue
- API: fix a possibly unbound error in core.cancel().
- [OCI fix] Add tenancy specific prefix to zone in runtime (use general catalog file)
- [Dependency] Avoid buggy grpcio version
- Prefer to obtain the ssh_user from gcloud os-login instead of assuming that the email address is the ssh_user
- Doc: add a "Cloud Administration" page.
- Make path_size_megabytes() more robust.
- [OCI] Reduce retry times by excluding unsubscribed regions
- [Spot] Fix spot pending status
- Docs: update spot controller docs in spot-jobs.rst
- Some pull request reviews not shown.
skypilot-org/skypilot-catalog
3 pull requests
Created an issue in skypilot-org/skypilot that received 2 comments
[Core] Resource leakage for sky down if a multi-node cluster is partially stopped
If a multi-node cluster is partially stopped (during autostop or manually stop the worker node), i.e. the cluster is in INIT state, our backend.tea…
Opened 15 other issues in 1 repository
skypilot-org/skypilot
8 open, 7 closed
- [Core] Fail to restart a STOPPED cluster if no capacity
- [Usage] Collect the number of CPUs as well
- [Spot pipeline] Feature requests for spot pipeline
- [Core] Logging buffered for run section
- [Core] Placement group implementation deprecated
- [UX] sky status refresh hint message is confusing when --refresh is passed
- [Core] Ray failed to start dashboard when ray start is called by sky launch
- Support migration of a stopped cluster to another region
- [100 jobs/Spot] Submitting 100 long running spot jobs should correctly queue them
- [UI] show-gpus should show the combined price for GCP GPU instance
- [Core/Internal] Improve internal exception for status refresh
- [Spot/Auth] Public key should not be uploaded to the remote VM
- [Storage] Skip clouds that do not support storage when choosing default storage cloud
- [ray] User launched ray cluster will cause cluster becoming INIT
- [Spot] Spot job with GCS store fails on a new controller





