Add JarvisLabs backend by peterschmidt85 · Pull Request #3875 · dstackai/dstack

peterschmidt85 · 2026-05-12T20:18:19Z

Adds JarvisLabs as a dstack backend.

Implementation notes:

Adds backend registration, config models, configurator, API client, compute implementation, docs, and backend tests.
Uses the JarvisLabs provider from gpuhunt for offer selection. This branch depends on gpuhunt#231 (Add JarvisLabs provider) until the provider is released.
Supports JarvisLabs VM workloads only. GPU VMs and CPU VMs use separate JarvisLabs create/destroy APIs.
Supports GPU spot by passing the selected offer's spot flag to JarvisLabs GPU VM creation. CPU spot is not emitted by gpuhunt and is not supported.
Does not select a JarvisLabs template or custom image; provisioning uses the provider default VM image.
Validates configured regions against gpuhunt's JarvisLabs supported-region map and fails closed if an unsupported region reaches a regional API call.
Registers the project SSH key in JarvisLabs before creating an instance.
Starts the dstack shim over SSH and persists hostname only after shim startup succeeds, so provisioning can retry after a server restart.
Maps immediate and delayed JarvisLabs create capacity failures to NoCapacityError and destroys any failed machine id returned by JarvisLabs before retrying another offer. Non-capacity failed create status raises ProvisioningError. After a VM is running, interruption/unreachability is handled by the generic VM health path, as with other VM backends.
Wraps JarvisLabs request failures and malformed success responses as BackendError instead of leaking raw transport/JSON exceptions.
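
The capacity-failure handling described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the exception classes are local stand-ins for the dstack error types named above, and `CreateResult`, `CAPACITY_MARKERS`, `is_capacity_failure`, `create_or_raise`, and the client's `create`/`destroy` methods are hypothetical names introduced only for the example.

```python
from dataclasses import dataclass
from typing import Optional


# Local stand-ins for the dstack error types named in the description;
# the real classes live in dstack and may have a different hierarchy.
class BackendError(Exception):
    pass


class NoCapacityError(BackendError):
    pass


class ProvisioningError(BackendError):
    pass


@dataclass
class CreateResult:
    # Hypothetical shape of a create response, not the JarvisLabs schema.
    machine_id: Optional[str]
    status: str  # e.g. "running" or "failed"
    reason: str = ""


# Hypothetical markers; the real capacity signals are not given in the description.
CAPACITY_MARKERS = ("no capacity", "out of stock")


def is_capacity_failure(reason: str) -> bool:
    """Return True if the failure reason looks like a capacity problem."""
    lowered = reason.lower()
    return any(marker in lowered for marker in CAPACITY_MARKERS)


def create_or_raise(client, offer) -> Optional[str]:
    """Create a VM for the offer, or raise a normalized error on failure."""
    result = client.create(offer)
    if result.status != "failed":
        return result.machine_id
    if is_capacity_failure(result.reason):
        # Destroy any machine id left by the failed attempt so the caller
        # can move on to another offer without leaking resources.
        if result.machine_id is not None:
            client.destroy(result.machine_id)
        raise NoCapacityError(result.reason)
    # Any other failed create status is a provisioning error.
    raise ProvisioningError(result.reason)
```

A caller iterating over offers would catch `NoCapacityError` and try the next offer, while letting `ProvisioningError` propagate.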

E2E validation:

CPU on-demand task provisioned and completed on JarvisLabs.
L4 GPU on-demand task provisioned and completed CUDA tensor matmul on the GPU.
H100 GPU spot task provisioned with JarvisLabs is_spot: true and completed CUDA tensor matmul on the GPU.
Requested 120GB/200GB disks were visible inside containers in the live disk checks.
Server restart was tested while JarvisLabs runs were active; provisioning resumed instead of losing the run.
L4 spot no-capacity was observed from JarvisLabs and handled as a capacity failure.

Added tests cover config validation, API payloads, API error normalization, spot flag propagation, region failure behavior, capacity-failure mapping and cleanup, CPU/GPU provisioning data, disk sizing, SSH username parsing, termination routing, and restart-safe hostname persistence.
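
For reviewers, the API error normalization mentioned above could look something like the sketch below. It is an assumption-laden illustration rather than the PR's client: `BackendError` is a local stand-in for the dstack class, `call_api` and its `send` callable are hypothetical, and the `machine_id` field is an invented example of a required response key.

```python
import json
from typing import Any, Callable, Dict


class BackendError(Exception):
    """Local stand-in for dstack's BackendError."""


def call_api(send: Callable[[], str]) -> Dict[str, Any]:
    """Run a request and return its parsed body, raising only BackendError."""
    try:
        body = send()
    except OSError as e:
        # Connection and timeout errors are OSError subclasses in Python.
        raise BackendError(f"JarvisLabs request failed: {e}") from e
    try:
        data = json.loads(body)
    except json.JSONDecodeError as e:
        raise BackendError("JarvisLabs returned invalid JSON") from e
    # Treat a well-formed but incomplete success response as an error too.
    # "machine_id" is a hypothetical required field used for illustration.
    if not isinstance(data, dict) or "machine_id" not in data:
        raise BackendError("JarvisLabs response is missing machine_id")
    return data
```

Keeping the transport call behind a plain callable makes each failure mode easy to exercise in a unit test without a network.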

peterschmidt85 force-pushed the jarvislabs branch 4 times, most recently from c8850b2 to 3ad620f on May 12, 2026 21:01

Add JarvisLabs backend

8861fe2

peterschmidt85 force-pushed the jarvislabs branch from 3ad620f to 8861fe2 on May 12, 2026 21:06

peterschmidt85 marked this pull request as ready for review on May 12, 2026 21:17

peterschmidt85 requested a review from jvstme on May 12, 2026 21:17

Document JarvisLabs VM startup script behavior

760dbd7

peterschmidt85 wants to merge 2 commits into master from jarvislabs
