Introduction

ExaDeploy is a system for offloading GPU and deep learning computation from client applications to remote GPUs running in your own cloud.

Core Features

☁️ Runs in your cloud: None of your models, inputs, or outputs ever leave your private network. Continue to use your cloud provider discounts.
🤝 Shared GPU resources: Models are shared between workloads and GPUs can be shared by multiple models. Automatic rebalancing and node draining maximize utilization and minimize costs.
📈 Scalable serverless deployment model: ExaDeploy autoscales based on GPU usage time. Scale down to 0 or up to thousands of GPUs.
🧠 Support for a variety of computation types: You can offload deep learning models from all major ML frameworks as well as arbitrary C++ code, CUDA kernels, and Python functions.
📚 Dynamic model registration and versioning: New models or model versions can be registered and run without having to rebuild or redeploy the system.
🚀 Point-to-point execution: Clients connect directly to remote GPUs, enabling low latency and high throughput. Clients can even store state remotely.
⚡ Asynchronous execution: ExaDeploy supports asynchronous execution of models, allowing clients to overlap local computation with remote GPU work.
🔁 Fault-tolerant remote pipelines: ExaDeploy allows clients to dynamically compose remote computations (models, preprocessing, etc.) into pipelines.
📊 Out-of-the-box monitoring: ExaDeploy provides Prometheus metrics and Grafana dashboards to visualize GPU usage and other system metrics.
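
The asynchronous execution feature above describes overlapping local computation with remote GPU work. The following is a minimal sketch of that general pattern in plain Python, not ExaDeploy's actual client API. The names `remote_infer_stub`, `preprocess`, and `run_overlapped` are hypothetical, and the remote call is simulated with a short sleep on a background thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_infer_stub(batch):
    """Hypothetical stand-in for a remote GPU call; NOT the ExaDeploy API."""
    time.sleep(0.05)  # simulate network plus GPU latency
    return [x * 2 for x in batch]


def preprocess(raw):
    """Local CPU work that can proceed while the remote call is in flight."""
    return [x + 1 for x in raw]


def run_overlapped(first_batch, next_raw):
    """Submit the remote call, do local work, then wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns a future immediately instead of blocking.
        future = pool.submit(remote_infer_stub, first_batch)
        # This local work overlaps with the simulated remote call.
        next_batch = preprocess(next_raw)
        # Block only at the point where the remote output is needed.
        result = future.result()
    return result, next_batch


if __name__ == "__main__":
    result, next_batch = run_overlapped([1, 2, 3], [10, 20, 30])
    print(result, next_batch)
```

The point of the sketch is where the wait happens: the client blocks only when it needs the remote output, so the next batch can be prepared in the meantime. A real client would replace the stub with whatever asynchronous call its library provides.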

Problems ExaDeploy Addresses

📉 Low GPU utilization: GPUs are expensive and hard to keep busy. ExaDeploy can help you optimize utilization for any workload.
💰 High total cost of ownership (TCO) for model deployment: ExaDeploy is easy to set up and uses a familiar API for model users, with no YAML, REST, or gRPC knowledge required.

Use Cases

🤖 Robotics and autonomous vehicles: Maintain one code path for on-device and cloud execution. Ensure reproducibility.
📹 Video processing: Save bandwidth and stream video to remote GPUs without having to think about fault tolerance.
🍽️ Multi-model serving: ExaDeploy can keep as many models warm as possible on as few GPUs as possible.