ExaDeploy is a system that offloads GPU and deep learning computation.

Core Features

  • ☁️ Runs in your cloud: None of your models, inputs, or outputs ever leave your private network. Continue to use your cloud provider discounts.
  • 🤝 Shared GPU resources: Models are shared between workloads and GPUs can be shared by multiple models. Automatic rebalancing and node draining maximize utilization and minimize costs.
  • 📈 Scalable serverless deployment model: ExaDeploy autoscales based on GPU usage time. Scale down to 0 or up to thousands of GPUs.
  • 🧠 Support for a variety of computation types: You can offload deep learning models from all major ML frameworks as well as arbitrary C++ code, CUDA kernels, and Python functions.
  • 📚 Dynamic model registration and versioning: New models or model versions can be registered and run without having to rebuild or redeploy the system.
  • 🚀 Point-to-point execution: Clients connect directly to remote GPUs, enabling low latency and high throughput. They can even store state remotely.
  • Asynchronous execution: ExaDeploy supports asynchronous execution of models, allowing clients to overlap local computation with remote GPU work.
  • 🔁 Fault-tolerant remote pipelines: ExaDeploy allows clients to dynamically compose remote computations (models, preprocessing, etc.) into pipelines.
  • 📊 Out-of-the-box monitoring: ExaDeploy provides Prometheus metrics and Grafana dashboards to visualize GPU usage and other system metrics.
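
To make the asynchronous execution feature concrete, here is a minimal sketch of the general pattern: submit remote work, keep computing locally, and block only when the result is needed. It uses only the Python standard library. The names `run_remote_model`, `preprocess_locally`, and `overlapped` are hypothetical stand-ins for illustration and are not part of the ExaDeploy API; the remote call is simulated with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_remote_model(batch):
    """Hypothetical stand-in for a remote GPU call (NOT the ExaDeploy API)."""
    time.sleep(0.05)  # simulate network and GPU latency
    return [x * 2 for x in batch]


def preprocess_locally(batch):
    """Local CPU work that can run while the remote call is in flight."""
    return [x + 1 for x in batch]


def overlapped(batch_for_model, next_batch):
    """Overlap a (simulated) remote call with local preprocessing."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns a future immediately instead of blocking.
        future = pool.submit(run_remote_model, batch_for_model)
        # This local work proceeds while the remote call is pending.
        prepared = preprocess_locally(next_batch)
        # Block only at the point where the remote result is required.
        outputs = future.result()
    return outputs, prepared


outputs, prepared = overlapped([1, 2, 3], [4, 5, 6])
print(outputs, prepared)
```

With this structure, the total time is roughly the longer of the two tasks rather than their sum, which is the benefit the asynchronous execution bullet describes.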

Example use cases

Specific business pains

  • 📉 Low GPU utilization: GPUs are expensive and hard to keep busy. ExaDeploy can help you optimize utilization for any workload.
  • 💰 High TCO for model deployment: ExaDeploy is easy to set up and gives model users a familiar API, with no YAML, REST, or gRPC knowledge required.

Specific industries

  • 🤖 Robotics and autonomous vehicles: Maintain one code path for both on-device and cloud execution. Ensure reproducibility.
  • 📹 Video processing: Save bandwidth and stream video to remote GPUs without having to think about fault tolerance.
  • 🍽️ Multi-model serving: ExaDeploy can keep the largest number of models warm on the fewest GPUs.