ExaDeploy is a system that offloads GPU and deep learning computation.
- ☁️ Runs in your cloud: None of your models, inputs, or outputs ever leave your private network. Continue to use your cloud provider discounts.
- 🤝 Shared GPU resources: Models are shared between workloads and GPUs can be shared by multiple models. Automatic rebalancing and node draining maximize utilization and minimize costs.
- 📈 Scalable serverless deployment model: ExaDeploy autoscales based on GPU usage time. Scale down to 0 or up to thousands of GPUs.
- 🧠 Support for a variety of computation types: You can offload deep learning models from all major ML frameworks as well as arbitrary C++ code, CUDA kernels, and Python functions.
- 📚 Dynamic model registration and versioning: New models or model versions can be registered and run without having to rebuild or redeploy the system.
- 🚀 Point-to-point execution: Clients connect directly to remote GPUs, enabling low latency and high throughput. They can even store state remotely.
- ⚡ Asynchronous execution: ExaDeploy supports asynchronous execution of models, allowing clients to parallelize local computation with remote GPU work.
- 🔁 Fault-tolerant remote pipelines: ExaDeploy allows clients to dynamically compose remote computations (models, preprocessing, etc.) into pipelines.
- 📊 Out-of-the-box monitoring: ExaDeploy provides Prometheus metrics and Grafana dashboards to visualize GPU usage and other system metrics.
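
The asynchronous execution pattern in the list above, overlapping local computation with remote GPU work, can be sketched in plain Python. This is only an illustration of the general idea and does not use the ExaDeploy client API: `run_remote_model` is a hypothetical stand-in that sleeps to simulate a remote call, and `preprocess` and `pipelined_inference` are likewise assumed names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_remote_model(batch):
    """Hypothetical stand-in for a remote GPU call (not the ExaDeploy API)."""
    time.sleep(0.05)  # simulate network and GPU latency
    return [x * 2 for x in batch]


def preprocess(raw):
    """Hypothetical local CPU work, done while a remote call is in flight."""
    return [x + 1 for x in raw]


def pipelined_inference(raw_batches):
    """Overlap local preprocessing of the next batch with remote execution of the previous one."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for raw in raw_batches:
            batch = preprocess(raw)  # local work overlaps the in-flight call
            if pending is not None:
                results.append(pending.result())  # wait for the previous remote call
            pending = pool.submit(run_remote_model, batch)  # returns immediately
        if pending is not None:
            results.append(pending.result())  # collect the final batch
    return results


if __name__ == "__main__":
    print(pipelined_inference([[1, 2], [3, 4], [5, 6]]))
```

Because `submit` returns a future immediately, the client is free to prepare the next batch while the simulated remote call runs, and it blocks only when it needs the result.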
Example use cases
Specific business pains
- 📉 Low GPU utilization: GPUs are expensive and hard to keep busy. ExaDeploy can help you optimize utilization for any workload.
- 💰 High TCO for model deployment: ExaDeploy is easy to set up and uses a familiar API for model users — no YAML, REST, or gRPC knowledge required.
- 🤖 Robotics and autonomous vehicles: Maintain one code path for on-device and cloud execution. Ensure reproducibility.
- 📹 Video processing: Save bandwidth and stream video to remote GPUs without having to think about fault tolerance.
- 🍽️ Multi-model serving: ExaDeploy can keep the most models warm on the fewest GPUs.
- 💬 Join the Exafunction Community Slack to ask questions and get help.
- 📝 Check out our blog to see what we've been thinking about and working on.