Attach multiple Elastic Inference accelerators to a single EC2 instance

Posted on: Dec 12, 2019

You can now attach multiple Amazon Elastic Inference accelerators to a single Amazon EC2 instance. With this capability, a single EC2 instance in an Auto Scaling group can serve inference for multiple models. By attaching multiple accelerators to one instance, you avoid deploying separate Auto Scaling groups of CPU or GPU instances for inference and lower your operating costs.
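
As a concrete illustration, the sketch below launches an instance with two accelerators attached, using the boto3 `run_instances` call and its `ElasticInferenceAccelerators` parameter. The AMI, subnet ID, and region are placeholders, not values from this announcement.

```python
# Minimal sketch: launch one EC2 instance with two Elastic Inference
# accelerators attached. All resource IDs below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # placeholder: e.g. a Deep Learning AMI
    InstanceType="c5.xlarge",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
    # Attach two eia2.medium accelerators to this instance.
    ElasticInferenceAccelerators=[{"Type": "eia2.medium", "Count": 2}],
)
print("Launched:", response["Instances"][0]["InstanceId"])
```

For brevity, the sketch omits the networking prerequisites Elastic Inference needs, such as a security group and a VPC endpoint for the Elastic Inference service.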

Amazon Elastic Inference lets you attach just the right amount of GPU-powered acceleration to any Amazon EC2 instance, reducing the cost of running deep learning inference by up to 75%. Because your models may require different amounts of GPU memory and compute capacity, you can choose different Elastic Inference accelerator sizes to attach to your CPU instance. For faster response times, you can load your models onto an accelerator once and keep making inference calls without unloading them.
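For example, with the Elastic Inference-enabled build of Apache MXNet, each attached accelerator can be addressed individually, so one model can stay resident on each. This is a hedged sketch: it assumes the EI MXNet build exposes an `mx.eia(device_id)` context analogous to `mx.gpu(device_id)`, and the checkpoint prefix and input shape are illustrative placeholders.

```python
# Hedged sketch using Elastic Inference-enabled Apache MXNet, assuming two
# accelerators are attached and mx.eia(device_id) selects one by ordinal.
import mxnet as mx

# Load a checkpoint once and bind it to the first accelerator; a second
# model could be bound to mx.eia(1) the same way. The checkpoint prefix
# and input shape are placeholders.
sym, arg_params, aux_params = mx.model.load_checkpoint("resnet-152", 0)

mod = mx.mod.Module(symbol=sym, context=mx.eia(0), label_names=None)
mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)

# Repeated inference calls reuse the model already loaded on the
# accelerator, so nothing is reloaded between requests.
batch = mx.io.DataBatch([mx.nd.zeros((1, 3, 224, 224))])
mod.forward(batch, is_train=False)
print(mod.get_outputs()[0].shape)
```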

Adding multiple accelerators to an EC2 instance is supported in all the regions where Amazon Elastic Inference is available. For more information on attaching multiple accelerators to a single instance, see Using TensorFlow Models with Elastic Inference and Using MXNet Models with Elastic Inference.