How to Deploy GPU-Accelerated Applications on Amazon ECS with Docker Compose
December 8, 2025 · 1963 words · 10 min
- Many applications can take advantage of GPU acceleration, in particular resource-intensive Machine Learning (ML) applications. The development time of such applications may vary based on the hardware of the machine we use for development. Containerization will facilitate development due to reproducibility and will make the setup easily transferable to other machines. Most importantly, a containerized application is easily deployable to platforms such as Amazon ECS, where it can take advantage of different hardware configurations.
- In this tutorial, we discuss how to develop GPU-accelerated applications in containers locally and how to use Docker Compose to easily deploy them to the cloud (the Amazon ECS platform). We make the transition from the local environment to the cloud effortless: the GPU-accelerated application is packaged with all its dependencies in a Docker image and deployed in the same way regardless of the target environment.
- In order to follow this tutorial, we need Docker and Docker Compose installed locally, as well as access to an AWS account for the deployment part.
- For deploying to a cloud platform, we rely on the new Docker Compose implementation embedded into the `docker` binary. Therefore, when targeting a cloud platform we are going to run `docker compose` commands instead of `docker-compose`. For local commands, both implementations of Docker Compose should work. If you find a missing feature that you use, report it on the project's issue tracker.
- Keep in mind that what we want to showcase is how to structure and manage a GPU accelerated application with Docker Compose, and how we can deploy it to the cloud. We do not focus on GPU programming or the AI/ML algorithms, but rather on how to structure and containerize such an application to facilitate portability, sharing and deployment.
- For this tutorial, we rely on sample code provided in the TensorFlow documentation to simulate a GPU-accelerated translation service that we can orchestrate with Docker Compose. The original code is documented in the TensorFlow tutorials.
- For this exercise, we have reorganized the code such that we can easily manage it with Docker Compose.
- This sample uses the TensorFlow platform, which can automatically use GPU devices if available on the host. Next, we discuss how to organize this sample into services to containerize them easily, and what the challenges are when we run such a resource-intensive application locally.
- The sample code used throughout this tutorial needs to be downloaded locally in order to exercise the commands we are going to discuss.
- Let’s assume we want to build and deploy a service that can translate simple sentences to a language of our choice. For such a service, we need to train an ML model to translate from one language to another and then use this model to translate new inputs.
- We choose to separate the phases of the ML process into two different Compose services: a training service that trains the model and checkpoints it periodically, and a translator service that uses the latest trained model to serve translation requests.
- This structure is defined in the Compose file from the downloaded sample application, which has the following content:
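A minimal sketch of such a file, assuming illustrative service names (`training`, `translator`), build context directories, a shared volume name (`models`), and a published port, could look like this:

```yaml
services:
  training:
    build: training            # trains the model, checkpoints it periodically
    volumes:
      - models:/checkpoints    # named volume shared with the translator service
  translator:
    build: translator          # serves translations from the latest checkpoint
    volumes:
      - models:/checkpoints
    ports:
      - 5000:5000              # published port to query the service easily
volumes:
  models:
```

All names and paths here are assumptions for illustration; the downloaded sample defines the authoritative version.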
- We want the training service to train a model to translate from English to French and to save this model to a named volume that is shared between the two services. The translator service has a published port to allow us to query it easily.
- The reason for starting with the simplified compose file is that it can be deployed locally whether a GPU is present or not. We will see later how to add the GPU resource reservation to it.
- Before deploying, rename the development Compose file to `docker-compose.yaml` to avoid setting the file path with the `--file` flag for every Compose command.
- To deploy the Compose file, all we need to do is open a terminal, go to its base directory and run:
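Assuming the file has been renamed to the default `docker-compose.yaml`, the deployment is a single command:

```shell
docker compose up --build
```

Either implementation of Docker Compose works locally; with the standalone binary the equivalent is `docker-compose up --build`.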
- Docker Compose deploys a container for each service and attaches us to their logs which allows us to follow the progress of the training service.
- Every 10 cycles (epochs), the training service requests the translator to reload its model from the last checkpoint. If the translator is queried before the first training phase (10 cycles) is completed, we get a message saying that no trained model is available yet.
- From the logs, we can see that each training cycle is resource-intensive and may take a long time, depending on the parameter setup of the ML algorithm.
- The training service runs continuously and checkpoints the model periodically to a named volume shared between the two services.
- We can now query the translator service which uses the trained model:
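Assuming the translator publishes port 5000 and accepts the sentence as a query parameter (both are illustrative and depend on the sample code), a query could look like:

```shell
curl "http://localhost:5000/?sentence=Hello%20world"
```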
- Keep in mind that, for this exercise, we are not concerned about the accuracy of the translation but how to set up the entire process following a service approach that will make it easy to deploy with Docker Compose.
- During development, we may have to re-run the training process and evaluate it each time we tweak the algorithm. This is a very time-consuming task if we do not use development machines built for high performance.
- An alternative is to use on-demand cloud resources. For example, we could use cloud instances hosting GPU devices to run the resource-intensive components of our application. Running our sample application on a machine with access to a GPU will automatically switch to train the model on the GPU. This will speed up the process and significantly reduce the development time.
- The first step to deploy this application to faster cloud instances is to pack it as a Docker image and push it to Docker Hub, where cloud instances can access it.
- During the deployment with `docker compose up`, the application is packed as a Docker image which is then used to create the containers. We need to tag the built images and push them to Docker Hub.
- A simple way to do this is by setting the image property for the services in the Compose file. Previously, we had only set the build property for our services and had no image defined. Docker Compose requires at least one of these two properties to be defined in order to deploy the application.
- We set the image property following the pattern `account/name:tag`, where the tag is optional and defaults to `latest`. As an example, we take a Docker Hub account ID and an application name. Edit the Compose file and set the image property for the two services as below:
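With a hypothetical account ID `myhubuser` and application name `gpudemo`, the result could look like this (build contexts are illustrative, and we assume both services share one image):

```yaml
services:
  training:
    build: training
    image: myhubuser/gpudemo   # <accountID>/<name>, tag defaults to latest
  translator:
    build: translator
    image: myhubuser/gpudemo
```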
- To build the images run:
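Using the Compose implementation embedded in the Docker CLI:

```shell
docker compose build
```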
- Notice the image has been named according to what we set in the Compose file.
- Before pushing this image to Docker Hub, we need to make sure we are logged in. For this we run:
- Push the image we built:
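Together, the login and push steps look like this (the push targets whatever account ID the image property references):

```shell
docker login
docker compose push
```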
- The image pushed is public unless we set it to private in Docker Hub's repository settings. The Docker Hub documentation covers this in more detail.
- With the image stored in a public image registry, we will look now at how we can use it to deploy our application on Amazon ECS and how we can use GPUs to accelerate it.
- To deploy the application to Amazon ECS, we need to have credentials for accessing an AWS account and to have the Docker CLI set to target the platform.
- Let’s assume we have a valid set of AWS credentials that we can use to connect to AWS services. We need now to create an ECS Docker context to redirect all Docker CLI commands to Amazon ECS.
- To create an ECS context run the following command:
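Using an illustrative context name `myecscontext`:

```shell
docker context create ecs myecscontext
```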
- This prompts us with three options, depending on our familiarity with the AWS credentials setup.
- For this exercise, to skip the details of AWS credential setup, we choose the first option. This requires us to have `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` set in our environment when running Docker commands that target Amazon ECS.
- We can now run Docker commands with the context flag set on every command targeting the platform, or we can make the new context the one in use to avoid setting the flag on each command.
- Set the context we created previously as the context in use by running:
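Assuming the context was named `myecscontext` when it was created:

```shell
docker context use myecscontext
```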
- Starting from here, all the subsequent Docker commands are going to target Amazon ECS. To switch back to the default context targeting the local environment, we can run the following:
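Switching back uses the same subcommand with the default context:

```shell
docker context use default
```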
- For the following commands, we keep the ECS context as the current context in use. We can now run a command to check that we can successfully access ECS.
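One simple check is to list the services of the (not yet deployed) Compose application; if the credentials and context are set up correctly, the command succeeds and returns an empty list:

```shell
docker compose ps
```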
- Before deploying the application to Amazon ECS, let's have a look at how to update the Compose file to request GPU access for the training service. The Compose specification describes a way to define GPU reservations. In the next section, we cover the new format supported by the local Compose implementation as well as the legacy `docker-compose` format.
- TensorFlow can make use of NVIDIA GPUs with CUDA compute capabilities to speed up computations. To reserve NVIDIA GPUs, we edit the `docker-compose.yaml` that we defined previously and add the deploy property under the training service as follows:
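Using the device reservation format from the Compose specification, a reservation of 2 NVIDIA GPUs and 32 GB of memory (the values discussed below) can be expressed as:

```yaml
services:
  training:
    deploy:
      resources:
        reservations:
          memory: 32Gb
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
```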
- For this example, we defined a reservation of 2 NVIDIA GPUs and 32 GB of memory dedicated to the container. We can tweak these parameters according to the resources of the machine we target for deployment. If our local dev machine hosts an NVIDIA GPU, we can tweak the reservation accordingly and deploy the Compose file locally. Ensure you have installed the NVIDIA container runtime and set up the Docker Engine to use it before deploying the Compose file.
- We focus in the next part on how to make use of GPU cloud instances to run our sample application.
- Note: We assume the image we pushed to Docker Hub is public. If so, there is no need to authenticate in order to pull it (unless we exceed the pull rate limit). For images that need to be kept private, we need to define the `x-aws-pull_credentials` service property with a reference to the credentials to use for authentication. Details on how to set it can be found in the Docker documentation.
- Export the AWS credentials to avoid setting them for every command.
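The variable names are the standard ones read by AWS tooling; the values below are placeholders:

```shell
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
```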
- When deploying the Compose file, Docker Compose will also reserve an EC2 instance with GPU capabilities that satisfies the reservation parameters. In the example we provided, we ask to reserve an instance with 32 GB of memory and 2 NVIDIA GPUs, and Docker Compose matches this reservation with an instance type that satisfies the requirement. Before setting the reservation property in the Compose file, we recommend checking the Amazon GPU instance types and setting the reservation accordingly. Ensure you are targeting an Amazon region that contains such instances.
- Aside from the ECS containers, we will have a `g4dn.12xlarge` EC2 instance reserved. Before deploying to the cloud, check the Amazon documentation for the resource cost this will incur.
- To deploy the application, we run the same command as in the local environment, then check the status of the services and query the exposed translator endpoint. We notice the same behaviour as in the local deployment (the model reload has not been triggered yet by the training service).
- Checking the logs for the GPU devices TensorFlow detected, we can easily identify the 2 GPU devices we reserved, and see that the training is almost 10X faster than our CPU-based local training.
- The training service runs continuously and triggers the model reload on the translation service every 10 cycles (epochs). Once the translation service has been notified at least once, we can stop and remove the training service, releasing the GPU instance, at any time we choose. We can easily do this by removing the service from the Compose file and then running `docker compose up` again to update the running application. This applies the changes and removes the training service.
- We can list the running services to see that the training service has been removed and only the translator one remains. We can then query the translator again and, when we are done, remove the application from Amazon ECS.
- We discussed how to set up a resource-intensive ML application to make it easily deployable in different environments with Docker Compose. We exercised how to define the use of GPUs in a Compose file and how to deploy it on Amazon ECS.
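The cloud-side workflow above can be sketched as the following command sequence (output omitted; the translator endpoint and port in the query are illustrative):

```shell
# Deploy to Amazon ECS with the same command used locally
docker compose up

# Check the status of the services
docker compose ps

# Query the exposed translator endpoint (hypothetical address)
curl "http://<translator-endpoint>:5000/?sentence=Hello%20world"

# Follow the logs to see the GPU devices TensorFlow detected
docker compose logs

# After removing the training service from the Compose file,
# apply the change to the running application
docker compose up

# Finally, remove the application from Amazon ECS
docker compose down
```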