Powered by Docker: Streamlining Engineering Operations as a Platform Engineer
December 8, 2025 · 787 words · 4 min
As a platform engineer at a mid-size startup, I’m responsible for identifying bottlenecks and developing solutions that streamline engineering operations and keep pace with the velocity and scale of the engineering organization. In this post, I outline some of the challenges we faced with one of our clients, explain how we addressed them, and provide guides on how to tackle these challenges at your company.

One of our clients faced critical engineering challenges, including poor synchronization between development and CI/CD environments, slow incident response due to inadequate rollback mechanisms, and fragmented telemetry tools that delayed issue resolution. Siimpl implemented strategic solutions to enhance development efficiency, improve system reliability, and streamline observability, turning obstacles into opportunities for growth.

We had a requirement for multi-architecture support (arm64/amd64), which we initially implemented in CI/CD with Docker Buildx and QEMU. However, build performance dropped sharply because the emulated-architecture builds took so long. We were able to reduce build times by almost 90% by ditching QEMU (emulated builds) and targeting arm64 and amd64 self-hosted runners instead. This gave us fast native-architecture builds while still supporting multi-arch, because we publish the manifest after the fact.

The walkthrough below follows a working example of this solution, and the project includes a guide if you’d like to deploy it yourself. Because the project uses industry-standard tooling like Terraform, Kubernetes, and Helm, it can be easily adapted to any CI/CD or cloud solution you need.

The secret sauce of this solution is provisioning the self-hosted runners in a way that allows our CI/CD to specify which architecture to execute the build on. The first step is to provision two node pools, one amd64 and one arm64; both are defined in the example project.
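To make the "build natively, publish the manifest after the fact" idea concrete, here is a minimal sketch in Python that only assembles the Docker commands involved. It is not the project's actual pipeline. The per-architecture tag suffix (`-amd64`, `-arm64`), the image name `registry.example.com/app`, and the helper names are all assumptions made for illustration. The only real CLI surface it relies on is `docker buildx build` with `--platform`, `--tag`, and `--push`, and `docker buildx imagetools create` with `--tag`.

```python
from __future__ import annotations

import shlex

# The two architectures the post targets with native self-hosted runners.
ARCHITECTURES = ("amd64", "arm64")


def arch_tag(image: str, tag: str, arch: str) -> str:
    """Per-architecture image reference (the suffix convention is an assumption)."""
    return f"{image}:{tag}-{arch}"


def native_build_command(image: str, tag: str, arch: str, context: str = ".") -> list[str]:
    """Command a runner of the matching architecture would run: no QEMU involved."""
    return [
        "docker", "buildx", "build",
        "--platform", f"linux/{arch}",
        "--tag", arch_tag(image, tag, arch),
        "--push",
        context,
    ]


def manifest_command(image: str, tag: str) -> list[str]:
    """Command that stitches the per-arch images into one multi-arch tag."""
    sources = [arch_tag(image, tag, arch) for arch in ARCHITECTURES]
    return ["docker", "buildx", "imagetools", "create", "--tag", f"{image}:{tag}", *sources]


if __name__ == "__main__":
    image, tag = "registry.example.com/app", "abc123"  # hypothetical values
    for arch in ARCHITECTURES:
        print(shlex.join(native_build_command(image, tag, arch)))
    print(shlex.join(manifest_command(image, tag)))
```

Each per-architecture build is meant to run on a runner of the same architecture, and the manifest step can run anywhere once both images are pushed. Keeping the two stages separate is what lets the builds run in parallel at native speed.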
In the example project, the node_count is fixed at 1 for both node pools, but for better scalability and flexibility you can also enable autoscaling for a dynamic pool.

Next, we need to update the self-hosted runners’ configuration to expose a configurable nodeSelector. This allows us to deploy one runner scale set to the amd64 node pool and one to the arm64 node pool.

Once the Terraform resources are successfully created, the runners should be registered to the organization or repository you specified in the GitHub config URL. We can now update the REGISTRY values in both build workflows. After creating a pull request with those changes, open its workflow runs to see the results. You should see two jobs kick off, one using the emulated build path with QEMU, and the other using the self-hosted runners for native node builds.

Depending on cache hits and the Dockerfile being built, the performance improvement can be up to 90%. Even with this substantial improvement, utilizing Docker Build Cloud can improve performance by 95%. More importantly, you can reap the benefits during development builds! Take a look at the workflow for more details. All you need is a Docker Build Cloud subscription and a cloud driver to take advantage of the improved pipeline.

To deploy the example yourself, the steps are:
- Generate GitHub PAT
- Update the
- Initialise AZ CLI
- Deploy Cluster
- Create a PR to validate pipelines

Recognizing that deployment issues can arise unexpectedly, we needed a mechanism to quickly and reliably roll back production deployments. We built a rollback workflow around the tagging strategy we had implemented.

As we adopted OpenTelemetry to standardize observability, we quickly realized that adoption itself was one of the toughest hurdles. As a team, we decided to bake as much configuration as possible into the infrastructure (Terraform modules) so that we could easily distribute and maintain observability instrumentation. At the application level, configuring the auto-instrumentation posed a challenge because most applications varied in their build process. By leveraging multi-stage Dockerfiles, we were able to standardize the way we initialized the auto-instrumentation libraries across microservices. We were primarily a Node.js shop, so our reference Dockerfile targets Node.js.

By addressing these challenges we were able to reduce build times, which alone lowered two of our DORA metrics. With the rollback strategy and telemetry changes, we were able to reduce our Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR). We believe both could come down further with tuning of alerts and the addition of runbooks (automated and manual).

With Docker at the core, Siimpl.io’s solutions demonstrate how teams can build faster, more reliable, and scalable systems. Whether you’re optimizing CI/CD pipelines, enhancing telemetry, or ensuring secure rollbacks, Docker provides the foundation for success. Try Docker today to unlock new levels of developer productivity and operational efficiency.