Unlocking Local AI on Any GPU: Docker Model Runner Now with Vulkan Support
December 8, 2025 · 400 words · 2 min
Running large language models (LLMs) on your local machine is one of the most exciting frontiers in AI development. At Docker, our goal is to make this process as simple and accessible as possible. That’s why we built Docker Model Runner, a tool to help you download and run LLMs with a single command. Until now, hardware support for inferencing with Model Runner was limited to CPU, NVIDIA GPUs (via CUDA), and Apple Silicon (via Metal). Today, we’re thrilled to announce a major step forward in democratizing local AI: Docker Model Runner now supports Vulkan. This means you can now leverage hardware acceleration for LLM inferencing on a much wider range of GPUs from AMD, Intel, and other vendors that support the Vulkan API.

So, what’s the big deal about Vulkan? Vulkan is a modern, cross-platform graphics and compute API. Unlike CUDA, which is specific to NVIDIA GPUs, or Metal, which is for Apple hardware, Vulkan is an open standard that works across a huge range of graphics cards. This means that if you have a modern GPU from AMD or Intel, or even an integrated GPU on your laptop, you can now get a massive performance boost for your local AI workloads. By integrating Vulkan (thanks to our underlying llama.cpp engine), we’re unlocking GPU-accelerated inferencing for a much broader community of developers and enthusiasts. More hardware, more speed, more fun!

The best part? You don’t need to do anything special to enable it. We believe in convention over configuration: Docker Model Runner automatically detects compatible Vulkan hardware and uses it for inferencing. If a Vulkan-compatible GPU isn’t found, it seamlessly falls back to the CPU.

Ready to give it a try? Just run the following command in your terminal. This command will:

- Pull the Gemma 3 model.
- Detect whether you have a Vulkan-compatible GPU with the necessary drivers installed.
- Run the model, using your GPU to accelerate inferencing.

It’s that simple. You can now chat with a powerful LLM running directly on your own machine, faster than ever.
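The command itself was not preserved in this text; a minimal sketch of the steps above, assuming the standard `docker model run` CLI and the `ai/gemma3` model tag on Docker Hub (both assumptions, not taken from the post):

```shell
# Pull (if needed) and run Gemma 3 locally; a Vulkan-compatible GPU is
# used automatically when detected, otherwise inferencing falls back to CPU.
# Note: the ai/gemma3 tag is an assumption, not preserved in the post text.
docker model run ai/gemma3
```

Once the model is running, you can chat with it interactively from the same terminal session.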
Docker Model Runner is an open-source project, and we’re building it in the open with our community. Your contributions are vital as we expand hardware support and add new features. Head over to our GitHub repository to get involved: please star the repo to show your support, fork it to experiment, and consider contributing back with your own improvements.