Run Embedding Models and Unlock Semantic Search with Docker Model Runner
December 8, 2025 · 984 words · 5 min
- Embeddings have become the backbone of many modern AI applications. From semantic search to retrieval-augmented generation (RAG) and intelligent recommendation systems, embedding models enable systems to understand the meaning behind text, code, or documents, not just the literal words.
- But generating embeddings comes with trade-offs. Using a hosted API for embedding generation often results in reduced data privacy, higher per-call costs, and time-consuming embedding regeneration. When your data is private or constantly evolving (think internal documentation, proprietary code, or customer support content), these limitations quickly become blockers.
- Instead of sending data to a remote service, you can easily run local embedding models on-premises with Docker Model Runner. Model Runner brings the power of modern embeddings to your local environment, giving you privacy, control, and cost-efficiency out of the box.
- In this post, you’ll learn how to use embedding models for semantic search. We’ll start by covering the theory behind embeddings and why developers should run embedding models locally. Then, we’ll wrap up with a practical example, using Model Runner, to help you get started.
- Let’s take a moment to first demystify what embeddings are.
- Embeddings represent words, sentences, and even code as high-dimensional numerical vectors that capture semantic relationships. In this vector space, similar items cluster together, while dissimilar ones are farther apart.
- For example, a traditional keyword search looks for exact matches. If you search for “
- ”, you’ll only find documents containing that exact term. But with embeddings, searching for “
- ” might also surface results about authentication, session management, or security tokens because the model understands that these are semantically related ideas.
- This makes embeddings the foundation for more intelligent search, retrieval, and discovery, where systems understand what you mean, not just what you type.
- For a deeper perspective on how language and meaning intersect in AI, check out “
- .
- Here’s where the math behind semantic search comes in, and it’s elegantly simple.
- Once text is converted into vectors (lists of numbers), we can measure how similar two pieces of text are using cosine similarity:
- $$\text{similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$
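- Where:
- $A \cdot B$ is the dot product of the two embedding vectors
- $\|A\|$ and $\|B\|$ are the magnitudes (Euclidean norms) of the vectors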
- The result is a similarity score. Cosine similarity ranges from -1 to 1, and for text embeddings it typically falls between 0 and 1, where values closer to 1 mean the texts are more similar in meaning.
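To make the formula concrete, here is a minimal Python sketch of cosine similarity using only the standard library. The function name and the toy three-dimensional vectors are illustrative assumptions; real embedding vectors have hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return A.B / (||A|| * ||B||) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, made up for illustration only.
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]  # a scaled copy of the query
orthogonal = [0.0, 0.0, 5.0]      # shares no direction with the query

print(cosine_similarity(query, same_direction))  # approximately 1.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

Because the score depends only on direction, scaling a vector does not change its similarity to the query. That is why the scaled copy scores approximately 1.0 even though its length differs.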
- In practice:
- This simple mathematical measure allows you to rank documents by how semantically close they are to your query, which powers features like:
- With Model Runner, you can generate these embeddings locally, feed them into a vector database (like Milvus, Qdrant, or pgvector), and start building your own semantic search system without sending a single byte to a third-party API.
- With Model Runner, you don’t have to worry about setting up environments or dependencies. Just pull a model, start the runner, and you’re ready to generate embeddings, all inside a familiar Docker workflow.
- Your sensitive data never leaves your environment. Whether you’re embedding source code, internal documents, or customer content, you can rest assured that everything stays local — no third-party API calls, no network exposure.
- There are no usage-based API costs. Once you have the
- , you can generate, update, or rebuild your embeddings as often as you need, at no extra cost.
- That means iterating on your dataset or experimenting with new prompts won’t affect your budget.
- Run the model that best fits your use case, leveraging your own CPU or GPU for inference.
- Models are distributed as OCI artifacts, so they integrate seamlessly into your existing Docker workflows, CI/CD pipelines, and local development setups. This means you can manage and version models just like any other container image, ensuring consistency and reproducibility across environments.
- Model Runner lets you bring models to your data, not the other way around, unlocking local, private, and cost-effective AI workflows.
- Now that we understand what embeddings are and how they capture semantic meaning, let’s see how simple it is to generate embeddings locally using Model Runner.
- You can now send text to this endpoint via curl or your preferred HTTP client:
- The response will include a list of embedding vectors, each a numerical representation of the corresponding input text.
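For readers who prefer Python to curl, the request can be sketched with the standard library alone. Two things here are assumptions rather than details from this post: the base URL `http://localhost:12434/engines/v1` (Model Runner's OpenAI-compatible endpoint with host-side TCP access enabled) and the model name `ai/mxbai-embed-large`. Swap in whatever endpoint and model you actually pulled. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions (not from the original post): endpoint and model name.
# Adjust both to match your own Model Runner setup.
BASE_URL = "http://localhost:12434/engines/v1"
MODEL = "ai/mxbai-embed-large"


def build_request(texts, model=MODEL, base_url=BASE_URL):
    """Build an OpenAI-style POST request for the /embeddings endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body):
    """Extract the vectors from an OpenAI-style response, ordered by input index."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts):
    """Send texts to the local endpoint and return one vector per text."""
    with urllib.request.urlopen(build_request(texts), timeout=60) as resp:
        return parse_embeddings(json.load(resp))
```

With the runner up and the model pulled, `embed(["How do I reset a password?"])` should return a list containing one vector. Keeping request building and response parsing in separate functions makes both easy to check without a running model.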
- You can store these vectors in a vector database like Milvus, Qdrant, or pgvector to perform semantic search or similarity queries.
- Let’s make it practical.
- Imagine you want to enable semantic code search across your project repository.
- The process will look like:
- Split your codebase into logical chunks.
- Generate embeddings for each chunk using your local Docker Model Runner endpoint.
- Save those embeddings along with metadata (file name, path, etc.). You would usually use a vector database to store these embeddings, but this demo stores them in a file for simplicity.
- When a developer searches “
- ”, you embed the query and compare it to your stored vectors using cosine similarity.
- We have included a demo that does exactly that.
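The chunk, embed, store, and search steps can be sketched in a few dozen lines of Python. This is not the demo's actual code: the line-based chunker, the JSON index file, and every function name are assumptions made for illustration. So that the sketch runs without a model, `toy_embed` is a stand-in that counts a handful of keywords. In a real setup you would pass a function that calls your local Model Runner embeddings endpoint instead.

```python
import json
import math
import tempfile
from pathlib import Path


def chunk_lines(text, max_lines=20):
    """Naive chunker: split source text into blocks of at most max_lines lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as "no similarity"


def build_index(files, embed, index_path):
    """Embed every chunk and save the vectors with their metadata as JSON."""
    records = []
    for path, text in files.items():
        for n, chunk in enumerate(chunk_lines(text)):
            records.append({"path": path, "chunk": n, "text": chunk, "vector": embed(chunk)})
    Path(index_path).write_text(json.dumps(records))
    return len(records)


def search(query, embed, index_path, top_k=3):
    """Embed the query and rank stored chunks by cosine similarity."""
    records = json.loads(Path(index_path).read_text())
    q = embed(query)
    scored = [(cosine_similarity(q, r["vector"]), r["path"], r["chunk"]) for r in records]
    return sorted(scored, reverse=True)[:top_k]


# Stand-in embedder (an assumption for this sketch, NOT a real model):
# it counts keyword occurrences, so it only matches literal terms.
VOCAB = ["login", "password", "token", "invoice", "payment", "total"]


def toy_embed(text):
    lowered = text.lower()
    return [float(lowered.count(term)) for term in VOCAB]


# Hypothetical two-file "repository".
files = {
    "auth.py": "def login(user, password):\n    return issue_token(user)",
    "billing.py": "def invoice_total(items):\n    return sum(payment.amount for payment in items)",
}

with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.json"
    build_index(files, toy_embed, index_path)
    results = search("password login", toy_embed, index_path)

for score, path, chunk in results:
    print(f"{score:.3f}  {path}  (chunk {chunk})")
```

The keyword counter only rewards literal overlap. A real embedding model would also rank `auth.py` highly for a query that shares no words with it, which is the point of semantic search. For anything beyond a small project, a vector database would replace the JSON file and the linear scan.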
- Codebase example demo with embeddings stats, example queries, and search results.
- Embeddings help applications work with intelligent meaning, not just keywords. The old hassle was wiring up third-party APIs, juggling data privacy, and watching per-call costs creep up.
- Docker Model Runner flips the script. Now, you can run embedding models locally where your data lives with full control over your data and infrastructure. Ship semantic search, RAG pipelines, or custom search with a consistent Docker workflow: private, cost-effective, and reproducible.
- No usage fees. No external dependencies. By bringing models directly to your data, Docker makes it easier than ever to explore, experiment, and innovate, safely and at your own pace.
- The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:
- We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!