An automated query engine that creates and runs its own retrieval plan to deliver relevant results for complex questions.
Today we’re announcing agentic retrieval in Azure AI Search, a multiturn query engine that plans and runs its own retrieval strategy for improved answer relevance. Compared to traditional, single-shot RAG, agentic retrieval improves answer relevance to complex questions by up to 40%. It transforms queries, runs parallel searches, and delivers results tuned for agents, along with references and a query activity log. Now available in public preview.
What is agentic retrieval?
Agentic retrieval in Azure AI Search uses a new query architecture that incorporates user conversation history and an Azure OpenAI model to plan subqueries, run them, and synthesize the results.
Here's how it works:
1. An LLM analyzes the entire chat thread to identify the underlying information need. Instead of a single, catch-all query, the model breaks down compound questions into focused subqueries.
2. Each subquery runs simultaneously across both your traditional text fields and any vector embeddings in Azure AI Search. This hybrid approach ensures you surface both keyword matches and semantic similarities at once, dramatically improving recall.
3. Results from every subquery are reranked using Azure AI Search’s semantic ranker to produce a single, coherent grounding payload. A unified response string contains the top hits, while an accompanying references array delivers structured document metadata for flexible downstream use.
4. The API returns an activity log of every retrieval step: input and output token counts for the LLM, subquery text, per-query hit counts, filters applied, and execution timings. This visibility helps you understand decisions taken during the retrieval process and troubleshoot any relevance issues.
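To make the flow concrete, here is a minimal sketch of building the request body for a retrieve call over raw REST. The endpoint path, API version, and every field name below are assumptions based on this preview announcement, not a verified contract; check the official documentation before relying on them.

```python
# Sketch of preparing an agentic retrieval request. The service's LLM
# plans subqueries from the full chat thread, so the payload carries
# the messages rather than a single query string.
import json


def build_retrieve_payload(messages):
    """Build an illustrative request body for an agent's retrieve action.

    `messages` is the whole conversation; field names are assumptions.
    """
    return {"messages": messages}


thread = [
    {"role": "user", "content": [{"type": "text",
        "text": "Compare our 2023 and 2024 revenue. Why did Q4 dip?"}]},
]
body = build_retrieve_payload(thread)
print(json.dumps(body, indent=2))

# To send it (service and agent names are placeholders):
# POST https://{service}.search.windows.net/agents/{agent}/retrieve
#      ?api-version=2025-05-01-preview
```

Because the model sees the whole thread, a follow-up like "Why did Q4 dip?" can be resolved against earlier turns instead of being treated as a standalone query.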
Architecture overview
Agentic retrieval builds on top of three core components of Azure AI Search:
Index: Your search index holds both plain text and vectorized content, organized under a semantic configuration. Text fields marked as searchable and retrievable feed the LLM for query planning and grounding, while vector fields support similarity search when you enable a vectorizer.
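The index description above can be sketched as a minimal schema. The overall JSON shape follows the Azure AI Search index definition, but the specific names, the vectorizer configuration, and the dimension count are illustrative assumptions, not a production schema.

```python
# Minimal index sketch showing the pieces agentic retrieval relies on:
# searchable/retrievable text, a vector field with a vectorizer, and a
# semantic configuration. Names and values are illustrative.
index_definition = {
    "name": "docs-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        # Text fields feed the LLM for query planning and grounding.
        {"name": "content", "type": "Edm.String",
         "searchable": True, "retrievable": True},
        # Vector field used for similarity search.
        {"name": "content_vector", "type": "Collection(Edm.Single)",
         "searchable": True, "dimensions": 1536,
         "vectorSearchProfile": "default-profile"},
    ],
    "vectorSearch": {
        "profiles": [{"name": "default-profile",
                      "algorithm": "hnsw-1",
                      "vectorizer": "openai-vec"}],
        "algorithms": [{"name": "hnsw-1", "kind": "hnsw"}],
        # A vectorizer lets the service embed query text at search time.
        "vectorizers": [{"name": "openai-vec", "kind": "azureOpenAI"}],
    },
    "semantic": {
        "configurations": [{
            "name": "default-semantic",
            "prioritizedFields": {
                "prioritizedContentFields": [{"fieldName": "content"}],
            },
        }],
    },
}
```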
Agent: A new top-level resource that links your Azure AI Search service to an Azure OpenAI model. It encapsulates the model’s endpoint, authentication, and default parameters for reranking thresholds, reference inclusion, and runtime limits.
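As a sketch, an agent definition might look like the following. The property names, threshold values, and limit knobs here are assumptions drawn from the description above; verify them against the preview REST reference before use.

```python
# Illustrative agent resource linking a search service to an Azure
# OpenAI deployment. All field names below are assumptions.
agent_definition = {
    "name": "docs-agent",
    # Which index the retrieval engine may query, plus default
    # reranking and reference-inclusion behavior.
    "targetIndexes": [{
        "indexName": "docs-index",
        "defaultRerankerThreshold": 2.5,          # drop weakly-ranked hits
        "defaultIncludeReferenceSourceData": True,
    }],
    # The Azure OpenAI model used for query planning.
    "models": [{
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
            "resourceUri": "https://<your-aoai>.openai.azure.com",
            "deploymentId": "gpt-4o-mini",        # placeholder deployment
            "modelName": "gpt-4o-mini",
        },
    }],
    # Runtime limits (hypothetical knob).
    "requestLimits": {"maxRuntimeInSeconds": 30},
}
```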
Retrieval engine: Orchestrates the end-to-end flow: invoking the LLM for query planning, dispatching subqueries to the index in parallel, collecting results, reranking, and packaging the final grounding data along with metadata arrays.
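On the consuming side, a response from this flow carries the three parts described above: a grounding string, a references array, and an activity log. The following sketch unpacks such a response; the field names and the fake payload are assumptions for illustration only.

```python
# Sketch of unpacking an agentic retrieval result into a quick summary.
# Top-level keys mirror the announced output shape but are assumptions.
def summarize_retrieval(result: dict) -> dict:
    grounding = result.get("response", "")
    references = result.get("references", [])
    activity = result.get("activity", [])
    return {
        "grounding_chars": len(grounding),
        "reference_count": len(references),
        # One entry per retrieval step, e.g. planning, subqueries, rerank.
        "steps": [step.get("type") for step in activity],
    }


# Hand-written fake result, shaped like the description above.
fake_result = {
    "response": "Q4 revenue dipped primarily due to seasonal churn ...",
    "references": [{"docKey": "doc-12"}, {"docKey": "doc-47"}],
    "activity": [
        {"type": "modelQueryPlanning", "inputTokens": 1200, "outputTokens": 80},
        {"type": "searchQuery", "query": "Q4 revenue dip cause", "count": 5},
        {"type": "semanticReranker"},
    ],
}
summary = summarize_retrieval(fake_result)
print(summary)
```

The activity entries are what make relevance debugging tractable: you can see exactly which subqueries ran, how many hits each returned, and what the planning step cost in tokens.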
Agentic retrieval marks a departure from traditional search features, and a shift to knowledge retrieval capabilities intentionally designed to ground agents.
AT&T: a pioneer in delivering enterprise-scale RAG solutions
"Our applications must deliver high RAG performance quality, and to uphold these standards, we rely on AI Search to provide us with the latest retrieval technology. We are looking forward to using Azure AI Search’s agentic retrieval with our agents to match the speed, complexity and diversity of information we’ll need to hit our targets."
- Mark Austin, Vice President, Data Science, AT&T
Availability and pricing
Agentic retrieval is now in public preview in select regions.
Pricing consists of two parts:
- Query planning: billed per input and output token in Azure OpenAI.
- Semantic ranking: token-based billing in Azure AI Search for each subquery’s reranker calls.
In the initial phase of the public preview, semantic ranking is free; standard token billing applies after the preview phase. Learn more on the Azure AI Search pricing page.
Getting Started
- Read the agentic retrieval documentation.
- Try out the agentic retrieval cookbook.
- Explore how agentic retrieval can be combined with Azure AI Agent Service.
- Deploy the sample.
Updated May 19, 2025 · Version 1.0
Matt Gotteiner, Microsoft
AI - Azure AI services Blog