Building a Web-Searching Agent with Ollama, Langchain4j, and Quarkus

Running Large Language Models (LLMs) locally with tools like Ollama is a paradigm shift for developer productivity. It creates a sandbox for innovation, enabling rapid prototyping, offline development, and limitless experimentation without the friction of API keys, rate limits, or pay-per-token costs. This local-first approach empowers developers to integrate AI into their workflows on their own terms. However, this powerful autonomy has a critical limitation: a local model’s knowledge is static, frozen at the time of its training. This “knowledge cutoff” makes it unreliable for tasks requiring real-time information, such as working with the latest API documentation or understanding current events, which can bring a productive coding session to a halt.

Ollama’s Web Search feature is the key to unlocking the full potential of this local AI workflow. It acts as a bridge, connecting the self-contained local LLM to the live, dynamic information of the internet. This transforms the LLM from a static database into an active reasoning engine capable of performing actions, with web searching being a primary and powerful new skill. This evolution is the core of building agentic AI, where the model can autonomously decide it needs more information and knows how to get it.

This guide provides a step-by-step, hands-on tutorial for Java developers to construct a sophisticated AI agent that embodies this new paradigm. The agent will run on a local Ollama instance using the gpt-oss:20b model, be orchestrated by the Langchain4j library, and be built upon the high-performance, container-native Quarkus framework. The objective is to create an agent that can autonomously analyze a user’s query and decide when to consult the web to formulate an accurate, up-to-date response. The approach shown here, wrapping a REST API into a Langchain4j Tool, is a powerful and transferable pattern. The skills learned can be applied to connect LLMs to any internal or external API, unlocking a vast potential for intelligent automation within the enterprise Java ecosystem.

Connecting the Components

Before writing any code, it is essential to establish a clear architectural blueprint. This conceptual map clarifies the role of each component and the flow of information through the system, providing the necessary context for the implementation details that follow. The application’s architecture is designed to be robust, modular, and efficient, leveraging the strengths of each technology in the stack.

The request lifecycle proceeds through the following sequence:

  1. User Query: An external user or system sends a natural language query to the application’s endpoint, for instance, “What were the key outcomes of the most recent G7 summit?”
  2. Quarkus JAX-RS Endpoint: A RESTful endpoint, built with Quarkus, receives the incoming HTTP request.
  3. Langchain4j AiService: The request is forwarded to a Langchain4j AiService, which acts as the central orchestrator for the AI interaction.
  4. Ollama LLM (gpt-oss:20b): The AiService passes the query to the local gpt-oss:20b model running in Ollama. The LLM analyzes the query and recognizes that it lacks the necessary up-to-date information to provide an accurate answer. Based on the tools available to it, it determines that it must use the WebSearchTool.
  5. WebSearchTool Invocation: The agentic framework within Langchain4j invokes the designated WebSearchTool, passing the query as an argument.
  6. Quarkus REST Client: The WebSearchTool utilizes a type-safe, declarative Quarkus REST Client to make a secure and authenticated API call to the Ollama Web Search service. This client is enhanced with MicroProfile Fault Tolerance policies to handle transient network issues gracefully.
  7. Ollama Web Search API: The external API executes the search against its indexes and returns a structured JSON object containing a list of relevant results, including titles, URLs, and content snippets.
  8. WebSearchTool Response Formatting: The tool receives the response, processes it, and formats the search results into a concise, context-rich string that is optimized for LLM consumption.
  9. LLM Synthesis: This formatted string of search results is passed back to the Ollama LLM as additional context along with the original query. The LLM then synthesizes this new information to generate a comprehensive and grounded answer.
  10. Final Response: The synthesized answer is returned through the AiService to the JAX-RS endpoint, which then sends the final response back to the user.

In this architecture, each component has a distinct and vital role:

  • Quarkus: Serves as the cloud-native runtime. It provides the high-performance application framework, handles dependency injection (CDI), manages configuration, and, crucially, supplies the reactive REST client for efficient, non-blocking communication with the external search API.
  • Langchain4j: Functions as the AI orchestration layer. It abstracts the complexities of interacting with the LLM and provides the core AiService and @Tool programming models that enable agentic behavior. The quarkus-langchain4j extension ensures seamless integration into the Quarkus ecosystem.
  • Ollama: Acts as the local inference server. It hosts and executes the gpt-oss:20b model, performing the fundamental tasks of natural language understanding, reasoning, and text generation.
  • Ollama Web Search API: Operates as the external, specialized tool. This cloud-based service provides the real-time information necessary to ground the agent’s responses in current reality, overcoming the inherent limitations of the local model.

Project Initialization

The first step in implementation is to establish the project foundation. This involves creating a new Quarkus application with all the necessary dependencies configured from the outset. A proper setup ensures a smooth development experience and avoids configuration issues later in the process.

Prerequisites

Before proceeding, ensure the following tools are installed and configured on the development machine:

  • Java Development Kit (JDK) 24 or later.
  • Apache Maven 3.8.x or later.
  • A running instance of Ollama. Installation instructions can be found on the official Ollama website.
  • The required LLM model pulled locally. For this guide, execute the following command to download the gpt-oss:20b model:
ollama pull gpt-oss:20b
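
To confirm that the Ollama server is reachable and the model has been pulled, an optional sanity check can be scripted before any Quarkus code is written. The following minimal sketch uses the JDK's built-in HttpClient against Ollama's /api/tags endpoint, which lists locally available models (the class name OllamaCheck is purely illustrative):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaCheck {
    public static void main(String[] args) throws Exception {
        // Ollama's /api/tags endpoint returns the models available locally.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/tags"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body().contains("gpt-oss:20b")
                ? "gpt-oss:20b is available locally."
                : "gpt-oss:20b not found - run 'ollama pull gpt-oss:20b'.");
    }
}

If the model name does not appear in the response body, re-run the pull command above before proceeding.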

Bootstrapping the Application

The Quarkus command-line interface (CLI) provides the most efficient way to create a new project. Open a terminal and run the following command:

quarkus create app com.eldermoraes:ollama-web-search \
  --extension='rest,rest-client,quarkus-langchain4j-ollama,smallrye-fault-tolerance' \
  --java=24

This command scaffolds a new Maven project with a logical group and artifact ID. Crucially, it includes four essential Quarkus extensions:

  • rest: Provides the core Quarkus REST implementation for creating RESTful endpoints.
  • rest-client: The modern, reactive REST client, which will be used to communicate with the Ollama Web Search API.
  • quarkus-langchain4j-ollama: The official Quarkiverse extension that provides seamless integration between Quarkus and Langchain4j for Ollama models.
  • smallrye-fault-tolerance: An implementation of the MicroProfile Fault Tolerance specification, used to add resilience to the REST client declaratively.

Reviewing Maven Dependencies

After the project is created, navigate into the ollama-web-search directory. The pom.xml file will contain the necessary dependencies. This confirms the setup and serves as a reference for developers who may wish to integrate these capabilities into an existing project. The key dependencies in the <dependencies> block will include:

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-client</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-smallrye-fault-tolerance</artifactId>
</dependency>

With the project structure and dependencies in place, the foundation is now set for configuration and implementation.

Configuration: Wiring the Local and Remote Services

Proper configuration is critical for connecting the application’s components. The application.properties file in Quarkus serves as the central hub for defining how the application communicates with both the local Ollama LLM and the remote Ollama Web Search API.

The complete configuration is placed in src/main/resources/application.properties:

# Langchain4j - Ollama Integration: Point to the local model
# This configures the base URL for the local Ollama server.
quarkus.langchain4j.ollama.base-url=http://localhost:11434

# Specifies the exact model to be used by the chat service.
quarkus.langchain4j.ollama.chat-model.model-name=gpt-oss:20b

# Local models can take time to respond, especially for complex queries.
# A generous timeout prevents premature request failures.
quarkus.langchain4j.ollama.timeout=240s

# Quarkus REST Client for Ollama Web Search API
# This property maps the REST client interface to its base URL.
# The property key is the fully qualified class name of the interface.
com.eldermoraes.OllamaWebSearchApi/mp-rest/url=https://ollama.com/api

# API Key Management (for Ollama Web Search)
# This custom property will be injected into our tool to provide the API key.
# It reads from the OLLAMA_API_KEY environment variable.
ollama.web.search.api.key=${OLLAMA_API_KEY}

Configuration Details

  • Local Ollama Connection: The quarkus.langchain4j.ollama.* properties configure the bridge to the local LLM instance. base-url points to the default Ollama API endpoint, model-name specifies the exact model to use, and timeout is set to a high value to accommodate potentially slow inference times on local hardware.
  • Remote API REST Client: The property prefixed with the fully qualified class name of the REST client interface (com.eldermoraes.OllamaWebSearchApi) is used by the MicroProfile REST Client extension to set the target URL. Resilience policies, such as timeouts and retries, are applied directly in the code via annotations, as shown in the next section.
  • API Key Management: Security is paramount. The Ollama Web Search API requires an API key for authentication. The configuration ollama.web.search.api.key=${OLLAMA_API_KEY} instructs Quarkus to read the key from an environment variable named OLLAMA_API_KEY. For production environments, developers should leverage more robust solutions like Quarkus’s built-in support for secret management systems such as Kubernetes Secrets.

It is worth noting that the quarkus-langchain4j-ollama extension also supports Quarkus Dev Services. This powerful feature can automatically start an Ollama instance in a container when running in development mode (quarkus dev), completely removing the need for manual setup. While this guide uses a manually managed Ollama instance for clarity, Dev Services significantly enhances the developer experience for rapid prototyping and testing.

Crafting a Custom Web Search Tool

This section contains the technical heart of the application: the Java code that bridges Langchain4j with the Ollama Web Search API. This is achieved by creating a custom “Tool.” In the Langchain4j paradigm, a tool is a standard Java method, annotated to make it discoverable and callable by an AI agent. The framework handles the complex mechanics of interpreting the LLM’s intent to use a tool, parsing its arguments, invoking the corresponding Java method, and returning the result to the LLM for further processing.

Step 1: Defining the API Contract with Java Records

Before making an API call, it is best practice to define a clear, immutable contract for the data being exchanged. Java records are ideally suited for this task, providing a concise way to model the JSON structures of the request and response. Based on the Ollama Web Search API documentation, the following records are defined.

Create a new file WebSearchModels.java:

package com.eldermoraes;

import java.util.List;

public class WebSearchModels {

    public record WebSearchRequest(String query) {}

    public record SearchResult(String title, String url, String content) {}

    public record WebSearchResponse(List<SearchResult> results) {}
}
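
To make the JSON contract these records model more concrete, the following optional sketch round-trips them with Jackson, the JSON library Quarkus uses by default (record support requires Jackson 2.12 or later). The sample response payload is illustrative, not a verbatim response from the Ollama API:

import com.eldermoraes.WebSearchModels.WebSearchRequest;
import com.eldermoraes.WebSearchModels.WebSearchResponse;
import com.fasterxml.jackson.databind.ObjectMapper;

public class WebSearchModelsDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();

        // The request record serializes to the JSON body sent to the API.
        System.out.println(mapper.writeValueAsString(new WebSearchRequest("java 25 jeps")));
        // -> {"query":"java 25 jeps"}

        // An illustrative response payload; field names mirror the records above.
        String json = "{\"results\":[{\"title\":\"JDK 25\",\"url\":\"https://example.com\",\"content\":\"...\"}]}";
        WebSearchResponse response = mapper.readValue(json, WebSearchResponse.class);
        System.out.println(response.results().get(0).title()); // JDK 25
    }
}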

Step 2: Building the Declarative and Resilient REST Client

With the data models defined, the next step is to create the client that will perform the HTTP request. The Quarkus MicroProfile REST Client extension allows for a declarative, interface-based approach that eliminates boilerplate code. We will also make it resilient using MicroProfile Fault Tolerance annotations.

Create a new interface OllamaWebSearchApi.java:

package com.eldermoraes;

import com.eldermoraes.WebSearchModels.WebSearchRequest;
import com.eldermoraes.WebSearchModels.WebSearchResponse;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

@Path("/web_search")
@RegisterRestClient
public interface OllamaWebSearchApi {

    @POST
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    @Timeout(15000)
    @Retry(maxRetries = 2, delay = 2000)
    WebSearchResponse search(@HeaderParam("Authorization") String bearerToken, WebSearchRequest request);
}

Quarkus automatically implements this interface at build time. The @Path, @POST, @Produces, and @Consumes annotations are standard Jakarta REST annotations that define the HTTP request’s structure. The key additions are the fault tolerance annotations:

  • @Timeout(15000): This annotation ensures that if the API call takes longer than 15,000 milliseconds (15 seconds), it will be aborted with a TimeoutException.
  • @Retry(maxRetries = 2, delay = 2000): This powerful annotation instructs Quarkus to automatically retry the search method up to two times if it fails. A delay of 2000 milliseconds is added between retries to give the remote service a chance to recover. This pattern is invaluable for handling transient network issues.
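
If both retries are exhausted, the exception still propagates to the caller. Where graceful degradation is preferable to an error, one possible extension (not part of this guide's code) is a MicroProfile @Fallback handler registered on the search method with @Fallback(EmptySearchFallback.class). The sketch below, using a hypothetical class name, returns an empty result list, which the tool built in the next step would then report as "No results found.":

package com.eldermoraes;

import com.eldermoraes.WebSearchModels.WebSearchResponse;
import java.util.List;
import org.eclipse.microprofile.faulttolerance.ExecutionContext;
import org.eclipse.microprofile.faulttolerance.FallbackHandler;

public class EmptySearchFallback implements FallbackHandler<WebSearchResponse> {

    @Override
    public WebSearchResponse handle(ExecutionContext context) {
        // An empty result list degrades gracefully instead of surfacing
        // a TimeoutException or network error to the agent.
        return new WebSearchResponse(List.of());
    }
}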

Step 3: Implementing the WebSearchTool

This class ties the configuration, data models, and REST client together into a functional tool that the Langchain4j agent can use.

Create a new class WebSearchTool.java:

package com.eldermoraes;

import com.eldermoraes.WebSearchModels.WebSearchRequest;
import com.eldermoraes.WebSearchModels.WebSearchResponse;
import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.eclipse.microprofile.rest.client.inject.RestClient;
import java.util.stream.Collectors;

@ApplicationScoped
public class WebSearchTool {

    @Inject
    @RestClient
    OllamaWebSearchApi webSearchApi;

    @ConfigProperty(name = "ollama.web.search.api.key")
    String apiKey;

    @Tool("Performs a web search to find up-to-date information on a given topic.")
    public String search(String query) {
        String bearerToken = "Bearer " + apiKey;
        WebSearchRequest request = new WebSearchRequest(query);
        WebSearchResponse response = webSearchApi.search(bearerToken, request);

        if (response == null || response.results() == null || response.results().isEmpty()) {
            return "No results found.";
        }

        return response.results().stream()
         .map(result -> "Title: " + result.title() + "\nURL: " + result.url() + "\nContent: " + result.content())
         .collect(Collectors.joining("\n\n---\n\n"));
    }
}

This class uses standard CDI annotations: @ApplicationScoped makes it a singleton bean, @Inject and @RestClient provide an instance of the API client, and @ConfigProperty injects the API key from application.properties. The most important annotation is @Tool. The descriptive string "Performs a web search..." is not a comment; it is the metadata the LLM will use to understand the tool’s purpose and decide when to use it. The method’s logic formats the structured API response into a clean, readable string, which is an effective way to provide context to the LLM.
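
The formatting logic can be verified in isolation by mocking the REST client. The following test sketch assumes the quarkus-junit5-mockito test dependency has been added (the create command above does not include it):

package com.eldermoraes;

import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.when;

import com.eldermoraes.WebSearchModels.SearchResult;
import com.eldermoraes.WebSearchModels.WebSearchResponse;
import io.quarkus.test.InjectMock;
import io.quarkus.test.junit.QuarkusTest;
import jakarta.inject.Inject;
import java.util.List;
import org.eclipse.microprofile.rest.client.inject.RestClient;
import org.junit.jupiter.api.Test;

@QuarkusTest
class WebSearchToolTest {

    @InjectMock
    @RestClient
    OllamaWebSearchApi webSearchApi;

    @Inject
    WebSearchTool tool;

    @Test
    void formatsResultsForLlmConsumption() {
        // Stub the remote API so no real HTTP call is made.
        when(webSearchApi.search(any(), any())).thenReturn(new WebSearchResponse(
                List.of(new SearchResult("JDK 25", "https://example.com", "Release notes..."))));

        String formatted = tool.search("java 25");

        assertTrue(formatted.contains("Title: JDK 25"));
        assertTrue(formatted.contains("URL: https://example.com"));
    }
}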

Defining the AiService

With the web search tool built and ready, the final step is to create the AI agent and grant it the ability to use this new capability. In quarkus-langchain4j, this is accomplished declaratively using the AiService abstraction, which transforms a simple Java interface into a fully operational AI agent.

Create a new interface ResearchAgent.java:

package com.eldermoraes;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(tools = WebSearchTool.class)
public interface ResearchAgent {

    @SystemMessage("""
        You are a helpful research assistant.
        Your primary function is to provide accurate and up-to-date answers.
        When you receive a question about recent events, emerging technologies, or any topic
        for which your internal knowledge might be outdated, you must use the search tool.
        After using the tool, synthesize the information from the search results into a
        concise answer. You must cite your sources by including the URLs from the search results.
    """)
    String answer(@UserMessage String query);
}

Explanation of Key Annotations

  • @RegisterAiService: This is a Quarkus-specific annotation that serves as the primary integration point. It instructs Quarkus to create a CDI bean that implements the ResearchAgent interface, wiring it to the configured LLM and any specified tools.
  • tools = WebSearchTool.class: This is the critical link that equips the agent. By passing the WebSearchTool.class to the annotation, the agent becomes aware of the search method and its description. It can now include this tool in its decision-making process.
  • @SystemMessage: This annotation is used for prompt engineering, which is essential for guiding the agent’s behavior. The system message acts as a set of standing instructions for the LLM. The provided prompt explicitly tells the agent its persona (“research assistant”), the conditions under which it should use the search tool (“recent events,” “outdated knowledge”), and the required format for its output (“synthesize,” “cite your sources”). This level of instruction is crucial for creating reliable and predictable agents.
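
The same declarative model extends to parameterized prompts. As an illustrative variant that is not required by this guide, Langchain4j's @UserMessage templates and @V parameter binding allow an agent method to accept structured input; the TopicSummaryAgent interface below is hypothetical:

package com.eldermoraes;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.V;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(tools = WebSearchTool.class)
public interface TopicSummaryAgent {

    @SystemMessage("You are a research assistant. Use the search tool whenever your knowledge may be outdated.")
    // The {{topic}} placeholder is filled from the @V-annotated parameter at call time.
    @UserMessage("Research '{{topic}}' and summarize the three most recent developments, citing source URLs.")
    String summarize(@V("topic") String topic);
}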

Execution and Verification

With all the components built and configured, the final step is to expose the agent through a REST endpoint and verify its functionality with practical test cases. This will demonstrate that the agent can correctly differentiate between queries that require internal knowledge and those that necessitate a web search.

Exposing the Agent via a JAX-RS Resource

A simple Quarkus JAX-RS resource is sufficient to make the ResearchAgent accessible over HTTP.

Create a new class AgentResource.java:

package com.eldermoraes;

import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.MediaType;

@Path("/chat")
public class AgentResource {

    @Inject
    ResearchAgent agent;

    @POST
    @Consumes(MediaType.TEXT_PLAIN)
    public String chat(String query) {
        if (query == null || query.isBlank()) {
            return "Please provide a query.";
        }
        return agent.answer(query);
    }
}

This standard REST resource injects the ResearchAgent AiService and exposes a POST endpoint at /chat that accepts plain text.

Testing Scenarios

Start the application in development mode by running ./mvnw quarkus:dev in the project root. Then, use a tool like curl to interact with the agent.

1. Internal Knowledge Query

First, ask a question that a well-trained LLM should be able to answer from its existing knowledge base.

curl -X POST -H "Content-Type: text/plain" \
  -d "What is the difference between a monolith and microservices" \
  http://localhost:8080/chat

Expected Output: The agent will respond directly with a detailed explanation of monolithic and microservices architectures based on its training data. The application logs should show no indication that the WebSearchTool was invoked, demonstrating the agent’s ability to use its internal knowledge efficiently.

2. External Knowledge Query

Next, ask a question that requires information created after the model’s knowledge cutoff date.

curl -X POST -H "Content-Type: text/plain" \
  -d "What are the JEPs listed in Java 25 release?" \
  http://localhost:8080/chat

Expected Output: The agent, guided by its system prompt, will recognize its knowledge gap. It will invoke the WebSearchTool, process the results, and provide a synthesized answer listing the JEPs included in the Java 25 release (which shipped only days before this writing, after the model was trained), citing the URLs from which it gathered the information. The response will be grounded in real-time data, demonstrating that the entire agentic loop (query analysis, tool invocation, final synthesis) is functioning correctly.

The Path to Smarter Java Agents

This guide has demonstrated the successful construction of a local AI agent in Java, capable of overcoming its inherent knowledge limitations by autonomously searching the web. By integrating Ollama, Langchain4j, and Quarkus, a sophisticated application was built that showcases a modern, powerful stack for enterprise AI development.

The most significant takeaway, however, extends beyond the specific implementation of a web search feature. The core lesson lies in the underlying pattern: using Langchain4j Tools in conjunction with the Quarkus REST Client and MicroProfile Fault Tolerance to grant AI agents new, powerful, and resilient capabilities. This pattern is a universal adapter, enabling Java developers to connect LLMs to virtually any system or service that exposes a REST API, as the sketch following the list below illustrates. This includes:

  • Internal corporate knowledge bases and databases.
  • Enterprise platforms like Salesforce, SAP, or Jira.
  • Third-party services for weather, finance, or logistics.
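
To show how transferable the pattern is, the sketch below applies the same three ingredients (a declarative REST client, fault tolerance annotations, and a @Tool method) to a weather service. Everything here is hypothetical: WeatherApi, the weather-api config key (whose URL would be set via weather-api/mp-rest/url in application.properties), the /current path, and the response shape are illustrative, not a real API:

package com.eldermoraes;

import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;
import org.eclipse.microprofile.rest.client.inject.RestClient;

@Path("/current")
@RegisterRestClient(configKey = "weather-api")
interface WeatherApi {
    record WeatherReport(String city, double temperatureCelsius, String conditions) {}

    @GET
    @Timeout(10000)
    @Retry(maxRetries = 2)
    WeatherReport current(@QueryParam("city") String city);
}

@ApplicationScoped
public class WeatherTool {

    @Inject
    @RestClient
    WeatherApi weatherApi;

    @Tool("Returns the current weather conditions for a given city.")
    public String currentWeather(String city) {
        WeatherApi.WeatherReport report = weatherApi.current(city);
        return report.city() + ": " + report.temperatureCelsius() + " °C, " + report.conditions();
    }
}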

The future of enterprise AI for the vast majority of Java developers will not be in training foundational models from scratch. Instead, the real value and competitive advantage will come from skillfully integrating, orchestrating, and grounding these powerful models within existing business processes and systems. The agentic tool-use pattern is the key that unlocks this future, transforming LLMs from simple chatbots into autonomous agents that can perform meaningful work. This tutorial serves as a foundational step on that path, equipping developers with the practical skills needed to build the next generation of intelligent Java applications.
