Running Large Language Models (LLMs) locally with tools like Ollama is a paradigm shift for developer productivity. It creates a sandbox for innovation, enabling rapid prototyping, offline development, and limitless experimentation without the friction of API keys, rate limits, or pay-per-token costs. This local-first approach empowers developers to integrate AI into their workflows on their own terms. However, this powerful autonomy has a critical limitation: a local model’s knowledge is static, frozen at the time of its training. This “knowledge cutoff” makes it unreliable for tasks requiring real-time information, such as working with the latest API documentation or understanding current events, which can bring a productive coding session to a halt.
Ollama’s Web Search feature is the key to unlocking the full potential of this local AI workflow. It acts as a bridge, connecting the self-contained local LLM to the live, dynamic information of the internet. This transforms the LLM from a static database into an active reasoning engine capable of performing actions, with web searching being a primary and powerful new skill. This evolution is the core of building agentic AI, where the model can autonomously decide it needs more information and knows how to get it.
This guide provides a step-by-step, hands-on tutorial for Java developers to construct a sophisticated AI agent that embodies this new paradigm. The agent will run on a local Ollama instance using the `gpt-oss:20b` model, be orchestrated by the Langchain4j library, and be built upon the high-performance, container-native Quarkus framework. The objective is to create an agent that can autonomously analyze a user’s query and decide when to consult the web to formulate an accurate, up-to-date response. The approach shown here, wrapping a REST API into a Langchain4j `Tool`, is a powerful and transferable pattern. The skills learned can be applied to connect LLMs to any internal or external API, unlocking a vast potential for intelligent automation within the enterprise Java ecosystem.
Connecting the Components
Before writing any code, it is essential to establish a clear architectural blueprint. This conceptual map clarifies the role of each component and the flow of information through the system, providing the necessary context for the implementation details that follow. The application’s architecture is designed to be robust, modular, and efficient, leveraging the strengths of each technology in the stack.
The request lifecycle proceeds through the following sequence:
- User Query: An external user or system sends a natural language query to the application’s endpoint, for instance, “What were the key outcomes of the most recent G7 summit?”
- Quarkus JAX-RS Endpoint: A RESTful endpoint, built with Quarkus, receives the incoming HTTP request.
- Langchain4j `AiService`: The request is forwarded to a Langchain4j `AiService`, which acts as the central orchestrator for the AI interaction.
- Ollama LLM (`gpt-oss:20b`): The `AiService` passes the query to the local `gpt-oss:20b` model running in Ollama. The LLM analyzes the query and recognizes that it lacks the necessary up-to-date information to provide an accurate answer. Based on the tools available to it, it determines that it must use the `WebSearchTool`.
- `WebSearchTool` Invocation: The agentic framework within Langchain4j invokes the designated `WebSearchTool`, passing the query as an argument.
- Quarkus REST Client: The `WebSearchTool` utilizes a type-safe, declarative Quarkus REST Client to make a secure and authenticated API call to the Ollama Web Search service. This client is enhanced with MicroProfile Fault Tolerance policies to handle transient network issues gracefully.
- Ollama Web Search API: The external API executes the search against its indexes and returns a structured JSON object containing a list of relevant results, including titles, URLs, and content snippets.
- `WebSearchTool` Response Formatting: The tool receives the response, processes it, and formats the search results into a concise, context-rich string that is optimized for LLM consumption.
- LLM Synthesis: This formatted string of search results is passed back to the Ollama LLM as additional context along with the original query. The LLM then synthesizes this new information to generate a comprehensive and grounded answer.
- Final Response: The synthesized answer is returned through the `AiService` to the JAX-RS endpoint, which then sends the final response back to the user.
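In compact form, the round trip looks like this:

User query → /chat endpoint (Quarkus) → ResearchAgent (Langchain4j AiService) → Ollama (gpt-oss:20b)
Ollama decides it needs the tool → WebSearchTool → Quarkus REST Client → Ollama Web Search API
Search results → formatted string → back to Ollama as context → synthesized answer → user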
In this architecture, each component has a distinct and vital role:
- Quarkus: Serves as the cloud-native runtime. It provides the high-performance application framework, handles dependency injection (CDI), manages configuration, and, crucially, supplies the reactive REST client for efficient, non-blocking communication with the external search API.
- Langchain4j: Functions as the AI orchestration layer. It abstracts the complexities of interacting with the LLM and provides the core `AiService` and `@Tool` programming models that enable agentic behavior. The `quarkus-langchain4j` extension ensures seamless integration into the Quarkus ecosystem.
- Ollama: Acts as the local inference server. It hosts and executes the `gpt-oss:20b` model, performing the fundamental tasks of natural language understanding, reasoning, and text generation.
- Ollama Web Search API: Operates as the external, specialized tool. This cloud-based service provides the real-time information necessary to ground the agent’s responses in current reality, overcoming the inherent limitations of the local model.
Project Initialization
The first step in implementation is to establish the project foundation. This involves creating a new Quarkus application with all the necessary dependencies configured from the outset. A proper setup ensures a smooth development experience and avoids configuration issues later in the process.
Prerequisites
Before proceeding, ensure the following tools are installed and configured on the development machine:
- Java Development Kit (JDK) 24 or later.
- Apache Maven 3.8.x or later.
- A running instance of Ollama. Installation instructions can be found on the official Ollama website.
- The required LLM model pulled locally. For this guide, execute the following command to download the `gpt-oss:20b` model:
ollama pull gpt-oss:20b
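Once the pull completes, it is worth confirming that the model is available before moving on; either of the following should list `gpt-oss:20b`:

# list the models available to the local Ollama instance
ollama list
# or query the local Ollama API directly
curl http://localhost:11434/api/tags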
Bootstrapping the Application
The Quarkus command-line interface (CLI) provides the most efficient way to create a new project. Open a terminal and run the following command:
quarkus create app com.eldermoraes:ollama-web-search \
--extension='rest,rest-client,quarkus-langchain4j-ollama,smallrye-fault-tolerance' \
--java=24
This command scaffolds a new Maven project with a logical group and artifact ID. Crucially, it includes four essential Quarkus extensions:
- `rest`: Provides the core Quarkus REST implementation for creating RESTful endpoints.
- `rest-client`: The modern, reactive REST client, which will be used to communicate with the Ollama Web Search API.
- `quarkus-langchain4j-ollama`: The official Quarkiverse extension that provides seamless integration between Quarkus and Langchain4j for Ollama models.
- `smallrye-fault-tolerance`: An implementation of the MicroProfile Fault Tolerance specification, used to add resilience to the REST client declaratively.
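If the Quarkus CLI is not installed, the same project can be scaffolded with plain Maven. A rough equivalent is shown below; if Maven cannot resolve the latest plugin version automatically, pin an explicit version such as `io.quarkus.platform:quarkus-maven-plugin:<version>:create`:

mvn io.quarkus.platform:quarkus-maven-plugin:create \
    -DprojectGroupId=com.eldermoraes \
    -DprojectArtifactId=ollama-web-search \
    -Dextensions='rest,rest-client,quarkus-langchain4j-ollama,smallrye-fault-tolerance' \
    -DjavaVersion=24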
Reviewing Maven Dependencies
After the project is created, navigate into the `ollama-web-search` directory. The `pom.xml` file will contain the necessary dependencies. This confirms the setup and serves as a reference for developers who may wish to integrate these capabilities into an existing project. The key dependencies in the `<dependencies>` block will include:
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-ollama</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-client</artifactId>
</dependency>
<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-smallrye-fault-tolerance</artifactId>
</dependency>
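For completeness, the generated `pom.xml` also imports the Quarkus platform BOM in a `<dependencyManagement>` block, which is why the dependencies above carry no explicit versions. In a freshly scaffolded project it looks roughly like this:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>${quarkus.platform.group-id}</groupId>
            <artifactId>quarkus-bom</artifactId>
            <version>${quarkus.platform.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>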
With the project structure and dependencies in place, the foundation is now set for configuration and implementation.
Configuration: Wiring the Local and Remote Services
Proper configuration is critical for connecting the application’s components. The `application.properties` file in Quarkus serves as the central hub for defining how the application communicates with both the local Ollama LLM and the remote Ollama Web Search API.
The complete configuration is placed in `src/main/resources/application.properties`:
# Langchain4j - Ollama Integration: Point to the local model
# This configures the base URL for the local Ollama server.
quarkus.langchain4j.ollama.base-url=http://localhost:11434
# Specifies the exact model to be used by the chat service.
quarkus.langchain4j.ollama.chat-model.model-id=gpt-oss:20b
# Local models can take time to respond, especially for complex queries.
# A generous timeout prevents premature request failures.
quarkus.langchain4j.ollama.timeout=240s
# Quarkus REST Client for Ollama Web Search API
# This property maps the REST client interface to its base URL.
# The property key is the fully qualified class name of the interface.
com.eldermoraes.OllamaWebSearchApi/mp-rest/url=https://ollama.com/api
# API Key Management (for Ollama Web Search)
# This custom property will be injected into our tool to provide the API key.
# It reads from the OLLAMA_API_KEY environment variable.
ollama.web.search.api.key=${OLLAMA_API_KEY}
Configuration Details
- Local Ollama Connection: The `quarkus.langchain4j.ollama.*` properties configure the bridge to the local LLM instance. `base-url` points to the default Ollama API endpoint, `model-id` specifies the exact model to use, and `timeout` is set to a high value to accommodate potentially slow inference times on local hardware.
- Remote API REST Client: The property prefixed with the fully qualified class name of the REST client interface (`com.eldermoraes.OllamaWebSearchApi`) is used by the MicroProfile REST Client extension to set the target URL. Resilience policies, such as timeouts and retries, are applied directly in the code using annotations, which we will see in the next section.
- API Key Management: Security is paramount. The Ollama Web Search API requires an API key for authentication. The configuration `ollama.web.search.api.key=${OLLAMA_API_KEY}` instructs Quarkus to read the key from an environment variable named `OLLAMA_API_KEY` (a quick local setup is shown after this list). For production environments, developers should leverage more robust solutions like Quarkus’s built-in support for secret management systems such as Kubernetes Secrets.
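For local development, the simplest way to satisfy that lookup is to export the variable in the same shell that starts Quarkus. The key value below is a placeholder; an actual key is obtained from an Ollama account:

# replace the placeholder with a real Ollama API key
export OLLAMA_API_KEY="<your-ollama-api-key>"
./mvnw quarkus:dev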
It is worth noting that the `quarkus-langchain4j-ollama` extension also supports Quarkus Dev Services. This powerful feature can automatically start an Ollama instance in a container when running in development mode (`quarkus dev`), completely removing the need for manual setup. While this guide uses a manually managed Ollama instance for clarity, Dev Services significantly enhances the developer experience for rapid prototyping and testing.
Crafting a Custom Web Search Tool
This section contains the technical heart of the application: the Java code that bridges Langchain4j with the Ollama Web Search API. This is achieved by creating a custom “Tool.” In the Langchain4j paradigm, a tool is a standard Java method, annotated to make it discoverable and callable by an AI agent. The framework handles the complex mechanics of interpreting the LLM’s intent to use a tool, parsing its arguments, invoking the corresponding Java method, and returning the result to the LLM for further processing.
Step 1: Defining the API Contract with Java Records
Before making an API call, it is best practice to define a clear, immutable contract for the data being exchanged. Java records are ideally suited for this task, providing a concise way to model the JSON structures of the request and response. Based on the Ollama Web Search API documentation, the following records are defined.
Create a new file `WebSearchModels.java`:
package com.eldermoraes;

import java.util.List;

public class WebSearchModels {

    // Request payload sent to the Ollama Web Search API
    public record WebSearchRequest(String query) {}

    // A single search hit: title, page URL, and a content snippet
    public record SearchResult(String title, String url, String content) {}

    // Top-level response wrapping the list of results
    public record WebSearchResponse(List<SearchResult> results) {}
}
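Before wiring these records into the application, the raw endpoint can be smoke-tested with `curl`. The request body mirrors the `WebSearchRequest` record, and the fields modeled in `SearchResult` (title, url, content) follow the API documentation cited above; adjust the records if the actual payload differs:

curl https://ollama.com/api/web_search \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "latest Quarkus release"}'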
Step 2: Building the Declarative and Resilient REST Client
With the data models defined, the next step is to create the client that will perform the HTTP request. The Quarkus MicroProfile REST Client extension allows for a declarative, interface-based approach that eliminates boilerplate code. We will also make it resilient using MicroProfile Fault Tolerance annotations.
Create a new interface `OllamaWebSearchApi.java`:
package com.eldermoraes;
import com.eldermoraes.WebSearchModels.WebSearchRequest;
import com.eldermoraes.WebSearchModels.WebSearchResponse;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.HeaderParam;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;
@Path("/web_search")
@RegisterRestClient
public interface OllamaWebSearchApi {
@POST
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
@Timeout(15000)
@Retry(maxRetries = 2, delay = 2000)
WebSearchResponse search(@HeaderParam("Authorization") String bearerToken, WebSearchRequest request);
}
Quarkus automatically implements this interface at build time. The `@Path`, `@POST`, `@Produces`, and `@Consumes` annotations are standard Jakarta REST annotations that define the HTTP request’s structure. The key additions are the fault tolerance annotations:
- `@Timeout(15000)`: This annotation ensures that if the API call takes longer than 15,000 milliseconds (15 seconds), it will be aborted with a `TimeoutException`.
- `@Retry(maxRetries = 2, delay = 2000)`: This powerful annotation instructs Quarkus to automatically retry the `search` method up to two times if it fails. A delay of 2,000 milliseconds is added between retries to give the remote service a chance to recover. This pattern is invaluable for handling transient network issues; an optional fallback refinement is sketched after this list.
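If both retries are exhausted, the exception propagates out of the tool and the agent call fails. An optional refinement, not part of the listing above, is a `@Fallback` that degrades gracefully by returning an empty result set. The sketch below assumes SmallRye Fault Tolerance accepts a default interface method as the fallback on a REST client interface; the method name `emptyResults` is hypothetical:

// Hypothetical refinement to OllamaWebSearchApi:
// 1. import org.eclipse.microprofile.faulttolerance.Fallback;
// 2. annotate search(...) with @Fallback(fallbackMethod = "emptyResults")
// 3. provide the fallback as a default method with a matching signature:
default WebSearchResponse emptyResults(String bearerToken, WebSearchRequest request) {
    // An empty result list lets the agent report "no results found"
    // instead of surfacing an exception to the user.
    return new WebSearchResponse(java.util.List.of());
}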
Step 3: Implementing the WebSearchTool
This class ties the configuration, data models, and REST client together into a functional tool that the Langchain4j agent can use.
Create a new class `WebSearchTool.java`:
package com.eldermoraes;
import com.eldermoraes.WebSearchModels.WebSearchRequest;
import com.eldermoraes.WebSearchModels.WebSearchResponse;
import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.eclipse.microprofile.rest.client.inject.RestClient;
import java.util.stream.Collectors;
@ApplicationScoped
public class WebSearchTool {

    @Inject
    @RestClient
    OllamaWebSearchApi webSearchApi;

    @ConfigProperty(name = "ollama.web.search.api.key")
    String apiKey;

    @Tool("Performs a web search to find up-to-date information on a given topic.")
    public String search(String query) {
        String bearerToken = "Bearer " + apiKey;
        WebSearchRequest request = new WebSearchRequest(query);
        WebSearchResponse response = webSearchApi.search(bearerToken, request);

        if (response == null || response.results() == null || response.results().isEmpty()) {
            return "No results found.";
        }

        return response.results().stream()
                .map(result -> "Title: " + result.title() + "\nURL: " + result.url() + "\nContent: " + result.content())
                .collect(Collectors.joining("\n\n---\n\n"));
    }
}
This class uses standard CDI annotations: `@ApplicationScoped` makes it a singleton bean, `@Inject` and `@RestClient` provide an instance of the API client, and `@ConfigProperty` injects the API key from `application.properties`. The most important annotation is `@Tool`. The descriptive string `"Performs a web search..."` is not a comment; it is the metadata the LLM will use to understand the tool’s purpose and decide when to use it. The method’s logic formats the structured API response into a clean, readable string, which is an effective way to provide context to the LLM.
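One practical refinement, not shown in the listing above, is to cap how much search content reaches the model: local models have limited context windows, and long snippets can crowd out the original question. A sketch of the same stream with hypothetical limits (the values 3 and 1,000 are arbitrary and should be tuned to the model):

// Hypothetical variant of the formatting logic with capped output
return response.results().stream()
        .limit(3) // keep only the top results
        .map(r -> "Title: " + r.title()
                + "\nURL: " + r.url()
                + "\nContent: " + (r.content() != null && r.content().length() > 1000
                        ? r.content().substring(0, 1000) + "..."
                        : r.content()))
        .collect(Collectors.joining("\n\n---\n\n"));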
Defining the AiService
With the web search tool built and ready, the final step is to create the AI agent and grant it the ability to use this new capability. In `quarkus-langchain4j`, this is accomplished declaratively using the `AiService` abstraction, which transforms a simple Java interface into a fully operational AI agent.
Create a new interface `ResearchAgent.java`:
:
package com.eldermoraes;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
@RegisterAiService(tools = WebSearchTool.class)
public interface ResearchAgent {

    @SystemMessage("""
            You are a helpful research assistant.
            Your primary function is to provide accurate and up-to-date answers.
            When you receive a question about recent events, emerging technologies, or any topic
            for which your internal knowledge might be outdated, you must use the search tool.
            After using the tool, synthesize the information from the search results into a
            concise answer. You must cite your sources by including the URLs from the search results.
            """)
    String answer(@UserMessage String query);
}
Explanation of Key Annotations
- `@RegisterAiService`: This is a Quarkus-specific annotation that serves as the primary integration point. It instructs Quarkus to create a CDI bean that implements the `ResearchAgent` interface, wiring it to the configured LLM and any specified tools.
- `tools = WebSearchTool.class`: This is the critical link that equips the agent. By passing `WebSearchTool.class` to the annotation, the agent becomes aware of the `search` method and its description. It can now include this tool in its decision-making process.
- `@SystemMessage`: This annotation is used for prompt engineering, which is essential for guiding the agent’s behavior. The system message acts as a set of standing instructions for the LLM. The provided prompt explicitly tells the agent its persona (“research assistant”), the conditions under which it should use the search tool (“recent events,” “outdated knowledge”), and the required format for its output (“synthesize,” “cite your sources”). This level of instruction is crucial for creating reliable and predictable agents.
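For readers familiar with plain Langchain4j, the declarative wiring above corresponds roughly to the programmatic builder shown below. This is illustrative only: `quarkus-langchain4j` does all of this automatically, and builder method names vary slightly between Langchain4j versions.

// Hypothetical hand-wired equivalent of @RegisterAiService(tools = ...),
// using dev.langchain4j.model.ollama.OllamaChatModel and
// dev.langchain4j.service.AiServices:
OllamaChatModel model = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("gpt-oss:20b")
        .timeout(java.time.Duration.ofSeconds(240))
        .build();

ResearchAgent agent = AiServices.builder(ResearchAgent.class)
        .chatLanguageModel(model)   // renamed in some newer Langchain4j versions
        .tools(new WebSearchTool()) // note: bypasses CDI, so the REST client is not injected
        .build();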
Execution and Verification
With all the components built and configured, the final step is to expose the agent through a REST endpoint and verify its functionality with practical test cases. This will demonstrate that the agent can correctly differentiate between queries that require internal knowledge and those that necessitate a web search.
Exposing the Agent via a JAX-RS Resource
A simple Quarkus JAX-RS resource is sufficient to make the `ResearchAgent` accessible over HTTP.
Create a new class `AgentResource.java`:
package com.eldermoraes;
import jakarta.inject.Inject;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.core.MediaType;
@Path("/chat")
public class AgentResource {
@Inject
ResearchAgent agent;
@POST
@Consumes(MediaType.TEXT_PLAIN)
public String chat(String query) {
if (query == null || query.isBlank()) {
return "Please provide a query.";
}
return agent.answer(query);
}
}
This standard REST resource injects the `ResearchAgent` `AiService` and exposes a `POST` endpoint at `/chat` that accepts plain text.
Testing Scenarios
Start the application in development mode by running `./mvnw quarkus:dev` in the project root. Then, use a tool like `curl` to interact with the agent.
1. Internal Knowledge Query
First, ask a question that a well-trained LLM should be able to answer from its existing knowledge base.
curl -X POST -H "Content-Type: text/plain" \
-d "What is the difference between a monolith and microservices" \
http://localhost:8080/chat
Expected Output: The agent will respond directly with a detailed explanation of monolithic and microservice architectures based on its training data. In the application logs, there would be no indication that the `WebSearchTool` was invoked. This demonstrates the agent’s ability to use its internal knowledge efficiently.
2. External Knowledge Query
Next, ask a question that requires information created after the model’s knowledge cutoff date.
curl -X POST -H "Content-Type: text/plain" \
-d "What are the JEPs listed in Java 25 release?" \
http://localhost:8080/chat
Expected Output: The agent, guided by its system prompt, will recognize its knowledge gap. It will invoke the `WebSearchTool`, process the results, and provide a synthesized answer that lists key features of Java 25 (which was released just a few days ago, after the model was trained), citing the URLs from which it gathered the information. The response will be grounded in real-time data, providing concrete proof that the entire agentic loop (from query analysis to tool invocation to final synthesis) is functioning correctly.
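Because live answers depend on the model and the network, automated checks for this endpoint are best kept loose. Below is a minimal integration-test sketch, assuming the `quarkus-junit5` and `rest-assured` test dependencies that the scaffold adds by default; note that it exercises the real LLM (and possibly the real search API), so it is slow and non-deterministic:

package com.eldermoraes;

import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.Test;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.emptyString;
import static org.hamcrest.Matchers.not;

@QuarkusTest
class AgentResourceTest {

    @Test
    void chatReturnsANonEmptyAnswer() {
        given()
            .contentType("text/plain")
            .body("What is the difference between a monolith and microservices")
        .when()
            .post("/chat")
        .then()
            .statusCode(200)
            .body(not(emptyString()));
    }
}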
The Path to Smarter Java Agents
This guide has demonstrated the successful construction of a local AI agent in Java, capable of overcoming its inherent knowledge limitations by autonomously searching the web. By integrating Ollama, Langchain4j, and Quarkus, a sophisticated application was built that showcases a modern, powerful stack for enterprise AI development.
The most significant takeaway, however, extends beyond the specific implementation of a web search feature. The core lesson lies in the underlying pattern: using Langchain4j `Tool`s in conjunction with the Quarkus REST Client and MicroProfile Fault Tolerance to grant AI agents new, powerful, and resilient capabilities. This pattern is a universal adapter, enabling Java developers to connect LLMs to virtually any system or service that exposes a REST API. This includes:
- Internal corporate knowledge bases and databases.
- Enterprise platforms like Salesforce, SAP, or Jira.
- Third-party services for weather, finance, or logistics.
The future of enterprise AI for the vast majority of Java developers will not be in training foundational models from scratch. Instead, the real value and competitive advantage will come from skillfully integrating, orchestrating, and grounding these powerful models within existing business processes and systems. The agentic tool-use pattern is the key that unlocks this future, transforming LLMs from simple chatbots into autonomous agents that can perform meaningful work. This tutorial serves as a foundational step on that path, equipping developers with the practical skills needed to build the next generation of intelligent Java applications.