Building a Local-First AI Knowledge Base with Quarkus, Langchain4j, and Ollama

Every established company faces a common, persistent challenge: its most valuable asset, its collective knowledge, is often fragmented and difficult to access. Technical documentation, project histories, HR policies, and architectural decisions are scattered across Confluence, SharePoint, network drives, and countless Slack channels. When a developer needs a specific piece of information, the search often becomes a frustrating exercise in navigating outdated links and disorganized folder structures, and in interrupting colleagues in the hope that someone holds the answer.

This isn’t just an inconvenience; it carries a significant business cost. Knowledge silos slow down the onboarding of new team members, hinder developer productivity, and can lead to expensive mistakes when decisions are based on incomplete or outdated information.

The modern solution is not another wiki but an intelligent, private Q&A system capable of understanding natural language and retrieving precise answers from this sea of data. This tutorial will guide you through building such a system with a “local-first” approach, which can easily be ported to a production environment. By leveraging the power of local Large Language Models (LLMs), we can create a solution that is private and cost-effective, and that gives developers complete control over their data and infrastructure.

We will assemble a powerful, modern Java toolkit to achieve this. Quarkus will serve as our high-performance, container-first application framework, leveraging its powerful Dev Services. Langchain4j will act as the AI orchestrator, simplifying our interactions with the model and database. For our intelligence, we will use Ollama to run the powerful gpt-oss:20b model locally. Finally, PgVector will provide our system with a semantic memory, turning a standard PostgreSQL database into a sophisticated search engine. By the end of this guide, you will have a fully functional, end-to-end Retrieval-Augmented Generation (RAG) application running entirely on your local machine.

The Anatomy of Our Smart Q&A System

Before diving into the code, it is essential to understand the architecture of our application. The flow is straightforward: a user’s question arrives at a JAX-RS REST endpoint. This endpoint passes the query to a core AI Service, which uses a Retriever to find relevant information from a PgVector database. This retrieved context, along with the original question, is then sent to the LLM, which generates a coherent, context-aware answer to be returned to the user.

Component Deep Dive

  • Quarkus (The Engine): Quarkus is the foundation of our application. It provides a high-performance, container-first Java stack optimized for developer productivity. Its most compelling feature for this project is Dev Services. This capability allows Quarkus to automatically provision and configure required services, like databases, in development and test modes without any manual setup. If you include an extension for a service but do not provide connection details, Quarkus transparently starts a container for you and wires your application to it, creating a zero-configuration local development loop.
  • Ollama & gpt-oss:20b (The Brain): Ollama is a lightweight, extensible platform for running LLMs on your local machine. It simplifies the process of downloading, managing, and serving powerful models. For this tutorial, we will use gpt-oss:20b, an open-weight model from OpenAI. Despite its relatively small size, it packs impressive reasoning capabilities and, crucially, can run effectively on consumer hardware with as little as 16 GB of RAM, making it a perfect choice for local development and experimentation.
  • PgVector (The Memory): A standard relational database excels at structured queries but struggles with finding data based on conceptual similarity. This is where vector databases come in. PgVector is a PostgreSQL extension that adds the ability to store and query high-dimensional vectors, the numerical representations of text generated by embedding models. This transforms a familiar database into a powerful tool for semantic search, allowing us to find documents that are contextually relevant to a user’s query, not just those that share keywords. While simpler RAG implementations often start with an in-memory vector store for quick prototyping, this tutorial takes a more robust, production-oriented approach by using PgVector. The choice of a persistent, database-backed vector store is deliberate and crucial for several reasons. First, it ensures persistence and scalability; our knowledge base survives application restarts and can grow to handle millions of documents without being limited by available RAM. Second, and most importantly, it unlocks powerful metadata filtering. By storing our vectors in PostgreSQL, we can combine semantic similarity search with traditional SQL WHERE clauses. This allows for sophisticated queries that are impossible with basic in-memory stores, such as finding information semantically related to “housing quality” but filtered only for the city of “Rio de Janeiro”. This capability to blend vector search with structured data queries makes a dedicated vector store like PgVector the clear choice for building scalable, enterprise-grade AI applications.
  • Langchain4j (The Conductor): Langchain4j is the glue that connects our AI components. It provides a unified and intuitive Java API for interacting with a wide variety of LLMs and embedding stores, abstracting away their specific implementations. The Quarkus ecosystem provides dedicated Langchain4j extensions, such as quarkus-langchain4j-ollama and quarkus-langchain4j-pgvector, which make integration completely seamless and automatically leverage the power of Dev Services.

This combination of tools marks a significant shift in AI application development. A few years ago, an architecture like this would have required specialized MLOps expertise and significant investment in cloud infrastructure. Today, the combination of Quarkus Dev Services automating the infrastructure, Ollama making powerful models accessible, and Langchain4j simplifying the code allows a single Java developer to build, run, and manage the entire stack on their laptop. This maturation of the toolchain effectively lowers the barrier to entry for advanced AI development for millions of Java developers worldwide.

Project Setup and Dependencies

Let’s begin by bootstrapping our Quarkus project. We will use the Quarkus Maven plugin to generate a new application with all the necessary extensions. Open your terminal and run the following command:


mvn io.quarkus.platform:quarkus-maven-plugin:3.2.10.Final:create \
-DprojectGroupId=com.eldermoraes.ai \
-DprojectArtifactId=local-ai-knowledge-base \
-Dextensions="quarkus-rest,quarkus-langchain4j-ollama,quarkus-langchain4j-pgvector,quarkus-jdbc-postgresql"

This command creates a new Quarkus project and includes four key extensions: quarkus-rest for our REST API, quarkus-langchain4j-ollama to connect to our local LLM, quarkus-langchain4j-pgvector for our vector store, and quarkus-jdbc-postgresql, which is required by the PgVector extension. The generated pom.xml file will contain the dependencies that power our application.

The following list outlines the core dependencies and their roles.

  • quarkus-rest (io.quarkus): Provides the Jakarta REST implementation for creating our REST API endpoint.
  • quarkus-langchain4j-ollama (io.quarkiverse.langchain4j): Integrates Langchain4j with Ollama. It includes the Dev Service that will automatically start an Ollama container if one is not already running.
  • quarkus-langchain4j-pgvector (io.quarkiverse.langchain4j): Provides the EmbeddingStore implementation for PgVector and includes the Dev Service that automatically starts a PostgreSQL container with the pgvector extension enabled.
  • quarkus-jdbc-postgresql (io.quarkus): The standard JDBC driver for PostgreSQL, required by the PgVector extension to connect to the database.

Configuring the Local AI Stack

For this tutorial, we will assume you already have Ollama installed and running with the gpt-oss:20b and nomic-embed-text models downloaded. By explicitly providing the base URL for your Ollama instance, we instruct Quarkus to connect to it directly, which automatically disables the Dev Service for Ollama. However, Quarkus Dev Services will still manage the PostgreSQL database for us, providing a perfect hybrid of control and convenience.
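If you have not downloaded these models yet, you can pull them with Ollama; the model names below match the configuration used throughout this tutorial:


ollama pull gpt-oss:20b
ollama pull nomic-embed-text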

Create the src/main/resources/application.properties file and add the following configuration:


# Part 1: Configure the Ollama connection
# Point to your local Ollama instance. This disables the Ollama Dev Service.
quarkus.langchain4j.ollama.base-url=http://localhost:11434

# Specify the chat model for generating answers.
quarkus.langchain4j.ollama.chat-model.model-id=gpt-oss:20b

# Specify the model for creating embeddings.
quarkus.langchain4j.ollama.embedding-model.model-id=nomic-embed-text

# Local inference can be slower, so we increase the default timeout.
quarkus.langchain4j.ollama.timeout=60s

# Part 2: Configure the PgVector Store
# This MUST match the output dimension of the embedding model.
# The nomic-embed-text-v1.5 model supports variable dimensions up to 768.
quarkus.langchain4j.pgvector.dimension=768

# When running in dev or test mode, instruct the extension to automatically
# execute 'CREATE EXTENSION IF NOT EXISTS vector;' in the database.
quarkus.langchain4j.pgvector.register-vector-pg-extension=true

# For this tutorial, we want a clean slate on every application restart.
quarkus.langchain4j.pgvector.drop-table-first=true

# Part 3: Configure our data source location
# This is a custom property we'll use in our code to locate the knowledge base file.
rag.data.path=cpc_2023-custom.csv

This configuration file reveals a crucial, tightly coupled relationship that is fundamental to building reliable RAG systems. The quarkus.langchain4j.pgvector.dimension property is not an arbitrary number; it is dictated by the specific embedding model in use. The nomic-embed-text model can produce vectors of different sizes; we are using its 768-dimension variant. If you switch to a different embedding model, this property must be updated to match its output dimension. Failing to do so results in dimension-mismatch errors as soon as the application tries to store the new embeddings.
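As a quick safeguard against exactly this mismatch, you could add a small startup check that probes the embedding model and compares the size of the vector it produces with the configured value. The following is a minimal sketch, not part of the tutorial's required code; the class name is an illustrative assumption:


package com.eldermoraes.ai;

import dev.langchain4j.model.embedding.EmbeddingModel;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import org.eclipse.microprofile.config.inject.ConfigProperty;

@ApplicationScoped
public class EmbeddingDimensionCheck {

    @Inject
    EmbeddingModel embeddingModel;

    @ConfigProperty(name = "quarkus.langchain4j.pgvector.dimension")
    int configuredDimension;

    void verify(@Observes StartupEvent event) {
        // Embed a short probe string and compare the resulting vector size with the configured dimension.
        int actualDimension = embeddingModel.embed("dimension probe").content().dimension();
        if (actualDimension != configuredDimension) {
            throw new IllegalStateException("Embedding dimension mismatch: the model produces "
                    + actualDimension + "-dimensional vectors, but pgvector is configured for "
                    + configuredDimension + ".");
        }
    }
}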

Sourcing the Knowledge

Our Q&A system needs a knowledge base. For this tutorial, we will use public data from INEP (National Institute for Educational Studies and Research Anísio Teixeira) regarding the “Indicadores de Qualidade da Educação Superior” (Higher Education Quality Indicators) for the year 2023. This dataset is not very large, making it ideal for rapid local development and testing. You can download the required XLSX file from the official government portal, but I customized a version for this tutorial, which is available here. It was exported to CSV format and named cpc_2023-custom.csv. Place this file in the src/main/resources directory of your project.

Creating the Ingestion Pipeline

To load our data, we need an EmbeddingStoreIngestor. Since we are not using the easy-rag extension, we must explicitly configure this component. A CDI producer is the perfect mechanism for this, as it allows us to build a customized ingestor and make it available for injection throughout our application. This approach also gives us the opportunity to define a DocumentSplitter, a best practice for RAG that ensures our data is broken into optimally sized chunks for the embedding model.

Create a new Java class IngestorProducer.java:


package com.eldermoraes.ai;

import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;

@ApplicationScoped
public class IngestorProducer {

    @Produces
    public EmbeddingStoreIngestor embeddingStoreIngestor(
            EmbeddingStore<TextSegment> store,
            EmbeddingModel embeddingModel) {

        return EmbeddingStoreIngestor.builder()
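              // Split each document into chunks of at most 800 characters, with a 100-character overlap between adjacent chunks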
              .documentSplitter(DocumentSplitters.recursive(800, 100))
              .embeddingModel(embeddingModel)
              .embeddingStore(store)
              .build();
    }
}

Now we can create the ingestion service itself. This service will load the CSV data on application startup, transform each row into a descriptive sentence, and use our custom EmbeddingStoreIngestor to process and store it in the PgVector database.

Create a new Java class DataIngestor.java:


package com.eldermoraes.ai;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkus.runtime.StartupEvent;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.event.Observes;
import jakarta.inject.Inject;
import org.eclipse.microprofile.config.inject.ConfigProperty;
import io.smallrye.common.annotation.RunOnVirtualThread;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

@ApplicationScoped
@RunOnVirtualThread
public class DataIngestor {

    @Inject
    EmbeddingStoreIngestor ingestor;

    @ConfigProperty(name = "rag.data.path")
    String dataPath;

    public void ingest(@Observes StartupEvent event) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                Objects.requireNonNull(Thread.currentThread().getContextClassLoader().getResourceAsStream(dataPath)),
                StandardCharsets.UTF_8))) {

            // Skip header line
            reader.readLine();

            List<Document> documents = reader.lines()
                    .map(line -> line.split(",")) // simple comma split; assumes fields contain no embedded commas
                    .map(this::toDocument)
                    .filter(Objects::nonNull)
                    .collect(Collectors.toList());

            ingestor.ingest(documents);
        } catch (Exception e) {
            throw new RuntimeException("Failed to ingest data", e);
        }
    }

    private Document toDocument(String[] row) {
        try {
            // Column mapping based on the INEP CPC 2023 dataset structure.
            // Verify these indices if you use a different version of the file.

            String institutionName = row[0]; // Nome da IES
            String institutionCode = row[1]; // Sigla da IES
            String adminCategory = row[2];   // Categoria Administrativa
            String courseName = row[3];      // Área de Avaliação
            String cityName = row[4];        // Município do Curso
            String stateAbbr = row[5];       // Sigla da UF
            String cpcScore = row[6];        // CPC (Faixa)

            String documentText = String.format(
                    "In the 2023 evaluation, the course '%s' at the institution %s (%s), " +
                    "which is a '%s' located in the city of %s, %s, " +
                    "received a Preliminary Course Concept (CPC) score of %s.",
                    courseName, institutionName, institutionCode, adminCategory, cityName, stateAbbr, cpcScore
            );

            System.out.println("row = " + documentText);

            Document document = Document.from(documentText);
            document.metadata().put("city", cityName);
            document.metadata().put("state", stateAbbr);
            document.metadata().put("institution", institutionName);
            document.metadata().put("course", courseName);
            return document;

        } catch (ArrayIndexOutOfBoundsException e) {
            // Skip rows that do not have the expected number of columns
            return null;
        }
    }
}

The toDocument method performs two crucial steps: semantic enrichment and metadata attachment. It transforms each raw CSV row into a coherent sentence and attaches key attributes like city and state as metadata. This process is fundamental to the quality and capability of our RAG system, enabling not only better semantic search but also the potential for advanced, filtered queries.
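To see what this metadata enables, here is a minimal, hypothetical sketch of a filtered semantic search against the store we just populated, combining a natural-language query with a structured filter on the "city" metadata key. It uses Langchain4j's EmbeddingSearchRequest and metadata filter builder; the exact API can vary between Langchain4j versions, so treat the class and method names as assumptions to verify against your dependencies:


package com.eldermoraes.ai;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

@ApplicationScoped
public class FilteredSearchExample {

    @Inject
    EmbeddingStore<TextSegment> embeddingStore;

    @Inject
    EmbeddingModel embeddingModel;

    public void searchLawCoursesInSaoPaulo() {
        // Embed the natural-language query.
        Embedding queryEmbedding = embeddingModel.embed("Law course quality").content();

        // Combine semantic similarity with a structured filter on the "city" metadata we attached.
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .filter(metadataKey("city").isEqualTo("São Paulo"))
                .maxResults(5)
                .build();

        EmbeddingSearchResult<TextSegment> result = embeddingStore.search(request);
        result.matches().forEach(match -> System.out.println(match.embedded().text()));
    }
}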

Crafting the AI’s Core Logic

With our knowledge base populated, we can now build the core reasoning component of our application. Langchain4j and Quarkus provide a powerful, declarative model for this using the @RegisterAiService annotation. We simply define a Java interface that describes what we want the AI to do, and Quarkus generates the implementation for us at build time.

First, we need to define how our AI service will retrieve information. We will create a Retriever class that provides the necessary RetrievalAugmentor.

Create the file EducationDataRetriever.java:


package com.eldermoraes.ai;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import java.util.function.Supplier;

@ApplicationScoped
public class EducationDataRetriever implements Supplier<RetrievalAugmentor> {

    private final RetrievalAugmentor augmentor;

    @Inject
    public EducationDataRetriever(EmbeddingStore<TextSegment> embeddingStore, EmbeddingModel embeddingModel) {
        EmbeddingStoreContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .build();

        this.augmentor = DefaultRetrievalAugmentor.builder()
                .contentRetriever(contentRetriever)
                .build();
    }

    @Override
    public RetrievalAugmentor get() {
        return augmentor;
    }
}

Next, we define our main AI service interface, EducationAssistant.java:


package com.eldermoraes.ai;

import dev.langchain4j.service.SystemMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(retrievalAugmentor = EducationDataRetriever.class)
public interface EducationAssistant {

    @SystemMessage("You are an assistant specializing in Brazilian higher education quality indicators from 2023. Answer questions based on the provided information.")
    String answer(String question);
}

This structure demonstrates a clear separation of concerns. The DataIngestor is responsible for knowledge acquisition, the EducationDataRetriever handles knowledge retrieval, and the EducationAssistant performs reasoning and generation. Each component is modular, allowing them to be developed and tested independently. For example, switching to a different vector store would only require changing the quarkus-langchain4j-pgvector dependency and modifying the EducationDataRetriever; the core AI service logic in EducationAssistant would remain unchanged.

Exposing the Service

The final step is to expose our AI service to the outside world through a REST API. We will create a simple JAX-RS resource for this purpose.

Create the file EducationResource.java:


package com.eldermoraes.ai;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@Path("/education")
public class EducationResource {

    @Inject
    EducationAssistant assistant;

    @GET
    @Path("/ask")
    public String ask(@QueryParam("q") String question) {
        return assistant.answer(question);
    }
}

This resource is straightforward: it defines a /education/ask endpoint that accepts a question via the q query parameter. It then injects our EducationAssistant and calls its answer method, returning the result directly.

Running and Testing the Application

With all the pieces in place, it is time to run our application. From your project’s root directory, execute:


./mvnw quarkus:dev

As the application starts, you will see Dev Services automatically provision the PostgreSQL database. Logs from Testcontainers will appear, indicating that it is starting a postgres container with the pgvector extension. The application will then connect to your running Ollama instance for the LLM and embedding models.

IMPORTANT: The first time you run it, it will take a while to populate the embeddings (in my environment, it takes around 3 minutes – it may vary for a number of reasons).

Once the application is running, you can interact with it using curl or any HTTP client. Open a new terminal and try asking some questions about the education data:


# Ask a broad question about a university
curl "http://localhost:8080/education/ask?q=Tell%20me%20about%20the%20courses%20at%20UFSCAR"

# Ask a more specific question about a course in a city
curl "http://localhost:8080/education/ask?q=What%20is%20the%20CPC%20score%20for%20the%20Law%20course%20in%20the%20city%20of%20S%C3%A3o%20Paulo%3F"

The LLM, augmented with the context retrieved from PgVector, should provide an accurate answer based on the data we ingested. For the second query, the system finds the relevant data chunks and synthesizes a response: your query is embedded, used to find the most relevant documents (the transformed CSV rows) in PgVector, and those documents are handed to the gpt-oss:20b model, which produces a precise, natural-language answer.
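If you prefer an automated check over manual curl calls, the sketch below shows a minimal integration test. It assumes the quarkus-junit5 and rest-assured test dependencies that the Quarkus Maven plugin adds to generated projects by default, and it requires your local Ollama instance to be running, since we disabled the Ollama Dev Service in our configuration:


package com.eldermoraes.ai;

import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.Test;

import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.emptyString;
import static org.hamcrest.Matchers.not;

@QuarkusTest
class EducationResourceTest {

    @Test
    void askReturnsAnAnswer() {
        // Send a question to the endpoint and check that a non-empty answer comes back.
        given()
            .queryParam("q", "Tell me about the courses at UFSCAR")
        .when()
            .get("/education/ask")
        .then()
            .statusCode(200)
            .body(not(emptyString()));
    }
}

Keep in mind that the startup ingestion runs for tests as well, so the first test execution will take a few minutes, just like the first dev-mode start.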

Your Gateway to Practical AI with Java

In this tutorial, we have built a complete, private, and intelligent Q&A system from the ground up. We leveraged a modern, cloud-native Java stack to create a sophisticated RAG application that runs entirely on a local machine, addressing key concerns of data privacy and operational cost.

This tutorial highlights the efficiency of a modern developer experience. The seamless integration between Quarkus, Langchain4j, and their respective Dev Services abstracts away the complexity of managing a local AI stack. A process that would have previously been a multi-day infrastructure setup task is now fully automated, allowing developers to focus on building application features.

This project is just the beginning. From here, you can explore numerous paths to extend and enhance your application:

  • Expand the Knowledge Base: Ingest your own company’s internal documents or different public datasets.
  • Experiment with Models: Try other models available through Ollama to see how they impact response quality and performance.
  • Build a User Interface: Use Quarkus’s web bundling capabilities to create a simple chat interface for a more interactive experience.
  • Explore Advanced AI Patterns: Dive deeper into Langchain4j to build more complex systems using agents and tools.

You now have a solid foundation and a powerful set of tools to continue your journey into the exciting world of practical, developer-friendly AI with Java.

