Agentic products at scale: what the demo hides

Recently, I sat in on a talk at StartSe AI Festival 2026 that stayed with me for days. On stage was Isabella Piratininga, iFood’s Director of Technology and Innovation, and the subject was building agentic products at scale, told through the case of Ailo, iFood’s conversational assistant.

Let me be upfront from the start: this is my reading of what I heard, not coverage and not a transcript. It’s what I, as a developer, wrote down and decided to dig into for other developers. I’m writing about it for one simple reason. iFood probably runs one of the largest agent operations in production in Brazil today, and hearing from someone who keeps that running every day, with real users and SLAs to meet, is worth more than most of what gets said about agents out there.

To set the scale: iFood already runs more than 10,000 active agents internally, has more than 200,000 agents in customer support, processes more than 180 million orders a month, serves more than 65 million active users, and connects more than 500,000 partner businesses. At that scale, agents stop being an experiment and become production infrastructure. That’s why the principles below deserve a developer’s attention, whatever your stack.

It starts with one question

It started with a provocation that, for me, was the best part: how much of your product, today, was actually designed to change user behavior?

Notice that the provocation is about behavior, not about the interface and not about the model. And behavior takes time to change. Almost everyone is building agents, but almost no one is building a clear sense of what this kind of product is. Technology stopped being the bottleneck, because the models already deliver plenty. The hard problem moved. It’s now about changing the behavior of the people who use the product, and of the teams that build it.

The practical consequence for developers is immediate: stop measuring progress by how many AI features you ship, and start measuring it by the behavior change those features actually produce.

Generative vs. agentic products

Before the principles, one distinction is worth making. I consider it the starting point for any architecture decision in this space.

A generative product is one where you give an input, ask for something, and wait for a response. The loop is short, the interaction is direct, and almost everything happens in real time. If the response isn’t good, you ask again. A simple chatbot lives in this world.

An agentic product works by delegation. You don’t ask and wait. You hand over a task and trust that it will be carried out. This is where things change, because the whole interaction behind the scenes, the number of loops, the intermediate decisions, none of it is transparent to the user. The failure surface of an agentic product tends to be much larger than that of a chatbot. When a chatbot gets something wrong, you reword the message and try again. When an agent gets something wrong in the middle of a multi-step flow, the damage is proportionally larger. A concrete example: a personal agent called Tim sent the wrong email to the wrong person and broke a communication flow. Multiply that kind of error by iFood’s scale, and it’s clear why this distinction has to be settled from the first line of code.
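
To put a rough number on that larger failure surface, here is a back-of-the-envelope sketch of my own, not something from the talk. It assumes every step in a flow succeeds independently with the same probability, which real agents won't respect exactly. The function name and the 95% figure are illustrative assumptions.

```python
def flow_success_probability(step_success: float, steps: int) -> float:
    """Chance that every step of a flow succeeds.

    Assumption: steps fail independently and share the same success rate.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return step_success ** steps


if __name__ == "__main__":
    # Illustrative only: a step that works 95% of the time, chained.
    for steps in (1, 5, 10, 20):
        print(steps, round(flow_success_probability(0.95, steps), 3))
```

Under those assumptions, a step that works 95% of the time yields a ten-step flow that finishes cleanly only about 60% of the time. A single-turn chatbot never pays that compounding.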

One more point deserves honesty: the fully autonomous agent, the one that runs everything end to end without intervention, isn’t real yet. It doesn’t matter which model you pick. Without handling rules, without giving context, and without teaching the agent when to stop, it won’t get there. Building a complex product while expecting full autonomy is a good recipe for frustration, including your own.

Five principles that guide the build

There are five principles, and each one has a direct translation into development work.

1. Trust

The principle comes down to one direct line: trust is built in small moments. Before you try something ambitious, you need to stack up small wins, because the odds of getting the ambitious thing wrong are high. And when you lose a user’s trust, you usually lose that user for good. They rarely come back.

In code, trust stops being a product concept and becomes an engineering property. It shows up as a narrow, reliable scope before a broad, unstable one, as graceful degradation when the agent isn’t sure, and as predictable behavior across runs. An agent that gets every task right in a small set builds more trust than an agent that gets 70% of everything right.
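
A minimal sketch of what narrow scope plus graceful degradation could look like, in Python. This is my illustration, not iFood's code: the intent names, the threshold value, and the `decide` function are all hypothetical, and the confidence score is assumed to come from whatever intent classifier you already have.

```python
from dataclasses import dataclass

# Hypothetical narrow scope: the only tasks this agent commits to doing.
SUPPORTED_INTENTS = {"reorder_last_order", "track_order"}

# Hypothetical cutoff; in practice you would tune it against real traffic.
CONFIDENCE_THRESHOLD = 0.8


@dataclass(frozen=True)
class Decision:
    action: str  # "execute", "ask_user", or "hand_off"
    reason: str


def decide(intent: str, confidence: float) -> Decision:
    """Act only when the task is in scope and the agent is confident."""
    if intent not in SUPPORTED_INTENTS:
        # Out of scope: degrade gracefully instead of improvising.
        return Decision("hand_off", f"'{intent}' is outside the supported scope")
    if confidence < CONFIDENCE_THRESHOLD:
        # In scope but unsure: confirm with the user before acting.
        return Decision("ask_user", "confidence below threshold")
    return Decision("execute", "in scope and confident")
```

The same inputs always produce the same decision, which covers the predictability part. Widening the scope becomes an explicit change to `SUPPORTED_INTENTS` rather than something the model drifts into.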

2. Cost

AI is still an expensive technology, and there’s no point pretending otherwise. iFood monitors token consumption and call cost, and that stays on the observability dashboard. But the point that matters is a shift in lens. From a product view, what counts is the real cost per completed task. It’s the well-done task that solves the user’s problem. Counting saved tokens, on its own, doesn’t solve the problem and doesn’t move your agent forward.

This changes instrumentation in concrete terms. Measuring cost per model call isn’t enough. You have to attribute cost to the whole task, from start to completion, including every intermediate step and retry. And you have to compare that cost against the outcome, because a cheap response that didn’t solve the user’s problem isn’t cheap. It comes back as rework, as support tickets, or as one fewer user. This is FinOps applied to agents, with the right unit of measure.
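
Here is how I would sketch that unit of measure in Python. The `TaskRecord` shape, the field names, and the dollar amounts are my assumptions for illustration; nothing here reflects iFood's actual dashboard.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    task_id: str
    # Cost of every model call, tool call, and retry made for this task.
    step_costs: list[float] = field(default_factory=list)
    completed: bool = False  # did the task actually solve the user's problem?

    @property
    def total_cost(self) -> float:
        return sum(self.step_costs)


def cost_per_completed_task(tasks: list[TaskRecord]) -> float:
    """Total spend, failed tasks included, divided by completed tasks."""
    total_spend = sum(task.total_cost for task in tasks)
    completed = sum(1 for task in tasks if task.completed)
    if completed == 0:
        # Money was spent and nothing was solved: report it as unbounded.
        return float("inf")
    return total_spend / completed


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the calculation.
    tasks = [
        TaskRecord("a", [0.02, 0.03], completed=True),
        TaskRecord("b", [0.01], completed=False),
        TaskRecord("c", [0.04, 0.04, 0.02], completed=True),
    ]
    print(round(cost_per_completed_task(tasks), 4))
```

The failed task still counts in the numerator, which is the point: spend on tasks that solved nothing gets charged to the ones that did.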

3. Determinism

This is the principle that speaks most directly to developers. Early on, the iFood team looked at agent development as if everything had to go through an LLM. That’s not how it works. If someone comes to iFood and says “I want to reorder my last order,” you don’t need sophisticated reasoning to handle that. It’s a deterministic operation.

The lesson is that basic engineering still holds. Identifying which steps need the model’s reasoning and which only need predictable code makes your product better. Doing this doesn’t downgrade the agent, it just puts the technology in the right place. In practice, this calls for an orchestration and routing layer. The LLM becomes one tool among several, called when the problem genuinely requires interpretation, while the deterministic paths stay deterministic. The gain is threefold, because you cut cost, cut latency, and cut the failure surface in one move.
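
A minimal sketch of such a routing layer, assuming a plain regex match is enough to recognize the deterministic requests. The handler names, the pattern, and the `call_llm` placeholder are hypothetical; a real system would likely use a proper intent classifier and a real model client.

```python
import re
from typing import Callable

Handler = Callable[[str], str]


def reorder_last_order(message: str) -> str:
    # Hypothetical deterministic path: a real handler would call an order service.
    return "deterministic: reordering your last order"


def call_llm(message: str) -> str:
    # Placeholder for a model call; no real client or API is assumed here.
    return "llm: this request needs interpretation"


# Requests that predictable code can serve, checked before any model call.
DETERMINISTIC_ROUTES: list[tuple[re.Pattern, Handler]] = [
    (re.compile(r"\breorder\b.*\blast order\b", re.IGNORECASE), reorder_last_order),
]


def route(message: str, llm: Handler = call_llm) -> str:
    """Try deterministic handlers first; fall back to the model otherwise."""
    for pattern, handler in DETERMINISTIC_ROUTES:
        if pattern.search(message):
            return handler(message)
    return llm(message)


if __name__ == "__main__":
    print(route("I want to reorder my last order"))
    print(route("Suggest something light for dinner"))
```

The model is passed in as just another handler, so it can be swapped or stubbed in tests, and every request that matches a route never touches it.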

4. Evals

Agent evaluation is a familiar topic, but there’s one angle worth repeating. In a typical chatbot setup, you look at the input and the output. That’s still necessary. But the real value, when you build an agentic product, is in looking at the middle.

The middle is where you learn. How many interactions the agent needed, how many steps it took to reach a conclusion, what that path cost, and, above all, whether it recognized the moment when it could no longer act alone and had to ask the user. In code terms: instrument the whole trajectory, not just the ends. Log every step, every tool call, every decision point, and evaluate that path. Without visibility into the middle, there’s no information to improve the product, and the team ends up fixing things in the dark.
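
To make "instrument the whole trajectory" concrete, here is a small Python sketch. The `Step` and `Trajectory` classes, the step kinds, and the summary fields are my own assumptions about what is worth recording, not a description of iFood's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str  # e.g. "llm_call", "tool_call", "decision", "ask_user"
    name: str
    cost: float = 0.0


@dataclass
class Trajectory:
    task_id: str
    steps: list[Step] = field(default_factory=list)

    def log(self, kind: str, name: str, cost: float = 0.0) -> None:
        """Record one step of the path the agent took."""
        self.steps.append(Step(kind, name, cost))

    def summary(self) -> dict:
        """Metrics about the middle of the run, not just its ends."""
        return {
            "steps": len(self.steps),
            "tool_calls": sum(1 for s in self.steps if s.kind == "tool_call"),
            "total_cost": sum(s.cost for s in self.steps),
            "asked_user": any(s.kind == "ask_user" for s in self.steps),
        }


if __name__ == "__main__":
    # Illustrative run with made-up step names and costs.
    run = Trajectory("task-1")
    run.log("llm_call", "interpret_request", cost=0.02)
    run.log("tool_call", "search_restaurants")
    run.log("ask_user", "confirm_address")
    print(run.summary())
```

With records like these, an eval can assert on the path itself, for example that the agent asked the user before acting on an ambiguous address, instead of only comparing the final answer.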

5. Onboarding

The fifth principle is onboarding, with an important caveat. This isn’t traditional onboarding, the kind with an animation showing the product or a tooltip pointing out that there’s an AI agent in there. Onboarding for an agentic product is the work of securing small wins, consistently and often, and of communicating value until the user experiences a moment where the product clearly worked and solved their problem.

The technical consequence of this principle is what matters most here. Treating AI as just another feature inside just another product doesn’t build that recognition. The agent has to behave consistently across runs, has to know when to ask instead of act, and has to communicate what it did. Behavioral consistency, in this context, is an engineering property: it depends on how you design control over the agent.

The agent as the new product interface

If I had to pick the most important idea in all of this, it would be this one: building an agentic product is not a feature roadmap. The iFood team itself built the first version of Ailo as a feature, including its architecture and infrastructure, and it’s rewriting that, and will rewrite it again if it finds that something isn’t working.

The framing here is re-platforming. The provocation is to assume this technology had always existed and then ask: what does your product become from there? Agents will become the new interaction interface, and that’s an architecture change, not a backlog item.

Ailo is the concrete example of this process. It’s already in production for a share of the user base, growing organically, available on WhatsApp among other channels. Close to 2 million users have had access, and it has around 200,000 monthly active users. In some flows, placing an order through Ailo is up to 50% faster than through the traditional app. And one detail says a lot about the maturity of the operation: Ailo was put to the test running live, in production, with no special preparation from the team, rather than as a rehearsed demo.

It’s also worth noting that none of this was an easy story. iFood has invested in data intelligence since 2018, and by 2024 more than 90% of the company was already using AI assistants, supported by an internal platform called Toqan. The early feedback on Ailo was harsh, with users frustrated by responses that tried to be magical and over-personalized with too little information. The lesson was exactly the first principle: start small.

Opportunities for developers

The optimistic reading of all this is that the differentiator moved, and it became more accessible. If technology is no longer the bottleneck, what separates a good agentic product from a bad one is engineering and product judgment. It’s knowing where to apply the model and where to apply deterministic code, instrumenting the trajectory, designing for trust. That’s exactly the kind of thing an experienced developer knows how to do, and it isn’t solved by switching models.

There’s a clear opportunity in the platform layer. iFood uses Toqan as an internal layer for the whole company to build agents, and the need to standardize orchestration, observability, and evaluation will show up in any organization that takes agents seriously. Whoever can build that layer will be in high demand. And there’s the opportunity of deterministic design as a practice. Teams that treat the LLM as one tool among several, rather than the center of everything, ship cheaper, faster, more stable products.

The risks that come with it

The other side is an honest inventory of risks, and it’s worth listing them with the same clarity.

Larger failure surface. In a multi-step flow, an error in the middle propagates, and the final impact is large.
Cost out of control. This happens when you measure tokens instead of cost per completed task, and find out too late that each resolved task came out expensive.
Fragile trust. A single serious error can cost you the user for good, which raises the price of every failure in production.
Over-delegation. Building a complex product while expecting full autonomy from an agent that doesn’t deliver it yet.
Architecture debt. Treating the agent as a feature charges you back later in the form of a rewrite, exactly what iFood is doing with Ailo.
Evaluation blindness. Looking only at input and output and missing where the agent actually breaks.
Magic too soon. Trying to deliver a hyper-personalized experience before stacking up the small wins that support that promise.

What I’m taking away

I came away with one conviction reinforced: building with agents, done in the right way, has less to do with picking the model of the moment and more to do with engineering discipline. The five principles, trust, cost, determinism, evals, and onboarding, are all, at their core, decisions about architecture and instrumentation. None of them depends on a specific model.

And I’ll go back to the provocation from the start, because it sums up what’s at stake. Putting an agent in your product is the easy part of the decision. The hard part, and the one that actually matters, is the design: was your product built to change the behavior of the people who use it, and was the engineering that supports that change with trust built alongside it? iFood is answering that question in production, at its scale, getting things wrong and rewriting along the way. For developers, that’s the most useful lesson of all: nobody has the map ready, and the advantage goes to whoever has the discipline to build, measure, and iterate.
