Over the last few days, whenever I have had some free time, I have been working on a new edition of an important demo we originally built about a year and a half ago.
The scenario is a Procurement solution designed to make the management of Express Orders faster and more efficient.
Imagine a company that manufactures electric vehicles. An internal user needs to urgently order some parts to avoid stopping the production line. These parts must be delivered by a specific date and to a specific plant.
The solution automates the process: it identifies potential suppliers, invites them to submit offers, selects the best one according to company policies, and finally registers the purchase order in the dedicated system.
Compared with the first version, this new edition introduces some important changes.
The first one is architectural. The solution is now designed as a network of independent agents that communicate through the A2A (Agent2Agent) protocol. In this implementation, I am using version 1 of the protocol.
This approach makes it easier to reuse existing agents, treating them as black boxes when appropriate. For example, the agent that registers the Purchase Order can be integrated without having to know or modify its internal implementation. At the same time, each agent can be developed, tested, and evolved independently.
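As a rough illustration of the black-box idea (not the demo's actual code, and not the A2A SDK), the caller below depends only on a message contract. The `Agent` protocol, the `FakePurchaseOrderAgent` stand-in, and the message fields are all hypothetical names chosen for this sketch.

```python
from typing import Protocol


class Agent(Protocol):
    """Hypothetical contract: accept a message dict, return a message dict."""

    def send(self, message: dict) -> dict: ...


class FakePurchaseOrderAgent:
    """Stand-in for an existing agent that registers purchase orders."""

    def __init__(self) -> None:
        self._counter = 0

    def send(self, message: dict) -> dict:
        # Internals are invisible to callers; only the reply shape matters.
        self._counter += 1
        return {
            "status": "registered",
            "po_number": f"PO-{self._counter:04d}",
            "offer_id": message["offer_id"],
        }


def register_order(po_agent: Agent, offer_id: str) -> dict:
    # The caller relies on the message contract, not on the implementation.
    return po_agent.send(
        {"action": "register_purchase_order", "offer_id": offer_id}
    )


result = register_order(FakePurchaseOrderAgent(), "OFF-7")
```

Swapping the fake for a real remote agent would not change `register_order`, which is the property that makes independent development and testing practical.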
The second change concerns how I developed the solution.
I followed a strictly spec-driven development approach. I used OpenAI Codex, but I did not start directly from the code. First, I wrote and refined the specifications, which are stored in the project’s GitHub repository. Only after consolidating the specifications did I ask Codex to generate the code.
The generated code was not accepted automatically. I reviewed it manually and asked for changes when needed. In this process, Codex acted as an accelerator and a collaborator, not as a replacement for design and review.
The third change, and perhaps the most important from a design perspective, is the role of AI in the solution.
In this demo, AI is used only where it adds value.
I am quite firm on this point. Even though I work with AI and I am passionate about it, I do not believe it should be used everywhere. Where a deterministic solution is simpler, more controllable, and better suited to the problem, that remains the preferred choice.
Concretely, AI appears in two main places.
The first one is the initial conversational layer, where the user describes the parts to be ordered and provides the required information. Here, a strong LLM can handle ambiguity, ask for clarifications, and transform a natural-language request into structured data. This is a use case where the conversational capabilities of the model really make a difference.
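To make "structured data" concrete, here is a minimal sketch of what the target of that conversation might look like. The `ExpressOrderRequest` fields are assumptions derived from the scenario (parts, quantity, delivery date, plant), not the demo's real schema. The helper shows how a deterministic check can tell the conversational layer which clarifications are still needed.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class ExpressOrderRequest:
    """Hypothetical structured form of a natural-language order request."""

    part_description: Optional[str] = None
    quantity: Optional[int] = None
    delivery_date: Optional[date] = None
    plant: Optional[str] = None


def missing_fields(request: ExpressOrderRequest) -> list[str]:
    # Fields still set to None are the ones the assistant should ask about.
    return [f.name for f in fields(request) if getattr(request, f.name) is None]


# After a first message such as "I need 200 brake calipers urgently",
# the model might have filled in only two fields.
partial = ExpressOrderRequest(part_description="brake calipers", quantity=200)
still_needed = missing_fields(partial)
```

In this split, the LLM handles the ambiguity of the conversation, while the completeness check stays deterministic.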
The second place is the agent that analyzes the offers and selects the best one.
This is a more delicate choice. The company policies are written in natural language, in a Markdown file. I did not want to introduce an intermediate step where a developer has to translate those policies into code. The goal is to keep the policies readable, easy to change, and close to the way they are actually defined.
Compared with the first version of the demo, this is an important change.
In that version, the solution generated code, and that code was then executed to determine the best offer. Today, thanks to the progress LLMs have made in handling structured reasoning and numerical data, I was able to adopt a more direct approach: the evaluation is performed by the model itself.
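A minimal sketch of this direct approach, under stated assumptions: the policy text is passed to the model verbatim together with the offers, and the reply is validated before anything is trusted. The function names, the prompt wording, and the `offer_id` field are hypothetical. The actual model call is deliberately left out, since it depends on the provider.

```python
import json


def build_evaluation_prompt(policies_markdown: str, offers: list[dict]) -> str:
    # The policies go into the prompt as written; no translation into code.
    return (
        "You are evaluating supplier offers for an express order.\n"
        "Apply the company policies below exactly as written.\n\n"
        "## Policies\n" + policies_markdown.strip() + "\n\n"
        "## Offers (JSON)\n" + json.dumps(offers, indent=2) + "\n\n"
        'Reply with JSON only: {"selected_offer_id": "...", "reason": "..."}'
    )


def parse_decision(model_reply: str, offers: list[dict]) -> dict:
    # Guard against replies that name an offer nobody submitted.
    decision = json.loads(model_reply)
    known_ids = {offer["offer_id"] for offer in offers}
    if decision.get("selected_offer_id") not in known_ids:
        raise ValueError("Model selected an offer that was not submitted")
    return decision


offers = [
    {"offer_id": "A", "price": 1200.0, "delivery_days": 3},
    {"offer_id": "B", "price": 1100.0, "delivery_days": 6},
]
prompt = build_evaluation_prompt("- Delivery must not exceed 4 days.", offers)
decision = parse_decision(
    '{"selected_offer_id": "A", "reason": "Only offer within 4 days."}', offers
)
```

Changing a policy then means editing the Markdown file only; the surrounding code stays the same.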
Of course, this requires checks. For this reason, I added an evaluation test suite to verify that the decisions are correct and consistent with the defined policies.
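One possible shape for such a suite, sketched with hypothetical names: each case pairs a policy text and a set of offers with the expected winner, and every case is run several times to check consistency as well as correctness. A deterministic stub stands in for the model-backed selector here so the sketch runs on its own.

```python
from typing import Callable

# A selector takes (policies_markdown, offers) and returns the chosen offer id.
Selector = Callable[[str, list[dict]], str]


def run_eval(select: Selector, cases: list[dict], repeats: int = 3) -> list[str]:
    """Return one failure message per run that picked the wrong offer."""
    failures = []
    for case in cases:
        for attempt in range(1, repeats + 1):
            chosen = select(case["policies"], case["offers"])
            if chosen != case["expected"]:
                failures.append(
                    f"{case['name']} (run {attempt}): "
                    f"expected {case['expected']}, got {chosen}"
                )
    return failures


def cheapest_stub(policies_markdown: str, offers: list[dict]) -> str:
    # Stand-in for the LLM-backed agent: always picks the lowest price.
    return min(offers, key=lambda offer: offer["price"])["offer_id"]


cases = [
    {
        "name": "lowest price wins",
        "policies": "- Prefer the lowest price.",
        "offers": [
            {"offer_id": "A", "price": 1200.0},
            {"offer_id": "B", "price": 1100.0},
        ],
        "expected": "B",
    }
]
failures = run_eval(cheapest_stub, cases)
```

Repeating each case matters because a model-backed selector is not guaranteed to give the same answer twice, whereas the stub always does.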
The project repository is available here: