6.1.4 PREDIBAG: Predicate‐Based Agent
Large language models (LLMs) are undeniably powerful, but they often demonstrate inconsistencies that pose a major challenge for traditional workflow applications. Most agent solutions rely on nested tests to handle these inconsistencies, which often leads to complex and fragile systems. With PREDIBAG (Predicate-Based Agent), we propose revisiting logic programming to tame LLMs, placing their calls under precise and manageable control.
The best-known logic programming language is obviously Prolog, but it imposes a particularly steep learning curve that can deter even the most experienced programmers. We developed our own version of this language in the past, which gave us extensive experience, not only in the best way to build such an interpreter (see tamgu), but also in the most effective strategies for using it.
With this experience, we decided to create a simplified form of logical rules to provide the minimal framework necessary for managing our agents.
These functions, introduced by the keyword `defpred`, are an integral part of the LispE language: `defpred` combines pattern matching and backtracking to orchestrate interactions with LLMs as naturally as any other component, offering a robust alternative to traditional agent design. These predicate functions borrow the underlying idea of Prolog, but restrict unification to their arguments only. Moreover, execution is deterministic: unlike Prolog's mechanism, where the complete graph defined by the rules is systematically explored, here we stop as soon as a rule has satisfied all its constraints.
PREDIBAG is a general methodology for building agents that systematically explore multiple strategies. As an example, let's consider an agent tackling mathematical problems:
- The following functions are used to execute Python code:
```lisp
(defun execute(code)
    ; Executes Python code in an isolated environment with a 2-second timeout
    (python_reset py)                               ; Resets the Python interpreter to a clean state
    (python_run py (+ pythonpaths code) "result" 2) ; Executes the code, storing the output in 'result'
)
```
And to extract the Python code generated by the LLM:

```lisp
(defun extraction(s)
    ; Extracts Python code from a string, assumed to lie between the markers ```python and ```
    (@@ s "```python" "```")   ; Returns the substring between these delimiters
)
```
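As a quick illustration, here is a minimal usage sketch. It assumes, as in the code above, that `py` (the Python interpreter handle) and `pythonpaths` (the preamble prepended to every snippet) have been initialized elsewhere in the program:

```lisp
; Hypothetical sanity check: run a trivial snippet and print the value of 'result'
(println (execute "result = 6 * 7"))   ; should display 42 if the interpreter is set up
```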
Here's how logical functions are written:
- `generate`: this function offers two different ways to generate and execute Python code. If the code doesn't execute correctly, we have a final fallback function.
```lisp
; First attempt to generate Python code with a direct prompt
(defpred generate(chat hint pb)
    ; Displays a label to track this strategy (e.g., "Python 1: problem123")
    (println "Python 1:" (@ pb "unique_id"))
    ; Asks the LLM to generate Python code, ensuring 'result' contains the answer
    (setq chat
        (tchat chat (+ "Generate the Python code for this problem. The result must be stored in the variable: 'result'." hint))
    )
    ; Extracts and cleans the code from the LLM's response (content of the last message)
    ; If no Python code is produced, extraction fails and we move to the next "generate" function
    (setqv code (extraction . clean . @ chat -1 "content"))
    ; Executes the code; if it fails (e.g., timeout), backtracking is triggered
    (setqv r (execute code))
    ; Builds a dictionary of results with problem ID, chat history, answer, and expected value
    (dictionary "unique_id" (@ pb "unique_id") "chat" chat "answer" r "expected" (@ pb "answer"))
)

; Second attempt with a slightly different prompt for more variety
(defpred generate(chat hint pb)
    ; Labels this strategy (e.g., "Python 2: problem123")
    (println "Python 2:" (@ pb "unique_id"))
    ; Requests Python code with a reformulated prompt, still targeting 'result'
    (setq chat
        (tchat chat (+ "Write the appropriate Python code to solve this problem. The result must be stored in the variable: 'result'." hint))
    )
    ; Extracts and cleans the LLM's code response
    ; If no Python code is produced, extraction will fail
    (setqv code (extraction . clean . @ chat -1 "content"))
    ; Executes the code; failure (e.g., syntax error) triggers backtracking
    (setqv r (execute code))
    ; Returns a dictionary with results, same structure as above
    (dictionary "unique_id" (@ pb "unique_id") "chat" chat "answer" r "expected" (@ pb "answer"))
)

; Fallback rule if the previous attempts fail
; This implementation cannot fail, ensuring a result
(defpred generate(chat hint pb)
    ; Marks this as a last resort (e.g., "Python Failure: problem123")
    (println "Python Failure:" (@ pb "unique_id"))
    ; Skips code generation/execution, recording "FAIL" as the answer
    (dictionary "unique_id" (@ pb "unique_id") "chat" chat "answer" "FAIL" "expected" (@ pb "answer"))
)
```
- `agent`: this function executes an initial prompt and calls `generate`. We could have additional definitions with other prompts as needed.
```lisp
; Helper predicate to try a prompt and generate code
(defpred try_prompt(chat system hint pb prompt)
    ; Sends the prompt to the LLM, updating the chat history
    (setq chat (tchat chat prompt system))
    ; Calls 'generate' to process the LLM's response and store the result
    (setqv res (generate chat hint pb))
    ; Adds the result to the global 'all' list for later saving
    (push all res)
)

; Single agent rule to start the process with a descriptive prompt
(defpred agent(chat system hint pb)
    ; Indicates which problem is being handled (e.g., "First: problem123")
    (println "First:" (@ pb "unique_id"))
    ; Uses 'try_prompt' with a detailed solution prompt, passing the problem context
    (try_prompt chat system hint pb (+ "Describe in detail a solution to the following problem:" (@ pb "problem") "\n"))
)

; We could have other agents that would activate if this one fails...
```
Here, `agent` explores strategies for solving math problems from `MATH.json`, producing outputs (e.g., `MATHS_result.json`) that include not only the answers but also the complete chat history. PREDIBAG is not limited to mathematics: it is a model applicable to any task requiring structured exploration, from text analysis to decision-making.
The strength of PREDIBAG lies in `defpred`, LispE's predicate mechanism. It brings together multiple functions under the same name. When a function is executed, each instruction is evaluated as a Boolean expression: if an instruction returns `false`, the function fails and the engine tries the next function. In this way, we avoid the inextricable nesting of `if/else` so common in other approaches. The rules are clearer and more readable, and a new rule can be added without modifying the internal logic of a node, unlike in LangGraph for example. Moreover, variables in functions are unified at execution time, which means that the next rule starts with the same data as the previous one; different paths can thus be explored from the same starting point.
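To make this mechanism concrete, here is a small, hypothetical sketch that is independent of the math example: two `defpred` rules share the name `pick`, and when the Boolean test of the first rule returns false, the engine falls back to the second rule with the same argument:

```lisp
; First rule: succeeds only when the value is even
(defpred pick(x)
    (eq 0 (% x 2))        ; Boolean test: false makes the whole rule fail
    (println "even:" x)
)

; Fallback rule: always succeeds
(defpred pick(x)
    (println "odd:" x)
)

(pick 4)   ; the first rule succeeds, and exploration stops there
(pick 7)   ; the test fails, so the second rule prints "odd: 7"
```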
One of the striking features of PREDIBAG is its ability to handle the inherent inconsistency of LLMs by relegating them to a supporting role: generating Python code. Rather than relying on the LLM to provide answers directly, knowing its propensity to hallucinate, we prefer to ask it to generate Python code that we can validate during execution. The final judge is thus the Python interpreter itself. If the code presents any problem (invalid syntax, execution that takes too long, since a timeout can be attached to code execution, or a result incompatible with the expected value), we simply move on to the next function, which tries a new prompt and generates a new version of the code. This design choice brings the LLM back to the role of a simple component, just like the Python interpreter. In this way, the LLM keeps a role that is certainly fundamental as a code generator, but remains under control, because only the actual execution of the code can guarantee reasonably consistent behavior in a workflow.
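For instance, a stricter variant of the first `generate` rule (a hypothetical sketch, not part of the code above) could add one more Boolean check comparing the computed answer with the expected value; when this comparison returns false, the rule fails and backtracking moves on to the next rule:

```lisp
; Hypothetical stricter rule: also validates the computed answer
(defpred generate(chat hint pb)
    (println "Python strict:" (@ pb "unique_id"))
    (setq chat
        (tchat chat (+ "Generate the Python code for this problem. The result must be stored in the variable: 'result'." hint))
    )
    (setqv code (extraction . clean . @ chat -1 "content"))
    (setqv r (execute code))
    ; Extra check: a wrong value makes this rule fail and triggers the next one
    (eq (string r) (string (@ pb "answer")))
    (dictionary "unique_id" (@ pb "unique_id") "chat" chat "answer" r "expected" (@ pb "answer"))
)
```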
Let's compare PREDIBAG with LangGraph or Smolagents:
- Readability:
  - PREDIBAG: Graph exploration through `defpred` functions clarifies intent, avoiding cluttered nested `if`s while abstracting away LLM complexity.
  - LangGraph: Graph nodes and edges visualize workflows, but routing often involves nested `if/else` (e.g., `if state["mood"] == "positive"`), exposing more LLM logic.
  - Smolagents: Flat execution (e.g., `agent.run("solve x + 2")`) is simple, but LLM-generated code may include numerous tests to maintain execution logic.
- Maintainability:
  - PREDIBAG: New `defpred` rules add nodes effortlessly, with shared logic (e.g., `try_prompt`) centralized. LLM adjustments are rule-based, not prompt-dependent.
  - LangGraph: Modular nodes are adjustable, but changes involve updates to edges or nested routing, with more explicit LLM handling.
  - Smolagents: Light adjustments are easy, but LLM behavior changes rely on prompt engineering, which is less structured than the `defpred` graph.
- Data Generation (e.g., the math example):
  - PREDIBAG: Backward chaining explores all the nodes, producing diverse outputs (e.g., `chat`, `answer`) and handling LLM inconsistency.
  - LangGraph: Structured workflows produce detailed data, but diversity requires explicit nodes, and LLM variability is less abstracted.
  - Smolagents: Concise code outputs focus on solutions, but single runs limit variety, with more exposed LLM inconsistency.
- Scalability:
  - PREDIBAG: Sequential by default, but LispE also supports multithreading, allowing parallel exploration as needed.
  - LangGraph: Native graph parallelism excels for complex tasks.
  - Smolagents: Lightweight execution scales well for single tasks.
PREDIBAG's `defpred`-driven graph exploration and LLM abstraction offer superior readability and versatility, minimizing LLM inconsistency compared to LangGraph's explicit flows and Smolagents' LLM-centered simplicity.
LispE offers built-in multithreading, allowing agents to be launched in parallel as needed (`dethread`, `wait`):
```lisp
(dethread solve_task(task nb)
    ; Picks the category-specific instructions for this task
    (setq hint (@ category_instructions (@ task "category")))
    (agent {} `You have remarkable skills in this field.` hint task)
    ; Thread-safe storage of the accumulated results
    (threadstore "results" (json all))
)

; One thread per task; 'wait' blocks until all of them have finished
(loop task tasks (solve_task task (incr nb)))
(wait)
```
We can therefore easily parallelize the work of the agents across dedicated threads, with a very simple mechanism, `threadstore`, to collect data from all the threads in a protected way. We can thus easily implement agents that are largely equivalent, in terms of power and expressiveness, to those of LangGraph or Smolagents.
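Once `(wait)` has returned, the values accumulated by the threads can be read back in the main thread. The sketch below assumes that `threadretrieve` is the companion call for reading a protected store; the exact primitive may differ depending on the LispE version:

```lisp
; Hypothetical collection step after (wait)
(setq results (threadretrieve "results"))   ; assumed read-back primitive
(println "collected:" (size results) "result sets")
```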
PREDIBAG reinvents agent design by merging predicate logic with graph exploration, using `defpred` to tame LLM inconsistency and push it into the background. Its flat rules and backtracking offer a cleaner and more maintainable alternative to nested conditionals, while the flexibility of `tchat` ensures tool independence. Threading adds scalability, and its ability to treat LLMs as just another component makes it adaptable to tasks such as data generation and beyond. Compared to LangGraph's explicit graphs and Smolagents' lightweight simplicity, PREDIBAG offers a unique, robust, and elegant approach: a predicate-based paradigm worth exploring for AI practitioners and Lisp enthusiasts.
ensures tool independence. Threading adds scalability, and its ability to treat LLMs as just another component makes it adaptable to tasks like data generation or beyond. Compared to LangGraph's explicit graphs and Smolagents' lightweight simplicity, PREDIBAG offers a unique, robust, and elegant approach — a predicate-based paradigm to explore for AI practitioners and Lisp enthusiasts.