Post-Training Multi-turn Interactive Tool-Using Agents

[Submitted on 30 Jan 2026 (v1), last revised 10 Mar 2026 (this version, v3)]

From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents, by Jiaxuan Gao and 5 other authors

Abstract: Interactive tool-using agents must solve real-world tasks via multi-turn interaction with both humans and external environments, which requires dialogue state tracking and multi-step tool execution while following complex instructions. Post-training such agents is challenging because synthesizing high-quality multi-turn tool-use data is difficult to scale, and reinforcement learning (RL) can face noisy signals caused by user simulation, which degrades training efficiency. We propose a unified framework that combines a self-evolving data agent with verifier-based RL. Our system, EigenData, is a hierarchical multi-agent engine that synthesizes tool-grounded dialogues together with executable per-instance checkers, and improves generation reliability via a closed-loop self-evolving process that updates prompts and workflow. Building on the synthetic data, we develop an RL recipe that first fine-tunes the user model and then applies GRPO-style training with trajectory-level group-relative advantages and dynamic filtering, yielding consistent improvements beyond SFT. Evaluated on τ²-bench, our best model reaches 73.0% pass^1 on Airline and 98.3% pass^1 on Telecom, matching or exceeding frontier models. Overall, our results suggest a scalable pathway for bootstrapping complex tool-using behaviors without expensive human annotation.
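
The abstract names trajectory-level group-relative advantages and dynamic filtering but does not spell out either one. The sketch below shows one common reading of those terms and is not the authors' implementation: each trajectory's scalar reward is normalized against the other rollouts of the same task, and groups whose rollouts all received the same reward are dropped because they carry no relative signal. The function name, the task ids, the `eps` constant, and the exact filtering rule are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_relative_advantages(
    rewards_by_task: Dict[str, List[float]], eps: float = 1e-6
) -> Dict[str, List[float]]:
    """Return one advantage per trajectory, normalized within its task group.

    Hypothetical helper: the paper's exact formulation is not given in the
    abstract, so this follows the generic GRPO-style recipe.
    """
    advantages: Dict[str, List[float]] = {}
    for task_id, rewards in rewards_by_task.items():
        spread = pstdev(rewards)
        # Assumed form of dynamic filtering: skip groups in which every
        # rollout got the same reward, since all advantages would be zero.
        if spread < eps:
            continue
        baseline = mean(rewards)
        # One scalar advantage per whole trajectory (not per token or turn).
        advantages[task_id] = [(r - baseline) / (spread + eps) for r in rewards]
    return advantages


if __name__ == "__main__":
    # Made-up verifier outcomes (1.0 = checker passed, 0.0 = failed).
    rollouts = {
        "airline_task_1": [1.0, 0.0, 0.0, 1.0],
        "telecom_task_7": [1.0, 1.0, 1.0, 1.0],  # uniform, so filtered out
    }
    for task_id, adv in group_relative_advantages(rollouts).items():
        print(task_id, [round(a, 3) for a in adv])
```

In this reading, a per-instance checker of the kind the abstract describes would supply the 0/1 rewards, and only the mixed-outcome groups would contribute gradient signal.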

Submission history

From: Di Jin
[v1] Fri, 30 Jan 2026 06:01:23 UTC (3,979 KB)
[v2] Mon, 2 Feb 2026 23:32:08 UTC (3,971 KB)
[v3] Tue, 10 Mar 2026 06:07:37 UTC (4,086 KB)
