@aymericroucher - AI Dynamics

Debugging agents with tracing and LLM-judge systems

By

@aymericroucher

–

28 February 2025 17h22

How do I debug my agent? You can trace your agent run for later inspection, for instance using @ArizePhoenix. @JohnGilhuly and team have just published a blog post explaining how to instrument a smolagents run and how to set up LLM-judge systems. Should be mandatory reading for

→ View original post on X — @aymericroucher

28 February 2025

Hugging Face Agents Course Unit 2: Leverage SmolAgents for Party Organization!

By

@aymericroucher

–

25 February 2025 16h57

Today we launch unit 2 of the @huggingface agents course! Alfred needs your help! He’s organizing a party at the Wayne manor, and he’s completely overwhelmed by his tasks! You’ll learn how to leverage smolagents to organize the party:
⏵ Build code agents
⏵ Create tools
⏵

→ View original post on X — @aymericroucher

25 February 2025

Claude uses multi-step trajectories, looking forward to trying

By

@aymericroucher

–

24 February 2025 20h09

New Claude seems to make good use of multi-step trajectories, looking forward to trying it!

→ View original post on X — @aymericroucher

24 February 2025

Overview of future assistants and generated survey examples

By

@aymericroucher

–

24 February 2025 12h58

I advise you to read the paper: it's a great overview of the kind of assistants we'll get in the near future!
https://huggingface.co/papers/2502.14776
Their website shows examples of generated surveys: http://surveyx.cn

→ View original post on X — @aymericroucher

24 February 2025

SurveyX automatically writes academic surveys nearly indistinguishable from human-written ones

By

@aymericroucher

–

24 February 2025 12h56

Now there's a Deep Research for academia: SurveyX automatically writes academic surveys nearly indistinguishable from human-written ones! Research surveys go in two steps: preparation (collect and organize papers) and writing (outline creation, writing, polishing).

→ View original post on X — @aymericroucher

24 February 2025

32B model with 817 examples beats o1-preview on math reasoning

By

@aymericroucher

–

18 February 2025 13h08

Less is More for Reasoning (LIMO): a 32B model fine-tuned with 817 examples can beat o1-preview on math reasoning! Do we really need o1's huge RL procedure to see reasoning emerge? It seems not.
Researchers from Shanghai Jiaotong University just demonstrated that carefully

→ View original post on X — @aymericroucher

18 February 2025

smolagents memory documentation: replay, callbacks, step control

By

@aymericroucher

–

17 February 2025 17h34

Over time, we've added many memory-related mechanisms to smolagents.
Now I've just made a dedicated documentation page! This explains how to replay the agent's memory, use step_callbacks to dynamically change it, or run an agent step by step to have full control over memory.
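To make the idea of a step callback concrete, here is a toy sketch in plain Python. It does not use the smolagents API: `ToyAgent`, `ToyStep`, and `drop_old_observations` are hypothetical names invented for illustration, and the real signatures are the ones on the documentation page. The sketch only shows the general pattern: a callback runs after each step and edits the stored memory, and the memory can be replayed afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyStep:
    # Hypothetical stand-in for one recorded agent step.
    step_number: int
    action: str
    observation: Optional[str] = None


@dataclass
class ToyAgent:
    # Hypothetical stand-in for an agent; NOT the smolagents class.
    step_callbacks: List[Callable[["ToyStep", "ToyAgent"], None]] = field(default_factory=list)
    memory: List[ToyStep] = field(default_factory=list)

    def step(self, action: str) -> ToyStep:
        """Run one step, store it in memory, then let callbacks edit memory."""
        new_step = ToyStep(len(self.memory) + 1, action, observation=f"result of {action}")
        self.memory.append(new_step)
        for callback in self.step_callbacks:
            callback(new_step, self)
        return new_step

    def replay(self) -> List[str]:
        """Return a readable trace of everything currently held in memory."""
        return [f"step {s.step_number}: {s.action} -> {s.observation}" for s in self.memory]


def drop_old_observations(latest: ToyStep, agent: ToyAgent) -> None:
    """Callback: keep observations only for the two most recent steps."""
    for past in agent.memory:
        if past.step_number <= latest.step_number - 2:
            past.observation = None


agent = ToyAgent(step_callbacks=[drop_old_observations])
for action in ["search", "read", "summarize"]:
    agent.step(action)  # stepping manually gives full control between steps

for line in agent.replay():
    print(line)
```

Running the agent one step at a time, as in the loop above, is what lets a caller inspect or rewrite memory between steps.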

→ View original post on X — @aymericroucher

17 February 2025

New smolagents Feature Enables Sharing AI Agents with Chat Interface

By

@aymericroucher

–

14 February 2025 17h02

We've just released the coolest feature ever in smolagents: you can now share agents to the Hub! And any agent pushed to the Hub gets a cool Space interface to directly chat with it. This was a real technical challenge: for instance, serializing tools to export them meant that

→ View original post on X — @aymericroucher

14 February 2025

Trick to Discuss GitHub Repos with Large Language Models

By

@aymericroucher

–

13 February 2025 17h19

For those who haven't come across it yet, here's a handy trick to discuss an entire GitHub repo with an LLM: just replace "github" with "gitingest" in the URL, and you get the whole repo as a single string that you can then paste in your LLMs
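The URL swap can be sketched in a few lines of Python. This is only an illustration of the trick: the helper name `to_gitingest_url` is made up, and it assumes the target host is `gitingest.com`. It rewrites only the host, so a repo whose name happens to contain "github" is left untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Swap the github.com host for gitingest.com, leaving the rest of the URL intact."""
    parts = urlsplit(github_url)
    host = parts.netloc.lower()
    if host not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url!r}")
    # Assumption: the service lives at gitingest.com and mirrors GitHub's path layout.
    return urlunsplit(parts._replace(netloc="gitingest.com"))


print(to_gitingest_url("https://github.com/huggingface/smolagents"))
# prints: https://gitingest.com/huggingface/smolagents
```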

→ View original post on X — @aymericroucher

13 February 2025

2025 Will Be the Year of AI Agents Progressing Rapidly

By

@aymericroucher

–

12 February 2025 18h04

"2025 will be the year of AI agents": here are numbers to support this widespread statement. I've plotted the progress of AI agents on the GAIA test set, and it seems they're headed to catch up with the human baseline in early 2026. And that progress is still driven mostly by the

→ View original post on X — @aymericroucher

12 February 2025