Need to test R1! Haven't tried that.
@hwchase17
-
CoT Benchmarking and Training in Modern AI Models
By
–
These were easy to benchmark. CoT would be easy as well, but it's trained into most models these days, so I don't think it would change too much.
-
Framework-Agnostic AI Optimizers vs DSPy Dependency
By
–
DSPy is more powerful, but it requires using the DSPy framework (in addition to their optimizers). I think framework-agnostic optimizers are more usable.
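A hedged sketch of what "framework-agnostic" could mean in practice: an optimizer that only touches plain strings and plain callables, so it can wrap any stack without rewriting your pipeline. The function name `optimize_prompt` and the candidate-selection strategy are illustrative assumptions, not an actual library API.

```python
# Illustrative sketch of a framework-agnostic prompt optimizer (assumed
# interface, not a real library): it takes a prompt string, candidate
# rewrites, and any scoring function you already have.
from typing import Callable

def optimize_prompt(
    prompt: str,
    candidates: list[str],
    score: Callable[[str], float],
) -> str:
    """Return the highest-scoring prompt among the original and candidates."""
    best = prompt
    best_score = score(prompt)
    for cand in candidates:
        s = score(cand)
        if s > best_score:
            best, best_score = cand, s
    return best

# Toy usage: "score" stands in for an eval you already run; no framework needed.
def toy_score(p: str) -> float:
    # Pretend longer, more specific prompts do better on our eval set.
    return float(len(p))

result = optimize_prompt(
    "Summarize the text.",
    ["Summarize the text in 3 bullet points.", "Summarize."],
    toy_score,
)
print(result)  # the most specific candidate wins under this toy score
```

Because the interface is just strings and callables, swapping models or eval harnesses doesn't require adopting a new framework.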
-
Production-ready prompt optimization methods rollout begins
By
–
We're going to start rolling out production-ready versions of some of these prompt optimization methods. If interested in early access, fill out the form here: https://docs.google.com/forms/d/e/1FAIpQLSdK1pZihqohabtGRQ99LZ2Rdwo5dnpnzJrBFRdojVSFq7k8eg/viewform?usp=dialog
-
Claude-Sonnet Outperforms o1 for Meta-Prompting and Optimization
By
–
I think a lot of people are sleeping on using claude-sonnet for meta-prompting/prompt optimization. We found it's better than o1 (and cheaper/faster). It still struggles with complex tasks (curious to see how o3 would do), but it works quite well for simpler ones.
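One common meta-prompting pattern the post alludes to: ask a strong model to rewrite a prompt given examples of its failures. This is a minimal sketch under that assumption; `build_meta_prompt` and `improve_prompt` are hypothetical names, and the model callable is stubbed here — in practice it would be a real claude-sonnet (or o1) API call.

```python
from typing import Callable

def build_meta_prompt(current_prompt: str, failures: list[tuple[str, str, str]]) -> str:
    """Assemble a meta-prompt asking a model to improve `current_prompt`
    based on (input, expected, actual) failure cases."""
    lines = [
        "You are a prompt engineer. Improve the prompt below so the failures stop.",
        f"Current prompt:\n{current_prompt}",
        "Failure cases:",
    ]
    for inp, expected, actual in failures:
        lines.append(f"- input: {inp!r} expected: {expected!r} got: {actual!r}")
    lines.append("Reply with only the improved prompt.")
    return "\n".join(lines)

def improve_prompt(current_prompt: str, failures, llm: Callable[[str], str]) -> str:
    # The heavy lifting is done by the optimizing model (e.g. claude-sonnet).
    return llm(build_meta_prompt(current_prompt, failures))

# Stub model for demonstration; swap in a real model client here.
def stub_llm(meta_prompt: str) -> str:
    return "Extract the date in ISO 8601 (YYYY-MM-DD) format."

improved = improve_prompt(
    "Extract the date.",
    [("Jan 5, 2025", "2025-01-05", "January 5")],
    stub_llm,
)
print(improved)
```

The optimizing model and the task model need not be the same — which is exactly why the choice of meta-prompting model (claude-sonnet vs o1) matters independently of the task.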
-
LangChain Community Building Through In-Person Events
By
–
Great to see the LangChain community organizing IRL events. LangChain was started by going to a bunch of meetups and hearing what people were building – lots of value in these types of events.
-
Model struggles with date handling and processing
By
–
The model just isn't good at date stuff right now – I don't think it has to do with the UI.
-
Open Source AI Outperforms Major Labs’ Latest Releases
By
–
Kinda crazy that open-source models that came out months ago are SOTA over what major labs are releasing today.
-
Browser Use: Open Source Alternative to OpenAI’s Operator
By
–
Want an open source version of OpenAI's Operator? There's a great project called Browser Use that does similar things (and more) while being open source. It allows you to plug in any model you want. Love to see open source leading the way: https://github.com/browser-use/browser-use
-
Write LLM Evaluations Like Software Tests
By
–
write LLM evals like you write software tests (pytest/vitest/jest)
— Harrison Chase (@hwchase17) January 22, 2025
Writing software tests is standard practice. Writing evals for LLMs is equally important, but we don't see it as commonplace yet.
We hope this helps bridge the gap! https://t.co/GZozKNbOw0
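As a generic illustration of the idea (not the code the post links to), here is what a pytest-style LLM eval might look like. The model call is stubbed with a deterministic `fake_llm` so the example runs like an ordinary test; in practice you would replace it with your real model client.

```python
# Pytest-style LLM evals: each eval case is an assertion over model output,
# written exactly like a unit test. `fake_llm` is a stand-in for a real call.
def fake_llm(prompt: str) -> str:
    # Deterministic stub so the "eval" runs like an ordinary software test.
    canned = {
        "Classify sentiment: 'I love this'": "positive",
        "Classify sentiment: 'This is terrible'": "negative",
    }
    return canned.get(prompt, "unknown")

def test_positive_sentiment():
    assert fake_llm("Classify sentiment: 'I love this'") == "positive"

def test_negative_sentiment():
    assert fake_llm("Classify sentiment: 'This is terrible'") == "negative"

# Run directly; pytest would collect the test_* functions automatically.
test_positive_sentiment()
test_negative_sentiment()
print("all evals passed")
```

The point is the workflow, not the stub: evals live next to your code, run in CI, and fail loudly when a prompt or model change regresses behavior — just like pytest/vitest/jest tests do for ordinary software.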