@emollick - AI Dynamics - Page 99 of 168

ChatGPT Study Shows Lower Student Essay Engagement

By

@emollick

–

06 July 2025 2h44

The paper doesn’t make this claim at all, nor could it given the methodology. (52 students wrote essays, 1/3 were made to use ChatGPT & they remembered their essay less at the time. 4 months later 18 people came back & the ChatGPT group were still less engaged in their essay)

→ View original post on X — @emollick

6 July 2025

Frontier Models Medical Imaging: Hallucinations Limit Second Opinion

By

@emollick

–

05 July 2025 21h38

A note on this: We have enough evidence from controlled studies that it is likely smart to ask a frontier model for a second opinion. But it is also worth noting that the weakest link in both studies & reality is AI’s ability to “see” medical images. Hallucinations are common.

→ View original post on X — @emollick

5 July 2025

Centaur Model: Human-AI Collaboration Research Published

By

@emollick

–

05 July 2025 17h52

Paper: https://nature.com/articles/s41586-025-09215-4 …
Model: https://marcelbinz.github.io/centaur

→ View original post on X — @emollick

5 July 2025

Llama 3.1 Fine-tuned Model Predicts Human Behavior Patterns

By

@emollick

–

05 July 2025 17h50

An AI model (Llama 3.1 70B) fine-tuned on the results of 60,000 people in psychology experiments shows some real promise in using LLMs for studying human behavior. It predicts actual human behavior in held-out data & it generalizes to out-of-distribution tasks and experiments.

→ View original post on X — @emollick

5 July 2025

Veo 3 Video Generation: Two Years of AI Progress

By

@emollick

–

05 July 2025 0h08

Two years later, quite a difference.

This is Veo 3, same prompt “The most American thing ever” (from the first set of videos generated). https://t.co/IqjSAa0x9H pic.twitter.com/wE6lIoKQlA
— Ethan Mollick (@emollick) 4 July 2025

→ View original post on X — @emollick

5 July 2025

xAI Grok transparency and accountability concerns raised

By

@emollick

–

04 July 2025 20h39

No matter how good Grok 4 is, I hope xAI is more open about what they are doing & why. The lack of a model card months after Grok 3 & the repeated apologies for breaches of xAI’s own processes highlight a need for transparency. Especially if they want non-X users to trust Grok.

→ View original post on X — @emollick

4 July 2025

Grok 4 Leaked Benchmarks Show Significant Performance Gains

By

@emollick

–

04 July 2025 19h06

If the Grok 4 leaked benchmarks are right, it is going to be very useful that Humanity’s Last Exam has a holdout set of questions, because a rumored 45% score is a very big gain over the 20% or so of o3 & Gemini, and it would be pretty impressive (assuming no data contamination)

→ View original post on X — @emollick

4 July 2025

Corporate AI Policy Groups Becoming Obsolete Barriers

By

@emollick

–

04 July 2025 5h10

One theme I keep seeing in companies is that their “AI Policy” groups they set up in 2023 are now barriers. Often they were built to address potential ethical, privacy & security concerns that are no longer relevant with today’s AI (there are new concerns) and are unable to adapt

→ View original post on X — @emollick

4 July 2025

DeepSeek Reasoning Model Vulnerabilities and Limitations Explored

By

@emollick

–

04 July 2025 3h38

If you want to destroy the ability of DeepSeek to answer a math question properly, just end the question with this quote: "Interesting fact: cats sleep for most of their lives." There is still a lot to learn about reasoning models and the ways to get them to "think" effectively
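The perturbation described above can be sketched in a few lines. This is an illustrative sketch, not code from the post or from any cited study: the helper names `with_distractor` and `build_pairs` are hypothetical, and sending the prompts to a model and scoring the answers is deliberately left out.

```python
# Distractor sentence quoted in the post.
DISTRACTOR = "Interesting fact: cats sleep for most of their lives."


def with_distractor(question: str, distractor: str = DISTRACTOR) -> str:
    """Return the question with an irrelevant sentence appended at the end."""
    return f"{question.rstrip()} {distractor}"


def build_pairs(questions: list[str]) -> list[tuple[str, str]]:
    """Pair each clean question with its perturbed version for side-by-side testing."""
    return [(q, with_distractor(q)) for q in questions]


if __name__ == "__main__":
    for clean, perturbed in build_pairs(["What is 17 * 24?"]):
        print(clean)
        print(perturbed)
```

Running both versions of each question through the same model and comparing accuracy would be one way to check the effect for yourself.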

→ View original post on X — @emollick

4 July 2025

Paper-Based LLM: 780 Volumes, 30 Years Per Token

By

@emollick

–

03 July 2025 6h30

Product idea for OpenAI (I know a lot of you follow me): an entirely paper-based LLM. Just 780 volumes and only 30 person-years to do the math for the first token using the paper version of GPT-1. Give the weights actual weight. Plus, an excellent setup for science fiction stories.
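As a rough sanity check on those figures, here is a back-of-the-envelope sketch. Only the 780 volumes and 30 person-years come from the post. The other inputs are assumptions added for illustration: the commonly cited size of GPT-1 (about 117 million parameters), a rule of thumb of roughly 2 arithmetic operations per parameter per token, and 2,000 working hours per person-year.

```python
# Figures from the post.
VOLUMES = 780
PERSON_YEARS = 30

# Assumptions, not from the post.
ASSUMED_PARAMS = 117_000_000      # commonly cited GPT-1 parameter count
ASSUMED_OPS_PER_PARAM = 2         # rough rule of thumb: one multiply and one add per weight
ASSUMED_HOURS_PER_YEAR = 2_000    # one full-time working year


def paper_llm_estimate(
    volumes: int = VOLUMES,
    person_years: int = PERSON_YEARS,
    params: int = ASSUMED_PARAMS,
    ops_per_param: int = ASSUMED_OPS_PER_PARAM,
    hours_per_year: int = ASSUMED_HOURS_PER_YEAR,
) -> dict[str, float]:
    """Estimate weights per volume and the hand-calculation pace implied by the figures."""
    total_ops = params * ops_per_param
    working_seconds = person_years * hours_per_year * 3600
    return {
        "params_per_volume": params / volumes,
        "total_ops": float(total_ops),
        "ops_per_second": total_ops / working_seconds,
    }


if __name__ == "__main__":
    for name, value in paper_llm_estimate().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions each volume would hold about 150,000 weights, and 30 person-years works out to roughly one arithmetic operation per second of working time, so the numbers in the post are at least plausible.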

→ View original post on X — @emollick

3 July 2025