it is wild that Cursor trained a model closer to the frontier than Google with 100x fewer people and (guessing) ~100x less compute. i am surprised this was even possible. also praying for the Gemini comeback ofc
@jxmnop
-
AI crisis due to distance from datacenters and GPU whir
The AI world is in crisis precisely because man has strayed too far from his datacenters. one shouldn’t be able to start a 1000-GPU training run without hearing the GPUs start to whir. heretics
-
Every ML conference since 2019 features at least three subquadratic attention papers
every ML conference I've been to since 2019 has had no fewer than three papers proposing new techniques for subquadratic attention
-

New sub-quadratic attention technique makes long-context LLMs 10x cheaper

"Introducing a breakthrough new technique for sub-quadratic attention, making long-context LLMs 10x cheaper without sacrificing performance" Me:
-

Subquadratic Attention and Data Quality: Challenges for Large Context AI Models
people on here are dumb. the latest subquadratic attention trick might produce a model that *processes* 1M tokens (or 12M..) without going insane, but that doesn't make it good. the real problem isn't the architecture, it's the data. humans haven't generated many contiguous
-
Codex Boosts Experiments but Only 15% of Results Are Trustworthy
with Codex, i can run 10x the experiments. out of these experiments, i can trust about 15% of the results. conclusion: i am 50% more productive with codex
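The arithmetic behind that punchline can be sketched in a few lines. This is only an illustration: the function name `trusted_throughput` is made up here, and it assumes a baseline of 1.0 in which every experiment run without Codex is fully trusted, which the post does not state.

```python
# Hypothetical illustration of the arithmetic in the post above.
# Assumption (not from the post): the baseline is 1.0 trusted result
# per unit time, i.e. every non-Codex experiment is fully trusted.

def trusted_throughput(experiment_multiplier: float, trust_rate: float) -> float:
    """Trusted results per unit time, relative to a baseline of 1.0."""
    return experiment_multiplier * trust_rate

# 10x the experiments, of which about 15% can be trusted.
relative = trusted_throughput(10, 0.15)

# Percentage gain over the baseline, rounded to a whole percent.
gain_percent = round((relative - 1.0) * 100)
print(f"{gain_percent}% more trusted results")
```

Ten times the experiments at a 15% trust rate gives 1.5x the trusted output, which is where the "50% more productive" figure comes from.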
-
Why 1M-Context Models Still Don’t Work Beyond 200K Tokens
it is endlessly fascinating to me that we still don't have a true 1M-context model. it's an unusual case where the infra is far ahead of the science. Claude discontinued 1M+ context bc it didn't really work past ~200k. we don't have the right data? training techniques? not sure
-
Claude Code Unusable After Excessive Vibecoding Dogfooding
very interesting that Claude Code is the ultimate product for vibecoding, and Claude Code's engineers vibecoded Claude Code so hard it became unusable. an entire company overdosing on Dogfood
-
Distillation Gap Widens as AI Agent Complexity Increases
I think the world of agents makes distillation much harder, so the gap is starting to widen
