AI Dynamics

Global AI News Aggregator

@cerebras

  • SWE-1.6 Windsurf: 950 tokens/s code model

    Some of us are still writing code. We just want to find our functions faster. SWE-1.6 on @windsurf runs at 950 tokens/s, powered by Cerebras. For the real ones.

    → View original post on X — @cerebras

  • Cafe Compute Expands Globally with AI Developer Events

    Cafe Compute went global this week. Two cities. Two continents. One big chip. This week Cafe Compute hit San Francisco for @HumanXCo and London at @OpenAI's office for @aiDotEngineer Europe — fueling developers with coffee and the fastest AI on the planet. Next stop: Miami.

    → View original post on X — @cerebras

  • AI Agent Audits Developer Documentation Sites Automatically

    Audit and fix your entire developer docs site in under a minute. We built an AI agent with @browserbase and @cerebras that audits your entire docs site:

    > point it at any documentation site
    > agent goes down the link tree
    > agent crawls and verifies every page, checks every link, reads every code snippet
    > agent also compares content to your GitHub repo
    > returns a full report with suggested fixes and source references

    The full code and tutorial for how to build this yourself is live now. (A minimal crawler sketch follows this item.)

    → View original post on X — @cerebras, 2026-04-02 16:37 UTC
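
    The post doesn't include the agent's code, so the sketch below is only a rough, dependency-free illustration of the crawl-and-verify step: it walks one documentation site breadth-first and reports pages and links that fail to load. The class and function names, the User-Agent string, and docs.example.com are made up for the example; the real agent described above additionally drives a browser through @browserbase, reads code snippets, diffs content against the GitHub repo, and uses a Cerebras-hosted model to propose fixes.

    # Minimal docs-site link auditor (illustrative sketch only; names and URL are made up).
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import Request, urlopen

    class LinkExtractor(HTMLParser):
        """Collects href targets from <a> tags on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def fetch(url):
        """Return (status, html) for a URL; network and HTTP errors map to status 0."""
        try:
            req = Request(url, headers={"User-Agent": "docs-audit-sketch/0.1"})
            with urlopen(req, timeout=10) as resp:
                return resp.status, resp.read().decode("utf-8", errors="replace")
        except Exception:
            return 0, ""

    def audit(start_url, max_pages=50):
        """Breadth-first crawl of one docs site; report pages and links that fail."""
        site = urlparse(start_url).netloc
        queue, seen, report = deque([start_url]), {start_url}, []
        while queue and len(seen) <= max_pages:
            page = queue.popleft()
            status, body = fetch(page)
            if status != 200:
                report.append((page, status, "page failed to load"))
                continue
            parser = LinkExtractor()
            parser.feed(body)
            for href in parser.links:
                target = urljoin(page, href)
                if urlparse(target).scheme not in ("http", "https"):
                    continue
                link_status, _ = fetch(target)
                if link_status >= 400 or link_status == 0:
                    report.append((page, link_status, "broken link -> " + target))
                # keep crawling, but only within the same docs site
                if urlparse(target).netloc == site and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return report

    if __name__ == "__main__":
        for page, status, issue in audit("https://docs.example.com"):
            print(f"{status:>3}  {page}  {issue}")

    Swapping the plain HTTP fetch for a headless browser session and feeding each page's text to a fast model for content checks is presumably where the post's sub-minute, full-report behavior comes from.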

  • Semiconductor Yield Problem Solved by Cerebras Wafer Scale Processors

    What is semiconductor yield? How does it work? Why did it define the semiconductor industry for 70 years? How did this problem get solved? And how does this impact developers?

    What Is Semiconductor Yield? When you manufacture chips, not every one comes out working. Some have defects. “Yield” is the percentage of chips from a manufacturing run that actually work. If you make 100 chips and 90 work, your yield is 90%.

    How Does Yield Work? Chips are made from silicon wafers – thin, circular discs about 12 inches in diameter. In a perfect world, every square millimeter of a wafer would be flawless. But that never happens. Every wafer has tiny random defects scattered across it. Chips are cut from these wafers, and any chip that lands on a defect is thrown away.

    The process of chip manufacturing looks a lot like your mother making cookies. Imagine your mom rolled out a circle of cookie dough 12 inches in diameter. Then, when she wasn't looking, your brother threw a handful of peanut M&Ms into the air and they landed at random on the dough. Those M&Ms are flaws. Nobody can eat a cookie with a peanut M&M in it, so she has to throw away every cookie that has one. Now she gets out a small cookie cutter and stamps out cookies. Because the cookie cutter is small, the probability of hitting an M&M is low. And when a cookie does have one, there isn't much good dough surrounding it, so not much good dough is thrown away. The result: a lot of good cookies. They are small, but there are a lot of them. On the other hand, if she uses a big cookie cutter, the probability of hitting an M&M is much larger. And when she throws that cookie away, she throws away a lot of good dough with it. The result: only a few cookies. They are big, but the 12-inch circle of dough yielded only a few.

    This is exactly how chip manufacturing works. The cookie dough is a silicon wafer. The cookies are chips. Peanut M&Ms are flaws (because they are gross). Bigger chips hit more flaws, and more good silicon gets thrown away. Smaller chips, like smaller cookies, are less likely to hit flaws, and when they do, less silicon is discarded. This is why big chips are disproportionately more expensive. This is also why people assumed that because there was no way to make a wafer without flaws, there was no way to make a chip the size of a wafer. (A back-of-the-envelope version of this math is sketched after this item.)

    Why Did This Define The Industry For 70 Years? In an ideal world, you'd build really big chips for many data center applications. Data moves incredibly fast on-chip, so if you keep the data and compute on-chip, your work takes less time and uses less power. In AI, that manifests as super fast inference. But the moment data has to leave one chip and travel to another – through cables, switches, connectors, circuit boards – it slows down and uses more power. Lots of off-chip communication slows work and, in AI, produces slow inference. Though everyone agreed big chips were faster, nobody could yield them. So the industry settled on a workaround: don't build one big chip. Build thousands of small ones and wire them together. Most AI data centers are built this way today: thousands of little GPUs connected by cables, switches, and networks. It works. But you pay a price. Every connection adds latency. Every cable adds overhead. Every hop between chips slows things down. For 70 years, everyone accepted this as the only way.

    How Did Cerebras Solve the Yield Problem? In 2019, we solved the yield problem at @cerebras and brought the first wafer-sized processor – the wafer scale processor – to market. How did we do that? The answer came from studying a different kind of chip entirely: memory. Memory is built with a different process. Memory chips are made up of millions of identical tiles, with redundant tiles woven throughout. In a memory chip, if a tile has a flaw in it, the chip doesn't get thrown away. The bad tile is shut down and one of the redundant ones is called into action. Memory chips weren't designed to avoid flaws, but rather to withstand them. They use redundancy to withstand flaws, and their yield is extraordinary. Our founders realized that if we could develop a compute architecture that looked like memory – one built of hundreds of thousands of identical tiles – we too could use redundancy to withstand flaws. We could fail in place and route around the failed tile, just as memory does (and, interestingly, as data centers do: they fail in place, route around, and keep going). This would enable us to yield a wafer scale processor. And today we are happy to compare our yields to GPUs that are 1/58th our size.

    How Does This Impact Developers? The impact is simple and easy to see. Cerebras wafer scale processors are up to 15 times faster than @nvidia GPUs. And when your AI is fast, people use it more often, stay longer, and use it to solve more interesting problems.

    → View original post on X — @cerebras, 2026-04-02 16:07 UTC
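
    The contrast between a monolithic wafer-sized die and a tiled, redundant one is easy to see with the textbook Poisson die-yield model, P(defect-free) = exp(-D * A). The sketch below is illustrative only: the defect density, die areas, tile size, and spare fraction are assumed numbers, not Cerebras or foundry figures.

    # Back-of-the-envelope yield math for the two approaches described above.
    # Every constant here is an illustrative assumption, not a vendor figure.
    import math

    DEFECT_DENSITY = 0.001          # assumed defects per mm^2 (0.1 per cm^2)
    WAFER_AREA_MM2 = 46_000         # roughly a wafer-sized die
    GPU_AREA_MM2 = 800              # roughly a large GPU-class die
    TILE_AREA_MM2 = 0.05            # tiny tile; hundreds of thousands per wafer
    SPARE_FRACTION = 0.01           # assume 1% of tiles are redundant spares

    def monolithic_yield(area_mm2: float) -> float:
        """Chance a single die of this area has zero defects (Poisson model)."""
        return math.exp(-DEFECT_DENSITY * area_mm2)

    def tiled_wafer_yield(wafer_mm2: float, tile_mm2: float, spare_frac: float) -> float:
        """Chance a tiled wafer works: enough spare tiles to cover every bad tile."""
        n_tiles = int(wafer_mm2 / tile_mm2)
        spares = int(n_tiles * spare_frac)
        p_bad = 1 - math.exp(-DEFECT_DENSITY * tile_mm2)   # one tile is defective
        # Wafer works if the number of bad tiles is <= the number of spares.
        # Normal approximation to the binomial keeps this dependency-free.
        mean = n_tiles * p_bad
        var = n_tiles * p_bad * (1 - p_bad)
        z = (spares + 0.5 - mean) / math.sqrt(var)
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    if __name__ == "__main__":
        print(f"large GPU-class die defect-free: {monolithic_yield(GPU_AREA_MM2):.1%}")
        print(f"monolithic wafer-sized die:      {monolithic_yield(WAFER_AREA_MM2):.2e}")
        print(f"tiled wafer with 1% spare tiles: "
              f"{tiled_wafer_yield(WAFER_AREA_MM2, TILE_AREA_MM2, SPARE_FRACTION):.1%}")

    Under these assumptions a flawless wafer-sized die is essentially impossible (on the order of 10^-20), while a wafer built from hundreds of thousands of tiny tiles with even 1% spares yields essentially every time, which is the fail-in-place argument the post makes.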

  • Cerebras Event Draws Massive Crowd Seeking Speed

    Packed room. Line around the block. Why? Because everyone wants to go faster. Appreciate everyone who showed up.

    → View original post on X — @cerebras, 2026-03-20 16:55 UTC

  • Cerebras Wafer Scale Advantage Over NVIDIA Groq Inference Chips

    Problem solved. ✅ Andrew Feldman (@andrewdfeldman): NVIDIA's biggest GTC announcement was a $20 billion bet on the same problem we solved 6 years ago. Their next-gen inference chip – not available yet – has 140x less memory bandwidth than @cerebras. To run a single 2 trillion parameter model, you need 2,000+ Groq chips. On Cerebras, that's just over 20 wafers. Even paired with GPUs, Groq maxes out at ~1,000 tokens per second. We run at thousands of tokens per second today. And every day. In production now.

    Why? When you connect 2,000 chips together, every interconnect has latency. Every cable has overhead. It doesn't matter what your memory bandwidth is on paper if you're bottlenecked by the wiring between thousands of tiny chips. We solved this with wafer scale. One integrated system. Little interconnect tax. (A rough version of this arithmetic is sketched after this item.)

    Jensen told the world that fast inference is where the value is. He's right – it's why the world's leading AI companies and hyperscalers are choosing Cerebras. — https://nitter.net/andrewdfeldman/status/2034015373595672594#m

    → View original post on X — @cerebras, 2026-03-19 21:21 UTC
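
    As a sanity check on the chip-count comparison in the post, here is the rough memory-footprint arithmetic behind claims like "2,000+ chips vs. just over 20 wafers." Every constant below (weight precision, per-device on-chip memory) is an assumption chosen only so the calculation lands near the post's figures; none of them are vendor specifications.

    # Rough footprint arithmetic: devices needed just to hold a model's weights on-chip.
    # All constants are illustrative assumptions, not vendor specifications.
    import math

    PARAMS = 2e12                # a 2 trillion parameter model
    BYTES_PER_PARAM = 0.5        # assume 4-bit weights
    MODEL_BYTES = PARAMS * BYTES_PER_PARAM   # = 1 TB of weights

    SMALL_CHIP_MEMORY_GB = 0.5   # hypothetical on-chip memory of a small inference chip
    WAFER_MEMORY_GB = 44.0       # hypothetical on-wafer memory of a wafer scale processor

    def devices_needed(model_bytes: float, per_device_gb: float) -> int:
        """Devices required just to hold the weights entirely in on-chip memory."""
        return math.ceil(model_bytes / (per_device_gb * 1e9))

    if __name__ == "__main__":
        print("small chips:", devices_needed(MODEL_BYTES, SMALL_CHIP_MEMORY_GB))  # ~2,000
        print("wafers:     ", devices_needed(MODEL_BYTES, WAFER_MEMORY_GB))       # ~23

    The second-order effect is the one the post emphasizes: the more devices the weights are sharded across, the more inter-chip hops sit on the critical path of every decoded token, regardless of each chip's paper bandwidth.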

  • GPT-5.3-Codex-Spark: Three Real Workflows for Building

    What can you build with gpt-5.3-codex-spark? @jxnlco from @OpenAI demos 3 real workflows — ones you can set up yourself inside the Codex app to help you spend less time on overhead and more time building.

    00:09 – what is gpt-5.3-codex-spark?
    00:25 – workflow 1: multi-agent daily briefing from Slack, Drive & Meets
    01:06 – workflow 2: automated PR review
    01:31 – workflow 3: real-time interactive coding
    02:56 – what speed changes, and what's coming next

    → View original post on X — @cerebras, 2026-03-18 22:45 UTC

  • Link to X Article from March 17, 2026 by Cerebras

    x.com/i/article/203369858369…

    → View original post on X — @cerebras, 2026-03-17 18:17 UTC