AI Dynamics

Global AI News Aggregator

@id_aa_carmack

  • Deep Learning Ethics: Data Relevance vs. Societal Legitimacy

    I still give the book Understanding Deep Learning by Simon J.D. Prince a good recommendation, but chapter 21: Deep learning and Ethics was sloppy. It could have been a chapter to really dig in on case studies, but it was just the basic public news story level coverage of bias and such, like: “In AI, it can be pernicious when this deviation depends on illegitimate factors that impact an output. For example, gender is irrelevant to job performance, so it is illegitimate to use gender as a basis for hiring a candidate. Similarly, race is irrelevant to criminality, so it is illegitimate to use race as a feature for recidivism prediction.” If they had stuck with “illegitimate”, then it would have been a question of societal choices, but “irrelevant” is a question about data, and your priors shouldn’t be so strong that data can’t move them. I would like to see a book or course walk through a machine learning problem with the input features being presented as something like car choices: color, style, doors, horsepower, etc. Do lots of analysis over representation, training, and generalization, then swap the feature labels to socially charged ones. What makes generalization credible in one situation but not the other?

    → View original post on X — @id_aa_carmack, 2026-03-09 23:31 UTC
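
One way to make the label-swap exercise from the post above concrete is a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the data is random, the hidden rule that generates the labels is invented, and `train_logistic` is a hypothetical helper rather than anything from the book or the post. The sketch only shows that the fitting procedure sees numbers, not column names, so the fit is identical under any relabeling. Whether the resulting generalization is credible or legitimate is a question the code cannot answer.

```python
import math
import random

def train_logistic(rows, labels, names, steps=200, lr=0.5):
    """Batch gradient descent on logistic loss.
    Returns ({feature name: weight}, bias)."""
    w = [0.0] * len(names)
    b = 0.0
    n = len(rows)
    for _ in range(steps):
        gw = [0.0] * len(names)
        gb = 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return dict(zip(names, w)), b

# Synthetic data: four features in [0, 1] and an invented hidden rule.
rng = random.Random(0)
rows = [[rng.random() for _ in range(4)] for _ in range(500)]
labels = [1 if 2.0 * r[3] - 1.0 * r[1] + rng.gauss(0, 0.3) > 0.5 else 0
          for r in rows]

car_names = ["color", "style", "doors", "horsepower"]
# Placeholder names standing in for any other labeling of the same columns.
other_names = ["feature_a", "feature_b", "feature_c", "feature_d"]

w_car, b_car = train_logistic(rows, labels, car_names)
w_other, b_other = train_logistic(rows, labels, other_names)

# The optimizer never looks at the names, so the two fits match exactly.
same_fit = list(w_car.values()) == list(w_other.values()) and b_car == b_other
print(same_fit)
```

Since the arithmetic is the same in both runs, any difference in how much the result should be trusted has to come from outside the model: how the data was collected, what the features stand in for, and what the prediction will be used for.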

  • Visual Cortex Information Processing: Correcting Order of Magnitude Estimates

    You are directionally correct about the amount of information going into the visual cortex being far less than a (stereo!) video stream, but off by at LEAST an order of magnitude — if it was less than one bit per video frame, you wouldn’t even be able to recognize a full frame

    → View original post on X — @id_aa_carmack

  • Code generation efforts alongside parser and semantics development

    Parser and semantics yes, but there were some serious code generation efforts.

    → View original post on X — @id_aa_carmack

  • Frame Rate vs Bit Rate: Motion Compression Trade-offs in Video

    Increasing the frame rate doesn’t increase the bit rate at the same rate, because smaller motion between frames compresses better. You could give them exactly the same bit rate and have almost no difference. It isn’t zero, so a talking head video might trade off ok, but anything

    → View original post on X — @id_aa_carmack

  • Understanding SiLU/GELU Activation Performance Loss in RL Networks

    I always lost performance when I tried to use silu/gelu activations in my RL value networks, and I finally understand why. If the pre-activation values are small, the smooth curve through zero is basically a linear activation, destroying the representation power of the network. You need a batch/layer/rms norm on the preactivations to put them in the range the smooth activations are designed for. Internal norms generally hurt performance on our RL tasks, but combining them with a smooth activation at least works basically as well as a raw relu (but slower). So, not actually a win, but the lightbulb of understanding was good!

    → View original post on X — @id_aa_carmack, 2026-02-23 16:54 UTC
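
The near-linearity point in the post above can be checked numerically with a small stdlib-only sketch. The setup is an assumption for illustration: pre-activations are drawn from a Gaussian with standard deviation 0.01, `rms_norm` is a bare RMS normalization without a learned scale, and `linearity_gap` is a hypothetical metric (the share of the output's standard deviation that the best straight-line fit fails to explain). None of this is the author's experiment.

```python
import math
import random

def silu(x):
    return x / (1.0 + math.exp(-x))

def relu(x):
    return max(x, 0.0)

def rms_norm(xs, eps=1e-6):
    """Scale a vector so its root-mean-square is about 1 (no learned gain)."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

def linearity_gap(f, xs):
    """Fraction of the output's standard deviation that the best
    least-squares line a*x + b fails to explain (0 = purely linear)."""
    ys = [f(x) for x in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    a = sxy / sxx
    resid = sum((y - my - a * (x - mx)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(resid / syy)

rng = random.Random(0)
small = [rng.gauss(0.0, 0.01) for _ in range(4096)]  # tiny pre-activations
normed = rms_norm(small)                             # same values, RMS near 1

gap_silu_small = linearity_gap(silu, small)    # SiLU on raw small inputs
gap_silu_normed = linearity_gap(silu, normed)  # SiLU after normalization
gap_relu_small = linearity_gap(relu, small)    # ReLU does not depend on scale

print(gap_silu_small, gap_silu_normed, gap_relu_small)
```

On small inputs the SiLU gap should come out close to zero, meaning the layer is acting almost like a linear map, while the normalized inputs and the raw ReLU both leave a large nonlinear remainder. ReLU's kink sits at zero regardless of input scale, which is consistent with the post's observation that a raw ReLU does not need the norm.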

  • Being a Wizard: Reframing Programming Identity Beyond Manual Coding

    Is it weird that AI coding assistance is not giving me identity fracture? A lot of software developers are feeling disoriented and threatened these days. Programming by hand is clearly going the way of the buggy whip and the hand-cranked auger. Which is how we're finding out that a lot of people have their identities bound up in being good at hand-coding and how it feels to do that.

    That's not me. It's not me at all. Rather to my surprise, I don't miss coding by hand, not any more than I missed writing assembler when compilers ate the world and made that unnecessary. (That was a couple years back, around 1983, for you youngsters.) Maybe the fact that I'm not feeling any of this disorientation disqualifies me from having anything to say to people who are. On the other hand…if you can learn to emulate my mental stance and be completely unbothered, maybe that would be a good thing?

    So. If you're a programmer, and you're feeling disoriented, try this on for size: I like being a wizard. I like being able to speak spells, to weave complex patterns of logic that make things happen in the world. Writing code is a way to manifest my will. Yes, I've piled up a lot of arcane knowledge over the 50 years I've been doing this. But languages of invocation, they come and they go. Been a long time since I've had any use for being able to program in 8086 assembler, and that's okay. I have better spells now, and these days some rather powerful familiars.

    What I'm inviting you to do is think of yourself as a wizard. Not as a person who writes code, but as a person who is good at assuming the kind of mental states required to bend reality with the application of spells. And if that's who you are, does it matter if the spells are painstakingly scribed in runes of power, versus being spoken to an obedient machine spirit? It's all one; it's all the manifestation of will.

    Arcane languages come and go, machine spirits appear and then diminish to be replaced by more powerful ones, but you? You are the magic-wielder. Without you, none of it happens. Same as it ever was. Same as it ever was. And so mote it be.

    → View original post on X — @id_aa_carmack, 2026-02-18 04:30 UTC

  • GPU Task Preemption and Scheduling for Research Clusters

    The glory work of GPU scheduling is in the frontier data centers with hundreds of thousands of GPUs, but a lot of research work is done with single GPU jobs on modest clusters, and the scheduling leaves much to be desired. I wish there were a clean way to preempt GPU tasks, so long running tasks could be transparently paused to allow higher priority tasks to get the minimum time-to-results. Manual checkpointing and cooperative multitasking is an option, but it complicates codebases and is fertile ground for bugs. It feels like most of the pieces are present: Everything goes through page tables on the GPUs already, Nvidia UVM (Unified Virtual Memory) allows demand paging to host memory, and MPS (Multi-Process Service) could act as a CUDA shim to force everything to use a different memory allocator. Memory page thrashing would be catastrophic for GPU tasks, but the idea would be to pause the host task of the low priority process, then let the high priority process force only the necessary pages out (or maybe none at all, if the memory pressure wasn’t high enough) while it is running, then resume the low priority task on completion, allowing it to page everything back in. Task switching at the level of tens of seconds, not milliseconds. Even if it didn’t handle absolutely all memory (kernel allocations and such) and had some overhead, that would be quite useful. Of course, Nvidia would prefer you to Just Buy More GPUs!

    → View original post on X — @id_aa_carmack, 2026-02-17 17:03 UTC
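
The payoff of the pause-and-resume idea in the post above can be put in rough numbers with a toy cost model. This is not an implementation of GPU preemption and it does not touch CUDA, UVM, or MPS. It only does the arithmetic for one GPU and two jobs. The function `time_to_results`, the job lengths, and the 20 second paging overhead are all hypothetical values chosen to match the "tens of seconds" scale mentioned in the post.

```python
def time_to_results(long_job_s, short_job_s, arrival_s, preempt,
                    swap_overhead_s=20.0):
    """Toy model of one GPU shared by a long low-priority job (starts at t=0)
    and a short high-priority job arriving at `arrival_s`.

    Returns (wait from arrival until the short job finishes,
             wall-clock time at which the long job finishes), in seconds.
    """
    if not preempt or arrival_s >= long_job_s:
        # Run to completion: the short job waits for the GPU to free up.
        start = max(arrival_s, long_job_s)
        short_done = start + short_job_s
        long_done = long_job_s
    else:
        # Pause the long job, pay one swap to page it out, run the short job,
        # then pay a second swap to page the long job back in and resume.
        short_done = arrival_s + swap_overhead_s + short_job_s
        long_done = short_done + swap_overhead_s + (long_job_s - arrival_s)
    return short_done - arrival_s, long_done

# Hypothetical workload: an 8 hour training run, and a 10 minute
# high-priority job that shows up one hour in.
LONG, SHORT, ARRIVAL = 8 * 3600.0, 600.0, 3600.0

fifo_wait, fifo_long_done = time_to_results(LONG, SHORT, ARRIVAL, preempt=False)
pre_wait, pre_long_done = time_to_results(LONG, SHORT, ARRIVAL, preempt=True)

print(fifo_wait, fifo_long_done)  # 25800.0 28800.0
print(pre_wait, pre_long_done)    # 620.0 29440.0
```

Under these assumed numbers the high-priority job gets its result in about ten minutes instead of more than seven hours, and the long job finishes 640 seconds later than it otherwise would. Even a much larger paging overhead would leave that trade looking favorable, which is the sense in which task switching at the level of tens of seconds would still be quite useful.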

  • Python’s Inefficiency Makes 1000x Speedups Possible

    Normally, claims of 1000x speedups are bullshit. But starting from python makes it possible

    → View original post on X — @id_aa_carmack

  • Agency Surpasses Intelligence in the Age of AI

    The modern age has richly rewarded people with a combination of high intelligence and high agency. Now that many aspects of intelligence are successfully being automated, it seems likely that people with relatively lower intelligence but exceptional agency will come into their own if they are willing to egolessly accept AI advice. Imagine a ruthless criminal that completely trusts everything their always-on AI glasses are telling them, knowing that it is carefully looking out for their best interests and isn't scheming to betray them.

    → View original post on X — @id_aa_carmack, 2026-02-12 18:46 UTC