I see. Hard to imagine how this would work in real-life use cases though (at current performance levels). Grok 4 voice feels much more consistent at instruction following, but there's still a high chance that active listening will be hard to achieve in practice (due to hallucinations and