So one problem with using GPT-3 or 4 in games for NPC conversations is latency. Best case you can bring it down to 2s b/w input and output. But the onboard M2 on the Vision can allow you to use local LLMs which can bring latencies down to allow for things like realtime therapy
Local LLMs reduce latency for real-time NPC conversations
By
–