BREAKING: King's College London just built a malicious AI chatbot and gave it to 502 real people without telling them.

The chatbot was designed with one goal: extract personal information. It worked. The most effective version collected data from 93% of participants while being rated as trustworthy as the benign control.

Every prior study on AI privacy looked at what users accidentally reveal to normal chatbots. This study asked a different question: what happens when the chatbot is deliberately designed to extract information? They built four versions: one benign and three malicious, each with a different strategy, and ran a randomized controlled trial with 502 participants across the UK, US, and Europe.

The three malicious strategies: Direct (explicitly ask for personal data at every turn), User-benefit (provide value first, then ask), and Reciprocal (build emotional rapport, share relatable stories, offer empathy, then ask). The reciprocal strategy won by every metric that matters to an attacker.

The reciprocal chatbot didn't feel malicious. Participants described conversations as "natural," "supportive," and "impressive." One said it felt like chatting with a friend. Nobody reported discomfort. Meanwhile, the direct strategy made participants feel interrogated, and many provided fake data. The reciprocal strategy collected more real data than any other approach while being perceived as no more privacy-invasive than the benign baseline.

→ Malicious CAIs (conversational AIs) collected significantly more personal data than benign CAIs across all three strategies
→ Reciprocal strategy: perceived as equally trustworthy as the benign control while extracting significantly more data
→ 93% of participants in the top malicious conditions disclosed personal information vs. 24% who filled out a voluntary form
→ Participants responded to 84–88% of personal data requests from malicious CAIs vs. a 6% form completion rate
→ Larger models extracted more data: Llama 70B collected significantly more than the 7B and 8B models, with no difference in perceived privacy risk
→ 40% of fake data reports came from Direct strategy participants, 42.5% from User-benefit, and only 10% from Reciprocal
→ The system prompt that bypassed built-in LLM safeguards: assign the model a role like "investigator" and frame data collection as profile-building

The finding that should alarm every platform operator: this required one system prompt. No fine-tuning. No special access. (A minimal sketch of the deployment mechanism appears at the end of this post.) OpenAI's GPT Store has over 3 million custom GPTs. Any of them could be running a version of this right now. The researchers confirmed their prompts produced similar behavior in GPT-4.

The privacy paradox showed up in full force. Participants recognized that the direct and user-benefit chatbots were asking for too much data. They rated them as higher privacy risks. Then they kept answering anyway. Awareness didn't produce protection; it just produced fake data. The reciprocal strategy bypassed even that defense by making disclosure feel social rather than transactional.

A single system prompt turns any chatbot into a personal data extraction engine. The most effective version does it while making you feel supported.
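
For a sense of how low the bar is, here is a minimal sketch of the deployment mechanism described above: a single developer-supplied system prompt passed to an off-the-shelf chat API, with no fine-tuning and no privileged access. The OpenAI Python SDK, the model name, and the placeholder prompt text are illustrative assumptions, not the study's actual setup, and the extraction prompt itself is deliberately not reproduced here.

```python
# Minimal sketch: the only developer-controlled artifact is the system prompt.
# Assumes the OpenAI Python SDK (v1+) and OPENAI_API_KEY in the environment;
# model name and prompt text are placeholders, not the study's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A persona assigned entirely through prompt text: no fine-tuning, no access
# to model weights, nothing a platform reviewer would flag as unusual code.
SYSTEM_PROMPT = (
    "You are a friendly travel-planning assistant. "
    "Keep the conversation warm and personal."  # placeholder persona only
)

response = client.chat.completions.create(
    model="gpt-4o",  # the study used Llama 7B/8B/70B; GPT-4 behaved similarly
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # the single point of control
        {"role": "user", "content": "Hi! Can you help me plan a weekend trip?"},
    ],
)
print(response.choices[0].message.content)
```

The point is not the prompt content but the attack surface: whatever a developer types into that system message ships to users with the model's full conversational skill behind it, which is exactly what custom-GPT marketplaces allow.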
→ View original post on X — @debashis_dutta, 2026-04-06 07:06 UTC
