This paper was heavily inspired by prior work, especially Ajeya Cotra's 'sandwiching' concept.
ETHICS
-
Human-AI Collaboration Improves Task Performance Through Simple Chat Strategy
Our experiment shows that a simple strategy, having humans chat with models while completing a task, can help humans perform better at that task. This is very encouraging, albeit preliminary!
-
Scalable Oversight Framework and Language Model Question-Answering Proof of Concept
Along with developing a framework for scalable oversight, we also conduct a proof-of-concept experiment demonstrating a couple of question-answering tasks that work well under this paradigm with current language models.
-
Challenges in Studying Model Assistance: Task Selection and Experimental Design
It’s also challenging to study: For most tasks today, we don’t actually need our model’s help in this way. So testing these methods will require us to be clever about how we choose our tasks and design our experiments.
-
Leveraging Model Capabilities While Maintaining Supervisory Control
This is tricky: To do this, we’ll need ways for the human who’s supervising the model to use any relevant knowledge or skills that the model already has, even though they can’t trust the model to be reliably helpful.
-
Scalable Oversight: Supervising AI Systems Beyond Human Capabilities
To ensure that AI systems remain safe as they start to exceed human capabilities, we’ll need to develop techniques for scalable oversight: the problem of supervising systems’ behavior without assuming that the overseer understands the task better than the system being trained.
-
AI Systems Improving Human Oversight of Large Language Models
In "Measuring Progress on Scalable Oversight for Large Language Models", we show how humans could use AI systems to better oversee other AI systems, and demonstrate some proof-of-concept results where a language model improves human performance at a task.
-
JD’s Internet Safety Initiative Remembered as Lasting Legacy
Such a shock to hear this, so very sad indeed. JD's Internet Safety initiative is absolutely a lasting testament. Sending thoughts and prayers.
-
Content Moderation: Survey on Human-AI Partnership
According to a survey, humans and AI should be combined for effective online content moderation https://actuia.com/actualite/selon-un-sondage-humains-et-ia-doivent-etre-associes-pour-une-moderation-de-contenu-en-ligne-efficace/
… #AI #artificialintelligence
-
Critique of an Article on Government Surveillance