This is an early step; there is a long path from this work to fully understanding the complex behaviors of our most powerful models. Our aim is to understand larger models, gradually expand the set of behaviors we can reliably interpret, and obtain safety assurances using our