SAFETY - AI Dynamics

Detecting Out-of-Distribution Data: Impossibility and Learnability Theorems

By

@reza_zadeh

–

28 November 2022 19h02

Outstanding paper at #NeurIPS22 main idea: is it possible to figure out when test data is coming from classes unknown during training? First they prove an impossibility theorem, then give positive/constructive results to characterize learnability of OOD. https://openreview.net/pdf?id=sde_7ZzGXOE…

→ View original post on X — @reza_zadeh

28 November 2022

Stable Diffusion 2 Improves Safety Filter with SFW Focus

By

@swyx

–

28 November 2022 13h59

was catching up on this reading and noticed that the #NeurIPS2022 paper on “Red-Teaming the Stable Diffusion Safety Filter” is already out of date thanks to #StableDiffusion2. SD2 becoming a SFW "foundational txt2img" model means less spurious NSFW triggers! behold, dolphins!

→ View original post on X — @swyx

28 November 2022

Latest LLM Reads Text on Wall Warning of Impact

By

@pmddomingos

–

26 November 2022 0h33

The latest LLM can read writing on a wall. It says "You're about to hit me."

→ View original post on X — @pmddomingos

26 November 2022

Hallucination Risk in Generative AI Products and Market Response

By

@swyx

–

25 November 2022 18h07

Hallucination is an existential risk to any generative AI product, and people are blindly (irresponsibly?) forging ahead (and we're all tired and wary of "haha this was all AI!" rugpulls). Observing the different reaction to @MetaAI's Galactica vs @metaphorsystems is instructive

→ View original post on X — @swyx

25 November 2022

InstructGPT/RLHF tuning makes model assume all questions answerable

By

@goodside

–

25 November 2022 17h53

My guess is this is InstructGPT/RLHF rather than anything in the pre-training corpus. Tuning implicitly makes it assume all questions are answerable — it sees all text as “ ” and Q/A is a subset of that.

→ View original post on X — @goodside

25 November 2022

Stable Diffusion 2.0 Shows Quality Decline Compared to 1.5

By

@karpathy

–

25 November 2022 2h34

plot twist: stable diffusion 2.0 looks quite a bit worse on the few prompts i've tried so far compared to 1.5 (even not including celebrities/artists). Running theory seems to be this is due to an aggressive data sanitization campaign since the original release (?).

→ View original post on X — @karpathy

25 November 2022

Police Defunding Leads to Killer Robot Authorization Policy

By

@pmddomingos

–

24 November 2022 22h38

Defund the police is having some unintended consequences. https://missionlocal.org/2022/11/killer-robots-to-be-permitted-under-sfpd-draft-policy/…

→ View original post on X — @pmddomingos

24 November 2022

The Real Danger of a Paperclip Factory

By

@pmddomingos

–

24 November 2022 1h37

What’s really dangerous is a paperclip factory.

→ View original post on X — @pmddomingos

24 November 2022

SBF’s AI Safety Funding as Justification for Crimes

By

@honnibal

–

23 November 2022 15h19

Ostensibly, the funding to AI (especially AI safety) forms a key part of the motivation for SBF's crimes. Now, "motive" is never simple, but it's likely at least a factor in how SBF justified the crimes to himself. This was the end to supposedly justify the means.

→ View original post on X — @honnibal

23 November 2022