Thanks Alex. I like the papers that do this, but I also have some concern when it is done on a dataset that is proprietary and that only Google has access to (JFT). I would like to see a version that is pretrained on LAION. This still has privacy issues, but at least it is all public.
REGULATION
-
Privacy in AI: Beyond Training and Model Usage
By
–
And finally, privacy is… hard! While a lot of work focuses on training and using models privately, this is a narrow view of privacy, which encompasses much more. 14/n
-
Privacy-Respecting Public Pre-Training Datasets for AI Models
By
–
So where do we go from here? We conclude with a number of suggestions for the field. The first one focuses on making sure we have public pre-training sets which are truly privacy-respecting. Can we make such a dataset/model with comparable utility to what people use now? 12/n
-
Public Data vs Private ML Training Ethics
By
–
1. Publicly available data is not the same as public data. For example, http://insecam.org has livestreams from video cameras with default passwords. This is publicly available, but it certainly should not be used to train an ML model which purports to be "private." 6/n
-
Privacy Challenges in Public Data Pretraining and Fine-tuning
By
–
Seems great, right? Public data is plentiful online, we can just download tons of it, pretrain our models with this public data, and do fine-tuning privately! Privacy is solved! Of course not, and we highlight three (orthogonal) considerations for these settings. 5/n
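The "fine-tuning privately" step in the pipeline above usually means DP-SGD: clip each example's gradient, then add Gaussian noise before the update. A minimal sketch below, with an illustrative logistic-regression loss; the function name and hyperparameter values are my own choices, not from the paper.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD update: per-example gradient clipping + Gaussian noise.

    Sketch for logistic regression; `clip` bounds each example's L2
    gradient norm, `noise_mult` scales the Gaussian noise relative to
    the clipping norm (the pair determines the privacy accounting).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))           # sigmoid prediction
        g = (p - yi) * xi                           # per-example gradient
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / max(norm, 1e-12)))  # clip
    g_sum = np.sum(clipped, axis=0)
    g_sum = g_sum + rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * g_sum / len(X)                  # averaged noisy step
```

In the setting the thread critiques, `w` would be (a head on top of) a model already pretrained non-privately on scraped "public" data, so only this fine-tuning step carries a differential-privacy guarantee.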
-
Differential Privacy Challenges in Large-Scale AI Pretraining
By
–
New paper w/ Nicholas Carlini & @florian_tramer: "Considerations for Differentially Private Learning with Large-Scale Public Pretraining." We critique the increasingly popular use of large-scale public pretraining in private ML. Comments welcome. https://arxiv.org/abs/2212.06470 1/n
-
The Rise and Fall of Peer Review System Integrity
By
–
Do a little fraud // get a paper published // get down tonight https://experimentalhistory.substack.com/p/the-rise-and-fall-of-peer-review
-
OpenAI’s Missing GPT Detector Tool Release
By
–
I'm a bit surprised OpenAI didn't release an updated GPT detector together with ChatGPT. Seems like something that would be useful for the impacted sectors, no?
-
Fair Compensation for Artists’ Work Used by AI Systems
By
–
I think there's also a discussion to be had about "can it be fair for people to benefit from artists' work without the artists being compensated", regardless of whether it's legal. I also think coders should feel the same way about this as they feel about their code being used by OpenAI Codex.
-
Regulation Ineffective for Protecting Industries from Automation
By
–
I agree 1) is stronger than 2), although I'm not sure it's *really* strong. In general, there are lots of good arguments *against* using regulation to prop up industries that are impacted by automation.