trading Elon for Saudis does not seem like a win
REGULATION
-
Public and Private AI Training: Different Accuracy Standards
Also, my personal opinion is that both public pretraining (with certain constraints/limitations) and private training from scratch have roles to play in the space. Of course, they can't be held to the same accuracy standard.
-
Public vs Proprietary Datasets in AI Model Training
Thanks Alex. I like the papers that do this, but I also have some concern when this is done on a dataset that is proprietary and that only Google has access to (JFT). I would like to see a version that is pretrained on LAION. This still has privacy issues, but at least it is all public.
-
Privacy in AI: Beyond Training and Model Usage
And finally, privacy is… hard! While a lot of work focuses on training and using models privately, this is a narrow view of privacy, which encompasses much more. 14/n
-
Privacy-Respecting Public Pre-Training Datasets for AI Models
So where do we go from here? We conclude with a number of suggestions for the field. The first one focuses on making sure we have public pre-training sets which are truly privacy-respecting. Can we make such a dataset/model with comparable utility to what people use now? 12/n
-
Public Data vs Private ML Training Ethics
1. Publicly available data is not the same as public data. For example, http://insecam.org has livestreams from video cameras with default passwords. This is publicly available. But it certainly should not be used to train an ML model which purports to be "private." 6/n -
Privacy Challenges in Public Data Pretraining and Fine-tuning
Seems great, right? Public data is plentiful online, we can just download tons of it, pretrain our models with this public data, and do fine-tuning privately! Privacy is solved! Of course not, and we highlight three (orthogonal) considerations for these settings. 5/n
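The "pretrain publicly, fine-tune privately" recipe described above is typically implemented with DP-SGD-style updates. A minimal sketch on a toy linear model, assuming illustrative hyperparameters and a made-up `dp_sgd_step` helper (none of this is the paper's actual setup):

```python
# DP-SGD-style private fine-tuning sketch: clip each example's gradient,
# average, and add Gaussian noise before updating the (publicly pretrained)
# weights. Values and names here are illustrative assumptions.
import math
import random

def dp_sgd_step(weights, per_example_grads, clip_norm=1.0, noise_mult=1.0, lr=0.1):
    """One DP-SGD update: per-example clipping, averaging, Gaussian noise."""
    d = len(weights)
    clipped_sum = [0.0] * d
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))  # per-example clipping
        for i in range(d):
            clipped_sum[i] += g[i] * scale
    n = len(per_example_grads)
    sigma = noise_mult * clip_norm  # noise scaled to the clipping bound
    return [
        w - lr * (clipped_sum[i] + random.gauss(0.0, sigma)) / n
        for i, w in enumerate(weights)
    ]

# "Public pretraining": weights obtained without privacy (here, just a stub).
pretrained = [0.5, -0.3]

# "Private fine-tuning": one noisy step on gradients from the sensitive data.
random.seed(0)
private_grads = [[0.2, -0.1], [1.5, 0.4], [-0.3, 0.8]]
finetuned = dp_sgd_step(pretrained, private_grads)
```

The privacy guarantee comes from the clipping bound plus the noise, not from the pretraining; the thread's point is that the "public" pretraining data may itself carry privacy risk.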
-
Differential Privacy Challenges in Large-Scale AI Pretraining
New paper w Nicholas Carlini & @florian_tramer: "Considerations for Differentially Private Learning with Large-Scale Public Pretraining." We critique the increasingly popular use of large-scale public pretraining in private ML. Comments welcome. https://arxiv.org/abs/2212.06470 1/n -
The Rise and Fall of Peer Review System Integrity
Do a little fraud // get a paper published // get down tonight https://experimentalhistory.substack.com/p/the-rise-and-fall-of-peer-review -
OpenAI’s Missing GPT Detector Tool Release
I'm a bit surprised OpenAI didn't release an updated GPT detector together with ChatGPT. Seems like something that would be useful for the impacted sectors, no?