for the first time i am aware of, there is an entirely private subfield of AI research. every company that actually trains models is doing RL with rubrics and LLM-judged rewards, but academic work is stuck on RL with automated rewards (math problems and code). much cleaner for
@jxmnop
-
Regularization During Training for AI Models
couldn't you just regularize for this during training? i bet it'd work fine
-

Where to Buy Adversarial Examples AI Sweatshirt Merchandise
how can i buy an Adversarial Examples sweatshirt? (for real. i would wear it)
-

LLM writing patterns influence human thought processes
caught myself writing out my thoughts like an LLM :/
-
Learning Curves: Asymptotic Values and Convergence Rates
"Learning Curves: Asymptotic Values and Rate of Convergence", published in Advances in Neural Information Processing Systems 6 (NIPS 1993): https://proceedings.neurips.cc/paper/1993/file/1aa48fc4880bb0c9b8a3bf979d3b917e-Paper.pdf
-

Neural Network Training: Predictive Performance Estimation Methods
"After training on 12,000 patterns it becomes obvious that the new network will outperform the old… If our predictive method gives a good quantitative estimate of the network's test error, we can decide whether three weeks of training should be devoted to the new architecture"
-

Scaling Laws History: From Bell Labs 1993 to Modern AI
first i thought scaling laws originated in OpenAI (2020). then i thought they came from Baidu (2017). now i am enlightened:
Scaling Laws were first explored at Bell Labs (1993)
-

NeurIPS Paper on Power Laws in Classifier Training
it's literally a NeurIPS paper. they train classifiers on different-sized datasets and different-sized models, and fit power laws. can't believe this was 32 years ago
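To make "fit power laws" concrete, here is a minimal sketch of the general idea, not the paper's exact procedure. It assumes the test error follows a pure power law, error ≈ b · n^(−alpha), with no asymptotic error floor (a simplification), and it uses synthetic numbers invented for illustration. The names `fit_power_law` and `predict_error` are hypothetical.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of error ~= b * n**(-alpha), done as a line in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(b) - alpha * log(n), so b = exp(intercept), alpha = -slope
    return math.exp(intercept), -slope

def predict_error(b, alpha, n):
    """Extrapolate the fitted curve to a training-set size n."""
    return b * n ** (-alpha)

# Synthetic learning curve (assumption: generated from b=2.0, alpha=0.5, no noise).
sizes = [1000, 2000, 4000, 8000, 12000]
errors = [2.0 * n ** -0.5 for n in sizes]

b, alpha = fit_power_law(sizes, errors)
predicted = predict_error(b, alpha, 60000)  # estimate error at a larger dataset size
```

Fitting on small training runs and extrapolating to a larger one is the kind of decision aid the quoted passage describes. A real learning curve usually flattens toward a nonzero error, which this two-parameter form does not capture.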
-
Bad Model Good Context Outperforms Good Model Bad Context
bad model + good context > good model + bad context
The Big Labs don't want you to know this
-
RL with LLM-as-Judge: The Next AI Development Paradigm
it seems like the next few years of AI development will be a lot of RL with LLM-as-a-judge reward functions. strange times we live in. where can i learn more about this paradigm? what are the most relevant blogs and papers?
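As a rough illustration of what an LLM-as-a-judge reward with a rubric could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the `rubric_reward` function, the weighted-fraction scoring rule, and especially `keyword_judge`, which is a trivial stand-in for a real LLM call. It does not describe how any lab actually does this.

```python
from typing import Callable, Dict

# Hypothetical judge signature: (response, criterion) -> True if the criterion is met.
Judge = Callable[[str, str], bool]

def rubric_reward(response: str, rubric: Dict[str, float], judge: Judge) -> float:
    """Return the weighted fraction of rubric criteria the judge marks as satisfied."""
    total = sum(rubric.values())
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    earned = sum(weight for criterion, weight in rubric.items()
                 if judge(response, criterion))
    return earned / total

def keyword_judge(response: str, criterion: str) -> bool:
    # Stand-in for an LLM judge (assumption): a criterion counts as met
    # if its text appears in the response, ignoring case.
    return criterion.lower() in response.lower()

# Made-up rubric: criterion text mapped to its weight.
rubric = {"power law": 2.0, "bell labs": 1.0, "citation": 1.0}
response = "Scaling follows a power law, first explored at Bell Labs."
reward = rubric_reward(response, rubric, keyword_judge)
```

In an RL loop, a scalar like `reward` would be fed back to the policy update in place of an automated check such as a unit test or a math answer match.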