Post-training in the Apple Intelligence paper
– Two models:
  – On-device: ~3B params with task-specific LoRA adapters
  – Server: ~70B (my estimate), almost no details
– Human-annotated and synthetic data:
  – Mathematics: @WizardLM_AI-like evol strategy to create
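
To make the "task-specific LoRA adapters" point concrete, here is a minimal pure-Python sketch of the general LoRA idea: a frozen base weight matrix plus a small low-rank update per task, swapped in at inference time. This is the generic technique, not Apple's implementation. The class name `LoRALinear`, the task name `"summarization"`, and the sizes, rank, and scaling are all illustrative assumptions.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen dense layer with swappable low-rank adapters (illustrative only)."""

    def __init__(self, d_in, d_out, rank=2, alpha=4.0, seed=0):
        self._rng = random.Random(seed)
        self.d_in, self.d_out = d_in, d_out
        self.rank, self.alpha = rank, alpha
        # Base weights: shared by every task and never updated.
        self.W = [[self._rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
        self.adapters = {}

    def add_adapter(self, task):
        # A is small random, B is zero, so a new adapter starts as a no-op.
        A = [[self._rng.gauss(0, 0.01) for _ in range(self.d_in)] for _ in range(self.rank)]
        B = [[0.0] * self.rank for _ in range(self.d_out)]
        self.adapters[task] = (A, B)

    def forward(self, x, task=None):
        y = matvec(self.W, x)
        if task is not None:
            A, B = self.adapters[task]
            delta = matvec(B, matvec(A, x))  # low-rank update: B (A x)
            scale = self.alpha / self.rank
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y


layer = LoRALinear(d_in=8, d_out=8)
layer.add_adapter("summarization")  # hypothetical task name

x = [1.0] * 8
base_out = layer.forward(x)
adapted_out = layer.forward(x, task="summarization")

base_params = layer.d_in * layer.d_out
adapter_params = layer.rank * (layer.d_in + layer.d_out)
print(base_params, adapter_params)
```

Each task only adds `rank * (d_in + d_out)` parameters on top of the shared base, which is why one small on-device model can serve many tasks by swapping adapters.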
Apple Intelligence: 3B On-Device and 70B Server Models
