Hertz-VAE: 1.8B Parameter Decoder-Only Transformer Architecture

> 1.8B parameters, 8-layer decoder-only transformer
> First four layers receive the latent history
> Layer 5 receives the ground-truth 15-bit quantized representation during training
> Directly samples hertz-lm's next-token prediction during inference
> Near-perfect at
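The layer wiring described in the list above can be sketched in code. The following is a minimal, illustrative PyTorch sketch, not the released hertz-vae implementation: the class name `HertzVAESketch`, the toy dimensions, and the use of a plain embedding table for the 15-bit codes are assumptions; only the 8-layer decoder-only layout, the layers-1-to-4 vs. layer-5 split, and the 2^15-entry code space follow from the description above.

```python
# Hypothetical sketch of the described layer wiring (assumed names and sizes).
import torch
import torch.nn as nn

D_MODEL, N_HEADS, N_LAYERS = 512, 8, 8   # toy sizes; the real model is ~1.8B params
QUANT_VOCAB = 2 ** 15                    # 15-bit quantized representation

class HertzVAESketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Causal self-attention blocks used as decoder-only layers (no cross-attention).
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS,
                                       batch_first=True, norm_first=True)
            for _ in range(N_LAYERS)
        ])
        self.latent_in = nn.Linear(D_MODEL, D_MODEL)          # projects latent history frames
        self.quant_embed = nn.Embedding(QUANT_VOCAB, D_MODEL)  # embeds the 15-bit codes
        self.head = nn.Linear(D_MODEL, D_MODEL)               # predicts the next latent frame

    def forward(self, latent_history, quant_tokens):
        # latent_history: (batch, time, D_MODEL) past latent frames
        # quant_tokens:   (batch, time) 15-bit codes; ground truth at train time,
        #                 sampled from hertz-lm's next-token prediction at inference
        T = latent_history.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(latent_history.device)
        x = self.latent_in(latent_history)
        for i, layer in enumerate(self.layers):
            if i == 4:  # layer 5: inject the quantized conditioning signal
                x = x + self.quant_embed(quant_tokens)
            x = layer(x, src_mask=causal)
        return self.head(x)

model = HertzVAESketch()
latents = torch.randn(1, 16, D_MODEL)             # 16 frames of latent history
codes = torch.randint(0, QUANT_VOCAB, (1, 16))    # teacher-forced 15-bit codes
out = model(latents, codes)                       # (1, 16, D_MODEL) next-frame predictions
```

At inference, the same forward pass would be driven by codes sampled from hertz-lm rather than ground-truth codes, matching the training/inference split in the list above.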