Tencent's Hunyuan team introduced ArtifactsBench, an automated evaluation pipeline for LLM-generated visual artifacts. It assesses models on 1,825 diverse tasks, using an MLLM-as-Judge to evaluate the visual artifacts, and achieves 94.4% ranking consistency with human experts.
Tencent’s ArtifactsBench: Automated Evaluation for LLM Visual Artifacts
