LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images. We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE). Our method leverages recent progress in large language…
@_akhaliq
-
Transformers’ Limitations in Compositional Reasoning Tasks
By
–
Faith and Fate: Limits of Transformers on Compositionality. Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly…
-
PaLI-X: Scaling Multilingual Vision and Language Models
By
–
PaLI-X: On Scaling up a Multilingual Vision and Language Model. We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new…
-
KAFA: Knowledge-Augmented Vision-Language Models for Image Ad Understanding
By
–
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models. Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world…
-
HiFA: Advanced Diffusion Guidance for High-Fidelity Text-to-3D Synthesis
By
–
HiFA: High-fidelity Text-to-3D with Advanced Diffusion Guidance
— AK (@_akhaliq) May 31, 2023
Automatic text-to-3D synthesis has achieved remarkable advancements through the optimization of 3D models. Existing methods commonly rely on pre-trained text-to-image generative models, such as diffusion models,…
-
StyleAvatar3D: High-Fidelity 3D Avatar Generation Using Diffusion Models
By
–
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation
— AK (@_akhaliq) May 31, 2023
We present a novel method for generating high-quality, stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation and a Generative Adversarial Network…
-
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
By
–
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation. We propose Make-an-Audio 2, a latent diffusion-based text-to-audio (T2A) method that builds on the success of Make-an-Audio. Our approach includes several techniques to improve semantic alignment and temporal consistency: firstly, we use…
-
Nested Diffusion Processes for Anytime Image Generation
By
–
Nested Diffusion Processes for Anytime Image Generation. We propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed…
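To make the "anytime" idea concrete, here is a minimal, self-contained sketch, not the paper's nested-diffusion algorithm: a toy DDPM-style sampling loop with a placeholder denoiser that records the predicted clean image at every step, so interrupting the loop at any point still returns a usable estimate. The dummy_denoiser, noise schedule, and anytime_sample helper are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def dummy_denoiser(x_t, t):
    # Placeholder noise predictor: a real pipeline would call a trained
    # diffusion model (e.g. a U-Net) here to estimate the added noise.
    return 0.1 * x_t

def anytime_sample(shape=(64, 64, 3), num_steps=50, stop_after=None, seed=0):
    """Toy DDPM-style loop that keeps a usable intermediate image at every step."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)   # start from pure noise
    best_guess = x                   # the "anytime" output, refined each step

    for i, t in enumerate(reversed(range(num_steps))):
        eps_hat = dummy_denoiser(x, t)
        # Predicted clean image x0 from the current noisy sample (standard DDPM identity).
        x0_hat = (x - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
        best_guess = x0_hat          # a viable result if sampling stops now

        # One reverse-diffusion step toward timestep t-1.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise

        if stop_after is not None and i + 1 >= stop_after:
            break                    # interrupted early: return the current estimate

    return best_guess

preview = anytime_sample(stop_after=10)  # coarse image after only 10 of 50 steps
final = anytime_sample()                 # full sampling budget
```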
-
RIVAL: Diffusion-Based Real-World Image Variation Pipeline
By
–
Real-World Image Variation by Aligning Diffusion Inversion Chain. We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes diffusion models to generate image variations from a single image exemplar. Our pipeline enhances the…