It would be good to prepare a comparative visualization of all the AI image generation solutions out there! I will start working on it soon!
MULTIMODAL AI
-
Limitations of bulk conversion services for short-form video content
These services might be suitable for bulk conversion tasks (e.g. an audiobook or other long-form content), but for smaller tasks like videos (ads, voiceovers, etc.), even the best ones are still lacking in tone, cadence, and emotion.
-
2023 Tech Wishes: AI, Learning, Video, Open Models, Robotics
In 2023, may your:
– Generative AI produce beautiful images & text
– Active Learning framework ask the right questions
– Video generations be as beautiful as images
– Model be OSS without censors
– Robotics simulation be faithful to the real world
Happy New Year, everyone!
-
Advances in Surgical AI: Skill Assessment and Patient Outcome Prediction
We made strides in surgical #AI, which involves assessing the skill of surgeons, predicting patient outcomes, and discovering novel surgeon biomarkers based on multi-modal data and deep learning algorithms. @AjhungMD gives an excellent overview here
-
Robust Vision Transformer Architecture Wins Semantic Segmentation Challenge
We also developed a robust vision transformer architecture, fully attentional networks (FAN), with channel-based attention for robustness. We won the Semantic Segmentation Track of the Robust Vision Challenge at ECCV. https://arxiv.org/abs/2210.12852
-
Reasoning in Visual Perception: Distinguishing Squares from Circles
The former sounds perhaps stranger, so here's an example. Let's say you have to tell whether a given image contains a square or a circle — a canonical perception problem. Sounds easy enough if you have a well-trained visual system, right? How would reasoning come into play?
-
Largest Text-Molecule Model Enables ChatGPT-like Molecule Retrieval and Editing
We build the largest text-molecule model that does not rely only on aligned training pairs. Now you can retrieve and edit molecules based on text prompts. This will pave the way for #ChatGPT for #molecules https://chao1224.github.io/MoleculeSTM @nvidia @Mila_Quebec @Caltech
-
ImageNetX: Identifying Vision System Failures at Scale
Even today’s best #deeplearning vision systems can fail when pose/lighting/background vary. Our work on ImageNetX is one of the first large-scale efforts to pinpoint the types of mistakes in AI computer vision systems. Explore the dataset