easy to compare a lot of images from both models on http://stableboost.ai, e.g. "cute dog cooking tacos, photorealistic", grid of boosted images from 1.5 (left) and 2.0 (right). 2.0 looks more distorted, cartoony, simpler, and ignores the text more. May need more prompt engineering
MULTIMODAL AI
-
Comparing Stable Diffusion 1.5 vs 2.0 Image Generation Quality
-
Stable Diffusion 2.0 Shows Quality Decline Compared to 1.5
plot twist: stable diffusion 2.0 looks quite a bit worse on the few prompts i've tried so far compared to 1.5 (even not including celebrities/artists). Running theory seems to be this is due to an aggressive data sanitization campaign since the original release (?).
-
Dreambooth Model Training Results with 22 Dog Photos
I trained dreambooth on just 22 pictures of my pup and I'm amazed by the results!
-
Mind Reading Technology, Brain-Computer Interfaces, and Latest Research Paper
Longer thread on "mind reading tech" and BCI and the latest paper to add to the mix. https://mind-vis.github.io
-
AI Computer Vision Detects Product Defects Perfectly
From Defect to Perfect! Impact of #AI #ComputerVision #ML #DeepLearning on Quality Great example @MarinerLLC @IntelIoT https://insight.tech/industry/product-defect-detection-you-can-count-on-with-mariner-2?utm_source=twitter&utm_medium=organic&utm_campaign=2022-tdc-eaves #tech #DigitalTransformation @Shi4Tech #podcast #IntelPartner @insightdottech #ITInfluencer #Product @pierrepinna @SpirosMargaris
-
AI Computer Vision Transforms Self-Checkout Retail Experience
Does self #checkout always end in friction?! Now #AI #Cloud #ComputerVision Transforms #CX & Reduces #waste! https://insight.tech/retail/ai-and-computer-vision-accelerate-self-checkout?utm_source=twitter&utm_medium=organic&utm_campaign=2022-tdc-eaves #Retail #IntelPartner #SupplyChain #fintech #POS #Foodie #AgriTech #SDGs #IoT @DeepLearn007 @SpirosMargaris @OpenFoodChain @NutriSumit2023
-
Extending LLMs to Vision: Incremental Multimodal Integration with Flamingo
Extending LLMs from text to vision will probably take time but, interestingly, can be made incremental. E.g. Flamingo (https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/tackling-multiple-tasks-with-a-single-visual-language-model/flamingo.pdf (pdf)) processes both modalities simultaneously in one LLM.
-
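The "both modalities in one LLM" idea from the Flamingo tweet can be caricatured in a few lines: project vision features into the LM's token-embedding space so text and image land in one sequence. Note the hedge: Flamingo itself injects vision via gated cross-attention into a frozen LM; this simpler prefix-style scheme is only a toy illustration, and every name and size below is invented.

```python
# Toy "prefix" sketch of multimodal input to a language model.
# NOT Flamingo's actual architecture (which uses gated cross-attention);
# this only illustrates mixing modalities in one token sequence.
import numpy as np

rng = np.random.default_rng(0)
d_model = 64                                  # toy LM embedding width
vocab = rng.standard_normal((1000, d_model))  # stand-in embedding table

def vision_encoder(image):
    # Stand-in for a frozen vision backbone: image -> 4 feature vectors.
    return rng.standard_normal((4, 32))

W_proj = rng.standard_normal((32, d_model)) * 0.02   # learned projection

text_tokens = np.array([5, 42, 7])            # e.g. "describe this image"
visual = vision_encoder(None) @ W_proj        # (4, 64): visual "tokens"
textual = vocab[text_tokens]                  # (3, 64): text embeddings

# One interleaved sequence the LM's self-attention can process jointly.
sequence = np.concatenate([visual, textual], axis=0)
print(sequence.shape)                         # (7, 64)
```

The incremental part is exactly why this is attractive: the text LM and vision encoder can stay frozen while only the small projection (or, in Flamingo, the inserted cross-attention layers) is trained.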
Why LLMs Process Text Instead of Raw Pixels
Interestingly, the native and most general media of existing infrastructure w.r.t. I/O are screens and keyboard/mouse/touch. But pixels are computationally intractable at the moment, relatively speaking. So it's faster to adapt (textify/compress) the most useful ones so LLMs can act over them
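To put rough numbers on "pixels are computationally intractable, relatively speaking": a back-of-envelope comparison of the raw bytes in one uncompressed screenshot versus the text it might render. The screen resolution and character count are assumptions for illustration, not measurements.

```python
# Back-of-envelope: raw pixels vs. the text they render.
width, height, channels = 2560, 1440, 3      # one RGB screenshot (assumed)
pixel_bytes = width * height * channels       # uncompressed framebuffer

chars_on_screen = 4000                        # a dense page of text (assumed)
text_bytes = chars_on_screen                  # ~1 byte/char for ASCII text

ratio = pixel_bytes / text_bytes
print(f"pixels: {pixel_bytes:,} bytes")       # pixels: 11,059,200 bytes
print(f"text:   {text_bytes:,} bytes")        # text:   4,000 bytes
print(f"~{ratio:,.0f}x more raw data")        # ~2,765x more raw data
```

Even before any model runs, the screen-as-pixels representation of the same content is three to four orders of magnitude heavier than its textified form, which is the "adapt the most useful ones" argument in the tweet.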
-
Create Midjourney-Style Images with Open Source Stable Diffusion
Midjourney without Midjourney… @prompthero has fine-tuned Stable Diffusion so you can create images in the style of Midjourney, but with an open source model. You can run it on the web or via an API on Replicate:
-
Guided Image Inpainting for Video Object Removal
Ever wondered how object removal can be done on a video?
Here is an interesting one on Guided Image Inpainting
Paper: https://arxiv.org/pdf/2204.07845.pdf
Code: https://github.com/runwayml/guided-inpainting
#learnopencv #opencv #computervision #github #artificialintelligence #deeplearning #machinelearning #ai
— Satya Mallick (@LearnOpenCV) November 14, 2022