Thread #108652848
File: highlights_g_108645344_1776783367_1.jpg (1.2 MB)
Discussion and Development of Local Image and Video Models
Previous: >>108645344
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>LTX-2
https://huggingface.co/Lightricks/LTX-2
>Wan
https://github.com/Wan-Video/Wan2.2
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Illustrious
https://rentry.org/comfyui_guide_1girl
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>>
File: GPT Image 2.png (503.3 KB)
https://xcancel.com/sama/status/2046598595869331894#m
watch him releasing something that destroys the competition and then deletes it 6 months later like sora keek
>>
File: _AnimaPreview3_00190_.jpg (412 KB)
>>
File: 1748014903243570.png (1.6 MB)
>>108652877
I still can't fathom how good it is at text, it's so close it can almost do fucking movie posters with text that's just too small for anyone to bother to read kek
https://xcancel.com/harufit333/status/2046603596746436965#m
>>
File: Gpt image 2.png (2.5 MB)
>>108652897
that's fucking impressive, but that doesn't look like michael jackson at all in 1983 keek
>>
File: _AnimaPreview3_00193_.jpg (399.8 KB)
>>108652902
classic art lora, havent uploaded yet
>>
File: kek.png (1.3 MB)
>>108652979
>a hostile ideological antic
you mean the Chinese Culture? (we still don't have Z-image edit btw)
>>
File: 580616098487758.png (2.4 MB)
>>108652928
>1973
>The Mummy at a press conference
???
>>
>>
>>108652564
>>108652621
So apifags are horny seeing a sex scene worthy of a TV movie rated 13+ recommended by the mother-in-law...
They have really no clue about local nsfw...
>>
File: 3455645646566.png (583.9 KB)
>>108653025
i saw reddit posts of gpt-4 with the same text capabilities. Not saying images 2 isn't even better, but some people seem to never have seen good api text before?
>>
>>108653032
>So apifags are horny seeing a sex scene worthy of a TV movie rated 13+ recommended by the mother-in-law...
they can do actual porn anon
https://www.reddit.com/r/Grok_Porn/
>>
>>108653036
>but some people seem to never have seen good api text before?
in this place? probably not, they foam at the mouth so hard when they see API gens, they prefer to stay in their bubble and pretend that plastic skin and 1 line of text is the best humanity can do right now
>>
>>
File: ChatGPT Image Apr 21, 2026, 08_45_03 PM.png (1022.8 KB)
>>108653036
it does long text SO MUCH BETTER than Nano Banana Pro and any other model, especially dense, text-packed images
Prompt:
>i am thinking of making a terminal user interface, but for browing 4chan, generate an aesthetic, riced, beautiful high info packed terminal user interface image of such software
>>
File: ChatGPT Image Apr 21, 2026, 09_28_37 PM.png (2.5 MB)
SAARS
>>
>>
File: local lost so hard.png (500.8 KB)
>>108653055
wait this isn't a real screenshot???
>>
File: WaiAnima1+Turbo_00002_.png (899.2 KB)
>>108652894
>>108652968
Noice
>>
File: 384092539163418.png (1.4 MB)
>>
File: Screenshot 2026-04-21 213059.png (277.5 KB)
>>108653067
nope it's really that good
but they might nerf this model after some days as they always do
>>
>>108653055
Why don't these sorts of gens get posted in the cloud threads? All the gens I see in those threads are complete ass. Like only the good API gens get posted here while the cloud threads have only slop kek.
>>
File: 184033924066806.png (1.2 MB)
>>108653055
>>108653087
It's very impressive indeed. Too bad they don't seem to care about aesthetics though.
>>
>>108653115
>>108653125
Prompt: a video game called "Hollow Knight: Basedsong" (which is a parody of hollow knight silksong) where the player is a basedjak, other npcs are wojaks, basedjaks, chudjaks, pepe frog, gigachad etc. set in a meme world)
>>
File: 5bc7b77f-65ca-4db7-a25e-4ca20a4ea748.png (2.5 MB)
>>108653150
they haven't released it in the API yet
you can try this in chatgpt if you have plus or higher sub
>>
File: ComfyUI_temp_dccrg_00022_.jpg (271 KB)
>>108653150
gpt-image-1.5 costs 8 bucks/1 million tokens in input and 32/1m in output
https://files.catbox.moe/8qs2w0.png
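Rough math on those prices, for anyone budgeting. This is a minimal sketch: the two rates are the ones quoted above, while the token counts in the example call are made up for illustration and are not real gpt-image numbers.

```python
# Rates quoted in the post above, in USD per 1 million tokens.
INPUT_USD_PER_MTOK = 8.0
OUTPUT_USD_PER_MTOK = 32.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 500 tokens in, 4000 tokens out.
    print(f"${request_cost_usd(500, 4000):.4f}")  # prints $0.1320
```

Output tokens cost four times as much as input tokens at these rates, so the generated image dominates the bill rather than the prompt.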
>>
All right, that's it, get out!
>>108653190
>>108653190
>>108653190
>>
File: 42282729894941.png (3.9 MB)
>BEATH NOTEY
k-kawaii
>>
>>
>>108653169
this checkpoint of images v2 openai released is bad
apparently this is the worst, compute-efficient version, so it gives shittier results than the other checkpoints
the other checkpoints (tested on arena ai) had much better results (unfortunately i did not save those samples to compare)
>>
>>
There are too many different families of models now. Could you please add a simple summary? I mean the pros and cons of each, and which one to use for what. Beginners will be totally lost, especially with the newer and lesser-known models like Anima, Chroma, Wan...
Something like:
SDXL
+ rather good for photorealism and painting
- often fucks up details and body parts
- Not so good for 2D
IllustriousXL
+ Good for manga and cartoon
+ millions of LoRAs available
- struggles with multiple characters
PonyXL
+ Rather good results and many LoRAs available
- if you don't use some very specific words in the prompt the result looks like shit
Flux
An improved SDXL but a lot heavier and slower to generate.
Flux Kontext
+ Very powerful to rework a picture, it understands what you ask it to do.
- Takes very long to generate
Z-Image Turbo
+ very good pictures and rather good consistency with the prompt
- very poor diversity, for the same prompt all pictures will look the same
>>
File: image_1200988.png (3.5 MB)
>>108653055
i expect this model to get nerfed very hard after 2-3 weeks, once all the benchmark scores are up and subscriptions increase. shame, it's the typical pattern and cycle with all these western frontier models.
>>
File: ChatGPT Image Apr 21, 2026, 09_13_44 AM.png (903 KB)
the only good thing about openai releasing images v2 is that google will release their next nano banana pro version, which would be handy for making open source training datasets so that chinese models can catch up to images v2
>>
File: ComfyUI_temp_zfspu_00026_.png (3.2 MB)
>>108653216
I use Z-Image Base for my everyday needs, I use Spark.Chroma for the porn and sometimes Wan 2.2 when I want a "photojournalism style" photo
https://files.catbox.moe/8rfj0k.png
>>
>>108653235
>>108653258
please discuss API shit here instead >>108653194
>>
>>108652860
I mean it's probably not worth wasting bandwidth and compute on a pseudo-social media for images and videos, but they probably want to have an image model around. Their competitors Grok and Gemini have them.
Expect tight limits on free (if it even comes there) and non-pro tiers though as this is no longer a priority for them.
>>
File: Chroma_final_00007_.png (2.4 MB)
edgy af https://files.catbox.moe/wgio3h.png
>>
File: 1769057034544373.png (2.6 MB)
>>108653025
kek
>>
File: _AnimaPreview3_00246_.jpg (305.3 KB)
>>
File: ComfyUI_temp_kyrpm_00098_.png (3.4 MB)
>>108653320
>>108653025
What is up with the grain/blotches it insists on adding to the image? I can see how in a photo prompt it'd help sell the realism, but here it just plain doesn't make sense
https://files.catbox.moe/ajvza1.png
>>
File: its joever.jpg (34.5 KB)
>>108652928
I know we can cherry pick stuff and it's not perfect but this still mogs anything local so hard it's not even funny.
Can't wait for it to be censored to uselessness regardless in ClosedAI fashion.
>>
File: ChatGPT Image Apr 21, 2026, 12_12_04 AM.png (1.5 MB)
how long till openai nerfs this
>>
File: _AnimaPreview3_00265_.jpg (364.3 KB)
>>108653369
it's all lora
>>108653377
>Great example of the mistakeface
lora alters face
>>
>>108653460
>>108653434
>>108653359
>>108653235
Where are you running this?
>>
File: file.png (434.9 KB)
>>108653488
first time I heard of adg
>>
File: ComfyUI_temp_bpjqg_00010_.png (3.7 MB)
catbox and litter are being a dildo
>>
>>108653512
>>108653533
it died after the 4chan hack
>>
>>108653509
in chatgpt web, some accounts have access to it
>>108653526
>>
File: ComfyUI_temp_ngjff_00054_.png (3 MB)
>>108653606
ltx 2.3 is as good as sora
and 4chan anons contradicting each other? Must be a day that ends with a "y"
https://files.catbox.moe/a1yc7c.png
>>
File: 582248531234447.png (2.2 MB)
>>
File: ComfyUI_temp_qpenk_00108_.png (2.8 MB)
>>108653634
>cameltoe
based beyond belief
https://files.catbox.moe/wbvhuh.png
>>
File: Anima-2026-04-21-18-35-11-2780968753-1.png (1.4 MB)
>>
File: 3467824724727.jpg (2.2 MB)
>>
File: _AnimaPreview3_00285_.jpg (447.7 KB)
>>
>>
File: HGcPBxybEAENHe6.jpg (480.4 KB)
>>
File: ChatGPT Image Apr 21, 2026, 00_34_08 AM.jpg (445.4 KB)
gpt-image-2 is great at terminals
"show me a screenshot of a mac desktop, large terminal window visible of the earth map in ASCII"
>>
File: 549665726255300.png (1.8 MB)
>>
>>108653670
>>108653697
what do you not understand in "local"?
-> >>108653190
>>
>>
File: Anima-2026-04-21-18-38-43-3061605949-2.png (1.6 MB)
>>
File: sydneysweeny_imagesv2.png (1.8 MB)
>>
>>
File: 835673734828.jpg (1.8 MB)
>>108653662
mix of base anima artists
>>
>>
File: _AnimaPreview3_00303_.jpg (478.6 KB)
>>108653700
Yoji Shinkawa?
>>
File: Screenshot 2026-04-21 155501.png (88.3 KB)
>>108653764
>>108653747
looks like they nerfed this model with public figures/celebs already
>>
>>
>>108653777
>>108653764
She also doesn't have a flux chin
>>
>>108653764
>I am convinced that these models are instructed to make vaguely resembling but some details are changed versions of real people when prompted to draw them.
if only local was better at deepfaking
>muh loras
too much drift and looks like shit
>muh flux klein
yea, if you dont mind plastic reptile skin
the local deepfaking scene hasnt evolved since 2020
>>
File: _AnimaPreview3_00317_.jpg (488.8 KB)
>>
>>
File: lol lmao.png (507.5 KB)
>>108653730
>>
>>108653764
Same with Qwen or Klein. The more you change the image via prompt, the more they converge towards sameface.
If you put her in different outfit or make her naked, you pretty much retain face structure, but as soon as you change her pose it diverges away from the original character.
I think that's just an artifact of edit models.
>>
File: Anima-2026-04-21-19-10-07-144037443-1.png (1.3 MB)
>>
>All this crying about API vs local
>Meanwhile API is just a glorified text and screenshot generator, nothing practical can come from the model they're releasing because it's caged
>APIcucks have zero control over what they generate. Like a style or gen? No guarantee it stays or that you can even use the model at all
>>
File: z-image_00969_.png (2.1 MB)
>>
>>
File: Hero US ARMY soldier salutes the Admiral.png (1.5 MB)
>>108654020
you'd have to get drafted
>>
File: _AnimaPreview3_00343_.jpg (429.6 KB)
>>
>>
File: Anima2026-04-21-19-22-34-4202572290-1.png (1.7 MB)
>>
File: fff.png (32.4 KB)
...guys I don't even know what is going on or what is needed for these python scripts. I am that dumb. I'm not able to just run the scripts as posted in my Python shell, I'm like not even smart enough to ask the right questions. This whole git/pip thing just eludes me. Trying to run WAN locally, have used StableDiffusion locally fine
>>
>>108654069
aint no fortunate son
>>
File: _AnimaPreview3_00350_.jpg (508.5 KB)
>>
File: 1093505729497538.png (1.2 MB)
>>108653754
Yes indeed
>>
>>108654078
Desu tell it you are tech illiterate and need things explained verbosely, and just let the chatbot of your choice guide you through the installation of ComfyUI.
Once that works, just open the template for Wan 2.2 text to video or image to video and roll with the defaults.
I don't know what else to say if you are at this level.
>>
File: z-image_00913_.png (1.7 MB)
>>108654078
>install comfyUI which is as easy as extracting shit out of a folder
>install comfyUI manager (also simple)
>open a wan workflow
>download the missing files
>???
>PROFIT
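If the git/pip part is what's confusing: those commands go in a system terminal (cmd, PowerShell, bash), not inside the Python shell, which is why pasting them at the `>>>` prompt fails. Below is a small sketch to check what your Python can actually see. The function name `check_prereqs` and the modules it probes are only illustrative, not something ComfyUI or Wan ships.

```python
import importlib.util
import shutil
import sys


def check_prereqs(modules=("pip", "torch")):
    """Report which basic tools are visible from this Python interpreter."""
    report = {
        # Version of the interpreter running this script.
        "python": sys.version.split()[0],
        # True if a `git` executable is found on PATH.
        "git": shutil.which("git") is not None,
    }
    for name in modules:
        # True if the module can be imported from this environment.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_prereqs().items():
        print(f"{key}: {value}")
```

Save it to a file and run it from the terminal with `python yourfile.py`. Anything that shows `False` is missing from that particular Python install, which may not be the same Python your UI uses.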
>>
>>108654114
>>108654116
I have ComfyUI, I must have handled this before. I think I'm figuring it out, thanks
>>
>>108654036
>>108654089
Model?
>>
>>108654323
Are you seriously suggesting an image can replace the concept of a user interface, which needs to be built to be user friendly? How? You have never created anything that people actually LIKE to use in your entire life. Using implies interaction. And again, where is the "beautiful" the prompt asked for? >>108653055
Glad I dont have your eyes.
>>
>>
so anima is pretty much replacing illustrious and stable diffusion right? this model fuckin' shits all over every other local model I've used with the exception of flux MAYBE because I have no idea if flux was really good or not considering how slow it was I never used it.
>>
File: Untitled.png (151.4 KB)
It’s almost been 10 days, why do they keep bumping that thread? They don’t have collage anymore, they don’t have Anchor, scraps of a glorious past. I’m saying this as a former anon who was there during its golden days.
>>
File: died for freedom in iran.png (1.5 MB)
>>108654190
yes its flux klein edit
>>
File: z-image_00923_.png (1.5 MB)
>>108654433
indeed we iz
>>
>>
File: notepad++_3SRsN9xbJV.jpg (77.7 KB)
>>108654690
>desktop
Idk if it's the same but portable (which you should be using anyway) has extra_model_paths.yaml
Edit it like this, just add your own folders
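Since the screenshot may not load for everyone, here is a rough sketch of what such a file can look like, generated from Python so the folders are easy to swap. The layout (a named section with `base_path` plus per-type folder keys such as `checkpoints`, `loras`, `vae`) follows my understanding of the `extra_model_paths.yaml.example` that ships with ComfyUI. The section name `my_models` and the `D:/ai/models` path are placeholders, so check against the example file before relying on it.

```python
def render_extra_model_paths(base_path, folders, section="my_models"):
    """Build the text of an extra_model_paths.yaml-style config.

    base_path: root folder that holds your models (placeholder in the demo).
    folders:   mapping of model type -> subfolder relative to base_path.
    section:   arbitrary label for this group of paths (hypothetical name).
    """
    lines = [f"{section}:", f"    base_path: {base_path}"]
    for key, subdir in folders.items():
        lines.append(f"    {key}: {subdir}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_extra_model_paths(
        "D:/ai/models",  # placeholder path, use your own
        {"checkpoints": "checkpoints", "loras": "loras", "vae": "vae"},
    )
    # Paste the printed text into extra_model_paths.yaml.
    print(text)
```

Keeping one shared models folder this way means several UIs can read the same checkpoints instead of each holding its own copy.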
>>
File: file.png (268.1 KB)
>>108654748
Yes and a new model architecture. That's just 1.2 million steps (about 2 weeks) on a 5090. Qwen text encoder + contrastive flow match, block skipping and residual learning. 1B model.
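For scale, that works out to roughly one training step per second. A quick sketch of the arithmetic, assuming "about 2 weeks" means exactly 14 days of uninterrupted training (the post only gives the rounded figures):

```python
STEPS = 1_200_000  # step count from the post
DAYS = 14          # assumption: "about 2 weeks" taken as exactly 14 days


def steps_per_second(steps: int, days: float) -> float:
    """Average training throughput over the whole run."""
    return steps / (days * 24 * 60 * 60)


if __name__ == "__main__":
    print(f"{steps_per_second(STEPS, DAYS):.2f}")  # prints 0.99
```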
>>
File: _AnimaPreview3_00380_.jpg (459.9 KB)
>>
>>
File: FluxKlein9BDistilled_KekstoneErnie.png (2.9 MB)
>>
File: file.png (182.7 KB)
>>108654828
e2e-qwenimage-vae
>>108654838
Haven't heard of it, doesn't seem to show up on Google and the Reddit thread that may be it is deleted
>>
File: 1775644309012042.png (311.9 KB)
>>108654870
https://huggingface.co/well9472/Nanosaur-1.2B-Preview
>>
>>108654926
Interesting, I guess you can tell his fetish. Makes me wonder though if a model that size would be cheaper to train with multiple 5090s. Given h100 and 5090 are about the same speed, the constraint is throughput not VRAM.
>>
>>108655130
That's a lot of words that don't change the fact it ate my buzz. I'm on both websites, the buzz is gone and my account did not transfer to .red at all. Maybe it's because I used a google account, don't know.
>>
File: ComfyUI_10381_.png (1.1 MB)
>>
File: 1753358694130373.png (64.6 KB)
nice tool cloudcucks
>>
>>
File: 1769950046265926.png (19.4 KB)
ZITGODS?
>>
>>108655300
Theres more tech discussion on preddit than there was on ldg for more than a year at this point, everyone who cared about tech doesn't post here since it was filled with newfags who need tech support to run something on their 8gb card or the mentally ill autists who destroy the threads, all the while there are barely any updates on new tech or anyone doing any testing they post anymore. Ldg was only good initially when most things were new, and its good enough on big model releases now.
>>
File: 1766789840673099.png (3.1 MB)
>>108655257
>>
>>108655349
I want, it looks good, btw, this shit is genuinely next level >>108655139
>>
>>108655442
dont worry given that every general on every board turns into a zoo of autists that make every general worthless to be in, ldg will continue to die and have less and less relevant things posted in it, although its already mostly dead. the anima dev posting his lora is the peak of this general lmao.
>>
>>108655475
me speaking facts about the general going to shit is not equal to me hating the general itself, the fact that you are too autistic to see this is further proof of the enshittification of this site and what im talking about. a zoo of autists.
>>
File: 1652362253530.png (749.7 KB)
>end of sora this week
>nobody care, and sora already replaced by seedance
api kek. they are powerful models but without any honor. meanwhile, old sdxl is still respected
>>
>>108654268
Well, you can argue even Flux.1 was revolutionary in the sense that you no longer needed to hire anyone to make nice looking graphics. NBP was obviously a massive step up because of its world/UI knowledge, and now GPT Image 2 perfects it. But it has been the case for a while you don't need to hire artists. Image models are quite useful for mockups, but Claude literally released a tool that replaces Figma. That is more impressive than GPT Image 2.
>>
>>
File: 1756842631381643.png (749.1 KB)
>>108655615
probably a bug, I've seen hundreds of gpt image 2 images and it's not outputting anything like that, people absolutely love that model
>>
>>
File: 1762671166557327.jpg (888.4 KB)
The stupid questions thread is dead. How do I feed AI original character designs and have them fuck? I have a ref pose picture generated for each.
>>
File: 1756816496198355.png (797.3 KB)
>>108655640
holy shit, google got fucked. thats what they get for rug pulling nbp
>>
>>108655703
with this quality, maybe Ernie 2 will be decent. chinese labs only distill from western API images, so if gpt image 2 is that good then local will be good too. it's sad they prefer to train on western model outputs instead of going for real data, but it is what it is
>>
>>