File: collage.jpg (1.3 MB)
Discussion and Development of Local Image and Video Models

Previous: >>108951930

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>Wan
https://github.com/Wan-Video/Wan2.2

>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/b/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>>108958327
Elephants got legs like Kevin R. Nash
>>
Blessed thread of frenship
>>
>>108958327
>>Maintain Thread Quality
>https://rentry.org/debo
>https://rentry.org/animanon
This is like a troon tramp stamp. Guaranteed to have melties about "Julien" and "Nik". Debo standing by
>>
File: 191609CUI_00001_.png (1.5 MB)
>>
File: unnamed.png (428.8 KB)
Could someone improve the quality of this with img2img?
I want to print it on a shower curtain. Time is of the essence, I have to go in like 15min.
>>
Glad to see lilbro is back to his regular seething I was worried where he went
>>
>>108958356
basado
>>
what's stopping (You)?
https://civitai.red/creator-program
>>
>>108958370
I could but I won't
>>
I hate ComfyUI because it crashes 1/4 of the time.
>>
I think comfy is fine I don't think much about it at all
>>
>>108958458
I don't like the site and how coomer heavy it is
>>
>>108958461
Too bad, order placed
>>
>>108958327
Thank you for baking this thread, anon
>>108958345
Thank you for blessing this thread, anon
>>
>mfw Resource news

06/01/2026

>Bernini Latent Semantic Planning for Video Diffusion
https://bernini-ai.github.io

>NVIDIA Launches Cosmos 3, the Open Frontier Foundation Model for Physical AI
https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-3-the-open-frontier-foundation-model-for-physical-ai

>LVSA: Training-Free Sparse Attention for Long Video Diffusion
https://github.com/JiusiServe/LongVideoSparseAttention

>RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video
https://compvis.github.io/rayder

>DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
https://jeffreyyzh.github.io/DecMem-Page

>Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models
https://jiazheng-xing.github.io/nexus-lumos-home

>Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation
https://github.com/iCVTEAM/DSP

>PEEK: Picking Essential frames via Efficient Knowledge distillation
https://github.com/momentslab/peek

>CameraNoise: Enabling Faithful Camera Control in Video Diffusion through Geometry-Flow-Guided Noise Warping
https://gulucaptain.github.io/CameraNoise

>Nvidia unveils new superchip to bring AI functions into personal computers
https://www.cbc.ca/news/business/nvidia-ai-personal-computer-9.7218820

>Qwen3.7-Plus: Multimodal Agent Intelligence
https://qwen.ai/blog?id=qwen3.7-plus

05/31/2026

>FLUX Identity Adjuster (V2)
https://github.com/Magirad/Flux_ID_Adjuster_V2

>ComfyUI AnimaFastTrain
https://github.com/quinteroac/ComfyUI-AnimaFastTrain

>MONET: Open-source dataset
https://huggingface.co/datasets/jasperai/monet

05/30/2026

>Pixal3D — Apple Silicon (MPS / Metal) Port
https://github.com/pawel-mazurkiewicz/Pixal3D-mac

>Comfy-Org/PixelDiT (diffusion models & upscalers)
https://huggingface.co/Comfy-Org/PixelDiT/tree/main/diffusion_models

>Orion4D Generative Paint
https://github.com/orion4d/Orion4D_generative_paint
>>
>mfw Research news

06/01/2026

>DTG-Restore: Training-Free Diffusion Refinement for Generative Video Super-Resolution
https://arxiv.org/abs/2605.30431

>TunerDiT: Training-free Progressive Steering of Diffusion Transformer for Multi-Event Video Generation
https://arxiv.org/abs/2605.31590

>SlotMemory: Object-Centric KV Memory for Streaming Long-Video Generation
https://tj12323.github.io/SlotMemory

>SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer
https://arxiv.org/abs/2605.30409

>OmniMem: Scalable and Adaptive Memory Retrieval for Long Video Generation
https://wuyushuwys.github.io/OmniMem

>Robust Dreamer: Deviation-Aware Latent Gaussian Memory for Action-Controlled AR Video Generation
https://arxiv.org/abs/2605.30855

>Mitigating Content Shift and Hallucination in GenAI Image Editing via Structural Refinement
https://arxiv.org/abs/2605.30437

>Parallel Tempering Initial Sampling in Inference-Time Reward Alignment
https://arxiv.org/abs/2605.30991

>Benchmarking and Enhancing Text-to-Image Models for Generating Visual Representations in Early Arithmetic Education
https://arxiv.org/abs/2605.31212

>Benchmarking Single-Step Inpainting Methods for Multi-Object 3D Gaussian Splatting Scenes
https://arxiv.org/abs/2605.30987

>MergeTok: Unified Continuous and Discrete Visual Tokenization via Token Merging
https://arxiv.org/abs/2605.30904

>Guidance for Low-Level Perceptual Editing in Unconditional Diffusion Models
https://arxiv.org/abs/2605.31162

>Representation Forcing for Bottleneck-Free Unified Multimodal Models
https://yuqingwang1029.github.io/RepresentationForcing

>A Unifying View of Variational Generative Wasserstein Flows
https://arxiv.org/abs/2605.31369

>Vision-Language Models Suppress Female Representations Under Ambiguous Input
https://arxiv.org/abs/2605.31556

>What Makes LVLMs Hallucinate Less? Unveiling the Architectural Factors Behind Hallucination Robustness
https://arxiv.org/abs/2605.30911
>>
>Using comfyUI
>Crashes
>Loads workflow
>Missing model
>Download model
>0% for 20 minutes
>Restart
>Crashes
>Try to download model again
>Stuck at 25%
>You need this extension
>Git clone
>Doesn't work
>Crashes
I sure do fucking love comfyUI, which is not comfy at all
>>
>>108958647
Maybe go simp some youtuber and ask him to 'make' a frontend for you
>>
>>108958661
Fix your piece of shit software
>>
File: civit payouts.png (56.5 KB)
>>108958458
The website is total garbage and it is not worth engaging with the brown userbase for the sake of pennies they are paying.
$200 is apparently a normal salary in India, but it's nothing worthwhile where I live.
You also need to game the system by spamming lots of poorly trained mediocre loras and jeetmixes to farm meaningful amount of buzz.
Not to mention, I have no faith that the website will be around for long, or that they won't arbitrarily suspend cashing out.
>>
>>108958666
?
>>
File: 201637CUI_00002_.png (1.1 MB)
>>108958697
How is Civitai still standing anyway? It must cost millions a month to maintain it. Who is funding it?
>>
>>108958753
Andreessen Horowitz
>>
>>108958629
>>108958637
thanks!
>>
Can someone release a new realistic edit model? I can't with Klein's dog shit anatomy anymore. Zedit was a meme and will never come out, so what now? Is there anything coming soon?
>>
>>108958884
You haven't heard?
>>
> >108958629
> >108958637
go back
>>
Why did Z lie? Why do Chinese developers lie? Why not just not lie?
>>
File: anima_baseV10_00446_.jpg (480.2 KB)
>>
Someone let a bot loose in the previous thread let's pray they don't come into this one
>>
>>108959079
Let's take a look at a couple famous chinese sayings.

>He who has never been cheated cannot be a good businessman
>If you can cheat, then cheat
>The first time you cheat me, be ashamed. The second time it is I who must be ashamed.
>>
So, did anyone test the newest cosmos stuff?
I have a humble rig and don't want to hire a GPU on vast with the current inflated prices.
>>
>>108959275
Forgot to redirect the post, sorry >>108959236
>>
>>108953392
Catbox for Y'shtola
What model/Lora
>>
>>108959178
cool
>>
>>108959275
It's not like it's gunna blow any current image models out of the water right now. Maybe later thoughever.
>>
>>108959275
if its anything like the last cosmos then itll only be good once someone trains it with danbooru among other things...
>>
File: file.png (1.7 MB)
>>108958923
Haven't heard what, vaguepost king?
>>
I'm trying to set up pixal3d in comfy and I'm going insane. There is always something breaking. Is there a guide or something?

I'm so tilted right now and I hate comfy with all my heart and soul, I fucking hate it. I just want to use pixal3d.
>>
>started training at 40 epochs
>now extended 4 times to 100 epochs and probably counting because validation STILL keeps fucking dropping and samples STILL keep fucking improving
the things one does to goon in peace
>>
i'm scared
>>
>>108959444
Klein 9B is okay especially with a certain lora(s).
>>
>>108959477
>validation
>>
ultra cozy mode engaged
>>
https://huggingface.co/RuneXX/LTX-2.3-Workflows/blob/main/Video-2-Video/Extend-Any-Video/LTX-2.3_-_V2V_Extend_Any_Video_Multi-Extend_long_video.json

can extend any video and even clone voices, ltx 2.3 is pretty versatile.

https://files.catbox.moe/mmu2it.mp4
>>
>>108959965
https://files.catbox.moe/vr4m81.mp4
>>
File: 1770335662574855.png (5.6 KB)
>>108959965
Mia Yikk won, mikutroons btfo
>>
>>108960103
this is so true wtf
>>
I managed to get pixal3d working. The models themselves come out well, but the textures are fucking ass, especially the eyes. Any idea what I could be doing wrong? The example images and videos I've seen look pretty accurate to the source image.
>>
>>108960289
post output? im just curious
>>
cozy monday breas
>>
https://desuarchive.org/g/search/text/mikutroon/
It's all petra, isn't it?
>>
>>108960409
>only 8 pages
dollar store schizo
>>
>>108960409
>a single term in a joke already got a mikutroon panties up in a bunch to search the entire archives for it and do the raped schizo special of accusing all people who ever used that word to be the same person
most mentally sane and not tranny-like mukutroon behaviour
>>
File: 011755CUI_00001_.png (1.5 MB)
They just put cardboard cutouts of ghiblified characters in karate uniforms at the gym. Can't stop the slop.
>>
>>108958647
>>108958666
I haven't used the regular frontend in months. No one should.
>>
File: zit_00003_.png (1.1 MB)
>>108958327
>>108958327
>Discussion and Development of Local Image and Video Models
AND MUSIC!!!
>>
anon?
>>
comfyui actually crashes a lot.
>>
usually you can fix the crashiness, but still...
>>
comfy doesnt crash for me but i fuck up the venv a lot it seems
>>
>>108961134
Yeah, the main reason I have problems is rdna2 just barely works. It is only half supported.
>>
>>108961134
also, tweaking the settings, maybe expandable segments is a bad idea since it's not really supported on my card. idk I had to use it to run songbloom iirc
>>
File: FK9B__00003_.png (1.6 MB)
prompt from dalle thread:
>>108935557
>>
>>108959965
are there no shrimple straightforward workflows for ltx?
i checked some. they all look like "do not waste time with this".
>>
actual new sota model, nai killer
civitai.red/models/840276/
>>
>>108961259
that guy has a lot of neat ltx 2.3 workflows for diff tasks (video extend, custom audio, whatever).

then I have a basic workflow for z image turbo, klein edit, and some other stuff. but most of the time I just mess with LTX 2.3 i2v, klein edit, or zimage if I want to make realistic stuff.
>>
>>108961295
what an interesting account https://civitai.red/models/2266799/heavens-gate-lets-start-a-vaporwave-ufo-cult
>>
File: 1749062829797374.png (2.7 KB)
what is this shit? did civit went woke?
>>
>>108961310
ahhhh it's a rainbow gonna piss and shit all over myself
>>
>>108961313
you're the one who has to wear adult diapers buddy
>>
>>108961320
it's a rainbow icon get the fuck over it for real, it's unintrusive and at least they used a proper rainbow not the transbipoc flag
>>
>>108961310
>did civit went woke?
like two years ago nearly
>>
can we have a yuri thread or is 2girls too high effort for genners here
>>
>>108961332
was it then they banned celeb loras?
>>
>>108961168
>>108961148
>>108961134
looks like my "fixes" have been causing crashes. I was using a model unloader, idk, we'll see, but looks like a vanilla launch is working better, and without that node. again, we'll see lol
>>
>>108961344
It's really dumb, too, because we are one release away from 1shot 3d pose from a/multi photo.
>>
>>108961335
I posted it in the last thread and I got ignored so... RIP
>>
>>108961346
and. nope lmao
>>
File: 1764307906992637.png (3.0 MB)
>>108961335
>can we have a yuri thread
what should the 2girls be doing?
>>
>>108961352
2girls is KEKED
>>
>>108961335
The last faggollage had two bros kissing !
>>
>>108961310
>did the website created by a gay man go woke
hmmm is this a trick question?
>>
2.7 MB
nice to see ldg embracing pride month
>>
jerking it to only lesbian porn and hentai this month in support
>>
File: 1772264355612927.png (199.3 KB)
>>108961421
what ever happened to the non-woke traditional gay men like this?
>>
nofap 1 week world champion.
>>
Was I supposed to laugh at that?
>>
The laugh track is queued, don't worry about laughing.
>>
File: 1771549469468585.png (606.7 KB)
>>108961313
a tale as old as your average npc leftshit (underage)
>"its just a rainbow chuddy, it literally doesnt matter!"
>ok
>makes a client-side mod to remove it
>"REEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
almost like its not just a rainbow but a humiliation ritual those who you view as enemies have to accept or else get censored or banned
>>
mikuroll, used the runeXX video extend workflow:

https://files.catbox.moe/ev4rw4.mp4
>>
File: kiss2.png (2.4 MB)
>>108961335
>>
>>108961620
I'm pretty sure all he did was switch the game to use the Saudi Arabian version of the game for the flags. Wild that you could get banned for that. Meanwhile gays for palestine.
>>
File: 1772338748207604.png (1.8 MB)
>>
>>108961621
Wow it retains the instrumental really well
>>
lmao, it can sorta copy singing style if you extend too:

https://files.catbox.moe/w9wp2h.mp4
>>
>>108961714
>https://files.catbox.moe/w9wp2h.mp4
Neat
>>
getting closer!

https://files.catbox.moe/xt4kkw.mp4
>>
works well if you set the frame load cap just after their dialogue to time it better:

https://files.catbox.moe/prmbca.mp4
>>
>>108961301
i grabbed a few random ones from him to see what is up (including the comfy default) and there are a gazillion nodes.
btw which model version is the best to run, original or kijai one?
>>
ltx director is also fun. the node is like using premiere to add elements in the timeline:

https://files.catbox.moe/0v6ml7.mp4

>>108961922

im using 2.3 distilled fp8, seems fine
>>
>>108961933
better:

https://files.catbox.moe/yrqacs.mp4
>>
>>108961933
watch out spielberg!
>>
>>108961959
well its better than nolan and tranny achilles thats for sure.
>>
heh. miopen hip tuning...
>>
>>108961933
>>108961956
ty.
btw quality of the starfield guy is a bit off.
i saw on leddit star trek tng vids where they sing 90s euro-dance songs. must be higher precision since quality is quite up there (sound is a bit off tho).
>>
>>108961956
less audio glitching than before
>>
The letter... what letter was it I should type to represent the indian?
>>
>>108961305
Neil Breen avatar is killing me
