File: highlights_g_109001708_1780884657_1.jpg (1.1 MB)
Discussion and Development of Local Image, Video, and Music Models
Previous: >>109001708
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>Wan
https://github.com/Wan-Video/Wan2.2
>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
Showing all 124 replies.
>>
>mfw Resource news
06/07/2026
>Ideogram4 GGUF quantized files
https://huggingface.co/leejet/ideogram-4-GGUF
>‘A driver of political violence’: how the breakneck AI boom is fueling anti-tech extremism
https://www.theguardian.com/technology/2026/jun/07/anti-ai-tech-extremism-violence
>Ideogram 4 NF4 integration for Forge Neo with a visual JSON layout builder
https://github.com/Whatwhatio/forge-neo-ideogram4
>Huihui-gemma-4-12B-it-abliterated
https://huggingface.co/huihui-ai/Huihui-gemma-4-12B-it-abliterated
06/06/2026
>HugginFace VFS Plugin: Native Total Commander file system for Hugging Face models
https://github.com/mikinko/HuggingFace_WFX
>ComfyUI Lance AIO: Custom nodes to run Lance-3B
https://github.com/SteveImmanuel/comfyui-lance-aio
>Cube: Generative AI System for 3D
https://github.com/Roblox/cube
>The token bill comes due: Inside the industry scramble to manage AI’s runaway costs
https://techcrunch.com/2026/06/05/the-token-bill-comes-due-inside-the-industry-scramble-to-manage-ais-runaway-costs
06/05/2026
>RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling
https://simon-dcs.github.io/Website-of-RhymeFlow
>Complexity-Balanced Diffusion Splitting
https://noamissachar.github.io/CBS
>Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?
https://github.com/LSU-ATHENA/HPM-Predict
>SAM-Flow: Source-Anchored Masked Flow for Training-Free Image Editing
https://github.com/chwbob/Sam-Flow
>Geometry-Aware Dataset Condensation for Diffusion Model Training
https://github.com/2018cx/GADC
>StoryVideoQA
https://github.com/nercms-mmap/StoryVideoQA
>Lightricks to split into two companies as it cuts 75 jobs
https://www.calcalistech.com/ctechnews/article/r1dgjt5gmg
>Akium Sampler: Custom k-diffusion sampler for Stable Diffusion Forge / A1111
https://github.com/AkiumAI/akium-sampler
>>
File: animaHighres_00040_.png (1.4 MB)
>>109003927
there u go faggot is that what you meant and want? its OK you don't have to fucking hide it...
what you gonna do fucking mald and seethe about it for the next 24 hours?
>>
File: FluxKlein9B_Distilled_Output_12515.png (1.5 MB)
reprompted the whole collage itself as a new image on Klein 9B with Gemini caption lmao, came out better than expected honestly
the prompt is very long so I put it here:
https://pastes.io/aywh1DMS
>>
File: animaHighres_00042_.png (1.3 MB)
>young girl wearing a cute bear suit in front of her computer in her messy bedroom, fast food, coke drink from typical fast food with clear plastic lid, disbelief and depression about what she is seeing on her computer monitor. Night time, dark, blue light from the screen, realistic, photo, high quality
dropped the reference and controlnet just to see how the prompt worked.
not bad.
>>
File: FluxKlein9B_Distilled_Output_25412152.jpg (2.7 MB)
Bro seems to believe that this image was an example of something only Nano Banana could do, impossible to reproduce without Le Ebin Json Ideogram (it is not and was not)
https://reddit.com/r/StableDiffusion/comments/1tzr6ci/an_experiment_recreate_jsonprompted_closed_model/
>>
File: nbp.png (1.6 MB)
>>109004351
>Three anime figures on or near the PC tower
there's four
>"stance": "standing in a slight contrapposto pose",
not really, she's just leaning
>negatives: Any appearance of pink/magenta anywhere
your troony keyboard and pride flag in the background??
not bad, a few more training runs on nano banana and gpt outputs and local should be there by 2027.
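For anyone comparing JSON prompts against plain text prompts, here is a minimal sketch of flattening a nested JSON prompt into one comma-separated string that a text-only model can take. The `"stance"` value and the `negatives` entry are quoted from the post above; the `subject` wrapper key, the `flatten_prompt` helper, and the "path: value" output style are assumptions made up for illustration, not how any particular model or UI parses these.

```python
import json


def flatten_prompt(node, prefix=""):
    """Walk a nested JSON prompt and return a list of 'path: value' phrases."""
    phrases = []
    if isinstance(node, dict):
        for key, value in node.items():
            # Build a readable label from the chain of keys.
            label = f"{prefix} {key}".strip()
            phrases.extend(flatten_prompt(value, label))
    elif isinstance(node, list):
        for item in node:
            phrases.extend(flatten_prompt(item, prefix))
    else:
        # Leaf value: attach the label if there is one.
        phrases.append(f"{prefix}: {node}" if prefix else str(node))
    return phrases


# Hypothetical structure; only the two values are taken from the thread.
json_prompt = json.loads("""
{
  "subject": {"stance": "standing in a slight contrapposto pose"},
  "negatives": ["Any appearance of pink/magenta anywhere"]
}
""")

flat_prompt = ", ".join(flatten_prompt(json_prompt))
print(flat_prompt)
```

Running the same seed with the JSON version and the flattened version is one cheap way to check whether the structure itself changes anything for a given model.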
>>
>>109004502
??? wat? my picrel Klein one is closer overall to the original Gemini pic than his Ideogram one was. His had rando curtains on the wall and four figures on top with none inside, instead of three plus one inside.
>>
>>109004532
the ideogram one is terrible. json prompting is a meme anyway, the only reason it works on nano banana is because their 3 trillion parameter LLM is re-writing the prompt in the background. jeets think it's some kind of 'computer language' that 'better represents how models think' but it's really just slop.
>>
File: animaHighres_00043_.png (1.5 MB)
>>109004502
he really cares about this, its his whole life...
>>
>>109004588
>json prompting
i told gemini to fuck off with that shit, it seems to be leaking more and more into mainstream cloud models, it is fucking garbage. The more they move away from us the more they will self destruct, so it's win win.
>>
>>109004619
I mean I just posted this:
>>109004351
where the ATTACHED pic here on le 4Chinz (made with Klein) was just showing that the Gemini pic (left side of the Reddit thread) was not in any way an example of something that was difficult to gen to begin with. IDK about anything past that lol
>>
>>109004636
oh so it does, i guess they trained this model on some images from gemini, so fucking sue me? its not like those other companies didn't steal everyone else's shit and then charge money to use it...
"its okkay when we do it... "
>>
>>109004619
Four hours ago someone posted a cute 1girl and he's been melting down ever since.
>>109003582
It's been a couple of months since this guy has had this particular brand of sperg out ITT. He used to do it like every other weekend.
>>
>>109004636
because he just i2i'd >>109004502
>>
>>109004728
https://pastebin.com/MMGXzBv0
It's llm slop and too big to post here
>>
Japanese Folk Metal ACEStep XL LoRA. Trained on just 10 songs.
https://vocaroo.com/1hOnOf8ZWn71
https://vocaroo.com/18pRgXxfm3tj
https://vocaroo.com/15AMD9XrQ4Xl
>>
File: Ideogram_00096_.png (1.7 MB)
>>109004783
The latter. My pathetic human attempts ended up looking like shitty romance novel covers.
>>
>>109004890
Wrote a detailed guide here
https://rentry.co/s8fg8ber
But it uses custom scripts alongside Side-Step. By far easiest way is to just use Side-Step's options to caption the dataset, but my script is what I use since I can work around bugs once everything is fetched. The most tedious part is just the data curation, with doubledouble top and the web to curate lyrics, and the help of Gemini to structure them, it should not be too bad (done manually), though I still use a script to reformat my lyrics. For training, I use a cloud GPU (Modal) since it's free $30 in monthly credits, good for a few runs (this one was just $5). I will update the guide later to include a Modal training script. It can be done locally, but since I have a 3090 it would have to be left running overnight to get the 800 epochs. I don't like to use 60 sec chunks on Side-Step because I think it converges better without, so I train with the full songs which takes exponentially more time.
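As a rough sanity check on the cloud budget mentioned above, a minimal sketch of the arithmetic: the $30 monthly credit and the $5 run cost come from the post, while the function name and the assumption that every run costs about the same are illustrative only.

```python
def runs_per_month(monthly_credit_usd, cost_per_run_usd):
    """Number of full training runs that fit inside the monthly free credit."""
    if cost_per_run_usd <= 0:
        raise ValueError("cost per run must be positive")
    return int(monthly_credit_usd // cost_per_run_usd)


monthly_credit_usd = 30.0  # free monthly credit, as stated in the post
cost_per_run_usd = 5.0     # cost of the run described in the post

print(runs_per_month(monthly_credit_usd, cost_per_run_usd))
```

Longer runs (more songs, more epochs, no chunking) would cost more per run, so the real count per month could be lower.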
>>
>>109004996 >>109005000
try https://pastebin.com/xpYezwZp as workflow
>>
>>109004990
>>109005000
it has sd1 face
so it seems all three of these new ones (ernie, animu and ideogram) are fails then
cant beat zimage and klein and even qwen aint that bad
>>
>>109005000
ok, so it can do breasts
https://civitaiarchive.com/models/2679521?modelVersionId=3008701
I'm just not impressed with the weird grainy quality. I'm trying to find a reason to pull the trigger and start downloading the models but I just don't see how it's better than zit or k9 with some realism loras thrown in. Skin and faces look very, very, sloppy
>>
File: ig.jpg (268.0 KB)
>>109005043
good too.
anyhow if the censorship gives you trouble that's the least affected sensible workflow I've seen so far. makes the model usable.
i forgot to add "masterpiece" but it almost is
>>
>>109005072
the 1girls are less pretty than in other models overall but not terrible
>I just don't see how it's better
you can prompt far more characters/objects in defined regions, that's the main thing IMO
>>
File: ig.jpg (305.7 KB)
>>109005152
no. i just recommend trying >>109005019, it works quite well
>>
>>109005072
this is jeetslop
>This workflow uses an uncensored text encoder: Qwen3VL-8B-Uncensored-HauhauCS-Aggressive, plus a latent upscaler before the image sampler. Right now, it works successfully around 30% of the time
bro thinks the text encoder has jack diddly fucking squat to do with anything here (it does not, "uncensored" text encoders are NOT a thing that serves any purpose in the context of image models)
>>
>>109005259
many options. world models
https://github.com/Tencent-Hunyuan/HY-World-2.0
https://github.com/robbyant/lingbot-world
https://over.world/
might be the most direct way sooner or later
or try to use 3d object generation splat models, idk which is currently best, maybe try https://github.com/VAST-AI-Research/TripoSplat https://github.com/IgorAherne/TRELLIS.2-stableprojectorz etc, obviously this is for 3d engines
or use blender/krita/whatever with plugins to work with 3d or 2d textures
it's not what most people here usually do tho.
>>
File: idiotgram.jpg (256.8 KB)
>>109005273
>>109005263
>>109005301
saar you only need to draw more bounding boxes
>>
File: ComfyUI_00747_.png (442.1 KB)
>>109005303
>>
File: ga.jpg (227.6 KB)
>>109005322
see >>109005019
it seems to mostly work for me. it may be that you do have to define some extra boxes. idk, add a /ldg/ logo
>>
Also nvfp4 will "work" on 4000 and 3000 series, but the speed will be ass due to lack of acceleration, similar to how fp8 works on 3000.
But that is also true, and possibly worse for Q quants.
Regardless it's not nvfp4 anyway.
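The point about formats that load but are not accelerated can be sketched as a lookup on CUDA compute capability. The thresholds below are an assumption based on NVIDIA's published architecture generations (FP8 tensor cores from Ada/Hopper, i.e. 8.9 and up; FP4 from Blackwell, i.e. 10.0 and up), not something stated in the thread, and the `accelerated_formats` helper is hypothetical. It says nothing about whether a given runtime will still run the format through a slower fallback path.

```python
def accelerated_formats(compute_capability):
    """Return the low-precision formats with native tensor-core support.

    compute_capability is a (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    Thresholds are assumptions taken from NVIDIA's architecture generations.
    """
    major, minor = compute_capability
    cc = major * 10 + minor
    formats = []
    if cc >= 89:   # Ada (8.9) and Hopper (9.0) onward
        formats.append("fp8")
    if cc >= 100:  # Blackwell (10.x, 12.x) onward
        formats.append("nvfp4")
    return formats


# Example cards: 3000 series (Ampere), 4000 series (Ada), 5000 series (Blackwell)
for name, cc in [("RTX 3090", (8, 6)), ("RTX 4090", (8, 9)), ("RTX 5090", (12, 0))]:
    print(name, accelerated_formats(cc))
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the same kind of `(major, minor)` tuple for the local GPU.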
>>