Thread #108687829
Discussion and Development of Local Image and Video Models

Previous: >>108682974

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
skyler white lookin' thirsty, yo
>>
Blessed thread of frenship
>>
Sorry but I'll still be using Illustrious with Forge classic.
>>
>>108687829
You dropped cute Jenny >>108687122
>>
>mfw Resource news

04/25/2026

>StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
https://kwanyun.github.io/StyleID_page

04/24/2026

>MAI-Image-2
https://playground.microsoft.ai/chat

>ComfyUI-NAG-Extended: NAG support for Flux 2 Klein and Anima
https://github.com/BigStationW/ComfyUI-NAG-Extended

>UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
https://github.com/Zhangyr2022/UniGenDet

>VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
https://github.com/EternalEvan/VARestorer

>Sapiens2
https://github.com/facebookresearch/sapiens2

>Vista4D: Video Reshooting with 4D Point Clouds
https://eyeline-labs.github.io/Vista4D

>Pre-process for segmentation task with nonlinear diffusion filters
https://github.com/cplatero/NonlinearDiffusion

04/23/2026

>ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
https://shelley-golan.github.io/ParetoSlider-webpage

>DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion
https://github.com/Adamlong3/DynamicRad

>Normalizing Flows with Iterative Denoising
https://github.com/apple/ml-itarflow

>LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
https://github.com/inclusionAI/LLaDA2.0-Uni

>Illustrious XL & NoobAI-XL Style Explorer
https://github.com/ThetaCursed/Illustrious-NoobAI-Style-Explorer

>AI Model & ‘MAGA’ Influencer Emily Hart Unmasked as Indian Man
https://www.yahoo.com/news/articles/ai-model-maga-influencer-emily-091027504.html

04/22/2026

>Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
https://github.com/cvims/EMBEDDING-ARITHMETIC

>Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
https://github.com/CompVis/patch-forcing
>>
didn't read
>>
>mfw Research news

04/25/2026

>UniMesh: Unifying 3D Mesh Understanding and Generation
https://aigeeksgroup.github.io/UniMesh

>Culture-Aware Humorous Captioning: Multimodal Humor Generation across Cultural Contexts
https://arxiv.org/abs/2604.18091

>DCMorph: Face Morphing via Dual-Stream Cross-Attention Diffusion
https://arxiv.org/abs/2604.21627

>Geometry-Guided 3D Visual Token Pruning for Video-Language Models
https://arxiv.org/abs/2604.18260

>TeMuDance: Contrastive Alignment-Based Textual Control for Music-Driven Dance Generation
https://arxiv.org/abs/2604.17005

>RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models
https://arxiv.org/abs/2604.19321

>TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models
https://arxiv.org/abs/2604.15756

>Context Unrolling in Omni Models
https://arxiv.org/abs/2604.21921

>GS-STVSR: Ultra-Efficient Continuous Spatio-Temporal Video Super-Resolution via 2D Gaussian Splatting
https://arxiv.org/abs/2604.18047

>S2H-DPO: Hardness-Aware Preference Optimization for Vision-Language Models
https://arxiv.org/abs/2604.18512

>Concept-wise Attention for Fine-grained Concept Bottleneck Models
https://arxiv.org/abs/2604.15748

>Dispatch-Aware Ragged Attention for Pruned Vision Transformers
https://arxiv.org/abs/2604.15408

>More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage
https://arxiv.org/abs/2604.17354

>Towards Joint Quantization and Token Pruning of Vision-Language Models
https://arxiv.org/abs/2604.17320

>Inductive Convolution Nuclear Norm Minimization for Tensor Completion with Arbitrary Sampling
https://arxiv.org/abs/2604.17001

>MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment
https://arxiv.org/abs/2604.21326

>VCE: A zero-cost hallucination mitigation method of LVLMs via visual contrastive editing
https://arxiv.org/abs/2604.19412
>>
Why is /ldg/ the most raided general on this site?
>>
>>108684432
Using Matchering 2 now (for the ref I used a real 320 KBPS symphonic metal song), proven SOTA solution by a real experiment. Note I also had changed ACEStep cpp's output to 320K as opposed to default 128K, maybe that's why it seemed to have a bit more noise on the voice before, but after mastering that is gone.

The result I got is so good that I will be exclusively using Matchering from now on.

youtube.com/watch?t=1061&v=wZRV2H4PK0Q&feature=youtu.be

https://github.com/sergree/matchering

https://vocaroo.com/1b0F41rAgXqR

This result is far superior to what I got from Web Audio Mastering on Github. Vocals could use a bit more enhancement, but overall quality is very good.

In terms of overall sound quality, I'd say this takes it from 40-50% as good as Udio to 90-95% if not already there.

Here's another example, plastic love used to autotune the city pop song.

https://vocaroo.com/1eFY2mmb1LJ1

>Udio quality is now possible locally with a few clicks and good prompts

Local is so back.
>>
>>108687947
Because /aicg/ is already raided to death.
>>
>>108687982
that pic is bad
>>
Hello,
Has anything come out that's objectively better than 2024 december Noob in every aspect or are we still in the partial and opinionated improvements phase?
>>
>>108687997
all sidegrades/experiments
>>
>>108687997
>objectively
How do you measure "better"?
>>
>>108687997
anima is heading in that direction. we're only on preview 3. its prompt adherence is already better than noob
>>
>>108688013
>How do you measure "better"?
Subjectively
>>
anons still talking about noob didnt even use it in dec of 2024 LOL everyone who was using it then is on Anima now
>>
Is there a way to natively search the gens in a directory by the metadata/tags within the file or do I need a program for that?
>>
>>108688034
this unironically, anima wipes noob/illust like how noob/illust wiped pony
>>
>>108687982
https://vocaroo.com/1eFY2mmb1LJ1
>Udio quality
>>
it's going to be an exciting and sad day when anima final/1.0 comes out.
>>
>>108688060
grep
>>
>>108688069
Maybe listen to 1980s Jap city pop recordings. They're not the best in quality, if Udio is significantly better than that it's not as accurate.
>>
>>108688084
disk usage goes brr
>>
>>108688095
>disk
nigga what year is it did I time travel
>>
>>108688081
just pretend like v3 is the final version and youll either be content if it really is or surprised if its not
>>
This shit's pretty useful for coming up with character designs desu
>>
>>108688081
>it's going to be an exciting and sad day when anima final/1.0 comes out.
why sad? I can't wait to see the final version
>>
>>108688105
it's okay, the sadness will quickly fade away
>>108688113
every lora trained right now will likely be useless and unusable. hopefully civit fixes this with a separate category or something
>>
>>108688113
one person will be sad and will make it the whole general's problem
>>
> >108687935
> >108687946
fuck off
>>
>>108688131
Pussy.
>>108687935
>>108687946
>>
>>108688092
Kek, found an Udio city pop example.
>Better than its worse gens

Sounds like it played from a cheap radio. Local can't stop winning
>>
christ, do filters on civit even fucking work? ive added every variation of shitted and ntr to the list and they still show up.
>>
>>108688154
https://www.udio.com/songs/matHShm1PPhpnhoaaoMMSL
>>
>>108688062
its a bit more nuanced than that you have to understand how models progress i.e. noob is a finetune of illust which is a finetune of kohaku which is a finetune of SDXL
one of the only things oldfags want in anima is e621 but even without that its pros are strong enough that theres little desire to go back to older gen models
>>
absolute meltdown
>>
File: 31405676.png (186 KB)
I see
>>
>>108688164
>but even without that its pros are strong enough that theres little desire to go back to older gen models
im not trying to troll here, what does anima do that i cant do with noob/illu outside of text? i haven't bothered because every anima gen i see is some basic 1girl standing. i dont need to increase gen times for no reason.

if you have an anima gen that can enlighten me, post it.
>>
>>108688164
useless cringe "uhm ackshually" post
>>
>>108688172
>trolling outside of /b/
>>
>>108688164
>one of the only things oldfags want in anima is e621
erm, nope, not at all
>>
>>108688084
>grep 1girl, standing
>computer freezes
>>
>>108688194
>outside of text?
spatial prompting and a higher channel VAE
if none of those things are exciting to you then i dont know what to say
>if you have an anima gen that can enlighten me
you have to lurk
>>108688214
notice how i left room for a few other things
>>
>>108688164
>noob is a finetune of illust which is a finetune of kohaku which is a finetune of SDXL
Since Anima is a finetune of Cosmos we're only at the Kohaku stage right now. And who the fuck actually used Kohaku when it was new? kek Anima already blows it out of the water.
>>
>>108688194
>>
>>108688218
took 12 seconds
>>
>>108688060
ask an llm to give you a one liner for it its that easy
>>108688242
nb4 muh regional promptan can do that
>>
>>108688227
>spatial prompting and a higher channel VAE
so its just the gens i keep seeing that are too basic to showcase it
>you have to lurk
like i said, i wasnt trolling so no need to tell me to lurk. i was just curious since you seem to really like it. ill just stick to noob for now.
>>
>>108688227
>notice how i left room for a few other things
sneaky
>>
>>108688172
And schizos called us shills for saying api nodes are local. This is a simple way to turn any model into local through comfyui
>>
>>108688242
so anima is good if you want to see one character humiliate other characters like dogs? what about male pov, pov hands enjoyers?
>>
>>108688264
just look at the toss comic in OP
no need for controlnets, inpainting, or attention masking. and its not even a super polished gen
try to do that with sdxl
>>
>>108688280
https://files.catbox.moe/u7mduo.png
>>
>>108688242
honestly teto deserves this
she is shit as a waifu
>>
>>108687614

It was all one prompt. Tbh I don't think how I formatted it really matters that much. It was all over the place: speech, actions and details were all going in the wrong panels. Also, aspect ratio was very important.

@stonetoss, A three panel comic in a horizontal layout featuring Jill Valentine and a man with purple hair reading from left to right.

In the first panel Jill Valentine is standing on a sidewalk wearing her iconic blue tube top and black knee length pencil skirt with some other people. Several zombies are attacking people near her and she's saying "Holy shit!".

In the second panel Jill Valentine is shooting her gun at a zombie and blood is splattering all over the zombie. With no speech. THe word "BLAM!" is visible in the panel.

In the third panel Jill Valentine is holding her gun and looking in confusion at a man with purple hair. The main with purple hair is yelling at Jill Valentine with his arms raised saying "Wait! He might be a doctor or an engineer!"
>>
>>108688060
find Documents -type f -name '*.png' -exec identify -verbose {} \; | grep -i anime
>>
>>108688321
>no tuft of pubes peeking
Meh.
>>
>>108688297
I see, pretty good for comics and shitposting then. My gens aren't really complicated or multi-faceted so ill just stick to my lane then.
>>
>>108688338
one thing ive learned is that for longer text phrases it works better if you put them in all caps
>>
>>108688349
Why would you pretend to be a different anon?
>>
>>108688321
this kind of looks like sh-
>@fiz-rot, @wamudraws, @krekkov
i see, i see. that explains it.

why is the red haired girl always getting bullied in these images? what did she do?
>>
>>108688359
Which anon am I pretending to be and how many voices are currently fighting inside of your head?
>>
File: exbff6.png (1.3 MB)
>>
>>108688385
No need to get upset, anon.
>>
>>108688339
that looks really inefficient
>>
>>108688280
Your unwillingness or inability to abstract from what is directly presented to general statements about a model's capabilities is very interesting to me. I wish I could see what your internal monologue is like
>>
>>108688339
>identify -verbose
That decodes the image before outputting anything. May as well just grep.
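If you'd rather not decode anything or grep raw bytes, here's a rough sketch that walks the PNG chunk list and reads only the text chunks, seeking past the pixel data. It assumes your UI writes its prompt/parameters into standard tEXt/zTXt/iTXt chunks (as far as I know ComfyUI and the Forge family do, but check your own files). `read_png_text` and `find_pngs` are names made up for this example.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = {b"tEXt", b"zTXt", b"iTXt"}


def decode_text_chunk(ctype, data):
    """Turn one text chunk payload into 'keyword: text' ('' if it is malformed)."""
    keyword, _, rest = data.partition(b"\x00")
    try:
        if ctype == b"tEXt":
            text = rest.decode("latin-1")
        elif ctype == b"zTXt":
            # first byte is the compression method, the rest is a zlib stream
            text = zlib.decompress(rest[1:]).decode("latin-1")
        else:  # iTXt: flag, method, language tag, translated keyword, text
            compressed = rest[0] == 1
            _lang, _, tail = rest[2:].partition(b"\x00")
            _translated, _, body = tail.partition(b"\x00")
            if compressed:
                body = zlib.decompress(body)
            text = body.decode("utf-8", errors="replace")
    except (zlib.error, IndexError):
        return ""
    return f"{keyword.decode('latin-1')}: {text}"


def read_png_text(path):
    """Return all text-chunk contents of a PNG without touching pixel data."""
    parts = []
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            return ""
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            length, ctype = struct.unpack(">I4s", header)
            if ctype in TEXT_CHUNKS:
                parts.append(decode_text_chunk(ctype, f.read(length)))
                f.seek(4, 1)  # skip the CRC
            elif ctype == b"IEND":
                break
            else:
                f.seek(length + 4, 1)  # skip data + CRC unread
    return "\n".join(parts)


def find_pngs(root, needle):
    """List PNGs under root whose embedded text contains needle (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        p for p in Path(root).rglob("*.png")
        if needle in read_png_text(p).lower()
    )
```

Usage would be something like `for p in find_pngs("Documents", "1girl"): print(p)`. Since IDAT is skipped with a seek, cost should scale with file count, not image size.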
>>
>>108688194
>what does anima do that i cant do with noob/illu outside of text?
If you apply some reasoning, you can deduce that a higher level of intelligence is required for not only generating text but applying it (layout) in such a way as to be convincingly part and parcel of the output.

Your question is essentially "Why would I need to use a smarter model?"
>>
>>108688407
>I wish I could see what your internal monologue is like
i'll tell you, it goes
>1girl, standing
that's all i need inside this dome of mine, i wonder what yours looks like...
>>
>>108688172
top kek
>>
>>108688448
99% of people dont need a smarter model. look at ltx or anima gens, most of the prompted dialogue is the most uninspired shit ever or just regurgitated memes.
>>
>>108688506
You fail to understand that intelligence helps with both "difficult" and "easy" tasks.
>>
>>108687935
>https://github.com/EternalEvan/VARestorer
This would be far more interesting if 1. it was better at restoration, doesn't have to be NBP good, but still, better 2. it could handle video.
>>
File: kek.png (751.4 KB)
>>108688506
>99% of people dont need a smarter model.
let me guess, you consider yourself to be part of the elite 1% is that right?
>>
>>108688516
im not failing to understand your point in the slightest, im just saying it is pointless since every gen here and civit does not require it outside of professional work or if you inpaint a lot.
>>
>>108688506
you realize the model's lack of capabilities is a huge part of this right? No one can prompt for the color of every nail in the fingers individually so no one will prompt for that, the prompts are dumb because yes people are dumb but also the model is dumb so its not like people could've done otherwise
>>
>>108688534
Thinking you don't need to be smarter because you only do dumb things is something an idiot would think
>>
>>108688528
>let me guess, you consider yourself to be part of the elite 1% is that right?
no, im a "male pov, pov hands, 1girl, hands on another's neck, strangling, shaded face, foaming at the mouth" enjoyer. i am an animal.
>>
Don't think. Click.
>>
>>108688547
can anima do that prompt?
>>
>>108688559
most likely since illustrious could do it. guess ill give it a shot in a minute or two
>>
>>108688566
just be sure to not sperg out when it doesnt work the very first time and you have to learn how to prompt an entirely new model arc
>>
>>108688546
post the smartest gen you have.
>>
>>108688580
you wouldnt understand it
>>
>>108688615
is it the fabled...2girls, sitting?
>>
can anima do 3girls? 4girls? how many girls is too many?
>>
>>108688668
illu can do +6girls.

how many girls is too many?
retarded question, let the whole screen be full of girls.
>>
File: doe9o6.png (1.3 MB)
>>
>>108688681
woah
>>
>>108688559
https://files.catbox.moe/ec7pic.png
>>
>>108688696
messed up the legs. should have gimped and did a low denoise img2img.
>>
>>108688158
>christ, do filters on civit even fucking work? ive added every variation of shitted and ntr to the list and they still show up.
Nothing works or barely works on the site, just block people if their gens bother you that much.
>>
>>108688828
>just block people if their gens bother you that much.
nta but this is the real solution. if I see a shitter I block them
>>
>>108687851
catbox? very cute
>>
>>108687845
thats mulder you retard
>>
>>108688867
wrong, it's nobody. not even real.
>>
>>108688828
>>108688852
fair enough, ill just block them then. i hope civit doesnt have a blocklist limit
>>
>>108688896
dont block sarah peterson
>>
>>108688904
>it's all blacked porn from a tranny
instantly blocking that retard
>>
Has furk the turk gotten off his ass and started grifting with modern models or is he still rambling on about da jooz
>>
>>108688904
that was the first motherfucker i blocked. shit kept appearing on the newest section and i had enough, fuck that guy.
>>
>>108688506
smarter models make more interesting images based on those uninspired prompts tho
>>
>>108688904
>>108688913
>>108688920
What makes me mad is that weight slider loras are banned on sight if they imply anorexia on civit but these shittedfaggots get to post/spam their dogshit with no issues.
>>
I did the pip thing and restarted but it still didn't work, what should I do to install sage for comfy?
>>
>windows
>>
>>108689047
you need to go into the python_embeded folder and open the cmd there. Then run python.exe -m pip install sageattention. Sounds like you accidentally dumped it into system python
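Quick way to check which Python actually got the package: save something like this as a script and run it with the same interpreter ComfyUI uses (for the portable build that would be the python.exe in that folder). `package_report` is a made-up helper name and this is only a sketch; it tells you where the package is visible, not whether it was compiled correctly.

```python
import importlib.util
import sys


def package_report(name="sageattention"):
    """Report which interpreter is running and where (if anywhere) it finds `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "package": name,
        "installed": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(package_report())
```

If "interpreter" points at your system Python instead of the ComfyUI one, or "installed" is False when run from ComfyUI's Python, the pip install landed in the wrong place.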
>>
>>108689047
>doing computer stuff manually without an llm
ngmi
>>
>>108688015
Shame that Anima keeps forgetting. It will be a very good and stable model, but with half of the concepts forgotten.
Partial improvement leads to partial forgetting (as has been happening until now).
True improvement or grand improvement leads to true forgetting or grand forgetting.
>>
baker should replace Z down to Illustrious with links to Pony V7 and Ernie and nothing else and refuse to elaborate further
the mald would be topkek
>>
>>108688194
put shit in like, specific places in the image, for one. or build vaguely complex scenes with multiple characters. and so on
>>
>>108689100
>forgetting
LR status?
>>
can someone explain this forgetting thing and give an example? I want to test with my loras
>>
>>108689160
It's something that only happens if you fry the model. Training with normal params is fine.
>>
>>108689071
thanks I'll try that, should I figure out how to uninstall the system python one?
>>
>>108689174
ah makes sense, I suspected it was the case, but wanted to make sure, ig if it really was that big a problem more people would talk about it
>>
you guys actually moved on from the sdxl architecture and use anima for gens now or is it just shilling?
>>
>>108687947
Because this general was made as a joke and it was always a troll's playground.
>>
>>108689047
sageattn_2 (and 3) requires it to be compiled in order to work.
Do you have a 4000 or 5000 series gpu?
If not, you can't run it either way. Try sageattention 1 (16bit) for 2000, 3000 series.
>>
>>108689226
anima still misses on some points, its not fully trained, but it has its uses, cant really replace sdxl with it but I can make stuff in it that I can't in sdxl (and vice-versa desu)
>>
>>108689186
Just don't train the llm_adapter (You shouldn't need to anyway for overwhelming majority of loras) and it won't forget shit.
>>
>>108689226
I legit haven't touched SDXL since anima released.
Sure it's not perfect but it mogs anything sdxl brutally.
>>
>>108689261
i dont think sdxl will ever be replaced at this point but yeah, having the option to switch between models and architectures as needed is always nice.
>>
>>108689226
i stopped using XL since neta desu unironically
>>
>i dont think sd1.5 will ever be replaced at this point
kek
>>
>>108689280
>I legit haven't touched SDXL since anima released.
what do you gen that made you drop sdxl like a sack of bricks?
>>
>>108689282
kek, hope it's a bait
>>
The VAE on its own is a huge reason to switch.
>but I can just upscale illust
You are not upscaling it to the point where that'd actually be the case.
>>
>>108689303
i havent relied on shitmixes and jeetmerges since 1.5 so no it was not b8
>>
>>108689295
baited exactly for this response and hoped someone would see the simile. thanks.
>>
>>108689303
>hope it's a bait
It's a master bait.
>>
>>108689342
non-sequitur
>>
File: rr2rul.png (1.5 MB)
>>
I haven't tried anima because some stupid idiot here made me try neta lumina after non-stop gargling its balls and it fucking SUCKED so i assume its the same with anima.
>>
I am a giant faggot please reply to this post.
>>
>>108689376
Anima is much better than those other stuff, I've also tried all of these alt models and they all sucked, anima is actually usable
>>
>>108689260
I have a 4000 card

>>108689376
Anima is good.
>>
>>108689376
neta was shilled by sub-90 iq retards who think that newer automatically means good
>>
>>108689297
I like genning with different styles and anima's amazing out of the box style accuracy without needing to sift through garbage in civit to find a decent lora or train my own.
I can make memes and shitposts with text
Thanks to qwen vae I can train loras for detailed styles SDXL can never learn properly
I am also fucking fed with SDXL at this point, I know all quirks of pony, illustrious and noob. I needed something new with decent enough quality and anima was that.
>>
Trying to recreate Lilisen in anima.
>>
>>108689399
Activate your venv. (Probably source venv/bin/activate)
pip uninstall sageattn_2
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
pip install --no-cache-dir -e . (Maybe it needs --no-build-isolation I don't remember try that if it fails)
Proceed to launch your comfy as normal
>>
>>108689482
It should be pip uninstall sageattention I think my bad.
>>
The only way local will improve is if labs stop catering to VRAMlets. Anima should be double its current size.
>>
>>108689508
vramlets are the vast majority of aislop consumers. 99% of people are not interested in spending 5k dollars on computer-hallucinated pornography
>>
>>108689508
>>
>>108689508
localkeks are poorfags. they larp as if they own a fleet of h100s but models like hunyuan 3 prove they're all just 3060 vramlet turdworlders. they can't even run flux 2
>>
>>108689457
Could you catbox an anima gen for me to copy your homework as a baseline? I'd like to try it but not sure if I should go with the waianima or main one?
>>
>>108689071
tried this and it also didn't work, hmm.

>>108689482
>>108689499
I'll try this next, thanks anon
>>
>falling for the scale at all costs b8
>again
>>
>>108689529
didn't know comfy was this based, I kneel
>>
based comfy for representing all of india. thank you china for selling out to comfy api
>>
So as soon as Cumfar got sold to investors, the API spam started here?
How predictable.
>>
here we go again
>>
>>108689462
Just train a nihei lora
>>
>>108689537
https://github.com/woct0rdho/SageAttention/releases/tag/v2.2.0-windows.post4
>>
>>108689617
Nihei's in the dataset. Lilisen's not (and I'm lazy)
>>
>>108689628
Nihei being there doesn't mean Tower Dungeon is there as well.
>>
>>108689534
Sure. Make sure to read the model card on huggingface also.
https://litter.catbox.moe/u1tj8s69genll5q1.png
>>
>>108689639
thanks mate
>>
>>108687982
>https://github.com/sergree/matchering
Looks interesting, thanks for this.
>>
>>108689635
Could you even make a lora from the manga pages? There's not a lot of colored Lilisen art
>>
>>108689628
>and I'm lazy
it cant be that hard to collect 20 images of your waifu and go for a jog while it trains, bitch.

makes me want to train one and post the gens here and not share the lora with you.
>>
>>108689659
Never said she was my waifu. Also I was under the impression a lora required more images than that. Never made one before though.
>>
>>108689462
prompt?
>>
File: 70.png (1.4 MB)
>>108689508
true, this is why HunyuanImage-3.0 took off locally, they just cared about prompt understanding and image quality
>>
>>108689697
best quality, score_8, score_9, highres, safe, @nihei tsutomu

1girl, solo, blonde hair, medium hair, red eyes, black dress, juliet sleeves, puffy sleves, staff, holding staff, earrings, pendant, jewelry, boots, lace-up boots, jewel earring, jewel pendant, necklace, turtleneck dress, ruins, serious, bangs, pov

A high quality fantasy illustration of a girl standing in the ruins of an old castle. She is holding a staff in one hand and her other hand is on her hip. She is looking at the viewer impatiently. The scene is cozy and whimsical and invokes a sense of nostalgia. The sky is blue and has fluffy clouds. A massive tower is visible in the distance.


Might just be placebo but I've found describing the feel of the scene improves the quality.
>>
>>108689733
sovl
>>
>>108689733
I wouldn't use arbitrary line breaks in the tag part of the prompt.
Not a fan of the pony scores either.
But hey if it works out for you.
>>
fun fact: Ernie Turbo is so heavily trained on NB Pro that Hive will return 100% Gemini 3 for images generated with it, and the actual SynthID checker will also come back true for them
>>
>>108689861
does any trainer even support newline characters? I know for a fact that at least last time I checked, Kohya outright drops everything after ANY newline character it hits, like that part of caption won't be trained at all
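If your trainer really does cut captions at the first newline (that's from memory, not verified against current Kohya), the safe move is to flatten the caption files before training. Rough sketch below; `flatten_caption` and `flatten_caption_files` are hypothetical helper names, and it assumes one UTF-8 .txt caption per image.

```python
import re
from pathlib import Path


def flatten_caption(text):
    """Collapse newlines and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def flatten_caption_files(folder, pattern="*.txt"):
    """Rewrite caption files in place as one line each; return how many changed."""
    changed = 0
    for path in Path(folder).glob(pattern):
        original = path.read_text(encoding="utf-8")
        flat = flatten_caption(original)
        if flat != original:
            path.write_text(flat, encoding="utf-8")
            changed += 1
    return changed
```

It overwrites files in place, so run it on a copy of the dataset first.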
>>
>>108689861
The line breaks are more for organization purposes. Do they affect the quality?
>scores
These are left over from someone else's prompt. No idea if they actually do anything.
>>
>>108689726
What's the minimum specs for such a big model?
>>
>>108689917
at least 170 gb of vram if you go for bf16, and something like 90 gb if you go for Q8
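Those numbers roughly fall out of napkin math. This assumes the ~80B total parameter count listed for HunyuanImage-3.0 (not stated in this thread) and ~8.5 bits per weight for a GGUF-style Q8 once block scales are counted; `weight_gb` is just an illustrative helper.

```python
def weight_gb(params_billion, bits_per_weight):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


for name, bits in [("bf16", 16), ("Q8, ~8.5 bits with scales", 8.5)]:
    print(f"{name}: {weight_gb(80, bits):.0f} GB for weights only")
```

That covers weights only. Activations, the text encoder and the VAE come on top, which is where the extra headroom in the quoted figures would come from.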
>>
>>108689891
Tried without the scores
>>
>>108689947
looks better, huh
>>
diffusing WHAT I CALL MUSIC
>>
>>108689889
Damn I didn't know that. I should edit my captioning prompt and explicitly ask for a single paragraph without any newlines there. (I don't think I ever got a multi-paragraph caption, but still)
>>108689891
>The line breaks are more for organization purposes
Well that's rather self-evident, I used to do that too.
>Do they affect the quality?
Probably not by a huge margin, but it's not the way the model was trained. It should be sub-optimal even if it doesn't matter that much overall.
>No idea if they actually do anything.
It tells the model to draw images according to nonsensical "quality" biases taught to it by some brony weirdo.
https://desuarchive.org/g/thread/108404735/#108406271
>>
>>108689941
It's crazy, what can run that? rtx 6000 is "just" 48gb...
>>
>>108689961
wow this is going to destroy anima
>>
>>108689974
>It's crazy, what can run that?
I have no idea dude, and the worst part is that this model doesn't produce good images, Tencent is legit a retarded company
>>
There's a really simple guy here who doesn't know what Ace Step is. Well, that's ok, they let him sleep on the streets.
>>
why on earth would you use windows for any of this
>>
klein makes my black women white. sad
>>
>>108689860
>>108689947
You need to look through many prompts at many seeds to see the actual effect. You are just looking at the luck of the draw noise here.
Anyway use masterpiece, best quality, newest, recent, highres, absurdres, etc. when not using scoreslop. These at least correspond to tangible real world data.
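A minimal sketch of what that comparison could look like as a job list: every prompt gets run at every seed under both tag prefixes, so each A/B pair differs only in the prefix. `ab_jobs` is a made-up helper and it only builds the list; feeding the jobs to your UI or API of choice is left out.

```python
from itertools import product


def ab_jobs(base_prompts, prefix_a, prefix_b, seeds):
    """Pair every prompt with every seed under both prefixes (same seed for A and B)."""
    jobs = []
    for prompt, seed in product(base_prompts, seeds):
        for label, prefix in (("A", prefix_a), ("B", prefix_b)):
            jobs.append({
                "variant": label,
                "seed": seed,
                "prompt": f"{prefix}, {prompt}",
            })
    return jobs


jobs = ab_jobs(
    base_prompts=["1girl, solo, ruins, holding staff", "2girls, sitting, cafe"],
    prefix_a="score_9, score_8",
    prefix_b="masterpiece, best quality, newest",
    seeds=[1, 2, 3],
)
print(len(jobs), "gens queued")
```

Then compare A vs B side by side per seed instead of judging from a single pair.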
>>
>>108690034
for edit? how are you prompting
>>
>>108689974
>>108689987
even if no one could actually run the model, comfy still should have implemented it out of pure respect for tencent, who took the time to contribute something to the local ecosystem
this marked the beginning of the end for comfy
>>
>>108690030
kys freetard
>>
>>108690034
What sucks is no model, including commercial ones, lets you describe a face and successfully get a full gamut of faces by description.

I hate it, it's so painful.

This is a face that can't come out of any current model:
>>108689961
I mean, yeah, it got the eyes wrong, but I am referring to the appearance. head shape, the fact the mouth and nose aren't really "modern" versions. She's probably the prettiest girl in her apartment building kind of thing.
>>
>>108690048
who cares about tencent? we respect bytedance and alibaba, companies who actually push this tech forward even if it means going API
>>
>>108690048
Why doesn't comfy have dcw (split wavelet thing)? Apparently it's a huge innovation.
>>
>>108690043
ye. i just prompt "realistic" on my sdxl women and they become white quite often
>>
>>108690064
In most models if you say someone has a round head it thinks that the person must be asian. It's very annoying.

round heads make up something like 2% of heads maybe in the white population, but they're not THAT rare.
>>
>>108690048
>even if no one could actually run the model, comfy still should have implemented it out of pure respect for tencent
why didn't you implement it on your wrapper anifart? do you not respect tencent?
>>
>>108690048
Just do the work yourself and open a PR if you care so much. He would merge it.
The fact that no one did that, and that there isn't any evidence whatsoever anyone anywhere is using this model, proves that Comfy was right about not implementing it.
>>
Are "hunched over or bad posture" the right tags for getting someone to slightly hunch forward in SDXL? They keep arching backwards or bending over fully.

I tried adding arched back and bent over to the neg but it's like the act of slightly being hunched over or having bad posture is impossible.
>>
so like I do a thing now where I gen based on description. Like I try to memorize facts about how a person looks.

black hair, then I note the shirt type for example and find out the name etc.

anyway, I get it all together and gen it. It's kind of funny to me. It's like photography by description.

This is more useful than I thought.

>oversize crop t-shirt

hey, that's what it was.

Kind of fun, useful for regular gens after that.
>>
File: cloudai.png (3.2 MB)
>>
>>108690064
so you're not doing it right at all, then
>>
>>108690084
what in the fuck is that KEK
>>
>>108690080
One issue you will have pretty quickly is that eg you prompt a character. But um. like there may be nothing in the dataset showing the character from the top of the head.

This is a big problem, in fact.
>>
File: 16982.jpg (1005.7 KB)
>>
>>108690048
>out of pure respect for tencent
how far down your throat does the boot go? do you even have a gag reflex anymore?
>>
>>108690084
2-channel vae ahhhh image
>>
>>108690088
i have no idea what i'm doing but it often looks good. if there are secret tips i'm all ears
>>
the localkek bootlickers are crazy, worshipping chinese scraps. just use api and stop worshipping corporations
>>
>>108690048
bro your shitty ass wrapper doesnt support it either
>>
>>108687829

which model can i use to make porn?
>>
>>108690105
what makes you think anybody here exclusively uses local models?
>>
>>108690097
what do you mean 2 channel vae
>>
>>108690117
sdxl finetunes (illustrious/noob derivatives), anima, chroma

softporn and SOME porn with a lora - pretty much all the other models, too
>>
>>108690117
>>108690133
You should be prepared to see some body horror if you are going with chroma or SFW model with loras route though.
Anima and noob are stable for drawn porn, but there is no good way to gen realistic porn reliably.
>>
>>108690103
well for Klein in general, for example, I wanted to recreate an image made by GPT Clouposter here yesterday. However, Klein doesn't know Basedjak out of the box, so what I did was pass the image on the left here in as an edit reference with a prompt of:

A retro video game scene is captured as it appears on a vintage Sony Trinitron CRT television screen. In the center of a pastel-colored digital nursery, a large, wailing character modeled after the iconic white colored red eyes crying Basedjak meme character who is literally the exact same character from cartoon image 1, wearing a light blue baby onesie and a bib labeled "LOCALKEK" sits on a pink and white checkered floor. Three small, chibi-style anime girls with large fox ears and tails are running toward him, each struggling to carry a large, colorful toy block. The pink block is inscribed with the text "1GIRL" the green block reads "BLURRY ANALOG REALISM LORA" and the blue block displays "CHROMA EPOCH 74". The game’s interface features a bold "OBJECTIVE: STOP LOCALKEK FROM CRYING!" message at the top, while the bottom HUD shows a "TIME LEFT" of 01:28, a "SCORE" of 000420, and a life counter represented by a heart icon. The scene is rendered in a clean, low-poly 3D aesthetic typical of the early 2000s console era. The CRT monitor itself is highly detailed, featuring visible horizontal scanlines, a distinct curvature to the thick glass, and subtle color fringing around the high-contrast text. The lighting within the game is flat and soft, characteristic of artificial indoor environments, creating gentle drop shadows beneath the characters. The monitor's screen emits a soft electronic bloom that reflects off the dark plastic bezel of the television set, giving the entire image a nostalgic, physical presence. Style: Retro 3D video game photography on a CRT display. Mood: Surreal, ironic, and nostalgic.

which gave the image on the right.
>>
>>108690121
>exclusively uses local models
we used to, but then we found an epic hack that lets us diffuse api locally >>108688172
>>
>>108690150
(samefag, other comment exceeded limit lol)
So TLDR it helps to refer to any edit inputs like "image 1", "image 2", directly prefixed by sort of a description of their style. And keep in mind that you can sort of combine "editing" with "text-to-image" as I did here any way you want, like the output resolution doesn't have to be in any way similar to the edit input resolutions. And also keep in mind telling it to specifically preserve specific things does work and will sometimes be the only way to get exactly what you want (this even works for penises and breasts and nipples, like "while keeping the blah blah blah exactly the same as it is")
>>
File: 00023071.jpg (71 KB)
>>108690080
Dunno, although this posture is quite popular I wouldn't be surprised if there was a tag for it.
>>
>>108690150
oh also this said uh, the S word originally before the jak lol, I forgot 4chin changes that automatically to Based for whatever reason
>>
Need some advice bros. I got sick of using online tools and I have a decent GPU so I downloaded ComfyUI and the files required for Z-Image Turbo.

It's working well but are there any additional "datasets" (I don't know the correct term) I can download and add to the workflow specifically for generating smut? I'm very new to local generation
>>
>>108690080
>>108690165
https://danbooru.donmai.us/wiki_pages/slouching?z=2
Also, use anima you bakas.
>>
QWEN IMAGE 2 LAUNCHING!!!
I made sure to load up on ComfyCredits, this will be a big one
>>
>>108690130
i just meant the image looks like patchy dithery dogshit
>>
>>108690160
I'll repeat the question in case it was unclear; what makes you think anybody here exclusively uses local models?
>>
>>108690176
this ranking is increasingly less believable. Like who the fuck has EVER even knowingly seen an image generated by Mai Image lol.
>>
>>108690176
fuck you alibaba I hate you so much, where's Z-image edit??
>>
>>108690209
why do you even want it? it's guaranteed to have Z Base or worse aesthetics, it's not gonna look like Z Turbo. And it'll be as slow as Base.
>>
do you guys really write walls of text to prompt?
>>
>>108690176
>>
>w-we don't want the models anyway
true, just use api nodes
>>
>>108690218
>why do you want an edit model that's likely better and less slopped than klein
jeez anon, that's a good question
>>
>>108690220
I type
>>
>>108690220
I don't write them by hand really but I sometimes prompt with them if I'm like, recreating something somebody else made, based on a longboi caption of the original image grabbed from Gemini 3 or whatever
>>
>>108690188
does anima know stonetoss straight-up? lmao
>>
>>108690174
>try slouching
>nothing
>try (((slouching:1.4)))
>get this
my waiillustriousv14 must be faulty or its not a real tag
>>
>>108690145
sure. various concepts result in body horrors, only some more or less just work.

depending on what you wanted, you can't get it at all without lora even on the NSFW trained models, we have nothing that is anywhere close to being "all" NSFW.
>>
>>108690251
>waiillustriousv14
You're using a shitmix made by retards for retards. Use anima. https://civitai.com/models/2458426?modelVersionId=2836417
>>
File: qI4JSPC.jpg (189.5 KB)
>>108690227
you still did not explain how something that is as slow as Z Base and definitely will look the same or worse than Z Base (not Z Turbo) is desirable.

Reminder that IRL, actual fully stock Z Image Turbo looks like picrel, also, which is to say it's really not actually that detailed or realistic and has ridiculously strong sameface no matter what ethnicity you prompt for.
>>
>>108688095
Why is this a problem? You need to load the files in order to search its contents. There's no way around that.
>>
>>108690265
>wishful thinking
we'll see about its quality if it gets released, hard to speculate over something that doesn't even exist yet
>>
>>108690277
??? they've been clear about what it is in their chart forever, it won't have the RL training / step distillation of Turbo
>>
>>108690277
i'd much rather they just do a Z 2.0 that uses F2 vae and something that isn't the slow retarded Lumina 2 arch, at this point
>>
File: 1873.jpg (791.9 KB)
>>108690176
qwen image 2 showcase image btw
>>
>>108690265
skill issue desu
>>
>>108690264
post an anima gen with "1girl, kuroki tomoko, slouching" and i'll download it right now.
>>
>>108690264
anima doesnt replace SDXL, its good to have but not a replacement, very far from it
>>
>>108690288
>??? they've been clear about
they've also been clear that they were gonna release the model, did it happen? why do you take their words like gospel anon?
>>
>>108690220
LLMs are good when you want verbal diarrhea.
I don't mind typing a small paragraph myself though, usually.
>>
File: AWOOO.png (222.8 KB)
>>108690303
>gay bestiality
I approve
>>
ace step XL improved drops.
>>
>>108690326
>>
>>108690340
China is still in the "gay is just a joke" phase of degeneracy cope.
>>
>>108690188
I just use Anima and put the lora loader in front of the sampler?
>>
File: kys.png (413.6 KB)
>>108690326
>>
ltx2.3 chads where we at?
>>
>>108687935
>>108687946
thanks!
>>
>>108690381
rn I'm butthurt that I can't figure out how to vibe-workflow dcw in Comfyui.

nobody on reddit cares about it lol.

Like nobody is doing dcw at all, except the ace step c thing, but idk if I can figure out how to do this.
>>
>>108690299
>i'd much rather they just do a Z 2.0 that uses F2 vae
maybe that's the reason for the delay, maybe they switched to F2's vae and a better text encoder (like qwen 3.5)
>>
>>108690355
>>108690377
fine, ill download it. those gens look really fucking simplistic though, illu usually provides a background even when unprompted.
>>
>>108690220
For most prompts, no.
>>
which one of you posted this?
https://www.reddit.com/r/StableDiffusion/comments/1svnmvz/i_think_its_fair_to_say_weve_passed_the_threshold/
>>
>>108690303
>happyhorse buttfucking western localkeks
they knew
>>
>>108690404
my prompting imagination has been shot for years but anima just feels like a different illu, it's not the same but similar overall, like illu vs pony but a bit more different
>>
>>108690417
kekk you're onto something anon
>>
File: 77830746.jpg (147.6 KB)
>>
>>108690251
Seems to be working somewhat
>>
Brazilian Migu molesting Teto
>>
>>108690145

What model like use, for example, X to create images? I remember grok doing pretty fine with a picture and tell him what poses and nsfw words in the prompt
>>
>>108690392
>dcw
whats that?
>>
>>108690251
the issue could be an overbaked lora if you are using one.
>>
>>108690439
>She doesn't exist and you will never plap Brazillian Migu irl
Feels bad man
>>
>>108690446
If you can formulate this question in some other way besides incomprehensible ESL gibberish, I might be able to help.
>>
>>108690404
>I want the model to read my mind
Nevermind, stick to your slopmix, you are the exact kind of person it was created for. Also, kys.
>>
>>108690220
if you ever catch yourself typing gibberish and bullshit tags that are mostly snake oil, youre just performing a humiliation ritual
have some self respect and only write prompts that sound human and natural when spoken
>>
>>108690471
>have some self respect and only write prompts that sound human and natural when spoken
but those diffusion models have only been trained with AI text slop though
>>
>>108690457
this is without any loras.
>>
>>108690458
Life is unfair.
>>
>>108690457
noobai without loras. i guess i'll go with anima in a bit.

>>108690468
now make one with konata
>>
>>108690251
>>108690476
Wai itself is a shitmix of many loras, most of those loras are fried or poorly trained to some degree.
Pretty much every shitmix suffers from certain tags ceasing to work properly, to varying degrees.
I bet it works fine on base illustrious or noob but I won't bother testing.
>>
>>108690404
retard
>>
>>108689508
> Anima should be double its current size
which means at least 4x cost to train, retard
>>
>>108690420
> but anima just feels like a different illu
lol lmao even
>>
>>108690519
>feeding the troll this hard
>>
yet to see a single anima gen that isnt ugly
>>
>>108690420
if you can't tell the aesthetic difference between a fucking e-pred sdxl model and a rectified flow dit its beyond over, at least you will be happier than others though
>>
>>108690532
well excuse me

>>108690528
that's part of the charm
>>
>>108690039
Why recent and newest?
>>
>>108690476
>>108690492
Dunno then, but wai14 is working just fine from what I can see.
>>
>>108690492
>>
what does anima do better than illustrious?
>>
>>108690558
Models uploaded in recent years are more likely to be higher quality images than images from 2000s, where lower resolutions and using tightly compressed jpgs were more common.
This loosely translates to better quality and other desirable traits on average.
None of these tags are perfect, all of them still have their own biases that can cause issues in some cases, but they are a lot more stable than scoreslop.
>>
>>108690581
natural looking images, especially high frequency details like textures/rough sketches
>>
>>108690595
proof?
>>
>>108690581
prompt adherence and image quality
>>
>>108690595
sloppers don't want "natural looking images". they want exactly what illustrious slopmixes provide - shiny butiful style with detailed saargrounds
>>
>>108690560
maybe im retarded, could you give me the prompt you used for that one and ill try it on wai14. i am using the slouching tag so maybe im missing something else.
>>
>>108690612
it's anime apparently. how can it look "natural"
>>
>>108690457
catbox?
>>
>>108690581
Much better text gens, natural language instruction following, out of the box style knowledge and accuracy
>>108690586
>Models
I meant images, oops.
>>
>>108690639
meaning it doesn't look like stereotypical civitai aislop and looks closer to human-made art
>>
>>108690639
scan, film grain, jpeg artifacts, cover, original, masterpiece, volumetric lighting, anime coloring, 
simple background, scenery, indoors, tavern,
1girl, solo, <your character tags>,
standing, full body, slouching, holding own arm, bags under eyes, depressed,

>>108690642
It's a messy workflow, I will clean it up and post it.
>>
ok so which one of these models can generate nude images.
>>
>>108690672
Grok Imagine
NovelAI
Seedream 4
>>
Fresh

>>108690678
>>108690678
>>108690678
>>108690678
>>
>>108690659
What the fuck?
>>
>>108690449
>whats that?
I know the image is split using "wavelets". I know the issue is that parts of the gen get out of synch vs the model's training.

But I don't know the flow of the calculation, I didn't understand that part. I realize the paper applied it to images, but it's being applied to the c version of Ace Step:
https://github.com/ace-step/ACE-Step-1.5/issues/1119
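
For anyone wondering what "split using wavelets" means at the most basic level, here is a minimal sketch, assuming a single-level 1D Haar transform in plain Python. This is not the dcw method from the paper or the linked issue (the flow of that calculation is not described in this thread), and the function names `haar_split` and `haar_merge` are made up for illustration. It only shows the general idea of separating a signal into a low-frequency band and a high-frequency band that recombine into the original.

```python
def haar_split(signal):
    """Split an even-length sequence into (low, high) bands.

    low  holds pairwise averages (the coarse, low-frequency content),
    high holds pairwise half-differences (the fine, high-frequency detail).
    Illustrative helper only, not taken from any library.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    low, high = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        low.append((a + b) / 2)
        high.append((a - b) / 2)
    return low, high


def haar_merge(low, high):
    """Invert haar_split: rebuild the original sequence from both bands."""
    out = []
    for l, h in zip(low, high):
        out.append(l + h)  # first sample of the pair
        out.append(l - h)  # second sample of the pair
    return out


if __name__ == "__main__":
    low, high = haar_split([4, 2, 6, 8])
    print(low, high)
    print(haar_merge(low, high))
```

For images the same split is usually applied along rows and then columns, which gives one coarse band and several detail bands that could in principle be processed separately.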
>>
Regarding LoRA training in SDXL, what's the minimum total steps for characters or styles?
I checked various LoRA metadata ranging from 700 steps, but people say anime needs 2,500 to 3,500 steps for proper details.
Datasets vary from 20 to 30 images up to 150 to 250 images. I've got 39 images, all same style to guide the AI, trained for ~1,000 steps, but still not happy with results.
Won't touch Noob or Anima until I nail a decent LoRA with Illustrious. I must clear this hurdle first.
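
The step counts above are easier to compare once they are tied back to dataset size. A minimal sketch, assuming the commonly used convention that one epoch is ceil(images × repeats ÷ batch size) optimizer steps; the function names are hypothetical, and real trainers may count differently (gradient accumulation, bucketing), so treat the numbers as rough estimates rather than a rule for what a LoRA needs.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size):
    """Total optimizer steps under the assumed convention:
    steps per epoch = ceil(num_images * repeats / batch_size)."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def epochs_for_target(num_images, repeats, batch_size, target_steps):
    """Smallest whole number of epochs that reaches at least target_steps."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return math.ceil(target_steps / steps_per_epoch)


if __name__ == "__main__":
    # 39 images, 1 repeat, batch size 1, as in the post above.
    print(total_steps(39, 1, 26, 1))          # roughly the ~1,000-step run
    print(epochs_for_target(39, 1, 1, 2500))  # epochs needed for 2,500 steps
    print(epochs_for_target(39, 1, 1, 3500))  # epochs needed for 3,500 steps
```

With 39 images at batch size 1 and no repeats, the 2,500 to 3,500 step range people quote works out to roughly 65 to 90 epochs under this convention, versus about 26 epochs for a ~1,000-step run.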
