Thread #108659074
File: highlights_g_108655751_1776841017_1.jpg (1.1 MB)
Discussion and Development of Local Image and Video Models
Previous: >>108655751
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>LTX-2
https://huggingface.co/Lightricks/LTX-2
>Wan
https://github.com/Wan-Video/Wan2.2
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Illustrious
https://rentry.org/comfyui_guide_1girl
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
312 Replies
>>
File: 1749795266694806.jpg (343.5 KB)
>>108659074
requesting this in any scenery and any effects/editing, please
>>
File: ANIMA_P___00011_.png (1.5 MB)
>>108659100
ok
>>
File: HGckzKgXIAAq-uM.jpg (425.6 KB)
local having an absolute meltdown over saas superiority. gpt image 2 is insane
>>
File: 1753214892060802.png (3.7 MB)
*yawn*
change bait please it's getting stale
>>
>>108659179
the nonstop posting doesn't come across as desperate at all.
btw it has been 7 and a half hours since the last image was posted in the api general, maybe you should post that there before the general falls off the board.
>>
File: 130877049.jpg (494.7 KB)
>>108659074
>>
File: Screenshot_20260422_022745.png (218 KB)
There are "people" who actually waste compute on this btw
>>
File: ComfyUI_22295.png (2.5 MB)
>>108659100
Is this the fabled "girlfriend" that you wanted a LoRA of?
>>108659133
No. I don't want Jenny coming after me... she's very petty and never forgives!
>>108659179
Go back to bed, faggot.
>>
>>108656011
>>108656084
Where the hell are you posting that an OF whore found you?
>>
File: 63465456456546.png (1.1 MB)
>>108659270
nta but anima does most of the heavy lifting out of the box. after that just pick your poison for upscaling/detailing.
>>
>>108659696
i was/am surprised by it. i don't use anime models so maybe it's not a big deal, but the ability to prompt "cosplay photo of 1girl dressed as whoever from whatever" and have it replicate that in a realistic and somewhat accurate way is pretty neat. looking forward to the final release.
>>
File: 1755707775628992.jpg (2.7 MB)
>>
File: ComfyUI_22349.png (2.3 MB)
>>108659856
>clothed female
>Ferengi
>>108659910
Cute!
>>
File: 1763137162985373.png (2 MB)
https://github.com/AMAP-ML/DCW
Babe wake up, Alibaba improved the inference quality of images, when ComfyUi custom node?
>>
>>108659655
>>108659662
>It's that grease layer on the surface
you better get used to it
the chinese are already slurping up gpt image 2 outputs as we speak to train their upcoming model
>>
File: 1756593024652590.jpg (459.2 KB)
>>108660027
>the chinese are already slurping up gpt image 2 outputs as we speak to train their upcoming model
all I wanted was Z-image edit...
>>
File: file.png (355.5 KB)
>>108659982
https://arxiv.org/pdf/2604.16044
I tend to believe this isn't a nothingburger when the improved numbers are miles ahead of the original and not just some statistical noise, and that seems to be the case here
>>
File: 1776834835251509.png (2.4 MB)
>>108659982
>>108660045
it looks more detailed indeed, if that removes a bit of the slop I'm all for it
>>
File: 1752486393062836.png (1.5 MB)
>>
File: ComfyUI_temp_ouvkt_00002_.png (3.3 MB)
>try the new diff-aid node that inpaints and masks(?) the area you prompted for
>suffers the same issue of all other masked image edit features
Localbros..
>>
>>108660093
get out, this is a local thread >>108653190
>>
>>108660117
well yeah faster convergence is always a plus, but publishing your benchmarks against a 20 step baseline is so fucking dishonest, and even then I still think the 50 step looks better, so I guess it's a middle of the road method they have.
>>
File: 1754413071436930.jpg (147.2 KB)
>>
File: I DONT GIVE A FUCK.png (1.4 MB)
>>108660131
>>
File: HGgTbDDagAAIBZx.jfif.jpg (952.3 KB)
This. Is. Insane. GPT-Image 2 just changed the game for indie devs. High-quality assets delivered in seconds
>>
>>108660295
>>108660322
thanks Julien, very cool
how'd the funding talks go btw?
>>
>>108660131
go home sd3 you're drunk
>>108660295
https://x.com/imgborba/status/2046696599389143169/photo/1
gpt-2 is tibia pilled
>>108660338
>and obviously ClosedAI doesn't care about that
neither does local
>>
File: HGd7dyEbwAA6ZNJ.jpg (273.3 KB)
Lots of seething in response to the latest state-of-the-art model. What's with the crying? We're here to have fun with AI
>>
>>108660376
why do you want to be off topic so bad? this is a thread for local models, if you want to post api shit go elsewhere
>>108653190
>>108653190
>>108653190
>>
>>108660370
>>108660376
>>108660381
Julienxisters how are we gonna become millionaires now that GPT 2 exists?????
We're CONDEMNED to live the miserable life of a bottom feeder
>>
File: 1763005140699377.png (475.4 KB)
>>108660376
>>
File: 17612451.jpg (882.9 KB)
>>
File: file.png (3.5 MB)
>>108660415
lame gen
>>
>>108660376
>>108660460
>>108660470
>hates local
>but loves to lurk on a local thread anyway
what kind of mental illness is this?
>>
File: 12231212211.jpg (3.1 MB)
>>108659074
>Local can't do text
>t. cloudcucks conveniently forgetting that Ernie Image exists and has already mostly caught up to NBP on text.
The gap really isn't as big as it's made out to be by cloudcucks.
>>
>>108660595
it's not just text, it's also realism + being accurate to the original UI (whether it's twitter, twitch, youtube...). Ernie can't do that; memes are fun when they're accurately transformative of the real world
>>108660496
>>108660555
>>108660477
>>108660103
>>
>>108660657
what a fucking failure of a model. not even 24 hours and we've gone from "wow it can fucking think! and you can read text on a grain of rice" to "well it's pretty good at making fake twitch screenshots."
i never expect much from openai, but man they always find a way to underdeliver.
>>
File: 1754409269665820.png (749.1 KB)
>>108660731
what do you mean? people absolutely love that model, it has the highest gap in the history of Arena AI
>>
>>108660792
it's already been created >>108653190
if he's still posting here it means that he's just a troll, and you know how to deal with trolls that do "off topic posts"
>>
File: fellforitagainaward.png (670 KB)
>>108660779
well what the heck happened overnight?
going from being able to zoom in on a single grain of rice and read text, to squiggle-slop books and posters, is concerning to say the least.
>>
File: ComfyUI_00044_.png (3.2 MB)
new 'toss
>>
>>108660437
you know it's ai because her right breast is floating in mid air. at that angle, both breasts shouldn't be clearly visible like that
this happens because local models cannot think like api models can. they see "breasts" in the prompt and insist on showing both, even when it breaks anatomy
>>108660814
>>108660825
if only you had paid attention during the openai livestream, you'd actually know how to use the advanced features
>>
>>108660841
Well, what kind of local genner do you want to be?
>The local purist
Sticks to 100% local models with permissive licenses like SDXL. The gens might look 3 years out of date, but it's not about the outputs, it's about sending a message. Uses Forge because ComfyUI is API-focused. A GPU with less than 12GB VRAM is recommended for the true local experience.
>The hybrid enhancer
Takes advantage of powerful API nodes through ComfyUI to level up workflows. Weaves in and out of the localspace, harnessing the power of both top API models and local tooling
>The API Ascension
Fully decked out with a suite of top-class API models ready to roll. Maximum prompt enhancements accelerated by Gemini 3.1, fed into the latest GPT Image-2 and hand-animated by the diffusion deities at Seedance 2. Capable of generating insane production-ready outputs at a fraction of the cost.
>>
File: Ernie-Image-Turbo_00044_.png (1.5 MB)
>>108660093
>>108660595
>>108660657
>Cloudcucks still think their models have a moat
No, you don't. Plus they revealed on stream that GPT Image 2 literally looks up images on the internet; once local gets the upcoming Ernie Image edit we will be 90% of the way there. And that is naturally enough to surpass cloudcucks, because local is uncensored. GPT is literally just an autoregressive model, like the one we have, but with added tool usage.
>>
>>108660924
>models with permissive licenses like SDXL
wtf why not use an Apache 2 licensed model like Wan as an example. oh because this is ai generated slop i got baited by slop
you're also forgetting one of the schizo dimensions of a true local purist - if you're running a local model on someone else's PC you're still a cuck
>>
Just how astroturfed are the top accounts on civitai?
>1girl with midjourney
>never posts prompts
>no model info
>always makes it to the front page with a ton of likes
There's no way this is organic, indodogs are botting this shit to hell.
>>
File: firin mah lazer.png (383.2 KB)
>>108660925
Use anima.
>>
File: 440576101526255.png (2.3 MB)
>>
File: 514418981906913.png (1.5 MB)
>>
File: Ernie-Image-Turbo_00050_.png (1.8 MB)
>>108660376
>>
File: 677671984496153.png (1.8 MB)
>>
File: 360915760523097.png (3.5 MB)
>>
File: 68297983825226.png (1.6 MB)
>>
File: 1759796031452476.png (1.3 MB)
>>108661148
>It looks worse than Nano Banana Pro at realism.
debatable. the outputs are less generic than nbp, but at the cost of speed so it takes longer to experiment with prompts
nbp feels like zit (generic but consistent) while gpt image 2 is like z base if it was actually good (more variety but slower)
>>
File: 1031824269.png (1.5 MB)
>>
File: Untitled.png (3.3 KB)
What can we do about this?
>>
File: 884541212545.jpg (644.1 KB)
>SOTA at photorealism
Not so fast cloudcucks. Nothing has surpassed the Flux.2 VAE yet.
>Technically BTFOs Google
>Technically BTFOs ClosedAI
Idk how BFL do it, and the model is open and free! One thing is certain: more compute alone is simply not enough to be objectively better than local across the board.
>>
>>108661438
>it kinda matters
no, I don't really give a fuck about other generals, like I said, I'm here to talk about local models, not about some petty rivalry between generals, dunno why /adt/ lives rent-free in your head but you need some help
>>
File: Flux2-Klein_00136_.png (1.6 MB)
>>108661422
Maybe if you are blind. Here's a closer look at the images, which one captures more detail? Even a kindergartener could figure that out
>>
File: GPT-Image-2.jpg (1014.1 KB)
>>108661480
Real image I used for inspiration: https://thumbs.dreamstime.com/b/blonde-woman-washing-plates-sponge-domestic-kitchen-48728607.jpg
Flux 2 Klein knows about real photographs, not the slop you see AI models usually output and it's unmatched in that aspect.
>>
>>108661354
Good point, I think we should conquer them and make them our colony, honestly, it’s probably the only way they’ll survive at this point.
For example:
/adt/ would become: /ldg/’s anime thread
/atg/ (Ace Step General) would become: /ldg/’s local music diffusion thread
/adg/ (API diffusion general) would become: /ldg/’s API diffusion thread
If people see “/ldg/” in the thread name, I think it would attract new anons on its own, since it’s a recognizable and trusted name.
In other words, /ldg/ is a guarantee of good quality.
>>
>>108661505
>>108661527
what kind of mental illness is this?
>>
>>108661527
>>108661505
tdrussell here, if /adt/ and especially the /h/ diffusion threads start becoming /ldg/ colonies, I’d officially post my Anima news there, since they’d basically be extensions of this great general.
>>
File: 33e42dbd5b44e58200c06cbc8aefc170.png (48.6 KB)
WHY THE FUCK AREN'T WIDGETS SHOWING ON THE SUBGRAPH?! I'M ADDING A TON BUT NONE SHOW REEEEEEEEEEE
>>
File: 711112888240732.png (1.9 MB)
>>108661498
thx
>>
>>108661550
Also, it's quite sad that a possibly 100B+ parameter cloudshit model performs worse than a tiny open model, and one of the main things its users brag about is already surpassed by it. I want to see this tech advance as much as you do, and that's quite disappointing.
>>
File: e88d9a946ee79d80cc8de86219d91ce0.png (131.8 KB)
>>108661582
>>108661551
I asked claude and it knew right away, it's so good at finding answers.
Using basic reroute nodes solves it.
>>
File: 1773950784540672.png (2.3 MB)
>muh magic flux 2 vae
lmao
>>
Is there any Flux Klein LoRA for turning images into video stills? The I2V services and local models tend to do better with images that look like a video still (for obvious reasons), but I've had pretty mixed results with asking Klein to change the style of photos to look like home video stills/etc. Seems like something that could be refined a lot with a well-made LoRA
>>
File: 1761405861412.png (3 MB)
I will not be impressed by an image model until it has spatial intelligence; it's the god-given mission of image models to serve as world models with spatial intelligence.
This is GPT-Image. Better than every other model but not that big of a jump.
>>
>>108661787
I just want it to
- change focal length to a more realistic one for a video camera
- remove photo postprocessing effects that would not be present in a video
- remove any visual signs that this is a physical photograph (dust, scratches, glossy surface, etc)
- add mp4 compression artefacts
- add blur if appropriate
- leave the subject matter unaffected as much as possible (it won't be perfect obviously)
etc.
>>
>>108661815
I disagree. https://chatgpt.com/share/69e8e833-83f8-83ea-a453-75a7c3e53af5
>>
File: ComfyUI_temp_xnbin_00066_.png (1.8 MB)
>>
File: 4545412121.jpg (699.5 KB)
>>108661639
Relax API cuck. This time I included a real image for comparison because I'm not kidding when I say you don't know what real images look like.
>>
File: 1749611799298273.png (267.5 KB)
>>108661895
why is her hair aliased on flux 2 klein? looks like it's from a video game lmao
>>
File: 1773171557855444.png (2.3 MB)
>>108661895
we can do this all day buddy, your little toys dont stand a chance
>>
>>108661401
Agreed. Most people here are a bit blind. I've learned a lot from here about how much 'visual sensitivity' or 'visual IQ' (or whatever we might call it) varies among the general population. I'm big on the blurry older style photos, but that's not the only kind of photo that really exists. The fact that the Flux example is a hair more "plastic" (which sort of disappears if you zoom out a little) is not sufficient to declare it the loser when it is obviously giving a more real overall impression (and it's not close)
>>
File: 1754361845390167.png (57.8 KB)
>>108662209
>The fact that the Flux example is a hair more "plastic" (which sort of disappears if you zoom out a little)
>which sort of disappears if you zoom out a little
the absolute state of localkeks
>>
File: 44454564854.jpg (1 MB)
>>108662046
>Gets rekt at realism by local
>Still thinks his shitty slopped model stands a chance
>>
File: Chroma-65601346659266_00001_.png (853 KB)
>>108661401
for all its faults I still think Chroma is best
>>
>>108662384
>why are you using flux 2 klein to make your point?
I am interested in small details, e.g. skin, hair textures, every grain in the image. I know it's hard to notice, but the Flux.2 VAE captures those details very well, which makes it stand out in realism, while every other model that doesn't use this VAE (including NBP/GPT Image 2) is less than perfect at capturing such details.
>>
File: 1746004302702907.jpg (882.2 KB)
>>
>tfw you're completely lost, have lost track and have no idea of what the present state of the art meta is, and hopping between tag based and natural language prompting has caused your prompt technique to atrophy into pidgin prompt nonsense that is the worst of both worlds.
>...and then you updated comfy and everything just stopped working
I'm tired boss
>>
>>108662629
>>108662657
samefag
>>
File: 5e400ef3-c781-436d-8a66-e8f2ae8c4d72.png (868.2 KB)
>>108662657
>>
>>108662753
>>108662793
samefag
>>
>>108659289
>Is this the fabled "girlfriend" that you wanted a LoRA of?
i don't know what you're talking about. i have no knowledge of ai stuff. i just want her with any ai edit. maybe cyberpunk or prehistoric hunter-gatherer style, please. >>108659100
>>
File: typicalcomfyuiwf.png (216.9 KB)
>>108662850
I fucking hate people who upload their workflows and hide all their messy shit behind a node
>>
File: 1745665572615614.png (69.7 KB)
>>108662793
sharty bullshit is kind of obvious
>>
File: 1_00031_.jpg (3.3 MB)
so it's true that cloudcucks can't generate any goon material?
>>
>>108663187
http://127.0.0.1:8188 is the default for comfyui, you can specify a port in the launch options
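For reference, a minimal launch sketch (the `--listen` and `--port` flag names are from ComfyUI's main.py arguments; the port value here is an arbitrary example):

```shell
# ComfyUI serves on 127.0.0.1:8188 unless told otherwise;
# --port picks a different port and --listen sets the bind address.
cd ComfyUI
python main.py --listen 127.0.0.1 --port 8288
```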
>>
File: 1748095745299476.jpg (1.2 MB)
>>
File: 1766558929554669.jpg (505.7 KB)
>>
File: _AnimaPreview3_00454_.jpg (445 KB)
>>108663394
https://huggingface.co/Yyyueyu/Flux2-Klein-9B-Consistency
>>
File: ComfyUI_22406.png (2.1 MB)
>>108662823
>i just want her with any ai edit
Can't help you bully girls at school, sorry!
>>108663223
I don't see it.
>>
>>108663409
>>108663431
dunno why that anon is linking that; it's not from the original author, might be just a mirror, so beware
https://huggingface.co/dx8152/Flux2-Klein-9B-Consistency
>>
File: FluxKlein9BDistilled_Output_836226.jpg (2.2 MB)
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00008_.jpg (312.7 KB)
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00010_.jpg (296.7 KB)
>>
File: 1767809748567445.webm (3.3 MB)
love me some 'eedance
>>
File: ComfyUI_temp_pfsdp_00016_.png (2.7 MB)
>>108663484
>>108663511
Nice anon, I was checking out older gens from Chroma and DC-2K is a pretty good model, I dunno why I switched to the hd-fp8 one
>>
File: anima3_00032_.png (2.3 MB)
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00015_.jpg (728.5 KB)
>>108663561
>dunno why I switched to the hd-fp8 one
speed!
>>108663629
very cool, is it just anima3 or with lora?
>>
File: anima3_00019_.png (2.5 MB)
>>108663648
anima3>zit refiner, workflow from here >>108642763 with some tweaks
>>
File: thumb-1920-1406220.jpg (190 KB)
[general]
shuffle_caption = true
keep_tokens = 2
caption_extension = ".txt"
[[datasets]]
resolution = 1024
batch_size = 2
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 2048
bucket_reso_steps = 32
[[datasets.subsets]]
image_dir = "E:\\sd-scripts\\mylora_images\\myfirstlora"
num_repeats = 96
is_reg = false
@echo off
cd /d E:\sd-scripts
accelerate launch --num_cpu_threads_per_process 1 sdxl_train_network.py ^
--pretrained_model_name_or_path="E:\ComfyUI_windows_portable\ComfyUI\models\checkpoints\illustriousXL_v01.safetensors" ^
--dataset_config="E:\sd-scripts\config.toml" ^
--output_dir="E:\sd-scripts\mylora_output" ^
--output_name="myfirstlora" ^
--save_model_as=safetensors ^
--network_module=networks.lora ^
--network_dim=16 ^
--network_alpha=32 ^
--learning_rate=0.0003 ^
--unet_lr=0.0003 ^
--text_encoder_lr=0.00003 ^
--optimizer_type="Adafactor" ^
--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" ^
--lr_scheduler="linear" ^
--max_train_epochs=1 ^
--save_every_n_epochs=1 ^
--mixed_precision="bf16" ^
--gradient_checkpointing ^
--cache_latents ^
--clip_skip=2 ^
--min_snr_gamma=5 ^
--multires_noise_iterations=6 ^
--multires_noise_discount=0.3 ^
--max_token_length=225 ^
--seed=42
pause
This is my first animu character lora. I’m making it on illustriousXL_v01 to start with the basics, kind of like reading the Greeks when you begin studying philosophy.
My dataset consists of 22 images. I know it's small, but I want to see what I can get out of it.
For the next version, I’m planning to increase the dataset to around 50 images and include tags that I didn’t add before.
Right now, all the images have white backgrounds, so it might be overfitting, but I still feel the need to test how the model behaves, at least for this first attempt.
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00021_.jpg (407 KB)
>>108663688
cool, have to check it out
>>108663704
Don't train text encoder, it's already fried to shit
>>
>>108663704
Increase num epochs and decrease repeats for more useful output. I usually do 10 epochs or less. If you end up frying, you can use an earlier epoch's output this way.
Invert alpha and dim values
That LR might fry it by being too high.
Don't attempt to specify different lr for unet and te on your first try.
Either replace optimizer with AdamW or Prodigy (and set LR and alpha to 1 if you choose the latter)
lr scheduling should be cosine
Don't smash random shit you don't understand into optimizer args
Multires noise crap is supposed to work with older epsilon models but I don't recall having success with it, I would recommend skipping that shit.
>images have white backgrounds
I would just bucket paint different colors to a few of them.
I would bump weight decay a bit in the optimizer args and add <0.1 dropout when training with a small dataset, but that depends on how noisy the data is.
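The epochs-vs-repeats point above is just step arithmetic; a quick sanity check with the numbers from the posted config (22 images, 96 repeats, 1 epoch, batch size 2), assuming no gradient accumulation:

```shell
# total optimizer steps = images * repeats * epochs / batch_size
awk 'BEGIN { print 22 * 96 * 1  / 2 }'   # 1 epoch  x 96 repeats -> 1056 steps, one checkpoint
awk 'BEGIN { print 22 * 10 * 10 / 2 }'   # 10 epochs x 10 repeats -> 1100 steps, ten checkpoints
```

Roughly the same amount of training either way, but the second split saves a checkpoint per epoch to fall back on if the last one is fried.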
>>
>>108663688
>>108663629
tdrussell should create an Anima realism and furry fine tune and put it behind a paywall to finance the anime one.
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00026_.jpg (302.8 KB)
>>
>>108662209
>Some people have low visual sensitivity or, heh, i call it visual IQ *does epic fedora tip to establish dominance*
>Anyway, if the image looks bad you can simply impair your vision. Try taking off your glasses or walking to the other side of the room
I do not think it is a 'visual' IQ issue we are dealing with here
I think Klein gets the edge in that example too but c'mon, don't be silly
>>
File: 1748626742022658.jpg (157.5 KB)
>>108663704
gpt image 2 just dropped and this goofy ahh talmbout illustriousXL
wtf yall doing
>>
File: nxyz-2026-04-22 15-36-02-er_sde-3.5-32-0153.jpg (781.6 KB)
anima is lit bruh, fr fr
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00034_.jpg (360.1 KB)
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00036_.jpg (290.6 KB)
>>
File: Sentenced_to_Be_a_Hero_anime_Teoritta_render.png (695.4 KB)
>>108663753
I tagged everything I saw using the appropriate Danbooru tags. Maybe it won’t be so overcooked.
>>108663768
Thanks. Which option is the text encoder? Is it `--text_encoder_lr=0.00003`? Should I set it to 0?
>>108663800
I’ll take that into account. Do you have a good all purpose SDXL lora or a user who makes good lora that I can check to look at their metadata?
>>108663867 >>108663726 >>108663774
I need to start somewhere. My next step will be Anima, but before that I have to pay tribute to Noob after this lora.
>>
File: Screenshot_64.png (1.1 MB)
Is there a model that's actually good at breaking down images to guidelines/sketch? I wanna study a bug girl, and NAI is failing me comically bad.
>>
File: Chroma-DC-2K-T2-SL4-Q8_0_00044_.jpg (298.2 KB)
>>108664108
>https://github.com/lllyasviel/Paints-UNDO
>https://lllyasviel.github.io/paints_alter_web/
closest implementation I know of
>>
>>108663688
>anima3>zit refiner, workflow from here
does the workflow need https://catalog.ngc.nvidia.com/orgs/nvidia/teams/maxine/collections/maxine_vfx_sdk this shit
>>
File: anima_00063_.png (2.6 MB)
>>108664190
when i imported the workflow i got missing node errors about that so I just removed it from the workflow. maybe it makes it better?
>>
File: ANM-2026-04-23-00-54-45_00003_.jpg (500.4 KB)
>>108664349
I ran this in the comfy folder and it worked: .\python_embeded\python.exe -m pip install nvidia-vfx --extra-index-url https://pypi.nvidia.com
>>
File: 1752476611193235.jpg (1.8 MB)
>>
>>108663960
>Do you have a good all purpose SDXL lora or a user who makes good lora that I can check to look at their metadata?
I can't endorse anyone. Just checks loras you have been using and know that aren't underbaked or fried.
This is the command I used to run for SDXL loras btw (for Noob):
>python sdxl_train_network.py --v_parameterization --pretrained_model_name_or_path ~/models/NoobAI-XL-Vpred-v1.0.safetensors --tokenizer_cache_dir ~/lora/tokenizercache/ --train_data_dir ~/lora/images/ --shuffle_caption --caption_separator , --caption_extension .txt --keep_tokens 1 --resolution 1024 --cache_latents --cache_latents_to_disk --enable_bucket --min_bucket_reso 256 --max_bucket_reso 2048 --bucket_reso_steps 64 --dataset_repeats 8 --output_dir ~/lora/output/ --save_precision fp16 --train_batch_size 2 --max_token_length 225 --xformers --max_train_epochs 10 --persistent_data_loader_workers --max_data_loader_n_workers 1 --seed 44453 --gradient_checkpointing --mixed_precision bf16 --logging_dir ~/lora/logs --log_with tensorboard --zero_terminal_snr --loss_type l2 --training_comment "Trigger word is blabla" --save_model_as safetensors --optimizer_type Prodigy --learning_rate 1.0 --max_grad_norm 1.0 --optimizer_args weight_decay=0.01 decouple=True d_coef=1 use_bias_correction=True safeguard_warmup=True betas=0.9,0.999 --lr_scheduler cosine --lr_warmup_steps 0 --min_snr_gamma 5 --prior_loss_weight 1.0 --network_dim 16 --network_alpha 1 --network_dropout 0.08 --network_module networks.lora --save_every_n_epochs 1
Use 0.10 warmup if using AdamW
>Noob after this lora.
Noob uses v-pred so ditch min_p too on top of noise offset if you haven't already when you move on to that.
I will also, for the last time, recommend training the text encoder, but I guess you should see for yourself how it works out.
>>
>>108662629
I'd do local but Gemini is just better at everything, except it can't do porn and deepfakes, so why can't local models be as good?
Why can't I throw an image at it and tell it "Hey, put her in a bikini" and have it just understand and do that, instead of generating some other bitch in a bikini with 3 legs?
>>
>>108664536
By 100B what are you referring to? I can believe the TE/(V|L)LMs are few-hundred-billion-param MoEs with say 20-50 billion active params, but I doubt they are running massive UNETs. It would be too economically impractical.
>>
>>
File: ANM-2026-04-23-01-37-54_00004_.jpg (445.2 KB)
>>
>>108664579
I mean yes, sure, but that is not as directly related as you think. After all, these models are also on the API and presumably not being offered at a loss.
Z-Image base is $0.01 per megapixel on FAL and banana 2 is $0.08. They are bigger than the average local model, but I don't think they are 100B big. Comparing API inference costs, we can assume they would bleed a lot of money at current prices if they were that big.
>>
>>108664522
because the local community is stuck in a perpetual emperors new clothes delusion, convincing themselves that cucked models like flux klein 2 are actually good.
why would china or whoever bother making a genuinely good model when they can just capitalize on the shit eaters who will hype any new slop release (like ernie) as long as it's "local"?
all the serious genners have been using apis exclusively for a while now because local models have hit a brick wall in realism and surrendered to churning out anime models instead
>>
File: Gemini_Generated_Image_cveefcveefcveefc.png (3.3 MB)
>>108664499
I got a pretty good breakdown from Nano Banana Pro, I'm hoping it's actually decent. I'm noticing the furthest left claw has the wrong number of joints and the numbering on the body segments is wrong, so we'll see...