Thread #108668921
File: highlights_g_108664784_1776956890_1.jpg (2.5 MB)
Discussion and Development of Local Image and Video Models
Previous: >>108664784
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>LTX-2
https://huggingface.co/Lightricks/LTX-2
>Wan
https://github.com/Wan-Video/Wan2.2
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Illustrious
https://rentry.org/comfyui_guide_1girl
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
File: 1755007656454961.jpg (392.1 KB)
>>
File: ComfyUI_10703_.png (367.9 KB)
>>
Why is civitai full of new accounts literally named "abc123abc" commenting on every single z-image lora asking for an Ernie version? For fuck's sake, just take a look at the Commodore64 lora for Ernie, it's disgusting, makes me puke just to stare at the images.
>>
>>108668948
get out! >>108653190
>>
File: image.png (32.1 KB)
>>108668972
chinks shill army nothing new
they are also shilling chink models in r/localllama right now
>>
File: 1760920978918124.png (26.2 KB)
>>108668954
the room was prompted to be bathed in warm light with a dusty color palette because it looks cozy
>>108669037
facts. i really like what it did with grok's coffee cup
>>
>mfw Resource news
04/23/2026
>ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
https://shelley-golan.github.io/ParetoSlider-webpage
>DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion
https://github.com/Adamlong3/DynamicRad
>Normalizing Flows with Iterative Denoising
https://github.com/apple/ml-itarflow
>LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
https://github.com/inclusionAI/LLaDA2.0-Uni
>Illustrious XL & NoobAI-XL Style Explorer
https://github.com/ThetaCursed/Illustrious-NoobAI-Style-Explorer
>AI Model & ‘MAGA’ Influencer Emily Hart Unmasked as Indian Man
https://www.yahoo.com/news/articles/ai-model-maga-influencer-emily-091027504.html
04/22/2026
>Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models
https://github.com/cvims/EMBEDDING-ARITHMETIC
>Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation
https://github.com/CompVis/patch-forcing
>TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
https://github.com/Hong-yu-Zhang/TS-Attn
>AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
https://yutian10.github.io/AnyRecon
>SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing
https://github.com/vivoCameraResearch/SmartPhotoCrafter
>Soft Label Pruning and Quantization for Large-Scale Dataset Distillation
https://github.com/he-y/soft-label-pruning-quantization-for-dataset-distillation
>Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
https://github.com/AMAP-ML/EMF
>Enhancing Continual Learning of Vision-Language Models via Dynamic Prefix Weighting
https://github.com/YonseiML/dpw
>IR-Flow: Bridging Discriminative and Generative Image Restoration via Rectified Flow
https://github.com/fanzh03/IR-Flow
>>
>mfw Research news
04/23/2026
>Image Generators are Generalist Vision Learners
http://vision-banana.github.io
>Camera Control for Text-to-Image Generation via Learning Viewpoint Tokens
https://randdl.github.io/viewtoken_control
>Hallucination Early Detection in Diffusion Models
https://arxiv.org/abs/2604.20354
>Wan-Image: Pushing the Boundaries of Generative Visual Intelligence
https://arxiv.org/abs/2604.19858
>MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings
https://arxiv.org/abs/2604.19902
>Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing
https://arxiv.org/abs/2604.20258
>Amodal SAM: A Unified Amodal Segmentation Framework with Generalization
https://arxiv.org/abs/2604.20748
>FluSplat: Sparse-View 3D Editing without Test-Time Optimization
https://arxiv.org/abs/2604.20038
>HumanScore: Benchmarking Human Motions in Generated Videos
https://arxiv.org/abs/2604.20157
>Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
https://arxiv.org/abs/2604.20730
>Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation
https://arxiv.org/abs/2604.20366
>Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers
https://arxiv.org/abs/2604.20027
>X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference
https://arxiv.org/abs/2604.20289
>Self-supervised pretraining for an iterative image size agnostic vision transformer
https://arxiv.org/abs/2604.20392
>Efficient INT8 Single-Image Super-Resolution via Deployment-Aware Quantization and Teacher-Guided Training
https://arxiv.org/abs/2604.20291
>From Diffusion to Flow: Efficient Motion Generation in MotionGPT3
https://arxiv.org/abs/2603.26747
>>
>>108669037
that's basically what image 2 is doing.
it's a second pass that projects the text onto the genned image. the easiest way to spot it is on clothing; the X, for example, is just sitting on her dress. it's actually almost pixel-perfect with the X on the laptop.
>>
>>108669088
>>108669090
thanks
>>
File: Untitled-1.png (191.3 KB)
>>108669107
probably because they don't care, it's a parlor trick to impress indians and boomer investors. sorry to pull the curtain back.
case in point, the gen uses the same X, it just has a slight skew on the dress. same with the openAI logo, it's just sitting on her shirt.
>>
>>108669093
Gay
>>108669089
There is no way it's that simple. But now that I think of it, putting tags like "masterpiece" seems to help
>>
File: image.png (44.4 KB)
>>108669137
?
>>
>>108669190
api image thread is here >>108653190
>>
>>108669243
honestly i think a random person could figure out a better implementation in a few days; local has a lot more headroom to fuck around. there are 3d models, i assume they have some kind of texture projection.
you could probably jury-rig something from preexisting nodes. convert a masked area into a plane or 3d topology, project text or an image onto it, then lay it on top of the gen.
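To make the jury-rig concrete, a minimal sketch of the flat-surface version in plain Python with OpenCV, outside ComfyUI (the file names and the four corner points are made-up placeholders; a node-graph version would do the same warp-and-composite):

import cv2
import numpy as np

# Load the finished gen and the text/logo to project (placeholder paths).
gen = cv2.imread("gen.png")
logo = cv2.imread("logo.png")

h, w = logo.shape[:2]
src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
# Four corners of the target surface in the gen (e.g. from a mask or picked by hand).
dst = np.float32([[420, 310], [590, 330], [580, 480], [410, 455]])

# A homography maps the flat logo onto the skewed surface, giving the
# "sitting on the dress" look described above.
M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(logo, M, (gen.shape[1], gen.shape[0]))

# Composite only where the warped logo landed.
mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), M, (gen.shape[1], gen.shape[0]))
out = np.where(mask[..., None] > 0, warped, gen)
cv2.imwrite("composited.png", out)

A light img2img pass on top would blend the lighting; true 3D topology would need depth or a mesh instead of a single quad.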
>>
File: _AnimaPreview3_00155_.jpg (382.3 KB)
>>
File: 1748109684850279.jpg (646.6 KB)
aight, you can now use NAG on Anima
https://github.com/BigStationW/ComfyUI-NAG-Extended
https://github.com/BigStationW/ComfyUI-NAG-Extended/blob/main/workflows/NAG-Anima-ComfyUI-Workflow.json
https://civitai.com/models/2560840/anima-turbo-lora
>>
File: _AnimaPreview3_00156_.jpg (387.4 KB)
>>
File: 1563934765591.png (4.8 KB)
>turbo lora for a 2b model
>>
>>108669455
https://github.com/pamparamm/ComfyUI-ppm
I've been using this for negative weights while at CFG 1.0 and it works great; you just have to get used to putting negative-weighted tags in the positive prompt instead of writing them in the negative prompt. This has worked better for me than NAG ever did.
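For anyone wanting to try it, a minimal sketch of the usage (assuming ppm's NegPip-style node is what's meant here; the tags are placeholders): hook the CLIPNegPip node up to your model and CLIP before the sampler, set CFG to 1.0, leave the negative prompt empty, and put everything in the positive prompt, e.g.
1girl, solo, outdoors, (watermark:-1.0), (blurry:-0.8), (extra fingers:-0.6)
The negative-weighted tags get suppressed during sampling even though there's no CFG negative pass.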
>>
File: 1765410005510165.jpg (33.8 KB)
>>108668921
>my Roll-chan made it in the OP
>>
File: _AnimaPreview3_00162_.jpg (409.6 KB)
>>108669513
probably zimage turbo
>>
File: file.png (17.8 KB)
I wonder if there is a way to automate gemma 4 with its vision capabilities as an agent + whatever model + inpainting tools to approach the result of the gpt autoregressive model.
>>
>>108669503
the issue is that those NAG parameters don't work for CFG > 1. it can be used, yeah, but I'm just too lazy to find the right values again. I mean, if you already have CFG, adding NAG on top of that is kinda useless imo (and it's slower)
>>
File: 1763564705331420.jpg (242.1 KB)
How do I anima with krita?
>>
>>108669544
>>108669563
I've seen some workflows where they use ZIB to do the beginning of the image (like the first 50% of steps), then switch to ZiT to make it look good
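That split is doable with two stock KSamplerAdvanced nodes; a minimal sketch, with placeholder step counts (the field names are ComfyUI's own):
first pass (ZIB): add_noise=enable, steps=20, start_at_step=0, end_at_step=10, return_with_leftover_noise=enable
second pass (ZIT): add_noise=disable, steps=20, start_at_step=10, end_at_step=20, return_with_leftover_noise=disable
Wire the first sampler's LATENT output into the second, with each sampler pointed at its own loaded model.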
>>
>>108669553
Since always. https://civitai.com/models/1662740/lenovo-ultrareal?modelVersionId=2882170 This lora helps a tiny bit.
>>
>>108669555
issue is that edit won't be able to target specific things to enhance
>>108669528
can gemma select a part of an image?
>>
>>108669582
>edit won't be able to target specific things to enhance
yes it can; edit can modify just one specific part of the image. that makes shit easier, because you just have to say "hey, add a hat to that girl's head" instead of trying to automate an inpainting process
>>
File: 646065966145594.png (1.1 MB)
>>108669513
>>108669522
Anima -> ZIT
>>
>>108669528
I don't know what it is, but they did something more than just "look at this image and fix it".
Even SOTA API models don't really have super great visual reasoning.
Again, I don't know precisely what it is, but they are feeding ChatGPT more than a few hundred visual tokens.
>>
File: _AnimaPreview3_00173_.jpg (382.6 KB)
>>108669553
I use loras for photography and interior. Haven't uploaded anywhere yet.
>>
>>108669528
>>108669629
it's probably something like this
>it makes the image -> it uses its visual encoder to see mistakes -> it makes an edit prompt -> it edits the image
a gemma 4 + klein combo could definitely do the trick
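A minimal sketch of that loop in plain Python, assuming hypothetical generate()/vlm_inspect()/edit() wrappers around whatever local backends you pick (none of these names come from a real library):

MAX_ROUNDS = 4

def refine(prompt):
    # generate, vlm_inspect, and edit are placeholders you would wire to
    # your own txt2img model, vision LLM, and edit model respectively.
    image = generate(prompt)
    for _ in range(MAX_ROUNDS):
        critique = vlm_inspect(image, "List any wrong-looking text, hands, or objects. Say OK if none.")
        if critique.strip().upper().startswith("OK"):
            break
        instruction = vlm_inspect(image, "Write one short edit instruction to fix: " + critique)
        image = edit(image, instruction)
    return image

The whole trick is that the critic only ever has to spot local mistakes, and the editor only ever has to fix one thing at a time.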
>>
>>108669503
>>108669531
give me a prompt and a negative prompt, I'll try it out
>>
>>108669629
Anything can be bruteforced with enough tokens, and seeing the prices on the API side, I'm pretty sure it's feeding a whole lot of tokens to refine the image.
The result is good though, and I'd like to see that locally done with the tools we have.
>>
File: t5g_gallery_00125_.png (911 KB)
https://huggingface.co/TheRemixer/ChenkinNoobRF-T5Gemma-adapter
Neat, T5gemma adapter for Chenkin Noob!
>>108669726
Anima is 2x slower than SDXL, and training Anima loras is between 2.3x and 2.5x slower than SDXL
>>
>>108669653
>it uses its visual encoder to see mistakes
this is probably their secret sauce (along with using agents). I think they trained specifically on "wrong-looking text" and details, which means the model is probably very good at spotting that
>>
File: 410146798802683.png (3.9 MB)
>>108669684
Pretty much, except I start with a realistic Anima gen.
>>
File: 968433991998314.png (2.1 MB)
>>
>>108669731
I guess you can take your time trying to min-max llama.cpp params and see if it scales up well enough? I wouldn't be too hopeful but worth a shot.
Maybe 3.6 works better for this; that's also worth experimenting with.
>>
File: 1747808016100949.jpg (258.8 KB)
>>108669653
>>108669747
you need a very good model that doesn't use a vae in order to do what gpt 2 is doing
don't waste your time trying to squeeze water out of a stone with these outdated latent diffusion models
>>
File: absolute gpt image 2 slop.png (3 MB)
>>108670003
>you need a very good model that doesn't use a vae in order to do what gpt 2 is doing
the thing is, it's obvious that gpt 2 is still using a vae. when you go for very complex images, it gets slopped fast and there's more and more noise and artifacts; it's probably the result of the model doing like 10 edits, and at that point the vae issues start to get really amplified
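If you want to see that amplification for yourself, a minimal sketch with diffusers (the VAE repo id and image path are placeholders; repeated encode/decode stands in for repeated edit passes):

import numpy as np
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image

device = "cuda" if torch.cuda.is_available() else "cpu"
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device)

img = load_image("test.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float()[None].to(device)
x = x / 127.5 - 1.0  # scale to [-1, 1] as the VAE expects
orig = x.clone()

# each encode/decode round stands in for one edit pass; watch the error grow
with torch.no_grad():
    for i in range(10):
        z = vae.encode(x).latent_dist.sample()
        x = vae.decode(z).sample.clamp(-1, 1)
        print(i, torch.mean((x - orig) ** 2).item())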
>>
File: woman 2026 04 23 1.png (1.5 MB)
>>
File: 1770711519026006.png (2.3 MB)
>use pear-shaped figure tag
>turns her into a literal fucking pear
Kek
>>
File: _AnimaPreview3_00224_.jpg (370 KB)
>>
>>108669856
The problem with this method is that the anatomy more or less sucks.
Speaking of anatomy, what model produces the most anatomically accurate gens? I have been using Virt-a-Mate + ZIT to make my gens look realistic, but that's kind of a hassle.
>>
File: 921750962661488.png (2.3 MB)
>>
File: 1750158164106180.png (2.3 MB)
>>108670022
that mainly happens to posters and cartoons; i haven't seen it happen to real images yet. it could be using a different model for realism
>>108670060
it's basically magic compared to local.
you're better off trying to reverse engineer it to gain some understanding than acting like a know-it-all
>>
File: 1767269344628861.jpg (576 KB)
>>108669503
>Would you be so kind as to compare non turbo lora with regular CFG vs NAG?
here you go
>>
File: 575979444952377.png (1.8 MB)
>>
File: 1776968397991114_.png (2.4 MB)
>>108670294
>>
File: _AnimaPreview3_00241_.jpg (386 KB)
>>108670241
>>
File: 1_00027_.jpg (3.3 MB)
also what's cool is that you can use AI nowadays to remove ugly tattoos from women.
>>
File: 869726735467031.png (2.4 MB)
>>
File: zazed.png (168.9 KB)
>>108670396
>I can fix her!
and you did
>>
File: 238674633911580.png (2.1 MB)
>>
File: 430128115435224.png (2.1 MB)
>>
File: 166514189754498.png (2 MB)
>>108670425
He's shy bro.
>>
File: 1759207999061119.jpg (559 KB)
How come anima uses @ for artist tags?
>>
>>108670648
There are also artists with common nouns in their name in the dataset, and on illustrious/noob you will ALWAYS get that common noun in your gen if you prompt for them. Haven't seen that issue on anima yet.
>>
File: 1746837922993034.png (2 MB)
>>108670411
we all can't wait to see your inpainting skills
why don't you show us an example of how gpt 2 handles image editing?
>>
>>108670728
it was already explained at the start of the thread.
it generates the image then does a segm inpaint to project text and logos.
>>108669135
>>
File: 1770353268506318.png (1.2 MB)
Seems like some people want GPT-2 at home, you got your wish granted lol
https://github.com/inclusionAI/LLaDA2.0-Uni
https://huggingface.co/inclusionAI/LLaDA2.0-Uni
>>
>>108670929
Oh somehow I missed that:
# Understand the image
response = model.understand_image(
    image_tokens, h, w,
    question="Describe this image in detail.",
    steps=32, gen_length=2048,
)
Holy hell, it is diffusing text.
>>
File: 2026-04-23153725_stealthmeta.png (658.4 KB)
>>
File: 1761391883384476.jpg (178.3 KB)
>>108670859
>Diffusion Large Language Model
Huh? it's diffusing text too?
also what an awful way of showing perf
>>
>>108670979
yep, it's a diffusion LLM, and those things are like 5x faster than your regular autoregressive LLM. the issue is that, for the moment, no one has managed to make them as smart as autoregressive ones; I hope it'll happen
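The decoding loop is the interesting part; a minimal sketch of the masked-diffusion idea (LLaDA-style), with a stand-in model call and the simplest confidence-based unmasking schedule rather than the repo's actual sampler:

import torch

def diffusion_decode(model, prompt_ids, gen_length=64, steps=8, mask_id=0):
    # start from an all-[MASK] completion appended to the prompt
    x = torch.cat([prompt_ids, torch.full((gen_length,), mask_id)])
    per_step = gen_length // steps
    for _ in range(steps):
        logits = model(x.unsqueeze(0))[0]      # (seq, vocab); stand-in call
        conf, pred = logits.softmax(-1).max(-1)
        conf[x != mask_id] = -1.0              # only fill still-masked slots
        idx = conf.topk(per_step).indices      # commit the most confident tokens
        x[idx] = pred[idx]
    return x

Each forward pass commits a whole batch of tokens at once instead of one, which is where the speedup over autoregressive decoding comes from.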
>>
https://civitai.red/models/2553102/editanything?modelVersionId=2869279
edit lora for LTX 2.3, pretty funny what you can do. "replace the man in the red shirt with a green orc from lord of the rings."
https://files.catbox.moe/i3agwk.mp4
>>
File: We'll see about that.gif (1.4 MB)
>>108670859
>https://github.com/inclusionAI/LLaDA2.0-Uni
>It enables precise modifications while perfectly preserving original details.
>>
>>108671011
kek, "replace the man in the red shirt with mickey mouse from Disney."
https://files.catbox.moe/9av1e7.mp4
>>
File: _AnimaPreview3_00309_.jpg (464.2 KB)
>>108670968
I might but I don't think it's good enough
>>
File: 1766663339136520.png (1.5 MB)
Is there a rule of thumb for how many artists you can mix with anima? I've had good results with 2 (using weights of course) but I've had trouble keeping the style consistent with more.
>>
File: 1759704504932123.mp4 (232.2 KB)
>>108670859
it's interesting to look at, the way it processes text with the diffusion process
>>
>>108671011
>>108671176
>>108671187
For old porn with ugly but competent actresses alone, I can already see the use case: replacing their faces and bodies with unreal hotties.
>>
>>108671224
no, it's random, and it's kind of an art, like cooking: the stronger the artist's style, the less it can marry with anything else strong. some artists are similar enough that they can go together, each bringing something specific (like one good at hosiery, one good at poses, etc)
>>
>>108671314
Prostitution no, not until we have hacked all senses at least.
Porn, the industry itself is already kind of being killed by OF and "amateur" before that.
OF, yeah but it's imploding by itself, and tbdesu it's a very recent category, pretty girls doing lewd stuff on cam is a very recent thing. I see it as fleeting.
>>
File: ComparisonLatest.jpg (2.5 MB)
A fair-skinned young Caucasian woman with long, sleek copper-red hair stands centrally on a weathered stone walkway, posing directly for the camera. She wears a whimsical pastel lavender mini-dress featuring a tiered skirt, ruffled bodice with lace trim, and sheer long sleeves, accessorized with a metallic gold crossbody bag. Her legs are clad in intricate white patterned lace tights, ending in chunky two-tone black and white platform oxford shoes. She is situated in a formal garden setting, flanked by stone balustrades topped with large white classical urns containing manicured green bushes. Immediately behind her stands a white architectural frame structure bearing the text "1GIRL GARDENS" in bold serif capital letters. The background reveals terraced flower beds, classical white statues, and a green hillside dotted with buildings. The lighting is soft, flat, and diffused from an overcast sky, creating shadow-free illumination that enhances the soft pastel colors of her dress and the even tones of her complexion. Style: whimsical street fashion photography. Mood: sweet, composed, and serene.
>>
File: _AnimaPreview3_00320_.jpg (422.8 KB)
>>
File: thank you tdrussell miku 2.png (865.6 KB)
After failing a few times before, I was finally able to train my lora on anima thanks to the official configuration tdrussell shared. (I also bumped the training dataset from 80 to 130 images in between.)
It's not perfect, but it actually feels usable now without coping extensively. Some gens still undershoot, and there's maybe still a little overlearning of irrelevant noise in the others, but it came out better than previous attempts where I either fried it to oblivion or undershot massively.
In the interest of perhaps helping an interested party, here is the command I last used:
python anima_train_network.py \
  --tokenizer_cache_dir /home/user/myloras/tokcache/ \
  --metadata_trigger_phrase "@tag. " \
  --cache_info --resolution 1024 --cache_latents --enable_bucket \
  --min_bucket_reso 256 --max_bucket_reso 2048 --bucket_reso_steps 16 \
  --resize_interpolation lanczos \
  --pretrained_model_name_or_path /home/user/models/anima-preview2.safetensors \
  --qwen3 /home/user/models/qwen_3_06b_base.safetensors \
  --vae /home/user/models/qwen_image_vae.safetensors \
  --output_dir /home/user/myloras/output/ \
  --save_precision bf16 --save_every_n_epochs 1 --save_state \
  --train_batch_size 2 --xformers --max_train_epochs 10 \
  --persistent_data_loader_workers --seed 999 --gradient_checkpointing \
  --mixed_precision bf16 --logging_dir /home/user/myloras/logs/ \
  --log_with tensorboard --optimizer_type AdamW --learning_rate 0.00003 \
  --optimizer_args weight_decay=0.01 betas=0.9,0.99 \
  --lr_scheduler cosine --lr_warmup_steps 0.1 --save_model_as safetensors \
  --network_dim 32 --network_alpha 16 --network_dropout 0.075 \
  --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk \
  --timestep_sampling sigmoid --sigmoid_scale 1.3 \
  --network_module networks.lora_anima \
  --dataset_config /home/user/myloras/animaconfig.toml \
  --max_grad_norm 1.0 --network_train_unet_only --split_attn
4 repeats so 5.2k steps in total.
>>
>>108671332
>not until we have hacked all senses at least.
we're truly not far off. we don't need smell; if anything, excluding smell is a bonus for most. we have the visual and physical stimulation already, it just needs to be all thrown together in a kit.
>Porn, the industry itself is already kind of being killed by OF and "amateur" before that.
even most "amateur" stuff was being run by essentially a pimp.
>OF, yeah but it's imploding by itself
Yeah, it's not really "imploding", but the fake AI girls are starting to get more subscribers than the real thing. OF is nearly going to become entirely fake AI girls, and the retards paying for it don't care; they're gooning retards who just want a weird parasocial relationship. AI sex bots are ultimately the future, though. The price of pussy and affection will soon be at an all-time low.
>>
File: _AnimaPreview3_00335_.jpg (364.4 KB)
>>
File: Ernie-Image-Turbo_00018_.png (1.9 MB)
>>108671341
>prompt
>>108671409
>Caucasian
>Ernie
>>
File: 1776600359777472.png (2.1 MB)
>>108671314
the problem with nbp is that it struggles with creating unique-looking faces and attractive bodies
also gpt image seems to handle prompts differently from any other model; slight changes can make or break a gen, so it's not fair to use the same prompt
>>
>>108671500
meant for
>>108671341
>>
>>108671502
ZIT has been aggressively RL'd in post-training, to the point where they destroyed all seed variance. Pretty women are a result of that (good luck getting big boobs though).
ZIB swings wildly in terms of how a human should look, because it's a rawer base model.
>>
File: _AnimaPreview3_00354_.jpg (484.9 KB)
>>108671341
>>
File: Comfy_00021.jpg (1.7 MB)
>>108671341
zit
>>
>>108671271
I have been curating my own realism dataset for the last few weeks. Aiming for the 1k mark.
It's roughly:
35% solo women
10% various backgrounds
20% various sex acts
The remainder is miscellaneous shit (transports, actions, objects, plants, etc.)
I have yet to prune and caption.
I will post it if I proceed to train and it goes well.
Open to suggestions in terms of dataset curation for a realism lora task.
>>
File: Comfy_00022.jpg (1.7 MB)
>>108671674
4U
>>
File: 1761204260818296.png (1.4 MB)
>>108671653
>>108671697
Anima + that realism turbo lora
https://civitai.red/models/1862761/nicegirls-ultrareal?modelVersionId=2882216
>>
File: Comfy_00023.jpg (1.9 MB)
>>108671709
does anima do 'natural language' prompts? try 'deep depth of field'
>>
why is my anima lora so small? 66mb? I have a huge dataset, so why is this so small? if I keep training, is it going to jump up in size? epochs 1-8 have barely any bytes added. i'm using that anima standalone trainer, but I'm not sure if I should just use russ's diffusion-pipe or what? help a nigga out pahLEASE.
>>
File: _AnimaPreview3_00341_.jpg (295.4 KB)
>>108671659
Start pruning, cropping and captioning while you collect images. You'll end up with a much higher quality dataset.
>>108671687
I will have to retrain, but it's not far off
>>
>>108671769
what rank did you use? and what training res?
>>108671659
make sure to resize/crop them to good "normal resolution" sizes (divisible by 32 preferably, but 16 is ok)
get more people in groups/pairs
obviously anything larger than 3-4MP would be best; i aim for 5-10MP if i can find sources that high, and just let the trainer downscale them (or run a script)
get more general stuff too (animals/pets, cars, locations, etc)
even if a realism lora doesn't work out, that would be a nice dataset for reg images
>>
>>108671797
and of course use an LLM to caption as much as possible
>>108671801
ah i misread then. i'm going for the opposite, no bokeh/blur, unlike >>108671775
>>
File: ss_04-23-2026_006.png (32.3 KB)
>>108671797
>what rank did you use? and what training res?
16 rank but like I said my dataset is huge. every image has a max pixel resolution on x or y of 1024. when I trained on pony it just werked so not sure what I'm doing wrong.
>>
>>108671812
>>108671828
Stop thinking in terms of what the correct photography terms are; these models are generally not trained with that stuff. Bokeh in the positive prompt = blurry background; put it in the negative if you want a sharp, in-focus background. simple as.
>>
>>108671837
>16 rank
well there you go
huge dataset means increase your rank to at least 64 (128 if you have over, say, 300 images)
you can always reduce it back down later, but it won't train that well at rank 16
and if you're doing a large dataset + high res (1024+) then yeah, you gotta push to at the very least 64; i'd go 256 if you're able to
>>108671848
you do it your way then, i've tested this extensively and it does matter with models using LLM (or at least t5) as text encoder
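For intuition on why the file size tracks rank (and never grows with more epochs), a rough back-of-envelope sketch, with made-up module counts and widths rather than Anima's real architecture:

rank = 16
n_modules = 300          # hypothetical number of targeted attention/MLP matrices
d_in = d_out = 2048      # hypothetical layer width
bytes_per_param = 2      # bf16
params = n_modules * rank * (d_in + d_out)      # two low-rank factors per matrix
print(params * bytes_per_param / 2**20, "MiB")  # ~37.5 MiB at rank 16; doubles at rank 32

Training longer only changes the values of those weights, not how many there are, so epochs 1-8 landing at the same size is expected.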
>>
File: 1747837639991977.png (2.4 MB)
>>
>>108671774
I have cropped the watermarks etc. while collecting them. Going through more than a thousand images at one go would drive me crazy.
I will use some LLM to caption. I prefer to caption when the dataset is complete.
>>108671797
I know there is possibly some quality to squeeze by preparing manually, but I will just let the trainer's lanczos do the job. I will use bucket step of 16 since dataset is large; shouldn't be too disruptive to images.
>get more general stuff too (animals/pets, cars, locations, etc)
Yeah around one third of the dataset is "general stuff".
To some degree I need to focus on the primary purpose of the lora though (1girl and coom.)
>>
>>108671775
>>108671887
>:3
OWO WHATS THIS?
https://www.youtube.com/watch?v=7mBqm8uO4Cg
>>
>>108671890
>I have cropped the watermarks etc. while collecting them. Going through more than a thousand images at one go would drive me crazy.
I went to the extreme on my 1800+ data set. I literally photoshopped out watermarks and signatures by hand lmao.
>>
>>108671837
Irrespective of dataset size I would go above 16 rank for cramming multiple characters on a single lora.
>>108671880
256 might be too much even for 1835 images. These characters aren't complex enough to warrant it.
128 should work better.
>>
>>108671880
np
>>108671890
>I will use bucket step of 16 since dataset is large
the reason to crop/resize is that if your reso doesn't fit a bucket it'll get ignored, and if the buckets don't have enough images to get filled, they dup (or drop, depending on the trainer). each trainer does it differently too, so you gotta adapt, or at least keep the resolutions at the expected "normal" sizes. and lately most images found online are random res, since people crop/screenshot without thinking
but your dataset's big enough that you might not notice that too much
if you're just training nsfw women+sex, then keep the rest out or use it as a reg dataset. otherwise your lora's just gonna learn a bunch of crap and not work too well for the purpose
>>108671911
there's ways to avoid doing that nowadays but yah been there done that
>>108671914
>Irrespective of dataset size I would go above 16 rank for cramming multiple characters on a single lora.
this too
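For anyone wondering what the bucketing actually does, a minimal sketch of sd-scripts-style aspect-ratio bucketing (simplified: real trainers also cap total pixel area per bucket and handle cropping):

def make_buckets(min_side=256, max_side=2048, step=16, max_area=1024 * 1024):
    buckets = set()
    w = min_side
    while w <= max_side:
        h = min(max_side, (max_area // w) // step * step)
        if h >= min_side:
            buckets.add((w, h))
            buckets.add((h, w))
        w += step
    return sorted(buckets)

def assign_bucket(img_w, img_h, buckets):
    # nearest aspect ratio wins; the image is then resized/cropped to that reso
    ar = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ar))

buckets = make_buckets()
# 4000x2000 and 2000x1000 share the 2:1 ratio, so they land in the same bucket
print(assign_bucket(4000, 2000, buckets) == assign_bucket(2000, 1000, buckets))  # True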
>>
>>108671918
>>108671914
Nicesu, thank you helpful anons.
>>
File: kekestone will probably love it AWOOO.jpg (516.4 KB)
Babe wake up, another pixel space image model got released
https://pixeldit.github.io/
https://github.com/NVlabs/PixelDiT
https://huggingface.co/nvidia/PixelDiT-1300M-1024px
>>
File: 1759732755864422.png (954.9 KB)
>>108671964
VAEs BTFOOO
>>
>>108671918
Aren't buckets based on aspect ratios? 4000x2000 and 2000x1000 belong to the same bucket after resizing.
At least that's how sd-scripts does it, I believe.
I am not certain, but I don't think it drops images.
>if you're just training nsfw women+sex, then keep the rest out or use it as a reg dataset. otherwise your lora's just gonna learn a bunch of crap and not work too well for the purpose
No, I also want it to be a decent general-purpose realism lora. It's just that 1girl coom is the primary purpose of anima.
The reason I am going out of my way to hand-curate a thousand images is to cram both into a single lora.
>>
File: lmaooo.png (414.8 KB)
>>108671964
>Text encoder: Gemma-2-2B-IT
Nvidia bros are still living in 1983
>>
>>108671984
yes, and i think only sd-scripts (or the kohya gui, to be specific) will crop resolutions that don't fit a close bucket, but all the rest don't; they just skip images with resolutions not fitting a bucket aspect ratio, and/or use fewer images in that bucket (aitoolkit), and/or fill it with dupes (diff-trainer), unless they fixed that shit in the past couple of months
>to cram both into a single lora.
do two loras: one with all the images, one with just the sex/girl stuff
the full set will lean heavily toward coom material, so it may not work well as general realism; for that you'd need way more general stuff to cover many more concepts. with a heavy bias toward sex/girl stuff you're just gonna get more of that
>>108671998
this would be like the third "pixel space" model that i've seen reported this year, not counting furraidiance
>>
File: file.png (268.3 KB)
>>108671964
that's the most important part: they show that removing the VAE helped them reach new heights. that's promising as fuck
>>108672010
really? I thought he was using the pixnerd method, this one is something new
>>
Fuck, I managed to get a scraped NovelAI API key, and honestly, there's no comparison. They won. I don't want this key to stop working. NAI is perfect, it's incomparable. The colors, the style... the style. That copyrighted style cannot be matched by any anime model out there. No LoRA can even come close. And the speed.. generating kino images in 5 seconds, my God. And to think I used to underestimate it before I got this key...
>>
>>108672030
>he goes around claiming to have created the first vaeless model
if he really said that then it's retarded as fuck; all he does is copy papers, the same papers that created those models before him
>>
Never again. Never again Anima. Never again SDXL. NAI won, NAI won by a landslide. Local anime simply isn't worth it anymore. It's beyond saving, there's no fix, no cure. There aren't enough skilled people working on it, and there probably never will be. NAI won, and it pains me deeply to say it, it genuinely hurts, but now that I'm using it again, I can clearly see it's superior in every single aspect.
>>
File: wen.png (727.4 KB)
>>108671964
wen comfyui?
>>
I am completely defeated, defeated by beauty, by aesthetics, by pure quality by GOOD TASTE. It's been so long since I've seen intrinsic quality in my local generations. I had forgotten what beauty even looked like, and NAI made me rediscover it.
>>108672054
No. Anima has no beauty, no quality. It has stilted intelligence alone, it lacks life, it lacks beauty. That feeling of looking at an image and instantly falling in love with it... I hadn't experienced that in a long time, and NAI brought it back.
>>
File: 2026-04-23-03-00-25_00001_.png (2.5 MB)
>>
>>108671964
it really makes no sense to me that nvidia uses weird-ass licenses like that
it's not like their money comes directly from training ai models, apart from shit like dlss which is tied to their gpus anyway
>>
File: kantokus_21.jpg (601.2 KB)
>>108672032
>>108672044
I'll take the bait.
NAI's 4.5 model is worse than anima (although inpaint, vibe transfer and PT are really good). As an example, try getting any kind of reasonably complex backgrounds out of NAI.
>>
>>108672096
No anon, no. I'm looking at my local gens from the past year since I started using local, and I can tell you with confidence that Anima is junk food. Anima is slop. I don't want to touch any local anime model ever again, and it's not personal with Anima specifically, but with the entire local ecosystem as a whole.
>>
>>108672129
get your ai genned text posts the fuck outta here, faggot. I hate you api niggers, we're not switching to your token system. my shit will always work on my computer I don't care if it's several magnitudes worse than your shit ass cloud. fuck you cock sucker nigger faggot.
>>
No, no, no. Everything is wrong with local. Nothing makes sense. Everything is half baked, everything is low quality, everything is left to the free will of the community, and the community produces something truly awful.
>>108672121
this is not copyrighted my friend, tdrussell fears copyright
>>
>>108672129
Then go away: >>108653190
Oh right, you pissed yourself having to share the same room with Nano Banana Pro and GPT-Image-2.
>>
File: 1756755676500602.png (495 KB)
>>108671964
I can see lodestone switching to that method; it's way more accurate than pixnerd
>>
File: pixelDit0001.jpg (212.3 KB)
>>108671341
>>108671964
PixelDiT
>>
>>108672140
I'm using a translator dude, and I want you to know that I completely agree with you. I HATE NAI. I HATE that something so good is out of my reach, that I can't own it, that I can't have it in a physical form.
I HATE NAI.
I HATE NAI FOR BEING SO PERFECT
>>
>>108672116
Nvidia has like 4 licenses they release stuff under, and they seemingly pick one at random. Off the top of my head: Apache 2, the Nvidia open-weights commercial license, the Nvidia research license, and now this one. They really ought to keep things simple and just make everything Apache 2.
>>
File: Make it look like a photo, okay? A realistic photo! No CGI look! A spontaneous photograph capturing .jpg (212.6 KB)
>>108672180
i'm trying to copy the sample prompts which come out okish but not this one
>>
>>108672161
>>108672187
can you also try that model >>108670859
>>
>>108672121
The character is stiff and doesn’t say anything. Its only intelligence shows in building the background and making the character ride a scooter, but the image itself doesn’t make you fall in love with it. It doesn’t say anything, it’s an image without life, it has only 2026 level intelligence.
>>
>>108672161
Dios mio...
>>108672187
Okay, now that's just genuinely scary.
>>
>>108672203
Yes, for enterprise shit where the money is.
I am very skeptical the small subset of AMD users who might bother with these weird research experiment models are considered at all when writing the license.
>>
File: PixelDIT00002.jpg (150.1 KB)
8-bit scroller on PixelDiT
>>108672205
just the commands on the github (it downloads the model, so don't bother with the HF files). i just changed the "prompt.txt" file to whatever, using the same neg and other CLI options shown in the example command
it takes maybe 5-8gb? the model is like 5gb
>>108672199
>60.3 GB
nope lol
maybe later if no one else does
>>
>>108672205
>>108672224
>python inference.py --config configs/PixelDiT_1024px_pixel_diffusion_stage3.yaml --model_path pixeldit_t2i_v1.pth --txt_file prompts.txt --custom_height 1024 --custom_width 1024 --cfg_scale 2.75 --seed 2025 --negative_prompt "low quality, worst quality, over-saturated, blurry, deformed, watermark" --work_dir "."
i used the comfyui venv, only needed a couple packages from the requirements.txt
didnt even unload comfy lol
>>108672228
2026-04-24 06:15:33 - [PixDiT] - INFO - Inference with torch.bfloat16, guidance_type: classifier-free, flow_shift: 4.0
loading text encoder from Efficient-Large-Model/gemma-2-2b-it
Loading checkpoint shards: 100%|| 2/2 [00:00<00:00, 57.57it/s]
2026-04-24 06:15:44 - [PixDiT] - INFO - PixDiTTrainer:PixDiTTrainer, Model Parameters: 1,311,388,547
2026-04-24 06:15:44 - [PixDiT] - INFO - Generating sample from ckpt: pixeldit_t2i_v1.pth
2026-04-24 06:15:46 - [PixDiT] - WARNING - Missing keys: []
2026-04-24 06:15:46 - [PixDiT] - WARNING - Unexpected keys: []
2026-04-24 06:15:46 - [PixDiT] - INFO - Saving images at ./vis
2026-04-24 06:15:46 - [PixDiT] - INFO - Eval first 1/1 samples
2026-04-24 06:15:46 - [PixDiT] - INFO - Sampler flow_dpm-solver
2026-04-24 06:15:46 - [PixDiT] - INFO - Inference with torch.bfloat16, guidance_type: classifier-free, flow_shift: 4.0
100%|| 49/49 [00:03<00:00, 12.51it/s]
>>
>>108672224
>the model is like 5gb
Oh, so they released it in fp32. No one has bothered with that for a while.
Coupled with >>108671990, it makes me think they trained this model a while back but are only releasing it now.
>>
File: 1760908533743.png (2.6 MB)
>>108672238
zeta isn't pixel. Only radiance is.
>>
File: PixelDIT0003.jpg (196.2 KB)
it's not very good at complex prompts unless you use the exact sample prompts provided by them lel
>>
>>108672273
>>108672284
zeta = zit+chroma
kaleidoscope = klein4b + radiance
radiance = pixel space
there may be another test of his i missed
>>108672300
huh i missed that
>>
File: awooooooo.jpg (3.3 MB)
>>108672300
>No he decapitated Z-Image
based Robestone
>>
actually the noisiness is much less bad when you change the saving format from jpg to png in inference.py
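i.e. the kind of one-line change meant here, sketched (their inference.py's actual save call may look different):

# before: lossy JPEG stacks compression noise on top of the model's output
# image.save("vis/sample.jpg")
# after: lossless PNG keeps the raw pixels
# image.save("vis/sample.png")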
>>
File: A blonde woman, busty, full body, bikini, looks straight into the camera, soft light, shot on Agfa V.png (1.4 MB)
>>108672364
yeah, but it's also one of their cherrypicked prompts and seeds
>>
>>108672411
the simple fact that it looks like your regular (VAE) diffusion model at the same size tells the whole story. we're talking about a pixel space model; those things are supposed to look like shit, look at kekestone's attempt. this shit is new and hard to master
>>
>>108672161
>>108671964
this will be the next model comfyorg and tdrussell choose btw
>>
File: 1746778314755135.png (2.1 MB)
>>108672161
>>108672387
seems like local is regressing all the way back to sd 1.4 era
this is just sad at this point
>>
>>108672468
dude, i've been through this so many times already
if you want to use the 1b excuse then they should have made the model bigger (2b anima already looks good)
there is so much unusable dogshit coming out that will always be unusable dogshit, and i don't have the patience for copium and hopium for the 50th time
>>
>>108672485
>if you want to use the 1b excuse then they should have made the model bigger (2b anima already looks good)
remind me what anima used as a base model: fucking cosmos 2b. do you remember how bad that model was? there's a whole universe between a base model and a highly finetuned model, come on dude
>>