Thread #108604726
Discussion and Development of Local Image and Video Models

Previous: >>108597963

https://rentry.org/ldg-lazy-getting-started-guide

>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP

>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows

>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe

>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/

>Qwen
https://huggingface.co/collections/Qwen/qwen-image

>Klein
https://huggingface.co/collections/black-forest-labs/flux2

>LTX-2
https://huggingface.co/Lightricks/LTX-2

>Wan
https://github.com/Wan-Video/Wan2.2

>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46

>Illustrious
https://rentry.org/comfyui_guide_1girl

>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage

>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg

>Local Text
>>>/g/lmg

>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
>mfw Resource news

04/14/2026

>ERNIE-Image: Text-to-image generation model built on a single-stream Diffusion Transformer
https://huggingface.co/baidu/ERNIE-Image

>Danbooru Dataset Filter: High-Speed Metadata Explorer for AI Training
https://github.com/ThetaCursed/Danbooru-Dataset-Filter

>ChatGPT will praise the mood and 'bedroom/DIY texture' of fart sounds pulled from YouTube
https://www.pcgamer.com/software/ai/chatgpt-will-praise-the-mood-and-bedroom-diy-texture-of-fart-sounds-pulled-from-youtube

>RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
https://limuloo.github.io/RefineAnything

>Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
https://github.com/leeruibin/hybrid-forcing

>Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
https://jinnh.github.io/E-Bridge

>FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data
https://github.com/yuandaxia2001/FashionMV

>Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-Resolution
https://github.com/jiyang0315/DASP-SR.git

04/13/2026

>LTX 2.3 Distilled v1.1
https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-1.1.safetensors

>UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations

>CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation
https://catalogstitch.github.io

>Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement
https://github.com/Metaverse-AI-Lab-THU/ImViD

>Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
https://github.com/gezbww/Vis_Prompt

>MixFlow: Mixed Source Distributions Improve Rectified Flows
https://github.com/NazirNayal8/MixFlow
>>
>mfw Research news

04/14/2026

>EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
https://editcrafter.github.io

>VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation
https://arxiv.org/abs/2604.10127

>FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
https://arxiv.org/abs/2604.10954

>AIM-Bench: Benchmarking and Improving Affective Image Manipulation via Fine-Grained Hierarchical Control
https://arxiv.org/abs/2604.10454

>Continuous Adversarial Flow Models
https://arxiv.org/abs/2604.11521

>OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
https://arcomniscript.github.io

>Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation
https://arxiv.org/abs/2604.10837

>Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
https://arxiv.org/abs/2604.10546

>Rethinking the Diffusion Model from a Langevin Perspective
https://arxiv.org/abs/2604.10465

>Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
https://arxiv.org/abs/2604.11177

>SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models
https://arxiv.org/abs/2604.11530

>Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
https://arxiv.org/abs/2604.11496

>LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
https://arxiv.org/abs/2604.11091

>Agentic Video Generation: From Text to Executable Event Graphs via Tool-Constrained LLM Planning
https://arxiv.org/abs/2604.10383

>Omnimodal Dataset Distillation via High-order Proxy Alignment
https://arxiv.org/abs/2604.10666

>What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
https://arxiv.org/abs/2601.06165
>>
ok ernie turbo is fucking garbage at prompt following
>>
>>108604751

forgot prompt
>A photorealistic candid photo of a woman with long, flowing hair that transitions from icy white at the roots to vibrant cyan-blue at the tips, cascading over her shoulders and partially obscuring her face as she looks downward. She wears a form-fitting, sleeveless top with a high neckline, primarily white with bold geometric yellow trim and a large, faceted blue diamond-shaped emblem centered on the chest. The garment has a structured, armored appearance with gold-brown segmented panels along the waist and hips, suggesting a fantasy or sci-fi outfit. Her right hand rests on a smooth, light-colored surface in the foreground, fingers slightly curled. The background is an out-of-focus twilight landscape under a deep indigo sky, with a soft gradient of magenta and purple along the horizon. A faint, glowing horizontal line runs across the lower portion of the frame, possibly a railing or edge of a platform. The lighting is directional, casting soft shadows and highlights on her hair and clothing, emphasizing texture and form with natural depth and contrast. No text, speech bubbles, or tears are visible.
>>
>>108604754
wait nvm im gay, fucked up a setting
>>
can someone litterbox or gofile some nsfw gens of ernie image? the huggingface demo is too censored.
>>
>>108604759
But can it do anime loli porn?
>>
File: Ernie.png (2.1 MB)
>>108604759
>no edit
that's a shame, imagine doing edits with such a monster of a model, the prompt following is on another level, can't believe it's using a simple 3b text encoder to get that shit, and fucking ministral of all things
>>
>>108604786
ZAMN!
>>
>>108604759
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image_turbo.json
https://huggingface.co/Comfy-Org/ERNIE-Image
>AttributeError: 'Ministral3_3B' object has no attribute 'generate'
thanks Comfy
>>
>>108604759
Can it do nude?
>>
>>108604759
Can it do shrek?
>>
>>108604759
bruh, turbo has garbage anatomy, downloading the base model
>>
>>108604759
buy an ad
>>108604810
have you pulled?
>>
>>108604842
>implying the monk didn't cultivate enough to master the four immeasurables and grow two extra arms
lol?
>>
The gen times for non-turbo on my 3060 are a bit slow, two and a half minutes for 20 steps; it probably needs more steps, but that's not unusually slow for a model of this size.
Let's see how it holds up under further testing.
>>
>>108604817
>Can it do nude?
https://litter.catbox.moe/9z9qwbnxpflyqt27.jpg
>>
>>108604751
>>108604763
What did you fuck up so I can avoid it
>>
>>108604861
I see you tested the base model, I hope it's the good one; I don't really like my tests on turbo so far
>>108604843
yes I'm on the latest version, seems like comfy hasn't implemented the prompt rewriting yet
https://github.com/Comfy-Org/ComfyUI/pull/13395
>Needs template before it works properly.
>>
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/blob/main/wan2.2_i2v_A14b_high_noise_lightx2v_4step_720p_260412.safetensors

Why is the high and low noise close to 60gb?
>>
>>108604759
What VAE does it use?
>>
>>108604817
>>108604862
https://litter.catbox.moe/tz2g5anklf3bmmmt.jpg
as expected, garbage genitals lol
>>108604879
the best one, flux 2's vae
>>
>>108604817
>>108604772
It hasn't been trained on boobs, it generates mediocre breasts. Though from my very limited testing it doesn't seem to be deliberately poisoned like Flux models are.
>>108604871
I just had a feeling that the distill will be problematic and went for the base immediately.
>>
>>108604888
Is this turbo or base
>>
>>108604889
>I just had a feeling that the distill will be problematic and went for the base immediately.
good, it was about time we got a fully finetuned model that isn't distilled; no need for some NAG cope, we can directly use CFG, and we'll be able to train and make loras on it
>>108604893
turbo
>>
>>108604872
FP32 precision.
4 bytes for every weight:
14b x 4 = 56
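Same arithmetic as a quick script (assumes ~14B params per expert and ignores safetensors metadata overhead):

```python
# Rough checkpoint size from parameter count and dtype width.
# 4 bytes/weight for FP32, 2 for FP16/BF16, 1 for FP8.
def checkpoint_gb(params_billions: float, bytes_per_weight: int) -> float:
    """Approximate file size in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * bytes_per_weight

print(checkpoint_gb(14, 4))  # FP32: 56.0 GB, matching the ~60 GB files
print(checkpoint_gb(14, 2))  # FP16/BF16: 28.0 GB
```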
>>
>>108604906
Huh, I haven't seen the fp32 version before.
>>
>>108604842
>downloading the base model
I really don't like the anatomy, like this is base at 50 steps, come on
>>
>migu
:)
>>
>>108604940
smells more and more like a nothingburger, the realism quality is Klein tier, but ernie can't even do edits to compensate, sad
>>
>>108604940
I am wondering if Comfy fucked something up, or did they do Chroma-tier cherry picking for the images?
>>108604922
FP32 is usually only used for training because the benefits to inference are almost non-existent.
>>
>>108604974
50 steps turned out better.
It also seems a bit wild when it comes to adding shit to the image. First time I have seen an AI add a knife to a "1girl, standing" prompt unsolicited.
>>
>>108604959
>>
>>108604983
Oh I think the image is so different because Control after generate is bugged with the retarded subgraph Cumfy has shipped with the template, so it ran a whole new seed.
The point about the knife stands though, same prompt.
>>
>>108604991
>>
File: o_00245_.png (390.7 KB)
>>
>>108605000
Ernie knows only one anime style: "Nano Banana Pro"

:]
>>
>>108605023
kek, I think I've seen enough
>>
>>108605045
maybe turbo at 16 steps is the best it can get
>>
File: file.png (1.5 MB)
ernie base with the default settings and default prompt in comfyui gave me a guy with 3 legs.. not a great start
>>
>>108605060
Z-image turbo be like:
https://youtu.be/WO23WBji_Z0?t=10
>>
One of the better gens I got.
Still has this Kleiny look to it.
>>
File: o_00247_.png (1.7 MB)
>>
>>108605080
something is wrong with the proportions of their bodies, looks like they're midgets, Flux Kontext style lool
>>
>>108605064
>3 feet
>>108605080
>3 hands
lol I think I won't be downloading this
>>
it's all right, the jews will save us
https://xcancel.com/ltx_model/status/2044108750592643279#m
>>
This model has been trained on 3 billion images of Nano Banana Pro kek.
>>
>>
>>108605126
>This model has been trained on 3 billion images of Nano Banana Pro kek.
Z-Image supremacy, yeaaah! We had Qwen Edit and then the Tongyi model/s, but all the other Chinese t2i are equally sloppy: GLM, this, whatever.
>>
I am kinda liking things about it despite its faults.
But they probably either overcooked this thing or it needed a little bit of post-training aesthetic alignment to temper the schizo anatomy.
>>
File: o_00252_.png (770.7 KB)
>>
>>108604974
>I am wondering if Comfy fucked something up
I think the model is just not that good, in my tests it's inferior to Z-image turbo almost everywhere
It can be a great base model to train on though, but yeah, 8b is big, people prefer something smaller like 2b so they can do Anima-type models or some shit
>>
>>108605183
>8b is big
>>
>>108605183
yup, same experience, back to zit for me
>>
>>108605183
>it's inferior to Z-image turbo almost everywhere
the niggas thought that training a model only on Nano Banana Pro's images would do the trick, all we got is that Synth-ID watermark pattern everywhere lmao, once again, synthetic data BTFO
>>
>>108605115
oops, forgot to attach their paper
https://arxiv.org/abs/2604.11788
>>
>>108605183
I think there are issues with finetuning klein and ZIB for some reason.
If it responds to training well, this looks salvageable. Decent text encoder + best vae + good size balance between quality and being able to run on most hardware + OK quality bar anatomy issues + mid instruction following that can possibly be ironed out.
I hope someone besides Kekstone takes a crack at it.
>>108605208
Can't we improve realism with finetuning/loras? I know training on slop sucks but banana pro is a really high quality baseline.
>>
File: weird.png (97.8 KB)
>>108604759
>https://ernieimageprompt.com/
either something is wrong with ComfyUI, or those baidu fucks are straight up lying to us; I'm not getting anything even close to the images on that site
>>
>>108605236
Chinks lying? How can it be...
>>
I love to complain about the jpeg artifacts on Z-image turbo, but with Ernie we've arrived at a whole other level, jesus this is ugly af
>>
>>108605262
I don't think those are jpg artifacts, probably the watermark patterns of NBP >>108605126
>>
>>108605262
Is this Turbo? I am not really getting these on the Base.
>>
turbo seems more slopped overall, and if there's one thing I can say base does better than Z-image turbo, it's that it seems to know more stuff, but knowing more stuff is useless if the anatomy is ass and the realism isn't even close either
>>
>comparing z turbo to ernie base
Why not compare base to base tho
>>
>>108605278
I think you are right anon, base doesn't seem to have that much noise
>>
>>108605321
as a ""base"" model it looks like it's destroying Z-image base, let's hope that we can train it well then, both ZIB and Klein had their issues
>>
I don't see anything Ernie is the best at; Chroma has the best kino, Z-image has the best realism and anatomy, this shit is just slop after slop
>>
>>108605317
it's been compared here >>108605080
>>
>>
File: kek.jpg (830.4 KB)
>>
File: now what?.png (113.9 KB)
the ledditors are loving it though
https://www.reddit.com/r/StableDiffusion/comments/1slg4wh/we_may_have_a_new_sota_opensource_model/
>>
>piggies love slop
STOP THE PRESSES A FROGFAG IS SPEAKING !!!
>>
Can't the chinks do anything other than make cheap copies of murica's products?
>>
>>
>>
>>108605408
>>
>Tezuka Rin \(katawa shoujo\) sitting on a bench
is that how you're supposed to prompt on Anima? I can't manage to get her
>>
>>108605115
distilled seedance 2.0 (ltx 4) and khazar milkers honeypot spy gf was promised to me 6 gorillion years ago but unironically.
>>
>>108605262
>>108605278
>>108605276
i never had the artifacts problem with zit, just dont use the suggested retard samplers and instead use:
euler (/euler_a) + simple (/normal)
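If you save workflows in ComfyUI's API format, you can force those settings across a whole workflow with a small patch script. A sketch only: it assumes the stock KSampler node's `sampler_name`/`scheduler` input fields; custom sampler nodes use different names.

```python
import json

def force_sampler(path, sampler="euler", scheduler="simple"):
    """Set every stock KSampler in an API-format workflow JSON to the given sampler/scheduler."""
    with open(path) as f:
        wf = json.load(f)  # API format: {node_id: {"class_type": ..., "inputs": {...}}}
    for node in wf.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["sampler_name"] = sampler
            node["inputs"]["scheduler"] = scheduler
    with open(path, "w") as f:
        json.dump(wf, f, indent=2)
```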
>>
>>108605468
Yes for tag based prompts but I don't think there is full consensus on how to prompt characters when prompting with natural language. Try Tezuka Rin from Katawa Shoujo.
If all options are exhausted try it on preview 2.
>>
>>
>>
File: blaze it.png (1.5 MB)
>>108605468
>Tezuka Rin from Katawa Shoujo, a girl with short messy red hair and green eyes and no arms, sitting on a wooden bench, wearing her school uniform, calm distant expression, soft afternoon light, On the left knee there's a plush of Hatsune Miku, on the right there's a plush of Kazane Teto
skill issue
>>
https://xcancel.com/DylanTFWang/status/2043952886166761519
>Open-source tomorrow
damn, if it's not too big to run locally maybe Tencent finally cooked
>>
big jump in real time interactable video gen

Waypoint-1.5: apache2, first-person-shooter focused, 1.2b, 720p, 512 frames of context, 56fps on a 5090; needs at least a 30xx

online demo https://www.overworld.stream/
https://github.com/Overworldai/world_engine
>>
>>108605539
Anons what's the actual use case for this world model thing?
Every single world model I see looks like "cool tech demo you play for five minutes and then never touch again".
>>
>>108605539
forgot that link too
https://3d-models.hunyuan.tencent.com/world/
>>
>>
>>108605552
newfag. luddite. brown, even.

the point is to enjoy the cool new tech and tinker with it while thinking about how you can maybe use it and change it yourself now while also thinking about how cool it will be in a year from now on.

for example, chaining multiple generated rooms you can traverse infinitely is a software problem and thus solvable relatively easily, while letting you get much more out of the tech.
>>
>>108605550
>512 frames of context 56fps on 5090
So? less than 10 seconds? lol
>>108605552
desu I'd enjoy lurking on a world made out of a cool drawing image, like this shit
>>
A very sloppy double exposure sloppa.
>>
>>108605586
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v
kijai made the loras out of the new lightning version of Wan 2.2
>>
There are some who call me...Tim
>>
>>
audio in ltx 2.3 1.1 seems nicer. we wuz hogwarts:

https://litter.catbox.moe/hnjzczuml64krkjr.mp4
>>
>>108605592
not bad; Wan 2.2 may be an ancient model, but it's still the best thing we have :')
>>
>>108605651
that's cool, I was tired of the ultra metallic sound of ltx, if those jews keep improving on that shit it might end up being a genuinely good model, still a long way to go to seedance 2.0 though lol >>>/wsg/6128285
>>
>>108605654
>first frame + last frame
kek, I forgot how much vram wan 2.2 asks for, I think I might return to LTX just for that
>>
>>108605627
lul, did you combine monty python screenshot with the cat meme?
>>
>>108605686
What do you mean by that, isn't LTX heavier on resources?
>>
>>108605701
it uses a less heavy VAE so the kv cache usage is less punitive, good luck going for 720p on wan 2.2
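For intuition on why 720p is so punishing, a rough token-count sketch; the compression and patch factors below are typical for recent video DiTs, not exact Wan 2.2 numbers:

```python
# Transformer sequence length after VAE compression + patchify.
# Assumed: 4x temporal / 8x spatial VAE, 1x2x2 patchify (illustrative values).
def dit_tokens(frames, height, width, t_down=4, s_down=8, patch=(1, 2, 2)):
    t = 1 + (frames - 1) // t_down            # compressed temporal length
    h, w = height // s_down, width // s_down  # compressed spatial dims
    pt, ph, pw = patch
    return (t // pt) * (h // ph) * (w // pw)

print(dit_tokens(81, 480, 832))   # 480p, ~5s clip: 32760 tokens
print(dit_tokens(81, 720, 1280))  # 720p, ~5s clip: 75600 tokens
# self-attention cost scales roughly quadratically with this count
```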
>>
>>108605723
this. i can make 720p resolution gens on ltx. literally impossible on wan-hunyuan
>>
>>108604726
does any of this shit run simply and reasonably well on AMD cards yet?

I have tried multiple times over the last couple of years to get a functional pipeline up and running on my 6800xt 16gb and it has never once worked

I'm no genius but I'm also not retarded
>>
>>108605689
yeah
>>
MLEM MLEM MLEM HECKIM CHNGUS
>>
>>108605813
if linux rocm + forge neo work fine
if windows i pray for you
>>
>>
File: BULLSHIT.png (632.6 KB)
https://youtu.be/XUxKm40X__g?t=907
benchmarks was a mistake...
>>
using ltx 2.3 ver 1.1 (new one):

the man says "I'm LITERALLY Ryan Gosling in the movie Drive", and the camera zooms out through the windshield as he speeds down the road in new york city at night.

https://litter.catbox.moe/ekn0ujlh88fd37ox.mp4
>>
https://xcancel.com/flowersslop/status/2043591433731408126
those retards at Ernie should've trained their model on GPT-image 2's output instead lool
>>
>>108606080
when it zooms out it looks like some video game LOD trick shit where the character gets lower and lower polygons as he moves away from the camera kek
>>
yo is anima good yet?
>>
>>108606080
this turned out better

the man says "I'm LITERALLY Ryan Gosling in the movie Drive", and the camera zooms out very far through the windshield as he drives the car off a ramp on a road in new york city at night.

https://litter.catbox.moe/z04qhvm91v3etfmw.mp4
>>
>>108606080
Omg he's litter.catbox moe
>>
>>108606114
not bad at all actually, if you want you can also share your videos here, it allows sound >>>/wsg/6126746
>>
>>108606114
>>108606080
are you using the distilled model, or the base model with the distilled lora applied on top of it?
>>
babe, wake up, a second image model has hit the tower
https://huggingface.co/NucleusAI/Nucleus-Image
>>
>>108606190
>We release the full model weights, training code, and dataset, making Nucleus-Image the first fully open-source MoE diffusion model at this quality tier.
kek, if they release the dataset it means they trained this shit with only copyright-free garbage, DOA
>>
>>108606190
>>
>>108605855
rip
>>
>>108606219
this is so deceptive, it's not a 2b model, you still need to load the whole model (17b) to get this shit running
>>
>>108606219
we're supposed to take them seriously when they don't even put anima on the board?
>>
>>108606219
>no Z-image
>no Flux 2
kek
>>
File: 42.png (7.2 KB)
>>108606219
geg
>>
>>108606190
why does every company insist on training on utter trash datasets
>>
>>108606219
>lumina
>janus
>hidream
>sana
Holy throwback
>>
File: 242605513.png (228.6 KB)
>>108606275
even pixart BIGMA is in their report, a shitton of models I've never heard about or only heard on release and never again
>>
>>
>>108606265
sar please the benchmarks
>>
>>108606265
saas doesnt
>>
>>108606317
that's why openai is training their new model on youtube shorts and let's play videos.
>>
>>108606333
impressive, maybe local should try that next instead of training on dall-e outputs
>>
>>108606356
they are open models, you can train them on whatever you want.
now feel free to have a meltdown about "loRa cope" and "shitmixes".
>>
Ernie Base, 20 steps
>Touhou Project characters in a screenshot of Diablo 1. Screenshot set in a gothic, candlelit cathedral dungeon — stone floors, blood-stained altars, flickering torches casting long shadows. Reimu Hakurei appears as a weathered warrior, clad in rusted plate armor with subtle Shinto motifs, wielding a glowing sword and heavy iron shield. Marisa Kirisame is a gritty sorceress, her blackened robe frayed at the edges, holding a staff crackling with low-res magical sparks. Patchouli Knowledge floats slightly above the ground like a corrupted cleric, surrounded by ancient grimoires emitting a ghostly glow. All characters match the sprite-based, isometric art of Diablo 1

>Visual fidelity must match Diablo 1’s aesthetic: muted earth tones, dark reds and greens, harsh shadows, dithering effects, and low ambient lighting. The entire composition should be a screenshot from a 1996 pre-rendered isometric dungeon crawler. Include UI elements.
Trying again with 50
>>
>>108606372
50 steps, I guess it's better?
>>
>>108605045
You saved me so much time downloading that garbage, I love you.
>>
is that new ltx2.3 version worth downloading? apparently it has much better sound or something
>>
>>108605813
On newer GPUs it should work, but 6800XT is not officially supported so far. I think the quickest way to try is to update your AMD GPU driver (to either 26.2.2 or 26.3.1), then download the latest ComfyUI portable AMD release from their Github, and see if it just werks:
https://github.com/Comfy-Org/ComfyUI/releases
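If the portable build doesn't work, a manual Linux route looks roughly like this. A sketch only: it assumes system ROCm is already installed, the PyTorch wheel index version should be checked against pytorch.org, and the gfx override is a common community workaround for RDNA2 cards like the 6800 XT rather than anything official:

```shell
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv venv && . venv/bin/activate
# ROCm build of PyTorch (verify the current index URL on pytorch.org)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
pip install -r requirements.txt
# 6800 XT (gfx1030) is often missing from the official support list;
# spoofing the gfx version usually gets it going
HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py
```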
>>
does okay with schizoprompts, but honestly it's just not good at fine details, this is the base model at 50 steps and it should be way better for how long it takes to gen
>>
>>108606379
50 does keep the head directions and proportions more consistent-looking to me.
>>
>>
what diffusion model is best for modifying an image based on text input
>>
>>108606407
the sound is great, but the model is still slop. especially t2v. they clearly hired cheap indian devs. no kino at all for now
>>
>>108606481
>modifying an image based on text input
if you mean something like "change this sword to a baseball bat and make her a black woman." qwen image edit or flux 2 klein.
>>
why doesn't lodestone hire the greatest model trainer alive, Sarah Peterson?
>>
is there anything like 'Kohya Deep Shrink' for anima? i mainly just want a smaller initial latent for composition and then upscale it halfway without having to set up a double pass
>>
cozy
>>
>>108606472
too old
>>
>>108606370
>>
>>108606190
why are those people wasting money on making the most slopped shit ever, the fuck do they expect? no one is gonna bat an eye at such a piece of shit
>>
>>108606618
Based
>>
>>108606263
But it still werks and is the best image model that ever existed
>>
File: lmaoo.png (975.9 KB)
>>108606190
absolute slop
>>
>>108606190
>we have zit at home
>>
>>108606219
Laxhar should train Noob2 on Qwen image. Yes, I know nobody is going to be able to run it, finetune it, and shitmerge it, but:

1. There are no good finetunings or shitmergers.
2. Most of them don't know what they're doing, or they call "improving the dataset" contaminating it with their slop.
3. It's better that this behemoth of a model only gets finetuned and updated by him and his team.
4. LLM bros have been renting GPUs to run their Noromaids since early times.

It's the best option regarding quality. At the end of the day, I want an anime image model that's excellent quality. It doesn't bother me to use a free trial from some GPU rental startup to be able to run it. Better to have good models that I can't run than to have millions of snake oil models that make me waste my time.
>>
>>108607186
on a 20b model? really? why not training ernie instead, the quality is similar, it has a better vae and it's a 8b model
>>
Finally, Tongyi has released what we've been waiting for!
>>
>>
>>108606647
I think Kohya Deep Shrink should work for anima but you need a different block number and you need to figure out what that is.
>>
https://xcancel.com/peter9863/status/2044269457086779877#m
babe wake up, Flow Matching is not the best diffusion architecture anymore
>>
File: file.png (3.7 MB)
DED
>>
>>108607260
https://xcancel.com/bdsqlsz/status/2044308129043886119#m
it's obvious we're still far from having found the perfect way to train those image/video models, at some point it'll be so elaborate we'll get a 6b model as good as Seedance 2.0, we're still in the era of computers as big as a house and as powerful as a modern calculator lol
>>
>>
>>108607290
it's impressive how well it's able to reproduce the original image, tencent is shit at making models, but when it comes to cool new training methods they are definitely cooking
https://hy-soar.github.io/
>>
>>108607260
Sounds like another garbage p-hacked meme paper that will be forgotten desu.
They apparently trained Z-Image on this thing, but while the (most probably cherry picked) prompt adherence often looks better, the images look dogshit aesthetically and fried.
>>
>>108607352
glad there's someone here who knows what they're talking about, what do you think of that method too? >>108607290 >>108607345
>>
>>108607363
The examples are better, and it includes more concrete benchmarks like OCR (although these too can easily be benchmemed).
If I must criticize, there is relatively limited data about comparisons between SOAR and RL, despite "Better results than RL at the roughly same cost of SFT" being a central part of the paper's premise.
But overall looks more credible than the other paper.
Also, I have no idea what I am talking about.
>>
https://www.reddit.com/r/StableDiffusion/comments/1slz1rq/last_week_in_generative_image_video/
the absolute state of localkeking, while seedance 2.0 is making hollywood sweat, we're still trying to figure out how to make a local model count to 3
>>
>>108607345
>RL be like: "I must make everything realistic!"
is that why models are so slopped and biased towards "realism" nowdays?
>>
>>
>>108607433
>implying seedance 2.0 doesn't fall apart with multiple characters in a scene
>>
>>
https://xcancel.com/ErnieforDevs/status/2044290766349185257#m
Oh great, another Klein tier edit model
>>
>>108607521
>another Klein tier edit model
yeah, but this one will have the apache 2.0 licence, so it's a win in my book lol
>>
How well can Ernie do anime? I'm not interested in genning 3DPD.
>>
>>108607530
>How well can Ernie do anime?
>>108605023
>>108605126
>>
File: file.png (58.1 KB)
just AI things
>>
>>
File: banished.png (3.5 MB)
>>
So Comfy doesn't have Nucleus support right now, right?
I would try the inference code but I need offloading as I am a VRAMlet
>>
>>108607521
it's always the same. The model can generate sloppy-looking, generic stock-image trash and needs a stack full of loras for anything else. You might get lucky if they don't deliberately make the model untrainable.

I want a text to vid/image model trained on the entire EvilAngel catalog (including early Rocco Siffredi titles)
>>
>>108607606
that's probably why Alibaba will never release Z-image edit, it was just too good and unslopped for the gweilos
>>
>>
>>108607541
what sampler + scheduler for turbo?
>>
>>108607663
the same as the one on the official comfyui's template
>>
>>108607199
>the quality is similar
Heh
> it has a better vae
I still use sdxl, VAE never was a problem for me but a spook
>>
>>108607694
>the same as the one on the official comfyui's template
no such thing exists
>>
>>108607707
>I still use sdxl, VAE never was a problem for me but a spook
>>
>>108607713
>no such thing exists
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image_turbo.json
>>
GGUF version seems completely broken
>>
>>108607521
speaking of edit models, why Comfy didn't implement that one?
https://github.com/jd-opensource/JoyAI-Image/tree/main/joyai_image_comfyui
>>
what model should i use to put a woman on a cross with nails n shit
>>
https://xcancel.com/bdsqlsz/status/2044317768414310633#m
>Illstrious SFT based on Z-image-turbo with S3-DiT.
Are we back?
>>
>>108607834
piece of fucking garbage
>>
>>108607834
Animasissies... our answer?
>>
>>108607855
anima is good but I fucking hate the backgrounds, why are they so fucking empty??
>>
>>
>>108607868
Describe the things you want to see in the background and it will put them there. You could also try describing the background as cluttered, messy, detailed, etc. (haven't tried this one yet.). You may be using artists that tend to draw undetailed backgrounds, this has a huge effect.
>>
>>108607855
here’s our answer: “BWAHAHAHAHA”
>>
you fools
>>
>>108607868
Just write, retard
>>
>>108607884
just draw, retard
>>
>>108607834
looks like ass, and ever since he started putting ai gens in his dataset i have 0 belief in any future model
>>
>>108607817
Good old ZIT or Klein maybe if it hasn't been safety trained against gore.
>>108607834
>Fine-tuned on Z-image-turbo
This is kinda scary. I am skeptical that they managed to pull it off without frying or undershooting on a distilled model.
Also every gen they use to showcase its textual capabilities has very short text, which makes me further worried that it's so fried it can't gen longer than a word now.
>>
>>108607888
>he started putting ai gens in his dataset
wait really? it's fucking doa then...
>>
Fruit punch with vodka and unpeeled bananas
>>
>>108607888
Wtf why???
>>
File: file.png (2.5 MB)
>>108607834
v3.5 has a lot of sovl, what happened?
>>
File: file.png (2.9 MB)
>>108607912
sovl vs sovless
>>
>>
>>108607834
Holy fuck, Illustrious?!? We just need to go back in time and have everything done again. Wait, noob NTR mix.
>>
>>108607834
We are back! No memory problems like some piece of shit model...
>>
>>108607834
I wish they'd finetune an edit model instead, imagine doing shit like "take this character from this input image, put him there, apply this @artist style to it", the fucking dream
>>
>>108607834
Can't wait for ZYume
>>
>>108607834
Neat, never liked Anima's overall aesthetics
>>
>>
>>108607834
NEAT, fuck tdrussell, anything is better than a 2B model
>>
>>108607916
>Nonsensical second tail on the left.
>Nonsensical background object (lamp) on the right
>The hand and the cup are broken: massive thumb, distorted fingers and handle
>Melted "cleavage" and clothes texture
>And this is probably a good gen that got picked
Yep, they cooked this thing.
It's so fucking over. We will still be finetrooning SDXL clip in 2032 at this rate.
>>
>>108607834
Give me the WAIZ
>>
>>108607834
Why Z-image turbo though? I thought this one was impossible to finetune, we have Z-image base now
>>
>>108607947
>We will still be finetrooning SDXL clip in 2032 at this rate.
anima went for a retarded base model but it's still better than SDXL so I guess we're moving in the right direction... really slowly though...
>>
>>108607949
You can finetune anything you want nonnie, the jews at comfy and nvidia just don't want you to finetune turbo models like a free man
>>
>>108607834
>However, as prompts became longer and more descriptive—and as users increasingly required multi-character interactions and structured scene composition—the limitations of the existing architecture became more apparent.
holy LLM slop, come on guys you can't write that shit by yourselves?
>>
>>
>>108607834
we don't even know if they're gonna open source it lol
>>
>>108607976
Yes, because it's an unfinetunable failed bake
>>
>>108607834
SDXL ones are less sloppy
>>
>>108608011
to be fair I don't think they finished the training, give them time (yes I'm coping, how do you know?)
>>
>>108607834
Neat, Prefect illustrious Z One obsession Z WaillustriousZ
>>
>>108608019
From the start, Z turbo is already sloppy and generates cold, boring results; the model was designed that way.
>>
>>108607834
wake me up when someone will manage to bring back the kino of midjourney
>>
>>108607959
They don't know english bro.
>>
>>108608039
Alibaba designed the training to make the model solid but predictable and boring, but the Illustrious guys are supposed to change that with their own training, I hope they'll succeed
>>
>>108608047
Is there a statement that they are working toward that objective, or is it another local delusion, erotomania, or schizophrenic thought #93949?
>>
>>108608061
the schizophrenic thought is actually believing that the Illustrious guys make boring models, have you even tested one of them?
>>
>>108607894
>Good old ZIT
could you please give me a quick rundown on how to use it?
>>
>>108607834
>Are we back?
Don't get your hopes up, I've tested both models they trained. The other is the non-turbo version.
>>
>>108608078
>I've tested both models they trained. Other is non turbo version.
showcase images or it never happened
>>
>>108608074
Just the default Comfy template, but I add an extra step and sometimes use the ModelSamplingAuraFlow node with shift around 6-7.
Boomerprompt what you want to see.
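For anyone wondering what the shift number actually does: a ModelSamplingAuraFlow-style node remaps the flow-matching sigma schedule toward the high-noise end. A pure-Python sketch of the commonly used time-SNR shift formula (check Comfy's model_sampling source for the exact variant it implements):

```python
def shift_sigma(sigma: float, shift: float = 6.0) -> float:
    """Remap a flow-matching sigma toward the high-noise end.

    Assumed formula: sigma' = shift * sigma / (1 + (shift - 1) * sigma).
    shift = 1.0 leaves the schedule unchanged; larger values spend more
    of the step budget on the noisy early steps.
    """
    return shift * sigma / (1 + (shift - 1) * sigma)

# Endpoints are preserved for any shift value.
assert shift_sigma(0.0) == 0.0
assert shift_sigma(1.0) == 1.0
# A mid-schedule sigma gets pushed up: 0.5 -> ~0.857 at shift 6.
print(round(shift_sigma(0.5, 6.0), 3))
```

So shift 6-7 mostly reshapes where the sampler spends its steps; it doesn't add steps by itself.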
>>
>>
Ace Step 2.0 when
>>
>>108608040
Is there any good theory on how Midjourney created this "Midjourney style"?
>>
>>108607834
WE BACK
>>
>>108608171
If there was, there would already be replications.
>>
>>108607834
>wirr u pick burned cfg 8 or wiped detail distill slop
>>
>>
>>
>>
>>108608171
>>108608304
>>108608368
I genned 20 images at 512p with the model. 32 steps, cfg 4, euler simple.
Some images had lesser issues like weird composition for backgrounds and problems with minor details like blurry eyes but overall I didn't get any body horror like extra limbs you get at 1024p.
Makes me think the high-res training is what's fucked (perhaps they intentionally ran too few high-res steps to save money, Chinese culture shenanigans), which makes it more likely the body horror can be ironed out during finetuning.
That is, again, IF it responds well to training, which is sadly a big if nowadays.
>>
>>108607521
>we are getting literally showered with new stuff.
but we are getting showered with a bunch of nothingburgers, it's literally a golden shower, except it isn't molten gold, it's piss
>>
ComfyUI Cloud for SeedDance2. Yes or no?
That would be... 30 15s videos for $100.
>>
What happened to Z-Image team asking for Noob dataset?
Yeah, nothing came out of that either.
>>
>>108608469
i think it is impossible to source pre-ai slop image datasets anymore unless you're a gigacorp that's been hoarding data for decades, the days of LAION are over, everything new is trained on synthetic slop
>>
>>108608530
just train on images that have been uploaded on the internet before 2022
>>
>>108608542
and how do you verify it's been uploaded before 2022 when mass scraping billions of images? that kind of metadata simply isn't available reliably
>>
>>108608551
only take images that have the metadata, there are a lot of mainstream sites that show the upload date
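The filtering step itself is trivial once a site exposes an upload date; the hard part is the scraping. A sketch of the cutoff logic (the `uploaded_at` field name is hypothetical; real sites differ, and records without a parseable date are dropped rather than guessed at):

```python
from datetime import datetime, timezone

AI_SLOP_ERA = datetime(2022, 1, 1, tzinfo=timezone.utc)

def pre_ai_only(records):
    """Keep only records with a parseable upload date before 2022."""
    kept = []
    for rec in records:
        raw = rec.get("uploaded_at")
        if raw is None:
            continue  # no metadata -> can't verify, drop it
        try:
            ts = datetime.fromisoformat(raw)
        except ValueError:
            continue  # unparseable date -> drop it
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        if ts < AI_SLOP_ERA:
            kept.append(rec)
    return kept

records = [
    {"url": "a.png", "uploaded_at": "2019-06-01T12:00:00+00:00"},
    {"url": "b.png", "uploaded_at": "2023-01-15T00:00:00+00:00"},
    {"url": "c.png"},  # missing date -> dropped
]
print([r["url"] for r in pre_ai_only(records)])  # -> ['a.png']
```

The drop-if-unverifiable policy is the conservative choice: it shrinks the dataset but keeps contamination out.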
>>
>>108608542
I think it's more about cost.
Mass generating synth slop is relatively cheap.
You need to scrape the entire internet (in an era where many websites actively fight bots), extensively prune and filter the dataset, and then generate reliable enough captions. It's bandwidth- and time-intensive.
Of course it's still the way to go if you're aiming to make a SOTA API model where you hope to turn a profit, but if you're making slop for freeloading local peasants, who are only helping you get your name out there, why bother?
>>
>>108608530
>i think it is impossible to source pre-ai slop image datasets anymore
its trivial to detect and filter ai images now, its basically solved
also cameras still exist so you can just create your own dataset if you wanted to
>>
>>108608601
>its trivial to detect and filter ai images now, its basically solved
A bold claim. Do you have anything to back that up?
>also cameras still exist so you can just create your own dataset if you wanted to
Just travel around the world and take millions of photos. Easy.
>>
>>108608530
lol bullshit, those cocksuckers do it on purpose
>>
its........ kino
>>
prompt: swastika (flux could do it, and it was made by germans kek)
>>
>>108607834
WE
ARE
BACK
Until the model flops like every local model, but in the meantime
WE
ARE
BACK


ANI WON
TDRUSSELL LOST
6B IS BETTER THAN 2B
>>
>>108608688
>bigger than Z-image turbo
>sloppier than Z-image turbo
>Worse anatomy than Z-image turbo
what were they thinking? this model has no place anywhere
>>
>>108608688
>the sky
rofl
>>
>>108608688
>2024
>a woman lying on grass

>2026
>a man lying on sand

2 years later and we still have those issues, sad
>>
>>108608709
Is this what banana (pro) gens when you prompt it swastika?
Someone should test it kek.
>>
>>108608749
nah, google is more based than that
https://arena.ai/
>>
>>108607834
sounds like it might be good, where can I download it?
>>
>>108607834
>shill handpicked the best of the lot
How about posting the rest lol. I'd take 8 fingers per hand over this pure slop.
>>
>>108608798
>passable
>totally deepfried
>underbaked
Just bake your own loras, faggots, and use them with whatever model you want. All these AI "researchers" can't even make a passable bench image lmao.
>>
https://civitai.com/models/2544636/wai-anima?modelVersionId=2859702
Anima won. Every single Civit slopper will switch over, now that it has been blessed by the #1 Civit creator.
>>
BABE BABE, WAKE UP WAI ANIMA HAS RELEASED!
https://civitai.com/models/2544636/wai-anima

>>108608824
shill better
>>
File: slop.png (6.6 KB PNG)
>>108608861
>>
Thoughts on Ernie Image? I think it's ok but I'm not sure it offers that much overall in terms of the actual quality / speed ratio
Seems a little too reliant on the extra prompt-enhancer model as well, which adds even more overhead
>>
>>108608876
pure synthslop
>>
>>108608861
You’re late, /hgg/ won the shill race >>>/h/8860915
>>
kek, bye
https://files.catbox.moe/uxtaqp.jpg
>>
>>108608876
It's weird they used a vision-capable text encoder like Ministral but didn't leverage it for any kind of edit capability
>>
>>108608918
they will >>108607521
>>
>>108608876
The best gauge isn't here but on CivitAI. When Z-Image, Anima, and Klein were released, there were instantly a ton of loras and finetunes. It seems Ernie's lab paid someone to shill here, like you mentioning Ernie again.
>>
>>108608634
>Do you have anything to back that up?
There are a ton of ai detectors online + any classifier model would do.
>>
>>108609050
I was wondering whether you were trolling or just regular retarded, thanks for the response.
>>
>>108609123
Learn more about the tech before sperging out in /ldg/ thanks
>>
>>108608824
>>108608861
ill wait for the noob tune........
>>
>>108607834
hasn't illustrious gone closed source? why would I give a shit, the last version they released was 2, wasn't it
>>
>>108607868
it's insane to say this when anima has the best backgrounds of all current anime models, NAI 4.5 included
>>
>>108608788
the undotted straight lined one gets filtered though
>>
>>108609123
until we can emulate the physics of reality, every ai image is theoretically detectable
you don't understand how image generation works if you are confused by this
>>
>>108609134
Start by taking your own advice midwit faggot.
>>108609437
There is a difference between "theoretically detectable" and "its practical and cost efficient to implement accurate and reliable detection for wide variety of different image generation models each with their own idiosyncratic quirks, that are getting increasingly subtle with each generation, and keep doing this as new models get released every week" but sure go on.
>>
File: int8.jpg (850.3 KB JPG)
>>108607786
The base works pretty well with int8. I haven't tested turbo though.
15322MB -> 8238MB, and 2.38s/it -> 1.4s/it on my 3090.
https://github.com/BobJohnson24/ComfyUI-INT8-Fast
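For reference, the memory win comes from storing weights as int8 plus a scale instead of fp16. A toy pure-Python sketch of per-tensor symmetric int8 quantization (this is the general idea, not the linked node's actual code, which operates on torch tensors):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8: one fp scale + a list of int8 values."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero case
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp weights from int8 values and the scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.031, 1.27]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Max reconstruction error is bounded by half a quantization step.
err = max(abs(a - b) for a, b in zip(w, w_hat))
assert err <= s / 2 + 1e-12
print(q)  # the largest-magnitude weight maps to +/-127
```

Halving bytes-per-weight roughly matches the ~15GB -> ~8GB figure above; the speedup depends on the GPU having fast int8 kernels.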
>>
>>108609537
>int8
cope quant, we're all ada and blackwell chads here
>>
>>108609502
you know you are in an ai general and you can just train a classifier, right? the vast majority of slop gens is going to be SD1.5/SDXL/Flux/ChatGPT, and even a minimal effort to reduce those is going to be better than 99% of models
but the people that train those ultra-slopped models train on synthetic garbage on purpose to avoid copyright issues and morons having a melty about deepfakes/csam
on danbooru for example you are gonna get AT MOST like 10k untagged ai images out of several million
>>
>>108609502
it doesnt need to memorize or know every model, it just needs to detect synthetic patterns that arent seen in real images
>>
>>108609580
If you think a universal classifier is this easy, why not train it yourself? Why aren't there any such classifiers with non-meme accuracy out there, despite huge demand to filter out AI slop? Could it be more complex than reiterating that one word which stuck with you after you watched some oversimplified goyslop youtube video a year ago?
>but the people that train those ultra slopped models train on synthetic garbage on purpose to avoid copyright issues and morons having a melty about deepfakes/csam
This is true but:
a) Costs of processing real data are still a huge factor
b) It's irrelevant to the first point
>on danbooru for example you are gonna get AT MOST like 10k untagged ai images out of several million
This is true for making an anime tune of existing model but for training a model from scratch you need data from wide variety of contaminated sources and you run back to the curation/filtering problem.
>>108609601
It needs to know them because said synthetic patterns are different for every diffusion model out there.
>>
>>108609631
there is
https://thehive.ai/demos/ai-generated-content-detection
which is quite decent
why would you even train a generation model if you are too incompetent or broke to keep it from looking like synthetic dogshit?
>>
>>108604759
The Chinese always come out with really nice architectures, but they really can't into quality training data. Shame, the model had potential, but it's clearly slopped. It's very strange: thanks to the Flux 2 VAE, some photos look very realistic, while others don't look real at all. They likely used a mixture of slopped and real data, and it shows. Now I wait until BFL releases a model with similar capabilities.
>>
>>108609650
1) This thing seems to test each model individually (or at least under an umbrella), so it's not universal as claimed possible earlier, and it needs to be updated for each new model.
2) Even assuming it's accurate enough (won't spend hours testing), it's going to cost a metric ton of money to run many millions of images through it.
>>
>>108609706
the anima gens i tried out on it get classified as "other" ai just fine
convenient that you assert that a reliable detector cant be trained yet refuse to test it, clearly it is possible
it does not have to even be 100% reliable as reducing them would already would be a massive positive compared to everyone else that just trains on ai on purpose like i stated before
>>
Fresh when ready

>>108609718
>>108609718
>>108609718
