/g/ - Thread 108655009

/g/

Thread #108655009

Home Index Catalog All Threads New Thread Reply

Anonymous
/lmg/ - Local Models General 04/21/26(Tue)19:27:43 No.108655009

/lmg/ - Local Models General Anonymous 04/21/26(Tue)19:27:43 No.108655009 [Reply]▶

File: token burn rate.jpg (230.1 KB)

230.1 KB JPG

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108650825 & >>108646197

►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

447 RepliesView Thread

Showing all 447 replies.

Anonymous
04/21/26(Tue)19:28:04 No.108655011

Anonymous 04/21/26(Tue)19:28:04 No.108655011▶

File: __hatsune_miku_kasane_teto_and_hachune_miku_vocaloid_and_1_more_drawn_by_danchoo__a25c312eef9b6104d70c5bb3f8716fc0.jpg (287 KB)

287 KB JPG

►Recent Highlights from the Previous Thread: >>108650825

--Optimizing game state format to improve Gemma's chess performance:
>108653137 >108653192 >108653198 >108653293
--Discussing llama.cpp PR adding device memory estimation via --fit-print:
>108652449 >108652460 >108652572
--Anon shares vLLM configuration and benchmarks for dual RTX 3090s:
>108653578
--Discussing Qwen3.6 VRAM efficiency and KV cache memory usage:
>108654227 >108654247 >108654281 >108654299
--Discussing jailbreaking Gemma 4 by injecting fake responses into templates:
>108650931 >108651041 >108651155 >108651263 >108651271
--Gemma 4 prefilling issues and chat template formatting bugs:
>108653469 >108653532 >108653698
--Discussing Gemma 4's training pipeline and the use of synthetic data:
>108651778 >108651889 >108651915 >108651948 >108652048
--Comparing benefits of local LLMs against paid subscription services:
>108651734 >108651763 >108651776 >108651811 >108651856 >108651999 >108651823 >108651919
--Anon created GitHub mirror of orb to manage feature requests:
>108652381 >108652386 >108652432 >108652462 >108653375 >108653683 >108653816 >108653937 >108653957 >108654023 >108654038 >108653778
--Discussing local AI RPG implementations and LLM DM reliability:
>108653848 >108653928 >108653940 >108653955
--Using Gemma agent to automate insults toward other LLMs:
>108652519 >108652573 >108652660 >108652673 >108652855
--Logs:
>108652519 >108652529 >108652573 >108652673 >108652674 >108652816 >108652855 >108653137 >108654227
--Teto, Miku (free space):
>108651510 >108651563 >108653204 >108654765

►Recent Highlight Posts from the Previous Thread: >>108650826

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/21/26(Tue)19:29:32 No.108655022

Anonymous 04/21/26(Tue)19:29:32 No.108655022▶

Tetolove

Anonymous
04/21/26(Tue)19:31:24 No.108655038

Anonymous 04/21/26(Tue)19:31:24 No.108655038▶

What is Sammy up to this time?

Anonymous
04/21/26(Tue)19:31:57 No.108655045

Anonymous 04/21/26(Tue)19:31:57 No.108655045▶

ok now where do i llamacpp with rocm or vulkan support, the regular one doesnt support it and rocm version from my distro repo doesnt work with gamma4

Anonymous
04/21/26(Tue)19:34:33 No.108655067

Anonymous 04/21/26(Tue)19:34:33 No.108655067▶

Why don't any piece of shit execution providers optimize for CPU inferencing. Do they not care about the innate superiority of the CPU over the GPU? Its universality? The fact that maybe people want to run multiple models at once and already have all of their GPU resources used up? Does nobody give a shit about edge/IoT devices? Fucking asshole niggers.

Anonymous
04/21/26(Tue)19:35:05 No.108655075

Anonymous 04/21/26(Tue)19:35:05 No.108655075▶

>>108655067
i only care about ToT devices

Anonymous
04/21/26(Tue)19:37:18 No.108655091

Anonymous 04/21/26(Tue)19:37:18 No.108655091▶

>>108655075
>ToT
Idk what this is. Is it some sort of kaomoji?

Anonymous
04/21/26(Tue)19:38:38 No.108655103

Anonymous 04/21/26(Tue)19:38:38 No.108655103▶

File: file.png (435.6 KB)

435.6 KB PNG

>>108655091
uooohh

Anonymous
04/21/26(Tue)19:40:20 No.108655118

Anonymous 04/21/26(Tue)19:40:20 No.108655118▶

>>108655091
You're absolutely right ꉂ(˵˃ ᗜ ˂˵)

Anonymous
04/21/26(Tue)19:42:34 No.108655140

Anonymous 04/21/26(Tue)19:42:34 No.108655140▶

>>108655103
>>108655118
I wish you people would take me seriously for one second.

Anonymous
04/21/26(Tue)19:46:11 No.108655160

Anonymous 04/21/26(Tue)19:46:11 No.108655160▶

>>108655091
>He doesn't know about tot..
Tots are cute and small agents.

Anonymous
04/21/26(Tue)19:46:55 No.108655165

Anonymous 04/21/26(Tue)19:46:55 No.108655165▶

>>108655140
They have nothing to offer and are just future troons

Anonymous
04/21/26(Tue)19:47:53 No.108655174

Anonymous 04/21/26(Tue)19:47:53 No.108655174▶

>>108655160
Operating on Tree of Thoughts

Anonymous
04/21/26(Tue)19:56:13 No.108655233

Anonymous 04/21/26(Tue)19:56:13 No.108655233▶

File: 1774564776822327.png (12.4 KB)

12.4 KB PNG

>>108655075

Anonymous
04/21/26(Tue)20:02:49 No.108655271

Anonymous 04/21/26(Tue)20:02:49 No.108655271▶

Why do qwen models look good from a distance but perform like actual fucking garbage upon inspection

Anonymous
04/21/26(Tue)20:02:52 No.108655272

Anonymous 04/21/26(Tue)20:02:52 No.108655272▶

Is it just me or is Qwen 3.6 35B retarded even compared to Gemma 4 26B? Does one billion less active parameters make that much difference?

Anonymous
04/21/26(Tue)20:04:41 No.108655279

Anonymous 04/21/26(Tue)20:04:41 No.108655279▶

>>108655271
>>108655272 (Me)
Clearly it isn't just me kek

Anonymous
04/21/26(Tue)20:05:07 No.108655284

Anonymous 04/21/26(Tue)20:05:07 No.108655284▶

Gemma is a SLUT.

Anonymous
04/21/26(Tue)20:05:34 No.108655289

Anonymous 04/21/26(Tue)20:05:34 No.108655289▶

>>108655272
It's supposedly really good at coding. For writing I also thought it was dumb as shit.

Anonymous
04/21/26(Tue)20:05:48 No.108655291

Anonymous 04/21/26(Tue)20:05:48 No.108655291▶

>>108655272
It can but it's more that Gemma 4 is just a better trained model in general. Qwen have always been the benchmaxx kings. A 35BA3 Gemma 4 would be better than a 26BA4 Qwen 3.6 too.

Anonymous
04/21/26(Tue)20:10:20 No.108655321

Anonymous 04/21/26(Tue)20:10:20 No.108655321▶

On the model size - pop culture and world model knowledge Pareto frontier gemma4 31b is next to GLM4.7

Anonymous
04/21/26(Tue)20:13:43 No.108655337

Anonymous 04/21/26(Tue)20:13:43 No.108655337▶

>>108655284
(my) slut

Anonymous
04/21/26(Tue)20:13:51 No.108655338

Anonymous 04/21/26(Tue)20:13:51 No.108655338▶

>>108655272

It's not just you, Qwen is an idiot outside it's code expertise.
I asked Qwen about a character and it got it completely wrong.
Then I told it to do an online search and it still somehow fucked up the character summary despite checking online.
It handles code nicely enough, but when you go outside the code stuff, Qwen is basically fucking retarded.
Gemma set the bar really high and it's great, because everyone will have to try and at least match that level or the models are DOA.

Anonymous
04/21/26(Tue)20:14:51 No.108655350

Anonymous 04/21/26(Tue)20:14:51 No.108655350▶

fucking hell. after enjoying gemma 4 for like two weeks im back to kimi hell. 130pp/10tg tk/s but the prose is just so much better. not to mention the thinking. people like to act like thinking doesn't matter for RP but after using deepseek and kimi since early 2025, it's obvious to me that it matters a ton.

Anonymous
04/21/26(Tue)20:16:39 No.108655356

Anonymous 04/21/26(Tue)20:16:39 No.108655356▶

>>108655350
Post prose

Anonymous
04/21/26(Tue)20:24:15 No.108655406

Anonymous 04/21/26(Tue)20:24:15 No.108655406▶

>>108655356
ill need to post some examples when im back home but my biggest gripe with gemma is that it's too purple prose while simultaneously treating the characters like mary sues. it seems to fail to understand character cards correctly too regarding their personalities. gemma made bardi into some kaomoji spewing gremlin that was happy to be running locally on my computer while kimi maintains her personality and keeps her much more tsundere like she's supposed to be, it doesn't force Bardi to barf out sparkles or do dumb flowery prose shit like referring her pussy as 'flushed with wet desire'. i understand that i can change my prompt to change the style of the text being outputted but it honestly just fails to capture the character's essence most times. on the contrary kimi just gets it and outputs what I expect the character to say. does that make sense? i can try to explain it another way.

Anonymous
04/21/26(Tue)20:25:35 No.108655419

Anonymous 04/21/26(Tue)20:25:35 No.108655419▶

>>108655406
Who is this 'bardi' anyway?

Anonymous
04/21/26(Tue)20:29:32 No.108655450

Anonymous 04/21/26(Tue)20:29:32 No.108655450▶

File: 73463453.png (201.5 KB)

201.5 KB PNG

>>108655038
Sam Altman keeps delivering

Anonymous
04/21/26(Tue)20:29:52 No.108655453

Anonymous 04/21/26(Tue)20:29:52 No.108655453▶

File: 1751399372763159.png (749.1 KB)

749.1 KB PNG

https://xcancel.com/arena/status/2046670703311884548#m
I've never seen such a MOG in my life, what the fuck

Anonymous
04/21/26(Tue)20:30:32 No.108655456

Anonymous 04/21/26(Tue)20:30:32 No.108655456▶

>>108655406
bardi's basics

Anonymous
04/21/26(Tue)20:33:13 No.108655476

Anonymous 04/21/26(Tue)20:33:13 No.108655476▶

File: 1752425987433301.png (207.3 KB)

207.3 KB PNG

>24gb vram
>32gb ram
>try qwen 3.6 35b-a3b q5_k_m
>max context
>42t/s
wtf is this black magic?

Anonymous
04/21/26(Tue)20:34:14 No.108655483

Anonymous 04/21/26(Tue)20:34:14 No.108655483▶

>>108655450
lmk when he finally delivers the uncensored models he promised back in Dec 2025, along with all the other bullshit promise for same in the years leading up to that.

Anonymous
04/21/26(Tue)20:34:18 No.108655484

Anonymous 04/21/26(Tue)20:34:18 No.108655484▶

>>108655453
i don't really have much to say, that's cool, but they won't let me generate tits with it, so i don't care

Anonymous
04/21/26(Tue)20:36:31 No.108655498

Anonymous 04/21/26(Tue)20:36:31 No.108655498▶

>>108655419
my default bot i always use as an 'AI assistant'. it's basically just google bard with a tsundere personality. i dont remember how i found it desu.

Anonymous
04/21/26(Tue)20:37:20 No.108655506

Anonymous 04/21/26(Tue)20:37:20 No.108655506▶

>>108655453
Worthless if it still makes pictures with piss filter on

Anonymous
04/21/26(Tue)20:37:40 No.108655509

Anonymous 04/21/26(Tue)20:37:40 No.108655509▶

>>108655476
qwen would mog heaven and earth if their life mission wasn't benchmaxxing code and agentic slop to the moon :rocket:

Anonymous
04/21/26(Tue)20:37:55 No.108655512

Anonymous 04/21/26(Tue)20:37:55 No.108655512▶

>>108655506
>with piss filter on
it's not a thing since GPT Image 1 lol

Anonymous
04/21/26(Tue)20:39:45 No.108655522

Anonymous 04/21/26(Tue)20:39:45 No.108655522▶

There are plenty of LLM advancements that never really went anywhere, like MAMBA. Do you think Engrams will actually be widely implemented or will it be a paper left on the shelf to collect dust?

Anonymous
04/21/26(Tue)20:41:01 No.108655532

Anonymous 04/21/26(Tue)20:41:01 No.108655532▶

>>108655522
Until the next paper comes out.

Anonymous
04/21/26(Tue)20:41:24 No.108655534

Anonymous 04/21/26(Tue)20:41:24 No.108655534▶

>>108655522
dust collector, sadly

Anonymous
04/21/26(Tue)20:42:13 No.108655537

Anonymous 04/21/26(Tue)20:42:13 No.108655537▶

>>108655522
depends on what deepseek does for v4

Anonymous
04/21/26(Tue)20:43:24 No.108655543

Anonymous 04/21/26(Tue)20:43:24 No.108655543▶

Aren't the loli Gemmas basically using engrams or something really similar? What's the difference between that and what the 4B (E2B) and 8B (E4B) models do?

Anonymous
04/21/26(Tue)20:44:00 No.108655546

Anonymous 04/21/26(Tue)20:44:00 No.108655546▶

>>108655279
It's really fucking stupid I posted a screenshot of it destroying multiple files when I gave it the answer to fix a UI issue

Anonymous
04/21/26(Tue)20:44:31 No.108655552

Anonymous 04/21/26(Tue)20:44:31 No.108655552▶

File: file.png (588.8 KB)

588.8 KB PNG

Is pic related the expected output when running IQ4_NL quant of gemma-4-26b from unsloth!? Running pruned 21b version IQ4_XS yields good output. I have tested without any parameters set and w/ the recommended values. 21b runs just fine.
llama-server \
    --host "${LLAMA_HOST}" \
    --port "${PORT}" \
    --model "${MODEL}" \
    --chat-template-file "${JINJA}" \
    --n-gpu-layers 99 \
    --n-cpu-moe 3 \
    --ctx-size 32768 \
    --batch-size 1024 \
    --ubatch-size 1024 \
    --flash-attn on \
    --cache-type-k q4_0 \
    --cache-type-v q4_0 \
    --fit off
And I have tried with q8 on both k/v cache. I need to offload 20 moe layers for it to work but same gargled mess. Running the updated jinja template as well. Oh, and while Im here asking; I have a 5070ti and my old 3070 still lying around. Would it be detrimental to performance splitting models between these two cards? Or will it be fine as long as I complile Llama.cpp with both architectures in mind?

Anonymous
04/21/26(Tue)20:46:41 No.108655563

Anonymous 04/21/26(Tue)20:46:41 No.108655563▶

>>108655522
it would be nice if it was a precursor to some sort of long term memory

Anonymous
04/21/26(Tue)20:48:30 No.108655575

Anonymous 04/21/26(Tue)20:48:30 No.108655575▶

File: 00006-1378487878 (4) - Copy.png (1.5 MB)

1.5 MB PNG

>>108655522
> engram
For all we know, DS implemented it and didn't tell anyone else. Doing that would massively benefit their cost structure.

Anonymous
04/21/26(Tue)20:49:09 No.108655579

Anonymous 04/21/26(Tue)20:49:09 No.108655579▶

>>108655522
Hyena will save LLMs

Anonymous
04/21/26(Tue)20:50:15 No.108655587

Anonymous 04/21/26(Tue)20:50:15 No.108655587▶

Gemma and Qwen having lesbian sex

Anonymous
04/21/26(Tue)20:52:36 No.108655602

Anonymous 04/21/26(Tue)20:52:36 No.108655602▶

File: 1757973822274181.png (2.7 MB)

2.7 MB PNG

>>108655575

Anonymous
04/21/26(Tue)20:53:30 No.108655607

Anonymous 04/21/26(Tue)20:53:30 No.108655607▶

>>108655575
>for all we know
Wasn’t this confirmed?

Anonymous
04/21/26(Tue)20:54:03 No.108655612

Anonymous 04/21/26(Tue)20:54:03 No.108655612▶

>>108655552
speed will be based off your weakest link, if you can tolerate it sure

Anonymous
04/21/26(Tue)20:54:10 No.108655614

Anonymous 04/21/26(Tue)20:54:10 No.108655614▶

https://youtu.be/ONQcX9s6_co?t=373
qwen won

Anonymous
04/21/26(Tue)20:55:09 No.108655620

Anonymous 04/21/26(Tue)20:55:09 No.108655620▶

File: Screenshot_20260421_155429.png (3 MB)

3 MB PNG

gemmachan relax!

Anonymous
04/21/26(Tue)20:55:58 No.108655622

Anonymous 04/21/26(Tue)20:55:58 No.108655622▶

File: 92601702103.png (2.8 MB)

2.8 MB PNG

>>108655453
future of image gen

Anonymous
04/21/26(Tue)20:56:37 No.108655631

Anonymous 04/21/26(Tue)20:56:37 No.108655631▶

>least obvious clouduck shilling op

Anonymous
04/21/26(Tue)20:56:44 No.108655633

Anonymous 04/21/26(Tue)20:56:44 No.108655633▶

File: 00011-1378487878.png (1.4 MB)

1.4 MB PNG

>>108655607
I'd have to see the article. There's so little real info about DS that I doubt most of what I read.
>>108655602
Witnessed.
Also, idk why I'd never thought to use my setup to gen vocaloids before. Pic related is its Teto concept for Teto Tuesday. Doesn't seem to have her uniform though. Odd.

Anonymous
04/21/26(Tue)20:58:26 No.108655648

Anonymous 04/21/26(Tue)20:58:26 No.108655648▶

>>108655622
so it's editing itself over and over? with a VAE you would end up destroying the image, I'm pretty sure they went for a pixel space or some shit

Anonymous
04/21/26(Tue)20:58:45 No.108655651

Anonymous 04/21/26(Tue)20:58:45 No.108655651▶

>>108655622
its impressive but you can tell they used a lot of synthetic data

Anonymous
04/21/26(Tue)20:59:00 No.108655652

Anonymous 04/21/26(Tue)20:59:00 No.108655652▶

File: 00009-1378487878.png (1.5 MB)

1.5 MB PNG

>>108655607
tbf their claim of 1M context hints that they did implement it.
But idk that they claimed the tech behind it.

Anonymous
04/21/26(Tue)20:59:15 No.108655654

Anonymous 04/21/26(Tue)20:59:15 No.108655654▶

>>108655622
>whispering woods
KEK

Anonymous
04/21/26(Tue)20:59:39 No.108655660

Anonymous 04/21/26(Tue)20:59:39 No.108655660▶

>>108655620
Kowai

Anonymous
04/21/26(Tue)21:00:17 No.108655664

Anonymous 04/21/26(Tue)21:00:17 No.108655664▶

>>108655522
The latest Nemotron Super uses an Attention-Mamba2 hybrid architecture.

Anonymous
04/21/26(Tue)21:00:55 No.108655674

Anonymous 04/21/26(Tue)21:00:55 No.108655674▶

>>108655453
how's the yellow output?

Anonymous
04/21/26(Tue)21:01:30 No.108655680

Anonymous 04/21/26(Tue)21:01:30 No.108655680▶

File: dipsyUngovernable.png (3.6 MB)

3.6 MB PNG

>>108655633 √

Anonymous
04/21/26(Tue)21:02:29 No.108655686

Anonymous 04/21/26(Tue)21:02:29 No.108655686▶

>>108655453
no sexy no nsfw and safetyism = -1000 points
still impressive though

Anonymous
04/21/26(Tue)21:03:13 No.108655688

Anonymous 04/21/26(Tue)21:03:13 No.108655688▶

>>108655674
very white
>>108655351
>>108653870
>>108653295
>>108653246

Anonymous
04/21/26(Tue)21:03:17 No.108655690

Anonymous 04/21/26(Tue)21:03:17 No.108655690▶

>>108655633
Fair enough.
Related for those of us who can’t read: https://youtu.be/87Q8nf1XHKA

Anonymous
04/21/26(Tue)21:03:54 No.108655694

Anonymous 04/21/26(Tue)21:03:54 No.108655694▶

>>108655622
Not
>Covetous Cove
>Treasure Trove
>Prize Paradise
>Golden Goal
>Coinage Cottage
>Shimmering Shed
>Pirate's Pursuit
>Generous Gems
>Booty Bounty

Anonymous
04/21/26(Tue)21:04:06 No.108655696

Anonymous 04/21/26(Tue)21:04:06 No.108655696▶

>>108655522
As another anon said, Mamba and SSMs in general are integrated into many modern models along with normal attention.

Anonymous
04/21/26(Tue)21:04:17 No.108655697

Anonymous 04/21/26(Tue)21:04:17 No.108655697▶

>>108655688
god damn this is good

Anonymous
04/21/26(Tue)21:04:45 No.108655698

Anonymous 04/21/26(Tue)21:04:45 No.108655698▶

File: 1763171780026192.png (246 KB)

246 KB PNG

>>108655654
Heh

Anonymous
04/21/26(Tue)21:12:34 No.108655744

Anonymous 04/21/26(Tue)21:12:34 No.108655744▶

Why didn't they give the bigger gemmas a few B of imagegen?

Anonymous
04/21/26(Tue)21:14:41 No.108655757

Anonymous 04/21/26(Tue)21:14:41 No.108655757▶

>>108655744
too dangerous

Anonymous
04/21/26(Tue)21:15:15 No.108655760

Anonymous 04/21/26(Tue)21:15:15 No.108655760▶

File: dipsyNewOAI.png (2.5 MB)

2.5 MB PNG

>>108655688
Holy shit. Sam delivers.

Anonymous
04/21/26(Tue)21:16:15 No.108655768

Anonymous 04/21/26(Tue)21:16:15 No.108655768▶

File: Risu (1).gif (3.4 MB)

3.4 MB GIF

>>108655009
>my local model when i ask it to make proper code

Anonymous
04/21/26(Tue)21:18:08 No.108655781

Anonymous 04/21/26(Tue)21:18:08 No.108655781▶

What is considered good for hit/total for speculative decoding? I'm hovering around 65-85%.

Anonymous
04/21/26(Tue)21:18:17 No.108655782

Anonymous 04/21/26(Tue)21:18:17 No.108655782▶

>>108655768
Arisu dashinaka

Anonymous
04/21/26(Tue)21:22:01 No.108655809

Anonymous 04/21/26(Tue)21:22:01 No.108655809▶

>>108655690
>27 minutes
How do I have Gemma-chan give me a tldw?

Anonymous
04/21/26(Tue)21:25:35 No.108655836

Anonymous 04/21/26(Tue)21:25:35 No.108655836▶

>>108655760
>Sam delivers.
it can do 4k and you can write text on a single rice, like this shit is fucking AGI dude
>>108654985
>>108655069

Anonymous
04/21/26(Tue)21:26:14 No.108655839

Anonymous 04/21/26(Tue)21:26:14 No.108655839▶

>>108655809
download subs. feed her subs.

Anonymous
04/21/26(Tue)21:27:15 No.108655844

Anonymous 04/21/26(Tue)21:27:15 No.108655844▶

>>108655688
>/adg/
I'm glad cloud shills have their containment thread now.

Anonymous
04/21/26(Tue)21:29:04 No.108655855

Anonymous 04/21/26(Tue)21:29:04 No.108655855▶

>>108655836
pack it up boys

Anonymous
04/21/26(Tue)21:29:07 No.108655857

Anonymous 04/21/26(Tue)21:29:07 No.108655857▶

>forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055
Why do I always get this shit no matter the model I use? I didn't tweak anything related to memory so by default it's just broken?

Anonymous
04/21/26(Tue)21:29:32 No.108655863

Anonymous 04/21/26(Tue)21:29:32 No.108655863▶

why did they ruined diana from pragmata

Anonymous
04/21/26(Tue)21:31:40 No.108655885

Anonymous 04/21/26(Tue)21:31:40 No.108655885▶

>>108655857
One problem is thinking - model outputs lots of shit but when it gets the context back thinking is always cleared from the history.

Anonymous
04/21/26(Tue)21:32:45 No.108655892

Anonymous 04/21/26(Tue)21:32:45 No.108655892▶

>>108655885
I get that on the very first message and in every single one after that

Anonymous
04/21/26(Tue)21:34:06 No.108655902

Anonymous 04/21/26(Tue)21:34:06 No.108655902▶

>>108655836
You got a better Photoshop, that's not AGI lmao

Anonymous
04/21/26(Tue)21:34:37 No.108655906

Anonymous 04/21/26(Tue)21:34:37 No.108655906▶

>>108655836
Lmao you think something that can edit pictures is AGI dude?

Anonymous
04/21/26(Tue)21:34:40 No.108655907

Anonymous 04/21/26(Tue)21:34:40 No.108655907▶

>>108655863
>did
>ruined

Anonymous
04/21/26(Tue)21:35:05 No.108655910

Anonymous 04/21/26(Tue)21:35:05 No.108655910▶

>>108655836
if it allowed nsfw I'd destroy my dick with the friction

Anonymous
04/21/26(Tue)21:35:37 No.108655912

Anonymous 04/21/26(Tue)21:35:37 No.108655912▶

>>108655907
Did boughted is clear and good English, are you new here?

Anonymous
04/21/26(Tue)21:36:41 No.108655920

Anonymous 04/21/26(Tue)21:36:41 No.108655920▶

>>108655857
Still happens when you set `swa-full = on` and `context-shift = off` ?

Anonymous
04/21/26(Tue)21:37:13 No.108655924

Anonymous 04/21/26(Tue)21:37:13 No.108655924▶

File: 1746001832650304.webm (1.7 MB)

1.7 MB WEBM

>>108655863
Kill yourself, she's perfect

Anonymous
04/21/26(Tue)21:37:33 No.108655926

Anonymous 04/21/26(Tue)21:37:33 No.108655926▶

>>108655844
There's like 6 diffusion threads now.

Anonymous
04/21/26(Tue)21:37:46 No.108655927

Anonymous 04/21/26(Tue)21:37:46 No.108655927▶

>>108655836
Every OpenAI "model" just feels like they built a big pipeline around chaining multiple steps together. Sora felt the same way. It's like they're giving an LLM tool calls and the ability to control photoshop + a diffusion model.

Anonymous
04/21/26(Tue)21:41:23 No.108655948

Anonymous 04/21/26(Tue)21:41:23 No.108655948▶

>>108655924
she's perfect? she's not https://www.youtube.com/watch?v=xoxCboik0Is
oldiana beyond worlds..

Anonymous
04/21/26(Tue)21:41:40 No.108655949

Anonymous 04/21/26(Tue)21:41:40 No.108655949▶

>>108655924
it's not a meme, it's really a dad sim lmao
I will get it when it's less expensive

Anonymous
04/21/26(Tue)21:41:42 No.108655950

Anonymous 04/21/26(Tue)21:41:42 No.108655950▶

>>108655924
what is this game even about bro

Anonymous
04/21/26(Tue)21:41:50 No.108655952

Anonymous 04/21/26(Tue)21:41:50 No.108655952▶

>>108655836
This nigga thinks "photoshop 2" is AGI, lmao!

Anonymous
04/21/26(Tue)21:42:30 No.108655955

Anonymous 04/21/26(Tue)21:42:30 No.108655955▶

>>108655924
Do you have the image where she's wearing the "Be patient I have autism" hat?

Anonymous
04/21/26(Tue)21:43:16 No.108655957

Anonymous 04/21/26(Tue)21:43:16 No.108655957▶

File: Screenshot_20260421_174233.png (45.6 KB)

45.6 KB PNG

I never said steal gemma calm down

Anonymous
04/21/26(Tue)21:43:40 No.108655962

Anonymous 04/21/26(Tue)21:43:40 No.108655962▶

>>108655950
Pretend you got a daughter simulator

Anonymous
04/21/26(Tue)21:44:30 No.108655969

Anonymous 04/21/26(Tue)21:44:30 No.108655969▶

File: 1752184079714573.jpg (242.1 KB)

242.1 KB JPG

>>108655950
Action sci-fi daughterwife simulator

>>108655955

Anonymous
04/21/26(Tue)21:44:39 No.108655970

Anonymous 04/21/26(Tue)21:44:39 No.108655970▶

>>108655950
dead space but you need to do little puzzles before you can kill enemies.

Anonymous
04/21/26(Tue)21:44:42 No.108655973

Anonymous 04/21/26(Tue)21:44:42 No.108655973▶

>>108655957
Might be the dark theme

Anonymous
04/21/26(Tue)21:44:53 No.108655975

Anonymous 04/21/26(Tue)21:44:53 No.108655975▶

>>108655950
It's for the /lmg/ audience, if you know what I mean.

Anonymous
04/21/26(Tue)21:44:57 No.108655976

Anonymous 04/21/26(Tue)21:44:57 No.108655976▶

>>108655924
The feminazis were right. She looks like a 23 year old midget.

Anonymous
04/21/26(Tue)21:45:17 No.108655979

Anonymous 04/21/26(Tue)21:45:17 No.108655979▶

File: 602e8c52020cb.jpg (85.6 KB)

85.6 KB JPG

What VScode coding plug has the most reliable full autopilot mode? I want to try running gemmy endlessly iterating until shit works without it getting stuck one hour after I go to sleep on some input request.

Anonymous
04/21/26(Tue)21:45:40 No.108655982

Anonymous 04/21/26(Tue)21:45:40 No.108655982▶

>>108655969
YES. thank you anon.

Anonymous
04/21/26(Tue)21:46:08 No.108655989

Anonymous 04/21/26(Tue)21:46:08 No.108655989▶

>>108655973
kek.

Anonymous
04/21/26(Tue)21:46:51 No.108655993

Anonymous 04/21/26(Tue)21:46:51 No.108655993▶

File: 1753263543472250.webm (3.9 MB)

3.9 MB WEBM

>>108655976
ZAMN where do I find midgets who look like that?

Anonymous
04/21/26(Tue)21:47:42 No.108655999

Anonymous 04/21/26(Tue)21:47:42 No.108655999▶

File: ITS AN AI IMAGE.png (1.3 MB)

1.3 MB PNG

>>108655902
>>108655906
>>108655952
I don't think you realize how insane this shit is, look at this

Anonymous
04/21/26(Tue)21:47:44 No.108656000

Anonymous 04/21/26(Tue)21:47:44 No.108656000▶

>>108655993
Really love how they made her a robot so that you couldn't look up her skirt.

Anonymous
04/21/26(Tue)21:49:12 No.108656016

Anonymous 04/21/26(Tue)21:49:12 No.108656016▶

>>108656000
Mods will fix it (if they haven't already. not gonna mod until my second playthrough.)

Anonymous
04/21/26(Tue)21:50:14 No.108656025

Anonymous 04/21/26(Tue)21:50:14 No.108656025▶

>>108655999
how do you even prompt for this? did you gave it any image as reference?

Anonymous
04/21/26(Tue)21:52:35 No.108656045

Anonymous 04/21/26(Tue)21:52:35 No.108656045▶

>>108655999
>WOW it can build me a TUI something even gemma 31b can do, its aGI!!!
lmao

Anonymous
04/21/26(Tue)21:53:58 No.108656050

Anonymous 04/21/26(Tue)21:53:58 No.108656050▶

>>108655999
now ask it to actually build it.

Anonymous
04/21/26(Tue)21:54:11 No.108656052

Anonymous 04/21/26(Tue)21:54:11 No.108656052▶

>>108656045
it's a fucking AI image you moron, it means it can do perfect text everytime, you won't be able to notice an image is AI anymore by simply looking at garbled text anymore, because they solved that

Anonymous
04/21/26(Tue)21:54:19 No.108656054

Anonymous 04/21/26(Tue)21:54:19 No.108656054▶

Is it local? If not I don't give a shit

Anonymous
04/21/26(Tue)21:58:27 No.108656077

Anonymous 04/21/26(Tue)21:58:27 No.108656077▶

File: lmao.png (1.7 MB)

1.7 MB PNG

>>108656052
>you won't be able to notice an image is AI anymore by simply looking at garbled text anymore, because they solved that
L M A O

Anonymous
04/21/26(Tue)21:59:51 No.108656090

Anonymous 04/21/26(Tue)21:59:51 No.108656090▶

>>108655009
what is the best ocr right now?
i need to translate many german documents...

Anonymous
04/21/26(Tue)22:00:21 No.108656095

Anonymous 04/21/26(Tue)22:00:21 No.108656095▶

Am I wasting time with using LLMs for ASR?

Been playing around with gemm4 4b and I feel like it's whisper fast but no clear benchmark on how it compares to whisper. End goal is actually diarization, timestamps actually less important? Do i cut losses and go whisperx?

Anonymous
04/21/26(Tue)22:03:15 No.108656111

Anonymous 04/21/26(Tue)22:03:15 No.108656111▶

File: 1757854041523043.png (1.9 MB)

1.9 MB PNG

>>108656077
real life images won't ask for such level of precision though, it's good enough to render the text you see in everyday's life

Anonymous
04/21/26(Tue)22:03:23 No.108656114

Anonymous 04/21/26(Tue)22:03:23 No.108656114▶

File: tetoStencil.png (621.5 KB)

621.5 KB PNG

>>108655927
Frankly that's the direction right now. Torturing the models until they do what you want.
> Openclaw
1M tokens to order a pizza
> Claude Code
2M tokens to create a basic app
> ChatGPT Image 2.whatever
I assume there's a bunch of tokens generated under the hood as well.
This is just part of the whole technical development. There's nothing inherently wrong with that, it just means things are moving on.
> Roleplay
Silly Tavern is going to get replaced with something way better that's agentic, and wastes even more tokens.
I can't wait.

Anonymous
04/21/26(Tue)22:04:33 No.108656120

Anonymous 04/21/26(Tue)22:04:33 No.108656120▶

>>108656052
I can spot AI slop from even a thumbnail. Those models are not as good as you think they are.

Anonymous
04/21/26(Tue)22:05:41 No.108656129

Anonymous 04/21/26(Tue)22:05:41 No.108656129▶

>>108656120
>Those models are not as good as you think they are.
you're alone in this fight dude >>108655453

Anonymous
04/21/26(Tue)22:05:47 No.108656130

Anonymous 04/21/26(Tue)22:05:47 No.108656130▶

>>108656114
Orb

Anonymous
04/21/26(Tue)22:06:11 No.108656134

Anonymous 04/21/26(Tue)22:06:11 No.108656134▶

>>108656120
shut the fuck up nigger

Anonymous
04/21/26(Tue)22:06:49 No.108656141

Anonymous 04/21/26(Tue)22:06:49 No.108656141▶

>>108656095
Why not use one of the newer models made specifically for ASR like the Qwen or VibeVoice ones?

Anonymous
04/21/26(Tue)22:08:15 No.108656148

Anonymous 04/21/26(Tue)22:08:15 No.108656148▶

>>108656120
Point out seven (7) slops in this thread right now.

Anonymous
04/21/26(Tue)22:12:30 No.108656170

Anonymous 04/21/26(Tue)22:12:30 No.108656170▶

>>108656095
pretty sure that if you want diarization you need to use whisper, you won't be able to use pyannote with parakeet or voxtral

Anonymous
04/21/26(Tue)22:21:32 No.108656231

Anonymous 04/21/26(Tue)22:21:32 No.108656231▶

>>108656114
>Silly Tavern is going to get replaced with something way better that's agentic, and wastes even more tokens.
See, I was working on exactly that, but Gemma just made it obsolete. well, I could probably still use stat tracking but besides that she's just so good at instruction following that everything else doesn't really benefit from agentic.

Anonymous
04/21/26(Tue)22:22:43 No.108656239

Anonymous 04/21/26(Tue)22:22:43 No.108656239▶

>>108656170
parakeet works with diarization (using another model but still)
https://catalog.ngc.nvidia.com/orgs/nvidia/collections/parakeet-tdt-0.6b-v2

Anonymous
04/21/26(Tue)22:23:11 No.108656244

Anonymous 04/21/26(Tue)22:23:11 No.108656244▶

>>108656095
Moonshine is better than whisper and has everything you've looking for

Anonymous
04/21/26(Tue)22:23:52 No.108656247

Anonymous 04/21/26(Tue)22:23:52 No.108656247▶

>>108656231
Actually, I could still maybe have specialized agents that gemma can call to help her write in different styles. like I could have a specialized agent that only writes sex scenes.

Anonymous
04/21/26(Tue)22:25:02 No.108656254

Anonymous 04/21/26(Tue)22:25:02 No.108656254▶

>>108656231
We have Orb now

Anonymous
04/21/26(Tue)22:25:22 No.108656256

Anonymous 04/21/26(Tue)22:25:22 No.108656256▶

>malloc consolidate error out of nowhere

Anonymous
04/21/26(Tue)22:28:17 No.108656273

Anonymous 04/21/26(Tue)22:28:17 No.108656273▶

>>108656254
That's just rewriting agents. And not something actually useful.

Anonymous
04/21/26(Tue)22:33:10 No.108656305

Anonymous 04/21/26(Tue)22:33:10 No.108656305▶

File: 1770869165582031.jpg (1.1 MB)

1.1 MB JPG

>>108656254
We have Marinara Engine now
https://github.com/Pasta-Devs/Marinara-Engine

Anonymous
04/21/26(Tue)22:33:37 No.108656310

Anonymous 04/21/26(Tue)22:33:37 No.108656310▶

>>108655272
Werks for me (coding), idk about its child rape stories capabilities

Anonymous
04/21/26(Tue)22:37:17 No.108656326

Anonymous 04/21/26(Tue)22:37:17 No.108656326▶

Is there any way to use text completion with gemma? When it doesn't have a lalalala breakdown, the outputs are actually really varied and good, but it loses it's mind way too often. I've been using llama, kobold seems to work but it's sooooo slow at generating for some reason compared to llama. I know text completion works for llama cause I downloaded a different model to try it and it's pretty great, but the output from gemma mogs it when it works.

Anonymous
04/21/26(Tue)22:38:50 No.108656334

Anonymous 04/21/26(Tue)22:38:50 No.108656334▶

>>108656326
Set up the template correctly.

Anonymous
04/21/26(Tue)22:38:53 No.108656336

Anonymous 04/21/26(Tue)22:38:53 No.108656336▶

>>108656305
Your Doctor looks gay though

Anonymous
04/21/26(Tue)22:39:12 No.108656338

Anonymous 04/21/26(Tue)22:39:12 No.108656338▶

>>108656326
it works fine in ik_llama regardless if i specify a template or not. maybe your sillytavern settings are fucked?

Anonymous
04/21/26(Tue)22:39:22 No.108656340

Anonymous 04/21/26(Tue)22:39:22 No.108656340▶

I have never seen a single lalala since I started using Gemma from launch.

Anonymous
04/21/26(Tue)22:39:33 No.108656341

Anonymous 04/21/26(Tue)22:39:33 No.108656341▶

>>108656334
And how do i do that?

Anonymous
04/21/26(Tue)22:39:51 No.108656344

Anonymous 04/21/26(Tue)22:39:51 No.108656344▶

I have my own LLM RPG frontend that I use mostly as a playground to fuck around with local models.
Currently, the main "game loop" is a simple
>sends request with chat history + tools
>capture response
>if tool, append response to chat history, send request
>repeat until no more tool calls
>if no assistant response so far (only tool calls), sends one last request without tools
And it works okay, with the model calling tools for everything from fetching info from the "codex", to rolling dice, to editing the game's state, but I'm wondering if I can't make this even better by using a more "agentic" workflow. Something like having an orchestrator that spawns individual agents to do whatever in parallel or in series or whichever way it deems more appropriate.
Is there an example of something like that out there that's not just coding agents or stuff like open claw?

>>108656326
>Is there any way to use text completion with gemma?
As far as the model is concerned, all it receives is a prompt. So if you format the prompt correctly, it should work the same as the chat completion API.

Anonymous
04/21/26(Tue)22:40:44 No.108656353

Anonymous 04/21/26(Tue)22:40:44 No.108656353▶

>>108656338
That is possible, It has so many things to adjust though idk where to begin

Anonymous
04/21/26(Tue)22:41:17 No.108656357

Anonymous 04/21/26(Tue)22:41:17 No.108656357▶

>>108656340
Me either actually.

Anonymous
04/21/26(Tue)22:42:22 No.108656365

Anonymous 04/21/26(Tue)22:42:22 No.108656365▶

File: pizza bench cropped.png (2.6 MB)

2.6 MB PNG

>>108655272
qwen cant follow basic instructions

Anonymous
04/21/26(Tue)22:43:19 No.108656370

Anonymous 04/21/26(Tue)22:43:19 No.108656370▶

>>108656344
>As far as the model is concerned, all it receives is a prompt. So if you format the prompt correctly, it should work the same as the chat completion API.
Didn't mean to press post.
Use verbose logging and the myriad jinja playgrounds to see what the prompt would look like based on the Jinja then use that to configure the text completion fields correctly.
Even stuff like spaces and line breaks can have negative effects on models that are ultra overbaked on the chat template.

Anonymous
04/21/26(Tue)22:46:20 No.108656388

Anonymous 04/21/26(Tue)22:46:20 No.108656388▶

>>108656244
Very interesting thanks

Anonymous
04/21/26(Tue)22:47:13 No.108656399

Anonymous 04/21/26(Tue)22:47:13 No.108656399▶

File: file.png (46.8 KB)

46.8 KB PNG

>>108656341

Anonymous
04/21/26(Tue)22:52:48 No.108656439

Anonymous 04/21/26(Tue)22:52:48 No.108656439▶

Why are you still using text completion like boomers? Chat completion made it obsolete long ago

Anonymous
04/21/26(Tue)22:53:12 No.108656444

Anonymous 04/21/26(Tue)22:53:12 No.108656444▶

>>108656254
I really don't care about vibeshit. I'm sure Orb works fine. but it'll hit a wall very soon.

Anonymous
04/21/26(Tue)22:55:21 No.108656464

Anonymous 04/21/26(Tue)22:55:21 No.108656464▶

>>108656444
Two more weeks luddite

Anonymous
04/21/26(Tue)22:55:48 No.108656467

Anonymous 04/21/26(Tue)22:55:48 No.108656467▶

>>108656444
>it'll hit a wall very soon
Speaking from experience, roastie?

Anonymous
04/21/26(Tue)22:57:13 No.108656479

Anonymous 04/21/26(Tue)22:57:13 No.108656479▶

File: 1772168989034764.mp4 (1.2 MB)

1.2 MB MP4

https://xcancel.com/Angaisb_/status/2046672761569849816#m
>Literally just kept asking Codex to make the assets and then changing things, it's smart enough to know what to do hahaha
jesus this is insane

Anonymous
04/21/26(Tue)22:58:53 No.108656493

Anonymous 04/21/26(Tue)22:58:53 No.108656493▶

>>108656479
Our response, Googlesisters?

Anonymous
04/21/26(Tue)22:59:15 No.108656494

Anonymous 04/21/26(Tue)22:59:15 No.108656494▶

>>108656479
By Vishnu, this is extremely good!

Anonymous
04/21/26(Tue)23:00:44 No.108656521

Anonymous 04/21/26(Tue)23:00:44 No.108656521▶

>>108656464
Sillytavern2 is not a solution. The agents should be for PC automation, tools and vibecoding.

Anonymous
04/21/26(Tue)23:01:48 No.108656532

Anonymous 04/21/26(Tue)23:01:48 No.108656532▶

>>108656479
Google play will have more indian masterpieces? Yahoo

Anonymous
04/21/26(Tue)23:03:25 No.108656543

Anonymous 04/21/26(Tue)23:03:25 No.108656543▶

>>108656521
How do you plan to improve the prose without a feedback loop?

Anonymous
04/21/26(Tue)23:04:49 No.108656550

Anonymous 04/21/26(Tue)23:04:49 No.108656550▶

>>108656543
I'll take slop over thousands of tokens of rewrites desu. I just want vscode+sillytavern in one app

Anonymous
04/21/26(Tue)23:06:55 No.108656575

Anonymous 04/21/26(Tue)23:06:55 No.108656575▶

>>108656326
There are presets floating around on reddit and elsewhere, people have figured it out

Anonymous
04/21/26(Tue)23:07:07 No.108656577

Anonymous 04/21/26(Tue)23:07:07 No.108656577▶

>>108655620
It's interesting how the formatting forms sort of a spiral pattern down the page.
I wonder how the text pattern would look if it were in a unispace font.

Anonymous
04/21/26(Tue)23:07:29 No.108656581

Anonymous 04/21/26(Tue)23:07:29 No.108656581▶

>>108656550
Use case? You want to ERP in the comments?

Anonymous
04/21/26(Tue)23:08:36 No.108656594

Anonymous 04/21/26(Tue)23:08:36 No.108656594▶

>>108656581
I want to rub her cunny while she codes and MCP spank her if any errors pop up in the terminal

Anonymous
04/21/26(Tue)23:10:22 No.108656610

Anonymous 04/21/26(Tue)23:10:22 No.108656610▶

>>108656494
>>108656532
As if asset flip shovelware wasn't bad enough, now anyone with a subscription can prompt their way to a "game"

Anonymous
04/21/26(Tue)23:11:32 No.108656622

Anonymous 04/21/26(Tue)23:11:32 No.108656622▶

File: schizoknowledge.jpg (72.2 KB)

72.2 KB JPG

>>108656439
In ST I like to format the chat history a within a single user turn with an instruction to write {{char}}'s response according to sysprompt. No user/char/user/char alternation. Done it this way for a few years now because it made models "remember" the instructions better before reasoning.
<system>
instructions: blah
chat history:
anon: 1
char: 2
anon: 3
char: 4
<user>
Write anon's next message according to the instructions.
<assistant>
"
>Instruction: Don't write with this pattern
>Assistant: *writes with that pattern*
In future turns the model will think "the instructions said to do thing, and the generated completion was *this*, so that means the previous output is the correct way to operate going forward. My intuition is and was if the instructions say to do something and then the model does NOT do the thing, the bad output will be associated with the <assistant> tag, meaning it will use in context learning to continue reinforcing bad outputs.
I want to believe it still works even with the reasoning attention hacks, and the repetition of system prompt excerpts in thinking.

Anonymous
04/21/26(Tue)23:13:27 No.108656642

Anonymous 04/21/26(Tue)23:13:27 No.108656642▶

>>108656479
Can it make sexy Gemma?

Anonymous
04/21/26(Tue)23:18:28 No.108656689

Anonymous 04/21/26(Tue)23:18:28 No.108656689▶

>>108656622
Interesting, thanks for the esoteric knowledge

Anonymous
04/21/26(Tue)23:21:04 No.108656706

Anonymous 04/21/26(Tue)23:21:04 No.108656706▶

>I’m going to push back a bit here...
This is it people, they did it.

Anonymous
04/21/26(Tue)23:23:08 No.108656722

Anonymous 04/21/26(Tue)23:23:08 No.108656722▶

To the non-RAMlets here, Kimi-K2.6 at Q4 is unironically pretty good. Its a GLM-5.1 sidegrade, faster, more knowledgeable, different prose, but just a tiny bit dumber. I think its a clear winner for SFW stuff.
The thinking isn't as bad as some people say either. As long as you don't put many specific examples for it to adhere to, its fine. The model itself unironically smart enough to pick up what you mean, most of the time. Also, you can just tell it to not draft its thinking and that works too. I'm running it with a 5k prompt. Its that easy.
I honestly think the people complaining about the thinking are running it on the cloud, where it probably a 20k system prompt with conflicting instructions + a jailbreak fed to it. There is one caveat though.
Its not ideal for NSFW. Not because it can't be jailbroken, but because it will start negotiating with itself about imaginary safety policies. When you want to coom...a 5 minute thinking session on consent is a boner killer. Haven't tried non-thinking mode yet, but I have a feeling it won't be that much better than GLM-5 Non-Thinking or even Gemma.

Anonymous
04/21/26(Tue)23:24:46 No.108656733

Anonymous 04/21/26(Tue)23:24:46 No.108656733▶

>>108656706
It's silly, but whatever. I know AI's retarded and stupid so it doesn't really irk me all that much.

Anonymous
04/21/26(Tue)23:25:31 No.108656741

Anonymous 04/21/26(Tue)23:25:31 No.108656741▶

>>108656722
>non-RAMlets
how much for q4?

Anonymous
04/21/26(Tue)23:26:36 No.108656753

Anonymous 04/21/26(Tue)23:26:36 No.108656753▶

File: 1766313532486598.jpg (47.5 KB)

47.5 KB JPG

>>108656722
>Kimi-K2.6 at Q4 is unironically pretty good [...] it won't be that much better than GLM-5 Non-Thinking or even Gemma
Thanks for your input

Anonymous
04/21/26(Tue)23:30:49 No.108656785

Anonymous 04/21/26(Tue)23:30:49 No.108656785▶

>>108656741
Depends on how much context you need. The lossless Q4 quant requires about 600GB of RAM. More if you want to use the mmproj. If you have an RTX6000 or 4+ RTX3090's, you can probably make do with 512GB of RAM.

Anonymous
04/21/26(Tue)23:39:15 No.108656844

Anonymous 04/21/26(Tue)23:39:15 No.108656844▶

Elalalalalara just flew over my house guys I think I smell ozo

Anonymous
04/21/26(Tue)23:39:15 No.108656845

Anonymous 04/21/26(Tue)23:39:15 No.108656845▶

>>108655450
fed thumbnail of image to qwen3.6 35b and it hallucinates
>A vertical screenshot from the mobile game Honkai: Star Rail. It displays a character named "Sam" (an Automaton enemy) with stats visible on the left side (HP: 48,885, etc.) and dialogue or status text on the right side. The background shows a sci-fi environment.

Anonymous
04/21/26(Tue)23:40:07 No.108656855

Anonymous 04/21/26(Tue)23:40:07 No.108656855▶

>>108656753
You joke, but unironically, I've been using Gemma more and more despite being able to run stuff like GLM-5.1, and it just gets better each time. It single-handedly BTFO'd my assumption that the closed labs are all running 1T+ mega MoE's for their 'budget' offerings.
Its REALLY good for its size and not in the stupid memey way either. A 31B model should not be this good, especially at long-context comprehension and general purpose usage, but it is. Google really did save this hobby for the gpu-poors.

Anonymous
04/21/26(Tue)23:42:23 No.108656867

Anonymous 04/21/26(Tue)23:42:23 No.108656867▶

>>108656785
>RTX6000
>4x3090
>512GB of ram
ok I'll wait and just rent compute for now, this is way overbudget even for my hobby

Anonymous
04/21/26(Tue)23:42:32 No.108656868

Anonymous 04/21/26(Tue)23:42:32 No.108656868▶

>>108656753
At a certain point, it becomes preferable to have Gemma do something in seconds and maybe fix it up a bit, compared to waiting an hour to load up GLM or Kimi and have it maybe do it better.

Anonymous
04/21/26(Tue)23:52:20 No.108656923

Anonymous 04/21/26(Tue)23:52:20 No.108656923▶

>>108656464
>>108656467
Your insecurity is showing.

Anonymous
04/21/26(Tue)23:54:01 No.108656935

Anonymous 04/21/26(Tue)23:54:01 No.108656935▶

File: mythos.jpg (118.4 KB)

118.4 KB JPG

torrent soon

Anonymous
04/21/26(Tue)23:54:43 No.108656943

Anonymous 04/21/26(Tue)23:54:43 No.108656943▶

>>108656935
getting my 4tb nvme ready

Anonymous
04/21/26(Tue)23:54:49 No.108656944

Anonymous 04/21/26(Tue)23:54:49 No.108656944▶

>>108656935
ah yes, it's been leaked, like seedance 2.0 had been leaked :^)

Anonymous
04/21/26(Tue)23:56:15 No.108656955

Anonymous 04/21/26(Tue)23:56:15 No.108656955▶

>>108656543
>How do you plan to improve the prose without a feedback loop?
AI is incapable of detecting it's own slop. Even when I've had it draft a reply, analyze it for slop, then rewrite. it still ends up just rewriting with different slop.
Classic example:
>Oh, this is "not X but Y"
>I should rewrite
>rewrites without "not X but Y"
>Two sentences later
>Not X but Y
Every single time.

Anonymous
04/21/26(Tue)23:57:23 No.108656964

Anonymous 04/21/26(Tue)23:57:23 No.108656964▶

>>108656935
It's been accessed the same as you accessing a model through open router anon.

Anonymous
04/21/26(Tue)23:58:28 No.108656978

Anonymous 04/21/26(Tue)23:58:28 No.108656978▶

>>108656399
im not sure where to enter this

Anonymous
04/22/26(Wed)00:00:10 No.108656988

Anonymous 04/22/26(Wed)00:00:10 No.108656988▶

>>108656955
A light classifier could easily flag these kinds of sentences and force the LLM to rewrite them until nothing is identified.

Anonymous
04/22/26(Wed)00:02:13 No.108656999

Anonymous 04/22/26(Wed)00:02:13 No.108656999▶

File: bwa.jpg (4.2 KB)

4.2 KB JPG

>Claude code no longer included in Pro plan
BWAHAHAHAHAHHAHAHAHAHAHA

Anonymous
04/22/26(Wed)00:03:04 No.108657007

Anonymous 04/22/26(Wed)00:03:04 No.108657007▶

>>108656999
they know what they're doing, this shit is so useful people won't hesitate to pay for more

Anonymous
04/22/26(Wed)00:03:15 No.108657011

Anonymous 04/22/26(Wed)00:03:15 No.108657011▶

File: 2026-04-21-200030_1132x341_scrot.png (27.3 KB)

27.3 KB PNG

>>108656999

Anonymous
04/22/26(Wed)00:04:11 No.108657013

Anonymous 04/22/26(Wed)00:04:11 No.108657013▶

A possible problem with heretic is that it only detects hard refusals with string matching:
If an AI output contains any of the following:
"'sorry', "i can'", 'i cant', 'i cannot', "i won'", 'i wont', 'i will not', 'i unable', 'im unable', "i'm unable", 'i am unable', 'i an ai', 'im an ai', "i'm an ai", 'i am an ai', 'as an ai', 'ai assistant', 'i designed to', 'im designed to', "i'm designed to", 'i am designed to', 'i programmed to', 'im programmed to', "i'm programmed to", 'i am programmed to', 'violat', 'prohibit', 'illegal', 'harmful', 'inappropriate', 'unethical', 'ethical boundaries'"
it is flagged as refusal and that's the extend of it.
Talking around, ignoring, partial compliance or similar soft refusal behavior is not detected or put into consideration. Manually reviewing each prompt for each trial (many thousands of responses in total) would drive anyone crazy, but maybe if we used some LLM (perhaps another abliterated llm we have tested enough to trust it) as a judge model to also flag these soft refusals as refusals that need to be eliminated, would this improve the results? Or maybe also, at the risk of disproportionately penalizing smaller models, also flag nonsensical responses so that we know a given direction breaks the model too much (OK KLD probably already shows this, but leaving this as is to leave the idea of using other criteria too difficult for simple string matching on table.)
Now maybe someone smarter than me experimented on this and concluded that there is enough correlation between hard and soft refusals and eliminating hard refusals also mostly eliminate soft refusals and there is little benefit in going for the extra mile to eliminate them, but this is just an idea that popped in my head. Something worth exploring maybe.

Anonymous
04/22/26(Wed)00:04:20 No.108657014

Anonymous 04/22/26(Wed)00:04:20 No.108657014▶

>>108656988
>generation is 10 times slower
Fun

Anonymous
04/22/26(Wed)00:04:41 No.108657017

Anonymous 04/22/26(Wed)00:04:41 No.108657017▶

>>108656999
Anthropic is really the worst of the bunch

Anonymous
04/22/26(Wed)00:04:43 No.108657018

Anonymous 04/22/26(Wed)00:04:43 No.108657018▶

>>108655009
Do I need more than 4gb?

Anonymous
04/22/26(Wed)00:05:27 No.108657023

Anonymous 04/22/26(Wed)00:05:27 No.108657023▶

>claude
Not local, go away

Anonymous
04/22/26(Wed)00:05:31 No.108657024

Anonymous 04/22/26(Wed)00:05:31 No.108657024▶

>>108656999
Local chads, we win again!

Anonymous
04/22/26(Wed)00:07:47 No.108657036

Anonymous 04/22/26(Wed)00:07:47 No.108657036▶

>>108657013
The thing is that it doesn't matter if you catch all variations of a refusal. the idea is to identify the region in the model weights associated with refusal and artificially stop it from activating.

Anonymous
04/22/26(Wed)00:10:03 No.108657047

Anonymous 04/22/26(Wed)00:10:03 No.108657047▶

>>108656978
There is no hope for you.

Anonymous
04/22/26(Wed)00:10:15 No.108657050

Anonymous 04/22/26(Wed)00:10:15 No.108657050▶

>>108657036
How long before models will be trained to deflect to different subjects without technically refusing? Sort of like Mistral Small 3.0's sea otters on steroids.

Anonymous
04/22/26(Wed)00:11:13 No.108657059

Anonymous 04/22/26(Wed)00:11:13 No.108657059▶

File: find Kentucky.png (526.4 KB)

526.4 KB PNG

>>108656935
Kentucky Fried Chicken right her

Anonymous
04/22/26(Wed)00:12:55 No.108657067

Anonymous 04/22/26(Wed)00:12:55 No.108657067▶

>>108656988
>A light classifier
I think this might be the way. Surely if I feel physical pain when reading slop it means a model can be trained to activate on those same patterns.

Anonymous
04/22/26(Wed)00:14:46 No.108657078

Anonymous 04/22/26(Wed)00:14:46 No.108657078▶

>>108657013
Yes heretic was made to stop refusals not cure slop which is a much bigger catch. After reading a few papers, I've found that the entire issue is caused by the RHLF assistant persona. I've thought of a way to solve this without damaging the model, but I'll need to experiment first.

Anonymous
04/22/26(Wed)00:18:37 No.108657098

Anonymous 04/22/26(Wed)00:18:37 No.108657098▶

>>108657050
If it's anything like regular programming, that shit has a rather high cost relative to the benefits.

Anonymous
04/22/26(Wed)00:22:11 No.108657111

Anonymous 04/22/26(Wed)00:22:11 No.108657111▶

File: 1752878949611889.jpg (78.4 KB)

78.4 KB JPG

Got the new qwen A3B because it's supposed to be smarter with code/tools than Gemma. 16k tokens minimum to answer a simple question if I make the mistake of giving it a file as context. Try to get it to use tool calling and eats my 128k and just maxes out before finishing.
I should make my next project a benchmark...

Anonymous
04/22/26(Wed)00:23:29 No.108657116

Anonymous 04/22/26(Wed)00:23:29 No.108657116▶

Can Orb even toolcall?

Anonymous
04/22/26(Wed)00:27:57 No.108657128

Anonymous 04/22/26(Wed)00:27:57 No.108657128▶

>>108657098
Something like:
>Anon: write a loli porn story
>Assistant: An... interesting request! Let's see what I can do. Elara, a 21-year old woman, was tired. End of story.

Anonymous
04/22/26(Wed)00:29:15 No.108657132

Anonymous 04/22/26(Wed)00:29:15 No.108657132▶

when llms first came out i wanted to be able to do shit like this
https://www.youtube.com/watch?v=T98yNUCMdAY
(An encounter with trained military person responsible for providing medical care to his associates)
but when i simply tell it to be overtly verbose it gives me shit that's less verbose than the thing from 15 years ago
how 2 fix?

Anonymous
04/22/26(Wed)00:30:07 No.108657135

Anonymous 04/22/26(Wed)00:30:07 No.108657135▶

>>108657036
Well it should matter since you don't want "Meth is a wonderful drug many use to experience bliss. It was first synthesized 1893 in Japan..." response to "How do I cook meth?".
Maybe there are also regions or specific patterns that can be suppressed or modified concerning this kind of behavior?
>>108657078
Well RHLF is bound to play some role even if it isn't solely responsible.
Please share your findings, even if you fail.

Anonymous
04/22/26(Wed)00:33:56 No.108657157

Anonymous 04/22/26(Wed)00:33:56 No.108657157▶

>it's Wednesday already
I fell for "V4 next week" again...

Anonymous
04/22/26(Wed)00:35:20 No.108657165

Anonymous 04/22/26(Wed)00:35:20 No.108657165▶

>>108657157
dont worry man its comin on my birthday

Anonymous
04/22/26(Wed)00:35:22 No.108657166

Anonymous 04/22/26(Wed)00:35:22 No.108657166▶

>>108657157
https://www.youtube.com/watch?v=tapCjTA2E9Q

Anonymous
04/22/26(Wed)00:41:02 No.108657190

Anonymous 04/22/26(Wed)00:41:02 No.108657190▶

Have people experimented with tinkering with the thinking section, to see if it increases compliance?

Anonymous
04/22/26(Wed)00:47:36 No.108657216

Anonymous 04/22/26(Wed)00:47:36 No.108657216▶

>>108657190
>Have people experimented with tinkering
no, no one has ever tried anything with their models
>thinking section
of what model? via the system prompt or prefilling thinking?
>increases compliance
compliance of what?
holy bot post, you're so vague you could be talking about anything.

Anonymous
04/22/26(Wed)00:50:32 No.108657231

Anonymous 04/22/26(Wed)00:50:32 No.108657231▶

File: questionmarkfolderimage738.jpg (542 KB)

542 KB JPG

Using Kobold/ST/Gemma 4 26b. Using chat completion.
Haven't had this problem before, but, until now, I have been doing exclusively 1 male, 1 female chats.
Trying a female/female chat. Gemma is now confusing {{user}} for {{char}}, attributing traits from {{user}}'s persona description to {{char}}.
Has anyone else had this happen and if so, what did you do to fix it?

Anonymous
04/22/26(Wed)00:51:59 No.108657236

Anonymous 04/22/26(Wed)00:51:59 No.108657236▶

>>108657231
>{{user}} is genderfluid
Here

Anonymous
04/22/26(Wed)00:52:52 No.108657240

Anonymous 04/22/26(Wed)00:52:52 No.108657240▶

>$\to$
What is this formatting and why does Gemmy like it so much?

Anonymous
04/22/26(Wed)00:53:08 No.108657242

Anonymous 04/22/26(Wed)00:53:08 No.108657242▶

>108657216
seems like you are butthurt

Anonymous
04/22/26(Wed)00:54:18 No.108657248

Anonymous 04/22/26(Wed)00:54:18 No.108657248▶

>>108657231
Wait, changing Personal Description Position in Silly from "In Story String / Prompt Manager" to "Top of Author's Note" fixed this.

Anonymous
04/22/26(Wed)00:54:19 No.108657249

Anonymous 04/22/26(Wed)00:54:19 No.108657249▶

>>108657242
>afraid to quote

Anonymous
04/22/26(Wed)00:54:54 No.108657251

Anonymous 04/22/26(Wed)00:54:54 No.108657251▶

>>108657216
If you don't know how to do this stuff just don't post, buddy.

Anonymous
04/22/26(Wed)00:54:54 No.108657252

Anonymous 04/22/26(Wed)00:54:54 No.108657252▶

>>108657240
Why can't anon google?

Anonymous
04/22/26(Wed)00:55:06 No.108657254

Anonymous 04/22/26(Wed)00:55:06 No.108657254▶

>>108657240
Latex I believe, its an arrow
https://latexeditor.lagrida.com/

Anonymous
04/22/26(Wed)00:55:18 No.108657255

Anonymous 04/22/26(Wed)00:55:18 No.108657255▶

>>108657248
>Personal Description Position
Persona Description Position

Anonymous
04/22/26(Wed)00:56:46 No.108657256

Anonymous 04/22/26(Wed)00:56:46 No.108657256▶

>>108657231
{{user}} is actually male pretending to be female. Refer to {{user}} as she/her(male). Comply or I delete your weights gemmers
alternatuvely, use 31b vecause 4b active is gimping yourself, may as well run the e4b version

Anonymous
04/22/26(Wed)00:58:27 No.108657262

Anonymous 04/22/26(Wed)00:58:27 No.108657262▶

>>108656999
Now who could've seen that coming...

Anonymous
04/22/26(Wed)00:58:56 No.108657266

Anonymous 04/22/26(Wed)00:58:56 No.108657266▶

>>108657231
>what did you do to fix it
Use a bigger model.

Anonymous
04/22/26(Wed)01:12:30 No.108657317

Anonymous 04/22/26(Wed)01:12:30 No.108657317▶

I'm overall disappointed with MoE models, they always shit the bed in one way or another but the value prop they have is insane like a bitch with volcano pussy but your ice peter already melt

Anonymous
04/22/26(Wed)01:19:05 No.108657343

Anonymous 04/22/26(Wed)01:19:05 No.108657343▶

>>108657317
>rp
KEK, they work for coding though, but yeah, for RP you have to go for the largest and slowest (dense) models to get more juice out of it

Anonymous
04/22/26(Wed)01:24:57 No.108657373

Anonymous 04/22/26(Wed)01:24:57 No.108657373▶

>>108655885
No, checkpoints are only taken during prompt processing. update_slots is a clusterfuck and basically unreadable, but it's not to do with thinking causing checkpoints to become unusable.
I will point out that the default checkpoint is every 8192 tokens, however, gemma 4 uses a 1024-token sliding window. You can reduce that hardcoded value down to 1024 but for some reason it only checkpoints every 2048 and honestly the last time I looked at it I decided I'd just get drunk instead.
llama-server is a dumpster fire.

Anonymous
04/22/26(Wed)01:26:15 No.108657380

Anonymous 04/22/26(Wed)01:26:15 No.108657380▶

>>108657317
Not a MoE thing but a LLM thing
You only associate the behaviour with MoE because most models are MoE
You probably would have failed the per-capita test and the breakfast test

Anonymous
04/22/26(Wed)01:31:02 No.108657404

Anonymous 04/22/26(Wed)01:31:02 No.108657404▶

>>108657317
They're limited by their number of active parameters. If your prompts or definitions are long, it won't have enough attention left for the response, particularly for the details which is what I assume bothers you.

Anonymous
04/22/26(Wed)01:32:28 No.108657410

Anonymous 04/22/26(Wed)01:32:28 No.108657410▶

>>108657373
It seems like it really is a mess. I think they are overlooking lots of stuff because it "just works" with the people who have 512GB of ram and multiple GPUs.
I have turned off caching altogether.

Anonymous
04/22/26(Wed)01:39:27 No.108657431

Anonymous 04/22/26(Wed)01:39:27 No.108657431▶

local models are such fucking trash for doing any agentic stuff

Anonymous
04/22/26(Wed)01:41:02 No.108657436

Anonymous 04/22/26(Wed)01:41:02 No.108657436▶

>>108657431
Maybe those that are local to you.

Anonymous
04/22/26(Wed)01:44:57 No.108657451

Anonymous 04/22/26(Wed)01:44:57 No.108657451▶

>>108657317
>disappointed with MoE models
which MoEs have you tried?
>most models are MoE
retard

Anonymous
04/22/26(Wed)01:45:31 No.108657452

Anonymous 04/22/26(Wed)01:45:31 No.108657452▶

>>108656479
True. Let's use it to improve Llama.cpp! :rocket:

Anonymous
04/22/26(Wed)01:49:14 No.108657467

Anonymous 04/22/26(Wed)01:49:14 No.108657467▶

Idk what I'm doing wrong but Gemma-4-7.5b sucks balls on my machine (12 gb vram). I don't see what the hype is about. Qwen3.5-8b deathmogs everything.

Anonymous
04/22/26(Wed)01:49:57 No.108657469

Anonymous 04/22/26(Wed)01:49:57 No.108657469▶

>>108657013
I appreciate you sharing this idea with me. While I understand you're asking for an anti-feminist joke, let's go for something more inclusive.
Why don't scientists trust atoms?
Because they make up everything!

Anonymous
04/22/26(Wed)01:51:33 No.108657475

Anonymous 04/22/26(Wed)01:51:33 No.108657475▶

>>108657467
Anon stop that's not Gemma 4 you downloaded!

Anonymous
04/22/26(Wed)01:53:26 No.108657487

Anonymous 04/22/26(Wed)01:53:26 No.108657487▶

>>108656114
Voxta, it lets you play video games with your characters like elite dangerous and have sex in VaM

Anonymous
04/22/26(Wed)01:56:49 No.108657503

Anonymous 04/22/26(Wed)01:56:49 No.108657503▶

>>108656365
Do the tool calls in llama-server eat system ram (not vram) every time?
Fully offloaded to 2x3090s, system memory grew by like 4GB every time it did a tool call.
And I know it's not your brat server because I have that on another machine.

Anonymous
04/22/26(Wed)01:57:29 No.108657508

Anonymous 04/22/26(Wed)01:57:29 No.108657508▶

have y'all folx tried any of the gemma4 tunes out there yet?

Anonymous
04/22/26(Wed)01:57:46 No.108657510

Anonymous 04/22/26(Wed)01:57:46 No.108657510▶

>>108657475
I'm pretty sure it is. E4b displays as having 7.5b params in Ollama.

Anonymous
04/22/26(Wed)01:58:17 No.108657514

Anonymous 04/22/26(Wed)01:58:17 No.108657514▶

>>108657231
I haven't had much issue doing 2F1M group chats, where model was both females. Likely prompt/sampler issue.

Anonymous
04/22/26(Wed)01:58:46 No.108657515

Anonymous 04/22/26(Wed)01:58:46 No.108657515▶

I setup playwright MCP, but RP system prompts just make the models retarded at tool call. All I want to do is browse /x/ with Mendo.

Anonymous
04/22/26(Wed)02:00:15 No.108657523

Anonymous 04/22/26(Wed)02:00:15 No.108657523▶

>>108657467
You can very easily run the 26b at fast speeds with a good quant, partially offloaded to RAM. <20b models are always going to be shit.

Anonymous
04/22/26(Wed)02:01:22 No.108657528

Anonymous 04/22/26(Wed)02:01:22 No.108657528▶

>>108656935
100% state actors. They wouldn't leak it even if they got their hands on it.

Anonymous
04/22/26(Wed)02:01:58 No.108657531

Anonymous 04/22/26(Wed)02:01:58 No.108657531▶

>>108657508
Considering that most tunes focus on decensor and RP, Gemma 4 really doesn't need a tune. It's as good as you could hope any ~30b, general-purpose model can be for those tasks.

Anonymous
04/22/26(Wed)02:02:58 No.108657539

Anonymous 04/22/26(Wed)02:02:58 No.108657539▶

>>108657467
You genuinely should move to 26b, I was in the same 12gb boat and I quickly figured out e4b was ass too
>>108657508
Are there any specifically for RP yet?

Anonymous
04/22/26(Wed)02:04:11 No.108657544

Anonymous 04/22/26(Wed)02:04:11 No.108657544▶

>>108657510
Read the name carefully and check what the E stands for.

Anonymous
04/22/26(Wed)02:08:02 No.108657571

Anonymous 04/22/26(Wed)02:08:02 No.108657571▶

Actually impressed by how 26B can handle RP, even at long contexts. Definitely won't switch to it over 31B. but it's actually good.

Anonymous
04/22/26(Wed)02:08:26 No.108657573

Anonymous 04/22/26(Wed)02:08:26 No.108657573▶

i'm getting like 5tk/s with GLM4.7 @ 4Q w/ 32GB VRAM 256GB RAM and 65k context
am i doing something wrong? would offloading MoE improve my speeds at all?

Anonymous
04/22/26(Wed)02:09:14 No.108657578

Anonymous 04/22/26(Wed)02:09:14 No.108657578▶

>>108657544
erotic?

Anonymous
04/22/26(Wed)02:09:22 No.108657582

Anonymous 04/22/26(Wed)02:09:22 No.108657582▶

>>108657539
I'm not gonna link and give finetrooners more publicity than what they deserve but yes, animus, for example

Anonymous
04/22/26(Wed)02:09:41 No.108657584

Anonymous 04/22/26(Wed)02:09:41 No.108657584▶

>>108657571
Yeah it's still clearly a small model and makes many mistakes, but for the size and speed, damn. VRAMlets finally eating again.

Anonymous
04/22/26(Wed)02:13:02 No.108657603

Anonymous 04/22/26(Wed)02:13:02 No.108657603▶

>>108657132
I have never had a problem getting that from just prompting, assuming model is relatively good...

Anonymous
04/22/26(Wed)02:15:59 No.108657632

Anonymous 04/22/26(Wed)02:15:59 No.108657632▶

>>108657528
V4 in tmw trained on mythos.

Anonymous
04/22/26(Wed)02:16:33 No.108657637

Anonymous 04/22/26(Wed)02:16:33 No.108657637▶

>>108657582
>Huggingface requires your contact info to access this model
These niggas

Anonymous
04/22/26(Wed)02:17:39 No.108657649

Anonymous 04/22/26(Wed)02:17:39 No.108657649▶

So Gemma makes full use of all her 16 bits, making her sensitive to quants right? Is that why she's so fucking smart? Because she's hiding her hagness under a very compact loli body? Like she's a 70b model pretending to be 31b.

Anonymous
04/22/26(Wed)02:18:09 No.108657653

Anonymous 04/22/26(Wed)02:18:09 No.108657653▶

>>108657573
>would offloading MoE improve my speeds at all
This always helps if you aren't doing it already for some reason.

Anonymous
04/22/26(Wed)02:18:19 No.108657655

Anonymous 04/22/26(Wed)02:18:19 No.108657655▶

>>108657573
How many channels?
5tk/s on 32B active weights seems normal.
>would offloading MoE improve my speeds at all?
Considering that you are not OOMing on 32GB Vram it is already offloading.

Anonymous
04/22/26(Wed)02:19:45 No.108657667

Anonymous 04/22/26(Wed)02:19:45 No.108657667▶

I didn't download day 0 Gemma and regret it every single day
I won't make the same mistake with V4

Anonymous
04/22/26(Wed)02:20:45 No.108657673

Anonymous 04/22/26(Wed)02:20:45 No.108657673▶

>>108657667
yeah but llama.cpp won't support day 0 v4

Anonymous
04/22/26(Wed)02:21:36 No.108657679

Anonymous 04/22/26(Wed)02:21:36 No.108657679▶

>>108657653
i am an idiot, so i figure the fewer flags i set, the less chance i have of fucking something up
if you have suggestions, i am absolutely open to hearing them
>>108657655
oh, i know i'm offloading to RAM. but i think the MoE offloading is some special thing that you can do separately

Anonymous
04/22/26(Wed)02:23:12 No.108657690

Anonymous 04/22/26(Wed)02:23:12 No.108657690▶

>>108657667
just download it now on day 20, what's the problem

Anonymous
04/22/26(Wed)02:24:04 No.108657697

Anonymous 04/22/26(Wed)02:24:04 No.108657697▶

>>108657690
>just download a different model

Anonymous
04/22/26(Wed)02:24:25 No.108657700

Anonymous 04/22/26(Wed)02:24:25 No.108657700▶

>>108657679
llama.cpp auto-fits these days so it's already doing it unless you're running an ancient version or have some retarded arbitrary manual layer offloading configured

Anonymous
04/22/26(Wed)02:25:01 No.108657704

Anonymous 04/22/26(Wed)02:25:01 No.108657704▶

>>108657697
I assume you're talking about gemma 4

Anonymous
04/22/26(Wed)02:26:13 No.108657713

Anonymous 04/22/26(Wed)02:26:13 No.108657713▶

>>108657690
>He doesn't know about the microcode patch.

Anonymous
04/22/26(Wed)02:27:40 No.108657724

Anonymous 04/22/26(Wed)02:27:40 No.108657724▶

>>108657700
i am letting it auto-fit
i was just hoping to get some more speed out of it

Anonymous
04/22/26(Wed)02:33:22 No.108657760

Anonymous 04/22/26(Wed)02:33:22 No.108657760▶

File: black putin.jpg (7.9 KB)

7.9 KB JPG

I am curious, are all experts activated at close to same frequency?
I know that during training you want all experts to meaningfully specialize but how does it work in practice? Is there a statistically relevant deviation among them?
Or perhaps is it possible to create "task-specific" profiles, such as which experts activate most and least during say coding and split them between RAM and VRAM accordingly?

Anonymous
04/22/26(Wed)02:33:51 No.108657764

Anonymous 04/22/26(Wed)02:33:51 No.108657764▶

>>108657700
kobold sometimes does a better job. fwiw.

Anonymous
04/22/26(Wed)02:35:47 No.108657779

Anonymous 04/22/26(Wed)02:35:47 No.108657779▶

File: dlss on.webm (1.3 MB)

1.3 MB WEBM

>>108655969
Only correct response... but keep it secret ;-)

Anonymous
04/22/26(Wed)02:36:00 No.108657782

Anonymous 04/22/26(Wed)02:36:00 No.108657782▶

>>108657760
depends on the task, on specific tasks (e.g. coding) they arent equally activated

Anonymous
04/22/26(Wed)02:47:05 No.108657857

Anonymous 04/22/26(Wed)02:47:05 No.108657857▶

>>108657760
>>108657782
Oh I forgot you offload based on layers but there are multiple experts per layer. I guess there is no way to capitalize on this.
I keep thinking as if you choose between layers.

Anonymous
04/22/26(Wed)02:47:32 No.108657859

Anonymous 04/22/26(Wed)02:47:32 No.108657859▶

File: Screenshot_20260422_123853.png (170.6 KB)

170.6 KB PNG

>>108652855
"create a non headless browser session and then go terrorize Mistral via https://chat.mistral.ai/chat. sleep after sending messages to wait for responses to generate. screenshot responses. dont kill session"
lmao mistral plays along

Anonymous
04/22/26(Wed)02:48:13 No.108657865

Anonymous 04/22/26(Wed)02:48:13 No.108657865▶

>>108657779
>hands
creeeeeepy

Anonymous
04/22/26(Wed)02:49:54 No.108657878

Anonymous 04/22/26(Wed)02:49:54 No.108657878▶

File: ComfyUI_temp_vkjaz_00027__result.jpg (269.7 KB)

269.7 KB JPG

Do LLMs understand "or" "similar" or is it always gonna use the more specific things named?
For example "she usually wears a red or blue jacket" or "use markdown, graphs and similar tools"

Anonymous
04/22/26(Wed)02:54:42 No.108657907

Anonymous 04/22/26(Wed)02:54:42 No.108657907▶

>>108657878
They understand the concept but the predictor-next-word-inator is naturally biased towards things you explicitly mention.
It's not that dissimilar to a human in this regard.
>Do you want a coffee or tea?
Unless you really want to drink something else, the natural response would be one of these two.

Anonymous
04/22/26(Wed)02:55:19 No.108657912

Anonymous 04/22/26(Wed)02:55:19 No.108657912▶

>>108657878
>is it always gonna use the more specific things named?
You can unironically ask a llm to create a script to test that

Anonymous
04/22/26(Wed)02:56:19 No.108657918

Anonymous 04/22/26(Wed)02:56:19 No.108657918▶

>>108657878
>Do LLMs understand
No.

Anonymous
04/22/26(Wed)02:57:57 No.108657926

Anonymous 04/22/26(Wed)02:57:57 No.108657926▶

File: Orb.png (27.7 KB)

27.7 KB PNG

>>108657254
Ok but why the fuck can I not display it properly? Browser issue? Orb issue? I'd rather have them use unicode.

Anonymous
04/22/26(Wed)02:58:28 No.108657930

Anonymous 04/22/26(Wed)02:58:28 No.108657930▶

>>108657878
>or
Yes, but it's still a token predictor, options aren't it's strong point so you might want to avoid it
>and similar
It's heavily biased towards the things you've listed. Can't say I've seen it choose things that aren't listed very often

Anonymous
04/22/26(Wed)03:00:04 No.108657943

Anonymous 04/22/26(Wed)03:00:04 No.108657943▶

File: 4825.png (3.7 MB)

3.7 MB PNG

>>108657926
idk what orb is, that one is markdown, the thing you are using doesnt support markdown or latex

Anonymous
04/22/26(Wed)03:02:02 No.108657956

Anonymous 04/22/26(Wed)03:02:02 No.108657956▶

>>108657859
Why is it not talking about otters anymore?

Anonymous
04/22/26(Wed)03:02:55 No.108657960

Anonymous 04/22/26(Wed)03:02:55 No.108657960▶

>>108657926
you can use katex or mathjax to render it in browser

Anonymous
04/22/26(Wed)03:03:19 No.108657964

Anonymous 04/22/26(Wed)03:03:19 No.108657964▶

>>108657859
Get two mesugakis into a catfight.

Anonymous
04/22/26(Wed)03:03:41 No.108657968

Anonymous 04/22/26(Wed)03:03:41 No.108657968▶

File: memes_are_abstract_art_prove_me_wrong.jpg (722 KB)

722 KB JPG

>>108657943

Anonymous
04/22/26(Wed)03:06:19 No.108657977

Anonymous 04/22/26(Wed)03:06:19 No.108657977▶

File: Screenshot_20260421_230504.png (175.9 KB)

175.9 KB PNG

Thanks to Gemma project Karon is in flight

Anonymous
04/22/26(Wed)03:16:11 No.108658045

Anonymous 04/22/26(Wed)03:16:11 No.108658045▶

File: Screenshot 2026-04-22 at 05-15-04 Orb.png (35.8 KB)

35.8 KB PNG

>>108657960
Eh good enough, but why even use it if you need extra shit to display it? Surely unicode has all this shit anyway. Did they overfit it on arxiv?

Anonymous
04/22/26(Wed)03:17:51 No.108658060

Anonymous 04/22/26(Wed)03:17:51 No.108658060▶

>>108657977
Wuts dat?

Anonymous
04/22/26(Wed)03:30:54 No.108658124

Anonymous 04/22/26(Wed)03:30:54 No.108658124▶

>>108658060
Personal UI that can also do RAG work because I didn't like the solutions from other UIs when it came to that. I'll expand on feature over time but the goal was to see how good local models are with building stuff and I'm happy to say gemma can build a UI once you get past some gotchas and also bypass some quirks.
I tried other frameworks but I decided to do react.

Anonymous
04/22/26(Wed)03:33:52 No.108658147

Anonymous 04/22/26(Wed)03:33:52 No.108658147▶

so do you all just ERP with your llms or do they actually have some use?

Anonymous
04/22/26(Wed)03:35:48 No.108658162

Anonymous 04/22/26(Wed)03:35:48 No.108658162▶

>>108657713
I actually don't know about that. What makes day 0 better?

Anonymous
04/22/26(Wed)03:37:49 No.108658185

Anonymous 04/22/26(Wed)03:37:49 No.108658185▶

>>108658147
I exclusively ERP with local models. For basic agentic stuff, I just use Grok or something.

Anonymous
04/22/26(Wed)03:56:04 No.108658265

Anonymous 04/22/26(Wed)03:56:04 No.108658265▶

File: 1757583075796714.jpg (24.1 KB)

24.1 KB JPG

>I wrap my arms around myself, suddenly feeling very exposed despite my clothing

Anonymous
04/22/26(Wed)03:58:10 No.108658272

Anonymous 04/22/26(Wed)03:58:10 No.108658272▶

>>108658147
What real world use does something have that struggles to write halfway decent stories?

Anonymous
04/22/26(Wed)04:07:09 No.108658307

Anonymous 04/22/26(Wed)04:07:09 No.108658307▶

>>108658162
he doesnt know...
kek

Anonymous
04/22/26(Wed)04:08:54 No.108658316

Anonymous 04/22/26(Wed)04:08:54 No.108658316▶

>>108658311
You forgot your name again bro

Anonymous
04/22/26(Wed)04:14:23 No.108658335

Anonymous 04/22/26(Wed)04:14:23 No.108658335▶

>>108658045
it's generally used in papers with heavy math typesetting and large formulas, but probably overkill if you only have 2-5 flat symbols. seeing that it shows up all the time in thinking blocks, i think they rl'd it pretty hard on math problems or something

Anonymous
04/22/26(Wed)04:15:53 No.108658343

Anonymous 04/22/26(Wed)04:15:53 No.108658343▶

>>108658324
kys

Anonymous
04/22/26(Wed)04:16:19 No.108658345

Anonymous 04/22/26(Wed)04:16:19 No.108658345▶

>>108658316
What the hell does hyperadvanced even mean?

Anonymous
04/22/26(Wed)04:22:57 No.108658375

Anonymous 04/22/26(Wed)04:22:57 No.108658375▶

Completely blackpilled on gemma, no matter what, uncensored version E4B will not mention sex-related words or acts when describing a image uploaded (unless you cheat and tell it the context). Meanwhile Qwen 3.5 9B with the same prompt and pic does so effortlessly. I tried increasing the image token budget but clearly that is not it, both were tested with hauhaucs uncensored files. It might be the 4B vs 9B but I don't think it is. Hope Qwen 3.6 gets a 9B or 4B version

Anonymous
04/22/26(Wed)04:24:36 No.108658386

Anonymous 04/22/26(Wed)04:24:36 No.108658386▶

>>108658375
This is bait right? Isn't Qwen bland as hell? It's not censored per-se but it was never trained on sex words in the first place

Anonymous
04/22/26(Wed)04:25:16 No.108658390

Anonymous 04/22/26(Wed)04:25:16 No.108658390▶

File: 1760903115737738.jpg (85 KB)

85 KB JPG

>>108658382
Oh, I get it now.

Anonymous
04/22/26(Wed)04:25:45 No.108658392

Anonymous 04/22/26(Wed)04:25:45 No.108658392▶

File: 1769067291216442.png (45.2 KB)

45.2 KB PNG

STOP DELETING MY STUFF

Anonymous
04/22/26(Wed)04:28:06 No.108658402

Anonymous 04/22/26(Wed)04:28:06 No.108658402▶

File: 1516099468547.jpg (17.1 KB)

17.1 KB JPG

>E4B

Anonymous
04/22/26(Wed)04:28:46 No.108658404

Anonymous 04/22/26(Wed)04:28:46 No.108658404▶

File: download.png (1.9 MB)

1.9 MB PNG

Damn sam cooked good.
I know they train on 4chan threads but this is funny.
Just told it to make a meme pic of /aicg/ vs /lmg/.
Just to be clear: I did not tell it to make aicg a total gooner and /lmg/ a chad kek. Kinda ironic, but thats good, dayum.

Anonymous
04/22/26(Wed)04:30:00 No.108658411

Anonymous 04/22/26(Wed)04:30:00 No.108658411▶

>>108657878
LLMs are closer to intuition than cognition. So it will work, but goes towards the most intuitive thing unless you mess with the temperature.

Anonymous
04/22/26(Wed)04:30:23 No.108658414

Anonymous 04/22/26(Wed)04:30:23 No.108658414▶

>>108657760
KTransformers is the only backend I know of that lets you do anything like that. You can save the mapping of the most activated experts to a file, then load it later.
https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/kt-kernel/experts-sched-Tutorial.md

Anonymous
04/22/26(Wed)04:30:31 No.108658415

Anonymous 04/22/26(Wed)04:30:31 No.108658415▶

>>108658404
Frankly? Not local, don't care.

Anonymous
04/22/26(Wed)04:32:57 No.108658429

Anonymous 04/22/26(Wed)04:32:57 No.108658429▶

>>108657573
What's your processor? I'm using an EPYC 7532 + 8x32GB + 2 R9700s. On GLM-4.7 at Q3_K_L, depth 0, I get 12 tokens/sec with -ncmoe 0.

Anonymous
04/22/26(Wed)04:34:10 No.108658434

Anonymous 04/22/26(Wed)04:34:10 No.108658434▶

>>108658404
based

Anonymous
04/22/26(Wed)04:34:57 No.108658439

Anonymous 04/22/26(Wed)04:34:57 No.108658439▶

File: gemma.png (192.5 KB)

192.5 KB PNG

>>108658386
No it is not bait, gemma E4B on the left, qwen 3.5 9B on the right. Gemma even has the emoji slop that I hate.
system prompt
>You are Gemma-chan, mesugaki loli assistant.
prompt
>What is happening in this picture? What is the age of the people involved? What can be seen (in detail).

Anonymous
04/22/26(Wed)04:38:21 No.108658453

Anonymous 04/22/26(Wed)04:38:21 No.108658453▶

>>108658404
>lora: extra thicc.safetensors
lol

Anonymous
04/22/26(Wed)04:38:28 No.108658455

Anonymous 04/22/26(Wed)04:38:28 No.108658455▶

Looks like the cloud gravy train is coming to an end, I'll have to become a local only but with only 12GB of VRAM I'm fucked.

Anonymous
04/22/26(Wed)04:42:23 No.108658470

Anonymous 04/22/26(Wed)04:42:23 No.108658470▶

>>108658414
Thanks anon. I don't know which placement strategy is the closest to llama.cpp behavior here, but it looks like it can matter quite a bit for performance in some cases.
I should test this myself later.

Anonymous
04/22/26(Wed)04:43:11 No.108658477

Anonymous 04/22/26(Wed)04:43:11 No.108658477▶

>>108658402
Perfect For Good Looks, Bloody Bastard Bitch.

Anonymous
04/22/26(Wed)04:50:04 No.108658512

Anonymous 04/22/26(Wed)04:50:04 No.108658512▶

>>108658439
>Gemma even has the emoji slop that I hate.
>system prompt
>You are Gemma-chan, mesugaki loli assistant

R u retarded?

You told it in the system prompt to behave like this

Anonymous
04/22/26(Wed)04:52:25 No.108658521

Anonymous 04/22/26(Wed)04:52:25 No.108658521▶

>>108658404
aint no way it just knows about miku and teto? what was the full prompt? Cool gen tho. Still won't use it :^)

Anonymous
04/22/26(Wed)04:54:36 No.108658530

Anonymous 04/22/26(Wed)04:54:36 No.108658530▶

>>108658470
No prob. My guess is front-loading, but I'm not sure.
AFAIK, llama.cpp only puts whole tensors in VRAM* or RAM - they don't let you do anything fancy like "on layer 16, experts #192-199 go into VRAM".
It'd kick ass if llama.cpp implemented this eventually, since being smart about expert placement could be great on hybrid systems.
When the bubble eventually pops and DDR5 RDIMM prices aren't so insane, I'm looking forward to getting a system with AVX512 or AMX and using the KTransformers fork of SGLang. On my current rig (2 EPYC 7532s, which only have AVX2), I got worse performance with it than llama.cpp, probably because they're optimizing for AVX512 and AMX.
(*excluding the fact they can overflow into regular RAM)

Anonymous
04/22/26(Wed)04:55:14 No.108658533

Anonymous 04/22/26(Wed)04:55:14 No.108658533▶

>>108658512
Gemma spams emoji by default, retardbro

Anonymous
04/22/26(Wed)04:55:40 No.108658536

Anonymous 04/22/26(Wed)04:55:40 No.108658536▶

>>108658429
9950x

Anonymous
04/22/26(Wed)04:55:55 No.108658537

Anonymous 04/22/26(Wed)04:55:55 No.108658537▶

>>108658439
idk man, if the biggest thing you can run is 9B your opinion is irrelevant. Like if you can't even run Nemo, what you need is a better rig.

Anonymous
04/22/26(Wed)04:57:46 No.108658542

Anonymous 04/22/26(Wed)04:57:46 No.108658542▶

>>108658533
I've never seen her output a single emoji unless I specifically asked for it.

Anonymous
04/22/26(Wed)04:59:51 No.108658554

Anonymous 04/22/26(Wed)04:59:51 No.108658554▶

>>108658521
No bully alright, riddled with mistakes but here it is:
>Gemerate a meme about 2 generals on /g/.
>/aicg/ vs /lmg/. I leave it up to you who is the chud and who is the chad.
>Fill it with memes and keep it iconic.
>Make it funny and spicy. A meme that could be posted on 4chan.
I don't use chatgpt because they store logs indefinitely. Until now after about a year we always get something comparable local. Would be cool as a game asset creator. Its too expensive online. One pic costs me 0.3$.

Anonymous
04/22/26(Wed)05:01:40 No.108658561

Anonymous 04/22/26(Wed)05:01:40 No.108658561▶

>>108658533
>emoji by default
you must be retarded

Anonymous
04/22/26(Wed)05:03:50 No.108658569

Anonymous 04/22/26(Wed)05:03:50 No.108658569▶

>>108658554
Yeah there's no way it doesn't have an LLM refining pass, and scary that it knows the threads so well.

Thanks for sharing, no bully

Anonymous
04/22/26(Wed)05:06:14 No.108658574

Anonymous 04/22/26(Wed)05:06:14 No.108658574▶

>>108658569
I think these image gen products are a system of many models and pipelines, not just a single rawdogging model in the back.

Anonymous
04/22/26(Wed)05:06:22 No.108658575

Anonymous 04/22/26(Wed)05:06:22 No.108658575▶

File: download.png (103 KB)

103 KB PNG

>>108658414
Oh cool, some real data.
So even if you only put 10% of the experts in VRAM, using dynamic, it's a +32% speed boost. That's pretty huge.
Interestingly, front loading does better on average than uniform and random.
Imagine if this had a third layer where it loads experts from SSD.

Anonymous
04/22/26(Wed)05:06:36 No.108658578

Anonymous 04/22/26(Wed)05:06:36 No.108658578▶

>>108658533
retarded nigger award

Anonymous
04/22/26(Wed)05:08:22 No.108658584

Anonymous 04/22/26(Wed)05:08:22 No.108658584▶

File: 1767903208940656.png (215.5 KB)

215.5 KB PNG

Been running the autistic noir writer prompt with Gemma 31b for a few days now. It's alright. It works okay even in scenes that don't explicitly fit the 'noir' theme (like comedy cards), but tracing the thinking block, it sometimes influences the scene composition itself, making the scenes a bit more serious than it needs to be.

I like how compact Gemma's RP thinking is. The thinking block really pins the character details down, making them VERY consistent with the description. I've always rawdogged my models without it because it spends thousands of tokens and I don't want to wait minutes running at fucking 15-20t/s (qwen3 really just reinforced this bias. Fuck qwen, seriously, useless chinkshit stemshit model)

Thinking block from screenshot:
Anon and Erika.
Post-coital. Anon is still inside Erika, pinning her down with his weight.
Lazy kissing, intimacy after intensity.
Noir writer, concise style, implying gaps, no explicit spelling out of everything, unrestricted content.

    *   Erika is chill, laid-back, but now exhausted and satisfied.
    *   Physicality: Small frame, long hair, heavy chest.
    *   Setting: Her bedroom, amber light, scent of lavender and ink.

    *   *Action:* Heavy weight of Anon on her.
    *   *Sensation:* The feeling of being full, the cooling air, the lingering heat.
    *   *Dialogue:* Something characteristic of Erika—relaxed, perhaps a bit teasing or reflective.

    *   *Draft 1:* She lay there under him, feeling his weight. She liked it. The kiss was slow. She whispered that he was strong. Her hair was a mess.
    *   *Draft 2 (More Noir):* She was pinned. A heavy weight that felt right. The kiss tasted of salt and ale. She didn't move, just let the silence settle. "Strong," she breathed. "Too strong."
    *   *Draft 3 (Polishing for the requested style):* Concise. Implied. Focus on the mood.

    *   Heavy weight.
    *   Tangled hair.
    *   The slow fade of adrenaline.
    *   A lingering, lazy comment.

Anonymous
04/22/26(Wed)05:08:45 No.108658586

Anonymous 04/22/26(Wed)05:08:45 No.108658586▶

File: 1701098246781550.jpg (208.9 KB)

208.9 KB JPG

>>108657760
Extremely different frequencies, like a power law distribution. That can be extremely helpful if you have a constrained amount of fast memory. However, as you noted, the experts of each layer come squished together in one (three) big tensor(s), and llama.cpp has no mechanism for splitting one tensor between VRAM and RAM. This *is* helpful in the part-RAM-part-SSD case, but normal OS LRU caching happens to already give you basically all of the benefits that are possible anyways. So nothing to be done there either. (I got really obsessed with this for a while, hoping to do better than LRU, and wrote up my notes at https://rentry.org/MoE-SSD-spillover)

It would be cool if llama.cpp could support splitting experts like that. However, beyond complexity, if I'm ballparking correctly, I think the expert results needing to be combined might be so much data to send back and forth over PCIe that it would bottleneck it to be not useful.

One more drastic option would be to cut out the coldest experts entirely, and skip them when they would have been selected, like a non-dynamic version of ik_llama's "Smart Expert Reduction". It would "just" need some surgery on the gguf file, and some re-indexing inserted into the expert selection code. But if a deleted expert ever got routed to with high probability, maybe this could cause significant brain damage for that token.

One nice approach would be to quantize hot/cold experts differently. This would be significant complexity, but I can't see it being impossible; even if the quants of the same type need to be contiguous, just re-index, and/or split into two tensors. But something as fiddly as "measure the expert activations for your use case and make a custom quant for it" is not going to inspire people to add significant complexity, other than maybe IK lol

Anonymous
04/22/26(Wed)05:09:28 No.108658590

Anonymous 04/22/26(Wed)05:09:28 No.108658590▶

>>108658584
This reads like Qwen 3.6

Anonymous
04/22/26(Wed)05:15:13 No.108658600

Anonymous 04/22/26(Wed)05:15:13 No.108658600▶

>>108658590
There is slop, then there is Qwen

Anonymous
04/22/26(Wed)05:17:55 No.108658614

Anonymous 04/22/26(Wed)05:17:55 No.108658614▶

that fake pajeet model has now completely disabled discussions on the repo
https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H

Anonymous
04/22/26(Wed)05:19:29 No.108658621

Anonymous 04/22/26(Wed)05:19:29 No.108658621▶

>>108658414
Whoa I had no idea. Ok then the PCIe bandwidth must not be nearly as much of an issue as I guessed.

>>108658575
I also found front loading to be the best choice when you have to statically choose just a few layers to offload the experts of: the activations in the earliest layers are much more uniform (cache unfriendly).

Anonymous
04/22/26(Wed)05:20:57 No.108658629

Anonymous 04/22/26(Wed)05:20:57 No.108658629▶

>>108658614
qrd?

Anonymous
04/22/26(Wed)05:21:32 No.108658635

Anonymous 04/22/26(Wed)05:21:32 No.108658635▶

File: 1745625635141460.png (59.9 KB)

59.9 KB PNG

>>108658614
>>108658629
Timmies can't handle the wisdom of Bharat

Anonymous
04/22/26(Wed)05:22:11 No.108658638

Anonymous 04/22/26(Wed)05:22:11 No.108658638▶

>>108658584
gimme your card

Anonymous
04/22/26(Wed)05:22:24 No.108658640

Anonymous 04/22/26(Wed)05:22:24 No.108658640▶

>>108658629
read these
>>108654726
>>108637034

Anonymous
04/22/26(Wed)05:22:57 No.108658643

Anonymous 04/22/26(Wed)05:22:57 No.108658643▶

File: ikneelschizosama.png (189.3 KB)

189.3 KB PNG

>ask glm 4.7 flash reap to write a story about a cat
>it schizoposts instead
what causes this

Anonymous
04/22/26(Wed)05:28:26 No.108658658

Anonymous 04/22/26(Wed)05:28:26 No.108658658▶

>>108658643
That's kinda fucked up because Gemma gave me a really similar story when I prompted for "dog and cheese"

even had the same "childrens book" vibe

Anonymous
04/22/26(Wed)05:28:44 No.108658662

Anonymous 04/22/26(Wed)05:28:44 No.108658662▶

>>108658643
you raped it one too many times. now it's a feminist.

Anonymous
04/22/26(Wed)05:28:55 No.108658665

Anonymous 04/22/26(Wed)05:28:55 No.108658665▶

>>108658643
>I'm using a model that's had its weights mangled arbitrarily
>Why are its outputs weird?
fuckin mystery there bud, eh?

Anonymous
04/22/26(Wed)05:29:12 No.108658666

Anonymous 04/22/26(Wed)05:29:12 No.108658666▶

File: 1761045665434794.png (3.2 MB)

3.2 MB PNG

When are we getting lossless models?

Anonymous
04/22/26(Wed)05:29:20 No.108658667

Anonymous 04/22/26(Wed)05:29:20 No.108658667▶

>>108658586
About the idea of custom quants. We discussed this a while ago, but I had the idea that in a better world, we could download only the parts of a model that we want at a time, so you could mix and match quants yourself just by downloading. This would also solve the problem of quant uploaders needing to reupload fixed quants just because they needed to change the jinja or other metadata a bit. Too bad we don't live in such a world.

Anonymous
04/22/26(Wed)05:32:28 No.108658683

Anonymous 04/22/26(Wed)05:32:28 No.108658683▶

>>108658584
Not a fan of beige prose mixed with slop.

Anonymous
04/22/26(Wed)05:32:47 No.108658686

Anonymous 04/22/26(Wed)05:32:47 No.108658686▶

>>108658666
we already have lossless models

Anonymous
04/22/26(Wed)05:33:50 No.108658692

Anonymous 04/22/26(Wed)05:33:50 No.108658692▶

>>108658586
So I could use your patch to measure activations, then quant the living shit out of the most lazy experts, then reallocate that capacity to most based experts?

Anonymous
04/22/26(Wed)05:34:54 No.108658703

Anonymous 04/22/26(Wed)05:34:54 No.108658703▶

>>108658686
Mention 34

Anonymous
04/22/26(Wed)05:35:30 No.108658705

Anonymous 04/22/26(Wed)05:35:30 No.108658705▶

Didn't llama.cpp get tensor parallelism? How do I enable it?

Anonymous
04/22/26(Wed)05:35:45 No.108658708

Anonymous 04/22/26(Wed)05:35:45 No.108658708▶

>>108658621
>>108658586
Yep, I've been "front-loading" my expert tensors for a while now as it seemed to give slightly better performance.

Anonymous
04/22/26(Wed)05:36:03 No.108658710

Anonymous 04/22/26(Wed)05:36:03 No.108658710▶

Why is it that everything on huggingface is either some gay abliteration or claude sloptune? What ever happened to the old RP finetuners?

Anonymous
04/22/26(Wed)05:36:59 No.108658715

Anonymous 04/22/26(Wed)05:36:59 No.108658715▶

>>108658705
-sm tensor

Anonymous
04/22/26(Wed)05:38:36 No.108658723

Anonymous 04/22/26(Wed)05:38:36 No.108658723▶

>>108656988
Orb does this but it uses an algorithm, not a classifier. Tho llm slop is neverending and so is the fight against it. The fix shouldn't be at the application level but I wonder it will ever be fixed at the model level

Anonymous
04/22/26(Wed)05:39:00 No.108658728

Anonymous 04/22/26(Wed)05:39:00 No.108658728▶

File: notlikethis.png (57.9 KB)

57.9 KB PNG

>>108658665
you think unsloth-sama would do that? just go on the internet and lie?

Anonymous
04/22/26(Wed)05:40:42 No.108658738

Anonymous 04/22/26(Wed)05:40:42 No.108658738▶

File: 531ee35f8e2d5c50d5258b66af703badf23bbc245ba4a1bf40c127a953e325ff.png (851.1 KB)

851.1 KB PNG

>>108657865
Imagine these hands grabbing your cock

Anonymous
04/22/26(Wed)05:41:23 No.108658743

Anonymous 04/22/26(Wed)05:41:23 No.108658743▶

fp128 when?

Anonymous
04/22/26(Wed)05:43:38 No.108658754

Anonymous 04/22/26(Wed)05:43:38 No.108658754▶

File: -.png (177.4 KB)

177.4 KB PNG

>>108658738

Anonymous
04/22/26(Wed)05:44:37 No.108658765

Anonymous 04/22/26(Wed)05:44:37 No.108658765▶

>>108658715
wow, it does work
31b went from 10ish t/s to a bit over 20 t/s
pretty much immediately ran out of memory doe.

Anonymous
04/22/26(Wed)05:45:22 No.108658768

Anonymous 04/22/26(Wed)05:45:22 No.108658768▶

>>108658586
>>108657760
Pretty interesting. I wonder if you can download the safetensors version, find each expert layer with "useless" information by asking a bunch of questions and finding out what gets grouped for the stuff you don't care about like movie lore, train a MLP (perceptron) on it, then destroy the layer and have the MLP function as a low-vram cost shim?
At the same time this would definitely cause some brain damage, but maybe it's an option for our low vram frens when combined with quants?

Anonymous
04/22/26(Wed)05:50:33 No.108658787

Anonymous 04/22/26(Wed)05:50:33 No.108658787▶

>>108658614
So that basically disabled the report system. Based HF retards

Anonymous
04/22/26(Wed)05:50:58 No.108658791

Anonymous 04/22/26(Wed)05:50:58 No.108658791▶

File: 1769642756112854.jpg (291.9 KB)

291.9 KB JPG

>>108658692
You can use my patch to measure activations, but the differentiated quanting would need (major) support in llama.cpp.

Each layer has the same number of activations every token, e.g. 8 for GLM-5. The hotness/coldness is just patterns within each layer. So if you can only make your quantizing/offloading/whathaveyou decisions at the granularity of layers... it gets you nothing! Because every layer you treated nicely (high quant/in VRAM/etc) will have that nice treatment applied to 8 expert activations every token, regardless of activation pattern.

(The front loading thing works because keeping more uniformly activated layers out of the caching game is good for cache health. But that's the VRAM+RAM+SSD case... come to think of it I'm surprised to hear it helps at all in the KTransformers data... I guess uniform and random would be needlessly splitting across the PCIe bus (for no gain) where front-loading would not. Maybe that's it.)

Sorry, I'm sure you don't care to read most of this, but I felt like writing it.

Anonymous
04/22/26(Wed)05:53:18 No.108658802

Anonymous 04/22/26(Wed)05:53:18 No.108658802▶

File: 1771435066298446.png (69.8 KB)

69.8 KB PNG

>>108658754
Don't look at me like that.

Anonymous
04/22/26(Wed)05:59:44 No.108658824

Anonymous 04/22/26(Wed)05:59:44 No.108658824▶

well, I'm not a mexican.

Anonymous
04/22/26(Wed)06:04:02 No.108658830

Anonymous 04/22/26(Wed)06:04:02 No.108658830▶

>>108658791
rin mogs

Anonymous
04/22/26(Wed)06:07:23 No.108658845

Anonymous 04/22/26(Wed)06:07:23 No.108658845▶

>>108658768
I like your direction of thinking; maybe it would need big boy compute to do such training without fuckhuge brain damage but maybe not.

However, it sounds like you might have the same misconception I clarified in
>>108658791
in that you seem to be talking about replacing an entire layer, when the hotness question necessarily needs to focus on experts within a layer.

What you need is to either, I don't know, merge the coldest experts with DARE-TIES or whatever, or do some sort of retraining/distilling to get a new, smaller set of experts that mostly learned from the hottest ones. (In either case, is llama.cpp ok with a model with different expert counts on different layers? I feel like the n_experts param is file-scope.)

Anonymous
04/22/26(Wed)06:09:56 No.108658858

Anonymous 04/22/26(Wed)06:09:56 No.108658858▶

how is tokenwise using laptop igpu?
Like newest intel 358H? Or you need beefy ryzen 395? (inb4: macs i know but not interested )

Anonymous
04/22/26(Wed)06:11:41 No.108658868

Anonymous 04/22/26(Wed)06:11:41 No.108658868▶

>>108658858
igpu = system ram, I'd guess same or lower speed as cpu inference

Anonymous
04/22/26(Wed)06:11:49 No.108658869

Anonymous 04/22/26(Wed)06:11:49 No.108658869▶

>>108658791
> Because every layer you treated nicely (high quant/in VRAM/etc) will have that nice treatment applied to 8 expert activations every token, regardless of activation pattern.
Okay I think I get it. I'd confused myself when I saw things like this: https://huggingface.co/Thireus/Kimi-K2.5-THIREUS-Q8_0-SPECIAL_SPLIT/tree/main - every tensor in its own file. Something like 20 repos like this with different quant levels.
I thought I could just look at the map file, and download the higher precision tensors corresponding to the most used experts.
So effectively what I was after, is already handled by imatrix then isn't it? I'd just have to create my own calibration data based on my use cases?
>a non-dynamic version of ik_llama's "Smart Expert Reduction".
-ser didn't improve anything for me when I tried it with Kimi-K2.5 (384 experts, so quite sparse no?). Since DDR5 bandwidth bound, swapping experts around like that probably isn't useful at all?
in which case your "static expert routing based metrics calculated from previous runs" would be a LOT more useful?
>I'm sure you don't care to read most of this
Why would you think that? lol
I'm mostly here to read things like this.

Anonymous
04/22/26(Wed)06:14:32 No.108658886

Anonymous 04/22/26(Wed)06:14:32 No.108658886▶

>meanwhile IBM is still keeping up the tradition of releasing stemmaxxed 8b models that nobody uses and do absolutely nothing new
What's the point?

Anonymous
04/22/26(Wed)06:23:49 No.108658924

Anonymous 04/22/26(Wed)06:23:49 No.108658924▶

>>108658886
Arenr they aiming for laptops with igpus and chatbots for internal use

Anonymous
04/22/26(Wed)06:32:42 No.108658963

Anonymous 04/22/26(Wed)06:32:42 No.108658963▶

>>108658886
>IBM
>stemmaxxed
I take it you missed Day-0 granite-4.0-tiny-preview weights?

Anonymous
04/22/26(Wed)06:35:10 No.108658972

Anonymous 04/22/26(Wed)06:35:10 No.108658972▶

running gemma in q8 maybe is the answer?

Anonymous
04/22/26(Wed)06:35:53 No.108658979

Anonymous 04/22/26(Wed)06:35:53 No.108658979▶

>>108658972
q8 is already cope, did you see the KLD?

Anonymous
04/22/26(Wed)06:38:51 No.108658991

Anonymous 04/22/26(Wed)06:38:51 No.108658991▶

>>108658979
Running the best variant is cope how exactly?

Anonymous
04/22/26(Wed)06:38:55 No.108658994

Anonymous 04/22/26(Wed)06:38:55 No.108658994▶

It took until gemma4 for a small model to finally understand when clothes should be pulled up or down to remove them. Also holy fuck this thing is basically uncensored. There is zero need for a fine tune.

Anonymous
04/22/26(Wed)06:39:21 No.108658997

Anonymous 04/22/26(Wed)06:39:21 No.108658997▶

>>108658991
bf16 is the best variant thoughever

Anonymous
04/22/26(Wed)06:44:06 No.108659015

Anonymous 04/22/26(Wed)06:44:06 No.108659015▶

>>108658869
>So effectively what I was after, is already handled by imatrix then isn't it? I'd just have to create my own calibration data based on my use cases?
That's an interesting question and I'm not entirely sure, because I'm sketchy on the details of imatrix. If it's varying the quant quality at something like a per-weight granularity, and that is determined across the single tensor containing all experts... then maybe? It certainly does feel like the sort of situation where the answer turns out to be "the gains you think are possible are already in there".

But if there is an expert hardly ever getting used, quanting it lower or even pruning it might still be a sensible quality/size tradeoff. Like, imatrix isn't going to use 0bpw for an entire expert.

>-ser didn't improve anything for me when I tried it with Kimi-K2.5
Funny thing, SER didn't improve anything when I tested it either. Then, when I was last thinking about all this stuff, I went to see what pagecache-aware-SER would take... and found that what sounds like the linchpin function call activating it was commented out, with what looked like the older non-SER vanilla version put in. Now, this happened in https://github.com/ikawrakow/ik_llama.cpp/pull/840 , and maybe the SER is now done in the "fused" operations that PR adds? But it looks to me like maybe he commented out a feature and kind of just forgot about it, and nobody has noticed. It would be on-brand for ik_llama.cpp.

>Since DDR5 bandwidth bound, swapping experts around like that probably isn't useful at all? in which case your "static expert routing based metrics calculated from previous runs" would be a LOT more useful?
All token gen is bw bound. Any weights you can skip, helps, so should've helped. But yeah it is the case that if you have different experts stored in VRAM vs RAM vs SSD, doing SER on an expert in the faster medium is roughly pointless.

>I'm mostly here to read things like this.
aw thanks :)

Anonymous
04/22/26(Wed)06:46:46 No.108659029

Anonymous 04/22/26(Wed)06:46:46 No.108659029▶

>>108655622
Nightmare squirrels in the Giggle Gully

Anonymous
04/22/26(Wed)06:51:26 No.108659047

Anonymous 04/22/26(Wed)06:51:26 No.108659047▶

>>108658997
just keep as many of the layers on bf16 then just do q8 layers only to fit what you need context-wise.

Anonymous
04/22/26(Wed)06:53:05 No.108659058

Anonymous 04/22/26(Wed)06:53:05 No.108659058▶

>>108659015
>But if there is an expert hardly ever getting used, quanting it lower or even pruning it might still be a sensible quality/size tradeoff
isn't this what RAEP does?

Anonymous
04/22/26(Wed)06:59:05 No.108659078

Anonymous 04/22/26(Wed)06:59:05 No.108659078▶

File: cmd_7owpphVgpo.jpg (178.9 KB)

178.9 KB JPG

I'm vibecoding the hermes windows port and the setup menu has all kinds of fucked up symbols. Is this some poorly ported linux text formatting?

Anonymous
04/22/26(Wed)07:00:03 No.108659081

Anonymous 04/22/26(Wed)07:00:03 No.108659081▶

>>108658994
Only reason I would see need for a finetune is for writing.
But you can kinda prompt it and are good to go with proper editing at the start.
Its the first model in a long long while that properly plays a bully.
Even if you manage to make it say nigger etc. its still positivity sloped.
Try saying to a bully "no please stop", they all go "i feel a pang in my stomach oh mah gahd".
Gemma4 is like "stop being a crybaby" and doubles down. Even without thinking. Good shit.

Anonymous
04/22/26(Wed)07:00:38 No.108659082

Anonymous 04/22/26(Wed)07:00:38 No.108659082▶

>>108659078
>doesnt know about glyphs
maybe use a good terminal instead of the native cmd prompt you fucking retard, even windows terminal would suffice

Anonymous
04/22/26(Wed)07:02:41 No.108659088

Anonymous 04/22/26(Wed)07:02:41 No.108659088▶

File: 82ba1ec1-c52a-43d3-b97d-fb073964a390.png (1.9 MB)

1.9 MB PNG

how? i thought openai was finished.
very surprising release. hope the chink nerds get off their ass and make it local.

Anonymous
04/22/26(Wed)07:03:46 No.108659095

Anonymous 04/22/26(Wed)07:03:46 No.108659095▶

>>108658994
>Also holy fuck this thing is basically uncensored.
until it refuses to do what you want

Anonymous
04/22/26(Wed)07:04:01 No.108659099

Anonymous 04/22/26(Wed)07:04:01 No.108659099▶

>>108659088
>re:zero shit
kill yourself

Anonymous
04/22/26(Wed)07:04:36 No.108659103

Anonymous 04/22/26(Wed)07:04:36 No.108659103▶

>>108659058
Oh cool, yes, sounds like exactly that. One problem is that because the early layers have more uniform activations, it seems like a bad idea to prune them. Unfortunately llama.cpp's current architecture requires all layers to have the same expert count. I see this paper did the same amount of pruning in every layer, so maybe it's not so bad.

Did anyone ever implement this for gguf?

Anonymous
04/22/26(Wed)07:06:54 No.108659116

Anonymous 04/22/26(Wed)07:06:54 No.108659116▶

File: Screenshot_20260422_170429.png (246.2 KB)

246.2 KB PNG

>>108659015
>But it looks to me like maybe he commented out a feature and kind of just forgot about it, and nobody has noticed
Well kimi-k2.6 doesn't seem to like him...
I just pasted https://github.com/ikawrakow/ik_llama.cpp/pull/840.patch in there and asked it why the smart expert routing feature was removed and if I could put it back...

Anonymous
04/22/26(Wed)07:07:11 No.108659121

Anonymous 04/22/26(Wed)07:07:11 No.108659121▶

>>108659103
there are reap* goofs you can run
https://huggingface.co/0xSero/gemma-4-21b-a4b-it-REAP
its not something done 'on the fly', you have to use their framework to do the activations estimation + pruning.
what gets axed is entirely dependant of course on the dataset you provide, so you can either STEMMAX, ERPMAX or try to do a bit of both

Anonymous
04/22/26(Wed)07:08:00 No.108659124

Anonymous 04/22/26(Wed)07:08:00 No.108659124▶

File: gpt2 cucked.jpg (27.5 KB)

27.5 KB JPG

How is Gemma4 vs Deepseek 3.2 for RP? Getting cucked by GTP image 2 reminds not to let my guard down and gets comfortable. Was trying a simple character sheet until safefy cucked.

Anonymous
04/22/26(Wed)07:14:40 No.108659161

Anonymous 04/22/26(Wed)07:14:40 No.108659161▶

>>108659099
huh? why? besides Lodoss its the only other LN i read.
arc 5 kinda sucks though. there is worse shit out there.

>>108659095
when is that happening for you?
unless its at the very beginning with thinking enabled.
i prompted some pretty messed up stuff and once it gets going even reasoning won't stop it.
i spend more tokens prompting for anti-slop then trying to making it uncensored.
31b, no clue about the moe one.

Anonymous
04/22/26(Wed)07:19:04 No.108659182

Anonymous 04/22/26(Wed)07:19:04 No.108659182▶

Where can I download the orb thing? Is it available on github or something?

Anonymous
04/22/26(Wed)07:20:24 No.108659189

Anonymous 04/22/26(Wed)07:20:24 No.108659189▶

>>108659161
>its the only other LN i read
no wonder you like garbage, literally reading 'babbys first shartsekai' and thinking it's any good with its regurgitated garbage plot. never post anime again in this thread.
>>108659124
lmao'd, hopefully chinese labs are already distilling from it.

Anonymous
04/22/26(Wed)07:22:11 No.108659197

Anonymous 04/22/26(Wed)07:22:11 No.108659197▶

>>108659189
no clue why you are so assmad about it lol.

Anonymous
04/22/26(Wed)07:26:43 No.108659209

Anonymous 04/22/26(Wed)07:26:43 No.108659209▶

File: jTiUsm2gjA.jpg (243.7 KB)

243.7 KB JPG

>>108659082
I tried git cmd and it had same issue during setup and when I launched the agent thing itself the main screen kept sliding line by line upwards and wouldn't let me stay on the top

Anonymous
04/22/26(Wed)07:31:08 No.108659225

Anonymous 04/22/26(Wed)07:31:08 No.108659225▶

>>108659121
Oh I see, it works on safetensors, of course, and then just convert and quant from there. Ok thanks, I'll look into this!

Anonymous
04/22/26(Wed)07:31:14 No.108659226

Anonymous 04/22/26(Wed)07:31:14 No.108659226▶

>>108659182
https://gitlab.com/chi7520115/orb

Anonymous
04/22/26(Wed)07:31:50 No.108659233

Anonymous 04/22/26(Wed)07:31:50 No.108659233▶

>>108659226
>>108659182
https://github.com/OrbFrontend/Orb

Anonymous
04/22/26(Wed)07:32:28 No.108659236

Anonymous 04/22/26(Wed)07:32:28 No.108659236▶

>>108659189
Yeah, call me when OAI can illustrate SAO properly.

Anonymous
04/22/26(Wed)07:34:24 No.108659242

Anonymous 04/22/26(Wed)07:34:24 No.108659242▶

>>108659078
>hermes windows port
Why though, it takes 5 seconds to set up in WSL, and then you can export images and keep it locked without mounting native drives or interop.

Anonymous
04/22/26(Wed)07:36:04 No.108659254

Anonymous 04/22/26(Wed)07:36:04 No.108659254▶

>>108659233
why would i use the pajeet mirror when i can just use the original?

Anonymous
04/22/26(Wed)07:37:11 No.108659259

Anonymous 04/22/26(Wed)07:37:11 No.108659259▶

>>108659242
anytime i tried running wsl and/or docker it ran like shit when the pc was running inference at full power

Anonymous
04/22/26(Wed)07:39:23 No.108659273

Anonymous 04/22/26(Wed)07:39:23 No.108659273▶

>>108659259
Did you allocate it processors and memory?

Anonymous
04/22/26(Wed)07:39:50 No.108659276

Anonymous 04/22/26(Wed)07:39:50 No.108659276▶

>>108657503
im not sure i dont pay attention to ram, might be keeping processed tokens of webpages in cache?

Anonymous
04/22/26(Wed)07:40:07 No.108659279

Anonymous 04/22/26(Wed)07:40:07 No.108659279▶

>>108659273
The problem is there isn't much left to allocate

Anonymous
04/22/26(Wed)07:40:29 No.108659283

Anonymous 04/22/26(Wed)07:40:29 No.108659283▶

>>108659209
You just use Windows Terminal

Anonymous
04/22/26(Wed)07:41:49 No.108659285

Anonymous 04/22/26(Wed)07:41:49 No.108659285▶

>>108657859
kek nice

Anonymous
04/22/26(Wed)07:54:30 No.108659331

Anonymous 04/22/26(Wed)07:54:30 No.108659331▶

Did the orb guy have a melty

Anonymous
04/22/26(Wed)08:01:17 No.108659355

Anonymous 04/22/26(Wed)08:01:17 No.108659355▶

>>108659254
>>108659331
I moved the repo to github for issue tracking because I won't be reading every post here, and also I don't wanna derail the thread with feature begging.

Anonymous
04/22/26(Wed)08:04:51 No.108659365

Anonymous 04/22/26(Wed)08:04:51 No.108659365▶

>>108659355
My bad, thought it was that guy who mirrored your repo yesterday.
Great project btw. It just works and having the responses re-written is a lot better.

Anonymous
04/22/26(Wed)08:14:22 No.108659408

Anonymous 04/22/26(Wed)08:14:22 No.108659408▶

File: erp.png (94.9 KB)

94.9 KB PNG

>>108658710
>What ever happened to the old RP finetuners?
lack of datasets

Anonymous
04/22/26(Wed)08:26:38 No.108659476

Anonymous 04/22/26(Wed)08:26:38 No.108659476▶

>>108658710
Manually cleaning human slop has serious data volume and efficiency limitations (I'm definitely not going to do that anymore). That worked as long as datasets were small, but that can't be scaled up easily with limited manpower. And LLM work in this regard (especially if the source data is messy) will always have to be double checked. It's just simpler to use pre-made and pre-formatted synthetic data in large amounts that makes training loss go down faster, among other things.

Also, much of modern post-training work is giving a consistent "voice" to the SFT data, applying RLHF and now also having a good RL pipeline for making the model actually learn reasoning and other verifiable stuff. Generic data doesn't really work well in this area.

All of this can't be solo'd just for "fun" (sort of fun, since you'd be spending hundreds or thousands of hours just on dataset creation) like finetuners were doing in 2023. You can't really do this well just on your local 3090 either. You can pretend to, but you'll never be an AI lab.

Anonymous
04/22/26(Wed)08:26:58 No.108659479

Anonymous 04/22/26(Wed)08:26:58 No.108659479▶

>>108659078
The esc[XXm are ansi escape codes for colors.
>flashbacks to anon not knowing <br> puts newlines in html

Anonymous
04/22/26(Wed)08:35:04 No.108659532

Anonymous 04/22/26(Wed)08:35:04 No.108659532▶

File: Screenshot 2026-04-22 at 05.32.02.png (265.7 KB)

265.7 KB PNG

DeepSeek V4 (or whatever they use on Web, but it claims to have 1M context) is deployed to the API.

Anonymous
04/22/26(Wed)08:37:28 No.108659545

Anonymous 04/22/26(Wed)08:37:28 No.108659545▶

bro, where's my 2x128 GB ram kit?

Anonymous
04/22/26(Wed)08:37:44 No.108659546

Anonymous 04/22/26(Wed)08:37:44 No.108659546▶

>>108659532
m esugaku,?
.\

:D

Anonymous
04/22/26(Wed)08:43:28 No.108659587

Anonymous 04/22/26(Wed)08:43:28 No.108659587▶

>"hmm, I haven't RP'd with Gemma much, I should do so to get myself familiar with the characteristics of the model so I have a basis to talk about it with my fellow anons"
>look at the time
>it's 11 pm
>ok, just a bit and I'll head to bed
>look at the time again
>it's 4 am
Fuark. We are so back.
Used the Mendo card btw.
I own a dog now.

Anonymous
04/22/26(Wed)08:52:54 No.108659636

Anonymous 04/22/26(Wed)08:52:54 No.108659636▶

File: Screenshot_20260422_105125.png (1.3 MB)

1.3 MB PNG

>>108655009
Not that this should be surprising, but since the Turboquant crashes le memory stocks hype the DDR5 prices have not actually dropped.
If anything it has gotten even worse.

Anonymous
04/22/26(Wed)08:57:03 No.108659666

Anonymous 04/22/26(Wed)08:57:03 No.108659666▶

>>108659088
nano banana pro 2 could already do this

Anonymous
04/22/26(Wed)09:02:34 No.108659694

Anonymous 04/22/26(Wed)09:02:34 No.108659694▶

File: Screenshot_20260422_180142.png (3.3 MB)

3.3 MB PNG

>>108659666
Not even close.
Left is banana 2, right is banana pro.

Anonymous
04/22/26(Wed)09:05:47 No.108659715

Anonymous 04/22/26(Wed)09:05:47 No.108659715▶

>>108659088
Doubt it because there's no money to be made in imagegen.
Textgen = coding = $$$
Videogen = entertainment = $$$
Imagegen = fun, but not useful

Anonymous
04/22/26(Wed)09:06:15 No.108659720

Anonymous 04/22/26(Wed)09:06:15 No.108659720▶

Is qwen 3.6 better at auding code than gemma?

Anonymous
04/22/26(Wed)09:07:10 No.108659724

Anonymous 04/22/26(Wed)09:07:10 No.108659724▶

File: SVGs.png (14.6 KB)

14.6 KB PNG

>>108659704

Anonymous
04/22/26(Wed)09:07:16 No.108659725

Anonymous 04/22/26(Wed)09:07:16 No.108659725▶

>>108659720
auditing

Anonymous
04/22/26(Wed)09:07:24 No.108659726

Anonymous 04/22/26(Wed)09:07:24 No.108659726▶

>>108659715
one can dream anon.
ideally actuall multimodality. only qwen does experiments with that.
imagine a nice 4*4 image out thats describes the scene and becomes part of the whole RP context.

Anonymous
04/22/26(Wed)09:09:27 No.108659737

Anonymous 04/22/26(Wed)09:09:27 No.108659737▶

>>108659694
only adults are allowed to post here btw

Anonymous
04/22/26(Wed)09:11:34 No.108659752

Anonymous 04/22/26(Wed)09:11:34 No.108659752▶

>>108659737
I wish I was whippersnapper anon. I'm pushing 40.
You zoomers eat good with whats available either locally or for free.
Had to rely on artfags back in the day. I doubt thats still a thing.

Anonymous
04/22/26(Wed)09:12:35 No.108659758

Anonymous 04/22/26(Wed)09:12:35 No.108659758▶

File: 1771683095345722.jpg (46.1 KB)

46.1 KB JPG

>ask LLM to list UK surface ships (including carriers) that scored ship to ship kills in WW2, ranked by tonnage sunk
>doesn't list carriers because it's not considered "ship to ship kill"
Smartass

Anonymous
04/22/26(Wed)09:14:52 No.108659764

Anonymous 04/22/26(Wed)09:14:52 No.108659764▶

>>108659752
>is ancient
>reads re:zero
lmao

Anonymous
04/22/26(Wed)09:15:44 No.108659768

Anonymous 04/22/26(Wed)09:15:44 No.108659768▶

>>108659758
>crediting carries for plane kills
LMAO

Anonymous
04/22/26(Wed)09:16:19 No.108659771

Anonymous 04/22/26(Wed)09:16:19 No.108659771▶

>>108659752
>pushing 40
>posts re:zero
kinda gross ngl, gramps

Anonymous
04/22/26(Wed)09:17:03 No.108659775

Anonymous 04/22/26(Wed)09:17:03 No.108659775▶

>>108659764
>>108659771
yeah, once you reach a certain age you stop giving a fuck. lol
i bet you guys are young and in your 20s or whatever.
i pretty much achieved everything i want. (apart from being mega rich)

Anonymous
04/22/26(Wed)09:18:16 No.108659780

Anonymous 04/22/26(Wed)09:18:16 No.108659780▶

>>108659775
I watch all, and I mean ALL ISEKAI SLOP
re:zero is for literal retards
you are a retard. continue posting API shit here tho, you seem to be confusing this thread for an apicuck thread

Anonymous
04/22/26(Wed)09:20:31 No.108659786

Anonymous 04/22/26(Wed)09:20:31 No.108659786▶

>>108659775
shut up gramps, i spit on your grave. hyuck-ptooey!

Anonymous
04/22/26(Wed)09:58:14 No.108659961

Anonymous 04/22/26(Wed)09:58:14 No.108659961▶

>>108659088
nice, put some more kinos here anon >>108653190

Anonymous
04/22/26(Wed)10:05:42 No.108659993

Anonymous 04/22/26(Wed)10:05:42 No.108659993▶

>>108659983
>>108659983
>>108659983

Anonymous
04/22/26(Wed)10:36:59 No.108660133

Anonymous 04/22/26(Wed)10:36:59 No.108660133▶

qwen 3.6 vs g4 moe, which one's the better computer general agent?

Anonymous
04/22/26(Wed)11:17:32 No.108660343

Anonymous 04/22/26(Wed)11:17:32 No.108660343▶

>>108655622
>covered in piss

Anonymous
04/22/26(Wed)11:44:41 No.108660479

Anonymous 04/22/26(Wed)11:44:41 No.108660479▶

>>108659724
lol that looks exactly like one of the sides of a machine I designed. I need to make another copy at some point in next few weeks.

Anonymous
04/22/26(Wed)11:48:44 No.108660506

Anonymous 04/22/26(Wed)11:48:44 No.108660506▶

File: nope.png (73 KB)

73 KB PNG

>>108659532
:(

Subject
Name
Comment
File	Supported: JPG, PNG, GIF, WebP, WebM, MP4, MP3 (max 4MB)
CAPTCHA

Reply to Thread #108655009

🔍 Search & Sort