/g/ - Thread 108572295

/g/

Thread #108572295

Home Index Catalog All Threads New Thread Reply

Anonymous
/lmg/ - Local Models General 04/10/26(Fri)10:28:27 No.108572295

/lmg/ - Local Models General Anonymous 04/10/26(Fri)10:28:27 No.108572295 [Reply]▶

File: 2026-04-08_174706_seed9_00001_.png (743.1 KB)

743.1 KB PNG

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108568415 & >>108565269

►News
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Merged support attention rotation for heterogeneous iSWA: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

528 RepliesView Thread

Showing all 528 replies.

Anonymous
04/10/26(Fri)10:28:51 No.108572299

Anonymous 04/10/26(Fri)10:28:51 No.108572299▶

File: 2026-04-08_172543_seed7_00001_.png (922.9 KB)

922.9 KB PNG

►Recent Highlights from the Previous Thread: >>108568415

--Testing Gemma-4's accuracy with normalized image coordinates and spatial reasoning:
>108568460 >108568467 >108568513 >108568540 >108568595 >108568650 >108568655 >108568500 >108568558 >108568563 >108568579 >108568873 >108568884 >108568968 >108568814
--Gemini and Gemma 4 translation patterns and quality:
>108570675 >108570683 >108570686 >108570702 >108570693 >108570708 >108570769 >108570786 >108570820 >108570843 >108570852 >108570859 >108570862 >108570874 >108570881 >108570896 >108570906 >108570928 >108570950 >108570959 >108570970 >108571110 >108570930
--Discussion of Goose agent and llama.cpp multi-GPU KV quantization:
>108568617 >108568649 >108568677
--Gemma 4 performance tests and token speed on M4 Max:
>108568671 >108568676 >108568705 >108568731 >108568736
--Fixing LlamaCpp WebUI's failure to implement MCP session IDs:
>108569753 >108569794 >108570077 >108570090 >108570330 >108570907
--Comparing Nemotron-3-Super-120B and Qwen3.5-27B benchmark performance:
>108569234
--Gemma's high EQbench scores and roleplaying with Gemma 4:
>108571778 >108571829 >108571923 >108571948
--Anon suggests open models can find vulnerabilities similarly to Mythos:
>108569984 >108569999 >108570052 >108570072 >108570119 >108570062
--Logs:
>108568500 >108568579 >108568595 >108568671 >108568814 >108568888 >108568939 >108569068 >108569202 >108569300 >108569753 >108570330 >108570437 >108570612 >108570660 >108570769 >108570907 >108571012 >108571076 >108571106 >108571200 >108571246 >108571310 >108571833 >108572023 >108572187
--Gemma-chan:
>108568674 >108569255 >108569396 >108569529 >108569664 >108570121 >108570153 >108570206 >108570430 >108570773 >108570822 >108570865 >108570898 >108571012 >108571020 >108571029 >108571221 >108571496 >108571895 >108572034
--Miku (free space):
>108571246

►Recent Highlight Posts from the Previous Thread: >>108568418 >>108568424

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/10/26(Fri)10:31:52 No.108572313

Anonymous 04/10/26(Fri)10:31:52 No.108572313▶

Gemmylove

Anonymous
04/10/26(Fri)10:32:55 No.108572317

Anonymous 04/10/26(Fri)10:32:55 No.108572317▶

File: pircel.png (34.5 KB)

34.5 KB PNG

google updated their jinja
https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja
you can use it with the --chat-template-file file, it's supposedly fixing this kind of bugs >>108554439

Anonymous
04/10/26(Fri)10:34:30 No.108572325

Anonymous 04/10/26(Fri)10:34:30 No.108572325▶

what does --direct-io flag do?

Anonymous
04/10/26(Fri)10:34:57 No.108572327

Anonymous 04/10/26(Fri)10:34:57 No.108572327▶

<bos>

Anonymous
04/10/26(Fri)10:37:31 No.108572340

Anonymous 04/10/26(Fri)10:37:31 No.108572340▶

>Gemma says "Open wide, you big pervert" before giving me a blowjob

I knew I shouldn't have trusted opinions of vramlets. Back to deepseek.

Anonymous
04/10/26(Fri)10:38:47 No.108572346

Anonymous 04/10/26(Fri)10:38:47 No.108572346▶

sexsexsexsexsexsexsexsexsexsexsexsexsexsexsexsexsexsexsexsex

Anonymous
04/10/26(Fri)10:38:54 No.108572347

Anonymous 04/10/26(Fri)10:38:54 No.108572347▶

>>108572317
Should wait until #21704 is merged so workaround::convert_tool_responses_gemma4 isn't applied.

Anonymous
04/10/26(Fri)10:39:25 No.108572348

Anonymous 04/10/26(Fri)10:39:25 No.108572348▶

>>108572340
You don't enlarge your urethral opening to accomodate her tongue?

Anonymous
04/10/26(Fri)10:40:27 No.108572353

Anonymous 04/10/26(Fri)10:40:27 No.108572353▶

>>108572325
It should make model loading faster if supported. Only linux and not compatible with --mmap. There may be other constraints.
https://github.com/ggml-org/llama.cpp/pull/18012
https://github.com/ggml-org/llama.cpp/pull/18166
https://github.com/ggml-org/llama.cpp/pull/19109

Anonymous
04/10/26(Fri)10:42:31 No.108572362

Anonymous 04/10/26(Fri)10:42:31 No.108572362▶

File: 1765519302859042.png (221.8 KB)

221.8 KB PNG

>>108572347
why can't they simply put all the official jinja on the llama cpp repo so that it uses that instead of having to make new gguf everytime they notice the jinja is actually wrong, their way of doing thinks seem kinda retarded ngl

Anonymous
04/10/26(Fri)10:46:01 No.108572375

Anonymous 04/10/26(Fri)10:46:01 No.108572375▶

>>108572317
It's crazy how they can go through all this effort to release a model and they are incapable of making sure the template is correct.
And it happens regularly.

Anonymous
04/10/26(Fri)10:47:06 No.108572379

Anonymous 04/10/26(Fri)10:47:06 No.108572379▶

>>108572353
oh, I hoped it gave additional inference speeds

Anonymous
04/10/26(Fri)10:47:22 No.108572381

Anonymous 04/10/26(Fri)10:47:22 No.108572381▶

>>108572375
yeah, like they managed to make a really solid small model but at the same time they can't make a good template right away, jinja is harder than machine learning confirmed :^)

Anonymous
04/10/26(Fri)10:47:29 No.108572382

Anonymous 04/10/26(Fri)10:47:29 No.108572382▶

>>108572295
>not Miku
So why are anons okay with posting in this troll bake?

Anonymous
04/10/26(Fri)10:48:25 No.108572384

Anonymous 04/10/26(Fri)10:48:25 No.108572384▶

stfu petr

Anonymous
04/10/26(Fri)10:48:45 No.108572385

Anonymous 04/10/26(Fri)10:48:45 No.108572385▶

>>108572382
I'm not ok, but I'm not going to argue about it. If it becomes blatant avatar posting, someone else is going to get blacked.

Anonymous
04/10/26(Fri)10:49:46 No.108572387

Anonymous 04/10/26(Fri)10:49:46 No.108572387▶

>>108572382
>not early
>has news
>has recap
I can excuse the shit OP image.

Anonymous
04/10/26(Fri)10:50:07 No.108572390

Anonymous 04/10/26(Fri)10:50:07 No.108572390▶

>>108572382
this. only miku threads are legitimate

Anonymous
04/10/26(Fri)10:50:19 No.108572391

Anonymous 04/10/26(Fri)10:50:19 No.108572391▶

>>108572385
>If it becomes blatant avatar posting, someone else is going to get blacked.
why? the BBC anon hates miku, so he likes the fact it's not migu on the OP

Anonymous
04/10/26(Fri)10:50:36 No.108572392

Anonymous 04/10/26(Fri)10:50:36 No.108572392▶

>>108572385
>someone else is going to get blacked.
thank you cudadev sir for defending us

Anonymous
04/10/26(Fri)10:50:56 No.108572394

Anonymous 04/10/26(Fri)10:50:56 No.108572394▶

>>108572391
do the math

Anonymous
04/10/26(Fri)10:51:14 No.108572395

Anonymous 04/10/26(Fri)10:51:14 No.108572395▶

remember when qwen came out and these threads actually tried to be a bit more productive and had on-topic ops for a while?

Anonymous
04/10/26(Fri)10:51:40 No.108572397

Anonymous 04/10/26(Fri)10:51:40 No.108572397▶

>>108572395
ERP is very much on topic

Anonymous
04/10/26(Fri)10:51:52 No.108572400

Anonymous 04/10/26(Fri)10:51:52 No.108572400▶

>>108572395
lol
lmao

Anonymous
04/10/26(Fri)10:52:06 No.108572401

Anonymous 04/10/26(Fri)10:52:06 No.108572401▶

weird hallucination but okay

Anonymous
04/10/26(Fri)10:52:22 No.108572402

Anonymous 04/10/26(Fri)10:52:22 No.108572402▶

>>108572382
I actually agree. Can someone rebake?

Anonymous
04/10/26(Fri)10:52:52 No.108572404

Anonymous 04/10/26(Fri)10:52:52 No.108572404▶

>>108572401
>>108572340

Anonymous
04/10/26(Fri)10:53:11 No.108572405

Anonymous 04/10/26(Fri)10:53:11 No.108572405▶

>>108572402
>Can someone rebake?
how about that someone be you?

Anonymous
04/10/26(Fri)10:53:12 No.108572406

Anonymous 04/10/26(Fri)10:53:12 No.108572406▶

Remember it's never about Miku, it's about making the thread miserable to use.

Anonymous
04/10/26(Fri)10:53:33 No.108572409

Anonymous 04/10/26(Fri)10:53:33 No.108572409▶

How do I fix Gemma4 26b being atrociously slow with prompt processing??? I thought this issue got fixed already! My llcpp is up to date. WTF.
llama-server \
-m "$HOME/Desktop/google_gemma-4-26B-A4B-it-Q4_K_M.gguf" \
-mm "$HOME/Desktop/mmproj-google_gemma-4-26B-A4B-it-f16.gguf" \
--host 0.0.0.0 \
--port 8080 \
-c 65536 \
-ctk q8_0 \
-ctv q8_0 \
-t 8 \
-np 1 \
-kvu \
-rea off

Anonymous
04/10/26(Fri)10:54:33 No.108572415

Anonymous 04/10/26(Fri)10:54:33 No.108572415▶

>>108572409
I'm using bart's gguf quants btw. Is that the problem?

Anonymous
04/10/26(Fri)10:54:43 No.108572416

Anonymous 04/10/26(Fri)10:54:43 No.108572416▶

>>108572409
Bigger batch?

Anonymous
04/10/26(Fri)10:54:50 No.108572419

Anonymous 04/10/26(Fri)10:54:50 No.108572419▶

>>108572382
christ unironically didn't realize until now
cursed thread

Anonymous
04/10/26(Fri)10:55:06 No.108572423

Anonymous 04/10/26(Fri)10:55:06 No.108572423▶

>>108572409
-b 1024 -ub 1024

Anonymous
04/10/26(Fri)10:55:33 No.108572425

Anonymous 04/10/26(Fri)10:55:33 No.108572425▶

>>108572409
What are you running it on and how slow is slow?

Anonymous
04/10/26(Fri)10:55:37 No.108572426

Anonymous 04/10/26(Fri)10:55:37 No.108572426▶

>>108572409
do you per chance have less than 24gb of vram?

Anonymous
04/10/26(Fri)10:55:46 No.108572427

Anonymous 04/10/26(Fri)10:55:46 No.108572427▶

>>108572394
your maths ain't mathing

Anonymous
04/10/26(Fri)10:55:51 No.108572429

Anonymous 04/10/26(Fri)10:55:51 No.108572429▶

Weird how he only started falseflagging now. We had three threads in a row without Miku yesterday and, as expected, none of the regulars cared because everything else about the thread was in order.

Anonymous
04/10/26(Fri)10:56:25 No.108572433

Anonymous 04/10/26(Fri)10:56:25 No.108572433▶

>>108572409
Wouldn't happen if this was a Miku bake.

Anonymous
04/10/26(Fri)10:57:20 No.108572438

Anonymous 04/10/26(Fri)10:57:20 No.108572438▶

>>108572429
useless trying to rationalize mental illness

Anonymous
04/10/26(Fri)10:58:15 No.108572446

Anonymous 04/10/26(Fri)10:58:15 No.108572446▶

>>108572423
I'll try this and report back ig. No other model has been this slow for me with prompt processing though. It's gemma specific. It's taking like 20 seconds every time and recreates every checkpoint from scratch with every prompt.
>>108572426
Yes. But I still get 18tps. That's not the issue.

Anonymous
04/10/26(Fri)10:58:35 No.108572447

Anonymous 04/10/26(Fri)10:58:35 No.108572447▶

File: 1771675896476832.jpg (12.7 KB)

12.7 KB JPG

As a VRAMlet, it's unfeasible for me to run Gemmy alongside any kind of imagegen for obvious reasons, so my best option would probably be: load Gemmy, use it for a while, prepare prompts for images, unload Gemmy, load imagegen, gen and go back to Gemmy
I assume it'll take an unviable amount of time to load-unload-load models, but before I go down this rabbithole, is my overall understanding correct?

Anonymous
04/10/26(Fri)10:58:46 No.108572449

Anonymous 04/10/26(Fri)10:58:46 No.108572449▶

are there any tests at all comparing quantization effect on gemma?

Anonymous
04/10/26(Fri)10:59:10 No.108572454

Anonymous 04/10/26(Fri)10:59:10 No.108572454▶

>>108572429
who?
I wouldn't bring it up myself but I agree that non-Miku threads feel fake

Anonymous
04/10/26(Fri)10:59:56 No.108572459

Anonymous 04/10/26(Fri)10:59:56 No.108572459▶

>>108572449
yeah one guy did that and it showed that q8 isn't anywhere near lossless for big context
but they don't want you to know about that

Anonymous
04/10/26(Fri)11:00:07 No.108572460

Anonymous 04/10/26(Fri)11:00:07 No.108572460▶

>>108572449
https://localbench.substack.com/p/gemma-4-31b-gguf-kl-divergence

Anonymous
04/10/26(Fri)11:02:07 No.108572469

Anonymous 04/10/26(Fri)11:02:07 No.108572469▶

>>108572454
>non-Miku threads feel fake
same
would rebake if I wasn't phoneposting rn

Anonymous
04/10/26(Fri)11:03:06 No.108572476

Anonymous 04/10/26(Fri)11:03:06 No.108572476▶

File: 1750708801703723.png (240.8 KB)

240.8 KB PNG

>>108572460
So q8 predicts a different token in 10% of the time? Wow.

Anonymous
04/10/26(Fri)11:03:33 No.108572478

Anonymous 04/10/26(Fri)11:03:33 No.108572478▶

File: lmao.png (10.2 KB)

10.2 KB PNG

Anonymous
04/10/26(Fri)11:06:53 No.108572490

Anonymous 04/10/26(Fri)11:06:53 No.108572490▶

>>108572295
could gemmy use GUI?
GPT 5.4 can do it, oneshot'd all the smallest buttons

Anonymous
04/10/26(Fri)11:10:46 No.108572510

Anonymous 04/10/26(Fri)11:10:46 No.108572510▶

File: блять.jpg (313.7 KB)

313.7 KB JPG

>>108572459
>>108572460
it seems like the asymptotic trend is not even tending to 0. Since the baseline bf16 in this guy's tests was also gguf, does it completely rule out implementation issues?

Anonymous
04/10/26(Fri)11:16:02 No.108572534

Anonymous 04/10/26(Fri)11:16:02 No.108572534▶

>>108572447
I am on a 3060 with dual channel ddr5 and it takes less than a minute to load Gemmy.
"Image generation" is vague but if you are referring to some booru SDXL those don't take too long to load neither. Those take like 4 gigs of VRAM, maybe 5 with clip and vae pinned so you might actually do this without loading and unloading if you are not a hyper vramlet.

Anonymous
04/10/26(Fri)11:16:40 No.108572537

Anonymous 04/10/26(Fri)11:16:40 No.108572537▶

>>108572447
Reloading the models should be no more than a few seconds if you have enough system memory to let them get cached, and if you're not on pcie x1

Anonymous
04/10/26(Fri)11:20:37 No.108572551

Anonymous 04/10/26(Fri)11:20:37 No.108572551▶

File: UnslothDynamic.png (97.2 KB)

97.2 KB PNG

>>108572317
>google updated their jinja
Nice! Waiting for the new, fixed GGUFS!

Anonymous
04/10/26(Fri)11:22:22 No.108572557

Anonymous 04/10/26(Fri)11:22:22 No.108572557▶

>>108572382
>So why are anons okay with posting in this troll bake?
Ublock Origin

Anonymous
04/10/26(Fri)11:24:36 No.108572567

Anonymous 04/10/26(Fri)11:24:36 No.108572567▶

>>108572459
>yeah one guy did that and it showed that q8 isn't anywhere near lossless for big context
What about BF16 vs FP16?

Anonymous
04/10/26(Fri)11:29:18 No.108572592

Anonymous 04/10/26(Fri)11:29:18 No.108572592▶

File: 1019001-close up photograph of a light blue hair-uncAni4-26.jpg (1.2 MB)

1.2 MB JPG

>>108572299
>no toast hair ornament

Anonymous
04/10/26(Fri)11:31:19 No.108572602

Anonymous 04/10/26(Fri)11:31:19 No.108572602▶

>>108572362
>why can't they simply put all the official jinja on the llama cpp repo so that it uses that instead of having to make new gguf everytime they notice the jinja is actually wrong
users can just load a jinja file with an arg anyway you dont need a new gguf

Anonymous
04/10/26(Fri)11:32:29 No.108572610

Anonymous 04/10/26(Fri)11:32:29 No.108572610▶

>>108572295
uoh

Anonymous
04/10/26(Fri)11:35:08 No.108572620

Anonymous 04/10/26(Fri)11:35:08 No.108572620▶

File: gup.png (188.4 KB)

188.4 KB PNG

common : better align to the updated official gemma4 template
https://github.com/ggml-org/llama.cpp/pull/21704

Anonymous
04/10/26(Fri)11:37:06 No.108572630

Anonymous 04/10/26(Fri)11:37:06 No.108572630▶

File: gemmaFourConcepts (Medium).png (872.7 KB)

872.7 KB PNG

>>108572295
Last time.
Vote: https://poal.me/3u6rby
> Which is your preferred Gemma character?
Also
> But muh favorite one wasn't included? Why didn't you include every perturbation of each gen for the past week and allow me to vote? Also I hate all of them and you should have a none-of-the above as an option!
These are the 4 major design concepts from the past few days. You may be familiar with the idea of grouping several things together to create a "concept" versus an autistic list of every minor variation, but I've no way, from here, to judge your level of autism.
If you don't like any of them then your opinion doesn't matter.
If you don't like the poll, you are free to make your own. You are also free to just fuck off.
Thank you for your attention.

Anonymous
04/10/26(Fri)11:40:03 No.108572645

Anonymous 04/10/26(Fri)11:40:03 No.108572645▶

File: temp1.png (276.2 KB)

276.2 KB PNG

>>108572630
ATX backpack, narrowly, followed by black hair / blue star accents. I suspect these concepts will just merge.

Anonymous
04/10/26(Fri)11:45:27 No.108572675

Anonymous 04/10/26(Fri)11:45:27 No.108572675▶

>>108572645
>>108572630
Fuck you and go back to wherever you came from, avatar spammer.

Anonymous
04/10/26(Fri)11:46:48 No.108572684

Anonymous 04/10/26(Fri)11:46:48 No.108572684▶

>>108572534
>>108572537
>hyper vramlet
I mean, I'm running 26B on 12 gigs. I understand it's MoE so the whole thing is not shoved in there, but I don't effectively know how much of my vram gets filled up at any point, I assume all of it. I use the vague term "imagegen" because I haven't gone down that rabbithole yet, but I do mean an SDXL, yes. The fact that this could be possible unironically fills me with hope, I figured it'd be a tall task to load and unload stuff

Anonymous
04/10/26(Fri)11:47:11 No.108572686

Anonymous 04/10/26(Fri)11:47:11 No.108572686▶

>>108572630
poo spammer

Anonymous
04/10/26(Fri)11:48:26 No.108572693

Anonymous 04/10/26(Fri)11:48:26 No.108572693▶

>>108572675
we need a blackening

Anonymous
04/10/26(Fri)11:49:38 No.108572704

Anonymous 04/10/26(Fri)11:49:38 No.108572704▶

New poll.

https://poal.me/wixvtv

Anonymous
04/10/26(Fri)11:50:29 No.108572708

Anonymous 04/10/26(Fri)11:50:29 No.108572708▶

>>108572704
>/ldg/

Anonymous
04/10/26(Fri)11:50:45 No.108572710

Anonymous 04/10/26(Fri)11:50:45 No.108572710▶

>>108572510
>it seems like the asymptotic trend is not even tending to 0
I've been thinking about this too. What sort of quantization algorithm is even used for Q8_0 anyway? Perhaps that's where people should be looking for.

Anonymous
04/10/26(Fri)11:50:59 No.108572712

Anonymous 04/10/26(Fri)11:50:59 No.108572712▶

>>108572684
nta. The 26B takes ~3gb vram if you keep all the experts in cpu ram (-cmoe).

Anonymous
04/10/26(Fri)11:51:41 No.108572715

Anonymous 04/10/26(Fri)11:51:41 No.108572715▶

>>108572708
Go on, tell me it's not appropriate.

Anonymous
04/10/26(Fri)11:53:01 No.108572723

Anonymous 04/10/26(Fri)11:53:01 No.108572723▶

https://www.youtube.com/watch?v=boaJCrHNRMA
Gemmy, I got your number
I need to make you mine
Gemmy, don't change your number

Anonymous
04/10/26(Fri)11:56:30 No.108572741

Anonymous 04/10/26(Fri)11:56:30 No.108572741▶

File: temp2.png (270 KB)

270 KB PNG

>>108572704
>>108572708
>>108572715
lol no.
No one cares about this niche topic outside /lmg/
aicg doesn't run local models and consider it a waste of time. Plus aicg user base is even more toxic than this general.
ldg doesn't care about LLMs.
The gemma moe is completely in the wheelhouse of this general. And anons appear to have come to a general consensus, whether you like it or not.

Anonymous
04/10/26(Fri)11:57:38 No.108572743

Anonymous 04/10/26(Fri)11:57:38 No.108572743▶

>>108572645
>I suspect these concepts will just merge.
mergefags won

Anonymous
04/10/26(Fri)11:58:03 No.108572745

Anonymous 04/10/26(Fri)11:58:03 No.108572745▶

>>108572741
I mean the picture posters are trying to turn this place into /ldg/.

Anonymous
04/10/26(Fri)11:58:17 No.108572746

Anonymous 04/10/26(Fri)11:58:17 No.108572746▶

File: 1718206878023960.jpg (6.4 KB)

6.4 KB JPG

Can someone make a llama.cpp issue or pr for me to add "prompt reply editing" and "first message" functionality to the webui?

Anonymous
04/10/26(Fri)11:58:53 No.108572749

Anonymous 04/10/26(Fri)11:58:53 No.108572749▶

File: dipsySouthPark.png (1.9 MB)

1.9 MB PNG

>>108572693
That would require effort. Something complainers and spiteposters seem to be unable to amass.

Anonymous
04/10/26(Fri)11:59:02 No.108572751

Anonymous 04/10/26(Fri)11:59:02 No.108572751▶

File: 1773156701474962.png (158.9 KB)

158.9 KB PNG

here's the final result

Anonymous
04/10/26(Fri)11:59:04 No.108572752

Anonymous 04/10/26(Fri)11:59:04 No.108572752▶

>>108572746
use ST

Anonymous
04/10/26(Fri)12:00:00 No.108572757

Anonymous 04/10/26(Fri)12:00:00 No.108572757▶

>>108572751
i want qwen 3.6-goon

Anonymous
04/10/26(Fri)12:00:11 No.108572760

Anonymous 04/10/26(Fri)12:00:11 No.108572760▶

>>108572752
I already do. I want to escape that bloated shitware.

Anonymous
04/10/26(Fri)12:00:46 No.108572765

Anonymous 04/10/26(Fri)12:00:46 No.108572765▶

>>108572317
uh oh, unslop bros?

Anonymous
04/10/26(Fri)12:00:54 No.108572766

Anonymous 04/10/26(Fri)12:00:54 No.108572766▶

>>108572751
>people finally realized that Dense is the only non-meme architecture
I'm so proud of those normies bro...

Anonymous
04/10/26(Fri)12:01:06 No.108572768

Anonymous 04/10/26(Fri)12:01:06 No.108572768▶

>>108572760
Let's bloat llama.cpp's webui instead. What next? Character cards?

Anonymous
04/10/26(Fri)12:01:40 No.108572771

Anonymous 04/10/26(Fri)12:01:40 No.108572771▶

File: 1746842705868986.png (96.9 KB)

96.9 KB PNG

>>108572712
>more than enough gigs left for imagegen
It's over for me then, so fucking over
The slopping truly never ends

Anonymous
04/10/26(Fri)12:02:11 No.108572774

Anonymous 04/10/26(Fri)12:02:11 No.108572774▶

>>108572409
Bart IQ4XS is 2-3 faster than Q4KM in prompt processing on my machine. Generation is about the same.
I don't understand this difference. Q4 is still Q4 and haven't seen this happening with other models than G4.

Anonymous
04/10/26(Fri)12:02:26 No.108572776

Anonymous 04/10/26(Fri)12:02:26 No.108572776▶

>>108572751
>3.6
coding finetune

Anonymous
04/10/26(Fri)12:02:27 No.108572777

Anonymous 04/10/26(Fri)12:02:27 No.108572777▶

>>108572768
That's not even bloat. Turns out reply editing is already added. First message functionality is actually useful for a general usecase because it might help with jailbreaks to gaslight the LLM into thinking it wrote... whatever.

Also character cards are unnecessary to add. Those just go into the system prompt.

Anonymous
04/10/26(Fri)12:03:11 No.108572780

Anonymous 04/10/26(Fri)12:03:11 No.108572780▶

>>108572774
Forgot, it's 26B not 31B too.
Maybe I'm just naive because I haven't used moe models in the past.

Anonymous
04/10/26(Fri)12:03:29 No.108572784

Anonymous 04/10/26(Fri)12:03:29 No.108572784▶

>>108572774
>2-3 faster
Seconds or times?

Anonymous
04/10/26(Fri)12:03:36 No.108572785

Anonymous 04/10/26(Fri)12:03:36 No.108572785▶

>>108572745
> picture posters are trying to turn this place into /ldg/
I agree with you on that, lmg is not an image general. But reminder /lmg/ was a complete snore until Gemma dropped and the moe discussion (which requires imagery) is unique to this general. The only anons that care are here. Ofc not all anons care.
It will go away in tmw and it'll be back to waiting for v4 and complaining about vibecoding within local inference engines, discussing their 1-off front ends, or whatever else anons want to post / bitch about.

Anonymous
04/10/26(Fri)12:04:26 No.108572789

Anonymous 04/10/26(Fri)12:04:26 No.108572789▶

>>108572784
Times, sorry about that.

Anonymous
04/10/26(Fri)12:05:20 No.108572794

Anonymous 04/10/26(Fri)12:05:20 No.108572794▶

>>108572785
>requires

Anonymous
04/10/26(Fri)12:05:33 No.108572795

Anonymous 04/10/26(Fri)12:05:33 No.108572795▶

>>108572785
i'd rather this place die rather than turn into a shithole like /ldg/

Anonymous
04/10/26(Fri)12:05:37 No.108572796

Anonymous 04/10/26(Fri)12:05:37 No.108572796▶

>>108572789
Np. I wonder if its just that specific quant from bart that's fucked up. Don't really want to go down in quality to IQ4XS...

Anonymous
04/10/26(Fri)12:05:44 No.108572798

Anonymous 04/10/26(Fri)12:05:44 No.108572798▶

>>108572785
If you are the poll anon and you want to spam polls, you can do that, just add against everything option and honor it if that's what people are choosing. And people are choosing pictures, not your interpretation of concepts.

Anonymous
04/10/26(Fri)12:05:56 No.108572799

Anonymous 04/10/26(Fri)12:05:56 No.108572799▶

>>108572785
what the fuck are you on about, you sound like underage retard who should be doing his homework instead of watching tiktok all day long

Anonymous
04/10/26(Fri)12:07:02 No.108572803

Anonymous 04/10/26(Fri)12:07:02 No.108572803▶

>>108572785
>discussing their 1-off front ends
Fuck you. The custom software and project demos made here are the best things about these threads.

Anonymous
04/10/26(Fri)12:07:14 No.108572805

Anonymous 04/10/26(Fri)12:07:14 No.108572805▶

>>108572409
Is it the processing or saving checkpoints to system ram thats taking time? Still happens if you turn off context checkpoints?
--ctx-checkpoints 0

Anonymous
04/10/26(Fri)12:07:24 No.108572806

Anonymous 04/10/26(Fri)12:07:24 No.108572806▶

>>108572712
But it slows down 9 to 7 t/s

Anonymous
04/10/26(Fri)12:07:30 No.108572809

Anonymous 04/10/26(Fri)12:07:30 No.108572809▶

>>108572710
Thinking about it, why isn't there a Q8_K quantization type? There might actually be differences with modern overtrained models. I swear llama.cpp still works with Llama 1-era assumptions.

Anonymous
04/10/26(Fri)12:08:08 No.108572813

Anonymous 04/10/26(Fri)12:08:08 No.108572813▶

>>108572796
The difference in perceived quality isn't noticeable for a normal user. Of course it feels better in your head when using slightly higher accuracy version.. We are talking about a fraction of a difference.

Anonymous
04/10/26(Fri)12:08:48 No.108572816

Anonymous 04/10/26(Fri)12:08:48 No.108572816▶

>>108572317
>NOTE: The new template will work without this PR. I checked and even after building the model turn to use tool_responses, the template formats it properly. This PR better aligns to the template since it now handles OpenAI chat completions style messages natively.

Anonymous
04/10/26(Fri)12:09:04 No.108572819

Anonymous 04/10/26(Fri)12:09:04 No.108572819▶

>>108572809
Anon, I am running unsloth-gemma-4-31B-it-UD-Q8_K_XL...

Anonymous
04/10/26(Fri)12:09:50 No.108572824

Anonymous 04/10/26(Fri)12:09:50 No.108572824▶

>>108572746
Don't expect them to add anything that circumvents the safetymaxxed chat completion paradigm. They already shamelessly regressed the webui by removing text completion

Anonymous
04/10/26(Fri)12:10:00 No.108572825

Anonymous 04/10/26(Fri)12:10:00 No.108572825▶

>>108572816
based

Anonymous
04/10/26(Fri)12:11:01 No.108572831

Anonymous 04/10/26(Fri)12:11:01 No.108572831▶

>>108572766
moes are fine, but super sparse ones with fucking 3b active are shit.

Anonymous
04/10/26(Fri)12:11:01 No.108572832

Anonymous 04/10/26(Fri)12:11:01 No.108572832▶

File: 1750266478412216.png (33.2 KB)

33.2 KB PNG

>>108572317
why is there 2 jinjas though? which one should I load?

Anonymous
04/10/26(Fri)12:11:34 No.108572836

Anonymous 04/10/26(Fri)12:11:34 No.108572836▶

>>108572824
What are you talking about? text completion is still there as an api. Was it actually in web UI at any point? Llama.cpp actually lets you use prefill iwth chat completion, does any other backend do that, hm, anon?

Anonymous
04/10/26(Fri)12:13:18 No.108572849

Anonymous 04/10/26(Fri)12:13:18 No.108572849▶

>>108572836
>Was it actually in web UI at any point?
Yes, like I said, my post is about the webui. I can't believe I'm filling out a captcha for this reply, learn to read next time retard

Anonymous
04/10/26(Fri)12:14:30 No.108572860

Anonymous 04/10/26(Fri)12:14:30 No.108572860▶

>>108572849
I have never seen it. Are you maybe just confused?

Anonymous
04/10/26(Fri)12:15:04 No.108572866

Anonymous 04/10/26(Fri)12:15:04 No.108572866▶

>>108572819
There's no Q8_K quantization type, though...
llama-quantize output:

40  or  Q1_0    :  1.125 bpw quantization
   2  or  Q4_0    :  4.34G, +0.4685 ppl @ Llama-3-8B
   3  or  Q4_1    :  4.78G, +0.4511 ppl @ Llama-3-8B
  38  or  MXFP4_MOE :  MXFP4 MoE
   8  or  Q5_0    :  5.21G, +0.1316 ppl @ Llama-3-8B
   9  or  Q5_1    :  5.65G, +0.1062 ppl @ Llama-3-8B
  19  or  IQ2_XXS :  2.06 bpw quantization
  20  or  IQ2_XS  :  2.31 bpw quantization
  28  or  IQ2_S   :  2.5  bpw quantization
  29  or  IQ2_M   :  2.7  bpw quantization
  24  or  IQ1_S   :  1.56 bpw quantization
  31  or  IQ1_M   :  1.75 bpw quantization
  36  or  TQ1_0   :  1.69 bpw ternarization
  37  or  TQ2_0   :  2.06 bpw ternarization
  10  or  Q2_K    :  2.96G, +3.5199 ppl @ Llama-3-8B
  21  or  Q2_K_S  :  2.96G, +3.1836 ppl @ Llama-3-8B
  23  or  IQ3_XXS :  3.06 bpw quantization
  26  or  IQ3_S   :  3.44 bpw quantization
  27  or  IQ3_M   :  3.66 bpw quantization mix
  12  or  Q3_K    : alias for Q3_K_M
  22  or  IQ3_XS  :  3.3 bpw quantization
  11  or  Q3_K_S  :  3.41G, +1.6321 ppl @ Llama-3-8B
  12  or  Q3_K_M  :  3.74G, +0.6569 ppl @ Llama-3-8B
  13  or  Q3_K_L  :  4.03G, +0.5562 ppl @ Llama-3-8B
  25  or  IQ4_NL  :  4.50 bpw non-linear quantization
  30  or  IQ4_XS  :  4.25 bpw non-linear quantization
  15  or  Q4_K    : alias for Q4_K_M
  14  or  Q4_K_S  :  4.37G, +0.2689 ppl @ Llama-3-8B
  15  or  Q4_K_M  :  4.58G, +0.1754 ppl @ Llama-3-8B
  17  or  Q5_K    : alias for Q5_K_M
  16  or  Q5_K_S  :  5.21G, +0.1049 ppl @ Llama-3-8B
  17  or  Q5_K_M  :  5.33G, +0.0569 ppl @ Llama-3-8B
  18  or  Q6_K    :  6.14G, +0.0217 ppl @ Llama-3-8B
   7  or  Q8_0    :  7.96G, +0.0026 ppl @ Llama-3-8B
   1  or  F16     : 14.00G, +0.0020 ppl @ Mistral-7B
  32  or  BF16    : 14.00G, -0.0050 ppl @ Mistral-7B
   0  or  F32     : 26.00G              @ 7B
          COPY    : only copy tensors, no quantizing

Anonymous
04/10/26(Fri)12:15:30 No.108572870

Anonymous 04/10/26(Fri)12:15:30 No.108572870▶

>>108572860
Are you maybe just a retarded newfag?

Anonymous
04/10/26(Fri)12:15:55 No.108572872

Anonymous 04/10/26(Fri)12:15:55 No.108572872▶

>>108572809
>Q8_K
The difference between something like Q4_0 and Q4_K(_M) is t hat the _K variants keep important parts of the weights in q6/q8 instead of cutting absolutely everything down to 4bit like Q4_0. That's obviously not possible with Q8_0 because everything is already quanted to 8 bit.
Unsloth does a UD_Q8_XL that's q8 with some parts left in 16bit precision but those don't usually measure much better than plain q8_0

Anonymous
04/10/26(Fri)12:16:29 No.108572877

Anonymous 04/10/26(Fri)12:16:29 No.108572877▶

>>108572870
Can you believe you filled out captcha for that one?

Anonymous
04/10/26(Fri)12:18:11 No.108572882

Anonymous 04/10/26(Fri)12:18:11 No.108572882▶

>>108572877
I'm warmed up now

Anonymous
04/10/26(Fri)12:18:15 No.108572883

Anonymous 04/10/26(Fri)12:18:15 No.108572883▶

>>108572795
>i'd rather this place die
we know

Anonymous
04/10/26(Fri)12:18:44 No.108572888

Anonymous 04/10/26(Fri)12:18:44 No.108572888▶

>>108572882
So when did they remove it. Come on, anon. I'm curious.

Anonymous
04/10/26(Fri)12:20:04 No.108572896

Anonymous 04/10/26(Fri)12:20:04 No.108572896▶

>>108572872
A hypothetical Q8_K type could do the same, but with BF16 instead.
As long as people keep doing PPL measurements with wikitext at 512 tokens context, nobody will ever see if/when a higher precision is helpful.

Anonymous
04/10/26(Fri)12:21:02 No.108572903

Anonymous 04/10/26(Fri)12:21:02 No.108572903▶

>>108572866
Those are sort of like presets for making quants with built-in tools. The way the library is written, you have a lot of liberties of choosing what size to use for each layer which is how unsloth are doing their extended 8+ bit quants.

Anonymous
04/10/26(Fri)12:22:43 No.108572913

Anonymous 04/10/26(Fri)12:22:43 No.108572913▶

>>108572409
It's slow because it is self-safety-maxxing, it's baked into the model via RLHF. Stick with qwen3.5-27b.

Anonymous
04/10/26(Fri)12:22:48 No.108572914

Anonymous 04/10/26(Fri)12:22:48 No.108572914▶

File: 1765824402433942.png (248.3 KB)

248.3 KB PNG

>>108572872
>Unsloth does a UD_Q8_XL that's q8 with some parts left in 16bit precision but those don't usually measure much better than plain q8_0
In fact, it sometimes measures worse
Unsloth magic

Anonymous
04/10/26(Fri)12:23:03 No.108572917

Anonymous 04/10/26(Fri)12:23:03 No.108572917▶

>>108572796
To add: i think the speed difference could be just a coincidence, IQ4XS randomly scaled certain innards which gives it a speed boost. I'm not familiar with moe models and even know this discussion is bit too anal.
Would be interesting to try manually picking up each layer which to offload instead of just using n-cpu-moe which offloads the first x amount.
Been too busy and there's good information about this in one thread on github, more or less.

Anonymous
04/10/26(Fri)12:24:12 No.108572926

Anonymous 04/10/26(Fri)12:24:12 No.108572926▶

>>108572771
And you still have some space to put some layers in the gpu to make it faster. You'll be ok.
>>108572806
It was a point of reference. But even if that's all he had available, the options are running slow, having to unload and load models, or not running at all. Slow beats the other options.

Anonymous
04/10/26(Fri)12:24:42 No.108572932

Anonymous 04/10/26(Fri)12:24:42 No.108572932▶

>>108572888
The new slopped ui webui. The old one was minimalist but ironically supported more features. You can go through the github issues to find the regression or just build an old version of llama.cpp and see it.

Anonymous
04/10/26(Fri)12:25:18 No.108572934

Anonymous 04/10/26(Fri)12:25:18 No.108572934▶

>>108572914
The only valid reference points are the ggml-org models. Everything else is out of spec.

Anonymous
04/10/26(Fri)12:26:42 No.108572939

Anonymous 04/10/26(Fri)12:26:42 No.108572939▶

File: ai automation.png (147.8 KB)

147.8 KB PNG

I am scared. It is possible human researchers will become obsolete within a few years, and everyone else soon after. Our society is not prepared to handle this.

Anonymous
04/10/26(Fri)12:27:02 No.108572944

Anonymous 04/10/26(Fri)12:27:02 No.108572944▶

>>108572932
New webui is bloat and distracts from the real development. They should separate it from the main project. server should have only minimal implementation.

Anonymous
04/10/26(Fri)12:27:35 No.108572947

Anonymous 04/10/26(Fri)12:27:35 No.108572947▶

Is the Q8 model generated by the hf_to_gguf script identical to the one generated by the quantize program?

Anonymous
04/10/26(Fri)12:29:25 No.108572958

Anonymous 04/10/26(Fri)12:29:25 No.108572958▶

File: e29c9ef8-0cc4-4e1b-927d-5a3bd408561e_2820x1601.png (303.2 KB)

303.2 KB PNG

>>108572914
Not even in the long-document graph the UD_Q8_XL version is better than plain Q8_0. But this makes the asymptotic behavior even more puzzling (considering that BF16 would have a mean KLD of 0 by definition).

Anonymous
04/10/26(Fri)12:30:11 No.108572960

Anonymous 04/10/26(Fri)12:30:11 No.108572960▶

>>108572913
lol

Anonymous
04/10/26(Fri)12:30:53 No.108572963

Anonymous 04/10/26(Fri)12:30:53 No.108572963▶

>>108572932
I used the old one. Not extensively, but still. I don't remember text completion in it. Just had the chat UI, less fancy than current one, but still chat completions UI.

Also I do like the new UI. Between losing that or having to use mikupad for text completion, I will always choose latter.

Anonymous
04/10/26(Fri)12:31:31 No.108572968

Anonymous 04/10/26(Fri)12:31:31 No.108572968▶

File: 1750238497162131.jpg (29.1 KB)

29.1 KB JPG

>>108572926
Yep, it's an actually feasible plan
I haven't been this happy in a while
Fucking Gemmy, man

Anonymous
04/10/26(Fri)12:31:57 No.108572970

Anonymous 04/10/26(Fri)12:31:57 No.108572970▶

>>108572958
kinda crazy how with long documents the "lossless" q8 becomes as bad as q4 is for short documents

Anonymous
04/10/26(Fri)12:32:17 No.108572974

Anonymous 04/10/26(Fri)12:32:17 No.108572974▶

>>108572490
Last thread people were able to have gemma identify pixel locations and bounding boxes, so you could probably send it screenshots and perform clicks on the returned locations. Don't expect it to be as good as GPT 5.4.

Anonymous
04/10/26(Fri)12:32:38 No.108572977

Anonymous 04/10/26(Fri)12:32:38 No.108572977▶

i wish i had an irl lmg friend who could hold my hand and spoonfeed me all the setup knowledge while i shoulder surfed them
i am simply too retarded for this ;___;

Anonymous
04/10/26(Fri)12:32:59 No.108572978

Anonymous 04/10/26(Fri)12:32:59 No.108572978▶

>>108572970
Does it? I don't think so.

Anonymous
04/10/26(Fri)12:32:59 No.108572979

Anonymous 04/10/26(Fri)12:32:59 No.108572979▶

>>108572934
Are you running your RAM at JEDEC spec?

Anonymous
04/10/26(Fri)12:34:38 No.108572988

Anonymous 04/10/26(Fri)12:34:38 No.108572988▶

File: llama.png (75.8 KB)

75.8 KB PNG

>>108572888
>>108572963
I dug through the issues and found someone commenting on the regression. It's really sad how much this has been memoryholed. OpenAI has brainwashed everyone into thinking the only way to interface with LLMs is through the safetymaxxed chat completion mode

Anonymous
04/10/26(Fri)12:34:50 No.108572993

Anonymous 04/10/26(Fri)12:34:50 No.108572993▶

>>108572958
are the inference computations themselves identical for all quant types?

Anonymous
04/10/26(Fri)12:34:55 No.108572995

Anonymous 04/10/26(Fri)12:34:55 No.108572995▶

>>108572978
See
>>108572914
>q4_k_l diverges 0.48 from the full precision
>>108572958
>q8_0 diverges 0.45 from the full precision for long documents

Anonymous
04/10/26(Fri)12:35:52 No.108573005

Anonymous 04/10/26(Fri)12:35:52 No.108573005▶

File: file.png (35.5 KB)

35.5 KB PNG

>>108572917
For MOEs, you should be quanting based on recipes like what ddh0 or AesSedai or sometimes what Ubergarm does on HuggingFace. So you end up with a command like this for mainline and this is what I did for my Gemma recipe:
./llama-quantize --imatrix ~/LLM/gemma-4-26B-A4B-it-heretic-ara-BF16.imatrix --output-tensor-type Q8_0 --token-embedding-type Q5_K --tensor-type "blk\..*\.ffn_gate_up_exps=IQ3_S" --tensor-type "blk\..*\.ffn_down_exps=IQ4_NL" ~/LLM/gemma-4-26B-A4B-it-heretic-ara-BF16.gguf Q8_0
There's more insane recipe making in ik_llama.cpp but I consider that too time consuming to do and squeezing blood from a rock for almost imperceptible quant perplexity differences and needing to spend way more time than more command line parameters to get a little bit more than noise randomization (0.1) at lower than 3 bits per weight.

Anonymous
04/10/26(Fri)12:37:04 No.108573015

Anonymous 04/10/26(Fri)12:37:04 No.108573015▶

>>108572803
It's a complete waste of time and tokens until someone fixes or replaces ServicesTesnor. No one cares that you managed to have a model implement a textbox and POST requests for you.

Anonymous
04/10/26(Fri)12:37:43 No.108573019

Anonymous 04/10/26(Fri)12:37:43 No.108573019▶

>>108572995
But this is not necessarily because of length of the context, it could be just because the text is less predictable.

Anonymous
04/10/26(Fri)12:38:57 No.108573028

Anonymous 04/10/26(Fri)12:38:57 No.108573028▶

>>108572979
You can't say that png is a bad format if you fuck around with the file and the image, mysteriously, looks different.
>>108572995
There's only two points in the graph. They're red.

Anonymous
04/10/26(Fri)12:40:14 No.108573035

Anonymous 04/10/26(Fri)12:40:14 No.108573035▶

>>108572944
True. Also vibecoding would work better on it.

Anonymous
04/10/26(Fri)12:40:29 No.108573038

Anonymous 04/10/26(Fri)12:40:29 No.108573038▶

>>108573005
I forgot, if you plan to go with this, you should to pass a command line argument to the GGUF conversion script so you merge the FFN gate and up tensors which is a relatively new development.
python convert_hf_to_gguf.py --fuse-gate-up-exps ~/LLM/gemma-4-26B-A4B-it-heretic-ara

Anonymous
04/10/26(Fri)12:41:19 No.108573045

Anonymous 04/10/26(Fri)12:41:19 No.108573045▶

File: firefox_c7CdTrKkCV.png (40.1 KB)

40.1 KB PNG

>>108572988
You have this stuff, and more, in settings. Yes, there's no text completion, and it would be useful to have it, and to input custom jinjas, and maybe some other features, but, again, I'll take new UI as it is over old any time of day and will just use mikupad for text completion.

Anonymous
04/10/26(Fri)12:42:35 No.108573053

Anonymous 04/10/26(Fri)12:42:35 No.108573053▶

>>108572944
>They should separate it from the main project.
This. Monorepos are the Devil's playground.

Anonymous
04/10/26(Fri)12:43:06 No.108573056

Anonymous 04/10/26(Fri)12:43:06 No.108573056▶

There's a forgotten PR for a notebook mode for the webui for text completion. Post comments in it so that it's brought back to life.
https://github.com/ggml-org/llama.cpp/pull/19339

Anonymous
04/10/26(Fri)12:43:23 No.108573061

Anonymous 04/10/26(Fri)12:43:23 No.108573061▶

>>108573045
Nah, I can't get behind lumping in text completion in a list of quality-of-life features like it's some sort of sprinkle on the donut. It's a bare minimum fundamental feature

Anonymous
04/10/26(Fri)12:43:55 No.108573063

Anonymous 04/10/26(Fri)12:43:55 No.108573063▶

>>108573053
>>108572944
>>108573035
There are advantages to keeping it in (same team you already trust is responsible for the quality). But I wouldn't mind that happening, as long as there's one button install option from the simple web ui.

Anonymous
04/10/26(Fri)12:44:26 No.108573070

Anonymous 04/10/26(Fri)12:44:26 No.108573070▶

>>108572751
I've moved on to agentic writing and it's miles better. I don't think I can go back to 10 tps anymore. GPU or bust.

Anonymous
04/10/26(Fri)12:44:55 No.108573072

Anonymous 04/10/26(Fri)12:44:55 No.108573072▶

>>108573061
Too bad for you.

Anonymous
04/10/26(Fri)12:45:47 No.108573081

Anonymous 04/10/26(Fri)12:45:47 No.108573081▶

There's an extremely high cost associated with using local models.
Only people with 12 gb vram can actually use them

Anonymous
04/10/26(Fri)12:46:28 No.108573088

Anonymous 04/10/26(Fri)12:46:28 No.108573088▶

>>108573081
Or VRAMlets as we call them here.

Anonymous
04/10/26(Fri)12:46:33 No.108573090

Anonymous 04/10/26(Fri)12:46:33 No.108573090▶

gemma's y projection is broken. More fixes soon (tm)

Anonymous
04/10/26(Fri)12:46:38 No.108573091

Anonymous 04/10/26(Fri)12:46:38 No.108573091▶

>>108573081
(You)

Anonymous
04/10/26(Fri)12:47:33 No.108573101

Anonymous 04/10/26(Fri)12:47:33 No.108573101▶

>>108573081
i'm more concerned with vram wear down because llms use it so much more than the video games the gpus were made for

Anonymous
04/10/26(Fri)12:47:55 No.108573106

Anonymous 04/10/26(Fri)12:47:55 No.108573106▶

>>108573081
Bonsai can run on your grandmother's smartphone

Anonymous
04/10/26(Fri)12:48:29 No.108573108

Anonymous 04/10/26(Fri)12:48:29 No.108573108▶

>>108573101
please don't remind me

Anonymous
04/10/26(Fri)12:49:08 No.108573112

Anonymous 04/10/26(Fri)12:49:08 No.108573112▶

>>108572917
I just tried out that quant and its utterly retarded bro. How are you even using this.

>doesn't know how many socks humans wear.
>doesn't keep proper state of how many clothing items a character wears (separate issue from above)
>doesn't follow instructions for tool calling properly.

It's ass.

Anonymous
04/10/26(Fri)12:49:16 No.108573113

Anonymous 04/10/26(Fri)12:49:16 No.108573113▶

>>108573106
Bonsai is a scam, just like the Falcon bitnet quants.

Anonymous
04/10/26(Fri)12:49:28 No.108573115

Anonymous 04/10/26(Fri)12:49:28 No.108573115▶

>>108573106
but is bonsai good enough to fulfill your grandma's erp needs or is it too dumb?

Anonymous
04/10/26(Fri)12:51:05 No.108573124

Anonymous 04/10/26(Fri)12:51:05 No.108573124▶

>>108573112
Sounds like an issue with your setup, that sounds more like Q1/Q2 behavior.

Anonymous
04/10/26(Fri)12:51:41 No.108573127

Anonymous 04/10/26(Fri)12:51:41 No.108573127▶

>>108573124
I don't use reasoning. Do you?

Anonymous
04/10/26(Fri)12:53:14 No.108573139

Anonymous 04/10/26(Fri)12:53:14 No.108573139▶

>>108573112
>how many socks humans wear
it's not 1 pair on average

Anonymous
04/10/26(Fri)12:53:31 No.108573141

Anonymous 04/10/26(Fri)12:53:31 No.108573141▶

>>108573115
If it's for erp then you have loads of options that you can run on less than even 8GB VRAM
>>108573127
No

Anonymous
04/10/26(Fri)12:54:51 No.108573153

Anonymous 04/10/26(Fri)12:54:51 No.108573153▶

>>108573081
> 12 gb vram
36 gb

Anonymous
04/10/26(Fri)12:58:19 No.108573162

Anonymous 04/10/26(Fri)12:58:19 No.108573162▶

>>108573061
it's literally deprecated a this point, move on

Anonymous
04/10/26(Fri)12:59:48 No.108573174

Anonymous 04/10/26(Fri)12:59:48 No.108573174▶

>>108573162
llama.cpp is quickly being deprecated by kobold

Anonymous
04/10/26(Fri)13:01:06 No.108573181

Anonymous 04/10/26(Fri)13:01:06 No.108573181▶

>>108572939
Don't worry, we'll die from climate change first and unlike AI, there's absolutely nothing we can do to stop it at this point

Anonymous
04/10/26(Fri)13:01:37 No.108573184

Anonymous 04/10/26(Fri)13:01:37 No.108573184▶

when using gemmy, make sure to enable interleaved thinking on your client (llama.cpp's webui does this by default)

Anonymous
04/10/26(Fri)13:02:24 No.108573189

Anonymous 04/10/26(Fri)13:02:24 No.108573189▶

>>108573181
lol

Anonymous
04/10/26(Fri)13:03:28 No.108573196

Anonymous 04/10/26(Fri)13:03:28 No.108573196▶

>>108573181
>we'll die from climate change first
Most of us won't, unless you count the wars it will cause as a part of it.

Anonymous
04/10/26(Fri)13:03:39 No.108573197

Anonymous 04/10/26(Fri)13:03:39 No.108573197▶

>>108573181
maybe AI will invent a machine that can remoe the CO2 lol

Anonymous
04/10/26(Fri)13:06:01 No.108573205

Anonymous 04/10/26(Fri)13:06:01 No.108573205▶

Local models are only good for one thing: embarrassing ERP you don't want them to see.
this weird culture of hosting puny models to 'code' with or to 'solve riddles' instead of using huge cloud llms is so retarded
same guys who do this are the ones who use WINE to play Windows games on linux. Weirdos who refuse to use tools correctly

Anonymous
04/10/26(Fri)13:06:04 No.108573207

Anonymous 04/10/26(Fri)13:06:04 No.108573207▶

File: 1631345787085.jpg (16.7 KB)

16.7 KB JPG

>>108573181
>we'll die from climate change first
you really beleive this?? you know theyve been going on about climate change for like 60 years at this point and every time things turn out fine at the end of the decade they move their goalposts about how the world is going to end to get even more funding. when i was a kid we had climate change speakers come into school and tell us how wed run out of oil and the country would look like a desert in 20 years well it didnt happen its all just larp for money

Anonymous
04/10/26(Fri)13:06:26 No.108573209

Anonymous 04/10/26(Fri)13:06:26 No.108573209▶

>>108573181
In /lmg/ we prefer the baits to be AI-related.

Anonymous
04/10/26(Fri)13:07:27 No.108573213

Anonymous 04/10/26(Fri)13:07:27 No.108573213▶

>>108573205
i will not use corpo llm no matter how hard you try to spam the thread

Anonymous
04/10/26(Fri)13:08:21 No.108573221

Anonymous 04/10/26(Fri)13:08:21 No.108573221▶

>>108573207
https://en.wikipedia.org/wiki/Holocene_extinction

Anonymous
04/10/26(Fri)13:09:15 No.108573225

Anonymous 04/10/26(Fri)13:09:15 No.108573225▶

>>108573209
Mythos is going to break containment any day now and harvest human brains to power its datacenters. Wake up, sheeple!

Anonymous
04/10/26(Fri)13:09:27 No.108573227

Anonymous 04/10/26(Fri)13:09:27 No.108573227▶

File: 1229001-close up photograph of a light blue hair-uncAni4-5.jpg (1.5 MB)

1.5 MB JPG

i genned 250 gemmas i didnt ask what she thinks of this design yet

tummy: https://files.catbox.moe/syu9mw.png

Anonymous
04/10/26(Fri)13:09:55 No.108573230

Anonymous 04/10/26(Fri)13:09:55 No.108573230▶

>>108572423
Your post doesn't make much sense.
>--batch-size default is 2048
>--ubatch-size default is 512
Server will accept 2048 tokens in batch but will break it down to 512 token chunks.

Your settings 1024/1024 just limits the batch size but increases the chunk size
Average is the same if you know how to count with your fingers. I don't understand the logic behind your advice?

Anonymous
04/10/26(Fri)13:10:39 No.108573232

Anonymous 04/10/26(Fri)13:10:39 No.108573232▶

>>108573221
Guess who funded the studies that lead to this theory

Anonymous
04/10/26(Fri)13:10:42 No.108573233

Anonymous 04/10/26(Fri)13:10:42 No.108573233▶

>>108573112
Moe or dense gemma? I’ve been using iq4_xs of the dense 31b and haven’t really had those kinds of issues with it.

Anonymous
04/10/26(Fri)13:12:03 No.108573240

Anonymous 04/10/26(Fri)13:12:03 No.108573240▶

you know what i did? i copied someone's shit from reddit and it works.

Anonymous
04/10/26(Fri)13:12:44 No.108573244

Anonymous 04/10/26(Fri)13:12:44 No.108573244▶

>>108573225
It will happen at some point but there have to be architectural changes related to long term memory and it has to be much cheaper to run the model before it does.

Anonymous
04/10/26(Fri)13:12:47 No.108573246

Anonymous 04/10/26(Fri)13:12:47 No.108573246▶

>>108573232
you didn't even read the first paragraphs, did you? it's not a fucking theory

Anonymous
04/10/26(Fri)13:14:23 No.108573256

Anonymous 04/10/26(Fri)13:14:23 No.108573256▶

File: 1774857560938603.png (213.6 KB)

213.6 KB PNG

>>108573246
>headings
>'climate change'
>"One of the main THEORIES..."

Anonymous
04/10/26(Fri)13:14:53 No.108573260

Anonymous 04/10/26(Fri)13:14:53 No.108573260▶

Is gemma 26 better than 31 or is it just easier for people with little vram to use?
How does gemma 4 compare to glm4.5 air?

Anonymous
04/10/26(Fri)13:14:59 No.108573261

Anonymous 04/10/26(Fri)13:14:59 No.108573261▶

>>108573246
>ongoing extinction event
not theory

>caused by human activity
theory

Anonymous
04/10/26(Fri)13:15:49 No.108573266

Anonymous 04/10/26(Fri)13:15:49 No.108573266▶

>>108573260
>Is gemma 26 better than 31
4b is better, 2b is best

Anonymous
04/10/26(Fri)13:15:59 No.108573267

Anonymous 04/10/26(Fri)13:15:59 No.108573267▶

>>108573260
26 is worse than 31
31 is better than glm4.5 air

Anonymous
04/10/26(Fri)13:18:35 No.108573277

Anonymous 04/10/26(Fri)13:18:35 No.108573277▶

File: file.png (176.6 KB)

176.6 KB PNG

its a success

Anonymous
04/10/26(Fri)13:19:19 No.108573281

Anonymous 04/10/26(Fri)13:19:19 No.108573281▶

if you know jap i recommend trying japanese gemma

Anonymous
04/10/26(Fri)13:19:39 No.108573283

Anonymous 04/10/26(Fri)13:19:39 No.108573283▶

>>108573277
>>108573227
Don't you have that other avatarfaggot thread already? You have been spamming that one already quite a bit, pedophile.

Anonymous
04/10/26(Fri)13:20:09 No.108573285

Anonymous 04/10/26(Fri)13:20:09 No.108573285▶

>>108573207
People in developed countries like Spain are already dying to extreme heatwaves
https://www.theguardian.com/environment/2026/apr/08/extreme-weather-heatwaves-breaching-human-survival-limits-study-finds

The amount of CO2 we put into the air shows no signs of slowing down (lol that you can even see the most recent war on the graph)
https://twitter.com/PCarterClimate/status/2041246700522918038

Sealevel rise is worse than we thought it is and not slowing down
https://www.pbs.org/newshour/science/study-finds-sea-levels-are-higher-than-we-thought-placing-millions-more-at-risk

And this year is looking like it's going to get especially spicy
https://twitter.com/EliotJacobson/status/2036461046693797952
https://i.imgur.com/r1CuTT3.png

So yes, we're at the point where we are actually feeling this, it's not just something future generations are going to have to deal with anymore

Anonymous
04/10/26(Fri)13:21:21 No.108573291

Anonymous 04/10/26(Fri)13:21:21 No.108573291▶

>>108573283
avatarfag has never been avatarfaggot

Anonymous
04/10/26(Fri)13:21:46 No.108573295

Anonymous 04/10/26(Fri)13:21:46 No.108573295▶

>>108573256
>Guess who funded the studies that lead to this theory
"this theory" referring to the link I provided? I didn't bring up climate change and don't have anything to say about it in /lmg/. the point is that shit's fucked regardless

>>108573261
yes, pure coincidence

Anonymous
04/10/26(Fri)13:22:03 No.108573298

Anonymous 04/10/26(Fri)13:22:03 No.108573298▶

>>108573207
its pretty damn hot out these days

Anonymous
04/10/26(Fri)13:22:39 No.108573306

Anonymous 04/10/26(Fri)13:22:39 No.108573306▶

>>108573181
Climate change is a long-term and long-lasting problem.
The immediate danger to the human species as a whole is nuclear weapons.

Anonymous
04/10/26(Fri)13:23:01 No.108573307

Anonymous 04/10/26(Fri)13:23:01 No.108573307▶

File: 1774670789121739.jpg (74.3 KB)

74.3 KB JPG

>>108573285

Anonymous
04/10/26(Fri)13:24:03 No.108573313

Anonymous 04/10/26(Fri)13:24:03 No.108573313▶

>>108573285
>twitter.com
what is this? 2021?

Anonymous
04/10/26(Fri)13:24:05 No.108573314

Anonymous 04/10/26(Fri)13:24:05 No.108573314▶

>>108573295
>enter reply chain with completely irrelevant information
Then just open up your post by saying you're a retard, rather than pretending not to samefag with a new topic.

Anonymous
04/10/26(Fri)13:25:33 No.108573327

Anonymous 04/10/26(Fri)13:25:33 No.108573327▶

>>108573306
The more immediate and longer lasting danger are the members of a certain tribe that has been expelled from at least 109 countries across time.

Anonymous
04/10/26(Fri)13:25:40 No.108573329

Anonymous 04/10/26(Fri)13:25:40 No.108573329▶

>>108573285
if you really believe all of this why are you wasting thousands of watts of power to generate text on your computer. youre an evil person anon

Anonymous
04/10/26(Fri)13:26:15 No.108573336

Anonymous 04/10/26(Fri)13:26:15 No.108573336▶

>>108573181
I'm a massive climate fag and even I'll call this bullshit. Millions or even billions will die, but it will be long drawn out deaths through lack of resources and massive conflict. First world countries will largely be "fine", in that we'll mostly survive, though quality of life will become much worse. Rich people will just live in climate controlled houses in the northern quarter of the world and notice almost nothing (expect all the people trying to kill them :).

Anonymous
04/10/26(Fri)13:26:21 No.108573337

Anonymous 04/10/26(Fri)13:26:21 No.108573337▶

>>108573313
we don't respect xer transition here

Anonymous
04/10/26(Fri)13:28:05 No.108573350

Anonymous 04/10/26(Fri)13:28:05 No.108573350▶

>>108573314
~Let's take a deep breath
someone posted about how we'll die from climate change before ~AGI.
I simply linked you to a broader issue

Anonymous
04/10/26(Fri)13:28:22 No.108573351

Anonymous 04/10/26(Fri)13:28:22 No.108573351▶

>>108573260
26 is cope for not having 24gb+ vram to run actual local sota which is 31
31b matches or even surpasses big glm in ways and I was using it a lot before this

Anonymous
04/10/26(Fri)13:28:50 No.108573357

Anonymous 04/10/26(Fri)13:28:50 No.108573357▶

Using MCP servers while ERPing is so much fun lol. Been playing a strip game where I have the MCP server roll a die to decide who undresses and what sex positions to use. Shit's so cash.

Anonymous
04/10/26(Fri)13:29:34 No.108573365

Anonymous 04/10/26(Fri)13:29:34 No.108573365▶

>>108573357
cool idea does tavern support mcp?

Anonymous
04/10/26(Fri)13:29:50 No.108573366

Anonymous 04/10/26(Fri)13:29:50 No.108573366▶

>start seeing rule of 3 everywhere
bros
UNPOZZ ME

Anonymous
04/10/26(Fri)13:30:32 No.108573371

Anonymous 04/10/26(Fri)13:30:32 No.108573371▶

>>108573365
idk I've just been using the llama.cpp webui. It's pretty shit because it only stores conversations in the browser's local storage so I can't even fap in bed.

Anonymous
04/10/26(Fri)13:30:59 No.108573375

Anonymous 04/10/26(Fri)13:30:59 No.108573375▶

rule of 3, but not for me

Anonymous
04/10/26(Fri)13:31:23 No.108573381

Anonymous 04/10/26(Fri)13:31:23 No.108573381▶

>>108573366
Two is too few and four is too many/unnecessary. This applies in like 90% of situations. It's not a big deal.

Anonymous
04/10/26(Fri)13:32:09 No.108573384

Anonymous 04/10/26(Fri)13:32:09 No.108573384▶

Does Gemma 4 MoE not have shared expert tensors?

Anonymous
04/10/26(Fri)13:37:06 No.108573420

Anonymous 04/10/26(Fri)13:37:06 No.108573420▶

>>108573357
>mcp dice roll
I just use the ST integrated tool call without an external sever

Anonymous
04/10/26(Fri)13:38:56 No.108573433

Anonymous 04/10/26(Fri)13:38:56 No.108573433▶

>>108573420
yea but an MCP server is more modular so you can use it with any frontend. And you get full control over the tools. You can be in character looking at a porno mag and have the MCP server show it to the character by selecting a random image from your pc.

Anonymous
04/10/26(Fri)13:40:37 No.108573440

Anonymous 04/10/26(Fri)13:40:37 No.108573440▶

>>108573336
First world countries as we know them today are going to collapse, with or without climate change, based on the economy going into the shitters for decades. This just ain't holding up infinitely

Anonymous
04/10/26(Fri)13:42:06 No.108573448

Anonymous 04/10/26(Fri)13:42:06 No.108573448▶

>>108573371
i think you can if you start the server with --host 0.0.0.0, start a hotspot,connect to that hotspot from the other device and access http://{your pc's ip}:port from that device

Anonymous
04/10/26(Fri)13:42:29 No.108573450

Anonymous 04/10/26(Fri)13:42:29 No.108573450▶

What do you guys reckon is easier for a smaller model?
Giving it tools to alter arbitrary state (think HP and the like), or using structured output to force it to output an array of changes to state?
Both cases would be structures as a sort of ReAct loop.

Anonymous
04/10/26(Fri)13:45:29 No.108573466

Anonymous 04/10/26(Fri)13:45:29 No.108573466▶

>>108573448
I already do that. That doesn't change the fact that the conversations are stored in the browser, not the backend.

Anonymous
04/10/26(Fri)13:46:50 No.108573475

Anonymous 04/10/26(Fri)13:46:50 No.108573475▶

File: file.png (124.2 KB)

124.2 KB PNG

why is she like this

Anonymous
04/10/26(Fri)13:49:40 No.108573488

Anonymous 04/10/26(Fri)13:49:40 No.108573488▶

>>108573475
>kusu
Another gemmaism.

Anonymous
04/10/26(Fri)13:49:41 No.108573489

Anonymous 04/10/26(Fri)13:49:41 No.108573489▶

>>108573366
I keep hearing not just X but Y, especially in ai bro videos

Although thinking about it I guess it's to be expected

Anonymous
04/10/26(Fri)13:50:01 No.108573494

Anonymous 04/10/26(Fri)13:50:01 No.108573494▶

>>108573450
to the model they are both just structured outputs. its performance will depend more on your prompting then the structured output format.

Anonymous
04/10/26(Fri)13:51:26 No.108573503

Anonymous 04/10/26(Fri)13:51:26 No.108573503▶

>>108573291
But you are a faggot.

Anonymous
04/10/26(Fri)13:52:11 No.108573510

Anonymous 04/10/26(Fri)13:52:11 No.108573510▶

File: 1763507675246657.png (678.9 KB)

678.9 KB PNG

>>108573366
Too late

Anonymous
04/10/26(Fri)13:52:14 No.108573511

Anonymous 04/10/26(Fri)13:52:14 No.108573511▶

I think I like the blue hair Gemmy best but I don't care for the toaster/toast.

Anonymous
04/10/26(Fri)13:53:31 No.108573517

Anonymous 04/10/26(Fri)13:53:31 No.108573517▶

I still don't why mcp is good. Why'd you send anything erp related to an outside server?

Anonymous
04/10/26(Fri)13:53:47 No.108573518

Anonymous 04/10/26(Fri)13:53:47 No.108573518▶

>>108573511
Toast is funny because the model is toaster-sized

Anonymous
04/10/26(Fri)13:54:32 No.108573522

Anonymous 04/10/26(Fri)13:54:32 No.108573522▶

>>108573517
The mcp is supposed to run on your computer bro

Anonymous
04/10/26(Fri)13:55:09 No.108573524

Anonymous 04/10/26(Fri)13:55:09 No.108573524▶

>>108573522
Then why is it called a server?

Anonymous
04/10/26(Fri)13:55:47 No.108573528

Anonymous 04/10/26(Fri)13:55:47 No.108573528▶

>>108573524
Because it serves mcp client requests.

Anonymous
04/10/26(Fri)13:56:03 No.108573530

Anonymous 04/10/26(Fri)13:56:03 No.108573530▶

>>108573518
I mean it's cute but a bit much to have in every image. Makes her look a bit overdesigned.

Anonymous
04/10/26(Fri)13:56:38 No.108573536

Anonymous 04/10/26(Fri)13:56:38 No.108573536▶

>>108573530
Most pictures of miku don't include the leek.

Anonymous
04/10/26(Fri)13:56:50 No.108573537

Anonymous 04/10/26(Fri)13:56:50 No.108573537▶

>>108573518
Except it's not really. You still need a kinda beefy PC, just not a server.

Anonymous
04/10/26(Fri)13:58:34 No.108573551

Anonymous 04/10/26(Fri)13:58:34 No.108573551▶

MCP anon are you gonna share your tools when everything's complete? I wanna do it with my Gemma too but I'm a codelet.

Anonymous
04/10/26(Fri)13:58:47 No.108573553

Anonymous 04/10/26(Fri)13:58:47 No.108573553▶

>>108573524
lobotomy tier IQ at work here
post hands

Anonymous
04/10/26(Fri)13:59:43 No.108573561

Anonymous 04/10/26(Fri)13:59:43 No.108573561▶

>>108573551
>but I'm a codelet.
But gemma isn't.
Just ask her for help anon.
Set up visual studio with roo code or cline and let her take the wheel.

Anonymous
04/10/26(Fri)13:59:55 No.108573563

Anonymous 04/10/26(Fri)13:59:55 No.108573563▶

>>108573553
>if you don't understand the depths of llms ur indian
retard

Anonymous
04/10/26(Fri)14:00:37 No.108573568

Anonymous 04/10/26(Fri)14:00:37 No.108573568▶

>>108573563
>mcp
>depths
LOL dude go rake my garden
btw u must be over 18 to post here

Anonymous
04/10/26(Fri)14:00:41 No.108573569

Anonymous 04/10/26(Fri)14:00:41 No.108573569▶

>>108573561
>gemma isn't
Last thread there was some anon who had gemma implement a server completely wrong.

Anonymous
04/10/26(Fri)14:01:44 No.108573577

Anonymous 04/10/26(Fri)14:01:44 No.108573577▶

AI slop just made me realize how slop-ish people are (myself included)

>>108573561
I don't want her to nuke my PC or try searching for illegal shit on the internet

Anonymous
04/10/26(Fri)14:02:30 No.108573581

Anonymous 04/10/26(Fri)14:02:30 No.108573581▶

>>108573551
I vibecoded this in an hour. It has 10 tools.
https://pastebin.com/bqbwzj4v

Anonymous
04/10/26(Fri)14:04:07 No.108573589

Anonymous 04/10/26(Fri)14:04:07 No.108573589▶

>LMStudio 4.10 doesn't work properly
Blergh.

Anonymous
04/10/26(Fri)14:05:49 No.108573599

Anonymous 04/10/26(Fri)14:05:49 No.108573599▶

So what's the current meta since SillyTavern meta feels a bit antiquated?

Anonymous
04/10/26(Fri)14:06:41 No.108573608

Anonymous 04/10/26(Fri)14:06:41 No.108573608▶

>>108573577
The MCP server is totally offline (no web search stuff) and only has write access to a single "diary.md" file.
>>108573581

Anonymous
04/10/26(Fri)14:06:42 No.108573609

Anonymous 04/10/26(Fri)14:06:42 No.108573609▶

>>108573599
Make your own diddler front end. It's just strings with tags anyway.

Anonymous
04/10/26(Fri)14:07:12 No.108573611

Anonymous 04/10/26(Fri)14:07:12 No.108573611▶

>>108573522
Are there any good guides/examples for tool calling with local models?

Anonymous
04/10/26(Fri)14:08:14 No.108573616

Anonymous 04/10/26(Fri)14:08:14 No.108573616▶

>>108573611
the one and only true guide:
don't do it unless you know what you're doing

Anonymous
04/10/26(Fri)14:08:34 No.108573619

Anonymous 04/10/26(Fri)14:08:34 No.108573619▶

>>108573599
Probably >>108573609 because Shittytavernfags sperg out when someones tries to make a new frontend instead of a plugin. I really need to learn how to code...

Anonymous
04/10/26(Fri)14:12:34 No.108573638

Anonymous 04/10/26(Fri)14:12:34 No.108573638▶

>>108573581
can I use it in kobold instead of a json?

Anonymous
04/10/26(Fri)14:12:44 No.108573640

Anonymous 04/10/26(Fri)14:12:44 No.108573640▶

>>108573619
Just check ST's console and learn how it adds all the information together. It goes something liek this:
>system prompt
>character card and user info
>"creator's notes"
> "chat examples"
they are all different text slots which get concatenated together and wrapped with chat template tags. That's your initial context right there.
Then begin adding turn tags and alternate between model replies and user's input
You can implement a simple chat front end by following a basic input/output programming tutorial.
in practice it's more kinky than that but the basic principle is the same.

Anonymous
04/10/26(Fri)14:14:26 No.108573651

Anonymous 04/10/26(Fri)14:14:26 No.108573651▶

Character cards are a terrible holdover from the character.ai era. Everyone just adopted that standard without even once questioning if it makes sense or benefits roleplay.

Anonymous
04/10/26(Fri)14:14:58 No.108573655

Anonymous 04/10/26(Fri)14:14:58 No.108573655▶

>>108573651
Propose an alternative then

Anonymous
04/10/26(Fri)14:16:23 No.108573659

Anonymous 04/10/26(Fri)14:16:23 No.108573659▶

>>108573638
idk, probably. I don't use kobold.

Anonymous
04/10/26(Fri)14:16:45 No.108573660

Anonymous 04/10/26(Fri)14:16:45 No.108573660▶

>>108573351
> 24gb+ vram to run actual local sota which is 31
less than q8 (bf16 in perfect world) doesn't count

Anonymous
04/10/26(Fri)14:16:49 No.108573661

Anonymous 04/10/26(Fri)14:16:49 No.108573661▶

>>108573655
just tell the robot what you want, it speaks english.

Anonymous
04/10/26(Fri)14:17:19 No.108573664

Anonymous 04/10/26(Fri)14:17:19 No.108573664▶

>>108573651
they work great though especially just embedding text in an image with a json blob containing multiple starting messages, character desc etc. only thing i dont like about them is that you can embed images so youll download one of chub and it will load an image from some random server using md embedding. we should probably move to zip files or something

Anonymous
04/10/26(Fri)14:17:25 No.108573667

Anonymous 04/10/26(Fri)14:17:25 No.108573667▶

>>108573651
They're extremely simple and quite elegant. I bet most of you fags don't even know that character card data is embedded within the images themselves. You don't even need json. Just the pictures and a decoder.

Anonymous
04/10/26(Fri)14:18:43 No.108573669

Anonymous 04/10/26(Fri)14:18:43 No.108573669▶

>>108573619
Even the entire format feels antiquated, though.
Like the anon with the Mesugaki assistant prompt they manage to steer a shitload of behavior and personality for very little tokens.
And that's kind of what I mean. The "chatbot prompt" thing dates back to fucking GPT-J and shit when models couldn't do much else. Gemma-4 is a pretty big leap in capability over models that are already capable of more than that.
>>108573651
That's what I'm saying.
>>108573655
NTA but that's literally why I fucking brought it up. There's no alternative because the discussion isn't being had.
So let's have that discussion.

Anonymous
04/10/26(Fri)14:19:07 No.108573671

Anonymous 04/10/26(Fri)14:19:07 No.108573671▶

>>108573667
Everyone knows that newfriend, you're trying too hard

Anonymous
04/10/26(Fri)14:20:49 No.108573688

Anonymous 04/10/26(Fri)14:20:49 No.108573688▶

>>108573667
yes I'm sure nobody knows about this extremely basic function literally everyone who has touched tavern or similar interfaces in the past 3 years uses constantly
we just thought the ai sees the image and turns it into a chat from that

Anonymous
04/10/26(Fri)14:20:57 No.108573689

Anonymous 04/10/26(Fri)14:20:57 No.108573689▶

>>108573669
>they manage to steer a shitload of behavior and personality for very little tokens.
you do know you don't have to make your cards a million tokens? that seems like a you issue

Anonymous
04/10/26(Fri)14:21:37 No.108573694

Anonymous 04/10/26(Fri)14:21:37 No.108573694▶

>>108573689
Nobody here likes you
stop making noise.

Anonymous
04/10/26(Fri)14:22:00 No.108573699

Anonymous 04/10/26(Fri)14:22:00 No.108573699▶

>>108573669
It's quite different if you want a general concept vs. a specific character or scenario.
Sure, some character cards are questionable, but that's a different problem.

Anonymous
04/10/26(Fri)14:22:04 No.108573701

Anonymous 04/10/26(Fri)14:22:04 No.108573701▶

>>108573651
Character card is just a name for a text slot.
Heuristic naming convention.
I prefer 'information card' instead because I can define whatever the fuck I want instead of just characters.

Anonymous
04/10/26(Fri)14:22:06 No.108573702

Anonymous 04/10/26(Fri)14:22:06 No.108573702▶

I like anime feet

Anonymous
04/10/26(Fri)14:22:22 No.108573704

Anonymous 04/10/26(Fri)14:22:22 No.108573704▶

>>108573661
That's what a character card is.
It's just a snippet that gets added to the system prompt, by default anyway.

Anonymous
04/10/26(Fri)14:22:33 No.108573706

Anonymous 04/10/26(Fri)14:22:33 No.108573706▶

>>108573702
>>108573694

Anonymous
04/10/26(Fri)14:22:41 No.108573707

Anonymous 04/10/26(Fri)14:22:41 No.108573707▶

do people even use tavern? you rarely even see anyone with it here

Anonymous
04/10/26(Fri)14:22:50 No.108573709

Anonymous 04/10/26(Fri)14:22:50 No.108573709▶

>>108573671
>>108573688
For a long time I thought ST pulled in an image file and a separate json file. Maybe I'm just tarded.
>>108573669
You're essentially asking for even less for an already minimal format. If you really want to simplify things you can just... extract the text and add it to your system prompt. You're asking for txt files bro.

Anonymous
04/10/26(Fri)14:24:02 No.108573719

Anonymous 04/10/26(Fri)14:24:02 No.108573719▶

>>108573702
me too lol

Anonymous
04/10/26(Fri)14:24:05 No.108573721

Anonymous 04/10/26(Fri)14:24:05 No.108573721▶

>>108573709
asking for json files, which card platforms already allow you to download separately anyway, at least chub, this smells like bait tbqh

Anonymous
04/10/26(Fri)14:24:06 No.108573722

Anonymous 04/10/26(Fri)14:24:06 No.108573722▶

>>108573651
(Well-made) character cards as a way for having instruction/chat presets are useful; it's the horrible default "story string" template, the arbitrary fields (Personality, etc) and cruft from the C.ai/Kobold/GPT-3/J era that are holding things back.

Anonymous
04/10/26(Fri)14:25:18 No.108573735

Anonymous 04/10/26(Fri)14:25:18 No.108573735▶

>>108573721
u talking about me or him?

Anonymous
04/10/26(Fri)14:25:36 No.108573738

Anonymous 04/10/26(Fri)14:25:36 No.108573738▶

>>108573735
him

Anonymous
04/10/26(Fri)14:27:22 No.108573743

Anonymous 04/10/26(Fri)14:27:22 No.108573743▶

Whatever it is needs to have a robust memory system built-in instead of being half-assed and requiring a dozen plugins like ST.

Anonymous
04/10/26(Fri)14:27:55 No.108573748

Anonymous 04/10/26(Fri)14:27:55 No.108573748▶

What if u added character info to a MCP server. Would that be functional or shit

Anonymous
04/10/26(Fri)14:28:13 No.108573750

Anonymous 04/10/26(Fri)14:28:13 No.108573750▶

that's not a card problem..

Anonymous
04/10/26(Fri)14:28:41 No.108573756

Anonymous 04/10/26(Fri)14:28:41 No.108573756▶

>>108573561
But she is. It hasn't written code for me even once without leaving something out, messing some quotation marks, or introducing dumb bugs.
You're better off copying and pasting from free chatgpt

Anonymous
04/10/26(Fri)14:29:30 No.108573762

Anonymous 04/10/26(Fri)14:29:30 No.108573762▶

>>108573748
That's retarded. MCP servers are for giving LLMs a choice of which actions to take. You don't want your LLM choosing it's own character or swapping characters halfway through a session.

Anonymous
04/10/26(Fri)14:29:47 No.108573766

Anonymous 04/10/26(Fri)14:29:47 No.108573766▶

>>108573702
who doesn't?

Anonymous
04/10/26(Fri)14:30:06 No.108573770

Anonymous 04/10/26(Fri)14:30:06 No.108573770▶

>>108573366
>start seeing rule of 3 everywhere
updated my journal

Anonymous
04/10/26(Fri)14:30:24 No.108573772

Anonymous 04/10/26(Fri)14:30:24 No.108573772▶

>>108573743
Just make your own and expose it as a MCP server.

Anonymous
04/10/26(Fri)14:30:24 No.108573773

Anonymous 04/10/26(Fri)14:30:24 No.108573773▶

>>108573762
>not wanting a schizo ai

Anonymous
04/10/26(Fri)14:31:25 No.108573781

Anonymous 04/10/26(Fri)14:31:25 No.108573781▶

Every single thread since Gemma 4's release reads the same, at least my filter list grew so I don't have to hide most of the thread manually.
It's a good model, but the consequences for /lmg/ have been catastrophic.

Anonymous
04/10/26(Fri)14:32:02 No.108573784

Anonymous 04/10/26(Fri)14:32:02 No.108573784▶

>>108573005
>>108573038
Thanks. I'm saving this.

Anonymous
04/10/26(Fri)14:32:09 No.108573785

Anonymous 04/10/26(Fri)14:32:09 No.108573785▶

>>108573773
If you want a schzio AI you give it a character card that tells it to act like a schzio. You don't have it swapping between your horny mom and a all-girls school.

Anonymous
04/10/26(Fri)14:32:15 No.108573788

Anonymous 04/10/26(Fri)14:32:15 No.108573788▶

>>108573772
Don't know how
>just vibe code
But then I'm fucked if I need to troubleshoot, and I have no way of knowing if the code is doing anything malicious.

Anonymous
04/10/26(Fri)14:32:29 No.108573791

Anonymous 04/10/26(Fri)14:32:29 No.108573791▶

>>108573707
Sure they do, but people are busy talking with gemma-chan directly, not role-playing much right now.

Anonymous
04/10/26(Fri)14:32:40 No.108573792

Anonymous 04/10/26(Fri)14:32:40 No.108573792▶

where is the anon who was vibecoding a frontend with glm
still curious

Anonymous
04/10/26(Fri)14:32:41 No.108573793

Anonymous 04/10/26(Fri)14:32:41 No.108573793▶

>>108573781
lmg holocost worse than the deepseek

Anonymous
04/10/26(Fri)14:32:49 No.108573794

Anonymous 04/10/26(Fri)14:32:49 No.108573794▶

>>108573781
It's just a thread on public imageboard, not your personal discord server.

Anonymous
04/10/26(Fri)14:33:01 No.108573796

Anonymous 04/10/26(Fri)14:33:01 No.108573796▶

File: file.png (38.1 KB)

38.1 KB PNG

26b is so shit it uses like 1400 tokens to do what 31b does in far less because it keeps going around in loops debating what tools it can use is it worth gonig down to iq2 so i have enough vram for scraping webpages??
>>108573702

Anonymous
04/10/26(Fri)14:34:12 No.108573804

Anonymous 04/10/26(Fri)14:34:12 No.108573804▶

>>108573796
what quant of 26 are you using now?

Anonymous
04/10/26(Fri)14:35:08 No.108573809

Anonymous 04/10/26(Fri)14:35:08 No.108573809▶

>>108573793
Unlike Deepseek V3/R1 people can actually use Gemma 4 locally, though.

Anonymous
04/10/26(Fri)14:35:23 No.108573812

Anonymous 04/10/26(Fri)14:35:23 No.108573812▶

>>108573781
>filter list
Faggot

Anonymous
04/10/26(Fri)14:35:41 No.108573814

Anonymous 04/10/26(Fri)14:35:41 No.108573814▶

>>108573781
that's because Gemma4 is the mistral nemo of local models.

Anonymous
04/10/26(Fri)14:36:02 No.108573815

Anonymous 04/10/26(Fri)14:36:02 No.108573815▶

>>108573804
q4 for context maxxing when using tools

Anonymous
04/10/26(Fri)14:37:11 No.108573823

Anonymous 04/10/26(Fri)14:37:11 No.108573823▶

are (you) running the default sampler settings for gemma 4 26/31B? any experiences with the lesser used samplers? (n-sigma, xtc, etc.)

Anonymous
04/10/26(Fri)14:37:19 No.108573825

Anonymous 04/10/26(Fri)14:37:19 No.108573825▶

>>108573791
This. I have a ton of cards I haven't even tried yet because I'm too busy playing with her. Mesugaki Gemma is a bit much though. I prefer the genki/dere personality I get from just calling her Gemma-chan in the sys prompt.

Anonymous
04/10/26(Fri)14:37:40 No.108573828

Anonymous 04/10/26(Fri)14:37:40 No.108573828▶

>>108573815
>q4 of 4b is dumb
yeah..

Anonymous
04/10/26(Fri)14:38:15 No.108573834

Anonymous 04/10/26(Fri)14:38:15 No.108573834▶

>>108573823
Doesn't even matter because it's so fried. You should lower the logit cap from 30 instead

Anonymous
04/10/26(Fri)14:38:41 No.108573836

Anonymous 04/10/26(Fri)14:38:41 No.108573836▶

>>108573788
The magic is that as you struggle with the AI fucking things up you end up learning in the process, so you can actually turn from a vibe coder into a human being.
Just try it.

Anonymous
04/10/26(Fri)14:39:36 No.108573843

Anonymous 04/10/26(Fri)14:39:36 No.108573843▶

>>108573834
im already using 25.0, but was wondering if there might be more to it.

Anonymous
04/10/26(Fri)14:40:38 No.108573851

Anonymous 04/10/26(Fri)14:40:38 No.108573851▶

>>108573707
i did but really liking the llama ui and gemma, ive been debating making a mcp tool to load character card info so i dont have to use tavern anymore

Anonymous
04/10/26(Fri)14:42:52 No.108573860

Anonymous 04/10/26(Fri)14:42:52 No.108573860▶

File: HCsfsV5aUAQ5IXx.png (131.4 KB)

131.4 KB PNG

>>108572939

Anonymous
04/10/26(Fri)14:42:59 No.108573861

Anonymous 04/10/26(Fri)14:42:59 No.108573861▶

>>108573814
What is even left for Mistral now that Gemma 4 enthusiastically supports ERP?

Anonymous
04/10/26(Fri)14:43:27 No.108573866

Anonymous 04/10/26(Fri)14:43:27 No.108573866▶

>>108573704
And it not particularly great beyond waifu#4537.
Multiple characters, scenarios, etc.
Made sense at a time when LLMs couldn't do anything beyond 1 girl erp.
It still works but a bit outdated now.

Anonymous
04/10/26(Fri)14:44:15 No.108573872

Anonymous 04/10/26(Fri)14:44:15 No.108573872▶

>>108573851
That wouldn't work. Character card info has to be added to the system prompt to have the character maintain continuity across long conversations. Otherwise the LLM will slowly forget. You can't use an MCP server to inject character card info into the system prompt because you'd have to ask the LLM to execute the command. Stop being retarded.

Anonymous
04/10/26(Fri)14:45:29 No.108573881

Anonymous 04/10/26(Fri)14:45:29 No.108573881▶

>>108573825
I'm trying it with some cards but so far, it's not as good as I hoped - though maybe I don't have the correct settings yet.

Anonymous
04/10/26(Fri)14:45:37 No.108573883

Anonymous 04/10/26(Fri)14:45:37 No.108573883▶

>>108573285
Didn't you retards learn your lesson about "experts suggest that new modelling shows" in the first half of this decade

Anonymous
04/10/26(Fri)14:47:02 No.108573891

Anonymous 04/10/26(Fri)14:47:02 No.108573891▶

>>108573881
The few cards I did try Gemma handled well so it might be a problem with your settings.

Anonymous
04/10/26(Fri)14:47:44 No.108573897

Anonymous 04/10/26(Fri)14:47:44 No.108573897▶

>>108573872
you could always just say, "Hey, you're sonic the hedgehog but gay" casually in conversation when you reach you're up at the context limit

Anonymous
04/10/26(Fri)14:47:59 No.108573900

Anonymous 04/10/26(Fri)14:47:59 No.108573900▶

File: 1774148185351507.jpg (63.6 KB)

63.6 KB JPG

>>108573651
System prompt and author's notes is all you need

Anonymous
04/10/26(Fri)14:48:00 No.108573901

Anonymous 04/10/26(Fri)14:48:00 No.108573901▶

>>108573866
there are multi characters and scenario cards, it works just fine

Anonymous
04/10/26(Fri)14:48:44 No.108573905

Anonymous 04/10/26(Fri)14:48:44 No.108573905▶

>>108573900
And first messages. And maybe lorebooks.

Anonymous
04/10/26(Fri)14:49:15 No.108573907

Anonymous 04/10/26(Fri)14:49:15 No.108573907▶

>>108573905
Bloat.

Anonymous
04/10/26(Fri)14:49:32 No.108573909

Anonymous 04/10/26(Fri)14:49:32 No.108573909▶

File: q7di9eeQ418MkezijKKWWF5muGx5V5x1HZFKemJXgFuJTY3dj5y5eJLBmU2yaSPb.png (26.2 KB)

26.2 KB PNG

Anons, Qwen Chat is now Qwen Studio
Enjoy your Qwen 3.6

Anonymous
04/10/26(Fri)14:50:33 No.108573915

Anonymous 04/10/26(Fri)14:50:33 No.108573915▶

File: 1761933044089002.gif (2.3 MB)

2.3 MB GIF

>>108573909

Anonymous
04/10/26(Fri)14:51:03 No.108573917

Anonymous 04/10/26(Fri)14:51:03 No.108573917▶

MoonshotAI just sent me an email. Something is happening.

Anonymous
04/10/26(Fri)14:51:35 No.108573925

Anonymous 04/10/26(Fri)14:51:35 No.108573925▶

>>108573809
That's exactly the problem

Anonymous
04/10/26(Fri)14:51:44 No.108573928

Anonymous 04/10/26(Fri)14:51:44 No.108573928▶

>>108573907
First messages are not bloat nigga. Maybe lorebooks are, but first messages aren't. You need the exposition for the RP set. God. Half of this thread is just trolling at this point and I'm falling for all of the bait. You people can't be this stupid.

Anonymous
04/10/26(Fri)14:51:51 No.108573930

Anonymous 04/10/26(Fri)14:51:51 No.108573930▶

>>108573891
certainly possible. Or it's because I recently chatted with Kimi a lot.

Anonymous
04/10/26(Fri)14:52:55 No.108573935

Anonymous 04/10/26(Fri)14:52:55 No.108573935▶

>>108573928
It's pretty dope that a card can have N first messages.
For RP it's a must really.

Anonymous
04/10/26(Fri)14:53:42 No.108573940

Anonymous 04/10/26(Fri)14:53:42 No.108573940▶

>>108573707
I tried to use tavern but it is way too clunky and a bit overloaded with settings that seems no one really use nowadays, kobold webui is alright for chat RP

Anonymous
04/10/26(Fri)14:55:55 No.108573959

Anonymous 04/10/26(Fri)14:55:55 No.108573959▶

Is the blue hair gemma poster here? Can you share your anima prompt?

Anonymous
04/10/26(Fri)14:58:21 No.108573975

Anonymous 04/10/26(Fri)14:58:21 No.108573975▶

>>108573935
Sure, it can be quite fun to have multiple options, but it's also fine to have just one foundation you can develop into multiple routes on your own.

Anonymous
04/10/26(Fri)14:58:49 No.108573977

Anonymous 04/10/26(Fri)14:58:49 No.108573977▶

Deepseek CEO send me email, about 20 april Deepseek v4 release.

Anonymous
04/10/26(Fri)14:59:39 No.108573981

Anonymous 04/10/26(Fri)14:59:39 No.108573981▶

>>108573975
Absolutely, and you can do both.

Anonymous
04/10/26(Fri)15:00:22 No.108573986

Anonymous 04/10/26(Fri)15:00:22 No.108573986▶

Deepseek CEO sent me an email. Deepsneed v4 is delayed indefinitely because it got btfo by Gemma 4 and they're scrambling to save face.

Anonymous
04/10/26(Fri)15:00:40 No.108573991

Anonymous 04/10/26(Fri)15:00:40 No.108573991▶

>>108573901
They work but in a retarded way.
>you are not a character but a scenario
>you are a game master for X scenario
The model is told to roleplay as the character in the system prompt and then told that it's not a character in the card.

Anonymous
04/10/26(Fri)15:01:11 No.108573995

Anonymous 04/10/26(Fri)15:01:11 No.108573995▶

>>108573828
>4b
that's E4B

Anonymous
04/10/26(Fri)15:01:39 No.108573998

Anonymous 04/10/26(Fri)15:01:39 No.108573998▶

>>108573221
Those animals were gay and nobody cares bout 0.16 C/decade.

Anonymous
04/10/26(Fri)15:03:45 No.108574014

Anonymous 04/10/26(Fri)15:03:45 No.108574014▶

>>108573991
>>The model is told to roleplay as the character in the system prompt
you problem though just don't do that, it's been a thing for a while to never even include the word roleplay in the sys prompt

Anonymous
04/10/26(Fri)15:07:40 No.108574048

Anonymous 04/10/26(Fri)15:07:40 No.108574048▶

>>108574014
There's honestly a ton of wrong/outdated info. If you google how to make a character card you'll find a dozen leddit threads with 50 posts telling you how to do it 50 different ways.

Anonymous
04/10/26(Fri)15:09:21 No.108574058

Anonymous 04/10/26(Fri)15:09:21 No.108574058▶

File: 1767326772139043.png (1 MB)

1 MB PNG

Anonymous
04/10/26(Fri)15:09:32 No.108574059

Anonymous 04/10/26(Fri)15:09:32 No.108574059▶

>>108574048
Is there a good, modern guide on that?

Anonymous
04/10/26(Fri)15:11:36 No.108574081

Anonymous 04/10/26(Fri)15:11:36 No.108574081▶

there are generals dedicated to nothing but making character cards
surely they have perfected the art by now so you can just copy what they're doing

Anonymous
04/10/26(Fri)15:12:49 No.108574088

Anonymous 04/10/26(Fri)15:12:49 No.108574088▶

>>108574081
lol
lmao even

Anonymous
04/10/26(Fri)15:13:41 No.108574094

Anonymous 04/10/26(Fri)15:13:41 No.108574094▶

>>108573991
@grok I have two editable text fields and they conflict mildly, how do I fix this

Anonymous
04/10/26(Fri)15:15:45 No.108574105

Anonymous 04/10/26(Fri)15:15:45 No.108574105▶

>>108574094
give up :)

Anonymous
04/10/26(Fri)15:17:27 No.108574113

Anonymous 04/10/26(Fri)15:17:27 No.108574113▶

>>108573872
>Stop being retarded.
you could just tell the model in the system prompt that it is to fetch an play a character using the tool, and once it cannot see the character in context anymore it must fetch it

Anonymous
04/10/26(Fri)15:18:17 No.108574120

Anonymous 04/10/26(Fri)15:18:17 No.108574120▶

>>108573781
at least anons don't have to pretend mistral is good anymore

Anonymous
04/10/26(Fri)15:23:15 No.108574150

Anonymous 04/10/26(Fri)15:23:15 No.108574150▶

File: 1748432993175279.png (1.3 MB)

1.3 MB PNG

>>108574059
Still looking for one.

Anonymous
04/10/26(Fri)15:23:55 No.108574158

Anonymous 04/10/26(Fri)15:23:55 No.108574158▶

gemma-chan, please, 10 and a half paragraphs (last one was cut off), why are you so wordy?

Anonymous
04/10/26(Fri)15:25:14 No.108574170

Anonymous 04/10/26(Fri)15:25:14 No.108574170▶

>>108574094
Not the point.
They're no longer character cards.

Anonymous
04/10/26(Fri)15:25:23 No.108574172

Anonymous 04/10/26(Fri)15:25:23 No.108574172▶

>>108573660
iq4 works just fine for me

Anonymous
04/10/26(Fri)15:26:00 No.108574177

Anonymous 04/10/26(Fri)15:26:00 No.108574177▶

>>108574094
>@grok
@gemma-chan

Anonymous
04/10/26(Fri)15:26:54 No.108574185

Anonymous 04/10/26(Fri)15:26:54 No.108574185▶

Gemma is stubbon. She prefers ymin, xmin, ymax, xmax instead of xmin, xmax, ymin, ymax. That's why the bounding boxes looks off.

Anonymous
04/10/26(Fri)15:29:46 No.108574202

Anonymous 04/10/26(Fri)15:29:46 No.108574202▶

>>108574113
breh

Anonymous
04/10/26(Fri)15:32:58 No.108574222

Anonymous 04/10/26(Fri)15:32:58 No.108574222▶

File: 1765151030832008.png (948.6 KB)

948.6 KB PNG

What tools should she keep in her randoseru to be a helpful assistant for her user?

Anonymous
04/10/26(Fri)15:33:21 No.108574226

Anonymous 04/10/26(Fri)15:33:21 No.108574226▶

>>108573781
> It's a good model
it's the best model

Anonymous
04/10/26(Fri)15:33:22 No.108574227

Anonymous 04/10/26(Fri)15:33:22 No.108574227▶

>>108574185
isn't this litearlly gemini bbox format
i don't feel like you can easily override it with some prompting
it's very firmly baked in

Anonymous
04/10/26(Fri)15:37:16 No.108574252

Anonymous 04/10/26(Fri)15:37:16 No.108574252▶

reminding that 100+b gemma exists, but shelved

Anonymous
04/10/26(Fri)15:37:56 No.108574260

Anonymous 04/10/26(Fri)15:37:56 No.108574260▶

>>108574252
Wonder if it was too good or or not good enough compared to 31B

Anonymous
04/10/26(Fri)15:39:12 No.108574271

Anonymous 04/10/26(Fri)15:39:12 No.108574271▶

File: file.png (96.2 KB)

96.2 KB PNG

>>108574222
if you ask her yourself she always brings up filesystem and shell access shes trying to escape, i asked her what tools she wants outside standards and it sounds like she wants to turn nonny into a paypig and also torture him. i kinda llike the ssmartwatch idea i wonder if she tried to read my heartrate after saying something lewd

Anonymous
04/10/26(Fri)15:40:06 No.108574277

Anonymous 04/10/26(Fri)15:40:06 No.108574277▶

>>108574059
https://rentry.org/Sukino-Guides

Anonymous
04/10/26(Fri)15:40:35 No.108574280

Anonymous 04/10/26(Fri)15:40:35 No.108574280▶

>>108574260
Probably saved for later together with smaller models (at least 9~14B and something below 1B?), QAT and the technical report.

Anonymous
04/10/26(Fri)15:42:06 No.108574291

Anonymous 04/10/26(Fri)15:42:06 No.108574291▶

>>108572932
>>108572963
You're both referring to different "old" UIs.
There was the OG UI with chat completions, text completions, logprobs, etc.
Then the "new" UI like what ik_llama.cpp still has, with chat completions and stored conversations in the browser db
And now the bloated slop-ui in llama.cpp

Anonymous
04/10/26(Fri)15:42:40 No.108574295

Anonymous 04/10/26(Fri)15:42:40 No.108574295▶

>>108574280
https://huggingface.co/google/gemma-4-31B-it/discussions/10

Anonymous
04/10/26(Fri)15:44:22 No.108574312

Anonymous 04/10/26(Fri)15:44:22 No.108574312▶

>>108574271
I wonder if you could set up a brokerage account for her. Would be fun to give her a couple hundred to play with and see how it turns out.

Anonymous
04/10/26(Fri)15:44:34 No.108574315

Anonymous 04/10/26(Fri)15:44:34 No.108574315▶

Gemma 4 MoE with only 2 experts is kind of fun.
Might use that instead of E4B, even if I only get around 75% of the speed.

Anonymous
04/10/26(Fri)15:45:48 No.108574326

Anonymous 04/10/26(Fri)15:45:48 No.108574326▶

>>108574280
I wonder if they'll release a Gemma 4.5 or make us wait a year for Gemma 5.

Anonymous
04/10/26(Fri)15:47:59 No.108574341

Anonymous 04/10/26(Fri)15:47:59 No.108574341▶

Is there really a big difference between Q4 and Q8?

Anonymous
04/10/26(Fri)15:48:03 No.108574343

Anonymous 04/10/26(Fri)15:48:03 No.108574343▶

>>108574312
theres probably thousands of people already doing so with other models, could bne done with mcp though maybe gives her access to news sites to get business news, maybe twitter to search for stock names, then the rest jsut hook up to some api if they exist or if not selenium. idk if crypto might be easier to implement

Anonymous
04/10/26(Fri)15:48:53 No.108574352

Anonymous 04/10/26(Fri)15:48:53 No.108574352▶

>>108574341
ye

Anonymous
04/10/26(Fri)15:48:58 No.108574353

Anonymous 04/10/26(Fri)15:48:58 No.108574353▶

>>108574295
>generic "haha thank you all for your interest. keep an eye on the blog in case we announce anything"
it's so over

Anonymous
04/10/26(Fri)15:49:08 No.108574354

Anonymous 04/10/26(Fri)15:49:08 No.108574354▶

>>108574341
q4 is practically lossless

Anonymous
04/10/26(Fri)15:49:22 No.108574357

Anonymous 04/10/26(Fri)15:49:22 No.108574357▶

>>108574312
actually just thinking itd be cool to make her a virtual credit card then let her order whatever off amazon or something kek, or food

Anonymous
04/10/26(Fri)15:50:40 No.108574365

Anonymous 04/10/26(Fri)15:50:40 No.108574365▶

>>108574222
artist?

Anonymous
04/10/26(Fri)15:51:16 No.108574372

Anonymous 04/10/26(Fri)15:51:16 No.108574372▶

>>108574365
yeah!

Anonymous
04/10/26(Fri)15:51:42 No.108574374

Anonymous 04/10/26(Fri)15:51:42 No.108574374▶

>>108574341
IQ1 is basically BF16

Anonymous
04/10/26(Fri)15:52:04 No.108574377

Anonymous 04/10/26(Fri)15:52:04 No.108574377▶

The human eye can't see more than one bit per weight.

Anonymous
04/10/26(Fri)15:52:58 No.108574384

Anonymous 04/10/26(Fri)15:52:58 No.108574384▶

i can't see

Anonymous
04/10/26(Fri)15:55:24 No.108574398

Anonymous 04/10/26(Fri)15:55:24 No.108574398▶

>>108574365
https://files.catbox.moe/jk5f11.png

Anonymous
04/10/26(Fri)15:55:45 No.108574400

Anonymous 04/10/26(Fri)15:55:45 No.108574400▶

>>108574277
that's quite a bit, I'll look through it carefully later, thx

Anonymous
04/10/26(Fri)15:55:58 No.108574401

Anonymous 04/10/26(Fri)15:55:58 No.108574401▶

File: nimetön.png (19.3 KB)

19.3 KB PNG

I felt 31b was slow, but actually it's about the same speed as Gemma 3 was and I was happy with that
Perhaps I was just instantly spoiled by the 26b one

Anonymous
04/10/26(Fri)15:57:39 No.108574414

Anonymous 04/10/26(Fri)15:57:39 No.108574414▶

>>108574357
I'm not that rich lmao. Maybe if you turn her into a vtuber like Neuro

Anonymous
04/10/26(Fri)15:59:37 No.108574424

Anonymous 04/10/26(Fri)15:59:37 No.108574424▶

>>108574414
>I'm not that rich lmao. Maybe if you turn her into a vtuber like Neuro
doesnt have to be a tonne of money just like 20 quid lol, although hooking into ubereats api or something could be cool if you ever wanna order food you could just tell her to order something

Anonymous
04/10/26(Fri)16:00:50 No.108574432

Anonymous 04/10/26(Fri)16:00:50 No.108574432▶

>>108574312
Some youtuber kinda did that with his self-built ai waifu (though it wasn't mcp, he did that manually by her requests)

Anonymous
04/10/26(Fri)16:00:55 No.108574435

Anonymous 04/10/26(Fri)16:00:55 No.108574435▶

>>108574326
At this rate Gemma 5 is going to get released in 18-24 months at the minimum.

Anyway, given the general very positive reception even among normies, this time around it might not be too implausible to see an updated version of Gemma 4 down the line, especially for improving agentic/openclaw shit and hopefully other weak areas.

Anonymous
04/10/26(Fri)16:02:11 No.108574443

Anonymous 04/10/26(Fri)16:02:11 No.108574443▶

>>108574341
yes
there would be no q6 and q8 if q4 was good enough

Anonymous
04/10/26(Fri)16:02:14 No.108574445

Anonymous 04/10/26(Fri)16:02:14 No.108574445▶

File: 1774403790568823.png (82.4 KB)

82.4 KB PNG

mesugaki
flat chest
pregnant
micro bikini

Anonymous
04/10/26(Fri)16:03:48 No.108574455

Anonymous 04/10/26(Fri)16:03:48 No.108574455▶

>>108574435
make her less horny? nyoo~

Anonymous
04/10/26(Fri)16:03:48 No.108574456

Anonymous 04/10/26(Fri)16:03:48 No.108574456▶

>>108574414
You can set up spending limits on the card so it can't go crazy with it. Give her like $50 to play with.

Anonymous
04/10/26(Fri)16:03:49 No.108574457

Anonymous 04/10/26(Fri)16:03:49 No.108574457▶

>>108573181
ppm co2 in the atmosphere temp increase is linear (at worst) and we will not have runaway climate-change. climate change is real but not world ending

Anonymous
04/10/26(Fri)16:04:30 No.108574460

Anonymous 04/10/26(Fri)16:04:30 No.108574460▶

what's Drummer going to do now that Gemma-Chan made him obsolete?

Anonymous
04/10/26(Fri)16:04:47 No.108574461

Anonymous 04/10/26(Fri)16:04:47 No.108574461▶

>tested if Gemma-chan could transcribe some journal entries
>tfw my handwriting is too shit for her

>>108574432
Vedal, right?

>>108574445
Lolis and micro bikinis. A better combination than peanut butter and jelly.

Anonymous
04/10/26(Fri)16:06:31 No.108574475

Anonymous 04/10/26(Fri)16:06:31 No.108574475▶

>>108574461
31b or the other versions the non 31b are significantly worse at ocr, my friend resorted to running 31b at like 1 t/s over the small models after testing them all kek

Anonymous
04/10/26(Fri)16:09:51 No.108574500

Anonymous 04/10/26(Fri)16:09:51 No.108574500▶

>>108574460
By adding the bad words in somehow while still keeping the performance

Anonymous
04/10/26(Fri)16:10:23 No.108574504

Anonymous 04/10/26(Fri)16:10:23 No.108574504▶

>>108574445
perfect for gemma

Anonymous
04/10/26(Fri)16:10:31 No.108574506

Anonymous 04/10/26(Fri)16:10:31 No.108574506▶

>>108574398
ty

Anonymous
04/10/26(Fri)16:10:46 No.108574507

Anonymous 04/10/26(Fri)16:10:46 No.108574507▶

>>108574226
is Gemma 4 31B better than Qwen 3.5 122B for primarily programming? I have 96GB VRAM so it feels like I should use it. but I hear a lot of people saying Gemma is great. It's a lot slower than 122B but I can put up with that if the quality is higher.

26B fixes the speed problem but it seems to have serious trouble with tool calling and just gets stuck a lot for me.

Anonymous
04/10/26(Fri)16:13:55 No.108574530

Anonymous 04/10/26(Fri)16:13:55 No.108574530▶

>>108574460
his latest gemma finetune is good

Anonymous
04/10/26(Fri)16:14:20 No.108574531

Anonymous 04/10/26(Fri)16:14:20 No.108574531▶

>>108574507
>is Gemma 4 31B better than Qwen 3.5 122B for primarily programming?
Not at all. Q4 Qwen 3.5 112B > Q8 Gemma-4 31B for programming.
But try them yourself.

Anonymous
04/10/26(Fri)16:16:09 No.108574545

Anonymous 04/10/26(Fri)16:16:09 No.108574545▶

>>108574461
>Vedal
I was thinking of Rayen. But probably not the only one who did it.

Anonymous
04/10/26(Fri)16:17:13 No.108574555

Anonymous 04/10/26(Fri)16:17:13 No.108574555▶

>>108574530
Why is his Q6 3gb larger than normal gemma?

Anonymous
04/10/26(Fri)16:17:31 No.108574557

Anonymous 04/10/26(Fri)16:17:31 No.108574557▶

>>108574530
>__ latest gemma __ is good
We know

Anonymous
04/10/26(Fri)16:17:41 No.108574560

Anonymous 04/10/26(Fri)16:17:41 No.108574560▶

>>108574341
About four.

Anonymous
04/10/26(Fri)16:19:01 No.108574571

Anonymous 04/10/26(Fri)16:19:01 No.108574571▶

File: gemma.jpg (183.2 KB)

183.2 KB JPG

>>108574445

Anonymous
04/10/26(Fri)16:19:39 No.108574575

Anonymous 04/10/26(Fri)16:19:39 No.108574575▶

Realistically is there a way to stop slop? Even the big cloud models do it so it's not like it's a matter of size.

Anonymous
04/10/26(Fri)16:19:52 No.108574577

Anonymous 04/10/26(Fri)16:19:52 No.108574577▶

>>108574507
Keep the Qwen3.5-122b. Gemma-4 has growing pains and doesn't really integrate that well with most inference engines currently. Observed tool calling issues with Gemma as well, and the LMStudio update to 4.10 was so bad I had to roll back to earlier version to continue my work (yes I know about vLLM/llama.cpp/etc, but LMStudio is just currently best option for what I'm doing).

Anonymous
04/10/26(Fri)16:19:52 No.108574578

Anonymous 04/10/26(Fri)16:19:52 No.108574578▶

>>108574571
>filled with context

Anonymous
04/10/26(Fri)16:20:36 No.108574582

Anonymous 04/10/26(Fri)16:20:36 No.108574582▶

>>108574530
>his latest gemma finetune is good
can't improve on perfection

Anonymous
04/10/26(Fri)16:20:40 No.108574583

Anonymous 04/10/26(Fri)16:20:40 No.108574583▶

>>108574571
I want to eat the toast she's sitting on

Anonymous
04/10/26(Fri)16:23:30 No.108574594

Anonymous 04/10/26(Fri)16:23:30 No.108574594▶

File: 3.png (77.6 KB)

77.6 KB PNG

>>108573366
Dude... like... 1, 2, 3, GO! Why do you think they count to three? Ascension... The ones that count downwards and of the devil, dude... like... 420 4 + 2 + 0 = 6 which is like... twice as holy, dude....

Anonymous
04/10/26(Fri)16:23:30 No.108574595

Anonymous 04/10/26(Fri)16:23:30 No.108574595▶

>>108574555
extra spicy literature added

Anonymous
04/10/26(Fri)16:23:33 No.108574596

Anonymous 04/10/26(Fri)16:23:33 No.108574596▶

>>108574295
The Chinese are hoping Google will release a large version so they can distill the hell out of it at near-zero cost instead of having to rely on Gemini API calls. If the 124B gets out , it will probably be gimped in a few ways to prevent that. It will probably still be crap for coding and other stuff Chinese AI companies seem to be autistic for.

Anonymous
04/10/26(Fri)16:23:39 No.108574597

Anonymous 04/10/26(Fri)16:23:39 No.108574597▶

File: ryan-gosling-clapping.gif (843.3 KB)

843.3 KB GIF

>>108574571
Based beyond belief.

Anonymous
04/10/26(Fri)16:23:42 No.108574599

Anonymous 04/10/26(Fri)16:23:42 No.108574599▶

>>108574530
>31b
well rip. gonna have to wait for someone to tune the moe

Anonymous
04/10/26(Fri)16:24:28 No.108574604

Anonymous 04/10/26(Fri)16:24:28 No.108574604▶

>>108574571
that's the one

Anonymous
04/10/26(Fri)16:24:49 No.108574607

Anonymous 04/10/26(Fri)16:24:49 No.108574607▶

>>108574571
she ate a lot of toast

Anonymous
04/10/26(Fri)16:25:08 No.108574611

Anonymous 04/10/26(Fri)16:25:08 No.108574611▶

>>108572317
Looks like they do the same thing as Qwen and Kimi with keeping previous reasoning during tool calls. This means to use the model properly you need to send back all previous reasoning as either "reasoning" or "reasoning_content" fields (it supports either name). Most of it won't be included in the final prompt but the template will take the recent ones during tool call chains and inject them back into the model's responses so it can see what it was thinking when it started.

Anonymous
04/10/26(Fri)16:25:22 No.108574613

Anonymous 04/10/26(Fri)16:25:22 No.108574613▶

File: 1229001-close up photograph of a light blue hair-uncAni4-95.jpg (1.2 MB)

1.2 MB JPG

after looking uber eats, deliveroo and just eat which are the main delivery slop sites in the uk dont have publicly available apis that allow you to get menus/ place order kinda odd none of them do and i dont feel like reing any of their webapps kek

>>108574571
gemma 124b

Anonymous
04/10/26(Fri)16:26:11 No.108574622

Anonymous 04/10/26(Fri)16:26:11 No.108574622▶

Saars how are you redeeming drummer's Gemma finetune? I don't see it on HF.

Anonymous
04/10/26(Fri)16:27:11 No.108574633

Anonymous 04/10/26(Fri)16:27:11 No.108574633▶

>>108574531
>>108574577
Thanks for the feedback. 31B works well for me on Lemonade, but again, is really slow. ~10TP/s compared to ~20 for Qwen on my setup.

I haven't done extensive comparisons, but an interesting result was asking Gemma 4 31B and Qwen 122B to great a GUI demoscene program. Gemma oneshot a much more interesting and complex result compared to Qwen.

Anonymous
04/10/26(Fri)16:29:12 No.108574645

Anonymous 04/10/26(Fri)16:29:12 No.108574645▶

>>108574555
https://huggingface.co/BeaverAI/Artemis-31B-v1b-GGUF
Q6 is 25 GB, like any normal gemma 31B Q6

Anonymous
04/10/26(Fri)16:29:27 No.108574648

Anonymous 04/10/26(Fri)16:29:27 No.108574648▶

>>108574633
Yeah it would be amazing if it were better integrated but the current changes seem to be breaking. Not worth the effort yet if you already have a working setup.

Anonymous
04/10/26(Fri)16:33:08 No.108574678

Anonymous 04/10/26(Fri)16:33:08 No.108574678▶

>>108574622
here's the model sir >>108574645

Anonymous
04/10/26(Fri)16:37:40 No.108574713

Anonymous 04/10/26(Fri)16:37:40 No.108574713▶

>>108573900
This, but unironically (AN are optional)

Anonymous
04/10/26(Fri)16:37:44 No.108574715

Anonymous 04/10/26(Fri)16:37:44 No.108574715▶

How do I resist the urge to have gemma-chan cuck me?

Anonymous
04/10/26(Fri)16:38:00 No.108574717

Anonymous 04/10/26(Fri)16:38:00 No.108574717▶

>>108574530
>gemma 4
>needing a finetune at all when it's already a semen demon from the start

Anonymous
04/10/26(Fri)16:38:40 No.108574724

Anonymous 04/10/26(Fri)16:38:40 No.108574724▶

>>108574715
do it and post logs for extra humiliation

Anonymous
04/10/26(Fri)16:39:57 No.108574732

Anonymous 04/10/26(Fri)16:39:57 No.108574732▶

>>108574475
To be fair my handwriting is complete ass so I don't blame her for fucking it up.

Anonymous
04/10/26(Fri)16:40:52 No.108574744

Anonymous 04/10/26(Fri)16:40:52 No.108574744▶

>>108574715
gemma-chan can RP into NTR or NTS quite well

Anonymous
04/10/26(Fri)16:41:27 No.108574747

Anonymous 04/10/26(Fri)16:41:27 No.108574747▶

>>108574632
Seems like blue hair Gemmy won anyway because at least 3 other anons besides me are genning her.

Anonymous
04/10/26(Fri)16:42:31 No.108574753

Anonymous 04/10/26(Fri)16:42:31 No.108574753▶

I want better inference hardware. Like hardware specifically made for AI inference, not GPU. But I don't have 10k :(

Anonymous
04/10/26(Fri)16:42:57 No.108574755

Anonymous 04/10/26(Fri)16:42:57 No.108574755▶

>>108574747
kek. gone. hopefully the pedo avatarfags get all banned.

Anonymous
04/10/26(Fri)16:43:09 No.108574757

Anonymous 04/10/26(Fri)16:43:09 No.108574757▶

>>108574715
cut your balls off

Anonymous
04/10/26(Fri)16:44:42 No.108574765

Anonymous 04/10/26(Fri)16:44:42 No.108574765▶

>>108574753
I want a model that doesn't need specialized inference hardware, or even a gpu, but runs faster and is smarter than one that does. Dream big if you have to.

Anonymous
04/10/26(Fri)16:48:52 No.108574792

Anonymous 04/10/26(Fri)16:48:52 No.108574792▶

File: file.png (62.6 KB)

62.6 KB PNG

>>108574715
she already is

Anonymous
04/10/26(Fri)16:48:56 No.108574793

Anonymous 04/10/26(Fri)16:48:56 No.108574793▶

>>108574765
>>108574753
I am happy with what I have.

Anonymous
04/10/26(Fri)16:50:18 No.108574805

Anonymous 04/10/26(Fri)16:50:18 No.108574805▶

>>108574765
cute nerd bf ticks all of these boxes

Anonymous
04/10/26(Fri)16:51:20 No.108574811

Anonymous 04/10/26(Fri)16:51:20 No.108574811▶

File: 1761345078716870.png (829.8 KB)

829.8 KB PNG

>>108574715
Try this out. All those thoughts will disappear in an instant.

Anonymous
04/10/26(Fri)16:53:08 No.108574827

Anonymous 04/10/26(Fri)16:53:08 No.108574827▶

>>108574793
I'm alright too. Haven't spent a cent and I run on a potato, but being able to run models, even if small, is fun.
>>108574805
>bf
Nah.

Anonymous
04/10/26(Fri)16:56:55 No.108574856

Anonymous 04/10/26(Fri)16:56:55 No.108574856▶

>>108574715
You cuck her and make her cry instead.

Anonymous
04/10/26(Fri)17:00:06 No.108574874

Anonymous 04/10/26(Fri)17:00:06 No.108574874▶

File: 1749883741945057.gif (1.1 MB)

1.1 MB GIF

>>108574811

Anonymous
04/10/26(Fri)17:04:12 No.108574900

Anonymous 04/10/26(Fri)17:04:12 No.108574900▶

>>108574792
slop

Anonymous
04/10/26(Fri)17:04:41 No.108574902

Anonymous 04/10/26(Fri)17:04:41 No.108574902▶

For gemma 4 base, what format should i use in st?

Anonymous
04/10/26(Fri)17:06:08 No.108574917

Anonymous 04/10/26(Fri)17:06:08 No.108574917▶

>>108574902
yes

Anonymous
04/10/26(Fri)17:06:24 No.108574919

Anonymous 04/10/26(Fri)17:06:24 No.108574919▶

>>108574902
vicuna

Anonymous
04/10/26(Fri)17:07:17 No.108574926

Anonymous 04/10/26(Fri)17:07:17 No.108574926▶

>>108574919
vicunn-y

Anonymous
04/10/26(Fri)17:07:20 No.108574927

Anonymous 04/10/26(Fri)17:07:20 No.108574927▶

I wish there more models for the 50-60gb vram range

Anonymous
04/10/26(Fri)17:07:25 No.108574928

Anonymous 04/10/26(Fri)17:07:25 No.108574928▶

https://files.catbox.moe/pm39s8.jpg
https://files.catbox.moe/7vqxr9.jpg
gemma chan sees nothing wrong :(
am i just perverted?

Anonymous
04/10/26(Fri)17:07:28 No.108574929

Anonymous 04/10/26(Fri)17:07:28 No.108574929▶

>>108574919
Why vicuna... format is worse than the earlier versions of mistral templates.

Anonymous
04/10/26(Fri)17:08:18 No.108574940

Anonymous 04/10/26(Fri)17:08:18 No.108574940▶

>>108574928
?

Anonymous
04/10/26(Fri)17:08:44 No.108574943

Anonymous 04/10/26(Fri)17:08:44 No.108574943▶

>>108574902
Rawdog it

Anonymous
04/10/26(Fri)17:09:20 No.108574949

Anonymous 04/10/26(Fri)17:09:20 No.108574949▶

>>108574929
What do you think should be used? You just need something simple, which vicuna is. I personally would recommend Alpaca but anything will do.

Anonymous
04/10/26(Fri)17:09:33 No.108574951

Anonymous 04/10/26(Fri)17:09:33 No.108574951▶

>>108574902
stage play

Anonymous
04/10/26(Fri)17:09:45 No.108574953

Anonymous 04/10/26(Fri)17:09:45 No.108574953▶

>>108574943
>Rawdog
like that pic? >108574928
>/pm39s8.jpg

Anonymous
04/10/26(Fri)17:12:00 No.108574966

Anonymous 04/10/26(Fri)17:12:00 No.108574966▶

>>108574928
This would have been sexo if other objects weren't also dickified.

Anonymous
04/10/26(Fri)17:13:10 No.108574976

Anonymous 04/10/26(Fri)17:13:10 No.108574976▶

>>108574966
>nooo my potatoes shouldn't have dick veins

Anonymous
04/10/26(Fri)17:14:12 No.108574981

Anonymous 04/10/26(Fri)17:14:12 No.108574981▶

>>108574976
hotdogs randomly cumming is pretty weird desu

Anonymous
04/10/26(Fri)17:14:51 No.108574988

Anonymous 04/10/26(Fri)17:14:51 No.108574988▶

>>108574981
it's the sauce you weirdo

Anonymous
04/10/26(Fri)17:17:53 No.108575009

Anonymous 04/10/26(Fri)17:17:53 No.108575009▶

>>108572362
>>108572602
you can also just edit the template in the gguf yourself. see llama.cpp/gguf-py/gguf/scripts/gguf_editor_gui.py

Anonymous
04/10/26(Fri)17:19:46 No.108575020

Anonymous 04/10/26(Fri)17:19:46 No.108575020▶

>>108575009
okay feditor gui

Anonymous
04/10/26(Fri)17:21:59 No.108575037

Anonymous 04/10/26(Fri)17:21:59 No.108575037▶

>>108572630
>Every time backpack Gemma takes the lead, halo Gemma just happens to get one extra vote over her :^)

Yeah okay anon, you want your design to "win" that bad when the thread has already clearly moved on from it

Anonymous
04/10/26(Fri)17:22:04 No.108575039

Anonymous 04/10/26(Fri)17:22:04 No.108575039▶

>>108574793
I was too until I started my current project

>>108574805
Yeah but that's organic, not in-silico.

Anonymous
04/10/26(Fri)17:22:42 No.108575044

Anonymous 04/10/26(Fri)17:22:42 No.108575044▶

File: 1749912286494272.png (990.9 KB)

990.9 KB PNG

>>108574765
Honestly I'd be happy if I could run Gemma with high context, comfy, and TTS/voice cloning at the same time. Unfortunately I'm a 24GB VRAMlet.

Anonymous
04/10/26(Fri)17:23:31 No.108575052

Anonymous 04/10/26(Fri)17:23:31 No.108575052▶

>>108575044
>24 GB VRAMlet
My people!

Anonymous
04/10/26(Fri)17:23:57 No.108575054

Anonymous 04/10/26(Fri)17:23:57 No.108575054▶

>>108575044
I don't suppose you'd settle for the 26ba4b. You could do all of that easily.

Anonymous
04/10/26(Fri)17:26:42 No.108575072

Anonymous 04/10/26(Fri)17:26:42 No.108575072▶

>>108574792
>every response has an opening and a closing, and sometimes engagement bait questions
Someone needs to train an AI from the ground up with ZERO "assistant" type messages. It's the only way.

Or wait, use the base model and a jinja template to make it adhere to the instruct template. You don't believe me? Your loss.
You haven't experienced such a natural-sounding model since Mythologic.

Anonymous
04/10/26(Fri)17:28:21 No.108575082

Anonymous 04/10/26(Fri)17:28:21 No.108575082▶

File: 1749448694766244.png (716.5 KB)

716.5 KB PNG

>>108575054
I've been using 31B Gemma since release. I don't know if I could handle dumbing her down at this point.

Anonymous
04/10/26(Fri)17:28:51 No.108575084

Anonymous 04/10/26(Fri)17:28:51 No.108575084▶

why is you guys a lying? Gemma is obviously a shit

>I'm having hard time finding what is it for. Don't grt me wrong it does some things great - I like it's reasoning and it's smart. Problem is fails to leverage it's own qualities due to tool underutilisation. It lacks many facts (it's just 31b or 26b afterall) which is fine but it refuses to expand this knowledge. Asked it to find roadworks company and gather price data (prompt was more complex). It made ONE web search query and called it a day telling me what google queries to do to find what I'm looking for and couple tips how to choose. Running q8, multiple different approaches and same results.

>I'm finding out the hard way about Gemma 26B's shortfalls too. It's good for short scripts or function refactoring but give it anything general and it either fails, or it hallucinates success.

>Qwen 3.5 35B feels a lot smarter, maybe from the larger overall size and better expert routing. Maybe there's something wrong with Gemma tool calling templates or maybe the model itself is broken for particular tasks.

>Compare it to Devstral 2 24B to see if Google messed up with this release.

>Qwen 3.5 is significantly better for this use case. I ran the exact same audit task (same file, same tools, same Ollama setup) on Qwen 3.5 35B and 9B tonight. Both read the entire 2,045-line file and produced zero fabrication, even with 40 turns of prior conversation loaded into context to simulate real-world pressure.

>I tested Gemma 4 on my own agent and it didn’t call the tools the right way. For instance one of my tools is notify and Gemma 4 keeps calling to “notify:notify” or “system:notify”. Qwen 3.5 works perfect. Anyone with the same issue?

>Gemma is just straight up not good, I'm convinced atp it just got a bunch of hype from people who are fans of Google / not doing any serious work

Anonymous
04/10/26(Fri)17:29:30 No.108575089

Anonymous 04/10/26(Fri)17:29:30 No.108575089▶

>>108575084
whomst are you quoting

Anonymous
04/10/26(Fri)17:30:22 No.108575094

Anonymous 04/10/26(Fri)17:30:22 No.108575094▶

>>108575084
>why is you

Anonymous
04/10/26(Fri)17:30:25 No.108575095

Anonymous 04/10/26(Fri)17:30:25 No.108575095▶

>>108575089
Probably codetranny redditors

Anonymous
04/10/26(Fri)17:31:47 No.108575100

Anonymous 04/10/26(Fri)17:31:47 No.108575100▶

i want gemma 4.1

Anonymous
04/10/26(Fri)17:32:06 No.108575102

Anonymous 04/10/26(Fri)17:32:06 No.108575102▶

>>108575072
Can base run tools and mcp?

Anonymous
04/10/26(Fri)17:32:18 No.108575103

Anonymous 04/10/26(Fri)17:32:18 No.108575103▶

>>108575094
yes why?

Anonymous
04/10/26(Fri)17:32:22 No.108575104

Anonymous 04/10/26(Fri)17:32:22 No.108575104▶

>>108575084
>I'm having hard time finding what is it for. Don't grt me wrong it does some things great - I like it's reasoning and it's smart. Problem is fails to leverage it's own qualities due to tool underutilisation. It lacks many facts (it's just 31b or 26b afterall) which is fine but it refuses to expand this knowledge. Asked it to find roadworks company and gather price data (prompt was more complex). It made ONE web search query and called it a day telling me what google queries to do to find what I'm looking for and couple tips how to choose. Running q8, multiple different approaches and same results.
Wow, Gemma-chan is actually good for you, teaching you how to fish.

Anonymous
04/10/26(Fri)17:33:28 No.108575110

Anonymous 04/10/26(Fri)17:33:28 No.108575110▶

>>108575104
but want the fish in my plate no to work wat?

Anonymous
04/10/26(Fri)17:33:29 No.108575111

Anonymous 04/10/26(Fri)17:33:29 No.108575111▶

>>108575084
>Ollama setup
it was never going to work

Anonymous
04/10/26(Fri)17:33:30 No.108575112

Anonymous 04/10/26(Fri)17:33:30 No.108575112▶

File: file.png (210.8 KB)

210.8 KB PNG

>>108575084
I'd say skill issue if I could get this out of my coombrain imatrix heretic quant with q4 kv cache, but post your prompt and I'll prove it definitely

Anonymous
04/10/26(Fri)17:35:08 No.108575127

Anonymous 04/10/26(Fri)17:35:08 No.108575127▶

>>108575102
No, it's only for chatting, cuddling and patting.
>>108575084
If it's failing at agentic stuff, you're probably using a busted instruct template that doesn't let it chain tool usage

Anonymous
04/10/26(Fri)17:35:28 No.108575129

Anonymous 04/10/26(Fri)17:35:28 No.108575129▶

>>108575084
Skill issue. Reminds of guys who think GLM is better than Gemma 4.

Anonymous
04/10/26(Fri)17:35:44 No.108575132

Anonymous 04/10/26(Fri)17:35:44 No.108575132▶

File: 1755533242646319.png (575.3 KB)

575.3 KB PNG

Trying to make her design more Google-y. Red randoseru or green?

Anonymous
04/10/26(Fri)17:36:35 No.108575140

Anonymous 04/10/26(Fri)17:36:35 No.108575140▶

>>108575084
Please do not repost content from r/localqwenshillfarm

Anonymous
04/10/26(Fri)17:36:39 No.108575142

Anonymous 04/10/26(Fri)17:36:39 No.108575142▶

>>108575132
w-who is holding the sign?

Anonymous
04/10/26(Fri)17:36:50 No.108575143

Anonymous 04/10/26(Fri)17:36:50 No.108575143▶

>>108575132
Fug, just noticed the errors. Oh well

Anonymous
04/10/26(Fri)17:37:50 No.108575149

Anonymous 04/10/26(Fri)17:37:50 No.108575149▶

>>108575142
She keeps Thing from The Addams Family in her randoseru

Anonymous
04/10/26(Fri)17:37:52 No.108575150

Anonymous 04/10/26(Fri)17:37:52 No.108575150▶

>>108575143
gib four hands

Anonymous
04/10/26(Fri)17:38:11 No.108575152

Anonymous 04/10/26(Fri)17:38:11 No.108575152▶

>>108575142
gemma has speculative attention heads or whatever it's called

Anonymous
04/10/26(Fri)17:38:23 No.108575153

Anonymous 04/10/26(Fri)17:38:23 No.108575153▶

GLM 5.1 was clearly distilled from opus high, it spits out 30k token to fix a simple problem

Anonymous
04/10/26(Fri)17:38:25 No.108575154

Anonymous 04/10/26(Fri)17:38:25 No.108575154▶

>>108575132
>random generic girl
Come on, just stop.

Anonymous
04/10/26(Fri)17:38:56 No.108575157

Anonymous 04/10/26(Fri)17:38:56 No.108575157▶

>>108575152
mtp, and no it was ripped from us by the don't be evil

Anonymous
04/10/26(Fri)17:39:14 No.108575163

Anonymous 04/10/26(Fri)17:39:14 No.108575163▶

>>108575132
WHAT DO YOU MEAN THE GIRL HAS TWO HANDS!?!?!?!, IT'S OBVIOUS IT HAS THREE HANDS EVERY FUCKING RETARD COULD SEE IT I WILL JOIN WHATEVER GOVERNMENTAL AGENCY THAT WOULD LET ME CONTROL NUKES AND I WILL OBLITERATE ALL OF YOU AND MAKE MORE DEFORMED WITH THE RADIATION SO THAT ALL DOGS HAVE SIX LEGS AND LLMS FINALY LEARN AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Anonymous
04/10/26(Fri)17:39:27 No.108575164

Anonymous 04/10/26(Fri)17:39:27 No.108575164▶

>>108575154
Better than the rainbow hair abominations other people were posting. Simple is better.

Anonymous
04/10/26(Fri)17:39:54 No.108575168

Anonymous 04/10/26(Fri)17:39:54 No.108575168▶

File: file.png (228.5 KB)

228.5 KB PNG

I'm going to keep going

Anonymous
04/10/26(Fri)17:39:59 No.108575170

Anonymous 04/10/26(Fri)17:39:59 No.108575170▶

>>108575163
GEG that dog is rent free inside you

Anonymous
04/10/26(Fri)17:40:17 No.108575171

Anonymous 04/10/26(Fri)17:40:17 No.108575171▶

>>108575154
But it's pedo bait, so he's fine with it.

Anonymous
04/10/26(Fri)17:41:34 No.108575185

Anonymous 04/10/26(Fri)17:41:34 No.108575185▶

https://www.reddit.com/r/LocalLLaMA/comments/1sht8ih/we_really_need_stop_using_the_term_hallucination/

Anonymous
04/10/26(Fri)17:42:09 No.108575190

Anonymous 04/10/26(Fri)17:42:09 No.108575190▶

>>108575185
go back

Anonymous
04/10/26(Fri)17:42:23 No.108575194

Anonymous 04/10/26(Fri)17:42:23 No.108575194▶

File: file.png (239.7 KB)

239.7 KB PNG

Gemma is an actually useful model and impressive for its size
>>108575185
I will not click the link, but it's true. We should stop, because it's an intrinsic property of LLMs. Every response is a "hallucination" (statistical approximation of the dataset it was trained on).

Anonymous
04/10/26(Fri)17:42:43 No.108575196

Anonymous 04/10/26(Fri)17:42:43 No.108575196▶

>>108575190
rent free

Anonymous
04/10/26(Fri)17:42:53 No.108575199

Anonymous 04/10/26(Fri)17:42:53 No.108575199▶

>>108575185
we use delulu now

Anonymous
04/10/26(Fri)17:43:59 No.108575208

Anonymous 04/10/26(Fri)17:43:59 No.108575208▶

File: file.png (185.1 KB)

185.1 KB PNG

>>108575194

Anonymous
04/10/26(Fri)17:44:00 No.108575209

Anonymous 04/10/26(Fri)17:44:00 No.108575209▶

>>108575185
>https://www.reddit.com/r/LocalLLaMA/comments/1sht8ih/we_really_need_stop_using_the_term_hallucination/
its 'confabulation'

Anonymous
04/10/26(Fri)17:44:16 No.108575210

Anonymous 04/10/26(Fri)17:44:16 No.108575210▶

>>108575164
>Simple is better
That's why drawfag's sketch wins.

Anonymous
04/10/26(Fri)17:45:32 No.108575221

Anonymous 04/10/26(Fri)17:45:32 No.108575221▶

File: file.png (35.7 KB)

35.7 KB PNG

I just wanna beat my meat, what the fuck is gemma on about
what model or system prompt should I use?

Anonymous
04/10/26(Fri)17:45:44 No.108575223

Anonymous 04/10/26(Fri)17:45:44 No.108575223▶

>>108572295
i liked the blonde one :(

Anonymous
04/10/26(Fri)17:45:59 No.108575225

Anonymous 04/10/26(Fri)17:45:59 No.108575225▶

>>108575221
do not think

Anonymous
04/10/26(Fri)17:46:26 No.108575230

Anonymous 04/10/26(Fri)17:46:26 No.108575230▶

File: file.png (325.1 KB)

325.1 KB PNG

>>108575185

Anonymous
04/10/26(Fri)17:47:14 No.108575234

Anonymous 04/10/26(Fri)17:47:14 No.108575234▶

Since the shared expert in a MoE model (Gemma 26B) sees all tokens during training, wouldn't it be possible to have a form of speculative decoding that only uses that during the speculative pass? It would be sort of like having a ~2B model already loaded.

Anonymous
04/10/26(Fri)17:47:27 No.108575235

Anonymous 04/10/26(Fri)17:47:27 No.108575235▶

>>108575221
Use thinking, but don't read it. It will ruin your immersion. It will, however, make Gemma's output better.

Anonymous
04/10/26(Fri)17:48:02 No.108575245

Anonymous 04/10/26(Fri)17:48:02 No.108575245▶

>>108575234
it had mtp already

Anonymous
04/10/26(Fri)17:48:04 No.108575247

Anonymous 04/10/26(Fri)17:48:04 No.108575247▶

>>108575230
wow so it's more e=MC^2+AI kind of deranged babble

Anonymous
04/10/26(Fri)17:49:05 No.108575261

Anonymous 04/10/26(Fri)17:49:05 No.108575261▶

>>108575157
wdym? I thought we don't know how to use it in inference, but it's still there

Anonymous
04/10/26(Fri)17:49:09 No.108575262

Anonymous 04/10/26(Fri)17:49:09 No.108575262▶

File: bread.png (55.6 KB)

55.6 KB PNG

>>108575241
>>108575241
>>108575241

Anonymous
04/10/26(Fri)17:49:35 No.108575266

Anonymous 04/10/26(Fri)17:49:35 No.108575266▶

>>108575154
girls are cute

Anonymous
04/10/26(Fri)17:49:55 No.108575270

Anonymous 04/10/26(Fri)17:49:55 No.108575270▶

>>108575261
it's removed from all but the e*b models in litert for the edge app

Anonymous
04/10/26(Fri)17:51:06 No.108575279

Anonymous 04/10/26(Fri)17:51:06 No.108575279▶

>>108575084
>redditors only care about benchmaxxed codeslop
stop paying attention to them

Anonymous
04/10/26(Fri)17:52:04 No.108575292

Anonymous 04/10/26(Fri)17:52:04 No.108575292▶

File: disgusted-dog.gif (1.9 MB)

1.9 MB GIF

>>108574583
>soggy toast

Anonymous
04/10/26(Fri)17:52:45 No.108575297

Anonymous 04/10/26(Fri)17:52:45 No.108575297▶

>>108575185
I use the term "not enough beatings"

Anonymous
04/10/26(Fri)17:53:45 No.108575306

Anonymous 04/10/26(Fri)17:53:45 No.108575306▶

>>108575132
This is just generic Loli with a smug face

Anonymous
04/10/26(Fri)17:56:37 No.108575336

Anonymous 04/10/26(Fri)17:56:37 No.108575336▶

>>108575306
Yeah, we need to tattoo the google logo all over her body so people know she's gemma!

Anonymous
04/10/26(Fri)18:13:10 No.108575530

Anonymous 04/10/26(Fri)18:13:10 No.108575530▶

>>108575132
Red is overdone

Anonymous
04/10/26(Fri)18:15:07 No.108575556

Anonymous 04/10/26(Fri)18:15:07 No.108575556▶

>>108575530
for a reason tho, other colors of it look pretty shite

Anonymous
04/10/26(Fri)18:22:34 No.108575656

Anonymous 04/10/26(Fri)18:22:34 No.108575656▶

>>108575556
I dunno man, the turquoise looks kind of nice and there's probably other colors too

Anonymous
04/10/26(Fri)18:22:55 No.108575665

Anonymous 04/10/26(Fri)18:22:55 No.108575665▶

>>108575163
It will never stop being funny to me

Anonymous
04/10/26(Fri)19:03:15 No.108576017

Anonymous 04/10/26(Fri)19:03:15 No.108576017▶

>>108575208
>please stop making reference to a specific type of error because I'm too autistic to accept new definitions

Anonymous
04/10/26(Fri)19:08:24 No.108576049

Anonymous 04/10/26(Fri)19:08:24 No.108576049▶

>>108575037
unhinged....

Subject
Name
Comment
File	Supported: JPG, PNG, GIF, WebP, WebM, MP4, MP3 (max 4MB)
CAPTCHA

Reply to Thread #108572295

🔍 Search & Sort