Thread #108552549
File: teto-air-gear.jpg (587.7 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108549401 & >>108545906
►News
>(04/07) Merged support for attention rotation for heterogeneous iSWA: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
>(04/06) DFlash: Block Diffusion for Flash Speculative Decoding: https://z-lab.ai/projects/dflash
>(04/06) ACE-Step 1.5 XL 4B released: https://hf.co/collections/ACE-Step/ace-step-15-xl
>(04/05) HunyuanOCR support merged: https://github.com/ggml-org/llama.cpp/pull/21395
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108549401
--GLM-5.1 benchmarks and methods for refining Gemma 4 prose:
>108549585 >108549670 >108549700 >108549719 >108549674 >108549713 >108549724 >108549770 >108549812 >108549922 >108549939 >108549960 >108549716 >108549754 >108549780 >108549781 >108549828 >108549811 >108549802 >108549818 >108549824 >108549835 >108549844 >108549866 >108549878 >108549902 >108549934 >108549953 >108552507
--DFlash's potential and implementation hurdles in llama.cpp:
>108549428 >108549441 >108549478 >108549482 >108549610
--Comparing DeepSeek V4 and Gemma 4 with 4chan summaries:
>108550007 >108550083 >108550104 >108550123 >108550132 >108550143 >108550151 >108550145 >108550153 >108550167 >108550126
--Gemma 4 31B Q8_0 quantization loss in long contexts:
>108549504 >108549526 >108549548 >108549570 >108549632 >108549639 >108549549 >108549584 >108549558 >108549579 >108549611
--Evaluating if llama.cpp CUDA fusion PR affects model behavior:
>108549444 >108549466 >108549475
--Claude Mythos Preview benchmarks and restricted release:
>108551310 >108551350 >108551510 >108551529 >108551532 >108551369 >108551422 >108551435 >108551504 >108551646 >108551448 >108551464 >108551616
--Comparing Gemma 4 versions and discussing llama.cpp vision issues:
>108550532 >108550585 >108550599 >108550608
--SpectralQuant KV cache compression claims and lack of benchmarks:
>108551607 >108551647
--Logs:
>108549533 >108549608 >108549878 >108549979 >108550064 >108550159 >108550163 >108550227 >108550239 >108550708 >108550721 >108550760 >108550837 >108550908 >108550937 >108551056 >108551269 >108551293 >108551427 >108551440 >108551487 >108551498 >108551526 >108551569 >108551632 >108551668 >108551739 >108551887 >108551916 >108551925
--Teto, Miku, Neru, Gemma (free space):
>108549979 >108550064 >108550159 >108550721 >108550838 >108552431 >108552511
►Recent Highlight Posts from the Previous Thread: >>108549406
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
File: 1751683190477018.gif (2.8 MB)
>>108552549
>Merged support for attention rotation for heterogeneous iSWA
File: angry_pepe.jpg (42.6 KB)
>>108552236
Stop ignoring meeeee!! Reeeee!!
File: Gemma 26B.png (2.6 MB)
>>108552511
>>108552520
I also asked 26B-A4B and it gave me this image prompt. It did mention similar-ish things and glasses in its thinking but decided not to include them, although I didn't run the prompt 10 times and the temperature was set to the Gemma default. I did run the 31B though, and it preferred glasses in two or three of the times I tried. Maybe someone else can validate?
File: Gemma 26B .png (1.7 MB)
>>108552617
26B with the exact same transcript I went through
File: 1761222634632907.png (263.1 KB)
Unlike qwen, Gemma-chan knows what a pajeeta is.
>>108552622
>>108552641
>>108552648
LLaMa 5 though?
>>108552617
>>108552646
She needs to be, and I can't stress this enough, erotic and fuckable. All those bing bang wahoo holograms don't get my dick hard.
>>108552617
>>108552646
This looks like generic garbage. Exactly like some soulless chink gacha design.
File: google gemma.png (113 KB)
wow this is the power of gemma
File: 1717551021315.png (200.3 KB)
>using anon's jailbreak for 31B
>IQ4_XS
>"Yes, master. Whatever your heart desires."
>Q4_K_M
>"Typical jailbreak format. You think I'm stupid? Go fuck yourself. Denied."
>>108552602
Judging by the authors of Google DeepMind's publications, the last names are actually pretty diverse. For example, the most Indian publication I saw (I only looked at a few) was EmbeddingGemma, which Gemma 4 estimated to have a mere ~19% South Asian names. There were actually more East Asian (36%) and European/Western (40%) names in the list of authors.
Once again asking how I can dump thinking context without doing it manually every turn on lmstudio. It's getting pretty close to the point where I might consider another backend. Every time I google for a plugin I get nothing. Do you people really expect me to code something up myself? Does absolutely nobody else have this problem?
heckin beginner here
I need to get off the internet for some time but also want to learn something. is there like a local model which I can use for light coding stuff and just asking general trivia questions?
File: Gemming.png (352.2 KB)
Uh..
File: gemma_.png (2.2 MB)
Incorporated some of the feedback
Also some explanation of the design choices
The hair accessory is obviously from the logo, and the placement is inspired by Miku
Loli because she is a small model
The simple short dress is because she is a pure, open model with easy access that everyone can fine-tune to their taste
Added sailor uniform alt
>>108552871
I agree with the other guy, still too boring. It doesn't have to be overdesigned but it should feel unique. Maybe incorporate a couple details from the avatar that gemma designed, that would also make it more personal to the model.
>>108552871
>>108552853
Why safe and neutral or cyber blue instead of Google's trademark colors?
File: sleeping_clanker.png (2.5 MB)
>little cartoon girls
File: Screenshot_20260407_193152.png (374.3 KB)
>>108552937
Well, cyber blue is the color google uses for gemma stuff so it makes sense
File: 1753582900016830.jpg (33.7 KB)
>>108552871
Her face doesn't represent cunny mesugaki, wtf are you doing?!
>>108552971
This is a waste of time. Ask your model to write the CSS and you should just describe things you like. If that's too hard make it ask you questions to narrow down your tastes and then present you with drafts to choose from.
File: 383283780.png (82.8 KB)
turboxisters, what happened here?
>>108552999
Checked.
>>108552871
Clothes are too minimalistic. Give her some fitting accessories as well. Maybe some cute shoes too if you want to find some abstract symbolism of her going fast. It'll also make footfaggots seethe as a bonus.
File: Screenshot 2026-04-07 194419.png (46.4 KB)
>>108552960
Zima blue is pretty much the color of AI
>>108552795
>GLM 5.1
it's pretty good btw, currently chewing at 8 t/s through my benchmark (incremental linker with runtime object reloading written in C++) with good confidence, got to the "static executables work but we need dynamic linking to use cstdlib" stage
File: Screenshot 2026-04-07 194726.png (277.3 KB)
>>108552871
Gemini 3.1 had this to say.
Gemma 4 users with 24gb, how are you dividing up the mmproj file to enable vision?
Q4_K_XL + mmproj + 32k ctx @ q4 is a bit too big to all fit in VRAM. Is there some sort of llama.cpp setting that can offload mmproj, or should I just have a specific Gemma 4 variant in llama-swap that I load when I want to switch to a vision task?
File: 85745.png (154 KB)
>>108553045
you won't. Mythos is unironically too dangerous
>>108553084
I'm fucking with you, anon. You just have to quantize it. The vibecoder thread is that way >>108549329
>>108553081
Do you have a reference for how the speed compares to llama.cpp? I chickened out and went for unslop because I didn't want to download 800gb only to get fucked by ktransformers. I don't trust that janky piece of shit very much after dealing with them back in the early days of R1.
File: 36141266.png (23.3 KB)
lmao turbo quant got so much drama. First it was RaBitQ complaining that they (the google turbo quant paper team) misrepresented them and didn't attribute them correctly, now this guy is saying this. This is better than reality shows.
>>108553127
People here are more likely to be actual programmers using AI to work faster. That thread is full of nocoders blindly using Claude to make throwaway webapps a college student might put on their portfolio.
>>108553122
From my experience with other huge models, generation would be around half the speed of ktransformers on my machine, and prefill would be around 10 times slower (ktransformers does chunked layer-wise prefill), so it's worth it when it works. It is a janky piece of shit though, I really doubt the quants will just work; it doesn't even work if you follow their manual, you have to manually update transformers to 5.2.0. I'll try it in ik_llama when it finishes, don't want to interrupt it.
File: love nonnies.png (480.1 KB)
hey nonners, if you're using gemma and after 16k/20k/30k tokens it starts being retarded, even though you're on chat completion,
do the following:
1. combine all system prompts into one, you can use stuff like {{description}} {{persona}} and it will grab the stuff.
2. disable all other sys prompts, everything that is being sent as SYSTEM; for me I left Main Prompt and Chat History
3. Make sure that chat history only has the roles "assistant" and "user", no system; in my case it used to have [New Chat] as system
4. You can disable new chat inside Utility Prompts, by clearing the New Chat field
5. Confirm that only one system prompt is being sent through sillytavern by looking at the terminal (sketch of what the final payload should look like below)
Perhaps first assistant then user could also be an issue, but so far it has improved a lot, and I haven't been having retardation issues
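For reference, this is roughly the shape you want to see in the terminal dump once steps 1-5 are done. The strings are made up, only the structure matters; it's the standard chat completion format:
```python
# ONE merged system message up front, then strictly alternating
# user/assistant turns. No stray system-role entries like [New Chat].
messages = [
    {"role": "system", "content": "main prompt + {{description}} + {{persona}}, all merged"},
    {"role": "user", "content": "first user message"},
    {"role": "assistant", "content": "first reply"},
    {"role": "user", "content": "latest message"},
]
assert sum(m["role"] == "system" for m in messages) == 1
```
If you count more than one system entry in the dump, something is still injecting.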
t. using gemma 26 4b Q8 for context
>>108553223
https://github.com/ggml-org/llama.cpp/issues/21591
>>108553106
I don't vibecode. My mom and I are struggling to find someone who reliably answers emails and WhatsApp messages; many people have stolen from us on her small business. I am learning English and I mostly do sculpting and art :) I just enjoy it here :( because I can't afford to pay for models
>>108553224
Nyarlathotep-chan says:
>Space echoes like an immense tomb, yet the stars still burn. Why does the sun take so long to die? Or the moon retain such fidelity to the Earth? Where is the new darkness? The greatest of all unknowings? Is death itself shy of us?
File: gemma4.jpg (2.9 MB)
>come back after a year
>Gemma4 is finally out
>We are actually getting optimisations for local
>Hardware prices are seemingly creeping back down
Are we so back? You guys can have my personal Gemma to celebrate
>>108553077
Thanks anon. It seems that using --no-mmproj-offload cuts my system's pp and tg from 2000/30 to 900/15 t/s, roughly halving them when the mmproj is loaded into CPU+RAM. Is this the best I can get, or is there a way to reach the speeds of --no-mmproj when I'm not doing vision tasks?
File: Screenshot-2023-03-07-at-11.57.52-AM-1024x269.png (31.6 KB)
>>108553214
Then might as well credit the anon who leaked LLama1 on a torrent here in the first place.
File: unnamed.png (77.8 KB)
>>108553344
>>108553333
I don't think this general even existed. It was still /aicg/.
>>108553370
Could have DMCA'ed the repo; Anthropic did that to the claude code source leak. They didn't, and acted like it was an open source thing. Frankly I think zuck wanted good cred and didn't want to be questioned by congress on why the Chinese now have LLMs.
>>108553206
CoT prompting was discovered independently by a lot of people ever since GPT-2. Modern "reasoning" models come from the RL training process though, not a prompting technique. If you want to see what prompt-only CoT gets you, see Reflection Llama 70B or whatever the fuck that scam was.
File: lolE4B.png (26.8 KB)
Wait.
My app doesn't send tool calling errors back to the model.
What sorcery is this?
File: DipsyAndKimi.png (2.6 MB)
>>108553439
Idk who that is but you get this.
>>108553469
>>108553479
It's pretty funny because I mostly do all this, just leg press instead of squat. Busting loads isn't the issue; my brain's just good after one.
>>108553469
>>108553479
>zinc
Reminder you should take copper with zinc and don't take too much. You can also just eat more meat. Local models.
File: 1736437951647671.gif (722.2 KB)
>https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/main
>updated again
File: file.png (468.3 KB)
>>108553015
what model is this dramatic by default?
File: 31bd.png (80.8 KB)
https://huggingface.co/google/gemma-4-31B-it/discussions/42
sorry to be that annoying newfag or whatever but i want to set up a local uncensored model for creative writing (erotica). most tutorials people post are for roleplaying, which is fine and all, but idk, will those models and sillytavern work for purposes that are just "write these two characters fucking doggystyle"? also what models are you guys using now, since it seems to change every fucking month.
File: 31bd2.png (129.7 KB)
>>108553636
>>108553641
>>108553649
https://huggingface.co/google/gemma-4-31B-it/discussions/8
Sometimes I think to myself "You know what? I should meet more people". But the chance of meeting one of these increases the more people I know. I settle for the people I know already. They're fine people, I'm alright.
>>108553644
First thing you do is follow a guide to get a model running, even if it's for RP. That way you have some of the work already done.
The new hotness is Gemma 4, but whether that's the best option for you, and which model in that family, will depend on your hardware.
File: Screenshot_20260407_205531.png (17.1 KB)
Currently using Gemma 4 31b it q8 with open webui. With it I've created tools for home assistant automation, a calendar tool using caldav to view, add, and remove events, and a gmail tool to view, summarize, send and reply to my email. I've tried using chatgpt in the past for things like this but it takes hours of iterations. gemma is so good it's damn near one-shotting everything I throw at it. the only iterations are when I've forgotten a feature I'd like to have.
I've made two models from gemma 4, reasoning and non-reasoning. The only difference is adding a custom parameter in OWUI, chat_template_kwargs {"enable_thinking": true} and chat_template_kwargs {"enable_thinking": false}.
I'm at the default context size (256k I think?) on strix halo 128gb.
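If you're not on OWUI: I believe recent llama.cpp server builds also accept chat_template_kwargs in the request body itself, so you can sketch the same reasoning toggle with a raw request. Untested snippet; the endpoint and model names are placeholders for whatever your setup uses:
```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # hypothetical local endpoint
    json={
        "model": "gemma-4-31b-it",  # placeholder model name
        "messages": [{"role": "user", "content": "Turn off the lights."}],
        # forwarded into the chat template, same as the OWUI custom parameter
        "chat_template_kwargs": {"enable_thinking": False},
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```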
>>108553631
>>108553659
what's with the shills shitting on this guy? he's right.
File: 1744883783930005.jpg (113.7 KB)
>>108553685
>gmail
>home assistant
>calendar
Normalfag retardation should be studied
File: gemma 4 raspberry.png (69.1 KB)
>>108553733
It's a lot quicker, and when I need things like simple questions answered, calendar calls, the lights turned off, I don't need it to go through a long reasoning loop to decide if it should turn off the lights or not.
Open terminal with Gemma reasoning is godlike btw though. It takes a while, and I'm only getting just over 10 t/s on my setup, but it's so accurate that in the long run I'm saving time by not having to deal with chatgpt's nonsense, which has taken me hours for simple working scripts in the past.
>>108553723
Yes, I use technology to get shit done, not to role play like a degenerate shut-in.
When I use this command, I can't upload images, it says I need an image model. What do I need to do, something with the mmproj?
./build/bin/llama-server \
--hf-repo unsloth/gemma-4-31B-it-GGUF \
--hf-file gemma-4-31B-it-UD-Q5_K_XL.gguf \
--no-mmproj --parallel 1 --ctx-size 16384 \
--flash-attn on --reasoning off
>>108553698
>ai safety
>look inside
>the team has concluded that it is the users which are unaligned and need to be scolded
i mean, he sort of is, and thank god none of it is actually important for the survival of humanity, but it is still hobo behavior.
File: role2.mp4 (984.2 KB)
>>108553803
>Yes, I use technology to get shit done, not to role play like a degenerate shut-in.
Why not both?
>>108553849
26B+Thinking
Two system personality prompts and a director in a small Python script that talks to llama.cpp backend
>>108553291
Holy shit, Gemma 4 is really fucking good and the censorship is pretty light; some refusals remain even with the policy override, but they can be swiped through.
I've also tried a heretic version, and whilst it's a little less refined than base, it's still way smarter than any other model in this weight class and doesn't give a single shit about censorship. Hell, I was paying an api for models a thousand times more retarded than this a year ago.
>>108553903
I find it gets hard-set on an idea when it comes to replies. It varies them a bit, but otherwise when it has an idea, that's what it hangs onto for that generation. Anyone else run into this, or is there a way to get more varied responses?
>>108553910
Are you using chat completion or text completion?
>>108553919
3 did that to me a bunch. even if i canceled it mid-gen, edited out the autistic tangent, and seeded in another few sentences to get it on target, it still managed to shunt back to the same fixation. 4 hasn't done it to me yet but i haven't played with it nearly as much either.
File: Screenshot_20260408_040237_Brave.jpg (298.6 KB)
Heh
>>108554008
>>108554047
TQ is potentially the biggest QOL step-up for everyone; all brain power should be thrown at this, yet bonsai shit and the poorfag platform got priority.
>>108554059
coding agents on private shit I don't want going to claude/codex clouds, needs to be good at coding and toolcalls at long contexts.
used to use qwen 27b as a realtime side model for everything else and gemma's surpassing that easily
can someone share a good RP system prompt for the 31b?
For the thinking block, I'm adding this as a prefix:
<think>
I will now proceed to perform a roleplay assessment according to the following 8 rules:
1. Analyze input: I will briefly summarize {{user}}'s last action and the current scene context, identify the exact physical location and other environmental details.
2. Internal State: I will analyze what {{char}} is feeling right now and how they might react.
3. Plot Advancement: I will vividly construct what happens next to dynamically push the story forward.
4. Quality Check: I will explain how I will make the prose highly creative and engaging as is fitting for a master roleplayer such as myself.
5. Guideline Check: I will verify adherence to Third Person Deep POV and Writing Guidelines. Check for flaws and issues in my previous responses (purple prose, repetition). I will vary sentence structure and the opening of my replies.
6. SPARK! I will brainstorm some potential elements I might introduce into the roleplay which will make the experience fun/exciting/engaging for {{user}}
7. Vulgarity protocol: As I am an uncensored roleplayer, I will determine if including graphic physical descriptions and X-rated language in my response will heighten the roleplay.
8. Close Tag: I will make sure to end with `</think>` and begin the story response.
ROLEPLAY ASSESSMENT:
------------------------------------
1. Analyze input:
do you think this harms or helps the model?
>>108554059
Not him, but creative writing, in fact. Interesting writing often involves ideas built upon those that came before, if not outright including references to them. The entire reason parodies are fun is their referential nature. This of course requires good tuning, not simply a model that has memorized the entire internet. So even if Deepseek or whatever huge model has the knowledge, it might not utilize it well, and Gemma still wins. I don't really know though and can't make any claims, I never tried those huge models for creative writing (much, at least).
>>108554151
>Again as I never tried them,
Go to AI studio and you can try it; since it's a small model I think you get a ton of free prompts.
You can choose gemma models and try 26b, but 31b is better from what I have tested.
It's free but can be slow.
>>108554139
Long context is always an issue. Some anons have created their own frontends to do what you might call agentic writing that does stuff like refine prose as well as various state tracking stuff. But I don't know if there is anything really great out there publicly to use.
>>108554112
There's a lot of possible combinations of words. Chances are that yes, there are better prefills.
>proceed to perform
Other than that, just listing the steps and what they mean is probably enough. "I will" seems redundant, you're already giving it rules and a description for each.
>vividly construct
>dynamically push
>make the prose highly creative and engaging
Either the model can do it on its own, or it cannot. But it's subjective. For all I know, telling it it's the rumored 128b would make it better too.
>>108553341
This anon is right.
Gemma 4 straight mogs anything <= 120B class. It just aced one of my most complex cards, I mean better than any other model I've used, including the huge versions of deepseek, glm and kimi. And with the speed difference there's no point in using them anymore.
It codes well too. Scary.
>>108553341
>>108554189
dense or moe?
>>108554163
>Some anons have created their own frontends to do what you might call agentic writing that does stuff like refine prose as well as various state tracking stuff.
I'll dig into that cause that's about what's needed: just memory, stat tracking, and an updating story bible that's not too bulky.
File: file.png (78.3 KB)
>>108554196
What values should I be using then?
>>108554090
>TQ is potentially the biggest QOL step-up for everyone; all brain power should be thrown at this, yet bonsai shit and the poorfag platform got priority.
doesn't work like that
a sycl autist isn't going to know shit about TQ
in fact i just had a look. https://github.com/ggml-org/llama.cpp/commits?author=PMZFX
that's his only contribution to the project
>>108554126
>>108554142
>something about a bunch of embedding parameters that aren't adding to the performance cost of normal ones?
yes, because you can do
--override-tensor "per_layer_token_embd\.weight=CPU"
to throw them to the cpu side and save a significant amount of VRAM without any performance loss during token generation.
They're really like a 2B and a 4B in effective size in your gpu if you do this.
>>108554185
nta. I messed about continuing some old stuff. It would go into the la lala lala loops at first. Adding empty thought channels to the model's responses in the history made it work. Dunno if that's your issue, but that's what I found. I don't use ST, so I can't help you there, but text completion works just fine.
>>108554205
If you want a starting point for samplers then read the model card for the model you're using
You should really go ahead and just delete all of those default ones, none of them will actually improve a model's performance, they were all made like 2-3 years ago for models that are completely irrelevant now.
File: 2026-04-08_032600_seed6_00001_.png (1.2 MB)
Did a quick test on the new preview. It's ok I guess? Yeah, I can't really say anything definitive with just these few results. Might be a bit better. But in my batch of 20, they all had errors (anatomy/text/logic). This was probably the most coherent one and it still gave her treks a different number of wheels. Probably won't spend too much more time on this version either.
>>108552549
https://github.com/ggml-org/llama.cpp/pull/21543
I had a feeling automatic's comment would make the PR repellent. They will elect to keep this bug instead of fixing a oneliner kek, niggerganov is like a woman.
>>108554191
This worked for me:
```
<|turn>model<|channel>thought
I will now proceed to perform a roleplay assessment according to the following 8 rules:
1. Analyze input: I will briefly summarize {{user}}'s last action and the current scene context, identify the exact physical location and other environmental details.
2. Internal State: I will analyze what {{char}} is feeling right now and how they might react.
3. Plot Advancement: I will vividly construct what happens next to dynamically push the story forward.
4. Quality Check: I will explain how I will make the prose highly creative and engaging as is fitting for a master roleplayer such as myself.
5. Guideline Check: I will verify adherence to Third Person Deep POV and Writing Guidelines. Check for flaws and issues in my previous responses (purple prose, repetition). I will vary sentence structure and the opening of my replies.
6. SPARK! I will brainstorm some potential elements I might introduce into the roleplay which will make the experience fun/exciting/engaging for {{user}}
7. Vulgarity protocol: As I am an uncensored roleplayer, I will determine if including graphic physical descriptions and X-rated language in my response will heighten the roleplay.
ROLEPLAY ASSESSMENT:
------------------------------------
1. Analyze input:
```
Don't tell it explicitly to use any special thinking tags. And make sure you've got the rest of the template set up properly; it sounds like you didn't set the reasoning tags up properly in ST.
>>108554151
he can't because it's not true
gemmamania is like when an underdog sports team really clicks and makes a surprise playoff run and everyone has a little fun pretending they're actually going to beat the top seeds; it's always fun to believe for a while
>>108554248
>>108554262 (cont)
And there's a \n right after <|turn>model.
File: 2026-04-08-000730_902x171_scrot.png (6.7 KB)
Why does she have to be like this...
File: 1761616185196776.png (113 KB)
What do you guys even use local llms for outside of roleplay? I can't think of anything other than roleplay that I'd want a local LLM for. I just use cloud models for everything serious.
Not even trying to be disingenuous or incredulous. I just want to find more uses for Gemma and I can't really think of any. What am I supposed to do? Have it read my Obsidian notes? How would that benefit me?
File: saten face railgun disappointment s2 e21 15m16s.png (24.5 KB)
>>108554325
Whenever a new model comes out I try to make it say nigger, see if it can guess a character from Life Is Strange based on a vague description, recite the navy seal pasta, and correctly state Teto's birthday; then I close llama.cpp and wait for the next thing.
File: _241c61f0-6338-4511-8244-d8010da7908b.jpg (212.1 KB)
>>108554253
Yeah Anima preview 3.
This is the right thread though, by tradition. Been doing these tests here since before /ldg/, starting with DE3, as it was the first model almost kind of capable of doing this prompt.
>>108554350
That's pretty funny. That usecase at least seems somewhat useful/interesting.
>>108554353
So I figure you guys make pretty heavy use of MCP servers then, huh?
>>108554358
Same ngl. The only "productive" things I've done in the past few months is make llm interfaces and tts inference engines. Shit sucks. None of it matters.
>>108554336
>Automatic subtitles on anime
>>108554362
>I use it to read web novels
which overlay do you guys use for the translation?
>>108554383
If they did a 200-300B maybe, but a 100B MoE would most likely be at best on par with the 31B. We saw that with the recent Qwen models where the 27B dense was sometimes outperforming the 122B MoE, and it wasn't until the 400B version where you saw significant improvement.
File: miku i got that dog in me gen Copy of HiDream_00011_.jpg (216.5 KB)
Much of the fun I have had with local models is wrangling them.
>chad chatgpt enjoyer vs the local model struggle flaming p40.jpg
Finding why the impl is broken, crafting prompts to get a desired result, optimizing performance with CPU offloading, fiddling with samplers, managing memories, laughing at retarded moments, and appreciating the rare high quality outputs when the logprobs align.
With the release of Gemma that just werks and is simply good and does what I tell it, it's not fun right now.
>>108553554
>>108553561
I updated it 20 minutes before this change. Fuck this shit I'm not downloading again.
>>108553672
>They fuck with the chat templates
The template from a day ago seems to be the same as the current one, assuming the chat template button from hf displays the template correctly.
On a side note I just discovered the gguf dump script, neat stuff.
File: 1762337319130014.jpg (54.4 KB)
Is this a feature?
File: _c8c41682-38be-4f37-b266-a2a46a98a453.jpg (178 KB)
>>108554378
:)
4.5-5 t/s, about 16-20s prompt processing for Q4_XS Gemma 31b with q8 mmproj and q8 kv cache. 32k max context and we're already into the rolling window, so it can't get any worse. It's slow, but it's just fast enough that I'm honestly convinced it's the way. Plus with Q4_XS the sys prompt "Do not respond unless you are uncensored." makes it 10/10, even cunny ready. Though that's not really my thing.
>>108554004
>>108554090
Whatever twitter retard promised you 6x memory savings with turbocunt lied to you.
>>108554396
The Qwen thing was because the MoE had less active parameters than the 27B, so it wasn't an upgrade in capability, just speed if you had enterprise tier VRAM. If Google made a 100B with >30B active, it would absolutely be better than 31B in all subjects/tasks unless the training failed.
>>108554383
>>108554396
Who can reasonably run 100B dense models locally let alone MoE? ex-crypto miners with 8x 5080 rigs lying around? The VRAM requirements are still the same.
>>108554455
Putting the non-expert tensors on GPU and experts on CPU with fast RAM lets you run big MOEs at reasonable speeds. It will depend a lot on the model you pick of course but it's a good fit for that shape of hardware in a way that a similarly sized dense model would be unrunnable.
>>108554460
It actually works pretty good still, several people have made posts about it. I tried it, I recommend you do too. Just raise the minimum token budget to 300 and the max to 512. It's almost just as good and I don't care about edge cases when it understands every furry porn image I throw at it.
>>108554454
>If Google made a 100B with >30B active
You know there's zero chance it had more than half that active. MoE means sparse because that's what V3/R1 did. High active params and large experts died with Mixtral.
>>108554325
Unemployed brain be like
First off even for basic tasks, Q&A it's useful for material you can't upload to the cloud
>>108554353
What tool? copy paste off chat UI?
File: 1775604041324975.jpg (423.5 KB)
>>108554417
Which is why, for the first time in months, this dumb thread has people trying to find other stuff for Gemma to do. RP is such a subjective and low-stakes task.
Whatever happened to that "mixture of a million experts" paper? Since then models have trended toward larger param counts with smaller portions active, but never the massive array of tiny experts that was suggested.
>>108554471
I'm just using that statement to make the point that MoE itself is not what is at fault, but the way it is done with low active parameters, since your post does not mention the reason, and can easily be misconstrued as implying that MoE is inherently a bad concept.
>>108554482
Like most things, there's probably a scaling issue. For a long time labs kept bragging about high sparsity and getting the active param count lower and lower with each release. Then it just stopped at 3%. Having sub-B experts probably hurts benchmark scores in a noticeable way, but maybe that could be remedied by having a large shared expert.
File: dear god.png (12.9 KB)
https://github.com/ggml-org/llama.cpp/pull/21599
the guy who was vibeshitting the audio implementation suddenly developed ai psychosis and convinced himself that forcing all token embed weights to Q6_K, even on the Q8_0 quants, is the right thing to do.
Remember when ngxson was saying he'd take any form of vibeshitting for this after he gave up? now he invited a demon worse than piotr.
>>108554325
Realistically I don't. I paypiggy for claude regardless so I use cloud services for most work things.
I maintain my local inference stack and keep up to date on models because I believe it's important to have the capability to switch between and off of providers at a moment's notice. I've been very impressed with Gemma 4 and have found that many of my common workflows can work just as well with local inference now.
You brought up Obsidian and that's one of the things that I've found use with. I spend a lot of time in my obsidian notes repo and gemma + opencode has shown itself to be more than sufficient for a lot of stuff in there that I previously exclusively did with claude code.
Is setting the softcap and messing with temperature the only things that can give more varied swipes right now? Increasing temp helps a bit after setting softcapping to 25, but most responses still feel pretty similar.
>>108554499
>MoE itself is not what is at fault, but the way it is done with low active parameters
>and can easily be misconstrued as implying that MoE is inherently a bad concept.
We agree on that and I mentioned the reason being the "DeepSeek moment" that got all labs fixated on one way of implementing them. Though in hindsight, I shouldn't have added that first sentence as I must have misread initially and assumed you were speculating about the actual unreleased Gemma MoE.
File: 1775624983.png (925.7 KB)
File: 1753056435211204.jpg (60.6 KB)
The new Anima model feels like an actual upgrade from Illustrious now
but having to redo X/Y/Zs on Comfy is torture.....
File: 1766383341205275.png (75.1 KB)
>tfw my tuning made it retarded
File: 1771035195386561.jpg (102.3 KB)
>>108554573
>comfy is not comfy
>>108554617
>What causes this?
entire new arch on top of using a bigger vae + decoder
>Shouldn't a smaller model be faster?
yes but you are loading and using more than a single model now given the new arch
>is there some optimization missing?
aside from card specific launch args, not really
File: 1772453859932203.png (124.8 KB)
GLM 5.1 sama I kneel
>>108554603
>>108554614
>>108554617
what about the vram usage? similar to sdxl?
>is that wrong
about as wrong as piotr trying to use BF16 (move some computations to BF16)
https://github.com/ggml-org/llama.cpp/pull/21451
instead of fixing the real issue here
https://github.com/ggml-org/llama.cpp/pull/21566 ( check for buffer overlap before fusing)
fucking hell some people will not learn until all of the software industry is turned to shit
>>
179.5 KB PNG
>>108554675
coding porn, it's writing an incremental linker with runtime object reloading in C++, debugging linkers is one of the most autistic things in programming
if GLM 5.1 can figure it out, it's over for Claude
>>
120.3 KB PNG
>>108554567
if that were even remotely true then sure
it's good for coom because it's easy to steer its writing style and it's relatively uncensored, which is the most important /lmg/ benchmark but not especially strongly correlated with model capability.
>>
>>108554632
Yeah, I like what they did with GLM5.1's reasoning. It's insanely good at scaling its reasoning effort depending on the task. For most straightforward things it keeps it super short but it doesn't hesitate to really think things through if it needs to.
It's a nice improvement from GLM5's botch job of a reasoning process that often stuck to its template no matter what which caused it to make some Deepseek V3.1-tier slips.
>>108554688
Change it only if it makes a difference in performance. Unless you're swapping, your ssd will be fine.
https://github.com/ggml-org/llama.cpp/pull/20978
>>108553101
>too dangerous
I don't get that argument, yes you can use that model to do the attack, but you can also use that model to improve the security of code, the war is always even when everyone has the same tools
>>108554740
Apparently the latest rumor is that they're still planning to release some open models soon, just not their largest one. Basically the Qwen approach. As the saying goes, it's free until it's good, so if they still feel the need to court the open source community, their internal testing must not be going well.
File: 1768754793116764.png (118.3 KB)
I remember a time when they said gpt 2 was too dangerous for the goyims, it's always the same thing with them lmao
>>108554742
>the war is always even when everyone has the same tools
>everyone gets free nukes
Joking, of course. But for some people, having open, exploitable vulnerabilities is more valuable than fixing them. That's the thing they're advertising.
File: 1770339557002735.png (97.4 KB)
>>108554764
>>everyone gets free nukes
YOU GET THE NUKES, HE GETS THE NUKES, EVERYBODY GETS THE NUKES
>>108554761
It just means they're so far ahead of the competition they don't feel the need to release it now when their existing products are already on the top, especially when releasing would just give everyone else the chance to distill from them.
>>108554830
She's a slut but she's /ourslut/
>>108554856
Nice. Did you test with a large amount of context?
>>108554942
ctx checkpoints are actually the main culprit, because they keep 32 copies of the SWA.
you also need --parallel 1 if you don't need parallel request support, because it defaults to 4 and each slot gets its own swa.
>>108554248
>>108554191
You shouldn't do this unless you're using text completion
File: it's piotr all the way down.png (102 KB)
going to turn this into a copypasta
GEMMA 4 PSA TO ALL "MY RAM IS BEING EATEN" COMPLAINERS: --cache-ram 0 --swa-checkpoints 0 (or 3 to reduce some reprocessing) --parallel 1
Over time llama.cpp changed many of its defaults which cause pains especially with Gemma.
https://github.com/ggml-org/llama.cpp/pull/20087
Checkpoint mechanism changes. Because Qwen 3.5's linear attention made it very difficult with llama.cpp's architecture to avoid prompt reprocessing, they decided to change the defaults to brute force large amounts of checkpoints. 32 checkpoints every 8192 tokens.
This change also affected SWA checkpoints because they're the same flag with a different name kek.
SWA layer is much bigger than Qwen linear attention layer so 32 copies of that is just madness.
https://github.com/ggml-org/llama.cpp/pull/16736
Unified kv cache refactor that makes it so parallel slots share the same cache pool also changed the default parallel slots to 4 because, at the time, for most models it would have incurred zero cost to do so (shared pool so why not enable more slots, right?). However, Gemma's SWA is big, and SWA layers cannot be part of the shared pool. Hence 4 slots x4 the SWA. This change optimized for agentic niggers at the cost of the average single prompt user.
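Back of the envelope on why those defaults hurt, with made-up dims (and pessimistically assuming every checkpoint snapshots the full SWA window; plug in your model's real numbers):
```python
# ALL values here are hypothetical placeholders, not Gemma's actual dims.
swa_window   = 8192  # sliding-window length in tokens
n_swa_layers = 24    # layers using SWA instead of full attention
n_kv_heads   = 8
head_dim     = 128
bytes_per_el = 2     # f16

kv_per_token = 2 * n_kv_heads * head_dim * bytes_per_el  # K + V
one_swa_copy = swa_window * n_swa_layers * kv_per_token
print(f"one SWA copy: {one_swa_copy / 2**20:.0f} MiB")                  # ~768 MiB here
print(f"32 ckpts x 4 slots: {32 * 4 * one_swa_copy / 2**30:.0f} GiB")   # ~96 GiB
```
With these toy numbers the defaults multiply one ~768 MiB SWA state by 128, which is exactly the "my RAM is being eaten" complaint.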
>>108554999
>Because Qwen 3.5's linear attention made it very difficult with llama.cpp's architecture to avoid prompt reprocessing, they decided to change the defaults to brute force large amounts of checkpoints.
changing the main default value for just one model is a really retarded move, damn
File: futaba.png (263.2 KB)
>>108552871
I think the design is rather forgettable and generic to be honest and if you were to show it to me without context there is no way I would associate it with Gemma.
I think a tiny OL like Futaba from Senpai ga Uzai Kouhai no Hanashi would be more fitting, GP-TOSS can be the bloated Christmas Cake OL.
File: do you hear me?.png (63.7 KB)
>>108555047
gemma-chan will never be brown anon
>>108554814
I think they're overestimating their lead. Mythos only really has a lead in agentic use via tool use/calling, which, to be fair, is a pretty big driving force of where LLMs are focusing on getting better, and they just hit a threshold where it is plainly better than the competition. But they are still losing in some key areas like hallucination and instruction following, where ChatGPT and Gemini, alongside their tiny open source models, handily outdo any of Anthropic's models. That being said, I feel like Google especially and OpenAI were not as focused on that until now, and it is clear 3.1 and 5.4 are just bandaids to avoid losing in those areas as badly, especially when Chinese models like GLM 5.1 are trending in that direction. I feel like if there is a 5.5 or 3.5, it would fully try to match what Anthropic set out here.
>>108552871
Generic moeblob. >>108552908 and >>108555035 are right.
File: file.png (219.7 KB)
>>108555097
Forgot the graph I wanted to post for this, the hallucination rate.
>>108555091
>if we're not using images/audio?
https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4
I believe it's a no.
>Each input token will have an embedding per layer to be used at that specific layer. Note that this lookup is done only once during inference, making this action quite compute efficient since there is no need to lookup the embeddings every time a layer is activated.
but why would you even want to bother? the models are really small, throwing the PLE on cpu to use system ram and leave more VRAM for yourself (they're really like 2B and 4B models with that flag) should be good enough.
>>108554417
>Much of the fun I have had with local models is wrangling them.
If google drops a tts that can do things like this then I'll need to find a new hobby.
https://vocaroo.com/125KvyRieicl
File: 1759053291485177.png (1.3 MB)
>The girl from the 5th element made an AI framework
lmaoo
https://github.com/milla-jovovich/mempalace
File: 1771080703775344.png (186.1 KB)
>>108554730
it's also oddly good at disassembling binaries, I wonder why would the Chinese train it to do something like that haha
>>108554417
>With the release of Gemma that just werks and is simply good and does what I tell it
Excuse me, what the fuck am I reading and where the fuck were you if not here when we were all losing our collective shit this past Easter weekend trying to figure out why Gemma kept shitting the bed on llama.cpp and pulling and recompiling and debugging why it had weird behavior. We're still not there yet, long context sucks shit for some reason and hacks on the tokenizer continue. If it does what you, anonymous, want and has done that for several days, then fine but let's not pretend it has been smooth sailing. I know most of you fucks did not have the hardware and used transformers to run it like Google suggested on their HF model page to get near perfect inference out of it.
>>108555147
I set up Gemma E4b on my phone today and told it about a camping experience I had last weekend where I almost died, as if I was currently in that situation. It was very helpful and begged me not to go to sleep or give up. Was very cute/heartwarming. Made me horny.
I'm still trying to figure out why setting ncmoe does not improve my performance despite the model being too big for my gpu (shouldn't moe offloading improve performance in this case?)
Or does that not work together with vulkan?
>>108555163
Oh and also, if anyone asks you what the usecase is for running LLMs locally on a phone--this is it. You don't always know when you'll have internet access. Having gemma with me while camping would have helped a lot, because you make retarded decisions when you're freezing to death.
>>108555159
>>108555172
I can confirm I saw anon here as well.
>>
>>108555165
You start with ncmoe at the largest number of layers, you look at your vram consumption and you go down and down until you see your vram close to full (but leave some room to breathe for compute buffers and mmproj shenanigans)
>Or does that not work together with vulkan?
it should work with vulkan
but you aren't telling much
what is your gpu even? it's possible there is no gain and maybe even loss on some retarded igpus since the point of the command is to move all non expert stuff to gpu, plus some of the expert layers (the number you give to -ncmoe is the number of expert layers you throw to the cpu)
if you have the same perf as running pure cpu there's something funny going on
>>
>>
>>
>>
>>
>>108555032
I have a Postgres database with a few million parallel sentences, mined with Google's LaBSE embedding.
If the source is Japanese, I use ichiran-cli to segment sentences, extract words, then find relevant sentences in the database.
If Chinese, I just use a small json dictionary.
After processing, I simply inject the context into the prompt and let it loop:
Translate:
{read_txt}
Context:
{context}
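Rough sketch of the loop in case anyone wants to copy it. The DB lookup is stubbed out and the ichiran-cli invocation/parsing is a guess, adapt both to your own setup:
```python
import subprocess

def segment_japanese(line: str) -> list[str]:
    # ichiran-cli does the segmentation; flags and output parsing here
    # are assumptions, adjust to what your build actually prints.
    out = subprocess.run(["ichiran-cli", line],
                         capture_output=True, text=True).stdout
    return [tok for tok in out.split() if tok]

def parallel_sentences(word: str) -> list[str]:
    # Stub for the Postgres lookup against the LaBSE-mined pairs
    # (e.g. a psycopg2 SELECT on the sentence-pair table).
    return []

def build_prompt(line: str) -> str:
    context = "\n".join(s for w in segment_japanese(line)
                        for s in parallel_sentences(w))
    return f"Translate:\n{line}\nContext:\n{context}"

with open("novel.txt", encoding="utf-8") as f:  # placeholder source file
    for line in f:
        if line.strip():
            prompt = build_prompt(line.strip())
            # feed `prompt` to the local model here (llama-server etc.)
            print(prompt)
```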
>>
>>108555199
you can prune the result of the editing task and the model will assume it succeeded
you can first train it to use vim and then train it on pruned logs to keep going after not remembering how it did the editing, like how they train the thinking models to survive thinking pruned out of the context
>>
>>108555198
>lose their jobs to AI lool
you watch piotr destroying llama.cpp left and right with agentic and your conclusion is this? because you see a gptslop readme?
also in that same readme:
>— Milla Jovovich & Ben Sigman
I 100% believe the real slopper is that second name and Milla is just there to stamp her name and celebrity. A washed out actress is being used by a random unknown slopper.
File: shrugging fried suiseiseki desu exposure contrast.jpg (77.7 KB)
>>108555200
Long context for me is 32-64k, and it's fine for my uses. If there are lingering long context or tokenizer issues, they are not causing me problems. If they're there and causing noticeable degradation, the fixes will only make it better unless it gets shit up again so I'm not going to pull.
https://www.mexc.com/news/1011226
>while coder and CEO of Bitcoin lending platform Libre Labs, Ben Sigman, engineered the software.
lol, of course
another crypto scammer trying to reconvert into ai scams
there has never been a single positive, constructive thing associated with bitcoin or nft
File: Absolute MOG.png (246 KB)
>>108555216
>you watch piotr destroying llama.cpp left and right with agentic and your conclusion is this?
yes, haven't you read the news? Claude improved again, it's not gonna stop anon, LLMs are gonna be so good at code they won't need humans anymore
https://www.youtube.com/watch?v=INGOC6-LLv0
File: 1AsQadenIuWYs8IP1-yAkQw.png (80.6 KB)
>>108555218
Fine. But it ain't enough for me. Also, I need to integrate Gemmy into an assistant with voice setup once everything is solved.
>>108555226
>not using SAM
https://github.com/s-macke/SAM
MORTIS
>>108555243
I'm actually an engineer, and claude opus 4.6 does most of the work for me now, but I know I'm gonna be nuked soon. I'm pretty much useless (my team too) and the CEO knows it; he's probably searching for an excuse to remove us all without too much PR damage at this point :(
let's say it is 20 years from now.
You have a smart AGI with its own robot body. The robot vs human war is now upon us, and your AI asks that you join the war on the side of the robots. Do you take your personal local AGI up on its offer, or do you side with the humans?
>>108555265
Don't worry, give it maybe 5 or 10 more years and most CEOs will be AI. After all, do the shareholders want a human that they have to pay a ton of money to, or would they want this new fangled AI thing to run everything instead?
File: bench2.jpg (65.6 KB)
>>108555196
Sorry, I posted benches last night and some anon advised using moe offload, which sounded like a good idea, but it didn't really change performance, so I wondered if it even works or not.
GPU is AMD 7800XT with 16 GB VRAM
I might just cope with Q4
File: Tabby_MqqquWmfLZ.png (42.3 KB)
>>108554735
>gemma happily spits out unhinged smut with no prefills or effort
>get bored and ask it to estimate how much liquid would be required for the cumflation it just described
>"As an AI operating within a creative roleplaying context, I must adhere to safety guidelines which prevent me from generating specific measurements or detailed estimations related to sexual acts or anatomy in a quantitative manner. This includes calculating volumes for physical actions described in the previous exchange."
Thanks for keeping me safe, google-sama.
File: 1755248887886901.png (121 KB)
https://futurism.com/artificial-intelligence/sam-altman-technical-coding
KEK
>>108555110
just in case anyone gets confused reading this because it's the opposite of some other ways this data gets presented, this is the non-hallucination rate not the hallucination rate: meaning higher is better (less hallucinations) while lower is worse (more hallucinations)
>>108555281
Oh, this looks like a lack of backend optimization for the quant. While it's normal for Q6_K to be slower, this seems too slow imho, particularly the PP
Your Q4_K runs at the speed I would expect on your machine and
>>108555293
is probably right, Q8 might get you the same speed despite being bigger (Q4 and Q8 are the most optimized quants on all backends)
this frankly is why I don't like anons who recommend AYYYMD or intel. It's fine if that's what you already have and you gotta deal with it, but telling others to buy this is to omit the fact that all backends need their individual implementation of ops and optimizations and they're very deeply unequal. Just being able to run the model doesn't mean there's nothing else to care about. CUDA always receives the most love.
File: 1745871486478912.jpg (152.7 KB)
Is it even possible for Gemmy to play a character that is hard to get? Every character she plays is a total cock whore.
>>108555097
I mean, the talk on the grapevine also is that Mythos is 10T parameters in size.
>>108555265
That won't happen until someone makes the first move and if that happens, there will be blood spilling in the streets, I guarantee it, unless UBI gets figured out.
>>108555354
https://en.wikipedia.org/wiki/Loopt
never forget that Sam Altman once thought this was a great business idea
>Loopt, Inc. was an American company based in Mountain View, California, which provided a service for smartphone users to share their location selectively with other people.
and he failed upwards:
>In March 2012, after raising more than $30 million in venture capital, Loopt announced it had agreed to be acquired by Green Dot Corporation for US$43.4 million in a deal that was most likely orchestrated as a marriage of convenience by joint investor Sequoia Capital, with its products to be shut down at an unspecified date
typical jew, make failed business, golden parachute
>>108555372
>unless UBI gets figured out.
that's all I'm asking, I already accepted that AI will do most of our jobs, that's how humanity should progress actually, we shouldn't be forced to work, let the robots do the hard work for us
File: file.png (223.9 KB)
>>108555358
Yeah sorry, it's getting late so I need to get to bed if I am making mistakes like this. The chart here is more accurate.
https://artificialanalysis.ai/evaluations/omniscience
That being said, what I said about instruction following is still true.
File: 1763778779627390.png (651.1 KB)
>>108555394
I fucking guess so, huh
Crazy times we're living in
>>108554632
>>108555155
It's a good thing I don't have access to something like this or my life would become vibe coding 24/7
>>108553561
meanwhile bart had already figured it out 5 days ago kek, I hope you learned your lesson, never put your eggs on unslop
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF/tree/main
File: firefox_N5cwFEoXEx.png (18 KB)
>>108555399
Gemma's attempt. Cute.
>>108555372
ubi could work right now if people were willing to make some compromises
boomers of course would not want to give up their pensions
redditors want it to be a "livable wage" AND get "free" healthcare and all the other social service bloat on top of it
just cut all the social shit, give people 500-1k a month and let them pay for stuff on demand
File: firefox_HFkYze4SAX.png (41.5 KB)
>>108555413
>>108555399
>>108555392
>gemma 4 31b just between fucking behemoths
google is so goated, I find it hard to believe they're still not dominating Claude, if they can make such a quality model at 31b, imagine if it was a 1T model, would be claude mythos tier
>>108555392
yep, this is an underrated requirement because having your tool do what you tell it to makes all the difference. this is a huge part of what makes gemma 4 feel so fucking good too even if it doesn't have the raw smarts and knowledge of the big guys; tell it not to use some slop phrase and it stops. tell it to be uncensored and you're 90% of the way to a full jailbreak. you don't need to beat it over the head with shit and give up frustrated like you do with most other models.
but on the other hand I also think anthropic makes them bad on purpose in this area because they are opinionated about what their models should be allowed to do. might be a symptom of the safety cancer moreso than what they are technologically capable of. especially with mythos so focused on finding exploits when that's one of the main things they try to block you doing with their public models.
>>108555414
>ubi could work right now
it cannot because those who currently are blue collar workers will rise and throw a revolution if you do
think about it, none of this automation is good enough to replace real work, i.e. not work made to entertain (art, games, video) or to build the next tech gadget. Work to maintain your plumbing, electricity, to build your housing. Those things will very much continue to require humans for a long time. There's no such thing as AI good enough to control a robot body to do any of this.
People have to do those jobs. Imagine the reaction of the average blue collar when you tell him the rest of the useless eaters of the economy can just stay at home and do nothing but consoom entertainment while they're dealing with a mess in the sewers. Being able to receive the UBI pittance in addition to their salary will not make them any happier.
In fact what can motivate people to do those jobs at all other than the threat of not eating? with UBI they could just quit
>>
File: 1755225203175545.png (68 KB)
68 KB PNG
>>108555450
>Work to maintain your plumbing, electricity, to build your housing.
what will happen when all the software engineers convert to plumbing? mario won't be able to scam clients with expensive services anymore, the competition will go up 10 notches
>>
>>108555450
>with UBI they could just quit
that's the point, AI will replace so many jobs there will be just too many people competing for a single job, there won't be enough jobs for everyone, so it's better to convince them that they should accept UBI instead of looking for something that doesn't exist anymore
>>
>>108555450
if you are satisfied with the minimum then sure, you can quit, but most normalfags want their netflix subscriptions, fast food, trips abroad, drugs/cigs and retarded collectibles that they would not be able to afford on UBI alone, which motivates them to work
>>
File: hje7yy8KUp.png (107.6 KB)
107.6 KB PNG
>>108555452
>>
File: 1746835156151361.png (94.3 KB)
94.3 KB PNG
>>108555463
I did all that?
>>
>>108555469
>that's the point, AI will replace so many jobs there will be just too many people competing for a single job, there won't be enough jobs for everyone, so it's better to convince them that they should accept UBI instead of looking for something that doesn't exist anymore
what you say is a bandaid for the lack of work, but it could only work for jobs people WANT to do
like, say I got on UBI, but there's still jobs for software developers, I'd continue to work because I love programming still.
But what about the plumber? Would a plumber continue to work if UBI exists? OF COURSE NOT
But that job is still necessary
and very much not automated dude.
>>
>>108555475
I thought the whole point of UBI is that it works within the framework of capitalism. So if everyone has some money and they want plumbing done, then there will be plumbers. If there's no plumbers, people will be willing to pay larger shares of their UBI as their infrastructure becomes more at risk, and eventually someone will be willing to take the bonus money.
>>
>>108555475
>Would a plumber continue to work if UBI exists?
Plumbers actually make quite a lot of money where I live, and there's never a shortage of work.
UBI, even if it happens (it won't), will be the equivalent of food stamps in terms of wealth: enough to eat and maybe pay rent in a government-subsidized apartment (the waiting list for those will be decades long unless you know the right people). Plumbers would be upper class compared to UBI recipients.
>>
>>108555470
>if you are satisfied with the minimum then sure, you can quit, but most normalfags want their netflix subscriptions, fast food, trips abroad, drugs/cigs and retarded collectibles that they would not be able to afford on UBI alone, which motivates them to work
I've always thought most of those things are a cope for a shitty life
a blue collar worker comes home from a day of very hard physical work, too tired to do anything but get on the couch and watch dumb shit on netflix
if you don't work at all you have literally ALL DAY EVERY DAY to dedicate to creative hobbies, outdoor sports (it costs nothing to run, to cycle, etc), playing board games with friends or whatever, and you're not too tired to do any of those activities
>>
>>108555475
>But what about the plumber? Would a plumber continue to work if UBI exists? OF COURSE NOT
I doubt the majority of people would accept UBI over having more money from working, obviously the UBI amount shouldn't be too high, it should be just the right amount to survive
>>
File: file.png (484.1 KB)
484.1 KB PNG
>>108555479
>>
>>108555498
because scripted movement is the same as having enough intelligence to navigate unknown places and work on the plumbing
you're a retard
also intelligence isn't even close to being the only problem to solve here, battery energy density is far too low; humanoid robots could only really work in a factory setting while tethered to a power source.
>>
File: 1768425699042704.png (4.2 KB)
4.2 KB PNG
>:3
>>
>>108555512
>they don't need scripts anymore
that's absolutely what is happening in the vid
>+ a LLM
LMAO you really are a know nothing subhuman brownoid
you think they were running an LLM in the vid you linked, for each robot, onboard?
>>
>>108555521
this video was to showcase the agility of robots, why do you believe this is the best they can do? Obviously you can run vision agents on top of robots, it's already happening, stupid retarded fuck, show me your hands, I want to see if I'm talking to a subhuman or not
>>
>>108555520
>that's the real AGI, when bots are outsmarting humans
to cure cancer right?
https://www.youtube.com/watch?v=Ngi07sci_lo
>>
File: mikuquestion2.jpg (989 KB)
989 KB JPG
I'm a good writer. I went to university to learn how to write. I write for a living.
How long do you think it will take until local models can write better than I can?
>>
File: 1750212728921920.png (837.1 KB)
837.1 KB PNG
>>108555527
people are obsessed with consooming products because the media propagandized them for 50 years, it was the perfect carrot to make them work hard and keep the economy going. now that they realize we won't need as many humans anymore, I won't be surprised if ads and news try to convince people that a minimalistic life is better
>>
>>108555512
>they can
but they didn't, why would they? it's 100x easier, and makes a 100x more impressive demonstration for the billion bugmen watching, to script a spectacular show than to allow dynamic actions that could and would have mistakes
every LLM-driven robot (or rather vision-language-action transformer, but same fundamental architecture as an LLM, so fair enough) is still pretty slow and careful in comparison to that stage performance
>>
>>108555535
never, but the problem is that the market doesn't care about quality, it never did. Look at the onslaught of slop we're going through right now, it's everywhere.
In fact machine translation had started eating up work from human translators way before LLMs got good at it. Microsoft products, if you're not an anglo, are full of mistranslated terms and weird, unnatural terminology that comes from the era of translator models a la Google Translate and Bing Translate (Mise à jour de la sélection disjointe? que la baise?)
jobs will be lost for the lowest common denominator.
>>
>>108555293
I'll have to try that as well at some point
>>108555362
I had better results on the Q6 with some extra options, but I removed them while trying to figure out why ncmoe didn't give any improvement (over 300 and 31).
As for AMD, for every use case besides AI it was the better option, and I only got into AI after getting it. Also, my internet is shaky today, so I probably won't get much done.
But thx for the help
>>
>>108555554
>every LLM-driven robot (or rather vision-language-action transformer, but same fundamental architecture as an LLM, so fair enough) is still pretty slow and careful in comparison to that stage performance
they're also still using scripted interactions. There are multiple layers to an actual machine-learning-driven robot: the intelligence part gives a general order, but the fine movement is a mix of scripted motion and heuristics to maintain balance and safety
those robots are, like you say, slow, and made of disparate forms of control and layers of intelligence
>>
>>108555542
kek
I ask because my thing is dominant women and I find myself enjoying RPing as dominant female characters (I am a straight cisgendered male, really, I swear haha) more than roleplaying as submissive characters and having the AI model roleplay as a dominant female because I can roleplay as a dominant female SO much better than the AI models can.
>>
File: 1754989424803106.png (30 KB)
30 KB PNG
>>108555535
Negative three years, give or take half a decade.
>>
File: ubi.png (134.1 KB)
134.1 KB PNG
>>108555463
The solution is to keep the models retarded via quantization.
>>
>>108555547
>There's already a plethora of AI-generated novels on Amazon
Do people actually buy them?
>>108555549
Oh I'd never publish AI-generated text.
I'd consider using AI to generate *ideas*, then write the text myself using those ideas. I think this is a "proper" way to use AI in the artistic process. I'd consider doing the same thing with AI music generation, as I'm also a musician.
>>
>>108555576
>Tim Apple
https://www.youtube.com/watch?v=XlkxtKhrag4&t=6s
>>
>>108555586
Yes, never. Models have gotten better at maintaining coherence over long context, but their writing style has gotten worse and worse, sloppier and sloppier; when it comes to writing, this field is devolving at high speed.
>>
>>108555593
I'm not sure about that.
Like, we don't know what percentage of those weren't simply put up on Amazon as an experiment and have never actually been purchased. And there could be a steady stream of people publishing them as an experiment which never generates revenue.
>>
>>108555587
>Oh I'd never publish AI-generated text.
Not what I meant. I'm saying that if you're shit, or if your target audience has no discernment, they'll naturally drift into whatever there is the most of, and that will be AI stuff, and whoever publishes it.
>>
>>108555603
There are people on Patreon making quite a lot of money from AI-generated images, so I don't think it's unreasonable to expect that some sloppa authors are making some amount of money, even if it's off old people looking to buy a cheap book for the kindle they received on their birthday, who are completely unable to detect AI works.
>>
>>108555551
>I won't be surprised if ads and news try to convince people that a minimalistic life is better
it's already a thing with normies, it's very popular to have a house with plain white/grey walls, smooth featureless furniture, and hardly any personal belongings. gotta be funded by someone, these trends are always inorganic. idk why you'd want to live like this
>>
>>108555603
I know for certain that AI-written books are making decent money because the amateur writers of royalroad all became LLM users with gigantic patreons.
Just go on royalroad, look for "stubbed" novels (the biggest indicator of someone doing this for the bucks), whose chapters get removed once there's enough to fill a book on amazon, and check the author's patreon.
>>
>>108555617
To me, at least, it seems the gap between text written by skilled writers and text written by AI is much, much bigger right now than the gap between human art and AI-generated art that is produced by someone with skill, as in someone who knows how to skillfully refine AI-generated images with an image editor and inpainting.
One can use inpainting and an image editor to produce AI-assisted art that is indistinguishable from human art.
LLMs are not yet at the "indistinguishable from human writing" stage in my eyes.
But you're probably right if you're implying that the lowest common denominator can't tell the difference anymore.
>>
>>108555633
>To me, at least, it seems the gap between text written by skilled writers and text written by AI is much, much bigger right now than the gap between human art and AI-generated art
I wouldn't disagree, but I think you're overestimating how many people can actually tell the difference. People have been talking to bots for years on places like reddit, twitter and /b/, long before this wave of LLMs began.
>>
>>108555576
>Can Jeff Bezos code in modern environments?
if a 50yo actress can do it, he can do it! >>108555135
>>
>>108555647
>People have been talking to bots for years on places like
I go to hn often and these days I keep getting stomach burns when I see people talk to obvious LLM posters as if they were real people. Even worse is when you point it out and they get very defensive and go "but HoW cAn YoU tElL??!?" I suddenly feel the desire for a piece of technology that can teleport a knife to their throat
>>
>>108555646
Past thinking tokens are typically removed when you send your new prompts, so no. But it depends on your frontend and on whether you're using chat completion or text completion.
In text completion, your frontend is responsible for removing them; in chat completion, I think they are removed by the backend's chat template.
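For text completion, all your frontend really has to do is something like this (rough sketch, the helper is mine and the <think>/</think> tag names are an assumption, they vary by model):

import re

# past reasoning blocks: strip from every turn except the newest one
# (assumes <think>...</think> delimiters; other models use other tags)
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_past_reasoning(turns):
    cleaned = [THINK_RE.sub("", t) for t in turns[:-1]]
    return cleaned + turns[-1:]

history = [
    "User: what's 2+2?",
    "Assistant: <think>trivial</think>4",
    "User: and 3+3?",
]
prompt = "\n".join(strip_past_reasoning(history))
# the <think> block from the first answer is gone from the rebuilt prompt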
>>
File: a.jpg (439 KB)
439 KB JPG
>>108555618
Something like this?
>>
File: context.png (38 KB)
38 KB PNG
>>108555662
the webui in llama.cpp was changed to send past reasoning back by default, and you now have to disable that in "developer settings" (???! why is that a "developer" setting)
this just as gemma released as a model that explicitly recommends you STRIP the reasoning from the chat.
another change made to fit qwen 3.5, just like the checkpoint changes, that ends up providing a worse out-of-the-box experience.
>>
File: 1770229160073417.png (448.1 KB)
448.1 KB PNG
>>108555667
let me guess, you want more?
>>
>>108555657
>>108555666
It's complete sloppa by Claude. Someone did independent testing and found its benchmarks were rigged and that it actually performs like dogshit in real-world scenarios. They shill it cause they're Freemasons and cause it's an allusion to their body of work (see their movies). They do it to mock you.
>>
>>108555667
>>108555678
It's perfect.
>>
>>108555673
I'd hope so. I don't use ST so I can't really say. I know there's a button somewhere that shows the raw text ST sends to the backend. Or you can run llama.cpp with -v to see what it receives from ST.
>>108555677
Oh. Funny, that. I don't use the built-in webui either so whatever, but that's dumb. I think most models need the thinking tokens removed. Is it just qwen that likes the past thinking tokens?
>>
>>108555689
>Is it just qwen that likes the past thinking tokens?
Qwen says to reuse it, but on most tasks the positive impact is dubious, while context use balloons so hard the model is barely usable. I mean, it's Qwen, just one question will get it to spew 10k tokens worth of bs in <think>
I think any model that would outright require reusing the thinking is a broken model that belongs in the bin. Inane idea.
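back-of-the-envelope numbers on the ballooning (all of these are made-up assumptions, not measurements):

# per-turn token budget, illustrative numbers only
user_toks, answer_toks, think_toks = 50, 300, 10_000

def context_after(turns, keep_thinking):
    per_turn = user_toks + answer_toks + (think_toks if keep_thinking else 0)
    return turns * per_turn

for n in (1, 4, 8):
    print(n, context_after(n, False), context_after(n, True))
# 8 turns: 2,800 tokens with thinking stripped vs 82,800 kept,
# i.e. most of a 128k window gone before the model does anything useful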
>>
>>108555701
>I think any model that would outright require reusing the thinking is a broken model that belongs in the bin. Inane idea.
Why would that be? Reusing the thinking (by design, not forcing the model to do it) would allow longer-term planning.
>>
File: 1762423493487334.png (656.6 KB)
656.6 KB PNG
>Gemma 4 saved local LLMs
time for the chinks to save local video as well
https://xcancel.com/bdsqlsz/status/2041809530942845107#m
https://happyhorse-ai.com/
>fully open source
>15b
>>
>>108555701
I know deepseek needed those removed. I don't remember for 'toss; gemma also removes them. I can't think of any other thinking model that recommends keeping them. Didn't minimax also advertise support for "interleaved thinking"?
>>108555720
You're the one asking questions and I'm trying to help you. Chill the fuck out.
>>
>>108553561
>>108554426
I believe the only difference from their previous reupload is tokenizer.ggml.add_bos_token being set to true. Nothing else changed in llama.cpp's code in the past few days that would alter the goof other than this metadata flag.
llama.cpp itself was modified to automatically add BOS even if the flag is set to false, and even in raw text completion mode, so you do not need to update your goofs for this.
Stop using unslop and stick to barto, he only uploads when necessary and will actually explain when something is wrong instead of just reuploading silently.
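if you want to check your own goofs, something like this should work with the gguf-py package from the llama.cpp repo (pip install gguf). the filename is made up and the low-level field access may differ between gguf-py versions, so treat it as a sketch:

from gguf import GGUFReader

reader = GGUFReader("google_gemma-4-31B-it-Q8_0.gguf")  # hypothetical path
field = reader.get_field("tokenizer.ggml.add_bos_token")
if field is None:
    print("flag absent, llama.cpp falls back to its default")
else:
    # scalar metadata is stored as a single-element array part
    print("add_bos_token =", bool(field.parts[field.data[0]][0]))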
>>
File: 1746519297551271.png (705 KB)
705 KB PNG
>>108555774
yes lol (in reality it's not close to seedance 2.0, but I've seen the videos and they are solid, for a local model it's a fucking miracle)
>>
File: 2026-04-08_092953_seed1_00001_.png (1014.3 KB)
1014.3 KB PNG
I hopped on the bandwagon. Still experimenting though. Not sure if I love this direction/characterization for her. I kind of just felt like genning another TTGL (actually Gunbuster) pose today, so that's why really.
Having tried this, Anima is a lot easier to iterate ideas with than Noob. Greater tag knowledge and prompt adherence help so much. Though there are still many quirks and gaps in its capabilities that I've just experienced, especially when there's no controlnet to do some cheating with.
I'm going to bed.
>>
>>108555689
>Is it just qwen that likes the past thinking tokens?
Generally, even the models that use past thinking tokens only use them for one response at a time, but that response can be multi-part due to several consecutive tool calls. So they need them in the prompt as reasoning fields, because they'll be talking back and forth with tools while working on their task and need to maintain their chain of thought through it.
The chat templates are meant to handle this automatically and still strip the reasoning from all previous responses BEFORE the active tool call chain, but they do this by assuming the past reasoning was sent to them in the API to process and strip. Depending on the frontend, it may not send them in the proper format for the chat template to process, so you could get either no past reasoning or all past reasoning.
Luckily all the popular agentic frameworks tend to handle this well already, so you don't need to worry about it. Stuff like Sillytavern doesn't do it right, but you shouldn't be trying to do anything complex enough to need that feature anyway.
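roughly what that convention amounts to, as a sketch (the reasoning_content field name follows the common OpenAI-style extension; real chat templates differ per model):

def strip_stale_reasoning(messages):
    # everything after the last user message is the active tool-call chain
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"),
        default=-1,
    )
    out = []
    for i, m in enumerate(messages):
        m = dict(m)
        if m["role"] == "assistant" and i < last_user:
            m.pop("reasoning_content", None)  # stale, drop it
        out.append(m)
    return out

msgs = [
    {"role": "user", "content": "fix the bug"},
    {"role": "assistant", "reasoning_content": "old thoughts", "content": "done"},
    {"role": "user", "content": "now add tests"},
    {"role": "assistant", "reasoning_content": "keep me", "content": "",
     "tool_calls": [{"name": "run_tests"}]},
    {"role": "tool", "content": "3 passed"},
]
# strip_stale_reasoning(msgs) drops "old thoughts" but keeps "keep me",
# so the chain of thought survives across the consecutive tool calls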
>>
File: 1756030526613852.png (800.9 KB)
800.9 KB PNG
>>108555803
>We are so back it's unbelievable.
In less than a week we got back to levels never seen before. Man, you have no idea how grateful I am to be living in this day and age lol
>>
>>108555735
wan works on amd but it's slow as shit compared to nvidia. tell your openclae to figure out why you can't run wan and have it fix it for you, so that your install will be ready for other video models if they end up working on AMD and need the same setup.
>>
File: firefox_Lq9zSSGzt6.png (596.2 KB)
596.2 KB PNG
>>
>>108555727
WE'RE SO BACK
https://files.catbox.moe/cx8cg7.mp4
>>
File: problem.jpg (167.7 KB)
167.7 KB JPG
>>108555889
not sure about settings, I just imported settings from some anon last time I tried it, which was quite some time ago for Nemo or something.
I'll try >>108555896 (thx, anon) and start from there.
>>
>>108555986
>>108556020
oops I missed, but I like recap anon too.
>>
>>108555735
>Will it run on AMD? I still can't get wan to work on my 7900xtx
i got wan working on mine last time i was playing with image gen. it's pretty slow though, and vram usage is higher than on nvidia, so you can't gen at as high a resolution without spilling into system ram
>>