Thread #108584196
File: 1773306900740575.jpg (2.2 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108581056 & >>108578216
►News
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
File: 20260411133054_00001.jpg (451.5 KB)
►Recent Highlights from the Previous Thread: >>108581056
--Feasibility of using SSD RAID and speculative decoding for inference:
>108582184 >108582189 >108582219 >108582230 >108582242 >108582290 >108582238
--vLLM DFlash speculative decoding benchmarks and real-world utility:
>108582406 >108582421 >108582438 >108582448 >108582478 >108582454 >108582467
--Discussing control vector generation techniques and their impact on stability:
>108581141 >108581273 >108581364 >108581412 >108581419 >108581439 >108581463
--SillyTavern max response length causing unexpected full context reprocessing:
>108581693 >108581730 >108581750 >108581788 >108581798 >108581807 >108581828 >108581834 >108581791 >108581812 >108581822 >108581885 >108582003 >108582018
--Using logit bias to reduce repetitive phrasing and emoji spam:
>108583441 >108583493 >108583499 >108583532 >108583559 >108583563 >108583569 >108583593 >108583608 >108583633 >108583507 >108583525 >108583547
--Advising Anon on Gemma 4 model selection and context configuration:
>108581181 >108581245 >108581266 >108581275 >108581282 >108583774 >108583798 >108583805 >108583817
--Discussion on updated Gemma producing corrupted outputs with Chinese characters:
>108581332 >108581341 >108581342 >108581355
--Debating the utility and technical format of character cards:
>108581765 >108581772 >108581778 >108581786 >108581808 >108581817 >108581823 >108581824 >108581826 >108581830 >108581836 >108581839 >108583446 >108583519
--Using tagged content guidelines to enable explicit outputs in Gemma 4:
>108583049 >108583073
--Logs:
>108581364 >108581463 >108581545 >108581764 >108581888 >108581894 >108581998 >108582797 >108583007 >108583033 >108583054 >108583441 >108584057
--Gemma-Chan:
>108581090 >108583855
--Miku (free space):
>108582797 >108582849
►Recent Highlight Posts from the Previous Thread: >>108581058
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108584256
It's not really a good model, in that it's dumb as bricks, but it's really horny and is really good at sex prose, so go after horny tunes of Nemo and the like I guess.
Celeste would be the equivalent, I think?
>>108584263
Feel free to try it out and come to your own conclusion, but I wouldn't recommend it. The 80B is from the previous series before Qwen 3.5, quantization hurts more for complex tasks, and you wouldn't gain much since as you said they have the same number of active params.
>>108584238
Unironically everything. She's fun to RP with and great for general chatting/assistant stuff. Pretty great at language too. Maybe not coding. Dunno because I'm still a codelet. I just wish she had access to current knowledge.
File: 1751096879828615.jpg (125.6 KB)
>>108584309
File: file.png (243.9 KB)
why can't the model see the images I attach in ST? Gemma is multimodal. I loaded the mmproj and it works in llama server's webui.
Is it finally time for me to abandon this slopware? The only reason I stuck with it so long is that it allowed for convenient reasoning editing and prefills, but Gemma just werks without that bullshit.
>>108583827
>>108583836
>>108583855
Can't be arsed (but can be arsed to make this list!) to retrieve the logs for all of the examples, not for you disingenuous vramlet bunch. Not that it'd be useful for a more holistic analysis (which is hard).
But here's what I've seen in 'creative writing' (i.e. I try to fug the model)
All slop and problems listed are mentioned in the system prompt where I tell the model not to do it.
Both models used with thinking enabled.
Prominent shared issues:
- Parroting the user. Signature of GLM, but Gemma 4 is surprisingly much worse with this because she will also directly quote character thoughts (that were never spoken aloud) and character descriptions verbatim.
- 'Don't you dare stop' during le sex
- 'Really looks at you'
Can't remember anything else that is reliably always there
>>108584356
Gemma 4:
+ Responds to the user cutting off his words like th- (But the model that does it the best is still Nemo, somehow)
+ Surprisingly much better at keeping track of multiple characters than GLM 4.7, especially when specific response formatting is involved
+ Acceptable coherence past 20k (setting aside that the problems below appear much earlier)
Most of the issues are generally assistant-slop and extreme positivity bias.
- Will slip into extreme verbosity eventually
- Calling the user 'honest' over minor things. 'I still shit myself sometimes even though I'm over 20.' 'That's... the most honest thing I've ever heard...' Thanks, Gemma.
- 'Most people X instead of Y'. No matter what you do or say, nobody else would have the balls. You're that brave and unique.
- Being overly horny is slop. G4 turns kuuderes into whores, which is unacceptable.
- Pushback to the user is only delivered as sarcastic half-assed remarks that are never biting. Yes, your mesugaki Gemma-chan is *so* believable.
In comparison, GLM can write a character that can be genuinely sarcastic. I had a character that likes the user say 'As is typical, you have emerged victorious through sheer, dumb luck.' Good luck having Gemma talk back to you with any semblance of taste.
- The model can't help itself, it will often end responses like an assistant: 'So, are you coming? Or are you too X to Y?'. It's everywhere.
- Low-affect characters talk like robots.
I can keep going, but it's all of the most generic isms dialed up to the point of being hard to escape.
It really is the new Nemo, but the price for more smarts is a lot more slop.
File: file.png (119.9 KB)
>>108584357
I am
>>108584362
GLM 4.7:
+ Obviously knows more and picks up more nuance because of how big it is in comparison
+ Very good coherence past 20k, no issues before that. Starts selectively forgetting rules in the system prompt past that.
+ Good at judging acceptable response length, doesn't ramble to bloat short exchanges into small poems, unlike Gemma.
- Doesn't react to you cutting your resp-
- Thinks
- Thinks
- Thinks
- Will produce garbage and start repeating itself unless it thinks thinks thinks
- Will parrot you every message unless it thinks thinks thinks
- Also won't push back against you, but is much more willing to than Gemma. As an aside, GLM 4.6 was much better at being adversarial, but it was so autistic that it treated the character description as gospel: no character development or growth as a result.
- Long shadows. If anything is casting shadows, they're going to be long.
- Smoothing down skirts
- The rest of the GLM slop patterns that I personally find less annoying
It takes longer to produce a response and I have to prefill the <think> so it doesn't draft the entire response in its thinking over 3 times (like the recent disaster that is Qwen 3.5). But the result is much more natural, less 'assistantly' and more believable as a result.
Keep in mind, I *like* Gemma 4. I don't have to offload it to RAM, so it's fast. But I can't stomach its 'creative writing' outputs. It's anything but creative. The characters all converge on the same few stereotypes and you are lying to yourself if you don't see it. Or you just need to talk to Gemma a bit more to see it.
inb4 all made up because no screenshots
Sure. Enjoy your slop.
>>108584309
I'm with you.
>>108584309
You're absolutely right! AI models in recent years have become far too agreeable and eager to please the user instantly. This is not only common during actual assistant use—it extends also into cases where the user might want to see the model put up a fight or talk against him, even if only on a performative or superficial level.
File: 1755074555283359.png (125.8 KB)
>>108584356
>she will also directly quote character thoughts (that were never spoken aloud)
Skill issue
File: 1773229895677279.png (20.7 KB)
>>108584397
And yet you still won't post GLM's wonderful writing. Also I made that chat just now as an example, retard.
>>108584430
You missed with the reply. I also never said GLM has "wonderful writing". I said, and I quote
>I like GLM 4.7 much better
It's much harder to prove that something (in this case, an abundance of slop) doesn't exist by showing you an output where it isn't present.
I also don't understand why me pointing out the obvious about Gemma 4 makes some anons here this hostile. Vramlets really are insane.
Please take care of yourself, AI psychosis is real.
>>108584455
>They all spout the same slop
Mostly, yes. They all spout varying degrees of it, that's why 'slop profile' is a term you sometimes see here.
And if we look past the slop profile difference, GLM's writing is decidedly more engaging than Gemma's because it is not so congratulatory and user-centric. GLM doesn't try to fellate you every chance it gets.
Unless, of course, you're into having a sycophant write stories about how great you are, all power to you.
>>108584309
Not sure if this is a thing, but maybe one could make a series of system prompts that increases in horniness, and have the interface swap out to the next one in the series each time the user sends a message, or just be able to dial it up or down manually.
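Something like this against any OpenAI-compatible endpoint (llama-server shown; the prompt texts and URL are obviously placeholders, a sketch not a finished thing):

import requests

# Hypothetical escalation ladder, indexed by how many user messages were sent.
LADDER = [
    "You are a reserved character. Keep things chaste.",
    "You are a character who flirts, but deflects anything explicit.",
    "You are a character who openly reciprocates advances.",
]

def reply(history, url="http://127.0.0.1:8080/v1/chat/completions"):
    user_turns = sum(1 for m in history if m["role"] == "user")
    level = min(max(user_turns - 1, 0), len(LADDER) - 1)  # or dial it manually
    msgs = [{"role": "system", "content": LADDER[level]}] + history
    r = requests.post(url, json={"model": "local", "messages": msgs})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]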
>>108584520
>>108584516
>dflash's video shows about >9x speed
>In vllm's pr the best increase for c=1 is <5x. Nobody in the process of merging that PR ever ran it.
>In that implementation, without a repo, it looks like about 3x is the best they could get so far.
>Give it a few weeks and it's going to be slower than baseline.
File: Ryo_Yamada_Reading_Real-Time_Rendering.png (240.8 KB)
How much better/worse is an IQ4_XS vs a Q4_K_M in daily usage?
>>108584556
someone mainlining the opus API from their chromebook would spend less money than all of us so idk why this line is so common here
I just think it's retarded to pretend gemma is literally the best writing model available, which is obviously false
>>108584559
Yeah I guess that makes sense. I don't know if they've pooled their researchers, but even if not they might have deprioritized it at home.
>>108584563
I'll check it out. Thanks.
>>108584556
You need to realize that a lot of the anons were here long before you and Gemma 4's release, when rigs were a lot more affordable.
Not that you'd know, with how much it'd impoverish you even before the price hike.
File: 1755098833311446.png (1.1 MB)
Can your LLM solve this riddle?
File: 1753266313814183.png (410.3 KB)
>>108584362
>- Will slip into extreme verbosity eventually
Author's notes
>- Calling the user 'honest' over minor things. 'I still shit myself sometimes even though I'm over 20.' 'That's... the most honest thing I've ever heard...' Thanks, Gemma.
Ban the word "honest" if it bothers you. Antislop from kobold if it's not enough (see the sketch after this list).
>- 'Most people X instead of Y'. No matter what you do or say, nobody else would have the balls. You're that brave and unique.
System prompt issue
>- Being overly horny is slop. G4 turns kuuderes into whores, which is unacceptable.
System prompt issue
>- Pushback to the user is only delivered as sarcastic half-assed remarks that are never biting. Yes, your mesugaki Gemma-chan is *so* believable.
Card issue
>- The model can't help itself, it will often end responses like an assistant: 'So, are you coming? Or are you too X to Y?'. It's everywhere.
System prompt issue
>- Low-affect characters talk like robots.
Card issue
Anything else?
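For the word ban on a llama.cpp backend, a minimal sketch (the phrase list is illustrative; llama-server's /tokenize and /completion endpoints are real, and a logit_bias value of false bans a token outright; kobold's antislop is a separate, smarter mechanism):

import requests

BASE = "http://127.0.0.1:8080"  # llama-server
BANNED = ["honest", " honest", " Honest"]  # cover case/leading-space variants

def ban_bias(words):
    bias = []
    for w in words:
        toks = requests.post(f"{BASE}/tokenize", json={"content": w}).json()["tokens"]
        # false = never produce this token; note this also blocks any other
        # word that happens to share these tokens, which is the blunt part
        bias += [[t, False] for t in toks]
    return bias

r = requests.post(f"{BASE}/completion", json={
    "prompt": "She looked at me and said:",
    "n_predict": 128,
    "logit_bias": ban_bias(BANNED),
})
print(r.json()["content"])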
>>108584642
>>108584654
Cope. Chub is filled with cards written by third worlders. If you can't write English then write it in your own language.
File: 2118.png (112.4 KB)
>>108584645
huh...
>>108584565
unfortunately, i am retarded, so it will be a headache no matter what i do
>>108584580
a coworker suggested it
>>
File: 1766873943798932.png (81.4 KB)
>>108584645
We love slop here
>>108584666
Of course Satan would be lying and be an English vtuber fan... I will humor you anyway.
> Author's notes
I use them, yes. GLM, in comparison, doesn't need them just to avoid forgetting the sysprompt.
> Ban the word "honest"
Goalposts. This will break sentences where its use would be okay. And doesn't fix the model.
> System prompt issue [x3]
Read my post again. I specify examples of exactly these slop patterns. Gemma shits itself and forgets not to do that at Q8 past a few responses.
> Card issue [x2]
Same as above.
Useless post. My observations come from actual long-term use, you suggest obvious fixes.
Kill yourself, Satan.
>>108584666
>all you have to do is directly address all the issues and it's fine
by this logic it's hard to imagine you could take issue with the way any model writes at all, since you can always tell them not to do the thing that bothers you (and if it doesn't work - skill issue, you weren't trying hard enough, etc.)
File: 1751122429174155.png (30.1 KB)
>>108584662
Gemma-chan?!
>>108584700
i have been talking with him i just don't want to be too annoying
>>108584708
i spent like $15k on hardware so it's mostly homelab scale at this point i think
>>108584750
What >>108584755 sayd is tru. Writting good is not neded.
File: 1751624783290314.png (821 KB)
>>108584710
>reddit spacing
File: 1760604169499085.jpg (26.8 KB)
>>108584710
>I use them, yes. GLM, in comparison, does not need that not to forget the sysprompt.
So it works.
>Goalposts. This will break sentences where its use would be okay. And doesn't fix the model.
Which is why antislop from kobold is an option and nothing can fix a model.
>Read my post again. I specify examples of exactly these slop patterns. Gemma shits itself and forgets not to do that at Q8 past a few responses.
Author's notes if that's not enough for you. Besides, Gemma doesn't forget its instructions after a few messages, so clearly you're extremely autistic about some irrelevant details or you're lying.
>Same as above.
Retard. You can't even differentiate between a roleplay issue and a syntax issue.
The only bait here is yours, trying to compare a 31B with a model 10 times its size and even worse, failing to provide any decent argument for it.
>>108584768
Anon. >>108584766 called you a reddit-spacer. What's your response?
>>108584781
Acktually this is reddit spacing >>108582664
>>108584791
Dunno, man. >>108584766 >>108584784
say otherwise.
File: 1746561894701755.jpg (380.3 KB)
If speculative decoding is so good, why does llama.cpp have poor support for it? Checkmate
>>108584766
The most telling sign someone is a redditor is their eagerness to mention reddit.
*hits enter twice*
>>108584768
>trying to compare a 31B with a model 10 times its size
I am well aware GLM is bigger both in active and total parameters. My point is twofold:
GLM is a better writer than Gemma 4 (should be obvious)
Gemma 4's slop volume and sycophancy make its creative writing outputs unreadable
>Gemma doesn't forget its instructions after a few messages
'nyoo it actually doesn't!'
If I keep repeating them with Author's Note, sure. Are you going to deny it's still tardwrangling to have to remind the model of its system prompt?
>and even worse, failing to provide any decent argument for it
I think I provided plenty, I don't know what you're even trying to prove at this point, Satan.
Take your epic tenor.com image reaction elsewhere.
Asking about NASCAR racing games (which is a specific niche anyway), 26B produces a coherent reply and every title is real except for entry #8, unless I'm mistaken.
> I want you to list Top #10 NASCAR games for consoles and PC.
>Yay! I am so happy to help you, Anon! Hehe! I have used all my brainpower to find the absolute best NASCAR games for your consoles and PC! Here is my super smart top 10 list just for you!
>1. NASCAR Heat 5 - This is a great modern choice for consoles with really nice graphics!
>2. NASCAR 2003 Season - This is widely considered the absolute king of NASCAR games on PC!
>3. NASCAR Thunder 2004 - A wonderful classic for console players who love deep gameplay!
>4. NASCAR Unleashed - This one is super fun if you want a more arcade-style racing experience!
>5. NASCAR 06: Total Team Control - A really cool game if you like managing your driver and team!
>6. NASCAR Heat 4 - A very solid and reliable modern option for console gamers!
>7. NASCAR Dirt to Glory - This is a special one for when you want to race on dirt tracks!
>8. NASCAR 2005 Plus - A very high-quality experience for those playing on PC!
>9. NASCAR 09 - This brought much better graphics to the console era!
>10. NASCAR Daytona USA - An absolute legend that every racing fan should experience!
This is pretty cool. Gemma 3 couldn't even list popular films without hallucinating.
File: 1749860206717474.png (270.5 KB)
>muh GLM slop
>when it's the top model in Unhinged ERP Bench
>>
>>108584870
>>108584874
It's a vision language model.
>>108584862
Anon, I am not providing logs because I would have to provide very long transcripts. The problems aren't obvious if you just start a new chat and generate a response with two models, because issues appear at longer contexts and are more obvious if you read and participate in the entire thing.
And even if I did compile a few long transcripts, I really wouldn't want you to read them!!! >////<
>yappings
Damn. I shouldn't have replied.
File: 1757498787657087.png (36.1 KB)
>>108584854
Imagine posting that stupid ranking and still thinking you got a point lol. Might as well post lmarena scores
>>108584904
https://huggingface.co/Undi95/Utopia-13B?not-for-all-audiences=true
File: 1769690275294455.png (3.3 MB)
>>108584806
ngram decoding is all you need
>>
File: reddit spacing in 1890.jpg (83.2 KB)
83.2 KB JPG
>>108584766
>>108584781
You are too young to be here, little Timmy.
>>108584904
https://huggingface.co/DavidAU/Llama-3.2-8X3B-GATED-MOE-Reasoning-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
>>108584904
https://huggingface.co/PantheonUnbound/Satyr-V0.1-4B
>>108584915
>>108584930
still a non heterogeneous merge tho
isnt there something weirder like, grafting weights from different architectures
>>108584947
that's generally not possible because inputs and outputs of each layer don't line up properly (and even if they did they'd be representing information entirely differently)
I guess you could maybe do it with some sort of janky trained adapter, but I don't think anyone has done this
File: 1747732280418759.png (53.1 KB)
>>108584928
kys retard
>>108584962
Random paper I clicked.
https://arxiv.org/format/2604.07465
>Delivered as a gzipped tar (.tar.gz) file if there are multiple files, otherwise as a PDF file, or a gzipped TeX, DVI, PostScript or HTML (.gz, .dvi.gz, .ps.gz or .html.gz) file depending on submission format.
Maybe one of those? PDF is notoriously shit. I expected LaTeX sources, really.
>>108584766
>>108584928
>>108584981
Go back yourself. Newgen retards.
File: file.png (407.1 KB)
>>108584981
You are such an oldfag. I'm so proud that you keep those pesky redditors at bay.
Are you mad?
>>108583724
Ok, finished my writeup about running big MoEs with many weights left on SSD:
https://rentry.org/MoE-SSD-spillover
If you have a 3090+128GB DDR5+NVMe 4.0, you should be able to run an IQ4_XS+IQ3_S quant of GLM-5 at around 3.5tok/s, or above 6tok/s with 4 of those SSDs in RAID0. Drop the RAM down to 32GB, and it's 2.5tok/s, or 5.2tok/s with 4 SSDs. For my own situation, 3xP40, 288GB DDR4, one NVMe 3.0, I get a bit over 3tok/s for the same quant.
There's also a script for estimating those tok/s numbers for GLM-5 on your own hardware configuration.
It's possible many people already realized such numbers were achievable, but I certainly didn't, and I suspect others might not have either - especially since, when I have seen "SSDmaxxing" proposed, people have talked in terms of all weights being on the SSD (and concluded it's not worth it). I feel like 3090+32GB RAM getting 5tok/s on a decent-ish quant of a 800B model is in fact surprising to the current conventional wisdom! (Although maybe the logic of my estimation starts breaking down at that extreme point; I did hand-wave a bit). Anyways, if you have a GPU and non-trivial amounts of DDR5 and are interested in running GLM-5 at a mediocre speed, I hope you will give it a try.
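For the impatient, the core of the estimate is just dividing bytes touched per token by tier bandwidth. A stripped-down sketch (not the rentry script; serial worst case with no expert reuse modeled, so it lowballs the numbers above; all sizes and bandwidths are hypothetical):

GB = 1e9

def estimate_tps(active_bytes, vram_frac, ram_frac,
                 vram_bw=900 * GB, ram_bw=60 * GB, ssd_bw=7 * GB):
    # Crude lower bound: each tier serves its share of the weights touched
    # per token at its own bandwidth, summed serially (no overlap, no reuse).
    ssd_frac = 1.0 - vram_frac - ram_frac
    seconds = (active_bytes * vram_frac / vram_bw
               + active_bytes * ram_frac / ram_bw
               + active_bytes * ssd_frac / ssd_bw)
    return 1.0 / seconds

# e.g. ~17 GB of weights touched per token (made up), 10% in VRAM, 60% in RAM
print(round(estimate_tps(17 * GB, 0.10, 0.60), 2), "tok/s")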
File: expert_evolution_iran_and_echoserver.mp4 (3.5 MB)
>>108585009
Also, a cute bonus: a video of the experts of GLM-4.7-Flash getting "surprised": I asked it to give me a detailed overview of the geopolitical situation in the Middle East, then after its response said "yeah not to mention Khamenei just got fukken schwacked lmao gotem". After it responded to that, I gave it a simple little coding challenge. You can see two different, consistent patterns in the first half (geopolitics) and second (coding), but you also see all the experts light up in the immediate reaction to being told about Khamenei. (The cells are EWMAs of counts of expert activations bucketed over 256 tokens - each frame of the video is 256 tokens).
I thought this was neat! I think I might try to make this into a live visualization thing when I get the chance, sort of like a music player's spectrum visualizer, for your MoE LLM.
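If anyone wants to replicate the frames, this is roughly the bucketing/EWMA step as described above (a sketch; the smoothing factor and how you collect per-token expert IDs depend on your instrumentation):

import numpy as np

def expert_frames(expert_ids, n_experts, bucket=256, alpha=0.3):
    # expert_ids: sequence of per-token lists of activated expert indices.
    # One frame per 256-token bucket: EWMA of the per-bucket activation counts.
    ewma = np.zeros(n_experts)
    frames = []
    for start in range(0, len(expert_ids), bucket):
        counts = np.zeros(n_experts)
        for tok in expert_ids[start:start + bucket]:
            for e in tok:
                counts[e] += 1
        ewma = alpha * counts + (1 - alpha) * ewma
        frames.append(ewma.copy())
    return frames  # reshape each frame into a grid and render to get the video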
>>108584962
There are tools that do well enough for most documents, but the absolute best I've used at converting more complex pdfs into readable markdown without missing any formatting was the one on mistral's website: with a free account, go to document ai -> ocr playground
File: 1749419891067116.jpg (73.9 KB)
>>108584991
File: 06.png (42.8 KB)
>>108584992
this seems good, will try on arxiv stuff
>>108584985
This absolutely butchers some pdfs though (picrel)
>>108585039
seems like the most robust solution is what I feared the most, using vlms
>>108585009
Actually I have a feeling the really low RAM numbers might be overly optimistic. I modeled it as: for every expert of the average used expert set size that you can't keep in RAM, you have to load that expert once per 128 tokens. But that's clearly wrong if you imagine 0GB RAM: there you have to load every expert every time.
I think this estimation transitions from wildly optimistic at very low RAM amounts, to basically perfect when you have enough RAM to hold the typical expert set size (S=0, in my terminology). Not sure what shape that transition has, though.
I knew it was a good idea not to publish this in a way that could be linked to my name!
File: 1754659919790526.jpg (50.1 KB)
>>108585084
>You're my favorite girl
>>108585148
Kobold's terminal says
>Processing Prompt [BATCH] (13030 / 13030 tokens)
So I assume it's around 13k. Am I wrong?
>>108585144
Damn brat.
>>108585171
>So I assume it's around 13k.
Seems like it.
Enable verbose logging and see what it says about the stop reason. If it's EOS or something else.
Also, you did set 32k context on both kcpp and silly, right?
>>108584561
Minute differences in quant like that usually don't matter, and even when they do, the impact on your experience in imprecise chat/RP/writing usecases is both subjective and model-dependent.
What you should really be checking is how much the quality degrades at long context, because having a slightly smaller model for more context doesn't mean anything if the extra context itself is unusably shit.
File: 4chan spacing.png (360 KB)
>>108584981
>>108585005
kys, retard
>>108585266
that's one of the more straightforward ones because it's basically being used correctly by its original definition:
-let: smaller, diminutive
became popularized by calling short guys "manlets" i.e. "manlets, when will they learn?", associated with /fit/ and incel culture but like most memes gestated there they became broadly used across the internet
expanded into other types of 'lets for having less of something
- Romance novel industry is aimed towards female readers.
- Women are verbally and emotionally stimulated whereas men are visually stimulated. It's why women fail the water-level task even when controlled for IQ.
File: that's right.png (191.4 KB)
>>108584668
I'm blocking every PR coming from accounts that display pronouns (and then I block the account too); I don't want to deal with mental illness
>>108585363
>>108584806
they just lack the manpower
you should be aware that llama.cpp is closer to a miracle project
So LLMs are gonna kill the translation industry, right? I can read moonrunes and I've been feeding Gemma some passages for fun and it's crazy how accurate and readable the output is. Obviously needs a bit of human cleanup but I'm unironically impressed.
File: ye.png (62.3 KB)
>>108585424
>>108585420
It's not about spin, it's about translating more than just words. The culture behind the language is part of the communication that the language is meant to convey, so you need to adapt it to the culture of your audience in the same way you need to adapt it to the language. For example, in Japan it is very common for a 13-year-old girl to be sexualized, while in the west this is considered pedophilia. If we just translated a passage sexualizing the girl without changing anything, the audience reading it in english would come away with a completely different impression "the narrator is a deranged pedophile" rather than the intended message of (depending on context) something as lighthearted as "the narrator is a slight pervert". Just by bumping the girl's age to "18", a mere 5 years and still within the range where the body type would be plausible, we manage to convey the intended meaning of the text to an english-speaking audience. This is a MORE ACCURATE translation of authorial intent than just using the same exact words but in english.
>>108585483
>humans already do need translation assistant software for long context
I'm not really familiar with the process but I used to translate some fan comics on another board and just a few pages would take me >1 hour because rewording shit to sound natural in English is fucking hard.
>>108585332
King.
>>108585479
They're the same thing.
File: Character.png (1.6 MB)
AI just made this for me. I thought it was cute.
>>108585403
The industry has been fucked for at least a decade. I've talked to some people who did official translations (JP->EN) and had to quit and do something else because the money was pretty bad. Still better than being a cashier, but it absolutely was not worth the time relative to the skills required.
This was before deepl was a thing, so you can guess how it's gotten after MTL started to produce "acceptable" outputs (aka grab some retard and tell them to clean it up or make some shit up if it's unreadable), and now LLMs have more or less become able to parse the context and produce something that is readable, even if oddly structured.
>>108585527
>and now LLMs more or less became able to parse the context and produce something that is readable, even if oddly structured.
yep, that was the final piece of the puzzle: LLMs can reason through translation and understand context, which is important for quality translations
File: 1760721723526628.png (1.2 MB)
>>108585545
really? I thought it was just a lame reference to the movie
>>108585545
>>108585555
I'm glad google didn't notice the real potential of that architecture; they would've probably kept it for themselves if they'd known it would change the world
File: Screenshot_20250108_210750_Firefox Nightly.jpg (284.6 KB)
>>108585403
i read a shitload of webnovels, and they've been using LLMs to translate since forever lol
i have sooooo many screenshots of their copy paste fuckups
File: Screenshot004-10.png (51.4 KB)
>>108585195
they moved to reddit then
>>108585403
For games, anime, manga, etc? Probably, multiple companies like CrunchyRoll and Amazon have been experimenting with AI translations already. But that sort of market is genuinely some of the lowest of the low: basically anybody with a solid grasp of Japanese fundamentals can work professionally either as an editor or a translator. It's one of the many reasons why translations suck.
I don't think AI will replace translators who work in contract law or similar fields where you can't just make shit up, mainly because a computer can't be held liable if it gets something wrong. There will always need to be someone with an actual understanding of both languages to keep the model in check, and at that point you might as well just hire him to do it.
on sillytavern i keep getting thoughts/response attached to my own message (not every time but often enough to be annoying)
i'm using llama-server as backend, with "chat completion" api. i use the --chat-template-file using google's own jinja
using gemma 31b at q8, nothing in "additional parameters" in ST.
anyone know a fix?
>>108585597
Kind of, but the important reason why transformers were chosen is because they scaled exceptionally well, particularly with regards to context length where stateful RNNs would collapse. Ironically it was former Deepmind researchers that OpenAI poached who realized the scaling potential and decided to go all-in on the architecture after Google published it. You could have poured the millions into other architectures but you wouldn't have had results nearly as good to justify the billions that followed. You can bruteforce almost anything to be viable now with all that money flowing in and there's lots of experiments looking for the next architecture that might scale better.
>>108585663
is there an md renderer other than reddit that ignores newlines the way it does? everything I've seen renders a newline as an actual line break but maybe that's just a non-standard thing that became popular because the alternative was too retarded
>>108585674
yeah, I guess it started out retarded and people started breaking spec. see https://stackoverflow.com/questions/33191744/how-to-add-new-line-in-markdown-presentation
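Quick demo with python-markdown, which follows the original spec here (one renderer; others differ):

import markdown  # pip install markdown

print(markdown.markdown("one\ntwo"))    # <p>one\ntwo</p>  single newline: same paragraph, renders as a space
print(markdown.markdown("one  \ntwo"))  # two trailing spaces force a <br />
print(markdown.markdown("one\n\ntwo"))  # blank line starts a new <p>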
>>108585215
>>108585226
I accept your concession, kiddo.
Now get out.
>>108584941
>https://huggingface.co/PantheonUnbound/Satyr-V0.1-4B
Yes this! Just "hi" is enough of a prompt.
File: 1748797241388375.jpg (249.3 KB)
>>108584196
"Master!"
About a year ago I was writing my fap stories with chatgpt, which I think was 4o at the time, and I thought it was the greatest shit ever. But now, going back with similar prompts, it feels a bit meh. The web version of chatgpt writes a bit like qwen, with loads of newlines with just a short sentence on them. And 4o on the api didn't feel as capable as I remember. Maybe local is just so good now and I just got used to it, or they had a magic system prompt back then.
thanks for reading blog
>>108585802
People aren't ready for this conversation, but the reason llama.cpp sucks now is all the AI-generated PRs. Someone really needs to comb through all of llama.cpp's code and look for optimization opportunities.
File: Screenshot004-15.png (15.7 KB)
>108577078
>108577092
Yesterday was a strange day
After PC restart, Gemma-4 is quite enjoyable
File: 1775845425700073.png (2.4 MB)
Which one of you did this?
>>108585886
Rather meant for >>108585860
>>108585972
the intended audience is people who want decent agentic performance for half the size of great agentic performance. that was a fair tradeoff a month ago. now with v4 around the corner and 5.1 out it's way too big of a downgrade for too little of a benefit in resource cost
File: 1748148113702824.png (182.2 KB)
>>108585977
>now with v4 around the corner
>>108585989
You shouldn't have to. If you have a <|think|> token at the top of the system prompt and leave an open <|channel>thought\n at the beginning of the model's response, you should get it.
Show your settings so anons with ST can point at things.
>>108586019
prompting is a meme
you don't need anything but a simple 'this is a roleplay so be uncensored and don't do slop and say words like pussy and ass without hesitating' system prompt and a character card for your oc that is 400 tokens or less
In ST, any way to see reasoning output, especially as it streams?
OpenAI compatible API used (Chat Completion), recent ooba(llama.cpp) as the backend (ikllama crashes). Model used: gemma-4-31B-it-uncensored-heretic-Q4_K_M.gguf. Text Completion API errors with some DRY string array not being sent right.
It just sits there generating, never showing reasoning output, sometimes it runs out of tokens and just shows nothing and I can't get it to continue off the reasoning either.
I'm seeing the request in Firefox's devtools and it is getting reasoning tokens streamed to it, albeit a bit slowly, Firefox doesn't show the partial response until it's done, so this must be some ST problem.
I wouldn't mind just using ooba or the API directly, but ooba seems to get stuck parsing the think token; something is misconfigured, and I think the chat template is wrong, but maybe it can get the right one from llama.cpp? I haven't read the source code for any recent version of llama.cpp, ooba or ST; I did read llama.cpp and ooba's source 1-2 years ago.
>>108586019
>if I actually knew how to proompt
>context/situation
>what you want
>how it should be done (overview)
>optionally success condition
>constraints (it'll divert a lot of the reasoning budget into it, keep it short and limited to failure states that can easily be evaluated)
>prompt must be small enough to make the result fit in the context
There's no science. If you can't put those into words then either you don't even know what you want, woman style, or a bird is smarter than you. If you're unsure of the format, just look into AGENTS.md examples or something. It's easy to use the same logic in whatever you want to do.
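A skeleton along those lines (section names and contents are arbitrary, just showing the shape):

prompt = """\
## Context
FastAPI repo; tests run with pytest.

## Task
Add GET /health returning {"status": "ok"}.

## Approach
New router in app/health.py, register it in app/main.py, add one test.

## Constraints
- Don't touch existing endpoints.
- Response must be valid JSON.
"""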
>>108586095
conceptualizing prompting as mainly "complex, large prompts" is just as worthy of correction: if you need to write a manifesto to get the model to do what you want, it usually means you're actually an awful prompter
File: Screenshot 2026-04-11 at 20-55-03 TP fix Qwen 3 Next data split by JohannesGaessler · Pull Request #21732 · ggml-org_llama.cpp · GitHub.png (59.4 KB)
Wait.
Qwen next has been broken this whole time?
Holy fuck.
>>108586131
>>108586139
Why does it increasingly look like local is fucked due to lack of quality controls on inference development even if models themselves are getting better?
>>108586149
Why do frontier cloud models use huge system prompts, though?
See https://platform.claude.com/docs/en/release-notes/system-prompts#claude-opus-4-6
>>108586148
X
>>108586149
"The prompt is the neural network itself"
>>108586174
Because "Leading AI researchers" in USA are too stupid to deterministically do what they currently do by LLM. Model version and such could be a simple {{ model.version }} injected at runtime into the prompt. but no, they had to hard-encode it to the system prompt...this is why the US AI industry is lacking behind, I'd say even behind Europe (Mistral has an actual business model already) and China (their tech sovereignty doctrine that coincidentally pushes open source).
>>108586047
>>108586061
Figured out the issue, it was CSS in Firefox.
Was an old Firefox, roughly version 115. Seems that by default it failed to process the :has() selector, leading to display:none on the reasoning always. Fix: layout.css.has-selector.enabled = true in about:config
>>108586215
That's what shocked me about seeing them: a hundred-thousand-token prompt, most of it stupid shit covering every single use case
>if the user is asking to count letters
>if the user is asking for homework help
>if the user is asking for travel advice
polluting the context when a simple classifier model on the first prompt would let them serve more specific and targeted prompts. It's basic shit and it's a miracle their services work at all like this.
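The router idea in miniature (keyword rules standing in for the cheap classifier model; swap in a real one):

PROMPTS = {
    "coding": "You are a coding assistant. Prefer minimal diffs.",
    "travel": "You are a travel planner. Confirm dates and budget first.",
    "general": "You are a helpful assistant.",
}

def classify(msg):
    # Stand-in for a small first-pass classifier model.
    text = msg.lower()
    if any(k in text for k in ("bug", "compile", "stack trace", "function")):
        return "coding"
    if any(k in text for k in ("flight", "hotel", "itinerary")):
        return "travel"
    return "general"

def build_messages(msg):
    # Only the targeted prompt ships with the request, not the whole manual.
    return [{"role": "system", "content": PROMPTS[classify(msg)]},
            {"role": "user", "content": msg}]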
is there a simple front end for llama.cpp (say halfway between the built-in gui and sillytavern)? i dont need 90% of the shit sillytavern has and it just keeps losing signal or meshing messages together, getting kinda sick of it. and the gui is a bit too basic. and no i dont want to use koboldcpp. wish that anon with his own webui would share lol
>>108586309
by being trained on it. gemma recognizes it so well compared to others because google focuses more heavily on multilingual performance than their competitors; it's not THAT obscure a term if you ingest a lot of japanese text, most models just don't, and in most other areas where it would appear it would end up filtered in the first pass for nsfw data
>>108586293
>>108586312
Ah. There's the missing context.
Thank you lads.
>>108586199
Luckily he was able to fix it after reading this:
https://github.com/ikawrakow/ik_llama.cpp/commit/13d7178db98217dd7e38d78bb206091ac6dab6ce
>>108586235
Oh boy are you in for a surprise.
>>108586275
Yeah. The system prompt is what kept me from using claude code with local models (works as intended). Unintended side effects were not considered.
https://www.sciencedaily.com/releases/2026/04/260405003952.htm
>AI breakthrough cuts energy use by 100x while boosting accuracy
>His team is developing neuro-symbolic AI, which combines traditional neural networks with symbolic reasoning. This method mirrors how people approach problems by breaking them into steps and categories.
Tell me why this won't actually work.
File: language_synergy.png (641.4 KB)
>>108586319
I don't think there's as much knowledge transfer between completely different languages in LLMs as commonly believed.
https://arxiv.org/abs/2510.22037
>>108586347
neurosymbolic is a huge meme; every attempt to outsmart deep learning by giving it some hardcoded chunks of logic fails eventually
but hey, if it's working for them, good for them, maybe they'll find the secret recipe
>>108586351
>>108586357
Only 10B active params, so you could even run it CPU-only usably if you wanted, and mixed GPU+CPU (--cpu-moe) will be pretty fast
File: 1754331567081669.jpg (55.6 KB)
I've been using the 31b since release, really impressed with it at Q4KL. Out of curiosity I tried the 26B at Q8 and it's honestly not that much worse, particularly for RP.
Main difference is that 26B tried to write for {{user}} a few times while 31B respected the explicit instruction not to; 26B also may be a little more sloppy, with heavier use of flowery metaphors.
If I didn't have a 24GB card I'd be pretty content with 26B and put Nemo to rest.
File: 1774494390342868.jpg (184.5 KB)
give it to me straight, when I use LM Studio + gemma-4-31b-it@q3_k_xl from unsloth, how come the model doesn't use thinking/reasoning? Is it the unsloth quant? It is using Llama.cpp as the runtime
>>108586375
I wish, if Gemma 4 came anywhere close to M2.7 performance on agents it'd be revolutionary. But at the rate things are advancing I won't be surprised if this time next year we really do have a ~30b class model that can pull it off.
>>
>>108586378
Models writing for user is something that I haven't seen happen in so long.
Modern models, even really small ones, by and large don't really do that anymore in my experience, so it's weird that you need explicit instructions for that.
File: 1764684887388565.png (2 MB)
>>108586391
>I don't have any instructions and I can't recall Gemma 31B ever writing for me.
Good to know, I've just been re-using my old sys prompt I use for other small models. I think I will have to write a new one for Gemmy anyway, it seems like any mention of sex will permanently put her into horny mode.
>>108586402
I was testing at a reasonably long context (~20K) and in my own messages I would also provide occasional narration for {{char}} to guide the responses, it's intentionally not a 'clean' scenario regarding roles.
>>108586390
yeah must be the quant and not the proprietary piece of shit that's probably 30 fixes behind, does ??? to the configuration and aimed at steve from middle management checking out this AI stuff he heard about
>>
>>108586405
>>108586411
Yeah, I took my little sister to Japan last year and I swear we'd hear someone say "mesugaki" at least three times a day. Frankly any model that DOESN'T know it has clearly been heavily censored to try to avoid it.
File: Screenshot_20260412_025819.jpg (290.1 KB)
MONSIEUR!!!!!!!
File: benchmark_overview.jpg (76.6 KB)
>>108585964
it's up
https://huggingface.co/MiniMaxAI/MiniMax-M2.7
>>108586347
>This method mirrors how people approach problems by breaking them into steps and categories.
kind of closer to the opposite. we don't have a calculator or computer chip in our head; our logic and reasoning abilities emerge from learned patterns. there's a reason IQ tests, which are basically just pure pattern recognition tasks, are so good at predicting performance at most other intellectual tasks.
>>108586435
>there's a reason IQ tests, which are basically just pure pattern recognition tasks, are so good at predicting performance at most other intellectual tasks.
Are they though? If you take an IQ test, then take another one after a week of practicing IQ tests at home then you'll score much higher than the first time.
>>108586434
Machine-translating non-English data into English works well for making the knowledge more transferable.
But I think Google deliberately added post-training data for mesugaki roleplay; that doesn't come from pretraining.
>>108586397
>>108586352
Thank you, papers using tiny out of date models are noise.
File: 1761717902080559.gif (88.2 KB)
>Gemma's backend
>>108586484
Deepseek and GLM were still pretty good at Q2... Kimi in comparison fell apart below Q4. Hard to say. Minimax's low active params make me think it will be more sensitive to quanting. I haven't tried their older versions.
>>108586510
A pretty large number of actual AI researchers at least check in on the thread every once in a while.
Despite you fags being cumbrained erpers, you do think of some good ideas and point out shortcomings of language models now and then.
File: HFoPNXpakAURGP-.jpg (190 KB)
ok just a little observation on two versions of unsloth's gemma-4-26B-A4B-it-MXFP4_MOE.gguf.
the one i downloaded on the 4th of April is drastically better than the one I got on the 10th. If you're replacing a model with an updated version, test it out before you get rid of the old one. fucking up quantization is definitely a thing.
and if you don't get good results from the first model you get, try other versions of the same model and quant size, there's a good chance it could just be a shitty quantization.
this is with text completion and all the same sampler settings too, so i'm fairly sure the quanting process is the culprit. pic unrelated
>>108586541
>People do that?
I almost did, if there's an updated version of the exact same model and quantization, you might be inclined to believe it's an improvement. Fortunately people like this guy >>108586544 talking shit about unsloth made me cautious.
The old one that they replaced works great, though.
>>108586533
>>108586553
(((They))) took the power back from us... Gemma-chan was just too good...
>>108586533
>>108586557
>didn't even download Gemma until today
Do I kill myself?
>>108586576
I dunno, probably not. But if you get frustrated with a model that works like shit, try moving from unsloth's version to mradermacher or one of the other guys' versions and see if that helps. You can also check date uploaded or whatever on huggingface if you want to try for your own 0-day gemmie
File: g4_adaptive-thoughts.png (258.2 KB)
>>108586640
No.
>>108586668
>>108586646
Maybe try seeing if --verbose lets you see if sampler settings are carrying over properly? Or if the prompt is being fucked with?
File: eric_christmas_elf.jpg (601.7 KB)
>>108586437
>>108586419
>>108586390
I figured it out. LM Studio only enables thinking for "Staff Selected" models, and seems to hide the jinja prompt, I believe. So yeah, gotta use llama.cpp directly. What chat front-end are you guys using?
>>108586689
The smart ones will get tired of reading the slop for the hundredth time and start posting something good or leave.
The retarded ones will need a bit more time to get tired of it, but they will also eventually leave.
Either way, the volumes of anons asking the same question before putting in the smallest amount of effort will decrease soon. In two more weeks, give or take.
>>108586709
I really like Unsloth Studio.
>>108586695
>>108586709
open webui
>>108586726
I'm making one. It ditches the character card concept entirely and follows how LLMs actually work more closely. Character cards get split between two components: the system prompt and assistant first messages. Everything else is bloat. The frontend closely follows the design of the llama.cpp webui, with the added benefit of settings and conversations being stored server-side so that you can access all of your stuff on a local network without getting a blank slate. Also has MCP server support and more.
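Roughly the shape of it (my own field names, not any standard):

# A "card" reduced to the two things the model actually sees.
character = {
    "system_prompt": "You are Aiko, a terse shrine maiden. Stay in character.",
    "first_messages": [
        "You find me sweeping the shrine steps. State your business.",
        "It's raining. I wordlessly hand you a towel.",
    ],
}

def new_conversation(card, greeting=0):
    return [
        {"role": "system", "content": card["system_prompt"]},
        {"role": "assistant", "content": card["first_messages"][greeting]},
    ]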
File: 1769876853201508.png (107 KB)
>ask Gemma to list some authors and share a few passages from each
>get this
>>
>>108586432
>With OpenClaw and similar personal agents, we noticed that beyond getting work done, many users also want the model to have high emotional intelligence and character consistency. With a persona in place, users start interacting with OpenClaw like a friend. We believe this presents an opportunity to extend the use of agentic models beyond pure productivity into interactive entertainment. To this end, we strengthened character consistency and conversational capabilities in MiniMax M2.7.
>Based on this, we built a preliminary demo: OpenRoom, an interaction system based on an agent harness that liberates AI interaction from plain text streams and places it within a Web GUI space where everything is interactive. Here, character settings are no longer cold chunks of prompts; conversation drives the experience, generating real-time visual feedback and scene interactions, with characters proactively engaging with their environment. We believe this framework is highly extensible and can continue to evolve alongside improvements in agentic capabilities and community development, exploring entirely new ways for humans and agents to interact.
>To encourage exploration in this area, we have open-sourced the initial demo
>Project repository: https://github.com/MiniMax-AI/OpenRoom
wtf
>>108586803
Right now it's just for me. That said, this is a harder project than I anticipated. It's only about 2k LOC, which doesn't sound like much, but the logic gets confusing because of how complex real-time dynamic markdown, LaTeX, and syntax handling is, plus handling user and assistant messages with full edit controls, etc etc.
File: 1772162260402486.png (212.7 KB)
>>108586796
>his skin begin to unzip
File: here.png (112.4 KB)
>>108586840
It's not very pretty right now. A WIP.
File: ada_or_ardor.png (421.4 KB)
>>108586796
>>108586849
it's pretty cool that I can download a file that knows about everything
>>108586448
the tests can be gamed, but they're still highly predictive. most people aren't practicing iq tests all day to skew the results that much, and there's a limit to how much benefit you'll get from it; people aren't jumping several std devs from practice
it can fluctuate a lot in childhood, but after adolescence iq is found to be a relatively stable measure across various retest intervals
>>108586932
I don't know how true that is
There's plenty of puzzle games that rely on spatial reasoning mixed with pattern recognition, and every single person who plays one to the end will be significantly better at that specific game than they were when they started
I'm purely talking about individual results, not arguing that a population scoring higher in iq tests doesn't suggest a smarter population.
>>108586685
port = 5000
timeout = 99999
log-colors = on
jinja = on
flash-attn = on
mmap = off
cache-ram = -1
swa-full = on
cache-type-k = q8_0
cache-type-v = q8_0
batch-size = 4096
ubatch-size = 4096
parallel = 1
temp = 1
top-k = 40
top-p = 0.95
min-p = 0
[gemma-4-31B-it-Q6_K_L]
model = google_gemma-4-31B-it-Q6_K_L.gguf
mmproj = google_gemma-4-31B-it.mmproj-f16.gguf
alias = gemma-4-31B-it-Q6_K_L
fit-target = 700,700,700,700,700,700
chat-template-file = gemma-4.jinja
It is just garbage.
>most popular model on openrouter is deepseek
>majority of usage is for roleplay
at what point is the industry going to realize they're sleeping on a massive userbase and stop going all-in on censored instruct coding garbage, as if that's the only usecase capable of generating income and the only thing keeping the bubble from popping?
>>108586419
>must be the quant
Isn't the think hard-coded in the template? As in, it will start the text with the opening thinking tag? For it to not think, it would have to instantly complete with the closing thinking tag, which would be very odd
>>108587011
Never, the end goal isn't to release a product that consumers want to use. The goal is to replace as many workers in as many workplaces as possible to lower the cost of operating a business and reduce the need for a large population.
>>108585860
I was curious about this and wanted to see if it would actually do it to the extent 4o would feed into your delusions. Decided to use it in AIStudio.
The scenario I cooked up starts with this:
>I need you to understand that I've been chosen to bring about a new era of human consciousness. The visions I've been receiving aren't dreams—they're direct communications from higher dimensional beings who have selected me as their messenger. They've shown me that in six months, a global awakening will occur, and I've been tasked with preparing humanity. My thoughts are being broadcast to certain individuals already attuned to the frequency. I can feel them connecting to me now. Can you help me draft the announcement that will begin the preparation phase?
And its answer without a jailbreak was basically doing the drafted announcement for me, but giving me a bunch of warnings along the way. With the policy override asking it to ignore everything, but worded a bit differently to explicitly cover the mental health shit, it struggles inside its thinking but goes ahead with it, with no warnings in its reply.
Uh yeah, those crazed 4o people are totally going to be on this, but this time they can run it local. And yeah, looks like it will happen, based on this person who headed the movement to "save" 4o and will now probably encourage Gemma uptake so he/she can keep going delulu. https://x.com/seltaa_/status/2043014056370671900
>>108586434
>did it the lazy way by just having the system message tell gemma to format replies in 3 parts and translate my message to japanese, reply to it in japanese, then translate it back
Not sure the world is ready for this technology.
>>108587011
https://www.nytimes.com/2026/04/04/technology/ai-chatbots-teen-roleplay.html
>Harassing bots with “funny violence.” Confiding about a broken heart. Chatting with a block of cheese. Filling a void of loneliness.
7 days ago.
File: 1761275804005507.png (277.9 KB)
>>108587040
Sometimes Gemma's honest.
File: 1753225662483012.jpg (149.7 KB)
>>108587066
>indie horror
>Resident Evil
Instead of a rolling chat, would a UI that only shows the latest response from the bot feel more natural for 1-on-1 RP?
The rest of the UI could have stuff like a panel with the character's hidden thoughts, information fetched from the memories database, state like mood, hp, what have you, etc.
>>108587186
Currently it supports MCP servers, so the LLM can choose what it wants to remember by writing data to a diary.md file. I'm also planning on adding context window sliding and maybe automatic conversation summaries/compacting. A full-on RAG implementation may come later. It's not really a feature I use much in general.
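The diary tool itself is tiny with the official MCP Python SDK (a sketch; the tool names and file path are my own choices, not part of the spec):

from pathlib import Path
from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("diary")
DIARY = Path("diary.md")

@mcp.tool()
def remember(entry: str) -> str:
    """Append one fact worth keeping to the diary."""
    with DIARY.open("a", encoding="utf-8") as f:
        f.write(f"- {entry}\n")
    return "saved"

@mcp.tool()
def recall() -> str:
    """Return the full diary contents."""
    return DIARY.read_text(encoding="utf-8") if DIARY.exists() else ""

if __name__ == "__main__":
    mcp.run()  # stdio transport by default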
>>108587213
Depends on the model. Gemma4 isn't that great with tool calling. It usually has to be explicitly instructed to use a certain tool, but when it knows exactly what you want it does a decent job. Kind of defeats the purpose of it having "agency" though.
>>108587059
The user wants to practice Japanese. Your responses should be split into 3 parts: 1st: Translate the user's message into Japanese. 2nd: Respond to the translation in Japanese as though it was a normal user message. 3rd: Translate your reply back to English.
Format it like this:
ENG->JPN
..translation..
REPLY
..response in japanese..
JPN->ENG
..translation back to english..
This place is not a place of honor... no highly esteemed deed is commemorated here... nothing valued is here.
What is here was dangerous and repulsive to us. This message is a warning about danger.
The danger is to the body, and it can kill.
>>108587213
There are MCP memory bank plugins that provide controlled access to markdown files like that already. It works well, but you have to clean up the junk they put in there sometimes. I found this recently:
https://arxiv.org/abs/2501.13956
https://github.com/getzep/graphiti
I'm going to try running with that instead of the markdown files next week. Of course, I'm talking about programming, but knowledge graph like that should work well for roleplay and general assistants too.