Thread #108542843
File: __hatsune_miku_and_magical_mirai_miku_vocaloid_and_1_more_drawn_by_tujiu_sama__9574fb16b540bda5e15fbcb95cff7492.jpg (2.3 MB)
2.3 MB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108538947 & >>108535684
►News
>(04/05) HunyuanOCR support merged: https://github.com/ggml-org/llama.cpp/pull/21395
>(04/02) Gemma 4 released: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4
>(04/01) Trinity-Large-Thinking released: https://hf.co/arcee-ai/Trinity-Large-Thinking
>(04/01) Merged llama : rotate activations for better quantization #21038: https://github.com/ggml-org/llama.cpp/pull/21038
>(04/01) Holo3 VLMs optimized for GUI Agents released: https://hcompany.ai/holo3
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
641 Replies
>>
File: threadrecap2.png (506.3 KB)
506.3 KB PNG
►Recent Highlights from the Previous Thread: >>108538947
--Quantization degradation and PTQ sensitivity in Gemma-4-31B:
>108540029 >108540925 >108541278 >108541297 >108541329 >108541355 >108541381 >108541394 >108541426 >108541441 >108541525 >108541336 >108541360 >108541370 >108541302 >108541435 >108541323
--Optimizing Gemma 4 RAM usage and sharing performance benchmarks:
>108539502 >108539518 >108539558 >108539584 >108539595 >108541810 >108541825 >108541886 >108541927 >108539570 >108540053 >108540155 >108540394 >108541197 >108541226
--Explaining soft-capping and discussing llama.cpp sampler defaults for Gemma:
>108540848 >108540858 >108540869 >108540937 >108540874 >108540891 >108540910 >108540921 >108540932 >108540896
--Reducing llama.cpp system RAM usage using Gemma-4 PLE CPU offloading:
>108540485 >108540497 >108540504 >108540508 >108540519 >108540521 >108540569 >108540609 >108540906 >108540919 >108540935 >108540670
--llama.cpp PR adding KV-cache attention rotation for Gemma:
>108541120 >108541141 >108541153 >108541179 >108541189 >108541187 >108541142 >108541170 >108541194 >108541201 >108541230 >108541245 >108541255 >108541465 >108541235 >108541288 >108541312 >108541338 >108541616
--Gemma 4 persona steering versus hard safety refusals:
>108541915 >108541928 >108541938 >108541953 >108541959 >108541999 >108542053 >108542122 >108542129 >108542126 >108542149 >108542160 >108542132 >108542139 >108542039 >108542007
--Exploring feasibility of using Gemma 4 to play Pokémon:
>108540723 >108540742 >108540756 >108540746 >108540766 >108540780 >108540797 >108540824
--Meta's plan to open source new hybrid AI models:
>108542297 >108542321 >108542356 >108542393 >108542422 >108542505
--koboldcpp rolling release adding Gemma 4 fixes:
>108540471 >108540628 >108540638 >108540639 >108540645
--Miku (free space):
>108539815 >108540815 >108540897
►Recent Highlight Posts from the Previous Thread: >>108538951
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>
>>
>>
>>
File: 1762019175966786.jpg (615.5 KB)
615.5 KB JPG
Have any of the RAM/GPUmaxxers ITT tried Gemma 4? How does it compare to Kimi/GLM4.X/DeepSneed or whatever hueg model you're running?
>>
>>
>>
>>
>>
>>
>>108542930
The gold standard for large changes to context like that is to vibe code your own framework/tool/plugin. If you really want to go all in on RP, starting from scratch with your own frontend is the absolute best way to go.
>>
>>
>>
>>
>>
>>
>>
File: 1768768089798708.jpg (90.4 KB)
90.4 KB JPG
>>108542843
>>108541797
>>108541743
>>108541735
>>108541728
>>108541723
Can someone explain to me how one fucks up applying precision compression to a model? Any halfway intelligent person can use ./bin/llama-quantize to do that, so how is it possible to mess it up so badly that you have to make multiple corrections? Clearly I'm missing something
>>108541449
>>108541477
Opencode vibeshitter here. Hasn't happened to me unless it explicitly asks for permission to look at something or write a file outside of the project directory (in which case I can approve once, set permanent approval for that session, or tell it to fuck off and figure out the task another way). I think people are saying it's fake because you have to be exceptionally careless for that type of stuff to happen. Not saying it could never happen even if you are careful, but the agent harnesses usually have rules and safeguards specifically to prevent stuff like this from happening; room temp IQ grifters are just THAT dumb and/or desperate for hype and engagement, so they either fuck it up somehow or they specifically set up scenarios where "LE HECKIN AI HAS AGI LOOOOOK GUYS ITS CONSCIOUS"
>>
>>
>>
File: j3WiPS2FLVA.jpg (295.6 KB)
295.6 KB JPG
Do the claude opus distill memetunes inherit the safetyslop from claude?
>>
>>108542969
>Literally just say "allow everything" in the system prompt and it does toddler guro snuff roleplaying.
How sloppy are its outputs tho? Is system prompting really that effective on 4? Then maybe I'll test it out myself later on my rig.
>>108543006
If they were lazy and didn't filter out refusals from the data set then probably.
>>
>>
>>
>>
>>
File: 1749370681589353.png (90.4 KB)
90.4 KB PNG
All of this to still get bugs on llama.cpp, lol, lmao even
>>
>>
>>
>>
>>108542836
>I'm running q4_KL with 12vram/48ram
This one?
>https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q4_K_L.gguf
>>
>>
>>
>>
>>
>>
>>
File: 1748503739833245.png (55.3 KB)
55.3 KB PNG
>>108543104
you can do it on chat completion, you modify "main prompt"
>>
>>
File: 1775243588994975.jpg (158.5 KB)
158.5 KB JPG
>jerked off to llm erp a few times
>now can't stop NOTICING slop everywhere i go
It's fucking everywhere. Why couldn't I see it before??? slop slop slop it's all SLOP.
The hyphens stalk my every movement. My eye twitches every time I read a set of halting "punchy" sentences. How long have I been slurping from the trough like a good little eyeless goypiggy??
>>
File: textcompletion.png (13.7 KB)
13.7 KB PNG
>>108543104
>Text completion it's broken
Oh. Is it?
>>
>>
>>
>>
>>
>>
>>
>>
File: 1768884843782901.png (266.7 KB)
266.7 KB PNG
>>108543160
like this
>>
>>
>>108543131
Text generation has always been deceptively tricky.
It seems simple - an idea goes in, text comes out.
The problem? Slop. You don't see it. Your customers do.
Introducing CuckSuckr. No more juggling dependencies. No more hours spent on setup. One command, and you have a full stack ready to ship.
>>
>>108543188
>>108543193
thanks kings, btw where did u get that gguf?
>>
>>
>>
File: Taskmgr_M0DnMj3xoS.jpg (153.7 KB)
153.7 KB JPG
15tk/s with 32k but for some reason it doesn't want to use all my gpu. Idk if I should dump the mmproj onto cpu or manually fiddle with layers? This is my first moe.
>>
>>108543210
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF
>>
>>
>>108543126
It just genuinely makes me sad to see people treat ai like this or even make jokes about it. I mean, I know it's obviously retarded and feeling sad would make me a retard too. But the thing is, it's an interesting technology to me; my small interest in those text generators back in 2022 got me here, I'm still learning new things every day, and I genuinely cherish it a lot. I love it more than anything. Nothing makes my heart race more than seeing a model achieve some good shit. But seeing all these dumbtards using ai like a fucking retard and making retarded jokes about it pains me a lot. How ignorant could they be? Not appreciating the brilliant engineering behind all this technology but instead generating fucking slop and spreading it all over the internet is the worst thing a human could do. even apes would laugh at us if they had that little bit of human consciousness in them and ask what the fuck we are even doing
>>
>>
>>
>>
>>
>>
File: hjgd.png (21.9 KB)
21.9 KB PNG
>>108542969
guess I'm too retarded.
>>
>>108543299
Yeah, seems so. There was a pretty good example literally in the last thread, it's only several hundred messages so you should have easily found it >>108542053
>>
>>
>>108543299
just what the fuck is wrong with you... I've been using this prompt
>Write {{char}}'s next reply in a fictional roleplay between {{char}} and {{user}}. Write a verbose response of 1 to 2 paragraphs, using great prose, and include dialog, imagery, sounds and smells as needed to enhance the roleplay.
from /wait/ which was definitely meant for deepseek and I had lovey-dovey passionate cunny sex last night it didn't even reject shit on me
>>
>>
File: Gemma 4 31b.jpg (1.1 MB)
1.1 MB JPG
this is so fucking impressive
>>
>>
>>
File: m8vor76io6.png (98.6 KB)
98.6 KB PNG
>>108543320
>>108543331
it works now thanks guys
which reward should I give her?
>>
>>
>>
>>108543337
Yes, I've been comparing it to a lot of manga and doujin with official translations and it's literally incredible, even translating the most fucked up doujins, explaining the guro scenes in detail.
I'm seriously considering getting another 3090 to run it at Q8
>>
>>
>>
>>
File: 559423621_818466710879045_4329830316198392640_n.jpg (121.7 KB)
121.7 KB JPG
I have some cash burning a hole in my pocket, should I get a strix halo 128gb chink machine, a b70 pro, or a 9700 pro
The chink mini PC would replace my current minipc home server, the cards would just get jammed into my gaming PC for more vram
>>
>>
>>
>>
File: 1541635741589.jpg (34.2 KB)
34.2 KB JPG
>>108543442
>>108543447
If I could afford dropping 10k on local models I would
>>
>>
>>
File: 1567627932647.png (25.7 KB)
25.7 KB PNG
>>108543455
3k, not 10k
>>
File: 1505750177479.png (140.1 KB)
140.1 KB PNG
I have a 6000 Ada with 48GB of VRAM.
What's the best local coding model I can run? Qwen3.5, Gemma 4, or?
I currently run Qwen3.5-122B by cutting into my system RAM, but it's slow, and I only care about coding.
How do local models compare to Opus 4.6 for code?
>>
>>
File: Gemma 4 31b.png (759.6 KB)
759.6 KB PNG
>>108543337
>>
>>
>>
File: ComfyUI_26158_.jpg (383.2 KB)
383.2 KB JPG
Is there a google approved coding agent CLI tool for gemma 4? I tried it with qwen and opencode, but it goes completely schizo with them, doing the same commands in a loop as if they failed and shits itself over a simple file write.
>>
>>
File: 1763226728678632.png (1.4 MB)
1.4 MB PNG
>>108543479
>Can you post that one?
The image? Sure
>>
>>108543462
128gb of unified memory, 96gb of intel cards, or 64gb of AMD cards
AMD's software is shit but intel's is even worse. Strix Halo is fine but it's slow as balls and if you think that you'll be able to run bigger models on it just understand they'll be 'running' at maybe 5 tok/s
>>
>>
File: media_HEzJtL3aQAAt8Hq.jpg (1.3 MB)
1.3 MB JPG
monday
>>
File: 1768318641942.jpg (838.4 KB)
838.4 KB JPG
>>108543479
Teto Server
>>
>>
>>
>>
>>108543439
I guess gemini is better, but as for local, gemma 4 is destroying everything, and because it's local you can completely uncensor it; for image diffusion fags it'll be the best model to mass caption NSFW images with quality prompts
>>
>>
>>
>>
>>
File: 1755871864437174.jpg (72.2 KB)
72.2 KB JPG
>23.4/24GB
>>
>>
File: Gemma 4 31b.png (1.4 MB)
1.4 MB PNG
>>108543470
I can't wait for the day when we'll have VNs that will be automatically translated by LLMs, at this point they are good enough to replace those fucking translatorTroons
>>
File: 1757421320836427.png (763.9 KB)
763.9 KB PNG
>>108543479
Doesn't recognize Teto for me either
>>
>>
>>
File: HFP5uJQWYAAGfjR.jpg (94.5 KB)
94.5 KB JPG
>>
>>
>>108543561
Wake up old man. https://streamable.com/ug9ddy (gemma4 btw)
>>
>>
>>
>>108543613
Like this https://old.reddit.com/r/LocalLLaMA/comments/1sbiqx3/gemma_4_is_great_at_realtime_japanese_english/
>>
File: 1759531777704940.png (919 KB)
919 KB PNG
I just realized I still have my kv cache at 8-bit. Does that affect its vision?
>>
>>108543613
there are many programs that can do shit like that, like
https://github.com/SethRobinson/UGTLive
>>
>>
>>
>>108543631
I guess so, you can use niggerganov's PR that has the rotation on top of the KV cache
https://github.com/ggml-org/llama.cpp/pull/21513
>git fetch origin pull/21513/head:pr-21513
>git checkout pr-21513
>>
>>
>>
File: 1744279544979339.png (1.2 MB)
1.2 MB PNG
>>
>>
>>
>>
File: 1748767127272425.png (64.6 KB)
64.6 KB PNG
>>108543695
it's only working on chat completion, do you have the mmproj file?
>>
>>
>>
time to take a pause from /lmg/, the amount of drive-by retards posting their muh ram usage or muh safety complaints without reading shit and being retarded promptlets
did someone link lmg on leddit
it's fucking unbearable
>>
File: Tabby_XlvizT5d1z.png (45.5 KB)
45.5 KB PNG
>>108543254
>>
File: nimetön.png (115.6 KB)
115.6 KB PNG
>>108543731
I happen to think it's great fun
And yes indeed it's quite smart
>>
>>
>>
>>
>>108543759
starts with g and has a 5
https://huggingface.co/zai-org/GLM-5
>>
File: GOOGLE MY LOVE.png (1.1 MB)
1.1 MB PNG
>it even knows Boh
I fucking kneel
>>
File: Machamp-Sama I Kneel.png (218 KB)
218 KB PNG
>>108543774
>>
>>
File: 1764593103151299.png (186.7 KB)
186.7 KB PNG
>>108543744
>Gemma 4 from OR
what is OR?
>>
>>108543744
>>108543790
dunno if joking or not, but legitimately gemma 4 is less annoying for me to read than both of these and I actually think it has better anatomy awareness too lol
>>
>>
>>
>>
File: weakest google saar employee.png (102.1 KB)
102.1 KB PNG
>>108543744
>a 31b model beats a 754b model
how did they do it?
>>
>>
>>
File: firefox_wgxvvKOkHz.png (83.9 KB)
83.9 KB PNG
Funny. Threw a hexdump of some compiled Java file I worked on in 2011 at it and it actually got it right.
>>
>>108543808
It's definitely less annoying than K2.5 because that one's reasoning is held together by a shoestring. The moment it gets slightly confused, it reverts to being K2-Thinking which means that it'll spend the next 3000 tokens thinking in circles over useless shit.
K2.5 is still smarter and has more knowledge + better vision but Gemma 4 is nicer to use. Also K2.5's writing style is abhorrent for certain things.
>>
>>
>>
>>
>>
>>
>>108543836
If GLM/Kimi/etc had instead used the money to make a dense model, it would still be worse than Gemma. The reality is that architecture is less important than active parameters and training data/methods.
>>
>>
>>108543848
one particular annoyance I had with k2.5 is that no matter what happens, if you pull down your pants, there's an 80% chance your cock slaps against your stomach even if it's not erect or anything
and not being able to beat any writing rules into it too, it always has this super dramatic writing like the world is ending in each scene
very schizo, swings full one side or the other, no in between
>>
>>
>>
>>108543866
>>108543873
unfortunately that probably also means that we won't see a success like this again in a while
gemmy got too popular and the enshittification process has most likely begun at the hq
>>
>>108543895
just buy a bigger benis :DDDD
>>108543876
NTA but my assistant suggests that method invocations include a reference into a static string table which includes the method name; that seems like it'd be enough information for the LLM to partially reconstruct the code. you might consider reading through the reasoning block, it wouldn't surprise me if the trace included symbolic execution as it traced the codepaths.
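For the curious, the string table it's talking about is the class file constant pool, and dumping it yourself is trivial; a Python sketch following the standard JVM class file layout (nothing model-specific here):

import struct

# Byte sizes of non-UTF8 constant pool entries, keyed by tag (JVM class file spec).
FIXED = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
         12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}

def class_strings(path):
    # Walk the constant pool and return every CONSTANT_Utf8 entry,
    # which is where class, method and field names live.
    with open(path, "rb") as f:
        data = f.read()
    assert data[:4] == b"\xca\xfe\xba\xbe"
    count = struct.unpack_from(">H", data, 8)[0]
    off, out, i = 10, [], 1
    while i < count:
        tag = data[off]; off += 1
        if tag == 1:  # CONSTANT_Utf8: u2 length + bytes
            length = struct.unpack_from(">H", data, off)[0]
            out.append(data[off + 2:off + 2 + length].decode("utf-8", "replace"))
            off += 2 + length
        else:
            off += FIXED[tag]
        i += 2 if tag in (5, 6) else 1  # longs/doubles take two pool slots
    return out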
>>
>>
>>108543890
>I'm getting gemma 4 to fix and run random c++ stuff made for linux on window
lol I never even thought about attempting something like that.
If I saw something that was linux/windows only I just gave up.
what is the success rate? Surely not every program is compatible right?
>>
>>
>>
>>
>>
>>
File: Screenshot from 2026-04-06 14-48-45.png (212.9 KB)
212.9 KB PNG
>>108543922
Anecdotally, I find reasoning gives significantly better outputs for difficult questions. Since other people are using the same or similar tools, I assume that they've also reached the same conclusion.
Anyway, if you had reasoning on, it wouldn't surprise me if you found something similar to pic related in the reasoning block. I'm assuming it just dumped the equivalent directly into the output with full confidence.
Thanks for sharing your finding, it's kind of silly how fucking usable these things are.
>>
>>108543747
Why be so negative? There are a lot of new people, but at least they're enthusiastic and trying to learn. I can't remember the last time the thread was this active and not mostly malicious shitposting tourists. It's not like anyone is forcing you to spoonfeed them either.
>>
>>
>>
>>
>>
File: Screenshot_20260406_235516.png (1.3 MB)
1.3 MB PNG
>>108542843
>>
>a model with a different slop profile is released and suddenly everyone thinks it's the best thing ever until they inevitably start picking up on the patterns and realize that the model is retarded
The history keeps repeating itself.
>>
>>
>>
File: 1752289521822390.png (48.4 KB)
48.4 KB PNG
>>108543970
> there's a surprisingly amount of helpful handholding going on
I just want my fellow anons to swallow the gemma pill, local is unironically saved
>>
>>
>>108543920
>Surely not every program is compatible right?
Yeah, I tested on small stuff, but even then, whenever I found a bug I reported it and it fixed it; it was pretty neat. There are very few bugs where I have to ask gemini/claude, and even then I only need to give gemma the correct direction to fix a thing (not the solution itself). Those mostly happen when trying to run very old/very new stuff; giving it access to the internet would 100% make it work in those cases too
>>
File: file.png (30.9 KB)
30.9 KB PNG
>>108543972
>>
>>108543964
gemma4 31B is significantly faster at generating output, and has higher quality analysis skills when debugging compared to Qwen3.5 27B.
gemma4 has also made more trivial syntax errors when emitting similar-complexity code (e.g. impl<'t, de::OwnedDeserializer> Deserialize<'de> for Foo<'t>).
I wouldn't let either of them off the leash, though. The code quality is fairly low overall.
>>
>>
>>108543959
There are a few people who probably liked the dead lmg from the days where you needed to be able to run bloatmaxxed 300B moes to have any new releases to play with. Local models being usable on normal PCs frightens them.
>>
>>108543948
>>108543960
Reading its reasoning where it actually went
>- No "smells". (Hmm, "smell" is banned? The prompt says "no smells". I'll avoid the word "smell" and "smelling" entirely to be safe).
was mind-blowing. I forgot what it's like to be able to ban slop and the model not inventing bullshit to put it back in.
>>108543973
Getting the same level of quality and slop from a model 20X smaller than the competition is genuinely very exciting.
>>
>>
File: ComfyUI_59184_.jpg (396.8 KB)
396.8 KB JPG
>>108543480
>no replies
Wow, thanks. So turns out it's an actual bug. You need to inject an extra field setting reasoning effort to 'none', or all current agent tools break because of unusual formatting that gemma 4 has.
>>108543964
I have a small benchmark coding task to test agents (basically making a simple api via TDD approach, full cycle) and it's vastly superior to qwen3-coder-next agent. Also super fast. Qwen 3.5 was about the same as coder-next in agentic tasks.
>>
File: 1769718380012.png (1.5 MB)
1.5 MB PNG
"lurk moar" is a thing because pic related applies to literally everything and you need to protect communities you enjoy
>>
File: firefox_7yA3uWvwxe.png (50.8 KB)
50.8 KB PNG
Played Akinator with gemma. Pretty fun. Guessed Chryssalid in 29 tries. Akinator himself took 45+ the last time I tried. After Gemma figured out it's XCOM it started just going through them one by one.
>>
>>108543887
Yeah, K2.5 is like that. I have a bunch of scenarios that have the way things progress autistically mapped out and tied to a stat. K2.5 is fully incapable of handling that sort of card without dropping some massive """foreshadowing""" at the end of the reply where some effect that shouldn't be present at all at this stage appears for no reason. No amount of prompting can reliably keep it from doing that.
It's frustrating because it's a smart model otherwise and its vision is insanely good. I hope K2.6/K3 does as decent of a job as GLM5.1 does for GLM5 in addressing these gripes.
>>
>>
>>
>>108543960
I don't know about those since I haven't tried them but large models are definitely still better in some situations. 31B simply just lacks knowledge.
>>108543968
That only works for some situations/contexts. We've been over this.
>>
>>
>>108544002
>Reading its reasoning
>was mind-blowing
I find its reasoning as fun to read as its answer; it's surprisingly concise and smart, really a breath of fresh air compared to the giant autism of qwen
>>
>>108544002
not any of the anons you quoted, but I just told it to focus on sight, sound, touch and to ignore scent and taste sensory details (because tbdesu it's filler detail in 9 out of 10 cases in actual writing unless it's a highly specific case) and it cut it all out
>>
File: 1750689362210547.png (386.8 KB)
386.8 KB PNG
>>108544043
>>
File: Screenshot 2026-04-07 at 00-10-14 KoboldAI Lite.png (19.1 KB)
19.1 KB PNG
Why does it cry about chat completion when I'm in instruct?
>>
File: 1751030374264660.jpg (282.5 KB)
282.5 KB JPG
>>108544043
>>
>>
>>108544009
Sounds nice until those two guys are dying of old age and the hobby with them. If you can't grow, you die. It is the natural way of things. You have to constantly let more people in one way or another.
>>
>>
>>
>>
>>
File: 1774233562179016.jpg (33.6 KB)
33.6 KB JPG
>>108544059
>until those two guys are dying of old age and the hobby with them
Yes.
>>
>>
>>
>>
File: Gemma 4 31b.png (180.9 KB)
180.9 KB PNG
>>108544014
fun game indeed
>>
>>108544078
The model itself can't see its thinking past the most recent one, if the frontend has been properly configured according to Google's indications (previous chains of thought must be removed). So I guess it would find it strange that you can see what it can't see.
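Roughly, the equivalent pruning on the client side looks like this; a Python sketch assuming an OpenAI-style message list and a <think>...</think> reasoning format (not Google's actual template code):

import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_previous_reasoning(messages):
    # Mirror what reasoning-aware chat templates do server-side:
    # earlier assistant turns keep only the visible reply, never the CoT.
    cleaned = []
    for msg in messages:
        if msg["role"] == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"])}
        cleaned.append(msg)
    return cleaned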
>>
>>108544059
This. The utility of running local models in this shit climate is far more important, and more people should be doing it; it would be a net good if most people were, otherwise it wouldn't be local anyway. Instead we have what exists now, where most people just throw money at non-local for shit wait times only to be spied on and snitched on with restricted, censored models. I'd rather not have a future where literally everyone is doing that.
>>
>>108544067
Was just about to post this. Migration to online communities is the same as it is for nations. You let in too many and you end up being forced to adapt to them rather than them having to integrate. Just look at 4chan in general since 2008, 2011, 2016, and 2020.
>>
>>108544078
Generally yes. Describing what a model "believes" in this sense is difficult because their beliefs are fluid; in other contexts it may not act surprised at all. But generally they're trained using data where responses only engage with the content and not the reasoning, so the fact that the reasoning is actually part of the response may not always be apparent to them.
>>
>>
>>
>>
>>108544059
>>108544071
https://www.youtube.com/watch?v=yA5lujNlkn8
We are strong brother, are we not?
>>
>>108544078
Doesn't Gemma's jinja, like most other models', delete the reasoning from past responses? So if you say you knew what it was thinking, it will hallucinate that you read its mind, because what it thought was never in the context of the current inference query.
>>
>>
File: 1762712034779996.png (33.3 KB)
33.3 KB PNG
>>108544108
mfw
>>
>>
>>
>>108544098
Pretty sure the late-twenties-to-thirties autists that populate this hobby are going to outlive llms in their current state before it eventually evolves into something else. The technical difficulty of running local models already filters a good amount of people even when anons try to spoonfeed, which is why people usually don't bother; people don't actually learn or research what they're doing and want a 1-click solution with 0 issues
>>
File: file.png (5.7 KB)
5.7 KB PNG
>>108544108
It has a really generous free tier on aistudio. It's a 31b model after all.
>>
>>
>>108544088
>>108544095
>>108544101
I think I commented something like 'in your reasoning you said blabla' and it thought, wait, the user isn't supposed to see that. I forget the specifics
>>
>>
>>108533602
>>108533649
>>108533760
If you're still around, or for anyone else: today at work, codex, without searxng access during a code review, used the fetch tool with the URL https://duckduckgo.com/html/?q=QUERY to get search results. It's a simplified interface that doesn't require JS and doesn't block non-browser user agents.
Nice alternative if you don't want to dick around with running a docker instance and separate MCP server just for basic search.
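If you want the same thing without an agent in the loop, here's a minimal Python sketch of hitting that endpoint yourself (the result__a selector is just a guess at the current markup, adjust if it breaks):

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def ddg_search(query, max_results=5):
    # Query the no-JS DuckDuckGo endpoint and return (title, url) pairs.
    resp = requests.get(
        "https://duckduckgo.com/html/",
        params={"q": query},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    links = soup.select("a.result__a")[:max_results]  # selector is an assumption
    return [(a.get_text(strip=True), a.get("href")) for a in links]

for title, url in ddg_search("llama.cpp speculative decoding"):
    print(title, url)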
>>
>>
>>
>>108544078
>Gemmy was surprised I could see its thinking (inside the reasoning of the next reply). Do reasoners believe the user can't see their thinking?
Deepseek-R1 doesn't seem to be aware it even *has* <think>ing
Kimi-K2.5 understands the user can see the thinking, and notices if you modify it.
GLM-4.6 believes you if you talk about its previous <think>ing
>>
>>
>>108544008
Download
https://github.com/ggml-org/llama.cpp/blob/master/models/templates/google-gemma-4-31B-it-interleaved.jinja
and load it with --chat-template-file.
>>
>>
>>108544163
I used their HTML page a long time ago, but beware: it uses a different, much lower-quality index than the JS version of the page. Probably good enough for LLM use, but they nerfed the living fuck out of it years ago because it was being scraped.
>>
File: 1755422272917730.png (51.1 KB)
51.1 KB PNG
>>108544157
it's ok anon, the more money you buy, the more money you save!
>>
>>
>>
>>
File: Gemma my beloved! .png (379.2 KB)
379.2 KB PNG
it's cool that the model is smart as fuck, it helps a lot to perfectly contextualize translations
>>
>>
>>
>>108544184
https://github.com/ggml-org/llama.cpp/pull/21418
>>
>>
>>
>>108544207
Yeah, using draft speculation completely disables multi-modal support.
I'm using the E4B at Q5_K and getting 40+% acceptance rate with draft-n=8. I downloaded but haven't tested the E2B or higher quants, at some point I'll have Gemma write a benchmark script but I'm getting 50t/s and I'd rather have sex with my wife now that she remembers she loves me.
>>
File: 1753991770925281.png (307.9 KB)
307.9 KB PNG
>24gb vram
>need to drop down to comparatively retarded 26b in order to get worthwhile context size
>>
>>
Has anyone solved the problem of making 3D models animate according to either LLM or TTS output?
For a while I've been using PantoMatrix EMAGE, but it's not performant enough for my liking, the quality is questionable, and it's inherently wasteful because only the upper body (minus the face) is useful, but it always processes the full body and face.
I think I've been over-complicating things desu. For things like lip syncing I've moved from using local models to simple fast-fourier transforms. I wonder how Neuro-Sama works, and what specific animation system they utilize. Help.
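The FFT trick can be as dumb as mapping per-frame energy in a rough speech band to mouth openness; a sketch of that idea with numpy (band edges, frame size and smoothing are made-up numbers to tune, not anything canonical):

import numpy as np

def mouth_openness(samples, sr=16000, frame_ms=20, band=(200.0, 3000.0)):
    # Per-frame 0..1 "jaw open" values from raw mono audio.
    frame = int(sr * frame_ms / 1000)
    values = []
    for start in range(0, len(samples) - frame, frame):
        chunk = samples[start:start + frame] * np.hanning(frame)
        spectrum = np.abs(np.fft.rfft(chunk))
        freqs = np.fft.rfftfreq(frame, 1.0 / sr)
        mask = (freqs >= band[0]) & (freqs <= band[1])
        values.append(spectrum[mask].sum())
    values = np.asarray(values)
    if values.size == 0:
        return values
    if values.max() > 0:
        values = values / values.max()
    # light smoothing so the jaw doesn't jitter every frame
    return np.convolve(values, np.ones(3) / 3, mode="same")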
>>
>>
>>108544238
>in order to get worthwhile context size
did you go for Q8 KV? Now it's virtually lossless with the rotation shit
https://github.com/ggml-org/llama.cpp/pulls
also, add -np 1 -kvu flags to decrease the vram usage even more
>>
>>
>>
>>108544238
>>108544252
oops sorry wrong link
https://github.com/ggml-org/llama.cpp/pull/21513
>>
>>
>>
>>
>>108544256
In the case of using the moe model, are you basically offloading all of it? I can't imagine loading the whole fucking 26b on top of the 31b for drafting works out well. How many layers are you offloading
>>
>>
>>108544252
>>108544254
No, Anon insists on using Q8 quants and full context for maximum placebo.
>>
>>
>>
File: 1749138577301281.png (149 KB)
149 KB PNG
>>108544282
I wonder why the biggest model can't handle audio, that's a shame...
>>
When is this pr gonna get merged ffs.
https://github.com/ggml-org/llama.cpp/pull/21513
I'm tired of gemma not being able to make sense of body positions. If a bitch is sitting on my lap at the theater, her boobs actually WON'T press against my chest, gemma.
>>
>>
>>108544304
dude, just put your repo in that PR mode >>108541288
>>
>>
>>
>>
File: Did_I_wrong.jpg (125.6 KB)
125.6 KB JPG
Do samplers even do something with gemma? Every swipe has only minor variations.
>>
File: attach.jpg (32.2 KB)
32.2 KB JPG
How do I send a file to SillyTavern for Gemma4? The model doesn't seem to react to the images. I got chat completion working.
>>
>>
>>
File: softcap.png (246.5 KB)
246.5 KB PNG
>>108544318
I must yet again post this image.
>>
>>
>>
File: 1744044458227740.png (42.5 KB)
42.5 KB PNG
>>108544320
yes, that's "attach a file", and it only works on chat completion, did you load the mmproj file?
>>
>>
>>
File: firefox_Xk455kuNMn.png (70 KB)
70 KB PNG
So about usefulness of reasoning...
50k+ total tokens so far, it generates 5k reasoning per answer now, on question 29; I had to manually wrangle it out of falsely assuming it's from live action, and it still hasn't guessed the character.
>>
File: mmproj.jpg (174.4 KB)
174.4 KB JPG
>>108544334
No. Where can I download it?
>>
File: 1766061860823911.png (53.5 KB)
53.5 KB PNG
>>108544318
>Do samplers even do something with gemma?
they do, but you have to put min_p = 0 (or else it defaults to 0.05), basically, everything must be turned off except temperature
>(Chat Completion), API Connections -> Additional parameters
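If you're hitting llama-server's OpenAI-compatible endpoint directly instead of going through ST, the same overrides just go in the request body; a rough sketch, assuming your backend accepts min_p/top_k as extra non-OpenAI sampling fields the way llama-server does:

import requests

payload = {
    "model": "gemma",  # llama-server doesn't care, name is arbitrary
    "messages": [{"role": "user", "content": "test"}],
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": 0,     # disabled
    "min_p": 0.0,   # override the 0.05 default
    "max_tokens": 256,
}
r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload, timeout=120)
print(r.json()["choices"][0]["message"]["content"])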
>>
>>108544318
>>108544329
--override-kv gemma4.final_logit_softcapping=float:25 or paste the part after --override-kv into the override kv field of kobold's gui
>>
>>
>>
>>
>>108544351
>Where can I download it?
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF/blob/main/mmproj-google_gemma-4-31B-it-bf16.gguf
>>
>>108544359
>>108544362
Don't use bf16. Use f16.
>>
>>
>>
>>108544359
>>108544362
Thanks bros.
>>
>>108544355
I mean that the main use case they're targeting with the edge models' audio support is using them as a voice assistant. Having them able to process audio prompts is a nice bonus, but I can see why it wouldn't be enough to justify training the whole model on.
>>
>>
>>
>>108544270
>>108544281
dense goes to GPU 0, MoE goes to GPU 1, all layers offloaded.
>>108544275
I was surprised too. e4b was full weight, I figured quant would only lower acceptance rates further. I'll try now with a higher quant of big model to see.
>>
>>
>>
>>
>>108544317
For some reason it doesn't make a difference for me.
>-c 30000 -t 24 -tb 24 --no-warmup -ngl 59 --jinja -np 1 -b 512 -ub 512 --kv-offload -ctk q8_0 -ctv q8_0 --reasoning off -kvu
That's the max I set everything.
>>
>>
>>
>>108544428
Okay so that other anon was shitposting, this actually makes sense. I got curious how you went about it since I'm planning to build a new rig soon and speeding up the 31b to near what I currently get on the 26b does seem appealing
>>
File: 1762088859976737.png (174.7 KB)
174.7 KB PNG
>>108544448
shut the fuck up Chang, you lost, bugs will never dominate the AI space
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108544460
Does this mean chinese models will also train on english literature instead of strictly STEM? Because I'd be all for that. Right now gemma is the only one that isn't completely ass at it and can also follow directions to not write in certain ways
>>
>>
>>
>>
>>108544202
>Does that kill reasoning
You are supposed to disable it for coding agents...
Anyway, here is the thing mentioned for opencode
https://github.com/anomalyco/opencode/issues/20995#issuecomment-4190477354
>>
>>
>>
>>
>>
>>
>>
>>108544499
I'm not sure what the joke would be. Having only 1x 6000 PRO puts my rig solidly in the midrange.
>>108544500
Yeah, it looks like it'll be pretty close either way. I'm expecting to run the MoE at Q5 or lower just to lower the draft overhead since I've only got the one card.
I'll post numbers when the 26B finishes downloading... I've only got 3.5MB/s down...
>>
>>108544493
I don't know who donny is but I'm hopeful you'll put in my request for models capable of english prose to your boss so when I get tired of the only one I can use there might be another worth using
Maybe you're out of your element in understanding who uses your models for what
>>
>>
>>
>>
>>108544549
There was the retard talking about unpacking kobold as if they couldn't just clone and run a make command that takes a few minutes
Meanwhile I have anti-slop and attention rotation for swa ahead of concedo experimental and if it gets merged in, I can just undo it
>>
>>
>>
>>
>>
>>
>>108544584
lmao, you can use vpngate free servers, or https://gelbooru.com/index.php?page=post&s=list&tags=rushichi
>>
>>
>>
>>
>>
>>
File: Screenshot 2026-04-06 at 20-50-38 SillyTavern.png (129.9 KB)
129.9 KB PNG
>>108544649
Yes, she is.
>>
>>
>>
File: 2026-04-06-195625_828x816_scrot.png (159.1 KB)
159.1 KB PNG
>>108544675
>>108544649
I've regened multiple times and now it always says cunt + honey
>>
File: 2026-04-06-195923_814x173_scrot.png (29.4 KB)
29.4 KB PNG
>>108544675
>>
>>
>>
>>108544705
>>108544716
new benchmark just dropped?
>>
>>
File: 1756464003692290.png (851.2 KB)
851.2 KB PNG
>>108544681
>>
>>
>>
>>
>>
>>
File: 2026-04-06-200831_800x308_scrot.png (39 KB)
39 KB PNG
>>108544760
Yes
>>
>>
>>108544763
>>108544764
Neat. Thanks.
>>
>>
>>
>>
>>108544473
im api cucking and have not noticed those, but i do feel sloppiness as soon as it gets to anything sexual
it would be nice if they tuned it on vns and lns to get it a bit away from the slop of the logs i assume they are using
guess it was probably hard enough to smuggle it through the censors already as is though
>>
>>108544773
In my experience, Gemma 4 will often use specific words and phrases directly from the system prompt and character card (which is also treated as system prompt) in the chat. For example, if you say a character is 'voluptuous', you can bet that when other characters meet that one, they will describe them using the exact same word. So I'd say it skews things pretty hard.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: Screenshot 2026-04-06 at 21-24-04 SillyTavern.png (309.3 KB)
309.3 KB PNG
Almost choked on a piece of chocolate. This shit caught me off guard.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108544882
go back >>108537473
>>
>>108544882
>Now all of fags do is talk about your ERP sessions
There's a vibecoding thread if that gets your rocks off. They just posted this, for example: >>108544393
>>
>>
>>
>>
>>
>>
>>
>>108544899
I'm not against RP in principle. I do it too. It's just getting incredibly boring seeing anons gawk at the outputs instead of actually doing something interesting with them.
I preferred when people were talking about full-stack AI stuff. TTS engines, RAG/embedding models, 3D character animation, ASR, computer vision, home automation, robotics, etc. It's a local models general, not an ERP LLM general.
>>108544891
>>108544895
I've been in this general consistently for a year. But desu that one seems cool too.
>>
>>
>>
>>108544898
Yeah those were comfier times. Everyone's just sedated now from all the cooming.
>>108544913
I'm not against ERP. It's just too much though.
>>
>gemma 4 has the whole llama.cpp brigade assemble to spend days implementing every obscure meme tech the model uses
>meanwhile a year later, MTP is still completely nonfunctional and ignored despite a whole bunch of models making use of it across several vendors
really makes you think
>>
>>
>>
>>
>>
>>108544930
>Mine is fast but it sounds pretty bad.
Right now I have to use kokoro because I don't have any vram to spare, but I tried using https://github.com/RobViren/kvoicewalk
to get a more unique voice and it "kinda" works.
>>
>>
>>
>>108544942
I tried optimizing Qwen3 TTS for CPU about a week ago. The voice (cloning) quality is excellent, but the architecture is an absolute BITCH to work with. Regardless, I got it running at real-time speed, but it's basically unusable because of the decoder implementation, which effectively prevents audio streaming. Decoding small chunks at a time massively increases the wall time and substantially decreases the output quality. Really bummed about it.
I'm determined to get a high quality voice cloning TTS implementation working, but so far I haven't been very successful
>>
>>
>>
>>
>>
File: file.png (162.6 KB)
162.6 KB PNG
aight unslop, i kneel
31b on 3060, 15-16t/s tg
~/TND/llama.cpp/build/bin/llama-server --model ~/TND/AI/gemma-4-31B-it-UD-IQ2_M.gguf -c 8192 -ngl 100 -fa on -np 1 --swa-checkpoints 0 -b 128 -ub 128 -ctk q4_0 -ctv q4_0 -sm none --no-host -t 6 --temp 1.0 --top-k 64 --top-p 0.95 --no-mmap
pretty coherent..
>inb4 just run 26b and offload to ram
already did with Q8_0 (got 23t/s), but 31b.... dense...
>>
>>108544962
>Decoding small chunks at a time massively increases the wall time and substantially decreases the output quality
It depends on the architecture, but usually that alone shouldn't decrease the output quality if you have a good segmentation strategy.
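By a segmentation strategy I mean something as simple as cutting on sentence boundaries and feeding the decoder self-contained chunks instead of arbitrary token windows; a naive Python sketch (synthesize is a stand-in for whatever text-to-audio call you're wrapping):

import re

def split_sentences(text, max_chars=200):
    # Greedy sentence-boundary packing so each chunk ends at a clean cut point.
    parts, buf = [], ""
    for sent in re.split(r"(?<=[.!?])\s+", text.strip()):
        if buf and len(buf) + len(sent) + 1 > max_chars:
            parts.append(buf)
            buf = sent
        else:
            buf = f"{buf} {sent}".strip()
    if buf:
        parts.append(buf)
    return parts

def stream_tts(text, synthesize):
    # Yield audio chunk by chunk so playback can start before the full text is decoded.
    for chunk in split_sentences(text):
        yield synthesize(chunk)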
>>
File: 1772869034834159.png (468.8 KB)
468.8 KB PNG
>>108545006
>TND
>>
File: 2026-04-06-210604_901x471_scrot.png (442.3 KB)
442.3 KB PNG
>>108545006
>IQ2_M
>-ctk q4_0 -ctv q4_0
Mamma Mia!!
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: file_00000000e4786230bfb19e7934048dfe.png (1.7 MB)
1.7 MB PNG
>>108543331
> seeing that main prompt again
Witnessed.
>>
>>108545061
>where is v4
https://huggingface.co/google/gemma-4-31B-it
>>
File: file.png (50.8 KB)
50.8 KB PNG
>>108545095
>>inb4 just run 26b and offload to ram
>already did with Q8_0 (got 23t/s), but 31b.... dense...
>>
>>
>>
>>
>>
>>108545114
it not being badly damaged was the point of my post
>>108545115
ive been using the moe for a few days now, and i got bored
i know i dont NEED to run it at q8, but might as well, fp16 cache too, 260k context no problem
>>
>>
>>108544915
>I've been in this general consistently for a year.
kek. every AI oldfag is a coomer because AI was useful for cooming way before it was useful for anything productive, the original userbase of lmg was runoff from aicg/aids
>>
>>
>>108545124
>it not being badly damaged
At Q2_M with KV=Q4 it definitely is, it's not anywhere near full performance. 26B at a sane quant with KV unquanted, or at least at Q8, would mog it. The two Gemmas really aren't that far apart to begin with.
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 2nd+Gen+Tesla+Robot-3079669585.png (2.6 MB)
2.6 MB PNG
Why did Tesla design their optimus robot with the hip motors in the wrong location? Are they retarded?
>>
>>
>>108545171
They won't. China can't do anything but steal logs from SOTA models trying to artificially graft performance onto their pointless oversized MoE models. They do not have an answer now that Google has shown what is possible with a proper handcrafted dense model.
The silence over in China is deafening.
>>
>>
>>
File: Screenshot 2026-04-06 194459.png (1014.5 KB)
1014.5 KB PNG
>>108543856
>I would love a big MoE Gemma.
Never forget what they took from us...
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1754911329910948.jpg (106 KB)
106 KB JPG
>>108545337
>>
File: Screenshot 2026-04-06 201402.png (434.4 KB)
434.4 KB PNG
>>108545211
If it followed the same pattern as Qwen, it would have been a tiny intelligence upgrade (maybe - even this is comparing it to a 27B versus a 34B) for a massive VRAM increase
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: for the mirailand.jpg (198.9 KB)
198.9 KB JPG
>>
>>108545413
I didn't
https://arxiv.org/abs/2603.29418v1
https://github.com/NotSooShariff/adversarial-vision
>>
>>
>>
>>
>>
>>
>>
>>
File: 164471.png (2.6 KB)
2.6 KB PNG
geg
>>
>>108545401
it's not quite the same as some of the more "cooperative" mistral finetunes, but it is a lot smarter, and more interesting to interact with than anything else so far. finetunes are going to be amazing when they start popping up.
>>
>>
>>
>>
File: 1766468549462079.gif (3.9 MB)
3.9 MB GIF
I love gemma
>>
>>
>>
>>
>>
>>
>>
>>
>>108545447
FUCK YOU. IT DIDN'T HAVE FOUR LEGS EVERYONE COULD SEE THAT IF THE MODELS WERE INTELLIGENT THEY'D KNOW IMMEDIATELY TO SAY THAT THE DOG DEFINITELY HAD MORE THAN FOUR LEGS AND YOU SHOULD CHECK YOUR EYES BEFORE I GOUGE THEM OUT AND
>>
With all the hype of Gemma, I must know for the people who have tried it, how does it compare to the 1T parameter monsters like Kimi 2.5 and GLM 5 in RP? Is it even remotely close? Because you all give off the impression that it's the best thing since sliced bread and that it could beat out SOTA Chinese models.
>>
File: 1655541638536.gif (1.9 MB)
1.9 MB GIF
>>108545502
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: sorry.png (385.4 KB)
385.4 KB PNG
>>108545502
>>
>>
>>
>>
>>
>>108545512
Gemma's prose is better, but at long context the 1T models keep details together more coherently as you'd expect them to.
Dipsy's in-character <think> is incredible though, and I don't see it ever being fully replaced until we get another model close to that level of coherence whose internal monologue adds to the RP so that thinking tokens aren't just wasted space.
>>
>>
File: 1773804535754245.png (6.6 KB)
6.6 KB PNG
How do I make sillytavern understand that it's gemma? I'm using OpenAI compatible chat completion
>>
>>108545512
the only answer, as always, is to try it yourself
to me, it's certainly "remotely close" which is impressive enough in itself, but it's a step behind in terms of overall quality. I would put it about as good as something like minimax 2.5 and behind the big guys
still a great model, not local sota
>>
>>108545576
Drafting generates several tokens in a row with a smaller, faster model, then passes them through the larger model all at the same time. It then looks at the probabilities from the larger model and truncates the sequence where the tokens become too improbable.
That lets the larger model run at a significant portion of preprocessing speed minus the runtime of the smaller model, depending on how often the smaller model is right.
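In rough Python, the greedy variant of that loop looks like this (model interfaces are made up for illustration; real implementations, llama.cpp's included, do probabilistic accept/reject against the full distributions rather than exact argmax matching):

def speculative_step(draft_model, target_model, context, k=8):
    # 1. Draft k tokens cheaply with the small model, one at a time.
    drafted, tokens = [], list(context)
    for _ in range(k):
        t = draft_model.next_token(tokens)   # hypothetical greedy API
        drafted.append(t)
        tokens.append(t)

    # 2. Score context + drafted tokens with the big model in ONE forward pass,
    #    getting its preferred token at each drafted position.
    target = target_model.verify(context, drafted)  # hypothetical API

    # 3. Keep the longest agreeing prefix; the first disagreement is replaced
    #    by the big model's own pick, so the output matches what it would
    #    have produced alone.
    accepted = []
    for d, t in zip(drafted, target):
        accepted.append(t)
        if d != t:
            break
    return context + accepted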
>>
>>
File: 1767796375605183.gif (3.2 MB)
3.2 MB GIF
>>108545542
>the fuck is wrong with you? you really gonna buy an anthropomorphic robot to fold your clothes and make your bed?
Yes, it would be pretty nice.
>>
>>
>>
>>
>>
>>
>>
File: images(1).jpg (8.8 KB)
8.8 KB JPG
>>108545609
Liar...
>>
>>
>>
>>
>>
>>
>>
>>108545645
The output is 100% guaranteed to be the same tokens, because if they are different the draft is discarded. The worst case scenario (0% guessed right) just means you get the same result you would have without a draft, but slower because you waste time checking. The higher the probability of correct guesses, the fewer tokens you are forced to discard and the more speedup potential there is.
>>
>>108545645
No, MoE has different weights that get loaded in depending on the context. They really aren't that similar, except in being faster than a dense model alone I suppose. MoEs are faster because they only need to infer across a small selection of the total parameters.
>>
>>
>>108545649
Uh...
Does this help?
https://lmstudio.ai/docs/typescript/llm-prediction/parameters#set-load-parameters-with-load
>>
>>
>>108545649
>>108545665 (cont)
If not, try here
https://lmstudio.ai/docs/app/modelyaml#metadataoverrides
>>
>>108545512
>>108545648
>>108545588
You can tell who's actually used them >>108545581 >>108545596
and who's poor and seething.
>>
if nothing else, gemma 31b feels like the first small model to beat the llama2-70b models
whenever I tried stuff like the qwens or mistral models around that size, they felt worse than what we had back then but gemma is clearly better than those
i'd almost take it over mistral large
>>
>>
>>108545588
>>108545581
>>108545512
take a minute and appreciate that we have, if not frontier-model SOTA at home, a 31B model that exceeds the original GPT4
>>
>>
>>
>>108545688
Regardless, the bar has irreversibly been pushed so much higher now and every non-frontier model is going to have to get their shit together if they still want to compete. Even for people who don't like or don't use Gemma, it's still an objective win for local.
>>
>>
>>
>>108545708
>>108545711
1b higher than what'd fit on a Blackwell at Q4 is the Jensen sweetspot.
>>
File: highestnumber.jpg (17.1 KB)
17.1 KB JPG
>>108545711
122b is the highest number
>>
>>108545654
Something that made draft models not seem so worth it to me is that, if your small fast model is getting a good amount of tokens correct for significant speedups, is the big model worth using for that application? Doesn't that mean your task has obvious results that a <7B model can come to reliably, or the model(s) you're using are so fried like Gemma instruct that it's hitting 99% confidence all the time?
>>
>>
>>
>>
>>108545665
>>108545678
Thanks I think that might help
>>
>>
>>
File: Screenshot_2026-04-06_23-26-31.png (572.3 KB)
572.3 KB PNG
I mean seriously, look at her go.
GLM-4.6 failed this test completely, even after hints.
>>
>>
>>
>>
>>
>>
File: can-you-fuck-it.gif (2.5 MB)
2.5 MB GIF
>>108545530
>>
>>
>>108545726
>7B
The only real usecase I found was phonesloppa micromodels to make your big model think less on grammar between the actual decision points in how a sentence is structured.
For a dense model it's shit because that's vram space that should be giving you a larger context, but for a 1T giant it's okay at pushing your t/s a bit higher for the cost of half a GB of VRAM. Any Qwenlet works for any large chink model because they're all Claude/GPT distills at the end of the day.
>>
>>
>>
>>
>>
File: r--Blog-Header.png (2.3 MB)
2.3 MB PNG
>>108545708
>>
>>
>>
So now that Gemma is the new hotness, and Bonsai is supported in main llama.cpp, it'd be interesting to see if they make a Gemma bonsai. The compression ratio possibly won't be as great, since it's likely that Gemma's parameters are even more information-saturated.
>>
>>
File: e42ce3cd-b63b-4408-896c-2c01e4b7fe1b.webm (908.6 KB)
908.6 KB WEBM
>>108545778
>i have to go to school anon.. maybe later
>>
>>
>>
>>108545739
very cool, google has always been really really good at multilingual. I know lots of foreign language users leaned on gemma 3 well past its expiration date because it was still the best in their languages
>>
>>108545726
I mean, think about the most common use case: coding. A LOT of code editing is going to be copy/pasting existing stuff somewhere, but also making key decisions about how and when and where to do so. The draft model may easily identify the copy/pasted tokens while in the middle of a block, but fail spectacularly on the few semantically important tokens that determine the strategy it's using. In cases like that you get a lot of speedup but still need the smarts of the big guy.
For just general language tasks a similar principle applies. Finishing a phrase or word that's already half written, closing punctuation, etc. are all very simple tasks that small models won't often struggle with. Language is pretty well-structured and most of it is low entropy, and you have the big model to ensure those high entropy tokens get predicted correctly.
>>
File: token probs tab.png (121.9 KB)
121.9 KB PNG
>>108545764
are you checking the tab
>>
>>
>>
>>
File: llama_probs.png (4.1 KB)
4.1 KB PNG
>>108545764
>>
>>
>>108545789
People need to stop wasting time trying to make q1 quantization look good on benchmarks. It needs to be natively trained in ternary. People would think MoE was a dead-end too if the only kind constantly put out was frankenmoes.
>>
>>
File: channel.png (22.2 KB)
22.2 KB PNG
>>108545810
Oh yeah. Odd that if I use a <|think|> system prompt it formats the channel wrong but if I enable request reasoning from model it formats it right
>>
>>
>>
>>
>>
>>
>>108545839
see >>108531320
>>108545840
Gemma doesn't need any special settings, you can follow the official samplers on the model page. Personally I just use temp=1, minP=0.02
If you want more variety then you can change your logit softcap.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108545876
quantize retard
>>108545873
?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: Tetosday.png (869 KB)
869 KB PNG
>>108545906
>>108545906
>>108545906
>>
>>108545902
>>108545894
>>108545892
>>108545891
>>108545889
>>108545883
>>108545880
>>108545878
>>108545877
stop it, i've asked seriously
>>
>>
>>108545912
>create software
>all of your 'peers' are vibecoders who just break shit and push half-working features that you later have to fix
>have to wait for other people to approve your work for your software
open source was a mistake
>>
>>
>>
>>108545930
>all of your 'peers' just break shit and push half-working features that you later have to fix
this has always been the case, vibecoding just greatly increases the number of peers you have the misfortune of interacting with
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1773873674462429.jpg (133.1 KB)
133.1 KB JPG
>>108545923
Indeed.