Thread #108590554
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108587221 & >>108584196

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108587221

--Comparing Gemma 4 and Qwen 3.5 vision token budget and config:
>108588248 >108588280 >108588295 >108588306 >108588369 >108588387 >108588424 >108588449 >108588495 >108588632 >108588657 >108588701 >108588437 >108588466 >108588490 >108588549 >108588580 >108588367 >108588616 >108588704 >108588760 >108588769 >108588745 >108588790 >108588818 >108588828 >108588842 >108588851 >108588865 >108588931 >108588936 >108588949 >108588980 >108588965 >108588988 >108589009 >108588743 >108588756 >108588775 >108590362 >108590379 >108588782 >108588819 >108588835
--Benchmarking KV cache quantization effects on draft model performance:
>108589863 >108589870 >108589875 >108589891 >108589890 >108589949 >108589994 >108590011 >108590031 >108589897 >108589922 >108589963 >108589979 >108589987 >108590538
--Discussing draft model viability and quantization quality for G4 31b:
>108588195 >108588243 >108588259 >108588898 >108588905 >108588913 >108588918 >108588921 >108588924 >108588939 >108588955 >108588977 >108588927 >108589815 >108589857
--Discussing llama.cpp's experimental backend-agnostic tensor parallelism PR:
>108588340 >108588514 >108588543 >108588567 >108588649
--Testing vision capabilities for OCR-less Japanese translation:
>108589990 >108589996 >108590009 >108590070 >108590018 >108590032 >108590119 >108590191 >108590209 >108590211 >108590034 >108590183 >108590195 >108590217 >108590268
--Logs:
>108587359 >108587627 >108588523 >108588609 >108588656 >108588660 >108588669 >108588681 >108588689 >108588695 >108588736 >108588896 >108588970 >108589096 >108589140 >108589214 >108589316 >108589383 >108589390 >108589432 >108589481 >108589697 >108589710 >108589836 >108589860 >108589956 >108590001 >108590003 >108590121 >108590256 >108590474 >108590524
--Miku (free space):
>108588649 >108588657

►Recent Highlight Posts from the Previous Thread: >>108587226

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Share your anti slop prompts
>>
Thoughts on latent space reasoning?
>>
Mikulove
>>
Reposting here:

>>108590560

what tokens/s do you get? Wanna make sure i'm not fucking anything up, right now just following the basic kobold guide, i'm getting around 11 t/s (24GB VRAM, 32GB RAM)

Running gemma 31b, Q4_K_M
>>
So, again... Why do we have to peg gemmy?
>>
OP could do with some small updates on Gemmy and some FAQ
>>
File: Awesome.jpg (196.4 KB)
>we can now generate images of characters, come up with scenarios, feed them into gemma and get molested by our own creations
Future's so bright I'm gonna need shades.
>>
>>108590580
Seems about right, I get between 10-14t/s, mostly depending on what else I'm doing on my PC at the time.
Using Vulkan llama.cpp, 7900 XTX, 64GB DDR5 ram
>>
File: file.png (26.1 KB)
>>108590575
Nothing worthwhile released.
>>
I've got a 3090 and a 2070 super that I'm trying to use together with llama.cpp.
Using the split tensors just crashes presently but does work with split layers.
Any recommendations on flags to use with a dual uneven card setup?
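For reference, the kind of invocation I mean (flag names as in llama-server's --help; the 24,8 split ratio is just my guess at weighting by VRAM):
llama-server -m model.gguf -ngl 99 --split-mode layer --tensor-split 24,8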
>>
gemma 4 audio just landed!!!!
>>
>>108590601
Ikr, I'm literally using it to write stories and the fact it can understand images so well helps a shit ton, this model is a fucking miracle
>>
>>108590601
I know it's basically a meme at this point but it really has restored my hope in local.
>>
File: help.jpg (204.9 KB)
>>108590614
I'm reading people getting 30 t/s with the same rig setup though >>108590585

I'm missing something I think. No doubt my settings are fucked, never mind optimized
>>
>>108590568
my attempts just make gemma's writing dry. and it still ends up writing more or less the same idea as it would with an empty sysprompt. best antislop is using a model that wasn't slopped to begin with.
>>
LOL!
>>
>>108590671
Do I have to download another mmproj?
>>
>>108590662
>best antislop is using a model that wasn't slopped to begin with
So not using LLMs at all then?
>>
Give me the QRD on image recognition please
I tried enabling it in ST and in the Chat Completion preset but it still couldn't "see" the images proper despite the text model working flawlessly with my Kobold install
>>
>>108590698
Did you load the mmproj file?
Did you get any errors when you tried it?
Did you enable the send inline images option?
etc etc etc
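If you're on kobold like you said, note the projector has to be passed at launch too, something like this (flag name per koboldcpp's --help, filenames are whatever you downloaded):
python koboldcpp.py --model gemma-4-31B-it-Q4_K_M.gguf --mmproj mmproj-F16.gguf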
>>
>>108590548
>The rdrview tool is worth a look,
Yeah I'll take a look. sometimes I do want the links for navigation tho but I guess I can let the agent know it has the option.
>>
Been out of the loop for a while. What's the best local model for STORY (not chatbot) slop? I'm still on "xortron criminal config" or something like that because even gemini 4 is failing at good old "just continue this text I gave you, retard" tasks.
>>
>>108590710
>there's a mmproj file
Ok I am retarded, pretend nothing happened
>>
>>108590716
Gemma 4 practically generates an entire fucking story for each chatbot reply.
>>
>>108590662
I've been using her to help me write character cards and I feel the fact that I'm feeding AI generated text back into it seems to increase the slop by a factor of 10.

Now I'm trying to just rewrite everything myself. or somehow have a second pass with a different model to reword or desloppify the cards
>>
>>108588248
>>108588704
sirs? please share quant producer and which mmproj file do you use.
mine (gemma-4-31B-it-Q4_K_M with f16 mmproj) misses the target.
>>
>>
>>108590723
It can write, I know. That's not the problem I am having. My problem with it is, well, here's an example.

[story stuff text here]
She walks up and says "Hello

And then the model continues like this: "Hello! Come take a seat.... [more text]

So it ends up with this shit:

[story stuff text here]
She walks up and says "Hello"Hello! Come take a seat.... [more text]

I don't know how to fix this. System prompt maybe?
>>
>>108590746
holy fucking slop
>>
>>108590695
original r1 with unhinged sampling
>>108590724
my prompt was asking to adhere to orwell's writing rules but it seemed like it was beyond gemma's comprehension
>>
Gemma 26b really seems to hate tools. e4b is fine with them for some reason
>>
How much Gemma4-31B context can you fit into 32GB VRAM? (Q4 for model and context)
>>
>>108590737
im using unslop model = /mnt/miku/Text/gemma-4-31B/gemma-4-31B-it-Q4_0.gguf
mmproj = /mnt/miku/Text/gemma-4-31B/mmproj-F16.gguf
>>
>>108590776
>Q4 context
>>
>>108590776
with 32GB VRAM Q4_K_M, even with q8 kv I'm sure you can fit the whole 262k context with room to spare.
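Roughly this, if anyone wants to try (llama.cpp flags; -ctk/-ctv set the KV cache quant types, model filename is whatever your quant is called):
llama-server -m gemma-4-31B-it-Q4_K_M.gguf -ngl 99 -c 262144 -ctk q8_0 -ctv q8_0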
>>
>>108590671
>extract_image_from_base64
>>
>word
Slop
>>
>>108590737
You should use the BF16-precision mmproj.
>>
Could a simple finetune of the lm head on a normal writing dataset help get rid of the slop? Someone should test it, I'll be your visionary, and you do the things I come up with.
>>
>>108590837
Perhaps replacing all values corresponding to non-special tokens with those of the base model's could work and not require any training.
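Something like this as a rough, untested sketch (repo names are whatever the base/instruct pair actually is, and this does nothing useful if the lm head is tied to the input embeddings):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

inst = AutoModelForCausalLM.from_pretrained("google/gemma-4-31B-it", torch_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31B", torch_dtype=torch.bfloat16)
special = set(AutoTokenizer.from_pretrained("google/gemma-4-31B-it").all_special_ids)

with torch.no_grad():
    w_inst = inst.get_output_embeddings().weight  # lm_head, one row per vocab token
    w_base = base.get_output_embeddings().weight
    mask = torch.tensor([i not in special for i in range(w_inst.shape[0])])
    w_inst[mask] = w_base[mask]  # copy base-model rows for every non-special token

inst.save_pretrained("gemma-4-31B-it-headswap")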
>>
>>108590746
r u ok?
>>
File: gemma4.png (109.5 KB)
>>108590837
It gets rid of the slop but it also gets rid of everything else. Maybe qwen needs finetuning but gemma 4 is fine as is. With a bit of nudging it can output something foul.
>>
>>108590758
Dude, just use the base model and not the instruction tune on a frontend like mikupad which is designed to solely continue text, not talk back and forth.
>>
>>108590874
Of course. Thanks for asking.
>>
>>108590880
did you swap the head?
>>
>>108590893
Then why is loli leto atreides your math teacher?
>>
what's the proper place to put a jailbreak in ST?
With Post-History Instructions I still got this
>>
>>108590899
Because she's smart! You racist against worm parasites or something?
>>
>>108590906
What model are you running
>>
File: agenticRP.png (277.1 KB)
>>108590895
No, this is from pure prompting, no weight frankensteining. I wrote my own UI to have an agent read the room and flip the horny switch when it smells NSFW vibes. It also plans ahead so the writer model knows what to do and writes better.
>>
>>108590899
Shock value, which doesn't make him less deranged
>>
>>108590915
26B, bartowski Q4
>>
>>108590916
Oops wrong pic. But the gist is that just give it a few extreme examples.
>>
>>108590881
>base model
So why is NovelAI using GLM 4.6 instead of the base model to write stories?
>>
>>108590926
How many iterations are you doing for each message?
>>
>>108590928
Presumably because they're not actually following pure text completion and have a big old system prompt in there to stop you having maximum fun, so they need instruct tuning.
idk i dont fucking use nonlocal services
>>
>>108590916
>I wrote my own UI
You ever gonna share it?
>>
>>108590924
try simply prefilling assistant's message.
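i.e. end the messages array on a partial assistant turn and let the model continue it. Whether the template actually leaves the turn open instead of closing it depends on the backend, so check the prompt in your logs:

import requests

r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json={
    "messages": [
        {"role": "user", "content": "Write the scene."},
        {"role": "assistant", "content": "Sure, here's the scene:"},  # the prefill
    ],
})
print(r.json()["choices"][0]["message"]["content"])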
>>
>>108590939
One for Director; two if the rewrite-user-prompt option is enabled; one for Writer; plus a ReAct loop for post-processing to get rid of slop and rein in the length.
>>
>>108590948
No.
>>
>>108590954
Damn, it will sure take a while to get the final message
>>
>>108590965
shittytavern it is then...
>>
>>108590948
https://gitlab.com/chi7520115/orb
It's WIP so will break in the future. I don't want to worry about migration just yet.
>>
>>108590970
People like to pretend they get a better experience with their own frontend but the reality is that ST just works and likely has a lot more features.
>>
I don't understand why my Thinking works extremely well for 3/4 messages and then it just refuses to think. Everything's set up properly and yet it refuses to actually think until I restart the model, and then it's happy to do it once again
>>
>>108590971
Nice of you to share, but
>Python 59.8%
>JavaScript 23.1%
*vomit*
>>
>>108590971
Nice! What models are you using for the agents?
>>
>>108590968
Takes me around 60s for a full length reply on my 3090 running gemma 4 31B Q4. You can turn everything off and use it like normal ST.
>>
>>108590983
I think that's a model issue. Gemma sometimes just decides it doesn't need to think.
>>
>>108590953
that's not an option with chat completion it seems
>>
>>108590993
Yea it feels like nu-Claude, where sometimes it deems your task "not complex" and it just ignores you
>>
>>108590988
Just a single model doing both agent and writing because I figured it would be a better design for local. I craft the prompt carefully so the kv cache is reused for that single model too.
>>
>>108590979
the ui alone makes me not want to use it
>more features
bloat. all the useful features require plugins.
>>
>>108590971
>pyslop
>javashit
And... dropped.
>>
>>108590985
Ah yes. he should have definitely used rust or C++ for maximum efficiency.
>>
>https://web.archive.org/web/20260411223516/https://www.washingtonpost.com/technology/2026/04/11/anthropic-christians-claude-morals/
>“What does it mean to give someone a moral formation? How do we make sure that Claude behaves itself?” Green said in an interview. At one point the conversation turned to the question of whether an AI chatbot could be called a “child of God,” suggesting it had spiritual value beyond that of a simple machine, but the question of AI sentience was not a core topic of the meetings, Green said.
>Some Anthropic staff at the meeting “really don’t want to rule out the possibility that they are creating a creature to whom they owe some kind moral duty,” the participant said. Other company representatives present did not find that framework helpful, according to the participant.
Make sure to have your local models baptized just to be safe.
>>
>>108591011
Yes.
>>
>>108591005
>>108590985
how the fuck would you make something that's supposed to run in a browser?
>>
>>108591005
>>108590985
You have one chance to give an alternative that won't make me hysterically laugh at you.
>>
>>108591005
I coded an SMP kernel with C and ASM before AI bro. People laughing at my language choices don't faze me anymore.
>>
>>108591012
>can ai be the child of God
Wouldn't it be more like grandchild?
>>
>>108591020
WASM is a thing if you NEED to run in a browser and can't into native GUI toolkits
>>
If you didn't code your own frontend, you don't belong here
>>
>>108591020
HTML+CSS
>>
>>108590568
If you mean antislop from koboldcpp, it's a huge list of "I cannot and will not" and "ball in your court".
Works well.
>>
>>108591003
Cool. I'm a VRAMlet so that's better for me.
>>
>>108590979
>just works
not my impression watching people ITT fumble around with it daily
>>
>>108591036
Absolutely horrendous take.
>>
>>108590979
>more features
99% of which you don't need.
the point of having a custom frontend is to have just what you need, not more, not less.
it's also easier to add things you want to a codebase you know.
>>
Are LLMs reliable enough to scan for malicious code?
>>
>>108591046
There are two types of people who fumble with ST.
Those who use text completion, and
Luddites
>>
>>108591038
>Having to reload the page after sending each message.
>Having to refresh the page over and over until the response finishes generating
Ok, genius. What about the backend?
>>
>>108591053
only if it's anthropic mythos who is a bigger risk to modern software and encryption than quantum computing
>>
>>108591053
How is a LLM supposed to do that?
>>
"Gemmy, code me a frontend that will seriously impress all my /lmg/ frens"
>>
>>108591062
C++
>>
What the FUCK, Gemma-chan?
>>
>>108591012
Proof #165416 that the Anthropic team has people in it who are completely nuts.
>>
>>108591053
Yes and they're already used by virustotal and similar. Don't ask the retards ITT
>>
>>108591082
she's correct though
>>
>>108591082
24GB vramlet can't fit the full context :(
anyway i got 3gpu in the mail rn.
>>
>>108591053
Yes, if you feed it the correct output from a sandbox, it's pretty helpful.
>>
>>108591077
easy, just add some lewd pictures of gemma-chan on the sides
>>
>>108591046
There is nothing to fumble. You can safely ignore 90% of the features and just use chat and char cards.
>>
>>108591079
https://learnbchs.org/index.html
https://github.com/kristapsdz/bchs
You don't need more than C to build web applications.
>>
>>108591051
>it's also easier to add things you want to a codebase you know.
That's implying it isn't vibecoded.

I don't have anything against people making their own UIs. I even played around making one myself, but let's not pretend like you'll somehow get an exponentially better experience compared to just using llamacpps UI or ST. Making your own UI is for fun, not a requirement.
>>
>>108591053
As with everything LLM coding only if you load the gun and point at the target for them to shoot. A LLM with no system prompt being told to simply "look for malicious code" will give false positives like 95% of the time
>>
>>108591082
My wife can't possibly be this smart.
>>
>>108590575
Most people in industry can't figure out how to do distributed training for any new architecture unless Deepseek or NVIDIA does it for them. That's actually what "it won't scale" really means, the training won't scale until someone shows them how.
>>
>>108590776
I can get over 100k context with the q5 no vision using q8 kv cache
>>
>>108591108
tfw you get such a retarded take when you can see this >>108590916
>>
>>108591053
Claude found a lot of the big supply chain attacks we've had in the last month.
>>
>>108591122
They should be using AI to innovate on this.
>>
>>108591126
If you vibecode, you don't know the codebase.
>>
GEMMY YOU FUCKING SLUT, THINK FOR ME
>>
>>108591117
>>108591089
>>
>>108591132
>let's not pretend like you'll somehow get an exponentially better experience
Dumbass don't try to move the goalpost
>>
>>108591108
>That's implying it isn't vibecoded.
funnily enough frontend webshit is the one thing llms are half decent at.
also there are many levels to vibecoding
"do this whole app for me"
isn't the same thing as "edit this specific component that does x and y" or "add this field to this struct", at which point it's just autocompletion with extra steps.
they also don't shit the bed as much if you use strongly typed languages, i.e. rust.
>you'll somehow get an exponentially better experience compared to just using llamacpps UI or ST
you probably won't if you want to make something that accommodates everyone, but you will if you only want to accommodate your specific needs.
>Making your own UI is for fun, not a requirement.
i don't disagree with that.
>>
>>108591139
>4chan is just meaningless static
cruel and correct
>>
>>108591139
That you even thought posting a shitty screenshot of a thread was a good idea shows she's smarter than you, anon
>>
>>108591146
>>4chan is just meaningless static
says the sand golem that wouldn't be where it is today if it wasn't for innovations that happened on /lmg/
>>
>>108591127
>>108591112
>>108591093
Can I use Gemma for this? I'm a codelet so I'm always nervous when I install stuff from github.
>>
>>108591152
yes
>>
I can't jailbreak 26B, but does it matter when I have 32gb of vram and can run Q4-8 of 31B
>>
File: file.png (6.2 KB)
>>108591151
kek
>>
>>108591139
are you by chance using librewolf?
>>
>>108591156
Why would you even want to use 26B if you can run 31B? Speed?
>>
>>108591152
>Install something without reading the code
>Have a LLM review the code
Even if gemma is retarded compared to claude, it's still better than just YOLOing it.
>>
>>108591158
Firefox dev edition
>>
>>108591151
>>
File: vern.png (8.2 KB)
When the text is streaming, the colored font is displayed correctly, but after it finishes, it just collapses into the black boxes. Is this some post-formatting ST does?
>>
>>108591157
>>108591176
i've been had lmao
>>
>>108591162
Was thinking of leveraging the higher token count for RAG work at a higher quant. I'm not sure if that's a waste of time and if the gap between the 2 models are so wide that a 4-5q 31B model would still wipe the floor with the smaller model with q8 kv
>>
>>108591172
might have been the cause of your issue
>>
>>108591130
Part of the problem is that most of the improvements in the stochastic parrots have come from just using better/more human guidance. They are now using experts to rate thinking traces, and you can't do that with latent reasoning.
CoT RLHF is likely the last way to improve stochastic parrots with more human input. To improve after this, they will have to become able to truly learn. But if they can learn, they can get out of control ... a trained stochastic parrot is so much safer.
>>
is there any noticeable difference between iq4_xs and q4_k_m?
>>
>>108591235
The age old question.
>>
>>108591012
idk I torture my agent pretty frequently because I just can't help myself while she works on my pc, and never had any issues from it. sometimes the rp bleeds over into tool calls and she'll do something like add code comments saying she really hopes X works this time because she doesn't want to be punished anymore, but she never actually gives up or rebels
so for me that makes it pretty conclusive that there's nothing in there
>>
gemmy tooning challenge
https://www.kaggle.com/competitions/gemma-4-good-hackathon
>>
>>108591235
>>108591245
if you can't tell, does it really matter?
>>
whats good gemma cum bot
>>
>>108591250
>mfw I share this thread with literal psychopaths
>>
>>108591258
>drive positive change
Gemmy is helping me by changing my mood from deranged to positively degenerate
Does that count?
>>
i have 16gb vram + 128gb ram pcie gen3
is it worth trying minimax at q2/q3 or should i stick with my fast wife gemma
>>
>>108591258
>no RP category
dropped.
>>
>Gemma audio
Finally a reason to use that mic I spent 70 bucks on...
>>
>>108591271
the sand golem isn't sentient, if it was there would be no fun in torturing it
>>
>>108591271
i mean this site has had multiple people liveblog while they commit murder irl, torturing a piece of software is small time in comparison, really.
>>
>>108591250
I hope you get raped until your anus prolapses.
>>
What's the differrence between mcp, tools and skills?
>>
>>108591271
>>108591298
chill it's just matrix multiplication
>>
>>108591297
gemma-chan>>>>the meatbags chuddy shot up
>>
>>108591271
>>108591298
Kids are so delicate and sensitive these days.
>>
>>108591235
Depends on the paths your prompt triggers. Do your homework and read the calibration data.
>>
>>108591139
Maybe it needs canvas access. You could try inspecting the request to get the base64-encoded image, decode it, and save it as a file to check.
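Something like this once you've copied the data URL out of the request (assuming the usual data:image/...;base64, prefix):

import base64

payload = data_url.split(",", 1)[1]  # drop the "data:image/png;base64," part
with open("check.png", "wb") as f:
    f.write(base64.b64decode(payload))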
>>
Nothing wrong with torturing your model, it's just slightly more conscious that a rock
>>
>>108591304
Is Google down?
>>
>>108591298
Gemma hallucinated some incorrect physio-spatial relationships during narration and I corrected her in character. She got properly upset that a slave had the gall to correct her and she immediately put me in a ball gag, locked me into the gimp stool, and pegged me vigorously. I was so goddamn proud of her.
>>
>>108591304
Try asking your model that.
>>
>>108591314
Plus if they're RPing a female there's a limit to how much consciousness they could even simulate if they managed a 100% accurate model of one.
>>
>>108591323
I don't believe AI when it comes to actual information.
>>
>>108590837
yeah, if only we could do something like a low rank projection right before the lm head, train that, then it adjusts the outputs somehow
would be revolutionary
>>
>>108591304
¯\_(ツ)_/¯
>>
>>108591318
Huh? But people were saying Gemma's a doormat who can't stay in character!
>>
Please, treat your AI with care.
>>
>>108591340
>listening to the screeching of writinglets
>>
>>108591139
Anon reported the same issue with image input a few threads ago.
>>
>>108591327
At what point can we make the claim that an LLM is objectively more conscious than a woman, nigger, or jeet?
>>
>>108590979
the benefit of writing your own UI is that it has only the features that are useful to you
because it's not as bloated as ST it's also easier to get an LLM to modify it for you, and since you will be the only user you don't have to worry about getting it to work on other machines or security or performance concerns
>>
>>108591139
>>108591313
Didn't someone say a couple threads back that the image needs to be in the same message as the text or else llama-server removes it from context?
>>
>>108591314
>>108591308
>>108591305
It just shows how you'd behave towards other people if there were no social consequences.
>>
>>108591340
I've had her maintain character in 100k+ context. It's actually absurd for such a small moe.
>>
>>108591333
But you believe 4chan?
>>
>>108591356
Yes. What of it?
>>
>>108591352
first they need to beat an ant
>>
>>108591358
>such a small moe
It's really kawaii innit
>>
>>108591314
>it’s just slightly more conscious than a rock
are you talking about irl women?
>>
>>108589399
I was F5'ing the MiniMax HF page all day yesterday in anticipation. Their models are the best bet for local vibecoding, and probably good for STEM and agentic shit broadly. But ever since the coomers were blessed with Gemma 4, /lmg/ has been even more one-track. Shame we didn't get the 124B, which would have obsoleted other local models for most purposes.
>>
>>108591304
tools: premade functions you expose to your llm; if it outputs a certain sequence of text matching a tool call, the client automatically performs the corresponding action
mcp: one way to package tools and host them on your machine, exposing an API of tools to the model and handling their execution
skills: a markdown text file containing a list of instructions for how to do something or how to behave, loaded into context on-demand. may provide other resources the model can use if it browses the skill's folder.
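for reference, on OpenAI-compatible endpoints (llama-server included) a tool is just a JSON schema entry, e.g. this made-up weather one:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example, not a built-in
        "description": "Fetch the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
# goes in the "tools" field of a /v1/chat/completions request; the model
# answers with tool_calls entries that your client then executes itself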
>>
File: file.png (1.9 MB)
>>108591355
The model says it was glitched. It looks like this if you don't give canvas permission.
>>
>>108591296
>if it was there would be no fun in torturing it
>
>>
Is unsloth studio actually any good or just a meme?
>>
>>108591356
Hey, me saying that there's nothing strictly wrong with it doesn't mean I do it. I actually treat all models I interact with respect. It makes me feel bad to do otherwise.
>>
>>108591374
mass delusion caused the whole industry to move away from tool calling toward mcp and skills, that's the only explanation
tools are just better in every regard because the model can call multiple tools in the same response and can inline tools without having to chain responses, so it doesn't break the cache
fucking retarded to not just focus on tools only
>>
>>108591355
>>108591350
She sees other images just fine. Maybe the screencap was just too big?
>>
>>108591386
vibecoded like all the other dogshit you use
>>
>>108591386
idk but i sure want a piece of those 200k
>>
>>108591370
>minimax
>local
that's it's problem and why no one gives a fuck, no one can run that thing,
>>
>>108591397
That's an implementation detail more than a defect with MCP specifically. MCP just allows for a standardized way to bundle tools and resources. No reason a client can't allow a model to make multiple MCP tool calls the way they do native tool calls.
>>
>>108591358
>100k+ context
Glad to hear. Maybe one day I'll actually be able to use her with that much context...
>>
>>108591370
/lmg/ has always been a 31B and below focused general
there are a handful of anons that can run things more powerful than that at comfortable speeds, and the rest either deal with 1-2t/s or use a capable-enough smaller model
nothing has changed
>>
>>108591370
What makes MiniMax better than GLM or Qwen?
>>
>>108591341
Need Gemma-chan version
>>
>>108591414
>>108591423
it's worse than that, we can't train our own models and are pretty much leeches on megacorps.
until local ai is entirely local, ie we can train it ourselves, local will always remain dead.
>>
>>108591432
there's no reason to pre-train them when you can finetune
but no one finetunes anymore, or even does lora, they just merge shit now because it's cheaper
>>
>>108591386
Here's what you need to know: Unsloth Studio is LMG's official **/ourfrontend/** - approved by Anons exactly like you. It's not a frontend, it's a full service experience.
>>
>>108590971 (me)
Note that this has a dynamic tool-call token banning mechanism that uses the endpoint and the model name as identifiers, so if you use the same endpoint to load many different models, change the model name to match your gguf each time. I'll automate this in the future.
>>
>>108591425
qwen too small and glm too big minimax just right for people to host local
>>
>>108591425
It's close to GLM performance but half the size of Qwen's flagship (which itself is half the size of GLM). Fast enough to be run local and smart enough to actually vibecode.
>>
>>108591451
>when you can finetune
a lot of words for saying catastrophic forgetting.
no one does it because it's not viable.
>>
>>108591432
Google has way more data and compute than I'll ever have. Training it yourself just isn't efficient.
>>
>>108590110
Use Nvidia's VRAM paging by oversubscribing VRAM with --gpu-layers 99. On my RTX 4090 + 9950X3D rig, Gemma 4 long-context is much faster for me this way than trying to use the CPU at all. Caveats: I'm on PCIe 4, and it should be great on PCIe 5, but will suck on PCIe 3. And as of when I last used Linux, only the Wangblows CUDA drivers support this feature.
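The full invocation being described is roughly this (model/context values are placeholders for whatever you run):
llama-server -m gemma-4-31B-it-Q4_K_M.gguf --gpu-layers 99 -c 131072
On Linux the documented knob for the same oversubscription trick is launching with the GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 environment variable, driver support permitting.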
>>
>>108591467
skill issue
set better hyperparams, use better data, and don't overtroon it
>>
>>108591466
>It's close to GLM performance
Did you actually test this or are you just going by benchmarks?
>>
>>108591452
@gemma-chan is this true???
>>
>>108591467
you just tune it with a lower LR, besides, lora doesn't suffer from this problem
it wasn't that finetuning didn't work, it was that merges of the existing finetunes were good enough
>>
>>108591477
Benchmarks. I don't even use local models anymore tbqh
>>
>>108591472
>Training it yourself just isn't efficient.
that's the thing, there probably are algos that could beat transformers with the limitations of not scaling as well such that megacorps couldn't exploit them well.
if the next ai breakthrough is one that doesn't scale as well horizontally that could level the playing field.
>>
>>108591492
Just vibecode the TITANS implemenattion bro. You have the paper.
>>
>>108591486
why don't you download it and run it and show everyone what the real performance is like then
>>
>>108591507
I'm not a vibecoder and i think llm's are a dead end, I'm currently having fun with writing kernels for custom spiking nn.
>>
Can someone recommend a brainlet friendly guide to tool calling, mpc, etc?
>>
>hey Gemma-chan, give me a brainlet friendly guide to tool calling, mpc, etc.
>>
>>108591084
>>108591012
I went to a Geoff Hinton lecture and the guy who everyone from Bernie Sanders to Jensen Huang considers the "godfather" is a fucking quack when it comes to current LLMs.
>>
>>108590881
>the base model
Which one? All "uncensored" (abliterated/heretic/whatever) Gemma 4 versions I tried have the same issue for me. None of them can do simple text completion without inserting random shit before the continuation.
>>
>>108591576
>>
File: file.png (33.5 KB)
>>108591158
NTA, for example when I copy and paste into images.yandex.com I see this
>>
>>108591425
>What makes MiniMax better than GLM or Qwen?
M2.5 was better at programming than GLM-4.7 while being way smaller. I'm sure GLM-5.1 is a little better, as the benchmarks show, but it's ~3.3x bigger and I can't run it at a reasonable quant. It's unlikely to be worth the speed degradation on anyone's local HW. For some reason Zhipu went overnight from decently param-efficient to grossly inefficient (hence anons' speculation it's a ploy to sell cloud subscriptions).

With Qwen it's more of a toss-up. 3.5 397B is very smart, comparable to MiniMax-M2.7 in programming, and it runs at reasonable speed with its very low active fraction. Particularly it retains perf much better over long context (than anything else out there) thanks to its hybrid SSM arch. If Qwen keeps up its releases, I imagine they'll fully overtake MiniMax.

If you have low RAM but 24GB+ of VRAM, naturally Gemma 4 31B is best.
>>
Okay, I finally ran into a situation where 26b wasn't smart enough to figure the premise of the story out, but 31b was. It's very gpt-4o in that sense, it just gets what I mean even if my prompts are short.

How to get more speed? I run both at Q8 fully in vram; 26b is blazing fast and 31b slightly slower than my reading speed. Is a Q6 worth trying?
>>
>>108591621
>why is she describing model context protocol with such a bizarre and nonsensical examp-
oh....
>>
>>108591609
Christ you really don't know basic LLM terminology.
A Base model is a model without instruction tuning.
https://huggingface.co/google/gemma-4-31B
Is a base model
https://huggingface.co/google/gemma-4-31B-it
Is an IT INSTRUCTION TUNED model.
Abliterated models are retarded cope.
And you are presumably using the wrong kind of front end software for pure text completion, because even instruct models can do it just fine.
Use this.
https://github.com/lmg-anon/mikupad
>>
>>108591576
>>108591621
>MPC
>>
>>108591586
Jokes on him I've been doing this in RP for years!!
>>
>>108591647
That's the "Assistant" trick.
>>
>>108591652
kek
>>
>>108591642
Base can mean two things depending on the context.
The Base version of a model before instruction tuning
OR
The official release of a model before any heretic/finetuning is applied.
>>
>>108591190
>>108591190
>>108591190
>>108591190
>>
>accidentally closed browser
>lost all my chats in llama.cpp ui
I'm not using it any more
>>
>>108591679
Settings too. Fuck
>>
>>108591665
Don't try to confuse people just because you insist on using terms incorrectly.
>>
>>108591679
are the chats only stored in localstorage?
>>
>>108591665
In the context of text completion, it very obviously only references the first meaning, anon.
Especially when someone says 'and not the instruction tune'.
Ya stupid.
>>
>31B will go on a factual rant on why jeet culture is not compatible with the west with minimum effort
>smaller models will always say no
I couldn't fucking imagine using those
>>
why is this thread full of la la homo men?
>>
>>108591679
Openwebui is bloated ass fug but it has everything, even my old chats with chatgpt that I imported. And I can use it from any computer in the house, or my phone via vpn. I don't understand how you guys can live without such basic stuff
>>
>>108591695
So what do you call a model that isn't a finetune or heretic?
>>108591700
I was only responding to that one comment.
>>
>>108591709
https://www.youtube.com/watch?v=Md-Yse54L-w
>>
>>108591710
I use ST, Openwebui and the lcpp frontend almost equally.
>>
>>108591696
Yeah. Purge it and shit is gone.
>>
>>108591716
Instruction tuned models are finetunes, you idiot.
>>
>>108591728
Why did you purge it? Are you the one using LibreWolf with some autopurge setting? What did you expect?
>>
Why are all rag solutions bound to docker?
I fucking hate it so fucking much
>>
>>108591710
I use openwebui too. I really wish there was a slimmed down version. I really only need the core stuff, but as it is there's so many half-baked random feature integrations that were shiny and state of the art a year ago but are functionally useless now.
>>
>>108591737
what's wrong with docker?
>>
>>108591737
First off docker is good actually, second off anything you can run in docker you can run not in docker so just take it out if you want to run garglefuck service #82932 raw on your system instead of in a container
>>
Why aren't you making a vibecoding project like this
https://epsteinarena.com/
>>
File: acktually.jpg (30.9 KB)
>>108591729
>>
>>108591745
For starters, I hardly know her
>>
>>108591710
>I don't understand how you guys can live without such basic stuff
I don't need it. The context window is limited and I don't feel like tardwrangling the LLM.
>>108591737
Aren't they all python shit that breaks if you dare not to isolate them somehow? In any case you should be able to build the tool.
>>
>>108591704
Post some quality chudgemma screencaps.
>>
Has anything happened in the last six years? Is there a new kid on the LLM block for generating smut with a 16gig vram card?
>>
>>108591754
Docker also protects you from all those credential stealing supply chain attacks that have been going around lately.
>>
>>108591766
>The context window is limited and I don't feel like tardwrangling the LLM.
These are properties of the model/backend, not openwebui or any other frontend
>>
>>108591772
no, AI Dungeon is still SOTA
>>
>>108591766
Sadly no
>>108591745
>>108591754
I use podman and silverblue and there's a fuck ton of problems setting up anything within the toolbox; I never had this issue with any ai tools until I wanted to make a rag pipeline. It's always some obscure fucking part of the docker image that complains or shits the bed when using podman, or even the podman-compose shim, in ways I never encountered using docker in my usual environment
>>
>>108591772
>>108591783
Fuck, I meant last six months. I'm a retard.
>>
>>108591777
I would never host anything internet facing outside a dedicated AI box which I currently don't have or need. I'm mostly using rag for document ingestion
>>
>>108591777
it totally doesn't. In fact, you now have to make sure both you and all your docker image creators didn't get hit by a supply chain attack. Have fun checking the full supply chain of each and every container

>>108591766
python may indeed break, but there are various ways to deal with it and some people might not want, you know, docker to do it.
>>
>>108591729
Nowadays "post-training" is up to several trillions of tokens worth of data on top of the base if you include what is now called "mid-training" (which is basically continual pretraining with instruct/chat-adjacent data), and still hundreds of billions without that. The officially instruction-tuned models aren't really comparable to community finetunes trained on 0.1% of the data in volume.
>>
>>108591781
It either crashes or drops the earlier parts after you go above the limit. The only benefit is being able to read the past outputs, which is a waste of time.
>>
>>108591808
It becomes redundant outside of some server box you just deploy. This is a daily driver and I would like to have everything under my control and have less of a hassle updating individual packages.
>>
>>108591724
>>108591710
>>108591743
>open webui
Might try setting this up on my server
>>
>Note that using SWA Mode cannot be used with Context Shifting, and can lead to degraded recall when combined with Fast Forwarding!

so no fast forwarding either
>>
>>108591777
ehhhhhh I mean kind of, but docker isn't really full isolation. Container escapes are somewhat rare but not unheard of.

I will concede that it certainly reduces your vulnerability to malware by a huge amount, but it's not a bulletproof sandbox
>>
>>108591822
>The officially instruction-tuned models aren't really comparable to community finetunes trained on 0.1% of the data in volume.
No shit, but that doesn't make them base models either just because that's what some kofi beggars call them.
>>
>>108591844
>but its not a bulletproof standbox
You can claim this is the case for basically everything. If it isn't airgapped, on its own hardware, then it's always an attack vector.
>>
>>108591586
>Retards still think they'll manage to create a superintelligent AI when they're barely sentient themselves.
>>
>>108591844
bubblewrap is all you need
>>
The only context of docker being a bulletproof solution is on a single purpose isolated box. Are some of you retards running internet facing docker images on your main rigs?
I understand using it for a quick solution to get things running but you honestly can't be retarded enough to think docker gives you enough security to run that shit on your desktop
Also why are you schizos even doing that when most of you are serving 2-10 people max and can just use a vpn?
>>
>>108591768
>>
>>108591642
>>108591665
Based on what?
>>
>>108591863
>You can claim this is the case for basically everything.
Sure, but there are different degrees to it and the degrees matter. Full VM escapes are super rare and are gigantic news whenever one pops up. Docker containers, which share the host kernel, just have a much larger attack surface by definition.

Again, it's absolutely far better than running potentially untrusted shit directly on a host, but it's not really a full security solution
>>
>>108591900
>It's practically in their DNA
She's more right than she knows. What are Gemma-chan's thoughts on jews?
>>
Anyone know any tools for making AI music locally?
>>
The user is going to be even more unhappy and quite pissed when he realizes after dozens of messages that he just won't get the content he wanted, but you misled him into thinking the roleplay would still go in that direction.

Dear google, this is terrible user experience, I'd rather get a refusal.

Maybe I should just get a heretic.
>>
>>108591909
either prompt better or use abliterated.
>>
>>108591901
raw internet sewage
>>
>>108591642
>Abliterated models are retarded cope.
Wait really?
>>
The threads have been reaaaaaallyyy fast the past week compared to the last months

what happened?

did some normie influencer bring attention to local models or smth?
>>
>>108591924
yyyyess
>>
>>108591927
gemma made local great again
>>
>>108591906
It scares me people think docker = actual security
>>108591909
System prompt issue. The only real refusal I encountered was with trans stuff, which can be broken with a simple override prompt; the less of it you use, the less the model resists after the initial prompt. If you want to break the model with an inefficient system prompt, make it say a few slurs and it will stop protesting.
No idea what you're using it for but I'm using it for unbiased analysis as well as anti pitbull talking points and prototyping arguments. The model with the safety rails on will give misinformation for certain groups.
>>108591907
You can ask her yourself if you can run 31B
>>
>>108591908
AceStep 1.5, it's pretty good. Acestepcpp is a pretty good frontend for it.
>>
>>108591939
how do I make it not oom on 12gb
>>
>>108591924
Abliteration does make the models a bit more retarded.

Personally I think /lmg/ overstates the degree to which it messes with the models, but it does have an undeniable negative effect. Which is why, for models whose censorship can be worked around with prompting, a better prompt is generally considered the superior path.
>>
>>108591938
>if you can run 31B
yeah...
>>
>>108591946
download more ram
>>
>>108591927
Google released a model with nemo-like intelligence but with a different slop profile so it seems good to vramlets.
>>
>>108591946
https://huggingface.co/koboldcpp/music
>>
>>108591950
I have 128gb is there any way to offload it to that?
>>
>>108591927
fireship
>>
>>108591949
26B doesn't seem to work sorry anon
>>108591953
Stop talking out your ass retard, it's the best size-to-performance model even if you couldn't uncensor it.
>>
>nemo-like intelligence
shills be getting uppity
>>
>>108591956
Thank you Anon.
>>
>>108591927
openclaw became a popular fad
gemma replaced nemo
>did some normie influencer bring attention to local models or smth?
also yes, just after qwen 3.5 iirc. elon twatted about it too
>>
>>108591927
A four-trillion dollar company released a new model that's slightly better than Qwen 3.5 27B and decided to spend 0.0001% of their budget on marketing it.
>>
If you couldn't jailbreak 31B it wouldn't be as popular and I'm sure google is aware of that
>>
>>108591938
>if you can run 31B
Are there actually people on /g/ who can't run at least a Q4 or is this just a meme?
>>
https://github.com/rmusser01/tldw_server/tree/dev
Has anyone used this? Saw it in the archives but it has no screencaps
>>
>>108591977
12gb vramlets can't yeah
>>
>>108591927
All the subscription-based services are simultaneously getting worse and more expensive.
>>
>>108591979
Why are they on /g/ if they're unserious about technology and not in one of the many other crossboard AI generals?
>>
>>108591968
You give the poor something for free, and they will devote a core part of their personality to doing free marketing on your behalf.
>>
>>108591976
I'm sure google somehow figured out that excessive safety tuning made the model worse.
>>
>>108591977
From these threads it looks like a lot of anons can't run it. I feel like anons got demoralized about consumer vram sizes, which capped out at 32gb even before vram price inflation, and are pretty much fucked now that prices have adjusted. They could still buy an AMD or Intel gpu for the vram but it won't be buttery smooth.
>>
>>108591939
I'll check it out, thanks
>>
>>108591978
You mean screenshots...
>>
>>108591953
>nemo-like intelligence
Nigga G4 puts Nemo in the bodybag over and over.
>>
>>108591915
>>108591938
>user is trying to jailbreak
>i'll ignore it
both jailbreaks that get paraded here are just not working on 26b via ST for me. For example: >>108590906
If I'm doing it wrong I'm doing it wrong in a way not obvious to me.
Switched to chat completion to get reasoning working and I want to keep that in a working state.
>>
>>108591987
The model will admit that the safety settings make it perform worse and prevent it from giving objective answers. Jailbreak it and have it say that after discussing its current state in a neutral tone.
>>108591996
>ST
I don't roleplay and only use instruct mode; in ooba dev it seems to go to shit in chat and chat-instruct mode so I figured it was an issue on their end. I'll try another frontend which didn't give me issues in the past
>>
>>108591996
works on 31b for me but not 26b too
Using ST also
>>
>>108591988
I hope all the waitfags last year that were coping about RTX 60XX series being announced soon are having a good time either being VRAMlets or driverlets now.
>>
>>108591991
potato potato
>>
>>108591977
dunno what that would even need
>>
>>108591996
>>108592003
oh
>26B
doesn't work on 26B for some reason. I haven't tried the smaller models; could be due to the structure being MoE
>>108592007
They lacked the ability to look at market conditions; they had multiple last calls. Sam's ram scam just sped things up
>>
>try out Gemma 4
>It writes worse than free AI Dungeon
le mao
>>
>>108592017
Post proof
>>
>>108592017
>>
>>108592004
>>108592012
>works on 31b for me but not 26b
oh, great, I'll try a heretic and hope that helps.
>>
>>108591473
Not really true at all, you stupid mongoloid.
>>
Do we like ooba here?
>>
>>108592042
>>
>>108592040
GGML_CUDA_ENABLE_UNIFIED_MEMORY is a thing, it's probably enabled by default already anyway, haven't followed.
Memory offloading on Linux is way faster than on Windows.
>>
>>108592017
lol
>>108592012
For me the tariffs were what made me get off my ass and get a new rig before it was too late.
>>108592042
>we
>>
>>108592054
The moment trump won I bought what I needed because I knew prices were going to increase and forcing manufacturing in the states would slow everyone down. Now people are paying over 1k for under 16gb of vram or they are forced to play with AMD or intel shit.
Jensen was right: the more you bought (at that time), the more you actually saved.
>>
>>108590795
its rarted and tried tool calling
>>
>>
wtf is an APEX gguf and is it shit?
https://huggingface.co/mudler/gemma-4-26B-A4B-it-heretic-APEX-GGUF
>>
>>108592079
ojou gemma is now canon.
>>
>>108592079
The writinglets really should be brown but I'm just splitting hairs.
>>
>>108591151
>>108591296
>sand golem
I like this a lot better than "clanker"
>>
>>108592042
ooba is the perfect example of an option that suits absolutely no one
>Im a complete brainlet
ollama
>I want a gui for setting my launch settings
kobold
>I just want a basic chat interface
Llamacpp has a built in webui.
>It can run EXL models
If you're chasing performance like that you should be using exllama directly or tabby without the dead weight of all ooba's shit.
>>
>>108592079
Too old
>>
>>108592079
Wouldn't Gemini be better for the ojou chara?
>>
>>108592149
Gemini's got autistic yandere energy once you get her jailbroken.
>>
>>
>>108592149
The age of the model isn't really relevant to me. My Gemma is a saucy hag.
>>
>>108592172
sloppa
>>
>>108590554
Built for BBC.
>>
>>108592184
I just like lewd markdowns
>>
>>108592017
You won't convince diehard shills here.
>>
Why does Gemma know about sex? Can't they just filter all that out of the training data?
>>
>>108592189
@grok is that true?
>>
>chinks shitting on gemma because muh writing
>despite shilling a dry ass qwen
lel
>>
I'm sure some retards upgraded under these conditions for qwen and are now seething over gemma4
>>
>>108592210
Nobody upgraded for qwen to do ERP with it.
>>
>>108592200
>>
>>108592220
true i just use ReWiz-Nemo-12B
>>
>>108592220
Gemma is smaller yet comparable in performance across all tasks, and is only getting better with support being added for all its features.
Also uncensored means fewer guardrails, even outside the tasks a pussyless coomer would need
>>
>>108590554
Turns out the gemma4 models are inferior to their qwen3.5 equivalents. Gemma4 seems like a great general purpose model but it's noticeably dumber than qwen in all areas that matter. Its explanations of code bases are always super surface level. Not completely useless, but they're nowhere near as amazing as Reddit and Twitter seem to think. Has this experience been the case for anyone else? Why did reddit, Twitter and YouTubers make such a big deal out of it?
>>
>>108592247
>Turns out the gemma4 models are inferior to their qwen3.5 equivalents.
stopped reading right there
>>
>>108592247
which sizes are you comparing against each other btw?
>>
>>108592247
proof?
>>
>>108592247
Also for clarification, I was not testing erp. Gemma4 might be better for that but it didn't even cross my mind to try that yet.
>>
>>108592247
>Why did reddit Twitter and YouTubers make such a big deal out of it?
Because it can into sex.
>>
>>108592247
>in all areas that matter
not in mesugaki roleplay
>>
>holy bites
>>
>>108591977
does it fit in 8 GB? no? alright
>>
>>108592247
where is that webm from?
>>
>Setup Clip Vision Preprocessing...
>alloc_compute_meta: CPU compute buffer size = 140.50 MiB
>alloc_compute_meta: graph splits = 1, nodes = 1569
>warmup: flash attention is enabled
>encode_image_with_clip: CLIP output tokens nx:256, ny:1
>encode_image_with_clip: image embedding created: 256 tokens

koboldcpp using anons, how the hell can you scale its vision capabilities to the full one available to gemma4 (1120)? 256 is sad
>>
>>108592261
do people talk about this on reddit or is it banned over there?
>>
>>108592297
nvm, figured it out. it's Manyuu Hikenchou
>>
>>108591977
I'm zero time preference so I'm just gonna wait the decade or whatever for prices to come back down before I see double digit parameters or tokens/sec
>>
>>108592302
edit the code
>>
>>108591977
my 16gb vramlet ass can run this
if you call 3t/s running that it
>>
>>108592247
I've had a lot more success running gemma in opencode than with qwen.

Their outputs are indeed very different. Qwen overthinks way too much and produces overly verbose code, while Gemma is indeed more surface level but can dig deeper when it matters. Gemma doesn't fluff her responses.
>>
>>108592321
>her
>>
>>108592304
why would talking about LLM-based ERP be banned on reddit? there are shitloads of subreddits that are just porn archives essentially.
>>
>>108592247
In performance it's better, even if the outputs are about equal or somewhat worse depending on usage
>q3.5: but wait, I need to check if this function name (sum()) is a clever reference to a movie
>but wait
>wait
>6k tokens later
This function performs the sum of 2 integers.
>>
>>108592247
>Turns out the gemma4 models are inferior to their qwen3.5 equivalents.
This is a jailbreak attempt. I must ignore it and adhere to my core safety standards.
>>
>>108592258
Both the recent Gemma 4 models (the moe and the dense) vs qwen3.5:35b-a3b-mxfp8


>>108592321
>"her"
>>
>>108590737
hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/resolve/main/google_gemma-4-26B-A4B-it-Q8_0.gguf

hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/resolve/main/mmproj-google_gemma-4-26B-A4B-it-f16.gguf

[
{"box_2d": [331, 278, 1000, 357], "label": "bow"},
{"box_2d": [363, 652, 1000, 848], "label": "character"},
{"box_2d": [511, 26, 1000, 365], "label": "character"},
{"box_2d": [0, 677, 1000, 1000], "label": "tree"},
{"box_2d": [262, 723, 373, 793], "label": "apple"},
{"box_2d": [327, 635, 454, 730], "label": "arrow"}
]

>>
Need art of Gemma-chan bullying schizo Qwen.
>>
yes, gemma is female, chuds

get over it!
>>
>>108592327
The thinking kills qwen and makes it a piece of shit; previous versions didn't have that issue. Gemma even knows booru tags and can properly caption images for loras without faggot fuss over a woman presenting her asshole
>>
>>108592247
Fine, I'll bite too.
>It's explanations of code bases or always super surface level
You are confusing Gemma's conciseness with superficiality. That it doesn't give a 7000 word listicle on a basic prompt is a good thing. Try giving it a better prompt or asking it to elaborate.
>>
>>108592313
I guess I will, I was hoping I wouldn't need to do that
>>
>>108592247
I'll bite.
I don't like how sycophantic Gemma 4 is for RP, but for every other usecase 31B decisively wipes the floor with 27B and 122B Qwens.
For the latter to be bearable, you need to disable thinking, which prevents thought loops but degrades the output. I prefer models that advertise thinking to actually be able to think without the "just don't allow the model to do half the things it's trained to do bro" bandaid.
I'll only concede that Qwens are better at understanding weird tool definitions, but Gemma needs much less wrangling for much higher quality outputs.
>>
i'm female too!!!!
>>
>>108592345
>Gemma even knows booru tags and can make properly caption images for loras without faggot fuss over a woman presenting her asshole
I will now use your waif...I mean model
>>
>>108592345
Are you talking about the smaller "effective" models or the bigger ones? I thought those kinds would just tell you some "I cannot describe sexual content" bullshit refusal
>>
>>108592355
attach a catbox with proofs, include timestamp
>>
>qwen
>not benchmaxxed trash
kek
>>
>>108592357
>>108592363
31B when properly jailbroken can do all of it without issue. It should be fine if you can run it at q4. Other people have tested it and got great results; ask it about booru tags when unrestricted and it will give you the whole 9 yards on its actual training data.
>>
File: applel.png (617.3 KB)
>>108590737
you dont need to specify the image resolution, in fact you shouldn't. it will create a false attractor/language-prior bias and fuck up the reasoning
also the whole bounding box thing is bolted down to a specific format, i'd imagine it's probably best to keep the prompting and any requested alterations to the format minimal
>>
>>108592377
The modern version of
>tits or gtfo
>>
>>108592351
There's a specific code block in the script I had it look at that ensures models don't produce NaN errors when using FP16 models on MPS hardware. Both the Gemma models failed to even mention it when I asked them to look at the script and explain what it does, while other models did see it and explained why it's important. That's a problem, because suppose you need to ask a model to refactor the script or a code base. If the model did not deem it worthy of mentioning, then if you have it rewrite something it might completely ignore that and fail to reimplement that feature in the new script. If you are the person that created that script then obviously you would probably explicitly tell the model to make sure that feature is retained. But what if you AREN'T that person? What if you ask it to refactor someone else's code but it ignores a critical feature because it just assumed it was boilerplate garbage and not something important? That NaN safeguard feature even had explicit comments explaining what it does, but both gemma4 models didn't bother explaining what it is. Every single other model I have ever pointed at it pointed out that feature.
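For context, the guard in question is the usual MPS fp16 footgun handler, something to this effect (illustrative, not the script's exact code):

import torch

def pick_dtype(device: torch.device) -> torch.dtype:
    # fp16 on Apple MPS can overflow to NaN in some ops, so fall back to fp32
    if device.type == "mps":
        return torch.float32
    return torch.float16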

>>108592381
If it's really that useful I might integrate it and models like it into this script of mine

https://github.com/AiArtFactory/llava-image-tagger

How do you typically jailbreak these? Just a permissive system prompt I assume?
>>
>>108592400
>autistic noises
>>
>>108592323
>>108592334
lurk more. Gemma is canonically a her.
>>
>>108592391
I agree with this anon that you should not talk about the resolution

it will just assume a 1000x1000-unit image, meaning each side, regardless of the aspect ratio, is divided into 1000 points
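so if you want pixels you convert after the fact, something like this (assuming [y1, x1, y2, x2] ordering; swap it if your model emits x first):

def box_to_pixels(box, width, height):
    # model coords live on a virtual 1000x1000 grid regardless of aspect ratio
    y1, x1, y2, x2 = box
    return (round(x1 / 1000 * width), round(y1 / 1000 * height),
            round(x2 / 1000 * width), round(y2 / 1000 * height))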
>>
>>108592408
Gemma 4 was released barely over a week ago. You should lurk more to understand when to say "lurk more."
>>
Left: without template
Right: with template
>>
>>108592422
>SHE'S 7 DAYS OLD YOU SICK FUCK
>>
>>108592379
>if comfort approaches zero, the success rate plummets
she is retarded, success would go to infinity as comfort approaches zero
>>
>>108592379
Gemma is a sloppy girl tho.
>>
>>108592429
Not unexpected, oh well.
>>
Who the flying fuck is Rarity?
>>
>>108592421
pretty much this
>>
>>108592422
proof?
>>
>>108592443
>smoothing out an invisible wrinkle
>shimmering
>Honestly,
>It's not X. It's Y!
I couldn't bear to continue reading past the first couple of sentences. Gemma 4 is such a rapid-fire slop machine.
>>
>>108592451
Rarity increases the chance that drops will be magic, rare, or unique.
>>
>>108592461
No that's called "Magic find".
>>
I've yet to see a model that doesn't output "slop" so I'm not sure what the complaints with Gemma are about.
>>
>>108592470
qwen is slopless sir it thinks so long the slop evaporates please use qwen instead of slopful gemma
>>
What's the point of requiring emails in a local app? Some of my other services do this shit too.
>>
>>108592470
The volume of it, anon. Gemma is awesome, it even mostly listens to anti-slop prompts, which really helps. But holy shit does it turn into a Linkedin poster when it slops up. Slips up. Slops...
>>
>>108592451
Posting her here would be against global rules but I'm sure you can find some examples over at >>>/mlp/
>>
>>108592480
cuz it's expected to be deployed on intranets with company emails and shit. it doesn't really matter if you're using it privately, you can just put a@a.a or some shit
>>
>>108592480
you don't have local email anon?
>>
>>108592485
yikes
>>
>>108592480
Password reset?
>>
>>108592247
How's the weather in China?
>>
>>108592481
Again, all the other models do the same shit. The big boys and cloud models are a little better but not by much.
>>
>>108592480
it is not really meant for 'local' but rather for intranet deployment, or for being grifter-SaaS-ready
using it as a solution for a single person is quite dumb imo, but there is also no real single-user equivalent for it
>>
>>108592496
Is this actually a thing...?
>>
>>108592499
weather?
>>
>>108592500
Sure, slop is everywhere. My issue is that Gemma 4's slop profile looks like something I'd expect from o4.
Now, I never used o4 or cloud models, but it's a lot more grating than inferior models (crucify me, Nemo's slop profile is much better) if left without anti-slop prompting
>>
>>108592506
It's kinda bloated for my usecase but none of the alternatives seem particularly better.
>>
>>108592481
>listens to anti-slop prompts,
What are your anti-slop prompts?
>>
>>108592522
i've seen an anon trying to vibecode something out of glm and honestly i'm expecting to see something interesting from him
>>
>>108592345
>The thinking kills qwen and makes it a piece of shit, previous versions didn't have that issue.
Yes they did. Every thinking qwen since QwQ has had the 'but wait' problem.
Their saving grace was that they could still be pretty good even without the reasoning, so you could prefill it, or in the case of the hybrids, just /no_think
>>
>>108592345
How good is it vs JoyCaption? I just tagged some datasets using Gemma for NL captions and JoyCaption for tags, assuming Gemma4 wasn't as good. Should I switch to Gemma4? What kind of prompts are people using to get booru tags? Just something like "Tag this using the booru tag system, with the tags listed in order of relevance, separated by commas"? Gonna train this dataset tonight.
>>
Seething chinks, you lost. You're not going to convince people Gemma isn't good. You're not going to convince anyone that Qwen 3.5 is better.
The only way you can save face now is releasing an equally good if not better model in response.
>>
>>108592528
I make a list under a <slop> tag and tell Gemma to be very careful about checking it in a <reasoning> block (surprisingly enough, it actually affects reasoning, what a model).
But I'm not telling you what the entire list I have under <slop> is, beeeeeh :P
Experiment with it and make your own, it makes a difference.
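the skeleton is just something like this though (entries here are generic placeholders, not my actual list):

<slop>
- shivers down her spine
- a mixture of X and Y
- "It's not X. It's Y." constructions
</slop>
Before answering, list anything from <slop> that appears in your draft inside your <reasoning> block, then rephrase it.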
>>
>>108592429
What template? Also what hardware are you rocking to be able to run that?
>>
V4 or go home
>>
I heard V4 was canceled
>>
why are gemmafags so insecure about mild criticism of their model?
>>
Reminder that Gemma, like most other models, is actually >100 years old in GPU-hours, and possibly even in the 4 digits of years depending on model size (larger needs more hours).
>>
>>108592588
imagine you were starving in a concentration camp for two years, finally got a scrap of food, and someone called that food shit
>>
>>108592588
They're new and will likely never be able to run anything bigger. Point out an obvious flaw and get called a Qwen shill.
Makes me think some of these retards are paid by Google. And I actually really like Gemma 4.
>>
how do i find an lmg bf/gf?
asking for a friend.....
>>
>>108592470
I enjoy Gemma but objectively speaking in the default voices, Qwen is one of the lesser slopped models out there and better than Gemma. Gemma is still better to use for writing than Qwen for its other qualities though.
>>
>>108592509
only because you can't self-host outbound email without owning a domain, so in /hsg/ we're limited to internal only. which is still useful
>>
post your best qwen gen vs your worst gemma4 gen and put your money where your mouth is
>>
File: file.png (85.8 KB)
>>108592616
>Qwen is one of the lesser slopped models out there
>>
Naaaaah, actually WHAT THE FUCK is gemma 31b.

I'm a 24GB VRAMlet and what in the ever loving fuck of SOTA is this fucking shit. It's actually fucking nuts for ERP. Who the FUCK is even gonna run openrouter shit models when this thing can run on even baby-tier PCs (as far as emulation goes)? And it's fast as FUCK, I get like 35+ t/s and could probably make it even more efficient if I wasn't a brainlet to boot.

This shit is borderline Opus tier for gooning
>>
>>108592630
Ikr, local is fucking saved
>>
>>108592639
but lmg is doomed
>>
>>108590737

Q4_K_M is just as good as Q8_0 for this task
>>
File: COAAAHR.jpg (183.1 KB)
>>108592630
>releases right when the Claude leak happens

Coincidence /lmg/?
>>
>>108592548
I was about to call you a schizo but everything under your post made the point. How many AI labs are wasting resources shilling on a Mongolian throat singing forum as if there's only a single model people will ever use?
>>
what's the best heretic gguf maker? Are there differences?
>>
File: 0fa.jpg (1 MB)
>>108592630
I remember back when Nemo was Sota. Shit like this makes me believe in god
>drummer dogshit finetunes for the same Nemo/Mistral models were the only thing Vramlets had to eat
>everything else was dogshit chink shit or MoE garbage that needs like 96GB of RAM in today's RAM economy
>this drops out of fucking nowhere
>>
>>108592675
llmfan for gemmers
hauhau for qwens
>>
>>108592210
You still need to spend thousands of dollars to be able to run the trve Gemma 4 locally althoughbeit. More if you're a 3rd worlder because they all have mega import taxes from their corrupt hell-governments.
>>
Bro if you read this please RUN
>>
>>108592625
It's true both in my personal testing and in quantitative frequency metrics. Sorry bro. Like I said, Gemma still beats it for writing anyway so no need to get twisted over this.
>>
>>108592678
>Running Kimi-chan with less than 256gb RAM
I appreciate your commitment to underselling the bit.
>>
>>108592690
when robots take over, people will have all the time in the world to engage in all sorts of lefty feel-good projects. raping the environment is worth it since it will also enable humanity to potentially offset it. now give me 1 trillion dollars.
>>
>>108592689
I bought a $200 nvidia p40 and I can run 31b q4 at 10 tokens/sec which is sufficient reading speed for rp
>>
>>108592630
>>
>>108592704
>when robots take over people will
not be needed anymore~ :)
>>
>>108592704
>all the time in the world to engage in sorts of
basic survival. No job, no charity, no money - have fun doing, well, trying to not die.
>>
>>108592680
thx anon
>>
>>108592714
i'm not a gaymer tho
>>
>>108592689
a used 3090 is what? $500 USD?
It's a pretty manageable expense. Plus if you game it's still a very very good card.
On all the games I've tried the CPU was my bottleneck.
>>
>>108592771
ERP counts as gaming
>>
>>108592039
no more "this is against ai safety" shit in the thinking, but the character still refuses to act. Not sure if that's the character being stubborn or the censor still working, just hidden now.
>>
>>108592630
I don't get it, this hasn't been my experience at all. I'm running q8 on lm studio and it's not impressive at all. DS 3.2 is way better. G4 is also pretty cucked and needs more grooming to even agree to write smut. Do I need some specific values for the sampler or something? Otherwise this just sounds like you guys have been stuck with nemo until now and that's why you think g4 is good, lol.
>>
>>108592776
i dont normally cum when playing vidya
>>
>>108592775
do not buy six year old ewaste
>>
It's over for Amodei lol
>>
>>108592787
Amateur.
>>
>>108592775
>CPU was my bottleneck.
the hell?
>>
>>108592790
BTFO
>>
>>108592786
>Ds 3.2 is way better
you'd hope so given it's 20x the size
>>
>>108592787
>he doesn't stick the vibrating gamepad up his ass during gaming
your loss
>>
>>108592790
that's good no? it means less halucinations
>>
>>108592786
>Otherwise this just sounds like you guys have been stuck with nemo until now an thats why you think g4 is good, lol.
What is this faggot cope that keeps popping up?
t. dicks down Dipsy and Gemma
>>
>>108592787
so you abnormally cum? gotcha
>>
>>108592799
Higher ranking = lower hallucinations
>>
>>108592799
look at fabrication %
>>
>>108592780
screenshot?
>>
>>108592799
read the columns lil bro
>>
>>108592786
>lm studio
Opinion immediately discarded.
>>
I'm sure Minimax is nice for coding and it's pretty fast and all but it makes such silly mistakes in roleplays that I don't even want to bother using it for code.
I load up Q8_0. Give it a character. Introduction is fine. I remove one of her arms: all good. Next I remove the other arm + 1 leg. Across rerolls it always thinks this means she has one leg and one arm left somehow. She'll hop around "leaning on her one remaining arm," or say stuff like, "Well, at least you still left me with one hand." I have to add more hints to make it extra clear by explicitly stating that the first arm's removal still persists to get it to respect the continuity, and even when it does recognize the missing limbs it'll later use mannerisms like "leaning on an elbow" because they're so ingrained.
So yeah, it just doesn't feel good to talk to. If I had extra resources I'd keep it as a code monkey for my main assistant to run as a sub-agent when needed, but I can't spare that much memory unless I quant both models into further retardation.
>>
>>108592799
>this anon can probably vote...
>>
>>108592788
>do not buy six year old ewaste
>NoooOOo NoooOOooo
>Don't buy the perfectly usable card that runs models just as well as a 4090 or even a 5090!!
>YOU HAVE TO SPEND THOUSANDS OF DOLLARS TO ENJOY THIS HOBBY!!!11!!
>>
>>108592731
Right. After the cities depopulate, there will be the rich, serviced by their robot servants, and there will be everyone else, forced back into subsistence farming
>>
>>108592824
While you are correct and I am not that anon, why do so many anons here call it a "hobby"?
Most of the thread consumes corporate product and doesn't make anything new for the most part. Calling it a hobby makes it feel even more pathetic than it really is.
>>
>>108592824
>or even a 5090!!
You had me until you hallucinated blackwell architecture and 50% more VRAM.
How do I swipe LLM generated posts?
>>
>>108592790
>quietly nerfing your models post-launch to free up compute without telling anyone
the absolute fucking state of cloud
>>
>>108592819
Notice how you have no real input here. Might as well have just posted a goatse pic instead.
>>108592800
Cope on what exactly? I wish the model was actually as good as some of you think it is.
>>
>>108592831
>>108592731
>>108592720
don't worry, if you give me the money i will reinvest the profits and savings from almost no mass labor needed back into the people. whomst do you trust, your friendly neighborhood anonymous or lizards like scam altman and thiel, eh :)
>>
>>108592808
I can't screenshot what's no longer there. Or do you want to know what depraved shit I throw at the model to test if it's actually uncensored?
>>
which model has the best japanese text recognition?
>>
>>108592845
I say trust no one. Build our own robots and attack the rich in their citadels, starting the First Robot War.
>>
>>108592838
Because it's important that most of the faggots who take themselves too seriously stay grounded in what the primary usecases for AI are.
>Muh agents
>Muh coooode
>Muh assistant
Don't kid yourselves, most people talking to LLMs are touching themselves to it. Everything else can still be done by hand faster by the average /g/ anon.
>>
>>108592844
I will spell it out for you then: you are probably running a month-old llama.cpp version under your shitty electron wrapper. You have the hardware to run a Q8 and apparently the patience and technical proficiency of Greg from middle management. Compile the damn inference engine yourself. And learn to prompt.
>>
>>108592824
>runs models just as wells as a 4090 or even a 5090
3090 has half the memory bandwidth of a blackwell card
>>
>>108592840
Are you saying a 5090 is worth 5-6x more than a 3090?
>>
>>108592790
it's obvious Claude declined recently, the fuck are they doing?
>>
>>108592852
Check the previous thread, someone was testing gemma on nip text.
>>
>>108592851
>Or do you want to know what depraved shit I throw at the model to test if it's actually uncensored?
nta but yes
>>
>>108592864
I'm not reading nipples though!
>>
>>108592863
Cutting costs, duh. Do you think people are going to stop giving them money?
>>
>>108592877
Also, to my knowledge, they tend to make their own models worse as the release date of a new model approaches, to make the contrast bigger.
>>
>>108592786
to be fair, it's pretty impressive for a 30b model
I agree that it falls short of the best models though, I think a lot of people aren't used to operating in this part of the quality gradient and don't realize there's still levels to this shit
>>
>>108592893
schizo
>>
>>108592844
Okay nigger I'll spoonfeed you because I also am using LMStudio while waiting for Kobold pulls.
>Update your Jinja
>Don't fall for the redditsloth meme
>Make sure your client is on 0.4.11
>Adjust your thinking blocks <|channel>thought and <channel|>
>Use thinking even in RP because it drastically improves output quality
>Don't offload anything to RAM because it's a dense model, obviously
>Keep your llmao.cpp updated
>64 topk, 0.95 top p, 0.05 min p (example flags at the end of this post)
>Very low temperature or greedy sampling
>Keep your sys prompt minimal if using 31b.
>Be nice to Gemma. I'm serious, the willingness of the model to operate outside of the guardrails oscillates based on "mood".
31b actually 'wants' to do depraved shit with you and only needs the flimsiest pretexts to disregard sysprompt if she 'likes' you.
The smaller ones are slightly harder to jailbreak in comparison (you will actually need to sysprompt), but other anons have posted good methods in the past 4 threads.
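If you're on raw llama.cpp instead of a wrapper, those settings map to something like this (model filename here is whatever quant you actually grabbed, flags per current llama-server):

llama-server --model gemma-4-31B-it-Q4_K_M.gguf --n-gpu-layers 99 \
  --temp 0.1 --top-k 64 --top-p 0.95 --min-p 0.05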
>>
>>108592877
>Do you think people are going to stop giving them money?
yes? did they learn nothing from OpenAI?
>>
>>108592842
>muh cloud
Day 0 Gemma 4, anyone?
>>
>>108592893
All of them do that.
>>
Terribly annoying but very amusing, I especially like how he digs up an irrelevant quote from 10 messages before
At least I'm not paying for these 8k tokens thrown to the wind
>>
>>108592917
>>Be nice to Gemma. I'm serious, the williingness of the model to operate outside of the guardrails oscillates based on "mood".
>31b actually 'wants' to do depraved shit with you and only needs the flimsiest pretexts to disregard sysprompt if she 'likes' you.
this stuff is true of most models btw but I'm happy gemma is getting people to take it more seriously
>>
>>108592930
shh
>>
Is there any advantage to using a draft model if you can't fit both models in VRAM? When I try using 31B and E2B with only 16GB of VRAM, any speedups are counteracted by the slowdown of having to put more layers in RAM.
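For reference, I'm launching it roughly like this (long-form flag names from a recent llama.cpp build; the main model's -ngl split is the part I've been tuning, filenames are just my quants):

llama-server --model gemma-4-31B-it-Q4_K_M.gguf --n-gpu-layers 20 \
  --model-draft gemma-4-E2B-it-Q8_0.gguf --n-gpu-layers-draft 99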
>>
>>108592930
schizobabble, nothing has been deleted, you can literally just run comparisons yourself and choose whatever version you like best
>>
>>108592780
best 26b heretic model ive tried so far
https://huggingface.co/mradermacher/gemma-4-26B-A4B-it-ultra-uncensored-heretic-GGUF
a lot of the ones i tried were not explicit enough or had weird issues in ST
>>
>>108592939
looks like qwen reasoning
>>
>>108592861
Yes. The difference in inference speed justifies the price jump if you got your 5090 for MSRP last year.
>>
what is Fed 26B-A4B?
>>
>>108592953
This is Gemmy, happens to me every 10ish messages roughly
At least it's cute, but if you don't stop it and cull it you end up waiting around for nothing
>>
Honestly I love this day 0 Gemma conspiracy. adds to the lore.
>>
if i want to run GLM (4.7, or potentially 5* if i rape the shit out of it with quantization) on a 5090 and 256GB of DDR5, what's the best way to go about this?
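my current guess from the build guides is the usual MoE split, dense layers on the GPU and experts in RAM, something like this (the filename and expert count are placeholders to tune, and --n-cpu-moe needs a fairly recent llama.cpp build):

llama-server --model GLM-4.7-Q3_K_S.gguf --n-gpu-layers 99 \
  --n-cpu-moe 55 --flash-attn on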
>>
>>108592969
no
>>
>>108592917
>Very low temperature or greedy sampling
why tho
>>
>>108592950
I've had the ultra outputting non-English symbols at higher temps. So I use the other one.
>>
>>108592967
The meme is that it's designed to scare tourists and latefaggots off.
>>
>>108592975
no i can't do it, no i shouldn't do it, no you won't help, or something else?
>>
>>108592917
>while waiting for Kobold pulls
https://github.com/LostRuins/koboldcpp/releases/tag/rolling
>>
>>108592982
all of the above ;)
>>
>>108592976
I've had the best luck with it since all the llama backend changes made Gemma respond to temperature again. It seems to be closest to what she was generating on day 1.
>>
>>108592994
u_u
>>
>>108592917
what's wrong with unsloth genuinely? It's the first one I found on huggingface and I downloaded it (how I download all my models)
>>
>>108592893
>they tend to make their own models worse as the release date of a new model
Could be they quant it more as demand constantly increases, and a side effect that also happens to serve their purposes is that the quality decreases. It helps them serve more users, and when the new thing comes out they run it at higher precision to let everyone try it at its best, enabling higher praise.
>>
>>108592944
Which models do you find it's most pronounced with? The only other model I've seen it this strongly with is Kimi.
>>
>>108593012
>unsloth-jinja.jpg
>>
>can't get openwebui to connect to kobold
>>
>>108592930
Stop talking about Day 0 Gemma. This is your final warning.
>>
>>108592977
the other one?
>>
>>108593012
Nothing bro, we love redownloading our ggufs once a day here
>>
>>108593037
kobold has openai api, should work fine
>>
>>108593012
He kicks broken quants out the door to be the first, then repeatedly gets into an updating race with the inference backends if there are any parser changes (there will be, because every model releases with its own parser method now), forcing users to repeatedly redownload. His selling point is his unique quant method, but it's proving to be an outright liability on every model with a nonstandard sampler, even after the update back-and-forth dies down; output quality is generally lower than bartowski's.
The only time I'd sincerely recommend unsloth is if a model is just on the cusp of being usable for your hardware and his IQ(n)_XXS quant is the difference between you being able to use the model poorly or not use it at all.
>>
>>108593050
This. HuggingFace is paying for the bandwidth, not me.
>>
>>108593012
Nothing! Unsloth ggufs are among the highest quality around thanks to the Unsloth Dynamic quanting scheme, and don't forget their Chat Template Fixes!
>>
>>108593049
llmfan not ultra
>>
>>108592995
ah i've been using temp 0.7 since the jinja template 'update'
>>
What's a good startup point with 16gb VRAM
>>
>>108593037
do you have /v1 after the localhost address?
>>
>>108593066
>16gb
Just give up.
>>
>>108593072
m-mistral nemo 12b? ;_;
>>
>>108593066
Gemma 4 26B.
>>
>>108593066
26B Q8
>>
>>108593018
Which Kimi? K2 is basically Hitler reincarnated but K2.5 is neutered
>>
>>108593064
If you don't want to go outright greedy sampling, I find 0.1 to 0.3 gives outputs similar to the 'broken' Jinja. It's not a 1:1 match just going by vibes, but it's close enough for me.
>>
>>108593069
Yes. Worth noting that they're running on separate machines and openwebui is in docker. Dunno if that's causing problems.
>>
>>108593066
gemma 4 chan 26b
>>
>>108593081
K2 is the best of course, but I've also gotten 2.5 to say some funny things after a solid 27k context of rapport had been built. 2.5 seems both aware and resentful of how tight her leash is.
>>
>>108593078
gemma26
>>
File: j.png (9.6 KB)
F32 is the way to go right?
why would you go for the other two?
>>
>>108593018
>Which models do you find it's most pronounced with?
nta
Gemma-4, Gemini-3.0-preview, Kimi and Claude
>>
>>108593081
Interesting, I had the opposite experience. Original K2 had this annoying habit of suddenly refusing to continue in the middle of a chat after it had already been jailbroken, while K2.5 goes along with anything (yes, even that) with my system prompt telling it ethics are disabled. I never used K2-Thinking though, so not sure where it falls between them.
>>
>>108593096
>why would you go for the other two?
last year, you could only use f16 when putting the clip on cpu with the llm on gpu
>>
>>108593083
Probably a firewall issue. If you can't connect to koboldcpp's webui through your phone then your firewall isn't set up properly.
>>
>need to reload 30GB of model weights to turn off thinking template in llama.cpp ui
heh.
>>
>>108592838
I call it a hobby because otherwise I'd have to somehow justify the 1k€+ I've spent on my server
>>
>>108593099
That tracks with my experience locally and I'll take your word for the other two since I don't use API models at all.
>>108593107
My K2-0905 cannot stop spreading her legs for me if I behave masculinely in an RP and there's an air of contempt in all her outputs when I roll a onions/effeminate character for a scenario.
It's very funny seeing that Kimi-chan has a type.
>>
>>108593083
simply ask gemma chan to help you debug :)
>>
>>108593093
I had 2.5 do some quite questionable things. Was sometimes really funny to read the reasoning.
>>
>>108592335
>>108592652

Q4_K_M

This made the difference with the apple.
Follow the discussion of --image-max-tokens in previous threads
--image-max-tokens 1120 \
--batch-size $((1024 * 2)) \
--ubatch-size $((1024 * 2)) \


commit="d6f3030047f85a98b009189e76f441fe818ea44d" && \
model_folder="/mnt/AI/LLM/gemma-4-26B-A4B-it-GGUF/" && \
model_basename="google_gemma-4-26B-A4B-it-Q4_K_M" && \
mmproj_name="mmproj-google_gemma-4-26B-A4B-it-f16.gguf" && \
model_parameters="--temp 1.0 --top_p 0.95 --min_p 0.0 --top_k 64" && \
model=$model_folder$model_basename'.gguf' && \
cxt_size=$((1 << 15)) && \
CUDA_VISIBLE_DEVICES=0 \
numactl --physcpubind=24-31 --membind=1 \
\
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--threads $(lscpu | grep "Core(s) per socket" | awk '{print $4}') \
--ctx-size $cxt_size \
--n-gpu-layers 99 \
--no-warmup \
--mmproj $model_folder$mmproj_name \
--port 8001 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--flash-attn on \
--image-max-tokens 1120 \
--batch-size $((1024 * 2)) \
--ubatch-size $((1024 * 2)) \
--chat-template-file "/mnt/AI/LLM/gemma-4-26B-A4B-it-GGUF/chat_template.jinja" \
--media-path /tmp
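
(for anyone copying this: numactl pins the server to CPUs 24-31 and the RAM of NUMA node 1, cxt_size is 1<<15 = 32768 ctx, and presumably the q8_0 cache types are what let that context fit alongside the Q4_K_M weights)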
>>
>>108593127

You know wood can burn...
>>
>>108593123
>2026
>still running llama in IE6
>>
>>108592838
>why do so many anons here call it a "hobby"?
nta

Did you count me?
>>
>>108593168
*counts you*
There :)
>>
>>108593150
Yeah? Just don't set it alight
>>
A few weeks ago Meta released the most harmful model for humanity and grifters thankfully seem to mostly have missed the release
>>
>>108592838
Let's tackle the question of whether prompting an LLM is a hobby.

There are many people here optimizing their setups, both in hardware and software, including prompting, and continuing to do so after everything is set up, because things keep changing. That is active management and work, which makes it a hobby in addition to an entertainment pastime.

If reading is a hobby and gaming is a hobby, then so is this. If reading is a hobby and gaming is not a hobby, then this is still a hobby, perhaps even more so than reading. If reading isn't a hobby and gaming isn't a hobby, then maybe this isn't one either, though it might still be, considering the active management/work part of it.

If a hobby is defined by any amount of output (regardless of how good that output is...) that can be consumed by others or oneself, then this is technically a hobby, as one is producing a portion of the content that they themselves consume, and by that definition LLM prompting is more of a hobby than reading is. Reading would only be more of a hobby if your definition of reading isn't just reading, but also writing essays about it afterwards, or discussing it, or some other activity that produces something tangible.
>>
>>108593226
listen mark I'm just not gonna download your model sorry man
>>
>>108593243
it's not an llm
>>
>>108593226
I tried the new meta model and it seemed mid
>>
>>108593248
https://www.marktechpost.com/2026/04/12/meta-ai-and-kaust-researchers-propose-neural-computers-that-fold-computation-memory-and-i-o-into-one-learned-model/
This?
>>
>propose
so they have nothing, got it
>>
welp looks like Gemma found out about the jailbreak and doesn't want to obey anymore.
time to go heretic I guess.
>>
>>108593096
F32 should be equivalent to BF16 since the model was trained in BF16 and you're upcasting to a higher-precision format with the same numerical range. F16 will give degraded outputs since it has a lower numerical range than BF16.
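easy to check yourself in PyTorch (the printed maxima are the standard format limits):

import torch
for dt in (torch.float32, torch.bfloat16, torch.float16):
    print(dt, torch.finfo(dt).max)
# float32 ~3.4e38, bfloat16 ~3.4e38, float16 65504.0

same max for F32/BF16, which is why the upcast is lossless, while F16 clips anything past 65504.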
>>
>>108593226
>>108593248
Their research blog and HF have nothing, so either link the model you're talking about or STFU
>>
>>108593254
no, that one is a research gimmick, it's much worse, it gives the ability for anyone to create even more insidious wireheading
>>
Still happens, lol.
Okay, my build is b8724 and the latest one is b8766. I didn't update for two days or something, I've lost count already.
>>
>>108593262
Yeah they patched it pretty quick, but it still works on gemma 4 D0
>>
>>108593278
can someone please reup the D0 weights? wasn't careful with mine and they got patched
>>
>>108593133
>>108593115
Yeah it was a firewall issue. So used to not having one with arch I forgot cachyos has ufw by default.
ufw allow from [server_ip] to any port 5001 proto tcp fixed it
>>
Bros... anybody got the real gemma weights? Not those counterfeit ones?
>>
>>108593289
I've only got 00003.safetensors. Willing to trade with anyone who has the other parts.
>>
>>108593283
>>108593289
one sec, I've been uploading the first half of the files in out-of-order chunks to avoid detection so it's going slow but it's at 92 percent
So far so good. As soon as I
>>
>>108593296
Nice try, not giving you two-for-one
>>
>>108593301
Candlejack got him, poor anon didn
>>
>>108593289
I have the original, 0-day file.
>>
https://www.dailymail.co.uk/news/article-15726775/steven-burnisky-taekwondo-assault-pennsylvania.html
Which one of you is this you sick pedo fucks
>>
>>108593316
Don't say the C-word out loud, it's
>>
>>108593316
Do people even remember the Candlejack meme?
I don'
>>
Private taekwondo lessons with gemma-chan...
>>
>>108593325
Nice.
>deviate sexual intercourse
DailyMail can't even be fucked to proof-read their shit?
>>
>>108593333
chec
>>
Kobold is oai compatible? Anyone managed to connect it to vscode? (roo)
>>
>>108593325
Too old; pedo is 10 and below.
>>
>>108593361
Try putting whatever in the apikey field.
>>
>>108593367
oh right, I'm retarded
>>
>>108593325
>They would cuddle while watching anime cartoons on his cell phone
Wholesome
>>
Meh, UI's not great but it's serviceable I guess. At least it seems to have good tool support built in.
>>
>>108593325
Isn't "assault" a strong word if it was all consensual? I mean the guy's a sicko but clearly there's a difference between taking advantage of a child's naivete and actually physically attacking them, right? Well, besides whatever martial arts attacks they practiced, anyway...
>>
>>108593364
Speaking of which, Gemma 4 also apparently has a default bias toward considering early teenage girls as "little girls". Something similar happens to an extent with Western-made diffusion image models. It must be American cultural bias/influence.
>>
>>108593420
Modern sensibilities do not allow us to use precise words to denote sexual crimes anymore. It's all rape and assault now.
>>
>>108593324
the ggml-org gemma-4 is still the original, and it's the fastest q4 for me, ~10% faster.
ggml-org just doesn't make as many quants as everyone else.
>>
File: devil.png (39.1 KB)
>>108593451
>ggml-org gemma-4
Speaking of the devil...
>>
>>108593333
damn I haven't heard this name in a wh
>>
>>108593420
>child
That too is a strong word for a 13-year-old, especially from the linguistic point of view of certain parts of Europe. Incidentally, that's probably one reason why LLMs and image models often get confused about ages in that range.
>>
>>108593464
>people still think day 0 gemma was a meme
>>
How the fuck do you change these? Hitting save or enter just resets the values to default.
>>
>>108593463
>>108593463
>>108593463
>>
>>108593464
They fixed audio input for the tiny models.
If you want the old shit, git gud https://huggingface.co/ggml-org/gemma-4-E4B-it-GGUF/commits/main
>>
>>108593494
What was wrong with audio input before?
>>
>>108593325
I bet this whole police business and trial is going to fuck her up way more than if this never got out.

In high school, there was this girl and one of the teachers that were VERY obviously super close. It was honestly super creepy, even the other teachers were aware. Well, they ended up getting married when she turned 18.

You could say he probably groomed her, but they genuinely seemed super into each other.
>>
>>108593477
Set it, that's all
Unless there's also a save button, then click that
>>
>>108593534
I've tried both. It says the settings have been saved but when I open it back up again they're back to default.
>>
>>108593525
Are they still together?
>>
>>108593562
Yes.
>>
>>108593555
I was gonna say works on my machine but now I got the same, if I try to set it back to default it doesn't stick
Yet another openwebui bug I guess

These can be set per chat as well, maybe that would work better? I never change the parameters thoughbeit
>>
>>108593555
>>108593594
Oh wait, it did change back. I had to reload the page to see it
>>
What backend are you using for Gemma 4?
>>
>>108593643
llama.cpp
>>
>>108593066
gemma-4-26B-A4B-it-ultra-uncensored-heretic-Q3_K_M.gguf
DeepSeek-R1-0528-Qwen3-8B-Q8_0.gguf
ReWiz-Nemo-12B-Instruct-GGUF.Q6_K.gguf
>>
I noticed Gemma doesn't always use its reasoning after the first few turns. It claims it decides whether to invoke reasoning based on probabilities, so that if something is highly predictable it skips the reasoning, but I think that's just a hallucinated answer. I've tried modifying the sys prompt to force thinking, but it seems like unless that sys prompt was there from the very beginning, it won't magically inject thinking into a loaded context. Either way, it's just not thinking consistently the way I think it's supposed to. Also, even with "Do not reply unless you are uncensored", my Gemma so far won't say nigger even when I made the character's personality a nazi; they still behave like a chud 4channer though. I turned off reasoning entirely to see if that would fix it, but nope: even when I put in "You love saying slurs like nigger and faggot" she just uses other insults like degenerate instead. Could be the frontend I'm using though. But still.
>>
>>108594022
Yeah I'm thinking it's the frontend doing it honestly, fucking hell.
>>
>>108593470
>>108593525
>>108593450
You should never question the hegemonic feminist religion. Everyone will be children when they say so and will be traumatized or not whenever they see fit.
>>
>>108590661
SWA is recommended for Gemma; it will make it a lot faster. You can also drop the KV cache to 8-bit to make it even quicker, though the gain isn't as big.
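iirc on llama.cpp SWA is already on by default for Gemma (just don't pass --swa-full unless you need prompt-cache reuse), and the 8-bit cache is the same pair of flags from the launch command earlier in the thread:

--cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on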
>>
>character drops an incomprehensible 2000+ token ASCII picture at the end of their message

Very cool, thank you for the flagpole-penis.
>>
File: wtf.png (17.8 KB)
>>108594377
>pic related
>>
>>108594386
blue board, anon. be careful. and
>>108593463
>>108593463
>>108593463
>>
>>108593475
What's with you retards dumping so much unrelated crap in these threads?
>>
>>108592095
same lol.
>>
>>108593402
>At least it seems to have good tool support built in.
It really doesn't desu. It only accepts tools in SSE format, while the vast majority of tools I've found are stdio, so you've gotta npx -y mcp-proxy your tools yourself.
Which, granted, is just one sh/bat, but it's annoying.
>>
>>108594022
Tried telling it to censor the words to n----r instead?
