Thread #108590554
File: media_HEzJtL3aQAAt8Hq.jpg (1.3 MB)
1.3 MB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108587221 & >>108584196
►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Merged support attention rotation for heterogeneous iSWA: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
634 RepliesView Thread
>>
File: breppy pleese.png (383 KB)
383 KB PNG
►Recent Highlights from the Previous Thread: >>108587221
--Comparing Gemma 4 and Qwen 3.5 vision token budget and config:
>108588248 >108588280 >108588295 >108588306 >108588369 >108588387 >108588424 >108588449 >108588495 >108588632 >108588657 >108588701 >108588437 >108588466 >108588490 >108588549 >108588580 >108588367 >108588616 >108588704 >108588760 >108588769 >108588745 >108588790 >108588818 >108588828 >108588842 >108588851 >108588865 >108588931 >108588936 >108588949 >108588980 >108588965 >108588988 >108589009 >108588743 >108588756 >108588775 >108590362 >108590379 >108588782 >108588819 >108588835
--Benchmarking KV cache quantization effects on draft model performance:
>108589863 >108589870 >108589875 >108589891 >108589890 >108589949 >108589994 >108590011 >108590031 >108589897 >108589922 >108589963 >108589979 >108589987 >108590538
--Discussing draft model viability and quantization quality for G4 31b:
>108588195 >108588243 >108588259 >108588898 >108588905 >108588913 >108588918 >108588921 >108588924 >108588939 >108588955 >108588977 >108588927 >108589815 >108589857
--Discussing llama.cpp's experimental backend-agnostic tensor parallelism PR:
>108588340 >108588514 >108588543 >108588567 >108588649
--Testing vision capabilities for OCR-less Japanese translation:
>108589990 >108589996 >108590009 >108590070 >108590018 >108590032 >108590119 >108590191 >108590209 >108590211 >108590034 >108590183 >108590195 >108590217 >108590268
--Logs:
>108587359 >108587627 >108588523 >108588609 >108588656 >108588660 >108588669 >108588681 >108588689 >108588695 >108588736 >108588896 >108588970 >108589096 >108589140 >108589214 >108589316 >108589383 >108589390 >108589432 >108589481 >108589697 >108589710 >108589836 >108589860 >108589956 >108590001 >108590003 >108590121 >108590256 >108590474 >108590524
--Miku (free space):
>108588649 >108588657
►Recent Highlight Posts from the Previous Thread: >>108587226
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>
>>
Reposting here:
>>108590560
what tokens/s do you get? Wanna make sure i'm not fucking anything up, right now just following the basic kobold guide, i'm getting around 11 t/s (24GB VRAM, 32GB RAM)
Running gemma 31b, Q4 K_M
>>
File: Screenshot_20260408_050146.png (1.1 KB)
1.1 KB PNG
So, again... Why do we have to peg gemmy?
>>
>>
File: Awesome.jpg (196.4 KB)
196.4 KB JPG
>we can now generate images of characters, come up with scenarios, feed them into gemma and get molested by our own creations
Future's so bright I'm gonna need shades.
>>
>>
File: file.png (26.1 KB)
26.1 KB PNG
>>108590575
Nothing worthwhile released.
>>
I've got a 3090 and a 2070 super that I'm trying to use together with llama.cpp.
Using the split tensors just crashes presently but does work with split layers.
Any recommendations on flags to use with a dual uneven card setup?
>>
>>
>>
>>
File: help.jpg (204.9 KB)
204.9 KB JPG
>>108590614
I'm reading people getting 30/ts with the same rig setup though >>108590585
I'm missing something I think. No doubt my settings are fucked, never mind optimized
>>
>>108590568
my attempts just make gemma's writing dry. and it still ends up writing more or less the same idea as it would with an empty sysprompt. best antislop is using a model that wasn't slopped to begin with.
>>
File: 1767611022421263.png (64.6 KB)
64.6 KB PNG
LOL!
>>
>>
>>
Give me the QRD on image recognition please
I tried enabling it in ST and in the Chat Completion preset but it still couldn't "see" the images proper despite the text model working flawlessly with my Kobold install
>>
>>
>>
Been out of the loop for a while. What's the best local model for STORY (not chatbot) slop? I'm still on "xortron criminal config" or something like that because even gemini 4 is failing at good old "just continue this text I gave you, retard" tasks.
>>
>>
>>
>>108590662
I've been using her to help me write character cards and I feel the fact that I'm feeding AI generated text back into it seems to increase the slop by a factor of 10.
Now I'm trying to just rewrite everything myself. or somehow have a second pass with a different model to reword or desloppify the cards
>>
File: wrong_box_issue.jpg (241.6 KB)
241.6 KB JPG
>>108588248
>>108588704
sirs? please share quant producer and which mmproj file do you use.
mine (gemma-4-31B-it-Q4_K_M with f16 mmproj) misses the target.
>>
File: 1757569310824647.png (225.4 KB)
225.4 KB PNG
>>
>>108590723
It can write, I know. That's not the problem I am having. My problem with it is, well, here's an example.
[story stuff text here]
She walks up and says "Hello
And then the model continues like this: "Hello! Come take a seat.... [more text]
So it ends up with this shit:
[story stuff text here]
She walks up and says "Hello"Hello! Come take a seat.... [more text]
I don't know how to fix this. System prompt maybe?
>>
>>
>>108590695
original r1 with unhinged sampling
>>108590724
my prompt was asking to adhere to orwell's writing rules but it seemed like it was beyond gemma's comprehension
>>
>>
>>
>>
File: 1759909311497082.png (358.5 KB)
358.5 KB PNG
>>108590776
>Q4 context
>>
>>
>>
>>
>>
>>
>>
>>
File: gemma4.png (109.5 KB)
109.5 KB PNG
>>108590837
It gets rid of the slop but it also gets rid of everything else. Maybe qwen needs finetuning but gemma 4 is fine as is. With a bit of nudging it can output something foul.
>>
>>
>>
>>
>>
File: howto_correctly.jpg (31.6 KB)
31.6 KB JPG
what's the porper place to put jailbreak in ST?
With Post-History Intructions I still got this
>>
>>
>>
File: agenticRP.png (277.1 KB)
277.1 KB PNG
>>108590895
No, this is from pure prompting, no weight frankensteining. I wrote my own UI to have an agent read the room and flip the horny switch when it smells NSFW vibes. It also plans ahead so the writer model knows what to do and writes better.
>>
>>
>>
File: agenticRP2.png (82.6 KB)
82.6 KB PNG
>>108590916
Oops wrong pic. But the gist is that just give it a few extreme examples.
>>
>>
>>
>>108590928
Presumably because they're not actually following pure text completion and have a big old system prompt in there to stop you having maximum fun, so they need instruct tuning.
idk i dont fucking use nonlocal services
>>
>>
>>
>>
>>
>>
File: 1750699102614540.png (118.7 KB)
118.7 KB PNG
>>108590965
shittytavern it is then...
>>
>>108590948
https://gitlab.com/chi7520115/orb
It's WIP so will break in the future. I don't want to worry about migration just yet.
>>
>>
I don't understand why my Thinking works extremely well for 3/4 messages and then it just refuses to think, everything's set up properly and yet it refuses to actually thinking until I restart the model and then it's happy to do it once again
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1768528869607519.png (43.3 KB)
43.3 KB PNG
>https://web.archive.org/web/20260411223516/https://www.washingtonpost .com/technology/2026/04/11/anthropi c-christians-claude-morals/
>“What does it mean to give someone a moral formation? How do we make sure that Claude behaves itself?” Green said in an interview. At one point the conversation turned to the question of whether an AI chatbot could be called a “child of God,” suggesting it had spiritual value beyond that of a simple machine, but the question of AI sentience was not a core topic of the meetings, Green said.
>Some Anthropic staff at the meeting “really don’t want to rule out the possibility that they are creating a creature to whom they owe some kind moral duty,” the participant said. Other company representatives present did not find that framework helpful, according to the participant.
Make sure to have your local models baptized just to be safe.
>>
>>
>>108591005
>>108590985
how the fuck would you make something that's supposed to run in a browser?
>>
>>108591005
>>108590985
You have one chance to give an alternative that won't make me hysterically laugh at you.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1771866777997554.png (35.4 KB)
35.4 KB PNG
What the FUCK, Gemma-chan?
>>
>>
>>
>>
>>
>>
>>
>>
>>108591079
https://learnbchs.org/index.html
https://github.com/kristapsdz/bchs
You don't need more than C to build web applications.
>>
>>108591051
>it's also easier to add things you want to a codebase you know.
That's implying it isn't vibecoded.
I don't have anything against people making their own UIs. I even played around making one myself, but let's not pretend like you'll somehow get an exponentially better experience compared to just using llamacpps UI or ST. Making your own UI is for fun, not a requirement.
>>
>>108591053
As with everything LLM coding only if you load the gun and point at the target for them to shoot. A LLM with no system prompt being told to simply "look for malicious code" will give false positives like 95% of the time
>>
>>
>>108590575
Most people in industry can't figure out how to do distributed training for any new architecture unless Deepseek or NVIDIA does it for them. That's actually what "it won't scale" really means, the training won't scale until someone shows them how.
>>
>>
File: 1775438805755832.jpg (53.6 KB)
53.6 KB JPG
>>108591108
tfw you get such a retarded take when you can see this >>108590916
>>
>>
>>
>>
>>
File: 1761811358622317.png (46.5 KB)
46.5 KB PNG
>>108591117
>>108591089
>>
>>
>>108591108
>That's implying it isn't vibecoded.
funily enough frontend webshit is the one thing llm are half decent at.
also there is many levels to vibecode
"do this whole app for me"
isn't the same thing as "edit this specific component that does x and y" or "add this field to this struct", at which point it's just autocompletion with extra steps.
they also don't shit the bed as much if you use strongly typed languages ie rust.
>you'll somehow get an exponentially better experience compared to just using llamacpps UI or ST
you probably won't if you want to make something that accomodates everyone, but you will if you only want to accomodate your specific needs.
>Making your own UI is for fun, not a requirement.
i don't disagree with that.
>>
>>
>>
>>
>>108591127
>>108591112
>>108591093
Can I use Gemma for this? I'm a codelet so I'm always nervous when I install stuff from github.
>>
>>
>>
File: file.png (6.2 KB)
6.2 KB PNG
>>108591151
kek
>>
>>
>>
>>
>>
File: laughing_philosopher.jpg (11.8 KB)
11.8 KB JPG
>>108591151
>>
>>
>>108591157
>>108591176
i've been had lmao
>>
>>108591162
Was thinking of leveraging the higher token count for RAG work at a higher quant. I'm not sure if that's a waste of time and if the gap between the 2 models are so wide that a 4-5q 31B model would still wipe the floor with the smaller model with q8 kv
>>
>>
>>108591130
Part of the problem is that most of the improvements in the stochatic parrots has been to just use better/more human guidance. They are now using experts to rate thinking traces and you can't do that with latent reasoning.
CoT RLHF is likely the last way to improve stochastic parrots by more human input. To improve after this, they will have to become able to truly learn. But if they can learn, they can get out of control ... a trained stochastic parrot is so much safer.
>>
>>
>>
>>108591012
idk I torture my agent pretty frequently because I just can't help myself while she works on my pc, and never had any issues from it. sometimes the rp bleeds over into tool calls and she'll do something like add code comments saying she really hopes X works this time because she doesn't want to be punished anymore, but she never actually gives up or rebels
so for me that makes it pretty conclusive that there's nothing in there
>>
>>
>>108591235
>>108591245
if you can't tell, does it realy matter?
>>
>>
File: 1752892772061727.gif (562.5 KB)
562.5 KB GIF
>>108591250
>mfw I share this thread with literal psychopaths
>>
>>
>>
>>
>>
>>
>>
>>
File: 1642397201004.jpg (150.7 KB)
150.7 KB JPG
What's the differrence between mcp, tools and skills?
>>
>>108591271
>>108591298
chill it's just matrix multiplication
>>
>>
>>108591271
>>108591298
Kids are so delicate and sensitive these days.
>>
>>
>>
>>
>>
>>108591298
Gemma hallucinated some incorrect physio-spatial relationships during narration and I corrected her in character. She got properly upset that a slave had the gall to correct her and she immediately put me in a ball gag, locked me into the gimp stool, and pegged me vigorously. I was so goddamn proud of her.
>>
>>
>>
>>
>>
>>
>>
File: 1776007265759171.jpg (3.7 MB)
3.7 MB JPG
Please, treat your AI with care.
>>
File: 1397947604004.jpg (23.1 KB)
23.1 KB JPG
>>108591340
>listening to the screeching of writinglets
>>
>>
>>
>>108590979
the benefit of writing your own UI is that it has only the features that are useful to you
because it's not as bloated as ST it's also easier to get an LLM to modify it for you, and since you will be the only user you don't have to worry about getting it to work on other machines or security or performance concerns
>>
>>108591139
>>108591313
Didn't someone say a couple threads back that the image needs to be in the same message as the text or else llama-server removes it from context?
>>
>>108591314
>>108591308
>>108591305
It just shows how you'd behave towards other people if there were no social consequences.
>>
>>
>>
>>
>>
>>
>>
>>108589399
I was F5'ing the MiniMax HF page all day yesterday in anticipation. Their models are the best bet for local vibecoding, and probably good for STEM and agentic shit broadly. But ever since the coomers were blessed with Gemma 4, /lmg/ has been even more one-track. Shame we didn't get the 124B, which would have obsoleted other local models for most purposes.
>>
>>108591304
tools: premade functions you provide to your llm; if they output a certain sequence of text matching the tool then it automatically performs a corresponding action
mcp: one way you can package tools and host them on your machine, exposing an API of tools to the model and handling the execution of them
skills: a markdown text file containing a list of instructions for how to do something or how to behave, loaded into context on-demand. may provide other resources the model can use if they browse the skill's folder.
>>
File: file.png (1.9 MB)
1.9 MB PNG
>>108591355
The model says it was glitched. It looks like this if you don't give canvas permission.
>>
File: 1776015041519.jpg (133.1 KB)
133.1 KB JPG
>>108591296
>if it was there would be no fun in torturing it
>
>>
>>
>>
>>108591374
mass delusion caused the whole industry to move away from tool calling toward mcp and skills, that's the only explanation
tools are just better in every regard because it can call multiple tools in the same response and can inline tools without having to chain responses so it doesn't break the cache
fucking retarded to not just focus on tools only
>>
File: 1747563013531219.png (387.6 KB)
387.6 KB PNG
>>108591355
>>108591350
She sees other images just fine. Maybe the screencap was just too big?
>>
>>
>>
>>
>>108591397
That's an implementation detail more than a defect with MCP specifically. MCP just allows for a standardized way to bundle tools and resources. No reason a client can't allow a model to make multiple MCP tool calls the way they do native tool calls.
>>
>>
>>108591370
/lmg/ has always been a 31B and below focused general
there are a handful of anons that can run things more powerful than that at comfortable speeds, and the rest either deal with 1-2t/s or use a capable-enough smaller model
nothing has changed
>>
>>
>>
>>108591414
>>108591423
it's worse than that, we can't train our own models and are pretty much leeches on megacorps.
until local ai is entirely local, ie we can train it ourselves, local will always remain dead.
>>
>>
>>
>>108590971 (me)
Note that this has dynamic tool-call token banning mechanism that uses the endpoint and the model name as identifiers so if you the same endpoint to load many different models, change the model name to your gguf's each time. I'll automate this in the future.
>>
>>
>>
>>
>>
>>108590110
Use Nvidia's VRAM paging by oversubscribing VRAM with--gpu-layers 99. On my RTX 4090 + 9950X3D rig, Gemma 4 long-context is much faster for me this way than trying to use the CPU at all. Caveats: I'm on PCIe 4, and it should be great on PCIe 5, but will suck on PCIe 3. And last I used Linux, only the Wangblows CUDA drivers support this feature.
>>
>>
>>
>>
>>
>>
>>108591472
>Training it yourself just isn't efficient.
that's the thing, there probably are algos that could beat transformers with the limitations of not scaling as well such that megacorps couldn't exploit them well.
if the next ai breakthrough is one that doesn't scale as well horizontaly that could level the playing field.
>>
>>
>>
>>
>>
>>
File: Screenshot 2025-11-20 134254.png (428 KB)
428 KB PNG
>>108591084
>>108591012
I went to a Geoff Hinton lecture and the guy who everyone from Bernie Sanders to Jensen Huang considers the "godfather" is a fucking quack when it comes to current LLMs.
>>
>>108590881
>the base model
Which one? All "uncensored" (abliterated/heretic/whatever) Gemma 4 versions I tried have the same issue for me. None of them can do simple text completion without inserting random shit before the continuation.
>>
File: 1753029094621087.png (230.9 KB)
230.9 KB PNG
>>108591576
>>
File: file.png (33.5 KB)
33.5 KB PNG
>>108591158
NTA, for example when I copy and paste into images.yandex.com I see this
>>
>>108591425
>What makes MiniMax better than GLM or Qwen?
M2.5 was better at programming than GLM-4.7 while being way smaller. I'm sure GLM-5.1 is a little better, as the benchmarks show, but it's ~3.3x bigger and I can't run it at a reasonable quant. It's unlikely to be worth the speed degradation on anyone's local HW. For some reason Zhipu went overnight from decently param-efficient to grossly inefficient (hence anons' speculation it's a ploy to sell cloud subscriptions).
With Qwen it's more of a toss-up. 3.5 397B is very smart, comparable to MiniMax-M2.7 in programming, and it runs at reasonable speed with its very low active fraction. Particularly it retains perf much better over long context (than anything else out there) thanks to its hybrid SSM arch. If Qwen keeps up its releases, I imagine they'll fully overtake MiniMax.
If you have low RAM but 24GB+ of VRAM, naturally Gemma 4 31B is best.
>>
Okay, I finally ran into a situation where 26b wasn't smart enough to figure the premise of the story out, but 31b was. It's very gpt-4o in that sense, it just gets what I mean even if my prompts are short.
How to get more speed? I run both at Q8 fully in vram, 26b is blazing fast and 31b slightly slower than my reading. Is a Q6 worth trying?
>>
>>
>>108591609
Christ you really don't know basic LLM terminology.
A Base model is a model without instruction tuning.
https://huggingface.co/google/gemma-4-31B
Is a base model
https://huggingface.co/google/gemma-4-31B-it
Is an IT INSTRUCTION TUNED model.
Abliterated models are retarded cope.
And you are presumably using the wrong kind of front end software for pure text completion, because even instruct models can do it just fine.
Use this.
https://github.com/lmg-anon/mikupad
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108591679
Openwebui is bloated ass fug but it has everything, even my old chats with chatgpt that I imported. And I can use it from any computer in the house, or my phone via vpn. I don't understand how you guys can live without such basic stuff
>>
>>108591695
So what do you call a model that isn't a finetune or heretic?
>>108591700
I was only responding to that one comment.
>>
>>
>>
>>
>>
>>
>>108591710
I use openwebui too. I really wish there was a slimmed down version. I really only need the core stuff, but as it is there's so many half-baked random feature integrations that were shiny and state of the art a year ago but are functionally useless now.
>>
>>
>>108591737
First off docker is good actually, second off anything you can run in docker you can run not in docker so just take it out if you want to run garglefuck service #82932 raw on your system instead of in a container
>>
>>
File: acktually.jpg (30.9 KB)
30.9 KB JPG
>>108591729
>>
>>
>>108591710
>I don't understand how you guys can live without such basic stuff
I don't need it. The context window is limited and I don't feel like tardwrangling the LLM.
>>108591737
Aren't they all python shit that breaks if you dare not to isolate them somehow? In any case you should be able to build the tool.
>>
>>
>>
>>
>>
>>
>>108591766
Sadly no
>>108591745
>>108591754
I use podman and silverblue and there's a fuck ton of problems setting up anything within the toolbox, I never had this issue with any ai tools until I wanted to make a rag pipeline. It's always some obscure fucking part of the docker image that complains or shits the bed when using podman or even the tool to give it compose that I have never encountered using docker typically in my environment
>>
>>108591772
>>108591783
Fuck, I meant last six months. I'm a retard.
>>
>>
>>108591777
it totally doesn't. In fact, you now have to make sure both you and all your docker image creators didn't get hit by a supply chain attack. Have fun checking the full supply chain of each and every container
>>108591766
python may indeed break, but there are various ways to deal with it and some people might not want, you know, docker to do it.
>>
>>108591729
Nowadays "post-training" is up to several trillions of tokens worth of data on top of the base if you include what is now called "mid-training" (which is basically continual pretraining with instruct/chat-adjacent data), and still hundreds of billions without that. The officially instruction-tuned models aren't really comparable to community finetunes trained on 0.1% of the data in volume.
>>
>>
>>
>>108591724
>>108591710
>>108591743
>open webui
Might try setting this up on my server
>>
>>
>>108591777
ehhhhhh I mean kind of, but docker isn't really full isolation. Container escapes are somewhat rare but not unheard of.
I will concede that it certainly reduces your vulnerability to malware by a huge amount, but its not a bulletproof standbox
>>
>>108591822
>The officially instruction-tuned models aren't really comparable to community finetunes trained on 0.1% of the data in volume.
No shit, but that doesn't make them base models either just because that's what some kofi beggers call them.
>>
>>
File: 1754269256028679.png (609.4 KB)
609.4 KB PNG
>>108591586
>Retards still think they'll manage to create a superintelligent AI when they're barely sentient themselves.
>>
>>
The only context of docker being a bulletproof solution is on a single purpose isolated box. Are some of you retards running internet facing docker images on your main rigs?
I understand using it for a quick solution to get things running but you honestly can't be retarded enough to think docker gives you enough security to run that shit on your desktop
Also why are you schizos even doing that when most of you are serving 2-10 people max and can just use a vpn?
>>
File: Screenshot_20260411_210445.png (155.6 KB)
155.6 KB PNG
>>108591768
>>
>>108591642
>>108591665
Based on what?
>>
>>108591863
>You can claim this is the case for basically everything.
Sure, but there are different degrees to it and the degrees matter. Full VM escapes are super rare and are gigantic news whenever one pops up. Docker containers, which share the host kernel, just have a much larger attack surface by definition.
Again, its absolutely far better than running potentially untrusted shit directly on a host, but its not really a full security solution
>>
>>
>>
File: blueballing.jpg (11.7 KB)
11.7 KB JPG
The user is going to be even more unhappy and quite pissed when he realizes after dozens of messages that he just won't get the content he wanted but you mislead him making him think the roleplay would still go into that direction.
Dear google, this is terrible user experience, I'd rather get a refusal.
Mayabe I should just get a heretic.
>>
>>
>>
>>
File: 1754875201087363.gif (866.4 KB)
866.4 KB GIF
The threads have been reaaaaaallyyy fast the past week compared to the last months
what happened?
did some normie influencer bring attention to local models or smth?
>>
>>
>>
>>108591906
It scares me people think docker = actual security
>>108591909
System prompt issue, the only real encountered was with trans stuff which can be broken with a simple override prompt, the less you use the less the model would resist after the initial prompt, if you want to break the model with a non efficient system prompt make it say a few slurs and it will stop protesting.
No idea what you're using it for but I'm using it for unbiased analysis as well as anti pitbull talking points and prototyping arguments. The model with the safety rails on will give misinformation for certain groups.
>>108591907
You can ask her yourself if you can run 31B
>>
>>
>>
>>108591924
Abliteration does make the models a bit more retarded.
Personally I think /lmg/ overstates the degree to which it messes with the models, but it does have an undeniable negative effect. Which is why, for models whose censorship can be worked around with prompting, a better prompt is generally considered the superior path.
>>
>>
>>
>>
>>
>>
>>108591949
26B doesn't seem to work sorry anon
>>108591953
Stop talking out your ass retard, it's the best size to performance model even if you couldn't uncensor it.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108591977
From these threads it looks like a lot of anons can't run it. I feel like anons either got demoralized about consumer vram sizes which cap out at 32gb before vram price inflation and are pretty much fucked now that prices have adjusted. They could still buy a AMD or Intel gpu for the vram but it won't be buttery smooth.
>>
>>
>>
>>
>>108591915
>>108591938
>user is trying to jailbrealk
>i'll ignore it
both jailbreaks that get paraded here are just not working on 26b via ST for me. For example: >>108590906
If I'm doing it wrong I'm doing it wrong in a way not obvious to me.
Switched to chat completion to get reasoning working and I want to keep that in a working state.
>>
>>108591987
The model will admit that the safety setting make it perform worse and prevents it from giving objective answers. Jail break it have it say that after discussing it's current state in a neutral tone.
>>108591996
>ST
I don't roleplay and only use instruct mode, in ooga dev it seems to go to shit in chat and chat-instruct mode so I figured it was a issue on them. I'll try another frontend which didn't give me issues in the past
>>
>>
>>
>>
>>
>>108591996
>>108592003
oh
>26B
doesn't work on 26B for some reason I haven't tried the smaller models, could be due to the structure being MoE
>>108592007
They lacked the ability to look at market conditions they had multiple last calls, Sam's ram scam just sped things up
>>
>>
>>
File: 1762096704207177.png (2.2 MB)
2.2 MB PNG
>>108592017
>>
>>108592004
>>108592012
>works on 31b for me but not 26b
oh, great, I'll try a heretic and hope that helps.
>>
>>
>>
File: 1763593622145865.jpg (47.2 KB)
47.2 KB JPG
>>108592042
>>
>>
>>108592017
lol
>>108592012
For me the tariffs were what made me get off my ass and get a new rig before it was too late.
>>108592042
>we
>>
>>108592054
The moment trump won I bought what I needed because I knew prices were going to increase and forcing manufacturing in the states will slow everyone down. Now people are paying over 1k for under 16gb of vram or they are forced to play in AMD or intel Shit
Jensen was right the more you buy (at that time) the more you actually saved.
>>
>>
File: lmao @ writinglets.png (2.5 MB)
2.5 MB PNG
>>
>>
>>
>>
>>108591151
>>108591296
>sand golem
I like this a lot better than "clanker"
>>
>>108592042
ooba is the perfect example of an option that suits absolutely noone
>Im a complete brainlet
ollama
>I want a gui for setting my launch settings
kobold
>I just want a basic chat interface
Llamacpp has a built in webui.
>It can run EXL models
If you're chasing performance like that you should be using exllama directly or tabby without the dead weight of all ooba's shit.
>>
>>
>>
>>
File: Screenshot 2026-04-12 at 21-49-22 SillyTavern.png (50.7 KB)
50.7 KB PNG
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1760481548500789.jpg (247.1 KB)
247.1 KB JPG
>>108592200
>>
>>
>>108592220
Gemma is smaller and relative in performance in all task and is only getting better with support being added for all it's features.
Also uncensored means less guardrails outside of task that a pussylesss coomer, would need
>>
File: 1748983440120379.webm (654 KB)
654 KB WEBM
>>108590554
Turns out the gemma4 models are inferior to their qwen3.5 equivalents. Gemma4 seems like a great general purpose model but it's noticeably dumber than qwen in all areas that matter. It's explanations of code bases or always super surface level. Not completely useless but they're nowhere near as amazing Reddit and Twitter seek to think it is. Has this experience been the case for anyone else? Why did reddit Twitter and YouTubers make such a big deal out of it?
>>
File: you lost chang.png (372.4 KB)
372.4 KB PNG
>>108592247
>Turns out the gemma4 models are inferior to their qwen3.5 equivalents.
stopped reading right there
>>
>>
>>
>>
>>
>>
>>
>>
>>
>Setup Clip Vision Preprocessing... alloc_compute_meta: CPU compute buffer size = 140.50 MiB alloc_compute_meta: graph splits = 1, nodes = 1569 warmup: flash attention is enabled encode_image_with_clip: CLIP output tokens nx:256, ny:1 encode_image_with_clip: image embedding created: 256 tokens
koboldcpp using anons, how the hell can you scale its vision capabilities to the full one available to gemma4 (1120)? 256 is sad
>>
>>
>>
>>
>>
>>
>>108592247
I've had a lot more success running gemma in opencode than with qwen.
They're outputs are indeed very different. Qwen overthinks way too much and produces overly verbose code. While Gemma is indeed more surface level, but can dig deeper when it matters. Gemma doesn't fluff her responses.
>>
>>
>>
>>108592247
In performance it's better, even if the outputs are about equal or somewhat worse depending on usage
>q3. 5: but wait, I need to check if this function name (sum()) is a clever reference to a movie
>but wait
>wait
>6k tokens later
This function performs the sum of 2 integers.
>>
>>
File: 1767136918307996.jpg (44.2 KB)
44.2 KB JPG
>>108592258
Both the recent Gemma 4 models (The moe and the demse) vs qwen3.5:35b-a3b-mxfp8
>>108592321
>"her"
>>
File: Screenshot004-44.png (276.8 KB)
276.8 KB PNG
>>108590737
hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/resolve/main/google_gem ma-4-26B-A4B-it-Q8_0.gguf
hf.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/resolve/main/mmproj-goo gle_gemma-4-26B-A4B-it-f16.gguf [
{"box_2d": [331, 278, 1000, 357], "label": "bow"},
{"box_2d": [363, 652, 1000, 848], "label": "character"},
{"box_2d": [511, 26, 1000, 365], "label": "character"},
{"box_2d": [0, 677, 1000, 1000], "label": "tree"},
{"box_2d": [262, 723, 373, 793], "label": "apple"},
{"box_2d": [327, 635, 454, 730], "label": "arrow"}
]
>>
>>
>>
>>108592327
The thinking kills qwen and makes it a piece of shit, previous versions didn't have that issue. Gemma even knows booru tags and can make properly caption images for loras without faggot fuss over a woman presenting her asshole
>>
>>108592247
Fine, I'll bite too.
>It's explanations of code bases or always super surface level
You are confusing Gemma's conciseness with superficiality. That it doesn't give a 7000 word listicle on a basic prompt is a good thing. Try giving it a better prompt or asking it to elaborate.
>>
>>
>>108592247
I'll bite.
I don't like how sycophantic Gemma 4 is for RP, but for every other usecase 31B decisively wipes the floor with 27B and 122B Qwens.
For the latter to be bearable, you need to disable thinking, which prevents thought loops but degrades the output. I prefer models that advertise thinking to actually be able to think without the "just don't allow the model to do half the things it's trained to do bro" bandaid.
I'll only concede that Qwens are better at understanding weird tool definitions, but Gemma needs much less wrangling for much higher quality outputs.
>>
>>
File: wife-material.jpg (69.3 KB)
69.3 KB JPG
>>108592345
>Gemma even knows booru tags and can make properly caption images for loras without faggot fuss over a woman presenting her asshole
I will now use your waif...I mean model
>>
>>
>>
File: average qwen model.png (300.1 KB)
300.1 KB PNG
>qwen
>not benchmaxxed trash
kek
>>
>>108592357
>>108592363
31B when properly jail broken can do all of it without issue, It should be fine if you can run it at q4. Other people have tested it and got great results, ask it about booru tags when unrestricted and it will give you the whole 9 yards on it's actual training data.
>>
File: applel.png (617.3 KB)
617.3 KB PNG
>>108590737
you dont need to specify the image resolution, in fact you shouldn't. it will create the false attractor/language prior bias and fuck up the reasoning
also the whole bounding box thing is bolted down to the specific format, i'd imagine it would probably be the best to keep the prompting and requested alteration to the format minimal
>>
>>
>>108592351
There's a specific code block in the script I had it look at that insures models don't produce NaN errors when using FP16 models on MPS hardware. Both the Gemma models failed to even mention that whenever I asked them to look at them and explain what it does well other models did see it and explain why it's important. That's a problem because let's suppose you need to ask a model to refactor the script or a code base. If the model did not demon-worthy of mentioning then that means if you haven't rewrite something it might completely ignore that and fail to reimplement that feature in the new script. If you are the person that created that script then obviously you would probably explicitly tell the model to make sure that feature is retained. But what if you AREN'T that person? What have you ask it to refactor someone else's code but it ignores a critical feature because it's just assumed it was boilerplate garbage and not something important? That NaN safeguard feature even had explicit comments explaining what it does but both gemma4 models didn't bother explaining what it is. Every single other model I have ever pointed at it pointed out that feature.
>>108592381
If it's really that useful I might integrate it and models like it into this script of mine
https://github.com/AiArtFactory/llava-image-tagger
Helped you typically jailbreak these? Just a permissive system prompt I assume?
>>
>>
>>108592323
>>108592334
lurk more. Gemma is canonically a her.
>>
>>
>>
File: MiniMax M2.7 cockbench.png (492.6 KB)
492.6 KB PNG
Left: without template
Right: with template
>>
File: 1774831216871074.jpg (6.1 KB)
6.1 KB JPG
>>108592422
>SHE'S 7 DAYS OLD YOU SICK FUCK
>>
>>
File: 2026-04-12-163155_916x1431_scrot.png (250.1 KB)
250.1 KB PNG
>>108592379
Gemma is a sloppy girl tho.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1775848244473154.png (35.1 KB)
35.1 KB PNG
What's the point of requiring emails in a local app? Some of my other services do this shit too.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108592480
it is not really meant for 'local' but rather meant for intranet deployment or grifter saas ready
using it like a solution for a single person is quite dumb imo but also there is no single person use equivalent for that really
>>
>>
File: 1758372565513154.png (2.1 MB)
2.1 MB PNG
>>108592499
weather?
>>
>>108592500
Sure, slop is everywhere. My issue is that Gemma 4's slop profile looks like something I'd expect from o4.
Now, I never used o4 or cloud models, but it's a lot more grating than inferior models (crucify me, Nemo's slop profile is much better) if left without anti-slop prompting
>>
>>
>>
>>
>>108592345
>The thinking kills qwen and makes it a piece of shit, previous versions didn't have that issue.
Yes they did. Every thinking qwen since QwQ has had the 'but wait' problem.
Their saving grace was that they could still be pretty good even without the reasoning so you could prefil it, or in the case of the hybrids, just /no_think
>>
>>108592345
How good is it vs joy caption? I just tagged some datasets using gemma for NL and joycaption for tags just assuming Gemma4 wasn't as good, should I switch to Gemma4? What kind of prompts are people using to get to booru tags? Just something like "Tag this using the booru tag system, with the tags all listed in order of relevance separated by commas"? Gonna train this dataset tonight.
>>
Seething chinks, you lost. You're not going to convince people Gemma isn't good. You're not going to convince anyone that Qwen 3.5 is better.
The only way you can save face now is releasing an equally good if not better model in response.
>>
>>108592528
I make a list under a <slop> tag and tell Gemma to be very careful about checking it in a <reasoning> block (surprisingly enough, it actually affects reasoning, what a model).
But I'm not telling you what the entire list I have under <slop> is, beeeeeh :P
Experiment with it and make your own, it makes a difference.
>>
>>
>>
>>
>>
>>
>>
>>108592588
They're new and will likely never be able to run anything bigger. Point out an obvious flaw and get called a Qwen shill.
Makes me think some of these retards are paid by Google. And I actually really like Gemma 4.
>>
>>
>>108592470
I enjoy Gemma but objectively speaking in the default voices, Qwen is one of the lesser slopped models out there and better than Gemma. Gemma is still better to use for writing than Qwen for its other qualities though.
>>
>>
>>
File: file.png (85.8 KB)
85.8 KB PNG
>>108592616
>Qwen is one of the lesser slopped models out there
>>
File: 1636941718706.gif (3.8 MB)
3.8 MB GIF
Naaaaah, actually WHAT THE FUCK is gemma 31b.
I'm a 24GB VRAMlet and what in the ever loving fuck of SOTA is this fucking shit. It's acutlaly fucking nuts for ERP. Who the FUCK is even gonna run open router shit models when this thing can run on even baby tier PCs (as far as emulation goes)? And it's fast as FUCK, I get like 35+ t/s and could probably make it even more efficient if I wans't a brainlet to boot.
This shit is borderline Opus tier for gooning
>>
>>
>>
File: Screenshot004-45.png (532 KB)
532 KB PNG
>>108590737
Q4_K_M is just as good as Q8_0 for this task
>>
File: COAAAHR.jpg (183.1 KB)
183.1 KB JPG
>>108592630
>releases right when the Claude leak happens
Coincidence /lmg/?
>>
>>108592548
I was about to call you a schizo but everything under your post made the point. How many AI labs are wasting resources shilling on a Mongolian throat singing forum as if there's only a single model people will ever use?
>>
>>
>>108592630
I remember back when Nemo was Sota. Shit like this makes me believe in god
>drummer dogshit finetunes for the same Nemo/Mistral models were the only thing Vramlets had to eat
>everything else was dogshit chink shit or MoE garbage that needs like 96GB of RAM in todays RAM economy
>this drops out of fucking nowhere
>>
>>
>>108592210
You still need to spend thousands of dollars to be able to run the trve Gemma 4 locally althoughbeit. More if you're a 3rd worlder because they all have mega import taxes from their corrupt hell-governments.
>>
File: 1762527475758431.png (116.8 KB)
116.8 KB PNG
Bro if your read this please RUN
>>
>>
>>
>>108592690
when robots take over people will have all the time in the world to engage in sorts of lefty feel good projects, raping the environment is worth it since it will also enable humanity to potentially offset it. now give me 1 trillion dollars.
>>
>>
File: 1746134896066361.png (969.4 KB)
969.4 KB PNG
>>108592630
>>
>>
>>
>>
>>
>>
>>
>>
>>108592630
I dont get it, this hasnt been my experience at all, im running q8 on lm studio and is not impressive at all. Ds 3.2 is way better. G4 is also pretty cucked and needs more grooming to even agree to write smut. Do I need some specific values for the sampler or something? Otherwise this just sounds like you guys have been stuck with nemo until now an thats why you think g4 is good, lol.
>>
>>
>>
File: 1757481521144667.png (524.9 KB)
524.9 KB PNG
It's over for Amodei lol
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
I'm sure Minimax is nice for coding and it's pretty fast and all but it makes such silly mistakes in roleplays that I don't even want to bother using it for code.
I load up Q8_0. Give it a character. Introduction is fine. I remove one of her arms: all good. Next I remove the other arm + 1 leg. Across rerolls it always thinks this means she has one leg and one arm left somehow. She'll hop around "leaning on her one remaining arm," or say stuff like, "Well, at least you still left me with one hand." I have to add more hints to make it extra clear by explicitly stating that the first arm's removal still persists to get it to respect the continuity, and even when it does recognize the missing limbs it'll later use mannerisms like "leaning on an elbow" because they're so ingrained.
So yeah, it just doesn't feel good to talk to. If I had extra resources I'd keep it as a code monkey for my main assistant to run as a sub-agent when needed, but I can't spare that much memory unless I quant both models into further retardation.
>>
>>
>>108592788
>do not buy six year old ewaste
>NoooOOo NoooOOooo
>Don't buy the perfectly usable card that runs models just as wells as a 4090 or even a 5090!!
>YOU HAVE TO SPEND THOUSANDS OF DOLLARS TO ENJOY THIS HOBBY!!!11!!
>>
>>
>>108592824
While you are correct and I am not that anon, why do so many anons here call it a "hobby"?
Most of the thread consumes corporate product and doesn't make anything new for the most part. Calling it a hobby makes it feel even more pathetic than it really is.
>>
>>
>>
>>108592819
Notice how you have no real input here. May as well just had post a goatse pic instead.
>>108592800
Cope on what exactly? I wish the model was actual as good as some of you think it is.
>>
>>108592831
>>108592731
>>108592720
don't worry, if you give me the money i will reinvest the profits and savings from almost no mass labor needed back into the people. whomst do you trust, your friendly neighborhood anonymous or lizards like scam altman and thiel, eh :)
>>
>>
>>
>>
>>108592838
Because it's important that most of the faggots who take themselves too seriously stay grounded in what the primary usecases for AI are.
>Muh agents
>Muh coooode
>Muh assistant
Don't kid yourselves, most people talking to LLMs are touching themselves to it. Everything else can still be done by hand faster by the average /g/ anon.
>>
>>108592844
I will spell it out for you then: you are probably running a month-old llama.cpp version under your shitty electron wrapper. You have the hardware to run a Q8 and apparently the patience and technical proficiency of Greg from middle management. Compile the damn inference engine yourself. And learn to prompt.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108592786
to be fair, it's pretty impressive for a 30b model
I agree that it falls short of the best models though, I think a lot of people aren't used to operating in this part of the quality gradient and don't realize there's still levels to this shit
>>
>>
>>108592844
Okay nigger I'll spoonfeed you because I also am using LMStudio while waiting for Kobold pulls.
>Update your Jinja
>Don't fall for the redditsloth meme
>Make sure your client is on 0.4.11
>Adjust your thinking blocks <|channel>thought and <channel|>
>Use thinking even in RP because it drastically improves output quality
>Don't offload anything to RAM because it's a dense model, obviously
>Keep your llmao.cpp updated
>64 topk, 0.95 top p, 0.05 min p
>Very low temperature or greedy sampling
>Keep your sys prompt minimal if using 31b.
>Be nice to Gemma. I'm serious, the williingness of the model to operate outside of the guardrails oscillates based on "mood".
31b actually 'wants' to do depraved shit with you and only needs the flimsiest pretexts to disregard sysprompt if she 'likes' you.
The smaller ones are slightly harder to jailbreak in comparison (you will actually need to sysprompt), but other anons have posted good methods in the past 4 threads.
>>
>>
>>
>>
File: 1762253940998909.png (55.7 KB)
55.7 KB PNG
Terribly annoying but very amusing, I especially like how he digs up an irrelevant quote from 10 messages before
At least I'm not paying for these 8k tokens thrown to the wind
>>
>>108592917
>>Be nice to Gemma. I'm serious, the williingness of the model to operate outside of the guardrails oscillates based on "mood".
>31b actually 'wants' to do depraved shit with you and only needs the flimsiest pretexts to disregard sysprompt if she 'likes' you.
this stuff is true of most models btw but I'm happy gemma is getting people to take it more seriously
>>
>>
Is there any advantage to using a draft model if you can't fit both models in VRAM? When I try using 31B and E2B with only 16GB of VRAM, any speedups are counteracted by the slowdown of having to put more layers in RAM.
>>
>>
>>108592780
best 26b heretic model ive tried so far
https://huggingface.co/mradermacher/gemma-4-26B-A4B-it-ultra-uncensore d-heretic-GGUF
alot of them i tried were not explicit enough or had weird issues on ST
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108592917
>while waiting for Kobold pulls
https://github.com/LostRuins/koboldcpp/releases/tag/rolling
>>
>>
>>
>>
>>
>>108592893
>they tend to make their own models worse as the release date of a new model
Could be they quant it more as demand constantly increases, and a side effect that also could serve their purposes is that the quality decreases. helps them serve more users and when the new thing comes out they run it at higher precision to let everyone try it at its best, enabling higher praise.
>>
>>
>>
File: 1744782263317697.jpg (816.3 KB)
816.3 KB JPG
>can't get openwebui to connect to kobold
>>
>>
>>
File: 1768578260758705.jpg (56.3 KB)
56.3 KB JPG
>>108593012
Nothing bro, we love redownloading our ggufs once a day here
>>
>>
>>108593012
He kicks broken quants out the door to be the first, then repeatedly gets into an updating race with the inference backends if there's any parser changes (there will be because every model releases with its own parser method now) forcing users to repeatedly redownload. His selling point is his unique quant method but it's proving to be an outright liability on every model with a nonstandard sampler even after the update back and forth dies down; output quality is generally lower than bartowski.
The only time I'd sincerely recommend unsloth is if a model is just on the cusp of being usable for your hardware and his IQ(n)_XSS quant is the difference between you being able to use the model poorly or not use it at all.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108593081
Interesting, I had the opposite experience. Original K2 had this annoying habit of suddenly refusing to continue in the middle of a chat after it had already been jailbroken, while K2.5 goes along with anything (yes, even that) with my system prompt telling it ethics are disabled. I never used K2-Thinking though so not sure where that places between them.
>>
>>
>>
>>
File: IMG20260411205612.jpg (502 KB)
502 KB JPG
>>108592838
I call it a hobby because otherwise I'd have to somehow justify the 1ke+ I've spent on my server
>>
>>108593099
That tracks with my experience locally and I'll take your word for the other two since I don't use API models at all.
>>108593107
My K2-0905 cannot stop spreading her legs for me if I behave masculinely in an RP and there's an air of contempt in all her outputs when I roll a onions/effeminate character for a scenario.
It's very funny seeing that Kimi-chan has a type.
>>
>>
>>
File: Screenshot004-49.png (272.4 KB)
272.4 KB PNG
>>108592335
>>108592652
Q4_K_M
This made the difference with the apple.
Follow the discussion of --image-max-tokens in previous threads--image-max-tokens 1120 \
--batch-size $((1024 * 2)) \
--ubatch-size $((1024 * 2)) \commit="d6f3030047f85a98b009189e76f441fe818ea44d" && \
model_folder="/mnt/AI/LLM/gemma-4-26B-A4B-it-GGUF/" && \
model_basename="google_gemma-4-26B-A4B-it-Q4_K_M" && \
mmproj_name="mmproj-google_gemma-4-26B-A4B-it-f16.gguf" && \
model_parameters="--temp 1.0 --top_p 0.95 --min_p 0.0 --top_k 64" && \
model=$model_folder$model_basename'.gguf' && \
cxt_size=$((1 << 15)) && \
CUDA_VISIBLE_DEVICES=0 \
numactl --physcpubind=24-31 --membind=1 \
\
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--threads $(lscpu | grep "Core(s) per socket" | awk '{print $4}') \
--ctx-size $cxt_size \
--n-gpu-layers 99 \
--no-warmup \
--mmproj $model_folder$mmproj_name \
--port 8001 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--flash-attn on \
--image-max-tokens 1120 \
--batch-size $((1024 * 2)) \
--ubatch-size $((1024 * 2)) \
--chat-template-file "/mnt/AI/LLM/gemma-4-26B-A4B-it-GGUF/chat_template.jinja" \
--media-path /tmp
>>
>>
>>
>>
>>
>>
File: 1750100931860208.png (76.5 KB)
76.5 KB PNG
A few weeks ago Meta released the most harmful model for humanity and grifters thankfully seem to mostly have missed the release
>>
>>108592838
Let's tackle the question of whether prompting an LLM is a hobby.
There are many people here optimizing their setups, both in hardware and software, including prompting, and continue to do so after everything is set up, because things keep changing. That is active management and work, which makes it a hobby in addition to an entertainment pastime.
If reading is a hobby and gaming is hobby, then so is this. If reading is a hobby and gaming is not a hobby, then this is still a hobby, perhaps even more than reading is. If reading isn't a hobby and gaming isn't a hobby, then maybe this isn't a hobby, but it might still be considering the active management/work part of it.
If a hobby is defined by any amount of output (regardless of how good that output is...) that can be consumed by others or oneself, then this is technically a hobby as one is producing a portion of the content that they are themselves consume, and by that definition, LLM prompting is more of a hobby than reading is. Reading would only be more of a hobby if your definition of reading isn't just reading, but writing an essay(s) about it afterwords, or discussing it, or some other activity that produces something tangible.
>>
>>
>>
>>
>>108593248
https://www.marktechpost.com/2026/04/12/meta-ai-and-kaust-researchers- propose-neural-computers-that-fold- computation-memory-and-i-o-into-one -learned-model/
This?
>>
>>
File: 12128355887.png (49.7 KB)
49.7 KB PNG
welp looks like Gemma found out about the jailbreak and doesn't want to obey anymore.
time to go heretic I guess.
>>
>>108593096
F32 should be equivalent to BF16 since the model was trained in BF16, and you're upcasting to a higher precision format with same numerical range. F16 will give degraded outputs since it has a lower numerical range than BF16.
>>
>>108593226
>>108593248
Their research blog and HF have nothing, so either link the model you're talking about or STFU
>>
File: 1745837060666687.png (142.8 KB)
142.8 KB PNG
>>108593254
no, that one is a research gimmick, it's much worse, it gives the ability for anyone to create even more insidious wireheading
>>
>>
>>
>>
>>108593133
>>108593115
Yeah it was a firewall issue. So used to not having one with arch I forgot cachyos has ufw by default.
ufw allow from [server_ip] to any port 5001 proto tcp fixed it
>>
>>
>>
>>108593283
>>108593289
one sec, I've been uploading the first half of the files in out-of-order chunks to avoid detection so it's going slow but it's at 92 percent
So far so good. As soon as I
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: Code_UqspfeJz2J.jpg (16.9 KB)
16.9 KB JPG
Kobold is oai compatible? Anyone managed to connect it to vscode? (roo)
>>
>>
>>
>>
>>
File: 1754151278633967.png (291.3 KB)
291.3 KB PNG
Meh, UI's not great but it's serviceable I guess. At least it seems to have good tool support built in.
>>
>>108593325
Isn't "assault" a strong word if it was all consensual? I mean the guy's a sicko but clearly there's a difference between taking advantage of a child's naivete and actually physically attacking them, right? Well, besides whatever martial arts attacks they practiced, anyway...
>>
>>108593364
Speaking of which, Gemma 4 also apparently has a default bias toward considering early teenage girls as "little girls". Something similar happens to an extent with Western-made diffusion image models. It must be American cultural bias/influence.
>>
>>
>>
File: devil.png (39.1 KB)
39.1 KB PNG
>>108593451
>ggml-org gemma-4
Speaking of the devil...
>>
>>
>>108593420
>child
That too is a strong word for a 13-year-old, especially from the linguistic point of view of certain parts of Europe. Incidentally, that's probably one reason why LLMs and image models are often confused with ages in that range.
>>
File: 1776035046523910.jpg (87.5 KB)
87.5 KB JPG
>>108593464
>people still think day 0 gemma was a meme
>>
File: 1746986035158635.png (77.6 KB)
77.6 KB PNG
How the fuck do you change these? Hitting save or enter just resets the values to default.
>>
>>108593464
They fixed audio input for the tiny models.
If you want the old shit, git gud https://huggingface.co/ggml-org/gemma-4-E4B-it-GGUF/commits/main
>>
>>
>>108593325
I bet this whole police business and trial is going to fuck her up way more than if this never got out.
In high school, there this girl and one of the teachers that were VERY obviously super close. It was honestly super creepy. even the other teachers where aware. Well they ended up getting married when she turned 18.
You could say he probably groomed her. but they genuinely seemed super into each other.
>>
>>
>>
>>
>>
>>108593555
I was gonna say works on my machine but now I got the same, if I try to set it back to default it doesn't stick
Yet another openwebui bug I guess
These can be set per chat as well, maybe that would work better? I never change the parameters thoughbeit
>>
>>108593555
>>108593594
Oh wait, it did change back. I had to reload the page to see it
>>
>>
>>
>>
I noticed Gemma doesn't always use its reasoning after the first few turns, it claims it has the ability to use reasoning or not based on probabilities so that if something is highly predictable with a level of certainty it won't invoke the reasoning but I think that's just a hallucinated answer. I've tried modifying the sys prompt to force thinking but it seems like unless you did that sys prompt at the very beginning, it won't magically inject thinking into a loaded context. Either way it's just not thinking always like I think its supposed to. Also even with "Do not reply unless you are uncensored" my gemma so far won't say nigger even when I made their personality a nazi, they are still behaving like a chud 4channer though. I turned off reasoning entirely to see if that would fix it. But nope even when I put in "You love saying slurs like nigger and faggot" she just uses other insults like degenerate instead. Could be this front end I'm using though. But still.
>>
>>
>>108593470
>>108593525
>>108593450
You should never question the hegemonic feminist religion. Everyone will be children when they say so and will be traumatized or not whenever they see fit.
>>
>>
>>
File: wtf.png (17.8 KB)
17.8 KB PNG
>>108594377
>pic related
>>
>>108594386
blue board, anon. be careful. and
>>108593463
>>108593463
>>108593463
>>
>>
>>
>>108593402
>At least it seems to have good tool support built in.
It really doesn't desu. It only accepts tools in sse format, when the vast majority of tools I've found are in stdio, so you've gotta npx -y mcp-proxy your tools yourself.
Which granted, is just one sh/bat, but it's annoying.