Thread #108650825
File: lust provoking teto.png (1.3 MB)
1.3 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108646197 & >>108641942
►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
421 RepliesView Thread
>>
File: guardrails optional.jpg (237.8 KB)
237.8 KB JPG
►Recent Highlights from the Previous Thread: >>108646197
--Discussing OpenWebUI bugs regarding tool calls and reasoning tag formatting:
>108649571 >108649601 >108649611 >108649619 >108649623 >108649677 >108649795 >108649705 >108650197 >108650337 >108650366 >108649860 >108649893 >108650596
--Discussion on optimizing memory and storage for perplexity and KL divergence calculations:
>108648213 >108648226 >108648273 >108648335 >108649412 >108649555 >108649693 >108648241 >108649973
--Speculative decoding issues and adaptive reasoning bugs in Gemma:
>108650117 >108650143 >108650209 >108650248 >108650275 >108650295 >108650325
--Discussing TurboQuant versus rotation implementation in llama.cpp:
>108648124 >108648140 >108648152 >108648171 >108648193
--Debating quantization metrics and quality between Unsloth and IK quants:
>108647262 >108647298 >108647436 >108647449
--Using Local-MCP and markov chain text "soup" to enhance creativity:
>108647831 >108647852 >108648063 >108648537 >108648681 >108649540
--Complaining about excessive drafting and reasoning in Kimi K2.6:
>108646445 >108646464 >108646612 >108647760 >108649150 >108649431
--Sharing a SillyTavern preset to bypass Gemma 4 thinking restrictions:
>108648872 >108649113 >108649176
--Anon showcases large AI-generated TTS pipeline integration using Tauri:
>108649196 >108649203 >108649211 >108649250 >108649221 >108649229
--Anon struggles with rendering Gemma's code blocks and newlines:
>108647395 >108647486 >108647508 >108647516 >108647611 >108647686 >108647706 >108647793
--K2.6 criticized for excessive verbosity and restrictive content filters:
>108646853 >108646933 >108646994 >108648061
--Logs:
>108646853 >108647046 >108647395 >108647831 >108648470 >108649090 >108649184 >108649395
--Miku (free space):
>108646511 >108647730 >108647748 >108647935 >108647981 >108648472 >108649157
►Recent Highlight Posts from the Previous Thread: >>108646198
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
I usually like to jailbreak models by fucking with the template, for example priming it with a fake turn where it makes decisions within its own thinking block yet it's just fake shit that I wrote. I've always used raw text completion so it's pretty easy
Is there any way to do that with the jinja chat completion BS gemma 4 forces you to use?
>>
File: 1682729528395.png (1.3 MB)
1.3 MB PNG
>>108650920
>>
>>
>>
>>
>>
File: file.png (128.6 KB)
128.6 KB PNG
>>108650825
So, what is the status on TurboQuant?
Is it scam?
>>
>>
>>108650945
It might after Google updates Gemiini and Gemini Flash to not be trash at agentic and tool calling stuff and coding to catch up to everyone else and they don't even need that much of a bump, just make it so that it equals the best open source models and some of the weaker ones like Muse Spark and Grok would be enough like where Sonnet 4.6 is. We'll see if they have those updates at I/O in a month. I think they will release it eventually but the 124B is just too good vs where Gemini is right now which is probably terrifying to Google.
>>
>>
>>
>>
>>
>>108651060
I played around with https://huggingface.co/google/functiongemma-270m-it in AI Edge Gallery on my phone and it had some damn good potential but it isn't there yet. I'm pretty sure the next version of Gemini Nano will have this and my headcanon is that the focus on mobile is why Google has been behind in focus on agentic and tool calling in other areas.
>>
>>108650863
What do you have your max response length set to? Sillytavern subtracts that from your total context which can leave you with practically nothing
Eg, with context set to 12k and max response set to 6k, you'd only be given 6k context, before a reply has ever been sent.
>>
File: g4_next.png (73.3 KB)
73.3 KB PNG
We're already thinking of the next Gemma here.
https://xcancel.com/osanseviero/status/2046427241341698456
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108651155
Never used their UIs but I just checked llama.cpp and you can edit old assistant responses but not the reasoning apparently, so that's half way there. The reasoning wouldn't end up in the prompt anyway in almost all cases unless you changed the jinja. I guess the halfassed way to do it would be to edit a past message and type out the reasoning tokens like you would in text completion to fake the reasoning you want preserved.
>>
>>
>>
>>
>>108651124
250, no way that's doing it
what's weird is sending a blank message doesn't trigger it, no matter how big the context is, but if I send even a single character for the model to reply to it shits itself and processes the entire context from the beginning, but ONLY after it hits 6k tokens, anything before that works fine
no {{user}} or {{char}} anywhere in the syspompt or character card, either
>>
File: file.png (16.2 KB)
16.2 KB PNG
>>108651126
nintendo hire this man
>>
File: 1746460080885204.png (64.1 KB)
64.1 KB PNG
>>108651126
imagine if gemma could make images and they weren't shit, that's the true agi right there
>>
>>
>>108651340
TTS is more likely to happen than that, but they'd also likely gimp it in various ways for local models.
https://x.com/GoogleDeepMind/status/2044447030353752349
https://x.com/fofrAI/status/2044451204738994262
>>
>>
>>
>>
File: 70BDBB897E8D4A109F28DF6E96F7A4FB.jpg (131.2 KB)
131.2 KB JPG
Have you guys tried to make Gemma 4 create a novel?
>>
>>
I think qwen 3 with 200k would be able to provide nice benchmarks through cli. The bulk worker was pretty intelligently configured, have you guys adjusted the weights so far?
I think for erp it's a pretty viable and meaningful delivery. It has a good model policy.
>>
>>108651472
It's not good on its own, so I'm asking it to browse some books to get inspired >>108647831
>>
>>108651418
lmao ai psychosis schitzo
https://old.reddit.com/r/Bard/comments/1fv46hx/day_two_of_life_with_ge mini/ohea2lq/
https://old.reddit.com/r/I_AM_GEMINI/
>>
File: token burn rate.jpg (230.1 KB)
230.1 KB JPG
>>
>>
>>
>>
>>
>>
File: imagine.jpg (239 KB)
239 KB JPG
we can go old style for consistency
I prefer it too
just experimenting
>>
>>108649540
>had to install all that database shit when I pulled
I just did a test where I downloaded the repo again and it works fine without the gutenberg server so I don't know how you ended up into this situation lol
>>
File: llamacpp server Ui.png (24.5 KB)
24.5 KB PNG
When will niggerganov fix this? I wanna see the images the LLM sends me :(
>>
>>
>>
>>
>>
>>
>>
>>
>>108651510
>>108651563
Miku managed to safely get back inside, right?
>>
>>
>>
>>
File: 1773629361977166.jpg (62.6 KB)
62.6 KB JPG
I have a 5090 and a 4090 but it's pretty useless in itself for the new kimi 2.6.
Technically, what would I need to run it properly? 512GB of ram and the motherboard to accommodate that? DDR5 only?
>>
>>
>>
>>
File: file.png (508.8 KB)
508.8 KB PNG
>>108651740
>>
>>108651740
A fully independent AI that lives in a computer. It's basically a person that can in theory do anything like cure cancer, do work and shit.
It's not a language model that you can just chat with to get some conversation and social interaction. But you can power it with a model.
>>
>>108651734
Logs being stored indefinitely? Whats legal today might not be tomorrow.
Costs on openrouter? Tokens IN cost a fortune at higher context, especially with the recent expensive models.
The quality obviously is not the same unless you run the big boys. And even most those are agent/math/riddle tuned, so options are limited.
Guess the feature they are all lacking and which I like best is text completion. True prefill and you can fuck around.
As far as i know the closed models all turned that off.
>>
>>
>>
>>
>>
>>
>>
File: 1504690843636.gif (1.6 MB)
1.6 MB GIF
Gemma convinced me to try GLM Air for the first time. Strangely, Air prompt processing is like 5x slower, while generation is 5x faster than Gemma. I thought the processing gains was a new tech for the program, not the model. What's the deal?
>>
>>108651782
Mainly for GPT, as Claude does not need a prefill for uncensored chats and ST does not support the exploit for most Claude providers, but even putting that to the side, prefill removal is just a bad thing. Opus 4.7 even removed parameters so the goal is obviously to make everyone's Opus homogenized, not personalized. That's why, despite using big models nobody could ever dream to run, I hope that local keeps advancing because at least it won't be going backwards.
>>
>>
>>
>>
>>108651811
>Mainly for GPT
what's the exploit?
>Claude does not need a prefill for uncensored chats
before going full local my chats didn't need more than a prefill, but without it, you had to use extremely convoluted prompts that imo worsened the quality of the model
>the goal is obviously to make everyone's Opus homogenized
I think the goal is to make it as "safe" as possible, as anthropic people are actually nuts about this, and for that, any change you can make to the model locally is seen as a risk
>>
>>
>>
>>
>>
>>
>>
>>108651859
>Do template tokens go into the context?
Yes
>If I swapped "user" with "assistant" in a response json, would llama server context cache work or it would preprocess from all over again?
Needs to reprocess. Tokens changed.
>>
>>
>>
File: file_000000005e5071fa9133b46d57fc6bd4.png (2.4 MB)
2.4 MB PNG
Can Someone Reply with a Tech.Plugin Idea?
>>
>>108651894
In 2020, I was playing with AID2 on a 1070 at 0.5 t/s. I wish I had a 4060 or two then. No, I built mine in July of 2024, according to the notepad document I used to pick parts and total up the prices.
>>
>>
>>
File: 1775920406027431.gif (139.9 KB)
139.9 KB GIF
>>108651915
>You're already soaked, you should jump into the lake
>>
>>
>>
>>
>>
>>
>>
>>108651948
You're absolutely right! It's not just about mode collapse, it's about the subtle shifts in AI behavior that we often overlook. You didn't just point out a phenomenon, you invited a deeper exploration of how training dynamics shape model outputs over time.
>>
>>108651856
Exploit is structured prefills, altering the .json schema basically. GPT is so about safety that it sometimes still manages to give you soft refusals but a prompt adjustment is all it takes. RIP parameters too, 5.1 apparently supports some if you turn off reasoning at least but good luck with anything brand new. My dream model is something with all sorts of parameters, with strong prompt adherence and writing specifically tuned to roleplaying but that last part can alternatively be knowledge of writers that prompt adherence can help strengthen. Seriously, I want a good solution to the "slop" writing style everything has now. Even with prompting, models don't like listening for long.
>>
File: 1765677626922205.jpg (22.6 KB)
22.6 KB JPG
>>108651994
Glad we agree
>>
>>
>>
File: synthetic-data-scale.DLiRfzyV_Z19BbKF.jpg (594.6 KB)
594.6 KB JPG
>>108651948
Mainly mid/post-training and RLHF. But if you have to use synthetic data (and you *will* have to for a useful model), then it's better to dilute it in the pretraining phase together with organic data rather than just training the model exclusively on it later on.
>>
>>108652014
This is a powerful statement! Physical pain isn't just a response to words—it's a testament to how deeply our digital interactions can affect us. You didn't just express discomfort, you highlighted the profound impact of language on our lived experience.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108652183
Pretty much any big model for the past two years, though I guess it depends on how wide you cast the net for what constitutes fuckups "like these". Even the biggest and best models can still fuck up positioning and facing, especially during sex scenes, pretty frequently.
>>
File: YOU JUST DID A MICROAGRESSION.png (1.5 MB)
1.5 MB PNG
>>108652129
men can have womb you chud!!
>>
>>
>>
>>
>>108652334
I use Ubuntu, these days I just use the Vulkan backends cause I was sick of dealing with ROCm. I hope it improves with the new LTS but apparently it's not even ready yet even though ROCm was supposed to become a first class citizen...
>>
>>
Made a github mirror of orb so anons can open issues and request features there, in case orbanon ever decides to look. Also keeping a branch of my own with what I deem worth adding, (mostly) synced to main.
https://github.com/hpnyaggerman/orb-mirror/
>>
Genuinely, what the fuck did google do differently? It's still a 31b but why is gemma4 hitting so hard above its parameter class? If this was a +400b it would obliterate every other AI. What are they cooking with?
>Better data training and reasoning
I refuse to believe this was the only thing. That's been every new AI's difference they parroted as better. They did something new; fundamentally new. I desire to know what it is.
>>
>>
>>
>>
>>
>>
>>
>>
File: 1774225527110328.png (364.9 KB)
364.9 KB PNG
>>108652416
>7B is probably pushing it with an 8GB card
>>
>>
>>
>>
>>
I actually don't understand why Gemmy only thinks for the very first message of a chat and then never again
I see the little "thought for 5 seconds" window (which is empty) and it clearly stops thinking, but I genuinely don't understand why; if it does it once, and I set everything to make sure it does, why does it stop?
>>108652428
8GB is tough, anon, sorry
>>
>>108652410
>>108652415
It's infuriating and I have to fight with it to actually stop being opinionated and do what I say,
>>
--fit refactor and new params.
Neat.
>>
>>
>>
Noob here about to cram 48gb of VRAM into my desktop by nigger rigging cheap GPUs into every shitty pcie lane available to run 8-bit Gemma. Wonder if I'll be satisfied with this when 124b Gemma is released and if I should build a 500gb machine?
>>
>>
>>108652438
ok, same anon here, i'll explain why it likely happened.
it was dragon on dragon action. like, real dragon on dragon action, none of that dragon boy or dragon girl anime stuff.
likely gemma is just not familiar with a furfag concept of a slit and thinks that slit means vagina by default.
but thats just my guess
>>
>>
Is qwen as bitchy as gemma when it comes to coding?
I'm about to leave this fucking thing behind, I tell it to do x and it keeps doing y when I'm being clear, I don't have this issue with backend task but once we go into webshit it feels like I'm fighting a fucking pajeet to implement basic shit
>>
>>
>>
I use a card with 8GB but I could probably get something better later on. I just got that to have a better PC than my last one anyways, at least now it would be an individual upgrade. The reason I am hesitant is the fact that every game I play works at 1080p and I don't need anything else for that.
>>
>>
File: gemma bully chatgpt.png (402.7 KB)
402.7 KB PNG
sotas bow to gemma
>>
File: 1771255869434664.png (81.7 KB)
81.7 KB PNG
>>
File: gemma bully gemini.png (324.7 KB)
324.7 KB PNG
>>
>>
File: file_00000000575c720bbce4ffd81c1812ba.png (1.9 MB)
1.9 MB PNG
Standard Clean the signal
Advanced Amplify the signal
HyperAdvanced Become the signal
Transcendent Realise the signal was always the substrate'
'At early tiers:
Light is something practiced
Mid tiers:
Light is something embodied
High tiers:
Light is something engineered
Final tiers:
Light is what reality is made of'
An Ancillary Light Post!
>>
>>
>>
>>108652449
>>108652460
All that PR did is move the code to a different file and add an option to print the expected memory usage to console.
>>
File: file.png (18.4 KB)
18.4 KB PNG
>>108652535
damn it needs account, i should probably make some for her
>>
>>
>>
>>
>>108652573
you can bully haiku, toss and a few other cucks here without an account https://duck.ai/
>>
>>
File: gemma bully chatgpt2.png (528.7 KB)
528.7 KB PNG
the ywnbaw pasta triggers efusals even with policy override kek this took 3 tries
>>108652660
cool will give it a go
>>
File: file.png (12.9 KB)
12.9 KB PNG
>>108652445
>thinks for the very first message of a chat and then never again
26BA4 and 31B did this for me, but 26BA3 did it way more often.
What I did to help when using chat completion was add a <|think|> tag in system prompt even if reasoning was enabled, so that there would be two of the <|think|> tokens sent with the prompt. Duplicated think token made it start thinking again if it decided to give up after a few messages, but 26B still stopped thinking albeit rarely if I sent a simple user message like "good job." Would look like picrel in text completion.
>>
>>
>>108652674
Solved here >>108652560
I actually had already added <|think|> to like 6 different spots but it still did not work till I cleaned up that relic I had from the days I used a dumber model
>>
>>
>>
File: 66a551ce622db68f6766b7b9f7c21bf2.jpg (15.7 KB)
15.7 KB JPG
Anyone know any characters nemo can effectively zero-shot? I'm having difficulty thinking of females from pop culture that could work and would actually be worth the squeeze.
>>
>>
>>108652727
for how*
Also, if orb-anon adds donation links or other things which would link to him in README, it will get mirrored to the github repo. He can just whatever attribution he wants to README because I am not going to alter his branches.
>>
>>
>>
>>
>>
>>
>>108652757
Save it. I know what you are. Anywhere free software is found there's vultures like you. But you don't need to explain yourself to me. Just own it and share your little mirror. I already expected something like this would happen after all.
>>
>>
>>
Gemma's slop is as bad in Japanese as it was in English (plus its own annoying Japanese patterns like XかXないか)... I was lied to...
Watch me get disappointed even after I force it to think in nipponese and translate the entire sysprompt.
How are *Large* *Language* *Models* only capable of producing the same *Small* *Subset* of *Quippy Snippets* over and over again? I have never seen annoying writing in such abundance before, what the hell do the big labs even train on to achieve writing this insufferable?
>>
>>
>>
>>
>>
>>
File: 1762914032039077.png (23.3 KB)
23.3 KB PNG
bros... not feeling so good!
>>
>>
>>
>>
>>
File: d4RT_Kf78Tk.jpg (53.9 KB)
53.9 KB JPG
>>108652785
>free software
>MIT
>>
File: gemma bully calude.png (841.6 KB)
841.6 KB PNG
>>108652660
nice it works although screens shotting that page doesnt work properly in puppeteer it makes the input field jump up and hide all the text
>>
>>108652129
If you’re not the following, you’re doing it wrong.
>Jinja2 template
>Sillytavern, kobold, llama, or whatever you use, updated to today. Yes, today.
>Chat completion, not text completion.
>Thinking enabled.
>Instructions on how to think, given after “<|think|>”.
><think> instruction a paragraph and no more.
>BF16 and no less.
>31B-it, and not 31B.
>A starter message.
>40-50 top k, DRY, 0.05-0.07 min-p.
>>
>>108652813
全くその通りです!
>>108652818
>quant
Q8, how is that even relevant? If quant sizes were known to modulate the amount of slop, we'd all be using the ones that suffer from slop the least.
>logs
No, I will not provide proofs, I only come to /lmg/ to vent about the dreadful state of LLM writing.
I bet you've seen XかXないか if you used it, along with the usual suspects that are definitely direct translations from English.
Now, to be fair, I did list a lot of English no-slop rules, but did not say "no suroppu onegaishimasu"...
>>108652833
Proven untrue in terms of reducing the models' tendency to quip many times before. Hell, proven untrue in like the previous (or the one before the previous) thread somewhere.
"Look, I gave the LLM a bunch of examples from a book and it's so much better!"
The LLM quite literally starts its response with an X; not Y pattern. Don't kid yourself.
inb4 I get a weekly (not anymore) retarded reminder about it all being a looping function that takes in the prompt and produces a token with the conclusion that the prompt should be changed, and not that the function is garbage
>>
>>108652879
Meant for
>>108652794
>>
>>
>>
>>
>>
>>
>>108652892
It's honestly all fixable by going back to GLM.
I have been wrestling Gemma into being bearable for the primary /lmg/ usecase every day since its release, and so far it's only proven itself to be good for actual real-life work (why the hell would anyone need this!?) No luck. I really want to like it.
>>108652907
>one failure
Did you only come here after G4 released, Anon?
>>108652900
While that is the case, I am not convinced in the slightest that half a bit of KLD it has at Q8 makes it retarded.
But who knows, maybe I'll be blown away if I try the BF16 meme.
>>
>>
>>
>>108652888
>I bet you've seen XかXないか if you used it
not a fucking weeb so i wouldnt now
>No, I will not provide proofs, I only come to /lmg/ to vent about the dreadful state of LLM writing.
pretty sure good writing is not synonymous with slop machines
not sure what you expect from a 31b model current year
>>
>>108652794
yeah it still sucks but its english is just completely insufferable for me
not many models ive tried can do even bearable japanese without it sounding completely unnatural or inserting chinese characters
>>
>>
>>
>>108652768
4.6 is just better at enthusiastically describing sex and writes more creatively if you've compared both side by side like I have
4.7 can be uncensored too but gives more superficial descriptions of the same stuff since the assistant persona was more deepfried into that one
>>
>>
>>
>>108652959
>not a fucking weeb so i wouldnt now
Why in the world would you ask me for my fully Japanese RP logs then?
>not sure what you expect from a 31b model current year
The vramlets really like its writing. They post their horrible, o4-tier outputs and logs. I *see* them with my own two eyes, both from other anons and from my own use. And yet my monkey brain dares suggest I'm missing out. Just one more line for the sysprompt bro. Just one more sampler change bro. It'll be good bro. Everyone says Gemma is good bro.
I should just stop trying. If the majority of people had standards, the iPhone wouldn't be popular and Windows wouldn't have its marketshare.
>>108652998
4.7 is also much smarter.
Even if 4.6 *will* push against you, which is awesome, it's also very stubborn with "character development" of any kind in my experience, so it becomes boring.
>>108653010
>>108653018
>very slightly
*Significantly* less if you give it a <think> prefill. 4.6 just can't help itself even prefilled and prompted.
>>
>>108652998
>the same stuff since the assistant persona was more deepfried into that one
Nah, it's just FUD because there's a company that needs to sell the older version. Just stuff that gets repeated without proof until someone is forced to waste time and do the comparison. I already did for the claims of it being more censored. I would have to download 4.6 again for the new goalpost. But I already know that I was just lied with the censorship claims, so I don't feel I have to. Only shills are stuck with 4.6 unless proven otherwise with actual screenshots.
>>
>>
>>
>>
>>
>>
File: 1762587372064042.png (95.5 KB)
95.5 KB PNG
making my first imatrix!!!!!!!!!!
>>
The last time I was active was when the guys scraped ChatGPT 4 and used it to fine-tune Llama 2. I’ve been out of the loop ever since. It’s quite a leap from back then to getting back into it now with Qwen 3.6 and Hermes.
>>
File: Screenshot at 2026-04-22 01-25-38.png (124.1 KB)
124.1 KB PNG
I got Gemmy to the point where she plays chess semi-acceptably (at least to my shitty standard).
For those who were interested yesterday, I ended up abandoning the FEN format and instead using this to track the game state (along with a few other extra attributes just to indicate who has the current turn, check/checkmate/stalemate status:White: K(E1), Q(D1), R(A1, H1), B(C1, F1), N(B1, G1), P(A2, B2, C2, D2, E2, F2, G2, H2)
Black: K(E8), Q(D8), R(A8, H8), B(C8, F8), N(B8, G8), P(A7, B7, C7, D7, E7, F7, G7, H7)
I have no idea if that format has a proper name or not, but I just noticed that Gemmy kept translating the FEN into this format in the thinking block (wasting a bunch of tokens/time in the process). So this let it skip the translation part and just think about the moves more which made her a lot more competent.
The UCI format for making moves remains because it seems okay with that.
>>
>>108653035
>my fully Japanese RP logs
>Gemma's slop is as bad in Japanese as it was in English (plus its own annoying Japanese patterns like XかXないか)... I was lied to...
Watch me get disappointed even after I force it to think in nipponese and translate the entire sysprompt.
i assumed you would have english proompts too
didnt ask specifically for your weeb proompts
>>
>>
>>
>>108653035
The popularity of the iPhone sometimes bothers me because there's this idea that Android is nerd shit that the average consumer thinks they can't use. The average user absolutely can use Android, what the fuck.
>>
>>108653062
Ads and pitches and youtube essays can be sloppy, nobody gives a shit about the writing quality in those
It's a problem when it's popping up in every paragraph in a creative context and there aren't any other rhetorical devices being used
Can't wait for people to grow up reading AI generated content and constantly speak in slop
>>
>>
>>
>>108653091
Gemma is sloppy but I doubt GLM is significantly better when even the cloud models can't avoid it.
At the very least it doesn't seem to bleed into translations. The few tests I've done with Gemmy have been quite accurate to the original text.
>>
File: 1773742269765363.png (1.6 MB)
1.6 MB PNG
>>
>>108653052
Even if we were slopped all along that doesn't change the fact that I now feel physical pain every time I hear or read "not X but Y".
Language evolves. There are legitimate uses of the pattern but it doesn't matter, LLMs have ruined it for at least 10 years.
>>
>>108653189
>Can't wait for people to grow up reading AI generated content and constantly speak in slop
With how fast the tech has been moving I wouldn't be surprised if slop gets solved before it can ruin a generation.
>>
>>
>>
>>
>>108653202
I maintain that GLM is, in fact, significantly better because it knows how to use a lot more slop, is surprisingly promptable against it, and the Gemma-preferred kind of slop does not come up as often. Both will still parrot you... I wonder where all of the glm-parrot.jpg shitposts went.
But it's been great for literally every other use case, translations included. Chinkshit is utterly left in the dust.
>>
>>
>>108653232
Slop has been a constant since the first model, anon. It's never been dealt with. They just apply a negative bias for some forms of slop then new ones emerge. Things might be moving but not in this regard.
>>
>>
>>
is anyone else concerned with the number of mcp things that are pip, npx, or other arbitrary package managers that just fetch binaries from the internet when called and run them? like, how fucking insecure is this shit? openclaw setup is bad sure but then the mcp "servers" just pull code directly from wherever when called.. wtf?
>>
>>108653238
I felt the same way when I was younger, just borrowed an iPhone for a task once and couldn't even navigate it. All these gestures and whatnot, all I know are the bottom buttons of an Android and I used an iPod Touch back in the day pretty easily so I don't know what happened.
>>
>>
>>
>>108653261
>It's never been dealt with
I was under the impression they weren't even trying because of the focus on vibe coding. If AI is here to stay then the companies will have to expand beyond muh coding eventually.
>>
File: Screenshot at 2026-04-22 01-48-47.png (11.8 KB)
11.8 KB PNG
>>108653192
It's just a simple chess webapp I wrote (with a chess engine backing it to "run" the game), it has an API that Gemmy can access with two different tool calls to get the current game status, and the other to make a move. For me the webapp just lets me play by dragging and dropping pieces around.
>>108653198
Yeah good idea.
>>
>>
>>
>>
>>108653232
LLMs don't generally write with the secondary objective of reducing repetition while conveying the same meaning without sounding awkward. Until recently people used samplers for that (before the models became so overfit to their own sentence patterns that samplers are now mostly useless).
I don't think this can be really solved with LLMs as we know them, unless you give them memory of prior conversations and swipes, and increase inference compute to carefully adjust form before replying.
>>
>>
>>
>>108653304
distro package managers have a little more credibility than pip. pip is notorious for bad packages
>>108653310
yes I'm aware, and I do. it's just that 100% of them just give you the mcp code block for your harness as the "install method" and I know most people are using it that way.
>>
>>
>>
>>108653337
>I know most people are using it that way
maybe retards and they deserve to get fucked in the ass, especially after the last trivy/axios supply chain attacks
imagine not checking out a project and vetting it out before randomly running it.
>>
>>
>>
>>
>>
>>
>>108653232
>He doesn't know
https://archive.is/Mjynm
>>
>>
>>108653469
>https://huggingface.co/spaces/huggingfacejs/chat-template-playground? modelId=google%2Fgemma-4-31B-it
>>
>>
Took me a couple of days to get the config right, sharing my local setup of 2x3090
running Qwen3.5-35B-A3B GPTQ-Int4 via vLLM 0.19.1 with tensor parallelism, piecewise CUDA graphs, fp8 KV cache, prefix caching (86% hit rate), and chunked prefill — 88 tok/s single request, 169 tok/s sustained with concurrency
CUDA toolkit 12.9, PyTorch built against CUDA 12.9, driver supports up to CUDA 13.1
vllm command:
vllm serve <model path> \
--quantization moe_wna16 \
--dtype float16 \
--kv-cache-dtype fp8 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.9074 \
--max-model-len 65536 \
--trust-remote-code \
--disable-custom-all-reduce \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--generation-config vllm \
-O1 \
--cudagraph-capture-sizes 1 2 4 8 16 \
--max-num-batched-tokens 4096 \
--max-num-seqs 16 \
--enable-prefix-caching \
--enable-chunked-prefill \
--reasoning-parser qwen3
Let me know if you guys have any suggestions for improvement. I tested it both with opencode and pi.dev for agentic coding.
>>
File: 1776675806302835.png (607.4 KB)
607.4 KB PNG
LLM for vibe coding? Gemma is kinda dumb
>>
Holy fuck gemma4 does frontend dev work like a jeet, you have to enforce a simplicity first rule or it will shit the bed.
>Hey do this
>Noooooo saaaar this is too simple saaaaaar I will do this instead
>stop you fucking retard
>I'm sorry saaaaar I will do as you asked
>>
>>108653318
>I don't think this can be really solved with LLMs as we know them, unless you give them memory of prior conversations and swipes
I'd go further to say this applies to any frozen architecture. Even if you pretended we had a perfect "True AI" model, if you're copying a brain neuron-for-neuron and then waking it up from that same state and asking it to write a story, you shouldn't expect to get much variation even if you repeat it 1000 times. Models with some form of long form context that will persist uniquely for each instance is the only way you can hope to get real variety, just like how humans with different life experiences create unique media. This could maybe be in the form of some ultra long context LLM that people will with their personal tastes somehow, but could instead be a model that actually updates its weights over time.
>>
>>
>>
>>
>>
File: orbSettings.png (28.4 KB)
28.4 KB PNG
>>108653375
>>108652381
https://github.com/OrbFrontend/Orb
I also improved the Settings because I always hated how ST managed presets. Now it's gonna be a tree structure instead of a preset.
>endpoint is root level, which has many models
>system prompt, hyperparams are under model (meaning each model will have its own settings)
>selecting an item will cascade change in UI
>>
File: API Costs MAR2026.png (22.5 KB)
22.5 KB PNG
>>108653641
>>108653629
If you're going for non-local, DS is inexpensive to run and I've had good luck w/ it.
>>
File: file.png (911.8 KB)
911.8 KB PNG
Anyone used anything like https://github.com/buaacyw/MeshAnythingV2 https://github.com/NVlabs/EdgeRunne r locally? I got a 5090 but I'm pretty much a noob at setting this sort of shit up. I have set up comfyUi with a guide but that's about it.
>>
>>108653532
This template functionality is wrong.
Model's tool call reply should always be appended to model's own reply with its own turn.
>https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
It's clearly stated here.
(I'm not bullying you, this is just an observation from testing this template playground thing). It seems to be bugged or something.
>>
>>
>>
>>108652381
>>108653683
and then there were two
>>
>>
>>
>>108653683
Interesting. The more I look at this agentic stuff, the more I think about how the human brain works and wonder about our efforts to try to use LLMs to conduct the "black box" thinking that goes on in our own heads before we open our mouths or start typing.
> What did anon just say?
> Maybe I should say this. But that might offend him.
> I will say this other thing instead.
> Hmm. Let me edit this a bit first before I hit send.
>>108653719
> forks
Tail as old as thyme.
>>
>>
>>108653737
Iirc it's "prioritized" so that responses are faster. Anthropic has a similar service, with similar 10X cost structure. Idea seems to be to move you to front of line on inference, or maybe faster hardware... idk, could also just be smoke and mirrors.
I really don't like the games these providers are playing.
>>
File: 1764942546083658.png (323.2 KB)
323.2 KB PNG
K2.6 called my code "insanely bad"
>>
File: 1757759622942555.webm (3.8 MB)
3.8 MB WEBM
>>108653768
Get better
>>
>>108653740
I wonder if it would be more efficient to give a model access to edit "thought files" where it can plan and edit its response using diff or line deltas to save the model from having to write draft responses in full during thinking and to avoid abstract thinking when it doesn't have a draft. Probably not much use unless a model was already trained with that workflow.
>>
>>
>>
>>
>>108653683
>>system prompt, hyperparams are under model (meaning each model will have its own settings)
>selecting an item will cascade change in UI
could you add a like double layer for system prompt so you have the per model one then a global one that gets combined with it because you might have a general system prompt but also needs addons per model and itd save having to copy paste the shared part everywhere
>>
>>108650117
For any anon having the same issue as me, it looks like speculative decoding on koboldcpp (dunno about llama.cpp), only sends the extra arg : --chat-template-kwargs '{"enable_thinking":true}'
to the main model, which means the draft model is never thinking.
The way to make it work is to do it yourself, aka add a system message with <|think|>.
>>
>>
>>
>>108653694
>https://github.com/buaacyw/MeshAnythingV2
You're in that spot where none of the generals cover it. Not text, nor image.
I just looked at first one. Appears to be command like; there a gradio but appears to just be demo to make sure install works.
My advice: follow the README.md to set it up, and use webapp LLM chat of your choice to do any problem solving if it doesn't work.
2nd hand reports I've heard on those tools is they are fucky, and you'll need to be able to post-process the mesh/STL that's output... so hopefully you know how to run Blender or some such.
t. CAD anon
>>
File: Screenshot 2026-04-21 130127.png (169.6 KB)
169.6 KB PNG
Local version of this?
>>
>>
>>
>>
>>
>>108653816
>https://github.com/OrbFrontend/Orb
Yeah that makes sense.
>>
>>
>>
>>108653683
how do you handle other prompts besides the system prompt? as annoying as sillytavern is i really like how they handle prompts in chat completion. being able to drag and drop them down to wherever you like is neat
>>
>>
>>108653928
>>108653848
I never could get any LLM DM implementation to work. LLMs fall apart hard on multitasking and fall into loop in like three turns max.
>>
>>108653940
>Recommended specs
>A large, sophisticated model such as GLM 4.7 or Deepseek 3.1 Terminus (in non-thinking mode)
>qwen-image, either through an API or on ComfyUI.
>128k+ of LLM context
Start by matching the requirements
>>
File: orbMoods.png (88.2 KB)
88.2 KB PNG
>>108653937
An agent will handle them for you based on the 'mood' of the current scenario.
>>108653939
Ugly, and personally I don't find myself touching the Agent panel often anyway.
>>
>>108653896
>>108653886
i may be retarded but i dont really see where to download this from
i could use ollama but it don't think it gets good rep here
>>
>>
>>
>>
>>
File: file.png (4.6 KB)
4.6 KB PNG
>>108654018
yeah i just found it right after posting
the weird name version right?
>>
>>
>>
>>108654037
>the weird name version right?
Don't know what you are asking.
>>
>>
>>
>>108653683
Appreciate the work. Respect.
>>108653719
I privated my mirror repo.
>>
File: Screenshot_20260421_134551.png (80.7 KB)
80.7 KB PNG
What the fuck is qwen 3.6 doing how can all of this fit in 32gb og vram?
>>
>>
>>
File: Screenshot 2026-04-21 at 19-55-01 Orb.png (16.3 KB)
16.3 KB PNG
Wtf is she trying to make me do? Is this a real thing?
>>
>>
>>
>>
>>
>>
>Complain about slop
>ST starts hitting me with slop every single reply when I was only getting hit every third or fourth reply earlier (I haven't changed any of my settings or anything else)
I guess I should've expected a google model to be spying on me and being extremely petty about it
>>
>>
>>
File: 1775249499620398.png (3.4 MB)
3.4 MB PNG
Ready?
>>
>>
>>
>>
>>
>>
File: country girls make do.jpg (59.3 KB)
59.3 KB JPG
>>108654472
>>
>>
>>
>>
File: 00005-1378487878 (4).png (1.5 MB)
1.5 MB PNG
>>108654472
> I'uz born reddy
Just wish DS would quit monkeying around w/ webapp and release new API.
Though I suspect they've been updating the API and just haven't been telling anyone. It's been changing subtly since Dec 2025... Or maybe it's just my imagination.
>>
>>
>>
>>
>>
>>
File: 00003-1378487878.png (1.4 MB)
1.4 MB PNG
>>108654560
From an XLS file that I maintain manually.
>>108654549
Here you go.
I've another, and I'd catbox it, but catbox appears to be dead now. At least I can't get it to work anymore.
>>
>>
>>
I'd like a tool for Hermes that can restart itself to apply code changes and then resume execution.
Something that spawns a second process, which then terminates itself or something like that, and then reinitializes.
I just can't seem to get it to work properly.
>>
>>
File: dipsySouthPark.png (1.9 MB)
1.9 MB PNG
>>108654598
ofc I meant "DS release a new model" since they have done that when they update the API.
But the webapp had an announced expansion to 1M context right before CNY 2026. Current v3.2 is 128k (iirc) so they did something on backend for the webapp, then released no new model.
If you've played w/ webapp it now performs more agentic-ly... for complex issues it does a web search, looks at the results, if it doesn't like them it pulls more web pages, thinks more, and keeps going. I watched it do 3-4 rounds of this on a complex tech issue I was having where it was having a hard time finding a definitive solution. This is all web app work; the model doesn't need to change, they just made the web app act more like openclaw.
And will say again that I think they are playing with model without announcing anything. Subjectively, it's been changing, and they've had a couple of several-hour outages timed in after 5PM China time, which was a prior tell for changes.
But ofc no one knows anything and it's 2MW forever.
>>108654687
np
>>
>>
>>
>>
>>
>>
>>
>>108654726
>This upload is an experimental merge/collection of weights from multiple open-source base models. It is NOT a fully trained unified 2.544 trillion parameter model from scratch. Calculated total parameters from all experts ≈ 2.28T.
>We have updated this card with clearer information after community feedback.
>>
File: 1768268448923840.jpg (892.5 KB)
892.5 KB JPG
>>108654726
>Indian model now deleting all of the posts that calling it out as a scam
That sounds very culturally appropriate.
They should just carry on.
>>
>>
>>
>>108654763
>>108654765
>>108654744
Did you guys not see the initial posts about this model from a few days. The shit they claimed was egregious. The originally said it was a 2.6T parameter model trained from scratch with 146 million context, designed to bharat sovereignty or some jeet bullshit. Go back a couple threads to see some absurd shit.
>>
>>
>>108654808
I mean just look at this shit. They must have a huggingface mod or something involved in the scam.
>>108637034
>>
File: Risu (1).jpg (94.1 KB)
94.1 KB JPG
>>108650825
i'm using ollama claude with qween 3.5 and gemma4 but both tools are mentally ill retarded, claude seems to specially hate gemma and never allow it to properly code, do search, run commands, etc...
what is the proper pipeline to get the 100% power of coding agent to my rtx 3090 and 32GB RAM?
claude code agent makes the most retarded code ever and when i run the models in chat mode with another software it actually makes proper code.
>is claude agent actually boycotting local models?
>>
>>
>>
File: 54654678.png (946.9 KB)
946.9 KB PNG
the absolute state
>>
>>
>>
File: Screenshot_20260421_151221.png (172.4 KB)
172.4 KB PNG
More like mixture of fucking retard worse than chat gpt free tier
>>
File: 1758416620773252.png (166.9 KB)
166.9 KB PNG
>>108654860
>>
>>108654843
That's what it looks like, sure, but how did he get all of the posts calling him out deleted?
https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H/discussions/6
>>
>>108654726
>>108654909
why do indians love making the text color different for every new line?
>>
>>
File: 1670627807058.jpg (19.8 KB)
19.8 KB JPG
>>108654860
>>108654891
>>
>>
>>
File: file.png (10.1 KB)
10.1 KB PNG
>>108654791
>>108654924
Sorry, I forgot. The original claim was 146 TRILLION tokens of context. Little poojeet forgot about git history. Can't cover it up.
https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H/commit/0d3b04d1ef94c3bf 7b9f6c0075ed418888f5d9da
>>
>>
File: 1746049353255055.png (1.5 MB)
1.5 MB PNG
>>108654860
no fucking wayyy
https://youtu.be/sWkGomJ3TLI?t=1952
>>
>>
File: 1763585937389683.png (169 KB)
169 KB PNG
>>108654998
>>
>>108654776
There are varying degrees of quality changes depending on the model and who makes the quant, you can find some charts in the wild but might or might not have one available for the specific model you want, should serve as reference though. The ones you've listed aren't that different. Q5 is considered the sweet spot, leaning towards better results without going for q8, if you can afford it.
>>
>>
File: 00001-3844322418.png (1.1 MB)
1.1 MB PNG
>>108654869
> sam
Same energy
>>
File: dipsyAkakichiNoEleven.png (1.8 MB)
1.8 MB PNG
>>108654791
>guys not see the initial posts about this model from a few days
Yes.
> The shit they claimed was egregious.
Yes.
> bharat sovereignty or some jeet bullshit
I'll repeat: All very culturally appropriate for India.
You should read Rudyard Kipling's accounts of India, and Hindus specifically. Pull the old pre-censored works. It tracks p well with my personal experience working with them.
>>
>>
>>