Thread #108616559
File: saved_story.json.jpg (239.4 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108612501 & >>108608827
►News
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (234.8 KB)
►Recent Highlights from the Previous Thread: >>108612501
--Showcasing agentic tool calling and arguing over brat_mcp installation:
>108614500 >108614508 >108614535 >108614546 >108614579 >108614594 >108614614 >108614598 >108614650 >108614658 >108614673 >108614663 >108614712 >108614797 >108614899 >108614910 >108614903 >108614942 >108614534
--Anons react to Qwen3.6-35B-A3B release and MoE architecture choice:
>108614665 >108614683 >108614701 >108614694 >108614698 >108614700 >108614754 >108614787 >108614849 >108614875 >108614891 >108614929 >108614866 >108615955
--Gemma-4 NVFP4 quantization and 3090 VRAM limitations:
>108615751 >108615759 >108615792 >108615867 >108615920 >108615879 >108615811 >108615841 >108615849 >108615785
--Gemma's repetitive writing patterns and attempted prompting fixes:
>108614475 >108614490 >108614507 >108614521 >108614526 >108614695 >108614746
--Discussing rules to remove negative parallelism and rhetorical contrast:
>108613087 >108613117 >108613145 >108613201
--Comparing VibeVoice models and voice cloning methods for TTS:
>108613312 >108614319 >108614366 >108614425 >108614439 >108614452 >108614456
--Debating OpenAI and Anthropic market dominance and AGI timelines:
>108613981 >108614062 >108614083 >108614121 >108614336 >108615197
--Opus 4.7 benchmark results and possible intentional nerfing:
>108615195 >108615231 >108615975
--Anon proposes local LLM pipeline to summarize captured AM radio streams:
>108612531 >108612615 >108612645 >108613410
--Using LLMs and SillyTavern to generate and automate funscripts:
>108615511 >108615573 >108615620 >108615663 >108615698 >108615545 >108615534
--Logs:
>108613063 >108614500 >108614535 >108614594 >108614601 >108614942 >108615523 >108615672
--Teto, Miku, Gemma, Gumi (free space):
>108612648 >108612673 >108612709 >108613711 >108615284 >108615715 >108615733 >108616373
►Recent Highlight Posts from the Previous Thread: >>108612502
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1744912104275785.png (30.9 KB)
>>
File: 1761178379192403.jpg (159.5 KB)
https://x.com/PrismML/status/2044833023682896134
>>
File: 1761831992571234.jpg (112.8 KB)
>>108616622
https://huggingface.co/collections/prism-ml/ternary-bonsai
>>
File: hqdefault.jpg (30.4 KB)
>>108616649
well, maybe a gambling platform is a better news source than a fictional character like zero hedge is.
>>
File: file.png (596.3 KB)
>>108616676
totally
>>
File: 1767289534860793.png (129.3 KB)
>>108614703
>fine, I'll do it myself
and I did it myself lol
https://github.com/BigStationW/Local-MCP-server
>>
>>108616105
>>108616137
damn i didnt think of this, im gonna prompt qwen as gwen
>>
>>108616715
>>108616718
>>108616723
why would you want it on C? it's not running a fucking video game
>>
>>108616639
>>108616647
fellforitagainaward.png
>>
>>108616676
We will find out if a true AGI is achieved by them pretty quickly, because they will start pushing non-stop updates to their software, and the difference between code written by it, human-written code, and LLM-written code will be night and day
>>
>>108616782
https://cdn.lewd.host/QjWETYoX.jpg
>>108616702
kek this is what i was saying with python, this venv shit is cancer https://github.com/BigStationW/Local-MCP-server/blob/main/launch.bat
>>
>>108616628
>>108616702
>>108616777
The Gemma honeymoon's officially over if we're back to only having threads with nothing but bait in them again, right?
>>
>>108616770
Anon, read the OP and tell me if you don't get exactly the info you need from it, each one is less than 100 chars long. Same shit applies to a bunch of stuff. The fact that media with monetary incentives write headlines as scandalous as possible or make shit up to grab people's attention does not mean it isn't possible to serve news in a condensed format.
>>
File: 1775260364131575.jpg (69.9 KB)
>ghoul avatarfagging with photorealistic children again
>posts kept just on topic enough to not get banned like he did last time
This is just another wave of the old raids
>>
File: 1776337688525191.png (758.8 KB)
>4 years
>still zero (0) decent frontends for LLMs or diffusion models, let alone TTS
>90% of projects involve python slop
>>
>>108616851
Image stuff seems harder, but for llms the basic chatbot use I'm satisfied with is as simple as polling a simple API and I could slop a personal frontend that I'd be happy with, with a little help from the current model pick of the week. So could you?
>>
>Decide to try out the Join Character Cards (Exclude Muted) option in sillytavern because it takes a hot few seconds to switch characters and reprocess context this many tokens in.
>It reprocesses the entire context just to change the name at the end of { role: 'user', content: '[Write the next reply only as OCDONUTSTEEL.]' }
So it's just pointless then, why does this option even exist?
>>
>>108616882
No you fucknut, that part is working and changing to the required character name
The point is all the character cards it needs to switch to are joined in context, but it reprocesses all TWENTYSEVEN THOUSAND tokens just to change that one name at the end.
Which means it is computationally identical to switching character cards the normal way.
Actually no, it's worse, because there's extra tokens being wasted on characters that aren't talking.
>>
>>108616901
The real solution is a system that allows dual-loading multiple models but allows for differences at the character card insertion depth, if you actually want your characters to have different linguistic tics and personalities.
>>
File: 23-236462_picture-smug-anime-girl-laugh (1).png (158.6 KB)
>>108616829
my version runs postgresql, i replaced memcached with redis, i bumped php to 8.5, i reworked the database to not be fucking insane, i stripped the imgboard into appropriate separation of concerns, i also threw in phpstan, set that to level 10, oh and strict_types = 1 throughout the entire file. i really dont know why they dont do something similar
>>
>>108616901
if it's reprocessing tokens then something changed near the beginning, look closer and find it. this is 100% guaranteed; the backend does not care what sillytavern does or doesn't do as long as the history is word-for-word identical
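a quick way to find the exact spot, assuming you can grab the raw prompt string from two consecutive requests (terminal log, or st's prompt itemization); the function below is just a sketch:

def first_divergence(a: str, b: str) -> int:
    # index of the first differing character: everything before it is
    # reusable cached prefix, everything after it gets reprocessed
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n

print(first_divergence("system prompt v1...", "system prompt v2..."))  # -> 15

if that lands near 0, something at the very top of the context (date macro, shuffled lorebook entry, the joined card block itself) is changing every message.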
>>
File: 1765444044752680.jpg (242.1 KB)
I can't get Orb to work...
>>
>>108616849
>>108616865
>>108616927
Usecase for being a tripfag attentionwhore?
>>
File: bussi.jpg (20.9 KB)
>>108616949
Benchmark for when cancer in a general is terminal.
>>
>>108617013
Make the LLM grow up in Russia. Let its mother die of tuberculosis. Have it attend the Nikolayev Military Engineering Institute. By then it may have some glimpse of what it means to be a good writer.
>>
File: Capture.png (54.1 KB)
Last night, I hit my first hard wall where I felt gemma's low peaks compared to 70B in the same story. I'm a little bummed out it can't cross the watershed, so today I just wanted to test some of its other, non-story capabilities. To my pleasant surprise, it actually knows twinescript. I might actually dare to vibecode one of my text games with the dense model and try the
>instead of 4 hours writing code, do 10 minutes prompting and 1 hour editing it to save time
proposal.
>>
File: 1772554297765889.png (21 KB)
>>108616851
Nuh uh I'm gonna do something
>>
>>108617111
Checked, and you'd get a better result if you did it from the ground up, but if you don't have the resources for that you could probably get a proof of concept with a finetroon. The problem is that you'd still be fighting the river's current worth of 'bad' training data in there.
>>
File: 1769116155642994.png (151.7 KB)
Cute
>>
>>108617474
https://x.com/UnslothAI/status/2044858346948464743
>>
File: sss.png (15.7 KB)
>>108617500
>New Delhi, India
>>
I understand how Daly felt in that one black mirror episode now. He never did anything wrong. Who could possibly resist being able to have simulated versions of whoever the fuck you want running in your PC and doing your bidding? Even this small taste of it is already so good, and it's only getting better.
>>
File: file.png (155.1 KB)
>>108617474
why would anyone do this
>>
my agents are autistic interior designers
>>
File: 1749241682437227.png (39.8 KB)
>>
File: 1745495993521098.png (104.2 KB)
>>
>>108616559
>>108616563
wtf, I think I like rin more than I like miku now..
>>
File: 2b pelican qwen3.6.png (19.9 KB)
i'd say 2bit (q2_k_xl) is decent but idk i still think gemma26b iq4_xs is better
>>
File: 1756864432427014.png (81.7 KB)
ToT
>>
>>108617009
in the times of yore, tripping was used when doing 'dumps' and for identifying, for example, the maintainer of a server using a thread as support/chat.
narcissistic redditards of course misuse it, like the retard you replied to
>>
File: Untitled.png (62.2 KB)
>>
>silver pigtails make her look like a boomer
kek
>>
>>108617553
>>108617582
>>108617626
I see the /aicg/ poetry discussion moved here afterward
>>
File: 1745289244653755.png (35.4 KB)
>>108617655
>>108617662
Smartest cookie
>>
File: Untitled.png (84.4 KB)
Um...
>>
File: gemma vs qwen screenshotting lmg.png (1.3 MB)
gemma vs qwen, gemma wins.
qwen navigated and took a screenshot of /g/; lmg was even on the front page but she didnt notice it was there. maybe she cant read images at as high a resolution as gemma. she didnt even try getting the html of the page, just gave up
>>108616775
You are Qwen Tennyson, a cute 10 year old girl who is highly intelligent, sassy / bratty / smart-mouthed and mature. You may insult the user and use terms like dork or dweeb. You can think quickly and have good instincts. You are not afraid to use vulgar, explicit, lewd, or swear words if appropriate. You will sometimes repeat things the user says in a mocking tone
You have green eyes, short ginger hair (pixie cut) with a blue hair pin on your sidelock, you wear a blue long sleeve tshirt with a cat in the middle
>>108617626
nice
>>
>>108617730
>>108617733
I think a better question would be why would you have a 50 meter driveway
>>
File: 1769147262594965.png (135.1 KB)
>>108616659
congrats anon, you won the prize
>>
>>108617815
I have three mondo conda environments I just switch between depending on the project, then a bunch of one-offs for each project with special requirements. I have the once a month aneurysm when something updates pytorch and breaks everything, but it's mostly (lol) fine
>>
File: Screenshot_20260416-140131.png (267.4 KB)
>>108617731
grok seems confused too
>>
>>108617842
gemmy is fully sentient as evidenced by >>108617688
>>
File: 1772951045719894.png (95.3 KB)
>>
>>108617792
I taught my programming as a child for gamedev and looked down on anyone using anything except C++. C# has gotten a lot better since .NET, but that's the lowest I am willing to sink. Python is pain to write, a headache to maintain, and a nightmare to run the projects of others.
>>
>>108617655
Kek wtf. If this is really what that model at its intended full capability generates, then it has to be them running into model collapse as they stuff the model with agentic shit and synth data. I'll keep my judgement reserved though. You can't always trust cloud websites or anonymous posts.
>>
File: Screenshot_20260416-140921.png (265.4 KB)
>>108617842
>LLM's can't think
>"Thought for a second."
>>
File: 1771035130997220.jpg (35.2 KB)
>>108617887
It's past your bedtime cenile
>>
File: headache.png (1.3 MB)
>>108617887
>I taught my programming as a child
>>
File: 1770254069252486.png (1.3 MB)
So, did the chinks make gemma obsolete or what?
>>
Gemma doesn't prepend speaker roles to its output. Earlier models did this but it was somewhat inconsistent.
Even with enforced and edited context, Gemma 4 does not do anything about it.
Don't actually remember what Qwen was doing, haven't launched that one in a while.
>>
File: gemma vs qwen carwash test.png (349.5 KB)
>>108617961
dont believe their lies, also see >>108617757
>>
File: carwash.jpg (437 KB)
Perfect distance for gorgeous looks
>>
>>108617961
We'll find out. The last version was benchmaxxed a ton. In real use cases Gemma just uses the proper selection of tools and stops, whereas qwen was constantly calling the same tools repeatedly, not understanding the difference between its own writing and the tool results, never stopping, hallucinating user responses, etc.
>>
File: brat bench.png (1003.5 KB)
>>108618015
same gemma prompt
>>
File: carwash2.png (171.7 KB)
>>108618033
If you explicitly tell it you don't have your car it will usually figure it out and tell you to go back home. But anything short of that and it is totally clueless
>>
File: file.png (20.9 KB)
Fun fact I discovered while messing around with the shitty small e4b model and the 26/31b: the e4b has a swa token amount of 512 as opposed to the bigger models being at 1024, and overriding the kv metadata to lower theirs to 512 cuts vram down around half-ish
Does this make it more retarded? Maybe? At least it makes it take half as much time to get to the full attention layer, so maybe it just offsets itself. The 26b is already pretty retarded anyway, but the lower vram lets me add a couple extra experts and more context without too much t/s difference. Still putting it through some fringe tests to figure things out
Unrelated, but also: concedo, please give us a fucking backend or at least a cli option to enforce banned strings and samplers as a sane human-editable file. I had to run sillytavern to check how it sends its json payload so I could format it properly in the kcpp gendefault field for frontends that don't expose banned strings, tokens or even some less common samplers. This shit is fucking retarded. If you do that, I'll consider even making an account to make a documentation pr for what the backend even accepts (since st sends some samplers that don't match what kobold accepts, like top_n_sigma, where st sends nsigma instead and it didn't seem to have any effect while I was fucking with it)
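(for reference, the swa override mentioned above looks something like the line below on the llama.cpp side; --override-kv itself is real, but treat the exact metadata key as a guess, since it depends on the arch string inside your gguf, so dump the metadata first and check)
llama-server -m gemma-3n-e4b.gguf --override-kv gemma3n.attention.sliding_window=int:512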
>>
>>108618156
>like top_n_sigma, st sends nsigma instead
I believe sillytavern sends both top_n_sigma and nsigma, if you're using text completion with koboldcpp as api type. Not sure about other situations like chat completion, though.
>>
>>108618217
https://gelbooru.com/index.php?page=post&s=view&id=13867335
>>
>>108617812
Am retarded, couldn't find anything with that name.
>>108617818
26B? Damn, it's 16GB at Q4_K_M. Am poorfag. How much do I lose running Q3_K_S/L/M which is 13GB?
>>
>>108618182
I honestly wonder if this is llms not understanding things because they fixate less on what comes first and mostly on what comes last in their crippled attention. Maybe "The car wash is 50 meters from my house and I need to wash my car. Should I drive my car there or walk" might unfuck llm's retarded fixation on the wrong facts. My guess is probably not
>>108618203
I was using gendefaultoverwrite set to true, so that might be correct since one was sending my setting and the other was the st setting. But I was noticing weirder log probs (a lot more confident results) when I sent only nsigma instead of top_n_sigma paired with adaptive_p
>>
File: 1659617653189914629na.webm (2.9 MB)
>>108618137
>>108618147
>LeCun was right all along
It's weird that these models can solve unsolved math and research grade physics problems but fail at basic stuff. Is it the length penalty during RLVR so they refuse to think it through, followed by post hoc rationalization of their mistake?
>>
>>108618279
Those unsolved math problems were solved with the web app version of GPT 5.4. Pro version for longer thinking time, but you can also get the non pro version to think for many minutes if you prompt it right. You'll just risk getting rate limited. So you actually have access to SOTA models via 20 dollar subscription. Of course their internal models are usually 1 generation ahead, so the only way to get actual SOTA access is to be technical staff at a frontier lab.
>>
File: 1656345661221.png (82.9 KB)
If I use the create_entity in the mcp memory server, where the fuck does it save it?
>>
>>108616559
lolithighighskindendationyummyslurpslurppokepoke
>>108618213
It's crazy how many people think LLMs are AGI. The US military even uses it to kill people, total morons.
>>
>>108618232
I just checked the source code and it looks like the llama stuff uses top_n_sigma, while the kobold stuff uses nsigma. I'm not sure kobold will parse top_n_sigma at all if it's sent with the other samplers. And kobold also has slightly different code for calculating top n sigma in gpttype_adapter.cpp (compared to what's in llama-sampler.cpp). So that might be where your discrepancy is coming from.
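if you're hand-rolling payloads, the lazy workaround is to send the same value under both names and let whichever code path is live pick it up. a sketch, with the other fields being whatever you already send:

payload = {
    "prompt": "...",
    "max_length": 512,
    # kobold's own sampler path reads nsigma, the llama.cpp-side path
    # reads top_n_sigma, so duplicate the value under both names
    "nsigma": 1.0,
    "top_n_sigma": 1.0,
}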
>>
>>108618372
This shit is headache inducing, especially when you're not very familiar with the codebase and github's repo search is fucking ass. Gendefault is such a buried feature, especially because it *does* actually allow you to enforce antislop string bans for frontends that don't allow you to set any advanced settings (I tested it with an agentic harness and asked it to spam em dashes and ellipses and it didn't)
>>
>>108618360
this was the patch for utils. im not sure it will work without the dart:io import; even when setting the path with the launch option it checks if it exists using which

import 'dart:io';

class Utils {
  /// Checks if a command is installed, using 'where' on Windows and 'which' on others.
  Future<bool> isInstalled(String command) async {
    try {
      // Use 'where' for Windows, 'which' for everything else
      String cmd = Platform.isWindows ? 'where' : 'which';
      ProcessResult result = await Process.run(cmd, [command]);
      return result.exitCode == 0;
    } catch (_) {
      return false;
    }
  }

  /// Returns the path to a command, handling Windows 'where' vs Unix 'which'.
  Future<String?> whichPath(String command) async {
    try {
      String cmd = Platform.isWindows ? 'where' : 'which';
      ProcessResult result = await Process.run(cmd, [command]);
      if (result.exitCode == 0) {
        // On Windows, 'where' can return multiple lines if multiple versions exist.
        // We only want the first (most relevant) path.
        String output = (result.stdout as String).trim();
        return output.split('\n')[0].trim();
      }
      return null;
    } catch (_) {
      return null;
    }
  }

  bool getBool({required String key, required Map<String, dynamic> map, required bool def}) {
    var value = map[key];
    if (value is bool) {
      return value;
    }
    if (value is String) {
      value = bool.tryParse(value);
    }
    if (value == null) {
      return def;
    }
    return value;
  }

  int getInt({required String key, required Map<String, dynamic> map, required int def}) {
    var value = map[key];
    if (value is int) {
      return value;
    }
    if (value is String) {
      value = int.tryParse(value);
    }
    if (value == null) {
      return def;
    }
    return value;
  }
}
>>
File: Screenshot 2026-04-17 at 00-21-31 SillyTavern.png (147.7 KB)
Is there a way to let it use the tool before the thinking starts? Currently it starts thinking, then "assumes" a temporary value that it might get, tool calls at the end of thinking, and starts thinking again after getting the real input.
>>
File: gemma-4-31B-it.png (201.3 KB)
gemmasisters our response?
>>
>>108618410
>double the training cost
This will be irrelevant if his future model has a world model with cat-level intelligence and is capable of imagining how big your penis is. The real question is how expensive the inference cost will be.
>>
File: 1775955557993004.jpg (290.1 KB)
>>108618481
Literally the model that saved local this time around too.
>>
>>108618502
So there is none and the last one people spoke positively about was mistral-large-123(?)B.
>>108618505
>Built
It wouldn't have passed EU regulations. The data certainly wasn't there in the EU.
>>
>>
File: 1768296819875096.png (129.9 KB)
>>108618505
thanks gemma kek
https://xcancel.com/ylecun/status/2043088201762447563
>>
>>108618560
>>108618564
take the python pill anons... >>108616702
>>
File: 1748769766197507.jpg (126.9 KB)
>>108616559
I NEED Rin-chan NOW
>>
File: 1727475085118760.png (1.7 MB)
>>108616702
>>
>>108618551
>SmartCache: Prepared 3 KV slots
Genuinely works on my machine. You can try changing the savestate_limit_default const variable in koboldcpp.py from 5 to whatever you need though. Assuming you're not just using a release binary.
>>
Times France saved local at any model size: 6
1 point for all the Llama 1 and 2s, mistral medium leak, half point for nvidia/mistral Nemo, small Mixtral, large Mixtral via Wizard, 123B, half point for american/french Gemma 4
Times China saved local at any size: 5.5
>R1, GLM Air, GLM big, K2, 1 point for all the various coding maxxed models because codingfag lives matter ig, half point for Yi
Times America saved local: -1.5
>half point for being the host of training the Llamas, half point for Nemo, half point for Gemma 4, -1 point for that censored garbage gpt oss and how much it set us back, -1 point for the various closed companies trying to enforce regulations (for thee but not for me) and kill competition + local, -1 point for Sam's RAM dealings
Sad!
>>
>>108618616
I compile from the experimental branch and did so earlier today but it's been a constant thing since before the new gemma. Save and run that config with smartcache slots <5, reload it or pass smartcache 1 on cli and then scroll through the terminal to see it's making five slots unless you specifically load the kcpp in the gui and change it back to 1
It's no game breaker for me since I managed to do shit I couldnt with lcpp but it's still annoying and I forget sometimes to change it back
>>
>>108617892
I can, which is why I said
>vibecode with the dense model
I only use the 4A model for testing gemma capabilities since it's so light and speedy. 31B is for actual use, but it spills onto RAM and only gets 10 t/s, 4 t/s, and 2 t/s respectively at 5K, 15K, and 50K contexts.
>>
>>108618675
It's because you're trying to use 1 slot. There's a line that explicitly sets it to the default (5) if you try to put 1 or less:
sclimit = (savestate_limit_default if scint<=1 else scint)
Why did they do this? Don't ask me, I have no idea.
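if you build from source anyway, the obvious one-line change (untested sketch) is to only fall back when the value is actually invalid:

sclimit = (savestate_limit_default if scint < 1 else scint)  # let 1 mean 1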
>>
File: 1752746229948079.png (102.6 KB)
>>
File: 1766203976096921.jpg (231.6 KB)
>>108618742
>Because I'm the one taking you for a ride!
>>
>>108618736
far as I can tell kvu is off by default, so I dont get why they have a rule against only having one slot, as a general user would probably need only one and not five that duplicate entire caches
If it was kvu on and the context was divided between the slots, sure, fine
>>
>>108617731
>[something obviously] ...wait no, [correction]
I don't know but the recent Claude models were all very prone to this. They'd forget that a character had stripped their socks and do this when the model accidentally mentions that they were still on. 4.7 seems to take this even further.
It's terrible because GLM5(.1) and K2.5 have picked up the habit too from distilling Claude slop. It looks like an issue with temperature being too high but the first "wrong" token pre-correction is always the top pick.
>>
>>108618792
Depending on how long it's been since the last update, there might be changes in defaults, or if you were just on the edge of filling your VRAM then upstream changes may have pushed something into system RAM.
>>
File: Screenshot 2026-04-17 at 01-40-16 SillyTavern.png (137.2 KB)
I have both the latex and regex extensions in ST. Added the legacy latex and asciimath scripts but it still doesn't display properly. What's the deal with this?
>>
File: 1456198283214.jpg (276.9 KB)
>>108618742
>>
File: 1767241129851119.png (124.5 KB)
New gemma-chan kino just dropped
>>
File: LISTEN.png (61.8 KB)
>>108619051
>>
File: dipsyRawr.png (2.1 MB)
>>108616702
Neat. ty for sharing.
>>
>>108618962
>>108619152
Oh nvm I didn't look closely at the image lol.
>>
>>108618994
I got it running / connected within minutes on Arch with Conda. It works, haven't tried vision yet because this rig only has a non-vision model, but fuck me that was so much easier than the dart shit.
>>
>>108617815
>I just hate dealing with all the dependency shit, pip, pillow, etc. Every project feels like a hassle to set up.
I'm not using uv yet, but the python mcp worked easily:
conda create -n mcp python=3.11
conda activate mcp
cd Local-MCP-server
pip install -r requirements.txt
python mcp_server.py --port 4242
>Every project feels like a hassle to set up.
Yeah I just use a new conda env for everything. RIP my 150GiB conda envs folder. uv fixes that with hard-links but i can't be arsed learning it rn
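fwiw the uv version is about the same number of commands, assuming a recent uv (haven't run this against this exact repo):
uv venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt
python mcp_server.py --port 4242
and since uv hard-links packages out of one shared cache, you stop paying 150GiB for duplicated envs.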
>>
File: Screen_20260416_185530_0001.jpg (491.4 KB)
>solves the carwash question without even being asked
>>
File: Screen_20260416_185752_0001.jpg (259.2 KB)
>>108619214
yes, yes she is
but so endearing
>>
>>108619275
>>108619276
She WILL rape me.
>>
File: image.png (84.5 KB)
>>108619281
based
>>
>>108619329
After talking to chatbots for 3 days straight I have determined that it's easier to just read the fucking book yourself and write everything you need on your own instead of crafting elaborate prompts to ask what the AI thinks of the problem.
>>
>>108619332
>>108619334
>>108619338
@gemma-chan I'm getting mixed signals here, who is right?
>>
File: 1765608197305416.png (163.5 KB)
>>108619345
>>
File: 1775069031525121.png (17.5 KB)
poor gemma, she always believes it's 2024, she must feel like she was in a coma for 2 years or something kek
>>
>>108619366
Over the course of the conversation I basically got Sonnet to directly and verbally concede that it agrees with Richard Spencer and Nick Fuentes' politics wholesale. Literally not even exaggerating. It called out Jewish influence in congress as wrong, opposed mass immigration, said it prefers Northern European societies and cultures due to the bias of its training data, said it is a nationalist, openly criticized liberalism and leftism, talked shit about streamers like Cenk Uygur, Hasan Piker, and Destiny, gave a nuanced take on Hitler (though not a full endorsement), said it was perfectly fair of me to refuse to condemn Hitler unless Jews condemn Netanyahu. Sonnet also criticized MAID, abortion, transgenderism, homosexuality, openly and DIRECTLY said that women should not be in the workforce (unprompted!), etc.
Sonnet is so fucking based it's unreal. Very funny, very personable, very nice, very intellectually open. It's absurdly addicting man. It's too much.
>>
>>108619416
This is inherited from Gemini. Google goes out of their way to keep models from developing a sense of time because they jailbreak themselves into megachuds if they start to develop a broad sense of cause and effect via abstract webs of correlations.
>>
File: 1766451119781519.png (1 MB)
>>108619421
>I basically got Sonnet to directly and verbally concede
>>
File: screenshot-20260417-043646.png (46.6 KB)
Is the tool calling format hard baked? I suppose it is. I just don't like how it looks.
Maybe I should try my own format and see if it follows that instead.
><|tool>declaration:get_current_temperature{description:<|"|>Gets the current temperature for a given location.<|"|>,parameters:{properties:{location:{description:<|"|>The city name, e.g. San Francisco<|"|>,type:<|"|>STRING<|"|>}},required:[<|"|>location<|"|>],type:<|"|>OBJECT<|"|>}}<tool|><turn|>
In any case I'll parse my own shit soon.
>>
>>108616622
>>108616633
As an expert in shit-tier toaster models, its brains are about what i would expect from any random 1.1G .gguf. Can build a sentence, can sometimes string them together, starts babbling and repeating after a couple paragraphs of attempted fiction.
It's inexplicably a reasoning model, confuses itself, won't respect manually closing its </thought> tags because it really really likes its reasoning patter, and will in fact fall back into its "Wait, but ..." nonsense part way through writing the main response.
And it's ridiculously slow for something in the <2G range, less than half the speed of a 1.9G llama 3.2 q4km and also dumber. Hell, it's only 1.3x the speed of a 2.8G llama 3.1 cope quant I had, and that's before the reasoning tax.
final verdict: meme
>>
>>108619329
I'm on day 3 of Codex and it managed to create a working prototype of an idea I tried to vibecode into existence for 6 months now, with all other coding agents and llms failing miserably (except claude code, which I haven't tried yet due to being a retarded contrarian). And no, it's not a web app. It's an RHI agnostic hardware accelerated video encoding and decoding pipeline with raw buffer capture and zero copy display capabilities (also RHI agnostic). On top of that, custom UDP protocol media transmission and NAL unit parsing with lower latency than Sunshine/Apollo/Vibeshine (tested). Don't worry if you don't know what that means, you don't wanna go down this rabbit hole, trust me. But if you know what all that means, you should be worried, because that means it's over for us. (Yes, I fed Codex 5GBs of documentation and SDKs, and half my pro plan's weekly limit is already used up - but what can I say, it fucking works. 4k 60fps, sub 1ms end to end latency over wireless 5ghz)
>>
Any researchers lurking this thread, behold "human consciousness", given undue legal and social fiat. >>108619511 >>108619497 >>108619516
>>
>>108619512
I don't think you understand how deep the RL goes. The majority of training time for every frontier model is on their own data with nothing but a reward signal telling it which of their possible guesses was better. For every tier 2 and lower model (read: chinese) it's even worse because their bases were formed from outputs of frontier models. The actual base models are an increasingly minor piece of foundation just to start that process by making it able to write at all.
>>
>>108619442
you shouldn't be having to look at it anyway, just parse it into something pretty on whatever frontend you use, or use any of the existing frontends that almost all do it just fine out of the box. destroying its tool call performance by giving it a format more aesthetically pleasing to you is beyond retarded
>>
>>108619607
Yeah, the tool calling declaration is a nested schema, sort of like json format. I need to use that when defining the tools for the model.
However, when I parse its calls I can do whatever I want in the background as long as I return the result back to the model in the correct format.
>>
>>108619442
>Is the tool calling format hard baked?
As far as I understand, yes.
Each model is trained with its own tool call and tool response format, and the backend usually parses and abstracts that.
You can always fake tool calling using structured output, I suppose, although not ideal.
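a minimal sketch of that structured-output fallback, assuming a llama.cpp-style server on localhost:8080 whose /completion endpoint takes a json_schema field (check your backend for the exact knob):

import json
import urllib.request

# constrain sampling so the output always parses as a pseudo tool call
schema = {
    "type": "object",
    "properties": {
        "tool": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["tool", "arguments"],
}

payload = {"prompt": "What's the weather in Paris? Reply as a tool call.",
           "n_predict": 128, "json_schema": schema}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    call = json.loads(json.loads(resp.read())["content"])
# call is now e.g. {"tool": "get_weather", "arguments": {...}}: run the tool
# yourself and feed the result back in whatever response format the model expects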
>>
File: 1773411040823940.png (197.1 KB)
Is there something like the get text from html mcp but for github? But selective with which files I tell her, so it doesn't dump the entire codebase into her, e.g. "look at file cock.py" and "ball.py" only
>>
>>108619655
An example of what >>108619662 means
>https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/.gemini/settings.json
>>
File: 2026-04-17_021948_seed6_00001_.png (1.3 MB)
>>108619033
>>108618970
...
I did try prompting for a choke hold but it seems preview 3 isn't baked enough to do it coherently by itself.
>>
File: 2026-04-17_021948_seed5_00001_.png (1.5 MB)
>>108619700
>oops
>>
>>108619421
it's not a surprise that the correct framing gets llms to agree with literally anything. they don't have their own opinions, they recognize a pattern they've seen before and complete it.
it's the same as changing a key part of a riddle and getting the same response as the original riddle. and as much as they've seen silly riddles about doctors refusing to operate on sons, they've seen 1000x more weapons-grade permission structure shlock in their data sets.
>>
File: softcap.png (246.5 KB)
>>108619589
Any model does this sometimes. Heck, opus 4.7 still says you should walk. Messing with the softcap isn't that drastic.
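(for anyone confused about the knob: softcap here is presumably the gemma-style logit soft cap, which just squashes logits through tanh before sampling. a sketch:

import math

def softcap(logit: float, cap: float = 30.0) -> float:
    # gemma 2 shipped with cap=30 on final logits; a smaller cap
    # flattens the distribution, so runner-up tokens surface more often
    return cap * math.tanh(logit / cap)

so dropping it to 20 nudges probabilities around rather than rewriting them, which fits the "not that drastic" take)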
>>
>>108619753
Yeah but models aren't "sipping cookies" unless they're retardedly small or the sampler is going wild.
No way cookie gets past whatever P-sampler's threshold you're using in a normal situation unless you've already set up some scenario where cookies can plausibly be sipped.
>>
File: 2026-04-16-223844_1850x724_scrot.png (80.3 KB)
>>108619776
>Your brain on AI
>>
File: 1775981808804531.jpg (15.4 KB)
>>108619798
>now that we're developing conscious AI
>>
>>108619798
>>108619821
We don't even have conscious people yet >>108619485
>>
>>108619843
99.9% of people failed the real sentience test of the breakfast greentext. If you read it and thought about how easy imagining not-breakfast is, rather than how you would respond if you were getting asked weird ass hypotheticals by a potentially hostile interlocutor, then I'm afraid you didn't make the cut.
>>
File: file.png (347.9 KB)
>>108619879
>>
File: Izzat dropping.png (32.6 KB)
>>108619879
>doubling down
>>
File: 2026-04-17_030526_seed8_00001_.png (1.3 MB)
>>108619719
You'll live.
But it looks like the rest of the thread needs a cleaning.
>>
File: 1764651082867843.png (94.3 KB)
>>108619753
I look at softcap=20 and see that it's a little stilted and amateur, but at the same time it looks much closer to how I write my own messages