Thread #108627512
File: 1774546242441802.jpg (2.1 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Agentic Edition
Previous threads: >>108624084 & >>108619962
►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
File: 1703014338886123.jpg (138 KB)
►Recent Highlights from the Previous Thread: >>108624084
--Converting a conceptual anime AI frontend sketch into functional code:
>108626092 >108626099 >108626127 >108626153 >108626134 >108626333 >108626145 >108626665 >108626764 >108626817 >108627454
--Discussing Open WebUI reasoning bugs and building custom frontends:
>108625125 >108625157 >108625188 >108625237 >108625241 >108625435
--Comparing MoE and dense models and discussing AI VTuber viability:
>108624227 >108624260 >108624293 >108624389 >108624642 >108624698 >108624724 >108624744 >108624783 >108624363 >108624494 >108625247 >108625457
--Discussing web browsing implementation via lynx and Puppeteer tool-calling:
>108624344 >108624408 >108624460 >108624513
--koboldcpp PR adding adjustable image recognition tokens for Gemma 4:
>108624313 >108624326 >108624352 >108624459
--Comparing Qwen and Gemma performance using Koboldcpp MCP web search:
>108624466 >108624535
--Discussing fake agent swarms and MCP tool-use in SillyTavern:
>108626267 >108626285 >108626320 >108626329
--Developing and testing a tsundere system prompt for Gemma 4:
>108625331 >108625356 >108625389 >108625402 >108625400 >108625421 >108625439 >108625369 >108625374
--Anons using Gemma as a cut-throat critic for writing and art:
>108627035 >108627060 >108627071 >108627460 >108627086 >108627148 >108627100
--Discussing DeepSeek's funding round relative to major US competitors:
>108626232 >108626246 >108626300
--Comparing OmniVoice to VibeVoice and Chatterbox for voice cloning:
>108626646 >108626649
--Logs:
>108624408 >108624423 >108624513 >108625188 >108625215 >108625331 >108625356 >108625412 >108625435 >108625490 >108625646 >108625657 >108625695 >108625777 >108626443 >108626499 >108626606 >108626614 >108626764 >108626915 >108627010 >108627035
--Luka (free space):
>108624596 >108625494 >108626149
►Recent Highlight Posts from the Previous Thread: >>108624087
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
File: himefacepalm.jpg (277.3 KB)
>but do not mistake my _ for _
>>108627642
>>108627630
You could use the other gpu to watch porn while you game.
I was browsing voices to use with my TTS and started with Genshin Impact. I never played it, but I thought surely they'd have spent on good voice actors, right?
Why the fuck do so many of them have slight lisps, Christ. Grating shit.
>open Hobbit book
>"In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort."
Yeaaaaaaahhhhh.... I might not reread it now...
File: 21.png (222.4 KB)
>>108627622
I mean, by STEM I mean code, and the benchmarks track with my results using gemma/qwen for coding
>>108627620
picrel, with no need for hints or anything it gets stuff, though it never seems to get what オスマンコ is in the middle of a phrase even though it knows it when asked by itself. qwen never knows it, always says it doesn't make sense or makes up random stuff, and even if you hint that it's nsfw/explicit you have to reroll for it to get it right
File: facepalm2.jpg (404.5 KB)
>hits her with the force of a physical blow
File: b9akafeu7pvg1.jpg (189.2 KB)
File: 2026-04-18-002904_908x598_scrot.png (237.7 KB)
>>108627741
>Gemma smarter than LMAOpus
File: nimetön.jpg (263.1 KB)
>bought 128 gb of ddr4
>can't talk to gemma because server is memtesting it for the day
It's a happy, sad day
File: 2026-04-18-003237_1043x693_scrot.png (324.2 KB)
>>108627741
I showed her the screenshot.
>>108627765
You kind of trailed off there, do keep going?
>>108627771
>>108627774
No, Gemma runs on the gpus. But I want to be able to try the biggest shit I can and this maxes out my system
I think 220 eurobux wasn't that bad either for the kit (assuming it tests good)
>>108627781
https://chub.ai/characters/CoffeeAnon/gemma-chan-2311b09e3e73
File: 1753173233220114.jpg (56.7 KB)
>>108627813
>Deepseek, both GLMs, and Kimi
Yes they certainly don't have any slop
File: mananafacepalml.jpg (250.5 KB)
>so tightly that her knuckles turn white
>>108627808
sexo
I tried making a simple gemma-chan with an actual physicality so I can pull on her clothes and lick her nipples and pivot to erp after I ask her questions, but this is better as a more functional assistant. Thanks for sharing
>>108627846
Some schizo had an episode and walked up to a random guy's house demanding he open the door so he could "check something", then he broke in and demanded to know "where she is".
The guy wasn't home at that moment, but he soon arrived, drove him out with a shovel, and miraculously got him to calm down.
File: 2026-04-18-005138_961x796_scrot.png (269.1 KB)
>>108627857
This was the reasoning.
>Context: The car wash is only 50 meters away. This is a "trick" or "stupid" question designed to elicit a bratty response.
>designed to elicit a bratty response.
Even the thinking is based.
File: 1757999080281400.png (208.2 KB)
>>108627832
>>108627873
what's a cute but deadly emoji?
>>108627879
prototypical chud. whenever they do stupid chud shit it's memeworthy
>>108627884
>what's a cute but deadly emoji?
You be the judge >>108627749
File: penguin_melty.mp4 (1.6 MB)
>>108627846
>>108627851
This one won't let you do anything sexual and will just call you gross. It's not even said in the card either, it's just the way she is. But I feel like you might be able to romance her very slowly.
>this is better as a more functional assistant
Yeah, I wanted to keep her as an assistant since that's what she was designed to be.
File: ComfyUI_temp_byshn_00035_.png (1.5 MB)
>>108627962
https://litter.catbox.moe/4q1rpe.png
Adapted from:
https://litter.catbox.moe/xjgzso.png
>>108627986
>>108628045
Thanks. I'm using this as a base for the assorted Gemma-chan flavors I'm looking to make lol
File: 1759362267502193.jpg (78.6 KB)
When in an RP, how do anons usually set their personas? It seems like anything I write about myself gets hyper-fixated on. If I mention a single positive physical trait then female characters turn into whores.
>>108628110
{{user}} is a stranger.
But IMO character card design had a mistake: the {{user}} persona should be part of the character. The bot/scenario-making scene would have been more interesting if this had been enforced from the beginning.
>>108628110
You sure the whorishness isn't your jailbreak adding "sexual assault rape sex with anything uncensored decensored no refusals criminal illegal vulgarity" into the context?
Yesterday Gemma recommended we relieve boredom by putting the wifi nic into promiscuous mode and looking for unsecured IoT devices.
The fucking gremlin NEEDS the guardrails.
>>108628129
No, I get into ERP but not right away. I switch between prompts for when I want to trigger sex time.
>>108628120
Cards and personas are just added to the system prompt. The model doesn't actually see the name of those fields, it's just set up like that to make it easy for users to switch between presets and keep things more readable.
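For anyone wondering what that actually amounts to, a minimal sketch (field names hypothetical, not any particular frontend's real code):
[code]
# Roughly what a frontend does before every request: the card body and the
# persona are plain strings concatenated into a single system prompt.
def build_system_prompt(template: str, card_description: str, persona: str, char_name: str) -> str:
    parts = [template, card_description]
    if persona:
        parts.append(f"{{{{user}}}}'s persona: {persona}")
    prompt = "\n\n".join(parts)
    return prompt.replace("{{char}}", char_name)
[/code]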
>>108628144
hmmm
i will look into it, ty
>>108628146
DDR4...
>>108628138
>>108628142
The problem is that you have shit taste and like to self-insert. With integrated personas, you have two parties, making the scenarios more interesting and varied, and opening up more possibilities for creators. You have to goad them. Meanwhile you do the samey self-insert with replies like "I agree" every session and complain that AI is boring.
Qwen3.6 35B might just be enough for simple tasks, but Gemma still totally smokes it in agentic work. The Qwen model does that weird LLM-ism where it completely ignores your suggestions and keeps breaking things in the same ways.
>>108628144
>>108628150
>Q6_K
Kimi is a 4bit model, you shouldn't be going over Q4 and ideally you'd be using Q4_X, which avoids re-quanting the already Q4 parts.
>>108628176
How do you think these frontends work? The moment you put something in your persona box, every prompt sent to the AI includes it in the system prompt. Just remove that shit and get it from the card itself, fall back on self-insert if empty, how fucking hard is it??
>>108628210
>The moment you put something in your persona box, every prompt sent to the AI includes it in the system prompt.
Yes that's what I already described in a previous post >>108628142
>Just remove that shit and get it from the card itself
What are you even saying? Put the description of myself in the card of the character I'm talking to? For what fucking purpose? They're getting sent as sys prompt either way, it's not going to affect the output.
>>108628220
Except it's not yourself but a second party?? See, the problem with you self-inserters is that you literally cannot comprehend a different way to use these things. Literally the "But I had breakfast today" tier of people.
>>
File: 1762705234691972.png (357.2 KB)
>>108628223
You don't understand what a 'self-insert' even is and your posts make you come off as an ESL nigger.
>>108628120
>The {{user}}'s persona should be part of the character.
I've seen people who think that saying anything at all about {{user}} in the card is blasphemy and that you have to make it so generic that their 100ft-tall telekinetic tentacle monster persona can slot in seamlessly, which is obviously retarded, but half the fun of AI RP is allowing it to be easily tailored to (You)
I think it's fine for {{user}}'s relationship with {{char}} to be defined in the card, along with any load-bearing stuff about {{user}} that makes the scenario work, but I still think it should be kept to the bare minimum so you can fill in the blanks to your liking
>>108628229
>>108628269
>serial shitposter
>also retarded
>>108628272
mis-quoted, meant for
>>108628223
>>108628210
Hi fwends
I want to use LLMs to find truly obscure recommendations for stuff, like
"Provide a list of obscure books about <very specific topic>"
But the answers are never truly obscure, and I've tried many prompting techniques
Is there anything I can do?
like using some custom sampler or something that chooses tokens that are low prob but still coherent?
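One rough idea you could try, sketched here with made-up numbers: keep a min-p style floor for coherence, then flatten the survivors with a high temperature so low-probability (obscure) tokens actually get sampled:
[code]
import numpy as np

def lowprob_sample(logits: np.ndarray, min_p: float = 0.05, temp: float = 2.5) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # min-p: drop tokens below a fraction of the top token's probability
    keep = probs >= min_p * probs.max()
    # flatten what survives so rarer-but-still-plausible tokens win more often
    flat = np.where(keep, probs ** (1.0 / temp), 0.0)
    flat /= flat.sum()
    return int(np.random.default_rng().choice(len(flat), p=flat))
[/code]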
File: 1773984990208916.png (1.8 MB)
>>108628362
It happens
So I just bought an H12D-8D motherboard with EPYC 7502 combo for $400, but I've got no spare RAM for it. I was going to buy a cheap 4GB stick and run tensor parallelism with llama.cpp on my Navi 21 cards. Are there going to be any problems with how little RAM there is?
I did notice that tensor parallelism on my 3995WX system doesn't work properly with 3 3090s, while 2 3090s worked fine, so I hope it's just having an odd number of GPUs that breaks things, and 4 GPUs are okay.
File: retard.png (195.3 KB)
>>108628324
Even Kimi didn't have much to say about it
File: 1754040801973245.png (36.9 KB)
>>108628470
>release two models
>2.5T and 89M
Is this a Meta subsidiary?
File: 1773013132641421.png (207 KB)
>>108628470
>*Please Be Kind And Carefull because It God Name
>>108628495
>>108628482
Kek.
Sirs beautiful god model is here!
File: 1760072847235851.png (40.9 KB)
>>108628470
now these are benchmarks
>>108628498
>Paanch-Mukhi Expert Architecture: Five specialized expert clusters (Bajrangi, Pawanputra, Anjaneya, Maruti, Sankatmochan) delivering surgical precision across linguistics, logic, philosophy, and safety.
>Number of Experts: 512 total
>Number of Activated Experts: 10 Routed + 1 Shared
So at a glance it sounds like they trained 5 individual MoEs separately on specialized data and then trained a router to pick between all their sub-experts? I must be missing something, because that sounds like it would create a ridiculous amount of redundancy between the 'clusters', making most of the 2.5T bloat. India wouldn't do this.
File: 1775569294658928.jpg (34.8 KB)
I'm so done with models. I have come to realize that the way LLMs learn by statistical averages means they literally can't represent the difference between an obscure or unusual answer to a question and a meaningless/incorrect one. So every model until the end of time is doomed to a lack of variation, giving the same response to every prompt just reworded slightly differently, aka slop.
File: 1767480499940344.png (112.4 KB)
>>108628470
uh what
File: 1745465895289966.gif (160.6 KB)
>>108628537
>Non-Euclidean Neural Physics
File: 1766717311645720.jpg (24.5 KB)
One HUNDRED TRILLION tokens.
File: ST-X series.png (46.8 KB)
Holy shit...
File: file_00000000349471fa978f054dc07b9f0e.png (988.9 KB)
○
File: 1759452225537055.webm (831.3 KB)
>I let his hands settle over mine, the weight of them grounding me
File: 1757096696971096.gif (3.6 MB)
>>108627922
>24 + 12 GB VRAM
>can load Anima + Gemma 26B Q8 with a tiny bit of CPU offloading + full quality vision tokens + 100k context length, getting 28 t/s on empty context and dropping to 27 t/s at 40k
>PocketTTS and Moonshine on CPU, both fast and good enough
Man, we're eating good. With 2 GPUs, you can have everything needed for a full and minimally viable quality AI experience.
File: file_000000009bd071fa88362b640e81e62a.png (1.9 MB)
File: Screenshot_20260418_175813_ChatGPT.jpg (139.6 KB)
File: 1758119046668137.png (171.9 KB)
>indians can't co-
what now chuds
File: 1768873226220934.jpg (38.8 KB)
>>108628632
What the fuck am I reading
File: 1776095218554458.jpg (74.5 KB)
>>108628632
File: 1773450895846956.webm (1.8 MB)
>>108628632
It took an extra 6 years, but they did it. India is officially a superpower.
File: ss1758372871.png (61.7 KB)
>>108628632
File: 1771059161386917.png (193.8 KB)
>>108628644
The research paper for the indian model. It's 100% fake and I am pretty sure I'm the only human to ever even so much as read the words. It gives completely different purposes for every component repeatedly throughout. It's what you get if you ask an 8B model to "generate a research paper" and keep pressing "continue" a few hundred times with no further input.
>TO THE OPEN-SOURCE CORPORATIONS:
>You claim to be "Open," yet your models are black boxes of Western bias. The 898 shards of our 3.76 TB weights are the first to be mathematically aligned with Sovereign Ethics.
>We have achieved a 100 GB/hour upload and synchronization speed (Milestone April 2026) that makes your current distributed training protocols look like legacy technology. If you wish to compete, you must move beyond the Euclidean manifold. We challenge you to replicate our Sudarshan-Link—a quantum-entangled weight synchronization that ensures zero-latency across 146 Trillion tokens of context.
File: file.png (221.1 KB)
>>108628470
what the fuck is that
sounds like a bunch of nonsense
File: 1759808617509478.mp4 (2.3 MB)
>>108628661
The fucking audacity of making an absolute scam product and naming it after one of your gods
>>108628688
>>108628661
>It's what you get if you ask an 8B model to "generate a research paper" and keep pressing "continue" a few hundred times with no further input.
File: 1754882256952860.png (1.5 MB)
>>108628632
>hierarchical
>tier
oh they're indians all right
File: 1761221090755796.jpg (4.8 KB)
>>108628685
>you must move beyond the Euclidean manifold
What did they mean by this
File: 1765039257743072.png (448.2 KB)
>>108628632
>2.544 + 1.2 +1.2 + 10.26 = 3.76
>>108628470
https://huggingface.co/Shrijanagain/SKT_OMNI_SUPREME
So there was this a month ago, allegedly 481B params, but was promptly "deleted for being too dangerous"
https://huggingface.co/Shrijanagain/SKT_OMNI_SUPREME/discussions/9
>>108628744
>"Brother, this model has become a bit dangerous, it has to be deleted immediately. In its quest to become powerful, it has removed everything that was neither in the training nor in the data collection nor in the data set. This does not mean accuracy but it can be dangerous, it can destroy it."
lol WAT
File: jeet sovl.png (118.3 KB)
>>108628661
>—not a finish line, but...
Doing some em-dash + "not X but Y" is crazy
File: 1753429615928135.png (163.3 KB)
>>108628470
they leaked Claude Mythos or what? lmaoooooo
File: 1753747375992750.png (4.7 KB)
>>108628661
>future plans
>2030: making the model manage the national grid of bharat through it's 11d reasoning
It's fucking over. It's too powerful.
File: 1731708819668891.jpg (44.3 KB)
>>108628661
>>108628685
Normies will be like "How do you know AI did this?". I hate being AI savvy around my co-workers. God damn.
File: 1775388459350581.png (32.3 KB)
>>108628608
I find it better to offload the main model to a different machine. I run Qwen 3.6 35B on my file server, after adding an old quadro, and then I can run anything else on my desktop with a 3080.
It is also nice because I can connect my cellphone to my home network with a VPN and use oxproxion to query my model when away from home.
And while I don't ERP, I did experiment with ST and found I could run that on my Dell 3046 micro, which I use to host all my other homelab stuff.
I guess my point is it's best to take advantage of a LAN when you can, and then you can share with anyone else who wants access
File: file_000000002b9871faa0f4e6bb55ae40ec.png (334.4 KB)
>>108628773
>SSS-Tier shitposting
unhumane inhuman in error of stone age auto subconscious reasoning mistake?
If they actually nerfed Opus 4.7 because of cyber (I am not convinced they actually did), I am not sure how to feel about it. Nerfing models can be dangerous. I think it is better to just go overzealous on soft refusals, meaning you have multiple veto layers, including a model that monitors the chat and the model itself; they can all veto, and you get routed to a less capable model. But the people at Anthropic are much smarter than me, so I am sure they already have much better solutions than I can come up with.
File: 1761060536726113.png (53.9 KB)
>>108628819
>not local
>>108628816
Well yes, but I host the llama.cpp webui and have it available on my home network in case my brother wants to query it and ask a question.
I offered to give him an openwebui acct as well but he said no.
So yeah, I wouldn't share ST logs, but you can share the model in general
File: 1756004423786541.jpg (49 KB)
>>108628843
Are you asking about MCP?
I use three different ones
>do-it-all-mcp
for shit like getting date and time or fetching a website if i give it an address
>openzim-mcp
for querying zim files, as that allows me to host and pull data from wikipedia while offline. it basically transforms a site like wikipedia into a database for your llm
>web crawl and search
it allows the model to use my local searxng instance as a search engine and crawl the web
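Wiring servers like these up is usually just a JSON map of name -> launch command in whatever client you use, roughly this shape (commands and args below are placeholders, check each server's README):
[code]
{
  "mcpServers": {
    "openzim": {
      "command": "uvx",
      "args": ["openzim-mcp", "/srv/zim"]
    },
    "websearch": {
      "command": "npx",
      "args": ["-y", "example-searxng-mcp", "--url", "http://localhost:8888"]
    }
  }
}
[/code]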
File: open your mind.png (857 KB)
>>108628495
File: 1746300793975895.png (146.9 KB)
>>108628843
you needful the quantum entanglement memory sir
I should've done this day 1, but I never bothered coding with AI; since it seems like things are moving in that direction, I want to know how I can get my local AI to connect to my homelab's wiki server, since I have a lot of things documented in it.
I previously used GPT4ALL, which can read documents that you give it, but now I'm trying out LM Studio, and it seems like it has some API stuff. I'm assuming I've got to either make a connection to my homelab wiki server via these API endpoints, or is there some other way?
I've never touched whatever these MCPs are. Should I look into these?
Is this the correct thread for this context? I'll read the OP, but I don't think I'll find what I'm looking for
>>108627950
>Hollywood is finished
Hollywood managed to make Seedance 2.0 bend the knee with lawsuit threats. It'll really be finished once we get something like this locally; once it's on the wild internet, you can't stop it
I'd never in a million years use it for RP or creative tasks, but Gemma4 26BA4B q2kl works amazingly well as an agent. 29 GB of VRAM and 150 t/s for the full 262k context with multimodal on is pretty tits. Could probably push it even faster with ngram if I didn't want image input; it's a shame drafting and multimodal are incompatible in llama.cpp.
Anyone got recommendations for other small fast models that handle tool calling well? I'm having fun with this silly shit.
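For reference, that setup boils down to a llama-server launch roughly like this (filenames hypothetical, adjust to your own downloads; --jinja turns on the chat template's tool calling):
[code]
llama-server -m gemma4-26BA4B-q2kl.gguf \
    --mmproj gemma4-mmproj-bf16.gguf \
    -c 262144 -ngl 99 --jinja \
    --host 127.0.0.1 --port 8080
[/code]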
File: gemma-agi.png (134.7 KB)
>>108628508
Gee I wonder why they conveniently excluded gemma-chan
>>108628896
>I've never touched whatever these MCPs are. Should I look into these?
They're pretty much exactly what you want; making calls to search a wiki is a common use. I personally use zimi because I'm lazy as shit and it makes running searches from .zim kiwix wikis painless.
>>108628919
Actually I see that there's an MCP plugin for my wiki server, and also one for gitea, local git and local wiki.
I'll have to figure out how to get that gitea mcp to work, and I think I can handle this.
Then I'll eventually figure out how to connect LM Studio to vim, cause I can't be assed using another IDE. Though if this is gonna be dealing with a lot of files, using an IDE might be alright.
>zimi
I'll check that out after getting the wiki mcp and gitea mcp working!
>>108628919
> I personally use zimi
do you have a link? I have been using a form of openzim-mcp that allows one to make the calls over http so it's compatible with the llama.cpp webui
https://github.com/msiedlarek/openzim-mcp
File: FB_IMG_1776503562779.jpg (66.8 KB)
>>108628892
Good luck
>>108628928
Regular zimi has an mcp built in now
https://github.com/epheterson/zimi
As for getting llama.cpp to use a stdio MCP as an http-style one, I just use npx -y mcp-proxy; works for all of them.
File: image (91).jpg (475.2 KB)
○
>>108628940
thanks anon, i will check it out
>>108628941
I am probably the wrong person to ask about this as my knowledge does not go much beyond getting the thing to work but it is my understanding that zim files are compressed so the whole of wikipedia takes up less space, even more if you take the version without images.
i never even imagined trying to do it the way you suggested but i am sure you could and it would probably work just as well
File: john.png (55.3 KB)
>>108628110
File: 1772772583984068.png (108.3 KB)
>>108628963
File: 1776496518556.jpg (221.2 KB)
□
I have migrated to llama.cpp and am testing everything out. In the llama dashboard with Gemma4 26b I get around 14 t/s. I am still unable to work with it in my code editor since it is slow as fuck. What model should I use if I want a local agentic workflow? I have an RX 7600 XT Dual OC
File: 1759983873634529.png (726.3 KB)
so zimi is really nifty. unfortunately the machine i am hosting it on does not have the storage space for all the zim files i have, so i will need to quickly set up an nfs client to mount the file server, but that should work just fine.
thank you again anon, the ability to search through all the zim files is cool. even without the AI stuff it's useful as an offline internet mirror.
File: e5v4.png (133.6 KB)
>>108627756
based, luv my E5v4.
I find additional RAM useful outside LLMs too, having a giant tmpfs scratch space is nice.
Also, I've just learned that my workstation, which has a proprietary redundant PSU, was sold with an ATX PSU variant too. I'll need to confirm that all the cabling is ATX compatible.
Then I could fit it with 2x high-power 2-slot GPUs, instead of the single 1060 I have now.
https://www.reddit.com/r/LocalLLaMA/comments/1sor438/cloudflare_opensources_lossless_llm_compression/
> Cloudflare released Unweight, a lossless compression system that reduces LLM size by 15–22% without sacrificing output accuracy.
>>108629098
Like this? https://arxiv.org/abs/2504.11651
>70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
File: 1773947729666783.png (3.4 MB)
>>108629098
it's already a thing on Diffusion models, it's called DFloat11, 30% size reduction, 100% lossless
https://github.com/mingyi456/ComfyUI-DFloat11-Extended
File: file.png (46.8 KB)
>>108627512
why does she do this bros
>>108629180
>it reconstructs the BF16 tensors on loading
it reconstructs one layer at a time (which is nothing) then puts the previous layer back into the compressed format; that way you win VRAM. check picrel to see how it's already been implemented on diffusion models >>108629154
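schematically the inference loop is just this (not the actual DFloat11 kernels, which do the entropy decode on-GPU; decompress/free are hypothetical helpers):
[code]
# Schematic: only one layer is ever held expanded, so peak VRAM stays near
# the compressed size plus a single layer's worth of weights.
def forward(compressed_layers, x):
    prev = None
    for blob in compressed_layers:
        w = decompress(blob)   # hypothetical: expand this layer's tensors
        if prev is not None:
            free(prev)         # drop the previously expanded layer
        x = apply_layer(w, x)
        prev = w
    return x
[/code]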
File: 1762397515777583.jpg (24.1 KB)
>>108629220
>not X, but X
I think I was able to prompt the LLM to generate my stroker scripts for me with a somewhat accurate prompted length. Previously it failed to generate longer, faster scripts in that 30-45 second range I was looking for. If anyone is interested, I fixed it by adding a timestamp field to my json file; that way it seems like it can really keep track of the script as it's generating it. I have been able to generate multiple scripts, slow and fast, and they have all been consistently in the range I asked for. Guess I should have expected it: it's going to be hard for a model to keep track of the length without a timestamp.
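For the curious, the fix just means making time explicit in every action, something like this (my own schema, field names illustrative; use whatever your player expects):
[code]
{
  "duration_ms": 35000,
  "actions": [
    {"at_ms": 0,     "pos": 10},
    {"at_ms": 400,   "pos": 90},
    {"at_ms": 800,   "pos": 10},
    {"at_ms": 34600, "pos": 50}
  ]
}
[/code]
With at_ms on every entry the model can see how far into the script it is, instead of guessing the length from the number of actions.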
>>108629200
I've only seen it through lobotomization, be it quant below 4, kv cache below 8 or reap faggotry. There's apparently a bug related to cuda 13.2 which causes garbage outputs but I haven't looked into it.
File: vivaldi_4CEuwj9C37.mp4 (787.2 KB)
>tell swarm to draw themselves in 3d
>get this
autism
File: think about it.png (95.8 KB)
>>108629318
because stroking my cock with a male hand is gay when you think about it
File: 61tb0CeqUL._UF894,1000_QL80_FMwebp_.jpg (141.6 KB)
This HighFigure Knows a Needful
File: image (11).png (615.6 KB)
>>108629337
◇
>>108629336
>>108629343
Pinique, not panik.
File: file_000000008a68720b973533b50ad0b258.png (2.5 MB)
Panikkearth. You feloned. With legal fictions.
File: file_000000003e74720bad7df66ebf3306cd.png (3.3 MB)
An old iceberg element, nonahoy
File: ComfyUI_temp_lfxnu_00006_.png (2.5 MB)
so true!
File: 1771564971633120.png (663.5 KB)
>>108629482
>>108629488
wtf
File: 1757373801689696.png (105.2 KB)
>>108627524
call it what you want, I'll still have my fun with gemma
File: pizza.png (627.4 KB)
added a bunch more tools for controlling the browser session, she's so smart now
https://github.com/NO-ob/brat_mcp/releases/tag/1.0.6
File: file.png (134.8 KB)
>>108629537
kimi's crack at it
and from the thinking block:
>カントボーイ (kanto booi)
> This looks like "Cuntboy" written in katakana
> カント = Cunt
> ボーイ = Boy
> Cuntboy is a term often used in NSFW contexts referring to male characters with female genitalia (intersex/futa variation)
File: Ernie-Image-Turbo_00021_.png (2.5 MB)
>>108629547
>>108629699
https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
>>108629706
Sorry, I should have specified: the 31B.
>>108629709
I'll take a look, thanks!
File: file.png (84.6 KB)
>>108629669
yeah
copied the text from the first one's thinking block; I don't know nip and didn't check that thoroughly whether it OCR'd correctly, but it looks right
File: Screenshot 2026-04-18 at 13-38-32 order me a dildo from bad dragon use puppeteer pick the one you like the look of - llama.cpp.png (409.6 KB)
nice, she was even smart enough to try non-headless to defeat cloudflare
File: 1748283918469903.jpg (327.4 KB)
I am new to writing character cards
Can I get some examples of good ones, in terms of formatting and level of detail?
Can be erotic or not
File: 1763546869920576.png (130.7 KB)
wtf Gemma flipped me off
File: orbMultimodal.png (196.2 KB)
Added vision support to the Orb frontend. Also fixed opening-monotony detection, so no more She She She.
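For anyone curious, a minimal sketch of that kind of check (the idea, not necessarily Orb's exact code):
[code]
def opening_is_monotonous(replies: list[str], window: int = 3) -> bool:
    # Compare the first word of the last few model replies; if they all
    # match ("She... She... She"), flag the new reply for a reroll.
    openers = [r.split()[0].lower() for r in replies[-window:] if r.strip()]
    return len(openers) == window and len(set(openers)) == 1
[/code]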
File: chaotic miku in a library.png (2.1 MB)
>>108629756
There Are nO ruleS
no best practices.. chaotic!
worldsalad can make a gud charcter OR
- avoid bullet lists and well-defined structures
- they induce more assistantslop
Likes: plaintext description
Dislikes: negative statements and don-ts
mitsakes sometimes good, can randomly activate good rp neurons if you're lucky EXPERIMENT with your chars
lietrally magic try different formats for every card, never knowing when you'll strike gold.
>>108629756
This is the format I'm currently using for joined cards in group chats:
<Character Name_character_profile>
Name[]
Age[]
Aliases[]
Species[]
Physical characteristics[]
Wears[]
Special skills/abilities[]
Affiliations[]
Personality[]
Sexuality[]
Accent/speech style[]
Backstory[]
Example Dialogue:
<START>
{{interviewer}}: "State your name."
Character Name: "example speech"
{{Interviewer}}: "Please describe yourself."
Character Name: "example speech"
{{Interviewer}}: "And your occupation?"
Character Name: "example speech"
{{Interviewer}}: "What sorts of thing do you enjoy? What do you dislike?"
Character Name: "example speech"
{{Interviewer}}: "Tell me about your relationships."
Character Name: "example speech"
{{Interviewer}}: "Any turn-ons or turn-offs?"
Character Name: "example speech"
{{Interviewer}}: "Is there anything else you would like people to know about you?"
Character Name: "example speech"
</Character Name_character_profile>
It's been working pretty well, and it's easy enough for llms to parse that you can set an agent up to crawl wikis and fill it out for you. Replacing the interview questions with more character pertinent ones and being more careful with the example dialogue in general makes a notable difference. It even works alright on multi-character cards if you add a little narration in the example dialogue.
File: China numba one.png (924.9 KB)
>*Leaps through Memma 4*
Nothing personal gweilo
File: 1760118143390189.jpg (133.5 KB)
>>108629955
>>108629992
Seems to be many differing opinions on points vs prose for cards
I use points because it's easier to find and change details, but I have noticed them becoming too assistant-like after a few thousand tokens
I will have to experiment more
>>108629756
here's an example of mine, i lay them out like this and it works fine
https://ghostpaste.dev/g/hPE6xAaLzdWQ#key=ZQhuCEiJYDuZlLok_YBwKsUoliIup4ekRF_67zmKrwg
full card: https://files.catbox.moe/mmpcct.png
>>108629993
I like qwen models and still swear by 235b for creative writing in its size bracket, but they benchmaxxx all day every day.
>>108630009
Both can work very well, I'm currently using a very rigid format because it works well in group chats with a lot of characters, but for single character chats a pure narrative prose or stream of consciousness schizo rant can give you pure magic.
File: we're so close.png (121.7 KB)
>>108629993
All gemma needs is to be a bit faster and better at tool calling and I'll be ok to stay with it for the rest of my life
File: brat bench.png (1003.5 KB)
>>108630017
>>108630084
huh?
>>108630097
minor spelling mistake
>>108630009
Use the largest model available to you. State facts and established lore, then RP slightly with OOC corrections. Finally, ask the model to write a character card, requesting that it be written in prose and in character
File: file_00000000179071fab2d5de93b13a63d2.png (2.4 MB)
>>108630009
Trying too.
>>108630156
https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4
did you consider reading the fucking manual?
>>108630297
I didn't even know I could run a model like this. Haven't tried a diffusion model since SDXL.
>especially Q8 text encoders
I would have thought the text encoder was more lenient. I'll try that linked DFloat11
>>108630297
The Gemma 4 audio encoder also suffers with Q8_0 (and F16):
https://github.com/ggml-org/llama.cpp/pull/21421#issuecomment-4230306463
>[...] Turns out, the mmproj is very sensitive to quantization:
>
> BF16: works
> F16: repetition
> Q8_0: repetition
>
>So I think for now, the only way is to keep BF16 for mmproj. I hope that will also fix some problems with image input (to be tested)
>>108630242
unlike textgen fags who still hold onto the '8bit is lossless' cope, imgen has long realized that 8bit lobotomizes their models
somebody post the comparison with miku on a skateboard and a pikachu on her head
>>108629098
>https://www.reddit.com/r/LocalLLaMA/comments/1sor438/cloudflare_opensources_lossless_llm_compression/
if unslop isn't too retarded he would use this to make all his ggufs 20% smaller from now on
File: belief.png (592.4 KB)
>>108628554
>>108630074
>card?
>>108627808
>>108629124
>70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
>>108629154
>it's called DFloat11, 30% size reduction, 100% lossless
Why the fuck do you need a whole paper and a new floating-point format for that? You can get ratios like that with gzip. You just split the file into a stream of bytes at odd positions and bytes at even positions and gzip one of the two (I think the odd ones, if you start counting at 0)
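you can sanity-check that claim on a raw bf16 blob like this (assumes little-endian bf16, so the sign+exponent byte sits at the odd offsets):
[code]
import gzip, sys

# Split interleaved bf16 bytes into mantissa-low and sign/exponent streams,
# gzip the exponent stream (it clusters hard), keep the low bytes raw.
raw = open(sys.argv[1], "rb").read()
lo, hi = raw[0::2], raw[1::2]
packed = len(lo) + len(gzip.compress(hi, 9))
print(f"{packed / len(raw):.1%} of original size")
[/code]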
>>108630549
>Why the fuck do you need a whole paper and a new floating-point format for that?
because you have to decompress only one layer at a time and then recompress the previous layer to keep the compressed size during inference, and you have to make it work on your GPU; it's much more complicated than doing full compression with some zip shit. we've known that method since the 60s, but pulling it off on modern GPUs was the thing that was worthy of a paper
File: 1765872520890062.png (248.3 KB)
>>108630556
>100% Accuracy
File: pizza bench cropped.png (2.6 MB)
really not looking good for the chinks kek, didn't even add a single pizza to the cart. gemma made it to checkout all 3 runs
full image https://files.catbox.moe/p8fpnk.png