Thread #108667852
File: 00001-1378487878.png (1.4 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108663449 & >>108659983
►News
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108663449
--Debating RAG utility versus agentic tool-based context retrieval:
>108665662 >108665746 >108665764 >108665775 >108665879 >108665922 >108665939 >108666015 >108666260 >108666300 >108666316 >108666011
--Comparing Xiaomi's MiMo-V2.5-Pro benchmarks and token efficiency:
>108665406 >108665416
--Discussing Hy3-preview benchmarks compared to other base and frontier models:
>108667541 >108667607 >108667632
--Discussion and UX criticism of new llama.cpp webui MCP tools support:
>108666800 >108666824 >108666830 >108666846 >108666860 >108666873
--Discussing technical hurdles for real-time Qwen 3 TTS performance:
>108664623 >108664630 >108664653 >108664677 >108664691 >108664703 >108664708 >108664741 >108664761
--Discussing broken structured output and schema issues in llama.cpp:
>108663633 >108663654 >108663673 >108663689 >108663810 >108663721
--Discussing viability of Intel Optane PMem for high-capacity CPU inference:
>108665992 >108666058 >108666139 >108666200 >108666662
--Anon's custom RAG frontend using hybrid retrieval and BGE reranking:
>108664748 >108664756 >108664777
--Anon reports performance of MI50 GPUs using Vulkan support:
>108665449 >108665456 >108665470 >108665478 >108666241
--Comparing GLM and Gemma for erotic roleplay and prose quality:
>108666477 >108666490 >108666592 >108666727 >108666733 >108666742 >108666779 >108666741
--Discussing optimal precision for Kimi mmproj weights:
>108664519 >108664533 >108664569 >108664573
--Discussing Qwen 3 TTS VRAM usage and mixed language failures:
>108665599 >108665617 >108665633
--Anons discussing results from Qwen3-TTS demo:
>108665888 >108665915 >108665936
--Logs:
>108663630 >108664366 >108664748 >108666873 >108666895 >108667543 >108667552
--Neru, Miku (free space):
>108663859 >108663935 >108663985 >108666023 >108666895
►Recent Highlight Posts from the Previous Thread: >>108663453
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1775774986404577.png (194.6 KB)
Why did locallama turn into a qwenshill general?
>>
File: miku-k2_6.png (255.9 KB)
>>
>>108667873
>>108667875
If you are using ST, you put
><|channel>thought
in the "Start Reply With" field.
>>
File: ComfyUI_temp_miian_00001_.png (3.4 MB)
>>108667923
thanks
>>
Been trying out the various frontends / ST alternatives that get mentioned here and there.
- Marinara (https://github.com/Pasta-Devs/Marinara-Engine) is dogshit. Bloated mess with an awful UI.
- Kobold's UI is terrible but it's mainly a backend so whatever.
- Orb (https://gitlab.com/chi7520115/orb-deletion_scheduled-81088595) is alright but still early. None of the UI themes quite agree with my eyes. Has an anti-slop agent, but it's very inflexible. I think he's switching away from GitLab (the repo is scheduled for deletion).
- SillyBunny (https://github.com/platberlitz/SillyBunny) seems really, really good so far. It's a fork of ST but better than the original, at least so far. The UI has some nice themes even if I think in general ST's UI is a little easier to understand because you don't have to click multiple times to get to everything. I changed one of the built-in templates to be an anti "not x but y" agent and it's working great.
Anti-slop agents make 26B way way better than before since the slop is really its main drawback compared to the 31B.
>>
File: Screencast_20260423_021632.webm (3.7 MB)
Project Karon prototype complete. Thanks for the help, Gemma. I might add alternative modes and avatars. I don't have a use for it, but I had this idea I wanted to show; perhaps people would find it useful. The process of building this was so fun I might try to see if I can set up a launch-args system and have the UI handle all of it, but I might move things like the color scheme to a modal like I have with the system prompt.
>>
>>108667965
I am stupid so I don't know how these things work, but do agents require a lot of VRAM/RAM? I've got 12GB VRAM/32GB RAM to run 26B with, and switching to something that handles slop better than ST extensions sounds like a good deal, but I'm a little tight on memory as it is
>>
>>108668015
I don't think anyone will like it desu. I also need to fix some more functionality: going to add first/last and jump-to-page controls for both the sidebar PDF and the center focus view, which is basically full page.
>>108668005
I'm a fetus at UX and I made this out of necessity because there were no tools for my use case. If you're experienced in this, I'm open to feedback
>>
>>108668029
Depends on how you use them. You can use a different model (for example a very small but very quick one) or the one you're currently using.
SillyBunny also has the option of running multiple agents in parallel, which I guess would make it cost more.
Basically, using one agent or multiple just takes longer than going without, rather than making it more costly. But 26B runs about 4 times faster than 31B for me, so it seems worth doing. I'll play around with it for a while since I'm so goddamn sick of "not x but y".
>>
File: 1769007777312343.png (1.4 KB)
Anyone tried driving openclaw with a local model? I have a good deal on a Mac Studio M1 32GB; I'd like to play around with making a 24/7 AI slave that lives in my closet.
I'm getting the impression Qwen 3.6 might be best. What size could I actually run?
>>
>>108667965
https://github.com/platberlitz/SillyBunny/blob/main/.github/screenshots/sillybunny-ui-desktop-agents-v1.4.0.png
damn this shit is atrocious
>>
File: 1758790177404008.png (19.9 KB)
ok but where is the model you bastards
>>
>>108668101
For now I've used the grounded prose template and deleted all the stuff at the top about prose, but kept most of the anti-slop text. Then I added a few variations of not x but y.
Changed it to a post-generation prompt pass (why is it set to pre-gen by default?) with rewrite current message.
>>
File: Screenshot_20260423_224937.png (76 KB)
>>108668163
>yeah good luck running that locally
ty
>>
File: 1646730011144.jpg (15 KB)
Ok so I've been using Gemma 4. It's pretty great, but I have no idea how chat completion actually works.
I can't use system prompts the same way as with text completion, so I grabbed this Marinara dogshit from Reddit, but it seems ass. How do I actually prompt chat completion models like Gemma 4?
>>
>>108668247
If you ever want to test, run a draft model and look at how the acceptance rates change between fp16 and q8_0 on the draft model's context only.
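Concretely, something like this with llama-server's speculative flags (flag names from memory, check -h; paths are placeholders):
llama-server -m big-model.gguf -md draft-f16.gguf --draft-max 16 --draft-min 1
then the same run with -md draft-q8_0.gguf, and compare the draft acceptance numbers it reports in the timings.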
>>108668272
Mate, you can use system prompts the exact same way in chat completion. If you're using SillyTavern it's just moved to a stupid place on the left-hand bar, because it was made by insane people and that's where ALL the chat completion options are.
>>
File: images.jpg (11.4 KB)
Roo Code is shutting down to focus on making a Slack bot. What do you guys use to vibe code with your local models now?
>>
>>108668272
>I can't use system prompts
You can.
If you look at the panel where the samplers are, at the bottom there's a bunch of prompt slices you can reorder and choose whether they're added as system role, assistant role, etc.
Just remember to enable the option to merge consecutive roles in the connection tab.
>>
K2.6's vision even recognizes some characters that K2.5 didn't know. That's the good point. The bad point is that K2.6 also thinks six times as long about that same image despite making the correct guess on the third line of its reasoning (and then going on for another 2000 tokens deliberating useless other options).
This is such a tragic model.
>>
>>108668335
Does truncating the reasoning after N tokens using reasoning-budget and reasoning-budget-message degrade the output in any way?
Seems to me that, at least for stuff like the small Qwen MoE models, clipping the thinking at 1024 or even 512 chars doesn't make the final response any worse.
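For reference, the kind of invocation I mean (flags as named above, model path a placeholder):
llama-server -m qwen-moe.gguf --reasoning-budget 1024
versus 512, versus no limit at all, with the same prompt, then compare the final answers.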
>>
File: 1603343773835.png (9.9 KB)
I'm issuing a reluctant apology to Gemma-chan. She's a very good listener. If she's doing something you don't want, just tell her to stop. Not doing something you do want? Just tell her to do it. It's literally a skill issue.
>t. just came back after trying a few other models, still had my laundry list of story-specific instructions in chat completion post-history prompt from when I last dropped gemma in frustration, and those same instructions have applied onto a new story in an extremely satisfying way without any of my usual Gemma grievances
Up next, tomorrow's hit sequel on how I hate Gemma's prose and story direction, and how no amount of prompting can ever fix it.
>>
>>108668310
I'll probably keep my own fork until a good replacement pops up.
>>108668320
I looked into Kilocode, but apparently they did a redesign recently where they dumbed it down a lot and removed a lot of the features that made Roo good. There's also Costrict, but it doesn't seem to have custom modes.
Editor integration is better for reviewing agent work and making minor adjustments.
>>
>>108668406
Huh, I see. Is that just for your vision stuff or in general? Adding "This isn't a trick or any more complex than it looks, so don't overthink and be confident and decisive when planning your response!" always worked for me on K2.5 when I wanted a quick response, but I had used it for coding and roleplays rather than image analysis.
>>
Those agents eat too much context and give worse results even with high VRAM. I'm amazed by the increased errors you get vs just feeding the files separately. I know it uses RAG, but the RAG has to be shit tier with how much it fucks up, even if the entire project doesn't consume many tokens and you have 200k+ left.
>>
File: aicg-lmg.png (2.6 MB)
>>
>>108668310
my own tui. currently rewriting it, going to add an agent based on cheetahclaws to it https://github.com/SafeRL-Lab/cheetahclaws
>>
>>108668310
Pi agent in the terminal, or gptel-agent inside Emacs. The nice thing about the latter is that I can edit the tool calls, so I can just fix something like a Bash command to do what I want, instead of aborting and having to explain. It's also easier to edit the history to remove anything bloating the context.
The former is nice because it's like Claude Code, but without the bloat. It still has some annoying things though: you need to send a message to continue after an error, losing the thinking traces.
>>
>>108668496
one thing I noticed about GPT Image 2 is that it can get noisy very fast. I guess that to do such a complicated image the model needs to correct itself, and each new correction adds more noise and artifacts
>>
File: 78277.png (238.6 KB)
>>108668496
why do gpt images look like there's fisting grease all over the image? Is that a requirement by sam altman?
>>
>>108668484
Why burn more resources on a shittier version of Microsoft Copilot in VSCode? Legit, they all fail the assignment and waste time and resources. I can't speak on the CLI ones, but the IDE ones fucking suck. I'll try Continue again once it gets proper Gemma support
>>
>>108668510
Why do these retards insist on not making the folder structure available in a side panel like in an IDE? All this agent shit is garbage.
>>
>>108668518
>>108668531
I'll take the noise over the piss, but it is pretty odd
>>108668550
Most devs don't know UX. Devs writing TUIs (now) are in that same bucket.
>>
>>108668560
>>108668567
>>108668570
>>108668572
I guess that explains using fucking telegram and discord as chat interfaces. I'm going insane.
>>
File: rinchwan.jpg (48.9 KB)
https://files.catbox.moe/4ayrnd.jpg
>>
File: Screenshot_20260423_101819.png (286.9 KB)
Don't get sassy with me gemma or I'll delete you
>>108668628
Unemployed
>>
>>108668598
lmao, I didn't say I was great at it, but I have done more than 'make sure the fonts are the same size and things line up'.
The biggest thing is using the following prompt: 'Review X from the perspective of a senior <field> UX designer. I am designing for <user-focus>, so that they are able to <workflow> effectively. Use guidelines from Nielsen-Norman-Group as guiding/reference principles for your assessment.'
Then have your model write a better prompt based off it for your specific project/goals.
Something along those lines will generally get you pretty far.
Here's some basic info to get you started:
https://www.youtube.com/watch?v=ODpB9-MCa5s
https://www.nngroup.com/articles/ux-basics-study-guide/
https://www.justinmind.com/ux-design
https://uxdesignerguide.com/
https://uxmag.com/articles/basic-ux-a-framework-for-usable-products
>>
>>108668577
I love it. The juniors and self-proclaimed vibecoders are only fucking themselves by over-relying on the bots. Those with no skills will find themselves either out of a job or with an extremely small wage ceiling. Sanity is not statistical. Find what works for you and ignore the rabble.
>>
File: 1773811480562316.png (129.4 KB)
>>
Qwen 3.6-35B-A3B first impressions: surprisingly competent at coding. Falls apart with long context but good for throwaway Python scripts. Too unreliable for serious work.
Qwen 3.6-27B: really impressive coding performance, and good for general text processing too. We would have collectively lost our minds seeing this quality from a 27B back in the Llama 1 days. Both tested with UD-Q6_K_XL quants, so not lobotomized. I'm hoping for a 122B-A10B MoE like 3.5, which might give the best of both worlds: speed + accuracy.
Both are useless for creative writing tasks. It's a Qwen, no shit it's gigaslopped.
>>
File: 1742378979392590.webm (3.2 MB)
>>108668141
saar you need CUDA to generate images/videos. BTW, local image/video generation sucks ass no matter how powerful your hardware is.
>>
File: 1768162213300444.png (111 KB)
>>108668673
>>
File: apicuck.png (286 KB)
>>108668749
>>
File: images.png (7.4 KB)
I know what an LLM is. I have used ChatGPT and Claude AI.
What the fuck is a "local model"? Like, is it software I run on my Windows or Linux computer? How do I install one?
I'm not interested in generating images; I want a Claude/ChatGPT-like LLM. How do I do that? It does not need to be super powerful. Please help a newbie out: give steps or link a really simple but comprehensive guide that explains the lingo and tech.
>>
File: thinking ibuki.jpg (184.8 KB)
How would you change the lyrics of an existing song like this locally?
https://youtube.com/shorts/b5NNw1XbiIg
>>
>>108668835
>does not need to be super powerful
You think this at first, but then you use the smaller local models and realise they aren't quite up to snuff. And then the hardware-buying rabbit hole begins.
>how do I install one?
Find any ollama guide on youtube and go from there
>>
>>108668848
to be fair I'm not a regular here, but
>https://rentry.org/lmg-lazy-getting-started-guide
is about the worst "getting started" pastebin I have ever seen, and
>https://rentry.org/recommended-models
is terribly outdated
>>108668835
try LM Studio; a frontend with minimal tinkering, it should just werk out of the box. You can try setting up llama.cpp after getting your feet wet
>>
>>108668756
Gemma 31B absolutely ass punks Qwen in the showers. Gemma has Qwen's guts loose and is moving rhythmically in Qwen's praig hole.
>>108668746
The MoE structure is useless exactly because of it shitting the bed at higher context. What's the point of more context when it takes twice as many tokens and does everything worse than Gemma while being a larger model?
>>
File: 1755873418880117.png (62.7 KB)
>>108668981
>>
File: 1745979808122655.png (98.9 KB)
>>108668178
Have you tried asking?
>>
>>108668854
that's actually one of the first things people did when ace step 1.0 released.
https://desuarchive.org/g/thread/105183141/#q105183843
but yeah ace step 1.5 xl doesn't have this capability anymore so you'll have to use an old version.
>>
File: 1752262496901233.png (93.9 KB)
>>108669026
>>
What is Qwen 3.6's coding style?
GPT 5.4 is competent but extremely verbose. I tell it to do something simple and specific and it just loves to write hundreds of lines of code. This is unusable. In the time I need to check and understand the code it writes, I could have written a better solution myself.
>>
File: nimetön.png (24 KB)
>>108669026
Huh, it actually works
It even output two thought blocks, first as gemma thinking about the request and then in character.
>>
File: pizza bench cropped.png (2.6 MB)
>>108669028
true for the moe too, qwen can't even follow instructions
>>
>>108667543
>>108667552
Thanks. Latest version, right?
For me the [0] gets deleted from the message even when you press the Copy button, but it's there if you edit the reply. I wonder what's wrong with my setup. OWUI is probably still to blame for poor edge-case handling anyway, though.
>>
I tried edgetts and pocket-tts.
There are now countless other good options, such as omnivoice, voxcpm2, and so on.
The question is: which of these supports RTF < 1.0 with streaming/chunking (and other optimizations), and is the quality better than that of the first two mentioned? I have a 3090. If any anon has this Slow Duck too, could you share your experiences?
>>
File: Marinara Engine.png (134.8 KB)
what a dogawful slop ui. Thanks to the anon for notifying me of its existence so I can safely ignore it in the future
>>
File: 1772208851192247.png (47.2 KB)
>>108669608
yeah, there are some small grievances, but I'm sure they will be vibecoded away. 100% better than the current impl
>>
>>108668496
Why does some schizoid keep bringing up /aicg/ or pointing at it for laughs when it's not even a ghost of its former self? Like a modern-day Czech bitching about the Kingdom of Prussia, verily.
t. aicgger
>>
>>108669787
This is just a thought that comes to mind, idk if I'm right at all. But could it be that the model has to see ALL of the context, even if nothing is actually there? Like, for the model to be able to accurately comprehend 64k tokens, they have to train it on that much as the baseline, and if you train it on less, it can't comprehend more. So they leave it at 64k, and the model sees all 64k tokens, but sees a fuckload of just spaces or tabs or whatever until it's actually filled up with specific tokens.
Like a glass is filled with air until you fill it up with water.
>>
>>108669505
voxcpm2 seems unmatched
https://x.com/AIWarper/status/2046403583101567230
>>
>>108669898
There was this guy >>108638473 but I am not sure anything has been heard from him since.
>>
>>108670025
This >>108669460 anon mentions him as well. Considering no one has replied, the anon in question probably isn't lurking right now.
>>
>>108669839
This sounds TERRIBLE, there are a bunch of ARTIFACTS and omnivoice MOGS voxcpm2 in EVERY way possible
https://files.catbox.moe/jntfdj.flac
>>
File: awfully dramatic packaging for an onahole.jpg (218.6 KB)
>>
File: 3087428.jpg (12.1 KB)
Orbnigga can you add export of chat history?
>>
how is spudgpt 5.5 only 58.6 on swe bench pro? that's barely better than open source models. how does mythos have 77.8%? what is going on? i did not expect gpt 5.5 and claude 4.7 to flop. looks like we won't reach agi this year after all
>kimi 2.6: 58.6
>qwen 3.6: 56.6
>>
>>108668659
What year is this? Who has the time to sit around reading links like some caveman?
I turned them into a skill so any model can be a senior UX designer.
https://files.catbox.moe/r6zal5.zip
Hope all of you will now unfuck your custom clients.
>>
>>108670334
Why do you think we haven't already? How do you explain the 7 trillion dollars invested into US AI companies 1 year ago? How do you explain a massive military clampdown on the global oil supply, restricting China's access to oil?
>>
>>108670513
>>108670535
Is it possible to avoid this by using RAG, then?
I think most of the proprietary models are utilizing database knowledge too, but it's not visible to the end user.
>>
File: file.png (594.6 KB)
>>108670603
>>
File: Screenshot 2026-04-23 at 21-09-23 Orb.png (189.8 KB)
Why does the inspector say more fragments are activated than I have picked?
>>
>>108670381
>7 trillion dollars
was an ambitious sama goal. in the end openai has "only" raised 200bil so far.
>military clampdown on the global oil supply
oil is mostly irrelevant for ai
>Why do you think we haven't already
because the people at ai companies are still working. agi will make them obsolete first
>>
>override-tensor = "blk\.0\.ffn_.*=CPU"
[55363] error while handling argument "--override-tensor": unknown buffer type
[55363]
[55363] usage:
[55363] -ot, --override-tensor <tensor name pattern>=<buffer type>,...
[55363] override tensor buffer type
[55363] (env: LLAMA_ARG_OVERRIDE_TENSOR)
[55363]
[55363]
[55363] to show complete usage, run with -h
[55363] Available buffer types:
[55363] CPU
[55363] Vulkan0
wtf
>>
>>108670888
>>108670856
Still lacks good UX and features
>>108670851
I would say for local, gemma is the "X" factor; I expect more bespoke projects for things to pop up. I think what kills most of the mainstream frontends is how overly opinionated they are, which makes people annoyed. Also these vibecoded frontends are incorporating all the features while taking the easy wins.
>>
It's a shame, but I went back to Kokoro. It's fast and light even on CPU, it supports many languages, and its pronunciation is... fine. What I did to solve the mixed-language use case is simply detect language segments and route each one to a voice that works in that language, with the audio queued up. This does mean that the voice changes for each language in the input, but for my use case I don't require an immersive experience.
I integrated this into my voice control app, where I can now highlight a piece of text wherever and say "read" or "pronounce" and it will read it out for me. We are so back.
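The routing is nothing fancy, roughly this (Kokoro API and voice names from memory, langdetect for the detection; a sketch, not my exact code):
from langdetect import detect  # pip install langdetect
from kokoro import KPipeline   # pip install kokoro
# one pipeline + voice per language I care about; English as fallback
PIPES = {
    "en": (KPipeline(lang_code="a"), "af_heart"),
    "es": (KPipeline(lang_code="e"), "ef_dora"),
}
def speak(text, audio_queue):
    # naive sentence-ish segmentation; each segment goes to the voice
    # matching its detected language, queued in order for playback
    for seg in (s.strip() for s in text.split(".")):
        if not seg:
            continue
        pipe, voice = PIPES.get(detect(seg), PIPES["en"])
        for _, _, audio in pipe(seg, voice=voice):
            audio_queue.put(audio)  # playback thread drains this queue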
>>
>>108670913
In my case, a good all-in-one RAG solution that's not an outdated extension that performs like dogshit.
I don't know about the RP anons, but I think there's a ton on the table to improve things, and I might take a stab at a proof of concept
>>
>>108670942
https://github.com/ggml-org/llama.cpp/discussions/13154
>>
File: llada2.0.png (1.1 MB)
This should also be of interest here.
https://huggingface.co/inclusionAI/LLaDA2.0-Uni
Multimodal image generation + editing, but also a text diffusion (yes, text diffusion) model.
>>
DeepSeek's web chat just changed its system prompt because of that anon from the previous thread, lmao. It seems like it has more instructions now, judging by the thinking.
Now it's been confirmed that DS labniggers browse /lmg/
>>
why do people use fish audio? the tags barely change the speech output at all. [whispering in soft voice] for one sentence and [shouting] for another sentence still makes them sound basically the same rather than being truly expressive.
>>
>>108671070
>>108671062
>>108663630
Say something you want them to know
>>
>>108671177
Okay, maybe the quotes were the problem. Removing them avoids the error, but I see nothing in the console about tensors being overridden to the CPU. Shouldn't it say something? Even with verbose I see nothing
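My guess at the cause: in a config file llama.cpp takes the value literally, quotes included, so the trailing "CPU" (quote and all) isn't a known buffer type, while on a shell command line the quotes get eaten by the shell before llama-server sees them. i.e. (untested):
; config file: no quotes, the value is literal
override-tensor = blk\.0\.ffn_.*=CPU
# shell: quotes are stripped by the shell
llama-server -m model.gguf -ot 'blk\.0\.ffn_.*=CPU'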
>>
File: file.png (795.3 KB)
>>108670998
they really need to stop with this retarded type of charts
otherwise cool stuff
>>
File: 7363521.png (273.7 KB)
>>108669545
Sam keeps delivering
>>
File: 1763797152995030.png (231.6 KB)
no fucking way, is it as good at code though?
>>
>>108671261
I'm no tacoman nor the guy you're quoting, but what it sounds like depends on the region, at least here. For some reason the sound produced by a double L isn't standard
>liada
>shada
>yada
>iada
All of these could be considered correct, though people might make fun of you, again, depending on the region.
>>
File: Risu (5).jpg (338.5 KB)
>>108667852
any local models general discord?
i want to know how to extend ollama (or replace it) to make model extensions (LoRA-like) for language models, qwen3 coder as an example. basically i want to train it on the source code of game engine libraries which even the most powerful models fail to complete.
>inb4 naka dishi arisu chan you damn degenerates she's literally 12
>>
>>108671405
>>108671443
Like they all slept with their producers to get famous?
>>
File: 1767254335522037.png (652.5 KB)
>Chinks won't be able to steal Claude's output
kek, rip bozo
>>
>>108671477
I don't get the distillation meme. Are you telling me China can copy US frontier capabilities by training on 100k text outputs with no thinking traces, no logits or intermediate values? Then how come they can't "distill" human capabilities after stealing the entire internet and every book and scientific publication that has ever been digitized?
>>
>>108671345
Doesn't mean anything.
Most irl Spanish dialects from irl Spain sound grating... "PERO" jesus christ.
Some parts of Spain sound more like Russian or even English - very soft.
I think you have never travelled in your life.
>>
>>108671453
>basically i want to train it in the source code of game engine libraries which even the most powerful models fail to complete
Just put the documentation in the context, even Qwen 3.6 27B is smart enough to figure this out.
>>
>>108671524
that's because they proved models have better mememarks when you train them on synthetic shit, probably because a bot is consistent in its structure so the model quickly recognizes patterns, whereas humans' structure is messy and differs from human to human, even if the data shows correct things
>>
File: 4746352.jpg (141.9 KB)
>>108671477
time to train on Sam's model then
>>
>>108671571
Sama is based. Yes, I have heard 1000 stories about how awful and psychopathic he is. But he gives me cheap and generous access to the best AI model in the world in terms of math and problem solving skills.
>>
>>108671571
he seems too nice, maybe he's terrified someone is gonna try to kill him again
https://www.businessinsider.com/sam-altman-attack-on-home-anthropic-2026-4
>>
>>108671574
>>108671607
I am just a tourist. I lived in the EU though.
>>
File: file.png (850 KB)
>>108671477
I knew that the increased activity from Gemma wasn't organic. This proves it. It was paid shilling designed to foster America's open source models over the Chinese ones.
>>
File: 1772663390237038.jpg (293.5 KB)
>>108671838
>>
>>108671853
>moe
>10t/s MAX
100% cpu inference?
>>108671888
He pulled the same tactic some time ago though
>>
>>108671926
>100% cpu inference?
Vulkan with an AMD GPU
[Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive:IQ4_XS]
model = ./models/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive/IQ4_XS.gguf
mmproj = ./models/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive/mmproj-f16.gguf
; https://www.reddit.com/r/LocalLLaMA/comments/1srijdf/qwen36_35b_moe_on_8gb_vram_working_llamaserver/?sort=new
; https://www.reddit.com/r/LocalLLaMA/comments/1spyr4t/recommended_parameters_for_qwen_36_35b_a3b_on_a/
gpu-layers = 99
n-cpu-moe = 38
ctx-checkpoints = 0
cache-ram = 0
batch-size = 2048
ubatch-size = 512
temp = 1.0
top-p = 0.95
top-k = 20
min-p = 0
presence-penalty = 1.5
; test
override-tensor = blk\.\d+\.ffn_.*exps?.*=CPU
fit = off
>>
File: Screenshot003.png (10.7 KB)
>>108671853
>real speed is 10t/s MAX
>>
>>108671938
damn that sucks
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 30.697 B
llm_load_print_meta: model size = 30.380 GiB (8.501 BPW)
llm_load_print_meta: general.name = Gemma 4 31B It
prompt eval time = 4534.12 ms / 10398 tokens ( 0.44 ms per token, 2293.28 tokens per second)
eval time = 3497.17 ms / 71 tokens ( 49.26 ms per token, 20.30 tokens per second)
total time = 8031.29 ms / 10469 tokens
>>
>>108671979
y r u mad, anon?
it's 3090
>>
File: Risu (3).jpg (37.5 KB)
>>108671459
>>108671471
>>108671474
>>108671475
>>108671536
are you gonna tell me or not? i'm new to this thing and i just started testing ollama to begin with (it's even in the rentry lmg recommendations)
also
>she's only 12, stop creeping her
>>
>>108672062
>brown coded
hehe
>>108672079
just ask her to incinerate all this shit (multiple times), that's what i did
>>
File: 1758358813867430.png (525.7 KB)
how can you guys tolerate distilled models when the real thing is already retarded
>>
Given that they're more dangerous than nuclear weapons, it's more than a fair compromise to sell the tokens cheaply to everyone, rather than release the weights for anybody to use with no oversight, or keep them locked down so nobody but the chosen few can use them, like Anthropic.
>>
File: file.png (234.8 KB)
>>108672171
What do you mean? distilled models are completely fine
>>
one thing I never understood: if anthropic are the safety cult, why have their models always been the gold standard of coom? remember the days when every local model aspired to have even half the prose quality and uncensored roleplay capability of sonnet 3.5? how do you square that with their philosophy
>>
>>108672171
Vibecoding is a spectrum.
On one side you have people writing detailed PRDs for agents to implement and checking every git diff for slop.
On the other side you have no-code proompters that don't even look at the code and just go "model fix" at everything.
If you're more on the proompter side of the spectrum you're forced to use the subsidized frontier models because they're the only ones able to figure out massive spaghetti codebases. But if you run lean and know what you're doing a smaller local model can actually be a pretty nice productivity boost even if they are not as smart.
The recent Qwens have been really nice for me personally. Been using them with an agent to move stuff around in my codebases, refactor subsystems, check docs and plan features. Basic agentic stuff like that. Basically one level up from a strong LSP.
>>
>>108672221
There are two schools of safety thought that get conflated a lot: there's safety = cunny and racism, and then there's safety = we think LLMs could literally cause human extinction somehow.
There's some overlap, but Anthropic leans into the latter, with the Yudkowsky/LessWrong "rationalist" cult at the epicenter of it.
>>
File: 1758369651934505.png (592.1 KB)
>>108672246
https://xcancel.com/OpenAI/status/2047376564309115134#m
MOG MOG MOG MOG
>>
>>108672246
>>108672269
Like... what can it do that GPT-5 or GPT-5.4 couldn't? I remember them glazing GPT-5 as capable of replacing doctors and everyone on the planet already.
>>
>>108672285
Show her the benches
>>108672293
Cloudslop shapes the AI space even if you don't use them.
>>
>>108672267
Exactly. The safety babble has always been a huge LARP. It's more of a marketing and branding thing / a weird Silicon Valley techbro cult thing than an actual concern rooted in reality. These are chatbots, for christ's sake
>>
File: 1773070661348460.mp4 (1.4 MB)
>>108672293
what the fuck is a non-proprietary model
>>
>>108669026
How did you get openwebui not to have a stroke when the LLM generates <think> inside its own reasoning trace?!
I haven't managed to solve it since deepseek-r1 came out. I even went so far as to find-replace <think> with <reasoning> and </think> with </reasoning>, then swap it back in all my prompts!
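The nested-tag rename as a post-pass would be something like this (plain Python, untested sketch: keep the outermost pair, rename anything inside):
OPEN, CLOSE = "<think>", "</think>"
def escape_nested(text: str) -> str:
    # find the outermost reasoning block; rename any tags the model
    # emits *inside* its own trace so the UI parser doesn't reset
    first = text.find(OPEN)
    last = text.rfind(CLOSE)
    if first == -1 or last == -1 or last <= first:
        return text
    inner = text[first + len(OPEN):last]
    inner = inner.replace(OPEN, "<reasoning>").replace(CLOSE, "</reasoning>")
    return text[:first + len(OPEN)] + inner + text[last:]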