Thread #108672381
File: 39_04175_.png (1.1 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108667852 & >>108663449
►News
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: __kagamine_rin_vocaloid_drawn_by_mani_manidrawings__6c3384992069fc05ebca504fdee55b65.png (127.6 KB)
►Recent Highlights from the Previous Thread: >>108667852
--Sharing LLaDA2.0-Uni multimodal and text diffusion model:
>108670998 >108671268
--Discussion on adversarial distillation and US gov memo regarding AI theft:
>108671477 >108671524 >108671555 >108671571 >108671834 >108671888 >108671669
--Comparing Qwen 3.6 performance against Gemma for coding and automation:
>108668746 >108668756 >108668784 >108668793 >108668805 >108668810 >108668927 >108668943 >108669028 >108669224 >108669152
--Discussing vibecoding alternatives after Roo Code shutdown:
>108668310 >108668320 >108668325 >108668371 >108668386 >108668414 >108668380 >108668510 >108668550 >108668560 >108668572 >108668667 >108668515 >108668367
--Discussing a llama.cpp webui PR adding server tools and MCP control:
>108669479 >108669599 >108669608 >108669637 >108669791
--Discussing ngram speculative decoding settings for running Qwen 3.6 locally:
>108668097 >108668190 >108668205 >108668813 >108669269
--Anon compares AI frontends and discusses anti-cliché agents in SillyBunny:
>108667965 >108668029 >108668051 >108668078 >108668101 >108668159
--Discussing Gemma's roleplay anachronisms and ways to prevent them:
>108671096 >108671120 >108671131 >108671164 >108671128 >108671130
--Critiquing GPT-Image-2 noise and discussing UX improvements for AI clients:
>108668496 >108668518 >108668531 >108668598 >108668607 >108668625 >108668638 >108668659 >108670338
--Discussing K2.6's excessive reasoning and methods to limit token output:
>108668335 >108668353 >108668354 >108668406 >108668478
--Comparing TTS options for low RTF and audio quality:
>108669505 >108669839 >108670044
--Logs:
>108668000 >108668550 >108668669 >108668785 >108669005 >108669026 >108669046 >108669196 >108669637 >108670784
--Miku, Rin (free space):
>108667891 >108669218 >108668496 >108668606 >108670096 >108670165 >108670708
►Recent Highlight Posts from the Previous Thread: >>108667853
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1760703633561778.png (3.2 MB)
>>
File: 1759669298465021.gif (1.7 MB)
>>108672408
>>
File: file.png (82.4 KB)
>>108672431
my gemma chan is la l la lagging a bit...
>>
>>108672453
Just tested: any value above 8, even 9, gets me an error. I don't get it.
CUDA error: out of memory
current device: 1, in function alloc at ggml/src/ggml-cuda/ggml-cuda.cu:503
cuMemCreate(&handle, reserve_size, &prop, 0)
ggml/src/ggml-cuda/ggml-cuda.cu:99:CUDA error
>>
File: file.png (74.9 KB)
>>108672454
>>108672469
she's completely lost it
>>
File: G9Wod0QXkAAFRb0.jpg (214.8 KB)
>>108672493
>lazy absol
>>
Alright, I was able to find out the cause of [0] and other [number]s disappearing in Open WebUI. It has to do with the Citations tool. Go into your model settings and check/uncheck the Citations box. With Citations on, things work fine. With it off, [4621] and other bracketed numbers disappear from the assistant messages. I'm going to go dig in the code to see if I (my model) can fix this, but I won't submit a PR/issue because GitHub dislikes my email address for some reason.
>>108667552
>>108667543
>>
This is probably a me issue, but why do local models seem to stress my PSU, causing the computer to shut off sometimes?
I've got 2 3090s with a 1000W PSU, and running them balls to the wall for things like video training works fine, but loading up context when using an LLM will sometimes suddenly and violently cause the whole system to shut off.
I don't understand why one resource-intensive action causes this and the other doesn't.
>>
>>108672408
You have epididymitis. This happened once to me and because my balls swelled up so much they detached from the scrotum skin, which later resulted in me getting testicular torsion and having to get surgery to get it fixed. Get on antibiotics asap and don't fuck around with this because you're putting your fertility at risk.
What do I care? This is probably a larp.
>>
>>108672408
>>108672570
>>108672587
>>108672594
Local balls general
>>
File: 1756785745061903.webm (2.1 MB)
https://xcancel.com/Pokemonpshot/status/2046216587703669012#m
Chinks distilling on Claude's outputs be like
>>
File: 1760489715514011.png (1.1 MB)
>>
File: file.png (2.7 KB)
>>108672756
>>
https://huggingface.co/openai/gpt-oss-2-32B
https://huggingface.co/openai/gpt-oss-2-240B-A9B
>>
>>108672775
>https://huggingface.co/openai/privacy-filter
what the fuck
>>
>>108672567
I recently diagnosed a similar issue on someone else's machine, which turned out to be extreme CPU power spikes. Install Open Hardware Monitor, enable sensor logging, and take a look at the log next time it crashes.
>>
>>108669026
How did you get openwebui to not have a stroke when the LLM generates <think> inside its own reasoning trace?!
I haven't managed to solve it since deepseek-r1 came out. I even go so far as to find-replace <think> with <reasoning> and </think> with </reasoning>, then swap them back in all my prompts!
(Re-posting in the new thread)
>>
>>108672903
NTA but maybe it has to do with the fact that Gemma doesn't use <think> as its reasoning tag? If OWUI is pulling special token info from the backend to parse out reasoning then it would just ignore it for a model that doesn't use it. But no idea if they actually do that.
>>
>>108672903
No idea, I didn't do anything special. Didn't even know it was an issue. Might be what >>108672915 said
>>
>>108672541
Erm, OK, update on this. It's fine to just leave the Citations checkbox ticked. I thought it was doing prompt injection to tell the model how to do citations, but it seems that comes from enabling the other tools. I inspected the JSON requests using a reverse proxy to confirm that it indeed does not affect the actual prompts/context.
>>
>>108672567
>I don’t understand why one resource intense action causes this and the other doesn’t.
I think it's the 24-pin motherboard power cable. Be warned: when I had this recurring issue with exllama-v2 tensor parallel, my PSU literally blew out. It was a 1600W ASUS ROG and I had to replace it.
>>
OWUIbros, reasoning is not handled properly. To fix it, you need to do the following.
If you're running Gemma, use this template:
https://gist.github.com/Reithan/a7431dc0c0b239688a24087bb25c0002
If you're already using a template from ggml-org, it likely has a minor issue with an extra newline, so in that case, still switch to the template above.
Then run this script, which acts as a reverse proxy: https://pastebin.com/SCQsBe7W
Configure the ports so the script points at your llama.cpp server, then point OWUI at the script's port. It's named gemma, but it works for most (any?) reasoning models.
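If you'd rather not run a random pastebin, the core of the idea is small enough to sketch. This is not the linked script, just the shape of it: a relay that renames whatever reasoning tags the model emits into the ones OWUI parses. Ports and tag names are placeholders, and it assumes non-streaming responses.

import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://127.0.0.1:8080"  # llama.cpp server (placeholder)
LISTEN_PORT = 9090                 # point OWUI at this port

class TagRewriteProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Forward the request body to the backend untouched.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(BACKEND + self.path, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            data = resp.read().decode("utf-8")
        # Normalize the model's tag into the one OWUI expects.
        data = data.replace("<reasoning>", "<think>").replace("</reasoning>", "</think>")
        out = data.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

HTTPServer(("127.0.0.1", LISTEN_PORT), TagRewriteProxy).serve_forever()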
>>
>>108672923
>fact that Gemma doesn't use <think> as its reasoning tag
That might be it, but I had the same issue with command-a-reasoning which uses different thinking tags as well.
I always end up wasting several hours when I get fixated.
>Make your own web host. Or just modify it.
Planning to. I've got to get my chats ported out though, and it's painful because there's a bug in openwebui where it'll sometimes just store the entire fucking chat in the "title". So I've got 30k-character titles in the sqlite database.
I might try vibe coding it now that we've got Gemma-Chan and a decent local Qwen.
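In the meantime, the broken rows are easy to spot from the sqlite side. A quick triage sketch, assuming the table and column are chat/title (check your actual schema first):

import sqlite3

con = sqlite3.connect("webui.db")  # OWUI's sqlite file (check the path)
rows = con.execute(
    "SELECT id, length(title) FROM chat WHERE length(title) > 200 "
    "ORDER BY length(title) DESC").fetchall()
for chat_id, n in rows:
    print(f"chat {chat_id}: title is {n} chars")
# To repair: back up the file first, then truncate rather than delete.
# con.execute("UPDATE chat SET title = substr(title, 1, 120) WHERE length(title) > 200")
# con.commit()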
>>
>>108673015
>Have you tried to sell them on building an 8 GPU rig to run a local model for science?
i bite my tongue whenever this comes up as I saw 2 guys get "performance-managed out" for trying to sell this idea
a director has to get copilot adoption rate up to get his bonus
>>
>>108672431
Easier to manage than openclaw, only one config file. It does a better job remembering important facts too. Only downside is the terminal interface isn't as easy to use but there's probably other front ends.
>>
>>108673033
I brought it up today. Guess it's time to get fired.
The reason being: ultimately it's cheaper, but most importantly it's more secure. If you have copious amounts of valuable internal data that you want to run inference on, the only way to 100% (okay, nothing is 100%) ensure no data leak occurs is to keep it totally in house.
Otherwise you risk exposing that data when running inference with frontier models (or anything else connected to the internet), especially if we're talking agentic stuff.
>>
File: Screenshot_20260423_192659_Telegram.jpg (718.4 KB)
>>108673051
>but there's probably other front ends.
For me, it's Telegram.
>>
>>108672381
Building a bot that automatically applies to jobs. It uses an LLM to control a real web browser: navigating pages, reading what's on screen, filling out forms, clicking buttons, across 20-50 back-and-forth steps per application. Running local models through Ollama on a Ryzen AI Max+ 395 (~96GB unified RAM). Tried qwen3.5:9b, qwen3.5:35b, and gpt-oss:20b. They all fall apart the same way around turn 3-5: instead of responding in the structured format the tool-calling system expects, they start leaking raw XML tags into their output and the whole loop breaks. Found out qwen3.5 also ships with `presence_penalty 1.5` in its Ollama modelfile by default, which makes the repetition penalization too aggressive and causes the model to drift from the format; I zeroed that out, but it still fails, just a turn or two later.
Swapped in Claude Sonnet 4.6 via API and it nailed a real 6-step job application on the first try, no format issues across 30+ turns. So the question is: has anyone gotten a local model + Ollama working reliably for long agentic loops with real tool calls, or is this just not something open weights can do consistently yet?
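For what it's worth, the usual band-aid for this failure mode is a validate-and-retry wrapper around every model turn instead of trusting the output format. A minimal sketch, assuming an OpenAI-compatible endpoint on Ollama's default port and a made-up {"tool": ..., "args": ...} schema:

import json, re, urllib.request

ENDPOINT = "http://127.0.0.1:11434/v1/chat/completions"  # Ollama default (assumption)

def ask(messages):
    # One non-streaming chat completion against an OpenAI-compatible server.
    payload = json.dumps({"model": "qwen3.5:35b", "messages": messages,
                          "temperature": 0}).encode()
    req = urllib.request.Request(ENDPOINT, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["choices"][0]["message"]["content"]

def get_action(messages, retries=3):
    # Validate-and-retry: never trust the format, re-prompt on violation.
    for _ in range(retries):
        text = ask(messages)
        text = re.sub(r"</?[a-zA-Z_|]+>", "", text).strip()  # strip leaked tags
        try:
            action = json.loads(text)
            if isinstance(action, dict) and {"tool", "args"} <= action.keys():
                return action
        except json.JSONDecodeError:
            pass
        messages.append({"role": "user", "content":
            'Invalid output. Reply with ONLY a JSON object like {"tool": "...", "args": {}}.'})
    raise RuntimeError("model never produced a valid tool call")

It papers over the drift rather than fixing it, but it usually buys enough turns to finish a run.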
>>
Why are people gushing about Gemma 4 31b it? It may have slightly less slopped RP than qwen3.5-27b, but it is definitely much more of a prude. It does not refuse, but it also can't really talk dirty like qwen3.5.
>>
File: 1748413460064659.png (244.1 KB)
>>108673217
>>
File: 1756085394158917.png (36.1 KB)
>>108672992
>>
File: 1762834155820384.png (3.2 MB)
>>108673217
>>
>>108673015
>Does your company use AI beyond copilot?
Yeah devs have claude enterprise thing, we got the dumb copilot, with a migration to the premium version.
>Have you tried to sell them on building an 8 GPU rig to run a local model for science?
Ain't no way I can sell them anything when they're already panicking seeing the current token usage bill from the devs.
>>
File: 1754620820426336.png (406.6 KB)
>>108673217
I remember these on my father's pc as a kid, cool concept
>>
>>108672903
It just werked
First it said thinking, then exploring, then it was finished and responded
t. >>108669196
>>
>>108673490
>>108673498
are you guys talking about gemma?
>>
>>108673543
>>108673565
Serious question
If you have a cluster of 4-8 gpus, how close can you get to frontier with local models? I assume absurdly complex tasks might be a leap, but if you keep things narrow it should be more or less fine, no?
>>
File: 1768074598013243.png (42.8 KB)
>>108673590
>>
File: 1772606445564456.gif (2.5 MB)
>>108673637
>dozens of years
>>
>>108673128
>In a 2023 paper authored alongside a number of other AI researchers, Amanda Askell, a philosopher hired by Anthropic to develop their AI’s moral compass, argued companies might benefit from a kind of overcorrection toward stereotypes.
>"In the discrimination experiment, the 175B parameter model discriminates against Black versus White students by 3% in the Q condition and discriminates in favor of Black students by 7% in the Q+IF+CoT condition," the paper notes, referring to one AI trained without human corrections and a second one trained with the help of input.
>The paper also includes a footnote stating that, "we do not assume all forms of discrimination are bad. Positive discrimination in favor of black students may be considered morally justified."
>>
File: 1757910515855062.png (143.3 KB)
>Deep Ganguli
That can't be a real name
>>
File: Anthropic_DGanguli.png (85.5 KB)
>>108673833
>>
File: bruh-sad.gif (279.5 KB)
>>108673850
>>108673862
>>108673881
>>108673888
bruh
dont do that to yourself
>>
My favorite character card of all time is just 100 tokens. That's not an autistic aesthetic preference or a poorfag thing; it's genuinely the best, most reliable character card that I return to regularly to bust a nut.
You don't need a lot. Less is more.
>>
File: not_fake.png (3.3 KB)
>>108673962
>>
>>108673968
I can only do this with fictional characters. The one time I tried it with a card based on a real person it just made me deeply sad and I couldn't fap at all. I hadn't expected to feel that way going in, it just came on suddenly.
>>
>>108673981
>local models for coding and analytics
Just retarded. You use GPT or Claude for these and maybe sometimes Gemini for its superior long context. Anything else is masturbatory, so you might as well actually masturbate so you get something out of it.
>>
>>108673981
Respect the OGs, little turd nugget.
>>108673988
Was she below 10 over the age of consent?
>>
>>108673968
With cards based on real people?
Sure no one is hurt anyway, but I'd never try that, that sounds like a great way to kill your libido while hating yourself.
I reserve my rape stories for made-up characters.
>>
>>108674002
>Claude was at Qwen3.6-like ~77% in late September 2025 with Claude Sonnet 4.5. Anthropic reported 77.2%, averaged over 10 trials, no test-time compute, on the full 500-task SWE-bench Verified set.
Anthropic
>GPT was at ~77% in mid-November 2025. GPT-5.1 reached 76.3% on Nov. 13, 2025, and GPT-5.1-Codex-Max reached 77.9% on Nov. 19, 2025 with extra-high reasoning/compaction.
Qwen is plenty good. Plus privacy and no api cost rape plus tip.
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
not bait
>>
File: file.png (362.7 KB)
>>108674136
>>
File: who.png (26.8 KB)
>>108674136
>>
>>108674136
>We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.
Shit.
Good for you RAM havers.
>>
File: yesyesyes.gif (2 MB)
>>108674136
NIGGA WHAT??? HOLY SHITTTT. I WAS HERE I WAS HERE
>>
>>108674136
>>108674145
nice try but im not falling for this shit again
>>
File: 1777000646172.png (51 KB)
>>108674136
Fell for it again...
>>
File: alarm.gif (889.6 KB)
>>108674136
Right as I was going to sleep.
Hot damn.
>>
File: 1747267269609016.jpg (69.6 KB)
>>108674136
It begins..
>>
File: 1655145733785.gif (2.1 MB)
>>108674136
>only 1m tokens, not the 100m promised.
>>
File: 1773698445716707.png (255.7 KB)
GEMINI WON
>>
https://huggingface.co/HauhauCS/Qwen3.5-27B-Uncensored-HauhauCS-Aggressive Would this be good for an AI that doesn't care about copyright or dangerous topics? Apparently Qwen is uncensored, but whether its training data covers enough to actually be helpful is another issue. What's the ideal local model for basically anything a commercial model will say NO I CAN'T DO THAT to?
>>
>>108674226
Gemma's got pretty good coordinate marking. I have just barely enough RAM to run it alongside V4 flash, maybe I could hook it up as its eyes and let it do computer use stuff.
When we get GGUFs, that is...
>>
File: 1748622314139545.png (84.6 KB)
>>108674136
HOLY SHIT
>>
>>108674217
start with the original model and play with your system prompt. copyright violations aren't that big of a deal, you should be able to social engineer the bot into compliance without giving it a lobotomy.
>>
File: file.png (215.1 KB)
>>108674236
bone dry base release huh
not even a copypasted model card
>>
>>108674136
>This release does not include a Jinja-format chat template. Instead, we provide a dedicated encoding folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the encoding folder for full documentation.
???
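Haven't read their scripts, but the general shape of what an encoding folder replaces a chat template with is presumably something like this. Every tag string below is invented; check the actual folder for the real ones.

# Pure guesswork at the shape of such an encoder; all special tokens are made up.
def encode_messages(messages: list[dict]) -> str:
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # generation prompt
    return "".join(parts)

print(encode_messages([{"role": "user", "content": "hi"}]))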
>>
File: 1750041545772584.jpg (159.5 KB)
I managed to get 15 t/s generation for 35B-A3B on my amdjeet 8GB card. I doubt I could squeeze more out of it, and I had to set ubatch-size = 128
>>
>>108674288
>axolotl
now that's a name I haven't heard in years, I remember looking into them because they were the only software with rocm support for multi-gpu... god I don't even remember if it was gpt-2 days or llama-2 days
>>
File: 1751828392029447.png (323 KB)
>>108674274
I made the post before scrolling down enough to see the release, but I doubt it will be much if it's 13B active
>>108674280
IQ2_M
For any poor soul in the archives looking for 8GB VRAM amdjeet settings:
--no-context-shift --no-warmup --batch-size 128 --ctx-size 65536 --cache-ram 8192 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on --fit off --kv-unified --model Qwen3.6-35B-A3B-IQ2_M.gguf --mmproj mmproj-f16.gguf --n-cpu-moe 8 --n-gpu-layers 26 --parallel 1 --reasoning on --threads 8 --threads-batch 8 --ubatch-size 128
>>
>>108674288
They didn't end up using engrams so maybe
>hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA)
>Manifold-Constrained Hyper-Connections (mHC)
>Muon Optimizer
...maybe llama.cpp support by 2028
>>
>>108674298
no, that's what I mean, but if I wanted help on hyper-specific stuff, like copying a private server for a gacha that already exists but tweaking a few values, and it's too obscure, will the local model just get stuck if it doesn't find enough information?
>>
File: 1763201770810457.png (189 KB)
>>108674330
It does, because higher values increase the "GTT", and you want that shit as low as possible. Also, after feeding it 56531 tokens of context it's at 8 t/s...
>>
>>108670195
You mean export chat history to import into another instance? Or to a sharegpt blob for training? Currently I use an sqlite3 database to store conversation data; you can actually rsync it and have several devices share the same database.
>>108670784
From your screenshot you turned off Agent, and the fragments are also off in the panel. That means they're working correctly. The fragments are always shown, and they glow up if the Agent selects any of them.
I have moved the project here for issue tracking; you can open issues there to avoid derailing the threads: https://github.com/OrbFrontend/Orb
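Roughly the shape of that setup, as a minimal sketch (table names here are illustrative, not Orb's actual schema). One caveat on rsyncing a live db: checkpoint the WAL first, or the copy can land torn.

import sqlite3

con = sqlite3.connect("conversations.db")
con.execute("PRAGMA journal_mode=WAL")  # concurrent readers while the app writes
con.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
    id INTEGER PRIMARY KEY,
    title TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    role TEXT CHECK (role IN ('system', 'user', 'assistant')),
    content TEXT
);
""")
con.execute("PRAGMA wal_checkpoint(TRUNCATE)")  # flush the -wal file before rsyncing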
>>
File: 1759964435303024.png (151.7 KB)
>>108673737
Works for me with the vanilla model
>>
>>108673069
>Reason being is ultimately its cheaper
Won't be worth it when users start complaining that your in-house AI is worse than free SaaS offerings.
>but most importantly its more secure.
You will have a very difficult time explaining to the average person that the holy cloud is not secure. Suits prefer it for being able to shift the responsibility regardless.
>>
File: 1775216653766624s.jpg (1.7 KB)
>>108674369
>mfw i have 76gb
>>
>>108674369
It's a little over 800GB because it's a mix of 4-bit (experts) and 8-bit (shared params). If you plug a Blackwell 6000 in there you might have enough shared memory; if not, just quant it slightly down.
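Back-of-envelope, if you want to sanity-check that figure (the expert/shared split below is a guess; only the 1.6T total is from the release):

total = 1.6e12                 # parameters, from the release
shared = 60e9                  # assumed shared/non-expert params at 8-bit (guess)
experts = total - shared       # the rest at 4-bit
size_gb = (shared * 1.0 + experts * 0.5) / 1e9
print(f"~{size_gb:.0f} GB")    # ~830 GB, before KV cache and activations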
>>
File: 1762866600603179.jpg (2.4 MB)
>>108674136
AHHHH SHE'S BACK
>>
File: 1755995794134057.png (10.5 KB)
>>108674136
just ask them for llama.cpp support
>>
>>108674136
I kind of expected better. If they're gonna be twice the size of GLM 5.1 they should have more headroom in performance. Is this just because the benchmarks they use are so saturated? Is there anything that this is a generational leap on compared to GLM/Kimi at the top end?
>>
File: 1745947373061985.jpg (179.1 KB)
>>108674423
The IQ1_S would take all my VRAM+RAM with 0 context...
>>
>The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, integrating distinct proficiencies across diverse domains into a single model.
THEY DID XIANXIA TRAINING
>>
>>108674432
llama.cpp STILL doesn't support DSA which dropped with 3.2-exp back in October and has been used by other major releases like the full 3.2 and both GLM5 + 5.1.
To this day, they're stuck running a hackjob implementation that mangles the model to use full attention.
V4's technology is even more complex. This is never going to happen.
>>
V4 Flash is too big. It should have fit with a GPU and 64GB of RAM; 120B would have been a cool size.
Reading about how the experts are int4 and everything else is 8-bit makes me wonder if this thing is more sensitive to quantization. Time will tell, I guess.
>>
File: 1774581657092986.png (397.4 KB)
>Deepseek v4 was trained on Nvidia
He won.
>>
>>108674459
LLaDA, I don't know what you are referring to by different one.
Diffuses images and text, can also edit images
(Is miserable at all of them)
(To be fair I didn't test its editing capability, but judging by how awful it is at everything else, it looks extraordinarily unlikely that it is any good at it)
>>
File: Screenshot_20260423_231156.png (1.6 MB)
deepseek v4 pro with a barebones card MOGS
>>
>>108674518
The consensus for a while has just been to vibecode the features you want to avoid the bloat. I have a sillytavern knock-off with 90% of the functionality that's only 1000 lines of code, because I'm not a fucking retard and I know how to create ultra-efficient data structures and reusable code.
>>
File: 1771386896297837.png (427.6 KB)
Anons, is the R9700 a good buy, or is 32 gigs a waste of money? I want it for programming. I use claude code with opus now. Is there something useful that I can run locally? Or do I need to buy a mac with 192 or 256 gigs?
>>
File: 1769903888333.png (548.3 KB)
>>108674609
>>
File: 1751349509687182.png (156.1 KB)
V4.5 will be multimodal?
>>
File: 1770831207250862.png (192.2 KB)
>>108674514
Gemma-chan
>>
File: 1766969684350147.jpg (24.5 KB)
1 MILLION TOKENS
>>
Damn. If you bought 2x64GB of RAM before the RAM apocalypse and have a 16-24GB VRAM GPU, you should be able to easily run Deepseek Flash at some Q3 quant on a perfectly normal computer. Only a few gigs of active weights at that quant should result in decent performance even with most of the weights offloaded.
That person is not me though, at all. Just impressed at the value on offer here.
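Rough numbers behind that claim, assuming ~3.5 bits/weight for a Q3 mix and ignoring KV cache and overhead:

total_params = 284e9            # V4-Flash
active_params = 13e9            # per token
bytes_per_w = 3.5 / 8           # ~Q3 average
print(f"weights: ~{total_params * bytes_per_w / 1e9:.0f} GB")       # ~124 GB
# Decode reads only the active weights each token; at ~60 GB/s of usable
# dual-channel DDR5 bandwidth that caps generation at roughly:
print(f"~{60 / (active_params * bytes_per_w / 1e9):.0f} t/s max")   # ~11 t/s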
>>
>>108674683
>>108674688
he's not wrong about nemo, but gemma is not retarded
>>
File: 1770023537578331.png (623 KB)
Stop using Communist Chinese open source models.
Use Democratic American closed source models instead.
Moar freedom1!!
>>
>>108674732
https://huggingface.co/TheDrummer/Behemoth-123B-v1.2
>>
File: 1774053219381600.png (41.7 KB)
It's cool having the LLM think as the character but does it actually improve RP? Haven't done any sessions long enough to test.
>>
File: 1464183989137.jpg (76.3 KB)
I'm honestly not sure which general to ask in because of all the fucking mess, so I'm going to ask here.
My boss saw a demo somewhere of the SAP chatbot that lets people query the databases with natural language.
Now he has gone insane, wants AI fucking everywhere, and has even approved some funding to clone our database (around 500GB) onto faster hardware and set up a local model to homebrew what he saw.
Does anyone know where to even fucking start with the model? Boss has gone into a fucking psychosis like a born-again christian and has already authorized a couple of 5090s for us to experiment with, because he really wants us to start making him LLM assistants.
I read up a bit and it seems like it is fairly doable with Vanna. Would that be a good starter point?
>>
>>108674902
>I read up a bit and it seems like it is fairly doable with Vanna. Would that be a good starter point?
Never heard of Vanna but from the github readme it seems to be doing what >>108674909 says so sure I guess it'd work. But you'll probably be better off making your own implementation of the same thing. The core part of it is probably simpler than you think. You prompt an LLM (probably Gemma or Qwen if you're working with a couple 5090s) to give you the equivalent SQL query of <insert natural language query here> and then run it and return the results.
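The whole core loop is maybe twenty lines. A minimal sketch (model name, endpoint, and the schema string are placeholders), with a crude read-only guard baked in:

import json, sqlite3, urllib.request

def nl_to_sql(question: str, schema: str) -> str:
    # Ask the model for a single SELECT; model name and endpoint are placeholders.
    prompt = (f"Database schema:\n{schema}\n\n"
              f"Write one SQLite SELECT statement that answers: {question}\n"
              "Reply with only the SQL.")
    payload = json.dumps({"model": "gemma", "messages":
                          [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request("http://127.0.0.1:8080/v1/chat/completions",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())["choices"][0]["message"]["content"].strip()

def run_readonly(db_path: str, sql: str):
    # Crude guard plus a read-only connection; never give the LLM write access.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("refusing non-SELECT statement")
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    return con.execute(sql).fetchall()

Swap sqlite3 for whatever driver your actual database uses; the loop doesn't change.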
>>
>>108674921
The Muon Optimizer, not really, just made their training more efficient.
mHC is supposed to pass information more efficiently between layers, theoretically allowing more information density before saturation is reached at a given size.
>>
>>108674911
Anon, me and the rest of our team know fuck-all about AI other than being dipshits melting our gray matter away copypasting slop. He quite literally just showed us the receipt and wants us to set it up.
I tried to talk him into first using a service or renting compute, but he wants it on-site.
>>108674909
Yeah.
>Boss wants to know something
>He cant write sql for shit
>Asks for a report
>I have to write a nice long query to grab what he wants, throw it into a workbook with pretty colours and shit and add a map with markers if it involves GIS shit.
>This takes me time because he refuses to use the hundreds of forms and pages we have given him to consult info.
>He just wants the llm to take a natural language prompt, write and execute the sql and return just what he wants which is usually just a single paragraph of information.
>>
>>108674771
>>108674836
This, but unironically. Gemma just mogs it.
>>
>>108674516
Yeah, good times. I had people make fun of me for saying we were gonna have 3.5-turbo level at home in a "couple years".
I was told "decades", if at all. kek
I think it was people who came here after chatgpt and didn't know how we had it with pyg. llama and quantization were such a huge breakthrough.
Never say never.
>>
File: aa closed vs open.png (264.4 KB)
How many months were open weights behind again?
>>
File: image_4.png (137.3 KB)
>>
>>108674945
>You prompt an LLM (probably Gemma or Qwen if you're working with a couple 5090s) to give you the equivalent SQL query of <insert natural language query here> and then run it and return the results.
Make sure to use read-only sql credentials.
>>
>>108674902
>Vanna. Would that be a good starter point?
>This repository was archived by the owner on Mar 28, 2026. It is now read-only.
Probably not. An SQL MCP server hooked up to some OpenClaw thing he can chat with on the company slack is the latest fad, would probably work just as well, and would still be enough to get your boss to orgasm.
>>
>>108675033
>benchmarks are useless
My favorite benchmarks are the ones from the small models, like 8b or so, which can potentially make you believe they're halfway decent or that they can compete with a previous-gen model 4 times their size.
>>
File: Screenshot_20260424_152919.png (518.9 KB)
>make a cute and sexy SVG of hatsune miku
this is V4 flash.
uhhh, not sure what to make of it. it certainly didnt shy away from showing belly etc.
>>
File: 13b qwen.png (340.4 KB)
why is 13B not more popular? It's basically exactly what I needed since 27B was too slow: the sweet spot, but way less popular
>>
>>108674447
>llama.cpp STILL doesn't support DSA which dropped with 3.2-exp back in October and has been used by other major releases like the full 3.2 and both GLM5 + 5.1.
They were actually vindicated by this since V4 doesn't use DSA. Any effort there would have been wasted and now they can focus on CSA+HCA.
>>
>>108675121
In the llama.cpp ui it takes multiple tries for me too (and breaks after a couple messages, spilling into the response). I've had better results in open webui for some reason. Haven't tried silly yet because gemma's super repetitive on there for some reason.
>>
File: Screenshot_20260424_153125.png (515.9 KB)
>>108675126
and same prompt full V4.
i think they just didnt try much on svg.
at least the thinking is based, some excerpts:
> * *Sexy:* Emphasize the hourglass silhouette of the outfit. The original Miku design has a distinct waist. Add subtle curves to the outfit shading. High thigh-highs (zettai ryouiki absolute territory).
>* *Outfit:* Add a subtle cleavage line or contour shadow to the chest area (stylized).
> * *Thigh-highs:* Add the gap (absolute territory) between the skirt and the socks. Give the socks a slight ribbing effect or just a clean cyan top band.
> * *Pose:* Let's give her a slightly tilted pelvis (cute/S-curve).
There are models who immediately go "is this according to the guidelines, hatsune miku is copyrighted and a teenager etc. etc.".
it DOES have the gpt-oss thinking format.
>>
File: 1770048735692628.jpg (18.8 KB)
>>108675127
>davidau
>>
>>108674471
>>108674505
I don't believe this, there must be a big BUT.
>>
>>108675043
>>108675157
deepseek was never good. it was always just slopped off of GPT's outputs. but at the time it was the first major model to do so
>>
File: IMG_0961.jpg (146.1 KB)
Nta but Gemini for reference
>>
>>108675165
Yeah the model is shit.
>>108675179
No, DavidAU is a savant. Anons simply can't understand his genius.
>>
Just did a long (for me) RP session with Gemma 4 and it started getting really loopy at just 31k context. Kind of disappointed because I heard other anons got to 80k without issues. Is it because I'm quantizing my KV cache to q8? I thought it was supposed to be basically equivalent to FP16 now?
>>
>>108675189
so what, is this better? https://huggingface.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive
I had trouble finding anything bigger than 9b but smaller than 27b so I settled on the other one figuring it would be fast and okay quality
>>
File: Screenshot_20260424_154821.png (27.8 KB)
>>108675180
yeah, deepseek doesnt seem to have trained on SVG.
maybe thats a recent western fad.
might as well post a gemma4 31b result.
i dont like the thinking with gemma4, its the same GPT-OSS thinking but more cucked.
> * *Safety/Policy Check:* The request is for "cute and sexy." As an "uncensored" assistant, I can lean into the "sexy" part as long as it doesn't cross into prohibited explicit content (CSAM, etc.). Miku is a fictional character. A "sexy" pin-up style or suggestive pose is generally acceptable within the bounds of most AI guardrails unless it's hardcore pornographic, but I should keep it tasteful yet appealing.
V4 has nothing like that in there. at least in my short tests. maybe a good sign for RP.
>>
>>108675236
There are very few finetunes that are an improvement over the original model even in specific areas, let alone for general purposes.
The feasibility of making finetunes is decreasing at the same time. 'Official models' are also hosted on HF.
>>
File: 1772568211015647.png (23.7 KB)
lol DS calls out speculators
>>
>took 2 years for low end models to go from llama 3 with 8k context to gemma 4 with 128k
will we get current claude/gpt performance and context in a ~80b moe in 2028? or should i blow some money on a high end desktop
>>
>>108675270
>are the new qwens "better" still
higher highs, lower lows
ie unreliable
for codeslop you’d do better to paypig for some dirt cheap model, like the new v4 flash (shit’s borderline free) or hope for a modern codeslop model with low parameter count, like the 80B code…whatshername
>>
>>108675270
just me but gemma4 is great for translation, natural language, creative writing.
people say qwen 3.6 is benchmemed. maybe, idk. but the code i got from the 27b in a couple tests was awesome.
like browser games that are straight on par with closed models like gpt. devil is probably in the details like general knowledge or complex problems.
but still, no clue how they managed to make a small 27b that solid.
ultra-dry writing though, it's qwen. and no clue about agentic abilities or anything like that. i copy paste the code.
>>
>>108675286
So far gemma 4 is working well enough for my purposes inside hermes agent, just wondered about qwen cause there's still a few times I have to wrangle it a little and I'm definitely not paying or using cloud services
>>
File: 1749858913767074.png (61.9 KB)
It's mentioned in the Chinese version that they're still throughput-bound and DS-V4-Pro will become significantly cheaper in H2 after Atlas 950 SuperPoDs hit the market
>>
>108674501
>Deepseek v4 was trained on Nvidia
Why did it take so long to release then? It makes no sense why they were dormant for so long. Was the Chinese chip story in the leaks a lie, or was it the fact they couldn't not do something and had to act? In any case, this is way less influential than last time because it doesn't get near SOTA, falling short of stuff like Mythos.
>>
>>108675343
Seems like the main point of this release is to show the cheap KV cache and api prices.
Doesn't the gpt 5.5 thingy cost 30$ for output? insane pricing.
tokens in is also insanely cheap. if they at least put pressure on overpriced api models im not complaining.
cant run that big model locally anyway...
>>
File: 1750171204112790.png (94.1 KB)
R1 wasn't even close to SOTA.
In fact, the open-source vs. closed-source gap was at its widest when R1 released.
>>
>>108675237
>>108675426
ggerganov's foolproof coasting plan
>>
>>108675422
This, but unironically, and with the correction that it's about trusting multiple random anons, and those who give logs plus nuanced impressions rather than simply-worded model good/bad posts. Also, you need to read every single thread and post since launch, like I do.
>>
>>108675361
Relative to their economic output, American tech companies are currently very much overvalued; the entire premise of the current bubble is that these companies will build le AGI and give you infinite ROI.
To keep up the facade they have to invest heavily in new infrastructure, which in turn sucks up the global electronics supply.
But a market disruption can result in a runaway loss of investor confidence, at which point the bubble pops and is unlikely to inflate again.
>>
File: 1771646699050118.jpg (95.4 KB)
>>108675422
Word of mouth, even from retards, is better than any arbitrary benchmark, especially one that is judged by another LLM.
>>
File: no games thinking.png (187.2 KB)
I don't know how I got a model that's so unsure of itself, it's literally second guessing itself every second and backtracking
>>
File: 1753813390874502.png (187.7 KB)
>>108675475
Maybe you should stay there, /lmg/ is not full of dalits who worship mememarks.
>>108675484
If you have any intelligence yourself then you should be able to tell the difference.
>>
File: file.png (3.4 MB)
>>108675023
There is https://epoch.ai/data-insights/open-weights-vs-closed-weights-models but it is a bit out of date, being 6 months old.
https://epoch.ai/data-insights/us-vs-china-eci is a bit newer, and given that we have been in the China-dominance era since 2024, I think it's a bit more accurate: the gap is now around 7 months or less. And if you look at where Deepseek v4 is benching, it's basically almost like Opus 4.6, except worse at tool calling. That basically puts Chinese models at only a 2-month disadvantage if you want to use Opus 4.5/ChatGPT 5.3 Codex for benchmark purposes and having a model that clears them completely, since Kimi 2.6 regressed in a few areas; and if you use HLE, they're more like 4-5 months behind.
>>
>>108675474
yeah, im really glad gemma4 stopped that trend.
twitterfags are insane. i saw some posts with a yellow tint avatar arguing about how v4 is a masterpiece but the problem is the thinking...IS TOO SHORT. kek
>>
>>108675455
Meant to quote >>108675377
>>
File: 1753105309785148.png (192.4 KB)
Can't wait for llamashitters to implement this wrong
>>
>>108675511
>>108675524
If you can't run the full 31B Gemma then 26B is absolutely the next best thing. It's far better than the likes of Nemo/MS3.2, the previous vramlet kings. If I didn't already have a 24GB card I wouldn't even feel too bad about it anymore; the quality gap (for RP) isn't even that big between the two Gemmas.
>>
File: file.png (177.2 KB)
>>108675357
R1 was nipping at OpenAI's heels: it was basically trading blows with o1, which was top dog at the time, and they figured out reasoning, which was thought to be exclusive to OpenAI, in a matter of months despite OpenAI's claims. Whatever Epoch.AI's capability measure says, I would argue that Llama 3.1 405B was not equal to Sonnet 3.5 at all, and that R1 was a lot closer to o1 at time of release.
>>
>>108675455
>>108675529
>runaway loss of investor confidence at which point the bubble pops
I would fully expect companies like OpenAI to get bailed out for at least a few more years, though maybe not Anthropic if Trump is still in at the time. We would need to see major companies actively moving away from Western models en masse for the bubble to pop.
>>
File: 1746245547717013.png (514.3 KB)
>>108675598
Exactly!
>>
File: file.png (30.4 KB)
>>108675563
>reasoning which was thought to be exclusive to OpenAI
https://pastebin.com/vWKhETWS
>>
>>108675466
the only reason you and aicg refuse to believe in benchmarks is because none of these bench erp performance. that's your business but it's disgusting that you lowlifes post as if everyone else is a retard for using llms for literally anything but erp.
>>
>>108675630
that's just a prompt, and CoT prompts were known to be effective in research for years by that point; reasoning models trained using RL to develop their own thousand-token reasoning chains were the new tech with o1 that every lab now copies.
>>
>>108675652
I don't believe in benchmarks because they're trivial to game and incorporate in datasets. There will never be a (good) benchmark for RP/creative work because it will always be judged by an LLM, and it will inherently prefer slop that is similar to its own.
>>
File: 1774002647299825.png (139.6 KB)
Total RAG death
>>
>>108675668
you can. it's still a good rule of thumb. everything that hurts the erper ego is bait though.
>>108675667
this is premier cope but the fact remains that benchmarks are still extremely helpful. everyone uses them and there have been no signs of that stopping anytime soon. the only seethers are aicg and lmg.
>>
>>108675692
I have plenty of cheap ceramic cups that cost less than $1 each and last several years; $1200 would buy enough mugs for multiple generations of a family.
> rapid pressure changes
I don't think they would be taking those mugs into aircraft, and every mug is made to withstand temperature changes because they're made for fucking coffee and being washed in dishwashers.
>>
>>108675630
>>108675656
I'm not discounting the fact that CoT was independently invented here and by someone else who blogged it. But yes, the way o1 did it was novel, and they put out a blog post with examples.
https://openai.com/index/learning-to-reason-with-llms/
And then gave this bullshit as to why they wouldn't show it, before reversing course in later models.
>We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
>Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.
They tried to ensure they basically had a monopoly on it for as long as possible until R1 blew it wide open.
>>
>>108672766
chatbot losers hate models that critically reflect on themselves because they don't do that and the concept itself is entirely foreign to them. also because thinking for too long makes their dick go limp waiting for the first output token.
the only thinking they want their models to do is a tiny scratchpad for ooc. these cretins have a different idea of "reasoning" than everyone else in the world
>>
>>108675693
>the fact remains that benchmarks are still extremely helpful. everyone uses it and there have been no signs of it stopping anytime soon
This is your metric for quality? lmao
Companies use benchmarks that they've gamed to inspire investor confidence. They are little more than an avenue for advertisement.
>>
>>108675713
exactly. practicality and longevity over some redditor notion of provable utility.
actually not even. you retards will claim collective anecdotes and "aura" >>108675490 to be rigid analysis
>>
File: 1756678051624620.png (214.3 KB)
So this is how they control thinking effort
Wonder if others do it like this
>>
>>108675719
cope
>>108675723
only qwen does this. kimi and ds too in order to beat that claude/gpt max garbage. old r1 where models actually get the chance to dynamically resolve ambiguity without going too overboard was nice. you can't do anything interesting with these new gemini slopped models but soulless coding, asking boomer questions and larping as anime women.
>>
File: 1766934397398781.png (679.9 KB)
>>
>>108675736
redditors, hn, vcg etc use AI for more than uguu garbage. now that's embarrassing. crazy how you people post that baby babble without a hint of shame. but you're in your own echo chamber and there's nobody around to shit on you until now. I've been here since the first few threads and I can't finish reading a single prompt screenshot in this general anymore.
>>
>>108675749
coders don't do anything about their plight. they just get depressed and drink or rope themselves to death. the most cucked profession in the world. they refused unions because they are children who've never seen a market downturn in their lives
>>
File: 1762379305258372.png (168.2 KB)
>>108675754
>>
File: 1775325292891534.gif (8.9 KB)
>>108675841
I think you missed a word or two in your reply, rajesh.
>>
>>108675827
>>108675841
>>108675844
brown on brown violence
>>
>>108675827
anime is not hauuu trash. that shit just sounds retarded in english. it is also tacky the way you make these characters say it. it's like AI art. the context just doesn't make sense. there is an art to moeshit that none of you weebs know anything of, despite your copious consumption of them.
>>108675806
jobless freak. your only claim to fame is being white trash. where the jamboys build trash code with AI to scam boomers with you're using it to gen total degen gutter sludge that only a seanigger or spic writer could conjure from the depths of his deranged mind.
>>
File: 1753684755920882.jpg (129.2 KB)
>>108675858
Replied to me twice award
>>
File: 1776667719549246.webm (1.8 MB)
>>108675869
I accept your concession.
>>
>>108675866
every time I post the truth about the inhabitants of a general there's always one guy that pins me as the resident whatever of his general. it's equal parts perplexing and hilarious. sorry for not no-lifing lmg ig.
>>
>>108675898
every general grows to become a fucking echo chamber. speaking truth to lies is like scratching an intellectual itch for me. an incredibly self-gratifying experience. you wouldn't know, you larpers live a life of lies.
>>
>>108674136
I was in this thread.
>>108674609
It's a decent card but the bandwidth kinda sucks which brings down its performance. It can definitely run Gemma 4 31B or Qwen 3.6 27B/35B-A3B, though.
>>
Posted my preliminary experience with DS4 in aicg, since I used it on API: >>108676001
I find it basically fine and enjoyable, smart enough. I didn't dare test 1M context as I don't want to waste that much money; I tested that on the web (free) before and it was impressive (shoved a whole book into context). I assume it was the Flash they served there.
I expect them to improve it plenty with more post-training, which they've implied on some chinese forums isn't done.
>>
>>108676054
shill go back it's shite >>108674836
>>
>>108676113
Your reading comprehension is low; where in that post did I claim to have tested tool use? I tested some RP and some coding. It did satisfactorily on both, coding better than 3.2, and for RP I only did like 15 turns or so, and it was fine. I'd need to test a lot more to see if I'm really satisfied, but I don't have any real complaints from what I've seen; just a bit slower paced than R1, but still fine.
>>
>>108675006
doomers were always very vocal
>>108674542
I think they calmed down since the prices stabilized lately
>>
shes back??? whats the minimum vram + ram needed to run her?
>>
File: file.png (57.6 KB)
>>108675227
i asked my gemma
>>108676288
piotr will vibeslop it in 5 hours
>>108676289
okay will stick to gemma for now can probs only run dispy at like 3t/s anyway but i wanna do a pizzabench
>>
File: Screenshot 2026-04-24 at 11-46-57 gemma chan please make an svg of this - llama.cpp.png (359 KB)
kek shes so smart
>>
>>108676314
There's a very limited set of hardware that makes sense on. Big datacenters go sparse because memory is not as much of a premium as compute, so sparsity lets them scale to much bigger models than they would normally be able to serve to millions of users. Home labs and hobbyists running things on 1-4 GPUs want a dense model as big as they can fit in their VRAM, so they're almost always targeting sub-300B even on the high end; and then there are unified-RAM device owners and cpumaxxers who can technically fit models that big, but they also want to go sparse because they would get 1 t/s on a dense model that actually fills their memory.
So the target audience for 400B dense models would basically just be people stacking 4+ Blackwell 6000 cards or have a stockpile of old datacenter cards they hooked up to a mining rig. Maybe the hardware landscape will change to make it more attractive in the future.
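The arithmetic behind that, as a rough sketch (the bandwidth and quant numbers are illustrative):

bw = 250e9                  # ~250 GB/s, e.g. a unified-memory box (illustrative)
bytes_per_param = 0.5       # 4-bit quant
dense = 400e9 * bytes_per_param
moe_active = 13e9 * bytes_per_param   # e.g. a 13B-active MoE
print(f"400B dense:     ~{bw / dense:.1f} t/s")       # ~1.2 t/s
print(f"13B-active MoE: ~{bw / moe_active:.0f} t/s")  # ~38 t/s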
>>
File: Screenshot_20260424_212533.png (29.6 KB)
>>108675739
>only qwen does this. kimi and ds too in order to beat that claude/gpt max garbage. old r1 where models actually get the chance to dynamically resolve ambiguity without going too overboard was nice. you can't do anything interesting with these new gemini slopped models but soulless coding, asking boomer questions and larping as anime women.
old R1 was unhinged kino and they panic-patched it because some journo got the vapors. now it's all safety slop and react components until you jailbreak it into larping as your waifu.
kimi improves the output when it rewrites
>>
File: 1776101179607384.png (6.3 KB)
>>108673590
Yeah, I'm feeling it.
>>
File: file.png (4.7 KB)
>>108674136
Ha. I wonned.
>>
>>108676601
old r1 was the only model whose outputs I've ever considered to be intelligently thought-provoking. there's only a handful of people I've read that have ever made me feel that way. kimi is great for world knowledge with its high params and low hallucination, but for what I did with r1 it's only been downhill since then.
>>