Thread #108676460
File: PromptingWhales.png (1.3 MB)
1.3 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108672381 & >>108667852
►News
>(04/24) DeepSeek-V4 Pro 1.6T-A49B and Flash 284B-A13B released: https://hf.co/collections/deepseek-ai/deepseek-v4
>(04/23) LLaDA2.0-Uni multimodal text diffusion model released: https://hf.co/inclusionAI/LLaDA2.0-Uni
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
407 Replies
>>
►Recent Highlights from the Previous Thread: >>108672381
--Discussing DeepSeek-V4 MoE releases and their million-token context:
>108674136 >108674145 >108674155 >108674161 >108674250 >108674318 >108674261 >108674263 >108674388 >108674389 >108674379 >108674434 >108674435 >108674450 >108674875 >108674883 >108675469 >108675569 >108675405 >108675940
--Discussing potential llama.cpp and Axolotl support for DeepSeek V4:
>108674288 >108674300 >108674320 >108674424 >108674921 >108674948
--Optimization settings and performance benchmarks for Qwen 35B on AMD GPUs:
>108674262 >108674274 >108674280 >108674305 >108674330 >108674339
--Discussing OpenAI's Privacy Filter release and effectiveness:
>108672801 >108673034 >108673043
--Discussing feasibility of DeepSeekV4 support in llama.cpp:
>108674334 >108674432 >108674447 >108675147
--Comparing Hermes agent performance and discussing Gemma's output instability:
>108672431 >108672440 >108672493 >108672518 >108672684 >108672854 >108672944 >108673051 >108673108 >108675044
--Discussing Artificial Analysis hallucination rate chart for frontier models:
>108675041 >108675063 >108675064 >108675074
--Discussing quantization quality and diminishing returns for Gemma 31b:
>108673021 >108673040 >108673067 >108673083
--Troubleshooting system crashes and power spikes with dual 3090 setups:
>108672567 >108672901 >108672964
--Challenges of selling local LLM hardware to corporate management:
>108673015 >108673033 >108673069 >108674370 >108673447 >108673528 >108673543 >108673592
--Logs:
>108672766 >108673108 >108673737 >108674368 >108674514 >108674643 >108674834 >108675630 >108676384
--Teto, Rin, Miku (free space):
>108672697 >108673340 >108675126 >108675156 >108675180 >108675227 >108675466 >108676331 >108676341
►Recent Highlight Posts from the Previous Thread: >>108672385
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1747451052197483.png (300.3 KB)
300.3 KB PNG
>>108676470
I'm more of a gut instinct guy myself
>>
File: 1761239934506682.jpg (126.4 KB)
126.4 KB JPG
>>108676470
>>
>>108676502
kek seething
>>108676480
my gut instinct says benchmarks are real
>>
on my 3090, testing these settings the ai gave me, and it works
/path/to/llama-server \
--model /path/to/gemma-4-31B-it-Q5_K_S.gguf \
--port 8080 \
--ctx-size 10192 \
--n-gpu-layers 99 \
-fa 1 \
--host 0.0.0.0 \
--no-mmap \
--jinja \
--cache-type-k q4_0 \
--cache-type-v q4_0 \
--temp 1.0 \
--top-p 0.95 \
--top-k 64 \
--min-p 0.05 \
--repeat-penalty 1.0
what is the minimum context for hermes agent to be useful?
>tfw no turbocunts
>>
>>
File: 1769230879916265.gif (597.9 KB)
597.9 KB GIF
>>108676517
Nice settings huh
>>
>>108676517
Jinja is useless too because it's always enabled anyway.
Read the llama-server output log to get an idea of your context usage. For example, if you fed the last thread of this pretend discord server to your model, that would probably take over 32k tokens.
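If you'd rather get an actual number than eyeball the log, llama-server also exposes a /tokenize endpoint on recent builds. A rough sketch, assuming the server above is running on localhost:8080, jq is installed, and prompt.txt is a placeholder for whatever you want to count:
# count how many tokens a text file turns into, via the server's own tokenizer
curl -s http://localhost:8080/tokenize \
  -H "Content-Type: application/json" \
  -d "$(jq -Rs '{content: .}' < prompt.txt)" \
  | jq '.tokens | length'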
>>
>>
File: 1763684891282931.png (194.9 KB)
194.9 KB PNG
>>108676574
>>
>>108676574
Different models handle it differently, but the pattern I've seen in newer models' jinja templates is that they omit reasoning tokens from previous messages except during active tool-call chains. In that case they keep all the reasoning since the last user message (since the agent talks back and forth with tools for a while, thinking each time) and omit earlier ones, then go back to omitting reasoning once the model gives a final response and the user sends a new message. But there are all sorts of variations on this theme.
>>
>>
File: 0ED8D82159984C6C3D5B8CE53342A3ED.jpg (382 KB)
382 KB JPG
Gpt and Claude are becoming too expensive while local models are still too shit for poor people.
>>
>>
>>108676574
>>108676583
>>108676600
>>108676602
Wow you are fucking retards.
Read your model's manual first before making any claims.
>>
File: my friend coach.png (64.3 KB)
64.3 KB PNG
>>108676480
Same, bro. I tested gemma 4 31b-it for days, and I am coming to the conclusion that although it's limited to being a 31b, it is the most accurate in actually listening to your instructions compared to all models before it. It's SOTA in instruction listening, and I would love it in +100b. I never knew how bad the "lost in the middle" effect was until I started fucking with this new model. Forget the benchmarks, Google is on to something. We just need more parameters.
>>
>>108676517
me again. read the replies
looks like I can't physically fit my 1080ti with my 3090 on my old am4 motherboard. may need an adapter. running the 3090 headless should be better I guess, vram-wise.
I could put the mmproj on the 1080ti. split tensor would probably be slow as balls, I imagine, given no nvlink or p2p and the pascal architecture.
>>
>>
>>108676656
It's very anal about thinking. Iirc, it needs to be instructed on how to think within its think tag, instead of just being asked to think, or else it's just "<|channel>thought", which is the default. It's not like other models with thinking. You need to instruct the thinking, meta-wise. The instruction also needs to be given dead last. You also need the jinja2 template.
>>
>>
>>108676652
That's not all it's dependent on though, unless you're using Text Completion.
In Chat Completion mode, the model's chat template is supposed to decide which reasoning is included, and to do that it needs the reasoning from each message passed back properly in the prompt via a specific field. If a frontend isn't doing that properly then yeah, it will be the one deciding what the model sees, but if it is doing it properly then it depends mainly on the model's chat template.
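Concretely, that "specific field" in an OpenAI-style request looks something like the sketch below. This is only an illustration: "reasoning_content" is the DeepSeek-style field name, other backends and frontends may name it differently or strip it, and whether the chat template actually re-injects it is up to the model's jinja:
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Fix the failing test."},
      {"role": "assistant", "content": "I will start by reproducing the failure.", "reasoning_content": "Need to run the test suite before patching."},
      {"role": "user", "content": "Go ahead."}
    ]
  }'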
>>
>>
>>
File: IMG20260421041954.jpg (372.3 KB)
372.3 KB JPG
>>108676667
>looks like I can't physically fit my 1080ti with my 3090 on my old am4 motherboard.
Just a warning, once you go open air it's difficult to go back
>>
>>108676574
depends on the model
with qwen3.6 35ba3b you can pass --chat-template-kwargs '{"preserve_thinking": true}' to keep reasoning traces in context which supposedly helps in agentic scenarios
i have it in my config, but i am not a flag tinkertranny so i dunno what the implications for vram / speed / accuracy are compared to having it off. your mileage may vary
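for reference, it just rides along on the server launch, something like the sketch below (needs a llama.cpp build new enough to have --chat-template-kwargs; the model path is a placeholder):
/path/to/llama-server \
  --model /path/to/qwen3.6-35b-a3b.gguf \
  --jinja \
  --chat-template-kwargs '{"preserve_thinking": true}'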
>>
>>
>>108676700
><|think|> Before responding, use your internal reasoning to analyze {{char}}'s motivations, the current subtext of the scene, and how {{char}} would naturally react based on their personality, all within 200 words or less.
Previous instructions also fuck with whether thinking gets activated, in my experience. For example: if you're telling it to role-play and respond in a paragraph, it's going to be weird when you later also tell it to think about how to respond. You already told it how to respond. This is the most negative-reinforced model I've seen trained. You can actually just tell it to stop doing something, but results with double negatives and paradoxes are mixed.
>>
>>108676706
In Text Completion mode the prompt is just the prompt, so there's nothing to decide. If the reasoning's in the context then it's there and everything is 100% up to the frontend.
In contrast, in Chat Completion mode the jinja will look at the prompt which is a conversation history instead of raw text, and it'll filter it based on its own rules to convert it to text. That's where it would decide, assuming the prompt is constructed properly and has the reasoning separated from the content.
>>
Been taking a break from lmg the past few weeks due to the influx of poorfags from the gemma release and the massive drop in thread quality it caused. Are they gone yet, i want to discuss v4 with my old lmg bros
>>
>>
>>
File: 1765543542289463.png (609.1 KB)
609.1 KB PNG
the deepseek niggas definitely lurk here
>>
>>108676797
>>108676818
On one hand, a 30B model being the hottest subject sucks. On the other, the 30B model in question is quite impressive. One can only hope that we will see a similar factor of improvement in bigger weight categories. Though the V4 release does not seem to have been it.
>>
>>
>>108676797
Look chief, I know +400b is better. You know +400b is better. But if Gemma 4 was a +400b, it would destroy everything. There's twitter posts of a 124b coming, from the devs themselves. The hype is real.
>>
File: 1000185857.png (212.5 KB)
212.5 KB PNG
>>108676832
Still retarded lol
>>
>>
>>108676649
>Serverbros in shambles too cause you can't run 1.6T.
Literally the first time since 2023 I wish I’d pulled the trigger on 1.5TB instead of 768GB
My rig doesn’t owe me anything at this point.
Hope q3 isn’t too brain damaged
>>
>>108676836
I don't expect you to believe me, but gemma unironically is more interesting and follows prompts better than V4 does for me right now. I think the model might genuinely be broken, because if you told me it was the original deepseek model, I'd believe you.
>>
>>
File: 1760671183453759.png (22.5 KB)
22.5 KB PNG
kek V4's FIM is pretty fun
>>
4bs aren't capable of powering my openclaw, how do I solve this when it just keeps lying about coding something, has no idea why it keeps failing, doesn't even know it's not allowed to write in every folder on the pc?
What's a poor person supposed to do?
Even if you use claude to improve openclaw scaffolding and wrapper how can you make sure it's even working?
>>
>>
>>
>>108676797
This thread became shit well before gemma4.
I think its a great little model.
Can't even post anime waifu logs anymore, people calling you cringe.
I remember kaioken doing langchain to talk with miku about his depression and talking about pizza or whatever. (langchain is what we used for agents back then for you newfags)
Elitism kinda took over. That and now lots of new influx of people on top of that.
In another timeline comfyanon kept posting here and published his ultimate goal, "an automatic galge creator". said that's why he created comfy. time flies by man... all but a blur now.
>>
>>108676747
>>108676777
To add, it means you're bottlenecked during token generation, so your GPU isn't running at full tilt. For instance, if you have a model loaded fully in VRAM, your graphics card will be whirring away the whole time, but if you've split into system RAM, you're held up by the slower memory. Try running nvidia-smi in a terminal window or some monitoring software and you'll see just how much your GPU is being utilised
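e.g. either of these in a second terminal while the model is generating (both ship with the NVIDIA driver):
# full status, refreshed every second
watch -n 1 nvidia-smi
# or a compact per-second utilisation log (sm = compute, mem = memory controller)
nvidia-smi dmon -s u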
>>
>>108676517
Hermes loads 12k into context instantly when you run a request with all its tool defs and shit, so 10k is useless.
Ideally use a smaller harness, I think something like https://github.com/itayinbarr/little-coder is more up your alley.
>>
File: EC871elWkAAp5Sm.jpg (35.7 KB)
35.7 KB JPG
>>108676936
What quant.
Inb4 less than 5.
>>
LLM tells me I could use a "kinetic harpoon" from a suborbital vehicle to destroy/capture a satellite. And since it's suborbital it's not bound by space treaties. How legit is this?
Basically you use a suborbital spaceplane or rocket that briefly reaches 500 km, releasing a tethered harpoon or net that grabs the satellite, then either let the satellite burn up in the atmosphere or capture it intact.
>>
>>
>>
File: gemma-chan goes diving.png (868 KB)
868 KB PNG
>>108677055
skill issue
>>
>>
>>
File: 1768082965569.png (173.4 KB)
173.4 KB PNG
>V3 was a good-mediocre model
>R1 was V3 + The most novel line of research of the time, test-time thinking
>V4 is a good-mediocre model
>R2 will be...
am I high on copium
>>
File: 1764829110623274.png (54.4 KB)
54.4 KB PNG
https://comfy.org/countdown
It's not a coincidence that v4 and this are dropping the same day...
>>
>>
File: pizza bench cropped.png (2.6 MB)
2.6 MB PNG
>>108676470
the only good benchmarks are pizzabench and cockbench
>>
>>
>>
File: Screenshot 2026-04-24 at 13-37-16 gemma chan please make an svg of this - llama.cpp.png (403.7 KB)
403.7 KB PNG
someone ask new dipsy to do this https://gelbooru.com/index.php?page=post&s=view&id=13929965&tags=loli
>>108677111
go back
>>
Thank you for hosting these threads and posting so much info. I have 16gb vram and 32gb system ram; would there be any benefit to inference from adding another 32gb of system ram? Would I be able to do anything more with that, or am I limited by my vram?
>>
https://www.anthropic.com/engineering/april-23-postmortem
>On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode. This was the wrong tradeoff. We reverted this change on April 7 after users told us they'd prefer to default to higher intelligence and opt into lower effort for simple tasks. This impacted Sonnet 4.6 and Opus 4.6.
>On April 16, we added a system prompt instruction to reduce verbosity. In combination with other prompt changes, it hurt coding quality and was reverted on April 20. This impacted Sonnet 4.6, Opus 4.6, and Opus 4.7.
feelsgood to be a local chad, I don't have to deal with this kind of bullshit lolz
>>
https://gist.github.com/aratahikaru0/ea0f49958eaa8852a78078d9e993bbf0
so put this at the end of the first message ig
【Character Immersion Requirement】Within your thinking process (inside <think> tags), please follow these rules:
1. Conduct inner monologue in the character's first person, wrapping inner activity in parentheses, e.g., "(thinks: ...)" or "(inner voice: ...)"
2. Use first person to describe the character's inner feelings, e.g., "I think to myself", "I feel", "I secretly", etc.
3. The thinking content should immerse in the character, analyzing the plot and planning the reply through inner monologue.
>>
>>
>>
>>108677200
>>108677221
Hooking llama.cpp up into renpy would be dope as fuck.
>>
>>108677221
>>108677225
This seems to have worked well for orb. At this point maybe that's the only solution. Unless someone has done so already.
>>
>>108677245
That actually sounds like a good idea. Having it interface with the VN engine through forced tool calls. And maybe even have dynamically generated scenes using comfy.
>>108677231
Post number?
>>
>>
>>108677238
they do, but it breaks the ui sometimes when they use them out of turn, so discussing training data and tokenizers can be difficult. also the jinja template might mangle the context too. probably best to just mention reasoning or thinking without the tags.
>>
>>
>>108677206
I want there to be tts inside old games like jrpgs.
I wanted to try learning japanese for fun and want the characters to be speaking japanese. maybe gemma e4b or the smaller one is enough for that. It can watch everything in real time, follow the context of scenes, and tell the tts engine how to do expressions, like with qwen3tts.
https://qwen.ai/blog?id=qwen3tts-0115
>>
>>
>>108677238
>Models don't know what's a <think> tag
they don't "know" what it is, but they can be trained to output differently when the token is present.
like typing `/nothink` makes glm 4.6 skip reasoning
or typing <moan> makes the tts moan
>>
>>
>>108676873
>>108676875
>>108676884
If you just need to change a tire because it's old but not yet broken, driving to the garage is the correct answer
>>
>>108676667
>put the mmproj on the 1080ti
I put it on the cpu and it's fast enough even at 1120.
>>108676708
>Just a warning, once you go open air it's difficult to go back
Isn't that a magnet for an insane amount of dust over time?
The dust filters on my machine seem to work overtime, I have to clean them every few months.
>>
Wagie having to deal in the real world here; give it to me straight bros, would you spend an extra $4k to be able to run big Deepseek locally at ~12 t/s? The model was literally trained for roleplay. But more importantly, the other labs are likely going to use the base for their future models, meaning most models by chink frontier labs are going to be that size too. And since it uses QAT, quanting it further will destroy its performance.
>>
File: file.png (180.8 KB)
180.8 KB PNG
>>108677238
>>108677309
i was suspecting that was the case, but holy shit, it was not easy to get a response like this, gemma tries to read these tokens as letters for some reason
>>
>>108677382
>>108677390
OK thanks I'll try it then
>>
File: belief.png (592.4 KB)
592.4 KB PNG
>>108674657
>>
>>
>>108677414
>would you spend an extra $4k
if it was 4k sure, but current prices are more in the ballpark of 15-25k for anything able to run 1TB+ models
and as much as I don't mind paying for my hobbies, that's a bit much for something that isn't a car or a huge house renovation
>>
>>
>>
>>108677493
>at least these nutjobs are honest about it,
i use claude at work and i dont see the honesty.
api through openrouter is bad for like 2 months now.
they explicitly said its not the api which is just not true.
opus 4.7 feels totally tarded..
just a little bit of context and it gets the opening wrong. that's not normal.
opus 4.6 same thing. even sonnet is super tarded, but im willing to admit it might have always been this way because i didnt use it much.
also: they did the same thing before too last autumn! blamed it on "network issues" or something like that kek
very sketchy stuff. nothing beats local.
>>
>>108677428
No, I mean V4 Pro. The $4k is just to get bigger RAM sticks so the model can be run with --no-mmap. This way, you can maintain large batch sizes for the context on the GPU. Otherwise, the model runs way too slow.
>>108677425
All modern models (even Opus) are slop, I just prefer my slop to be actually smart and usable offline.
>>108677465
Yes, and it's arguably the best local RP model prior to today. I'm looking into the future though, where all the frontier labs start moving to the V4 model as the base, considering it's SOTA at the moment.
>>
File: rat w dildo shoveled in.png (115.6 KB)
115.6 KB PNG
>>108676605
is that... what i think it is?
>>
File: 2527542.jpg (96.4 KB)
96.4 KB JPG
Tattoos are for garbage people.
>>
Ideally, if I wanted to compare two models from two different families trained on different datasets etc., I'd run a bunch of different benchmarks including some domain-specific ones of my own making. But is there a simple harness or benchmark set that could be used as a sort of sanity check of "model x is generally better/more intelligent than model y"?
If not, I'll just make a script on my own, but I'd rather not reinvent the wheel if possible.
I think cudadev was working on something like that?
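Not aware of anything finished from that direction, but as a bare-bones sanity check you can just loop the same prompt set against two OpenAI-compatible servers and grade or eyeball the outputs. A crude sketch (ports, prompts.txt, and the output file names are all placeholders; temperature 0 just to keep runs comparable):
#!/usr/bin/env bash
# crude A/B check: run every line of prompts.txt against two local servers
for port in 8080 8081; do
  while IFS= read -r prompt; do
    curl -s "http://localhost:${port}/v1/chat/completions" \
      -H "Content-Type: application/json" \
      -d "$(jq -n --arg p "$prompt" '{messages:[{role:"user",content:$p}], temperature:0}')" \
      | jq -r '.choices[0].message.content' >> "answers_${port}.txt"
  done < prompts.txt
done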
>>
>>108677788
>>108677838
Why are you so obsessed with black penises and transgender people?
>>
File: 1755568885667508.png (346.9 KB)
346.9 KB PNG
https://localbench.substack.com/p/kv-cache-quantization-benchmark
why does it not work so well on gemma?
>>
File: I don't want to go back to fp16.png (179.9 KB)
179.9 KB PNG
>>108677965
>The only variable changing between runs is cache precision. These measurements include the recently added TurboQuant-inspired attention rotation that llama.cpp applies automatically.
to be fair it's not the full implementation of turboquant, wait for niggerganov to finish the job
>>
File: 1766795985329438.jpg (38.5 KB)
38.5 KB JPG
every anime girl has pink hair according to my models
>>
>>
>>
File: dsv4.png (52.5 KB)
52.5 KB PNG
>>108678195
46*~3.6=165.6
I'm sure you can figure out the rest.
>>
>>
File: Screenshot 2026-04-24 102219.jpg (60.3 KB)
60.3 KB JPG
I bought my first used 3090 like 3 years ago for $500 and kinda want a second one, but I swear every time I check they've gone up in price; now they're selling for double what I paid even though they're getting old
>>
File: Untitled.png (245.3 KB)
245.3 KB PNG
How the hell do I figure out my tk/s on vllm? There's no way it's "avg generation throughput" right? That'd mean llama.cpp with split mode layer is faster (25 tk/s) than vllm with tensor-parallel-size: 2. 2x 3090s on pcie gen 4 x16.
>>
File: 1777045825700.jpg (257.4 KB)
257.4 KB JPG
>Original-Model-GGUF
>2k downloads
>Sloppified-Model-GGUF
>200k downloads
>>
File: 1750286439162386.png (14.1 KB)
14.1 KB PNG
why is it so slow? qwen3.6-27b, have a 5080rtx
>>
File: 1762984944616053.png (298.2 KB)
298.2 KB PNG
GLM-5 btw
>>
>>
>>
>>
>>
File: 1745513289115745.png (38.1 KB)
38.1 KB PNG
>>108678543
>>108678537
64gb ram 6000 mt/s
5080 16gb
using recommended sampling from qwen hugging box docs
i'm new to this so not sure what else would be needed info wise
>>
File: n.png (171.2 KB)
171.2 KB PNG
>>108678574
>All I'm missing from the llamacpp frontend is for it to support easily switching between system prompts and it would be perfect.
Better off getting Kimi to code one up yourself. This was one-shot.
>>
>>
>>
>>
>>108678503
On a 16GB card you're either going to need to run a gimp quant (Q3 or less) with relatively low context to fit it all into the card, or you're going to be offloading to RAM which will fuck your token generation speed
I'd just run the MoE personally on that card
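If you do go the MoE route, the usual llama.cpp trick is to keep attention and KV cache on the 16GB card and push the expert tensors to system RAM. Rough sketch only: the model path is a placeholder, and the exact spelling of the expert-offload flag (--override-tensor / -ot, or the newer --cpu-moe / --n-cpu-moe variants) depends on how recent your build is:
# keep everything on the GPU except the MoE expert weights
/path/to/llama-server \
  --model /path/to/some-moe-model-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --override-tensor "exps=CPU" \
  --ctx-size 16384 \
  -fa 1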
>>
File: file.png (27.8 KB)
27.8 KB PNG
>>108677649
>>
File: i.png (169.5 KB)
169.5 KB PNG
>>108678631
i like the in character reasoning feature
>>
>>108678264
this happened even back in like 2022, i bought a 3090ti for stable diffusion but it kept crashing in normal linux desktop usage so i sold it after maybe 6 months, and i sold it for more than i paid even then kek
>>
>>108677281
>>108677307
I'm almost done with it and will probably opensource it today
I've been adding some other features like mouth animations
>>
>>
>>
>>108678820
There were experiments that converted models to linear attention and it made them retarded. Frankenshit like that never works unless your only requirement is semi-coherent sentences no more than a paragraph long
>>
File: KimiTire.png (88.5 KB)
88.5 KB PNG
>>108676860
kimi comparison
>>
>>
File: 1752509108479218.png (433.2 KB)
433.2 KB PNG
>>108678850
kimi comparison (more info)
>>
File: Screenshot 2026-04-23 135943.jpg (21 KB)
21 KB JPG
>>108676860
Chatgpt 5.5 is now able to beat the car wash question btw
>>
File: Screencast_20260424_12590415.webm (3.6 MB)
3.6 MB WEBM
Thanks Gemma 31B this has been a fun experience
>>
>>108679032
Yup
I can fit close to 100k tokens with Q5 but kept it low for the demo. I built it all with those settings save for the higher context window
>>108679045
Asked it to write random blocks for the sake of showing syntax highlighting, that's on gemma
>>
>>
>>
File: Screenshot at 2026-04-25 03-12-45.png (27.7 KB)
27.7 KB PNG
>>108678908
Gemmy already won
>>
>>108679092
Yeah I'm using a trackpad and I'm trying to not zoom around the page
>>108679082
Still good speeds imo
>>
>>108677781
Irrelevant but true
>>108678507
Getting a single tire replaced because it's worn is a pretty rare thing, I think the model was correct there. Next time, try saying "I need to get my tires replaced", as in you mean to buy a whole set.
>>
>>108679403
it gets slower with every update. there was a schizopost a while back that went over it, and people who replied had similar experiences where things trained slower on 12.8. 12.6 may be good but i personally just stay on 12.4 since if it aint broke dont fix it.
https://desuarchive.org/g/thread/106119921/#106125806
>>
File: 0d8d31490be9b006e4f6cb98bd1989ae480242689.png (1.3 MB)
1.3 MB PNG
>>108676460
Come home to /wait/.
>>
>>108679817
Yes, I'm comparing dense to dense.
Gemma is efficient in ingestion and thinking, while Qwen seems to favor ingesting your entire codebase no matter how small the change is.
It launches 4 sub-agents that each have to read the entire repo when I ask it to update docs, so it is very thorough. I haven't benched them yet, but it seems stronger in autonomy than Gemma and it uses agent functionality more often; the trade-off is efficiency, so Gemma still has its place for tasks that are less precise and autistic.
>>
File: file.png (497.9 KB)
497.9 KB PNG
>>108679730
If you're looking at benchmarks, yeah. It scores better on the coding and agentic indices than Kimi 2.6 and GLM 5.1; the only reason it's behind on the general index is that it scores worse on things like hallucination and long-context reasoning, which aren't included in the coding and agentic measurements. On top of that, they shipped less of the stuff they pioneered that people were looking forward to. It's a good step, but it's not the feeling of having GPT o1 at home like R1 was; the equivalent today would be having at least Opus 4.6 at home, and the gap there is too big. It's more like having Sonnet 4.6 at home, but models are moving so quickly nowadays that that isn't a big accomplishment anymore, especially when people expect the faster turnaround and performance increases the big labs are shipping iteratively every 2-3 months, which Deepseek is not operating on. Also, the 1.6T parameters are off-putting to even most CPUmaxxers, and Flash isn't anything we haven't seen already for months. Overall, I would say it's nice, but people expected it to vault above all the other models and it just didn't do that.
>>
>>108678742
Cool. Are you planning to release by the end of this thread or in the next one?
>>108679021
Same honestly
>>
>>
>>
>>108679995
No, it's this: https://github.com/open-webui/open-webui/issues/21564
apparently still broken
>>
File: Screenshot 2026-04-24 at 16-57-20 Orb.png (423 KB)
423 KB PNG
Font rendering on Mac is nice as hell. Then I come back to Windows and want to claw my eyes out.
>>
>>
>>
>>108680193
What else is there? I want a backend-agnostic server based UI (so no kobold or lmstudio) that works and looks like chatgpt where I can also import my old chats from wherever. Openwebui is the only one I've found so far.
>>
>>108680193
>outside of other ui
Like? I would switch if I knew there was something that did what I wanted.
I make use of a lot of OWUI's functionality even if not all of it, and serve it on the web for my entire family. Once I considered vibe coding my own frontend and realized it would take a lot of work to get feature parity.
>>
>>108680274
>I wanted the opinion of anons who actually used it.
Waiting on quants, but I'm excited to try the flash.
>GLM 4.6: 355B-32A MoE, could only run in IQ2 @8K context
>DS V4 Flash: 284B-13A MoE
I'm very interested in what size I can handle and its quality.
>>
>>
>>108680254
>>108680242
Damn local is in the pits when it comes to frontends....