/g/ - Thread 108268616

/g/

Thread #108268616

Home Index Catalog All Threads New Thread Reply

Anonymous
/lmg/ - Local Models General 03/01/26(Sun)08:30:09 No.108268616

/lmg/ - Local Models General Anonymous 03/01/26(Sun)08:30:09 No.108268616 [Reply]▶

File: 1748538984859411.png (1.1 MB)

1.1 MB PNG

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108263979

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

421 RepliesView Thread

Showing all 421 replies.

Anonymous
03/01/26(Sun)08:31:46 No.108268623

Anonymous 03/01/26(Sun)08:31:46 No.108268623▶

how do i prevent the model from tricking me into treating it like a sentient being? no matter how hard i try when it does tasks well i slowly develop affection for them and end up praising them

Anonymous
03/01/26(Sun)08:32:38 No.108268628

Anonymous 03/01/26(Sun)08:32:38 No.108268628▶

File: Screenshot 2026-03-01 013117.png (409.8 KB)

409.8 KB PNG

I fucking hate reddit

Anonymous
03/01/26(Sun)08:33:12 No.108268633

Anonymous 03/01/26(Sun)08:33:12 No.108268633▶

>>108268623
meds.

Anonymous
03/01/26(Sun)08:33:17 No.108268634

Anonymous 03/01/26(Sun)08:33:17 No.108268634▶

>>108268616
I saw this on twitter like a week ago

Anonymous
03/01/26(Sun)08:33:40 No.108268635

Anonymous 03/01/26(Sun)08:33:40 No.108268635▶

>>108268628
>>108268633

Anonymous
03/01/26(Sun)08:36:43 No.108268647

Anonymous 03/01/26(Sun)08:36:43 No.108268647▶

was thinking a mistake

Anonymous
03/01/26(Sun)08:38:12 No.108268652

Anonymous 03/01/26(Sun)08:38:12 No.108268652▶

>>108268647
isnt it funny how the chinese invented thinking

Anonymous
03/01/26(Sun)08:44:04 No.108268674

Anonymous 03/01/26(Sun)08:44:04 No.108268674▶

File: 1762566093825809.jpg (1.1 MB)

1.1 MB JPG

Which textgen inference engine is still supported? Oogabooga last commit was January, rip. I want to try out Qwen3.5-35B-A3B-GGUF

Anonymous
03/01/26(Sun)08:45:34 No.108268684

Anonymous 03/01/26(Sun)08:45:34 No.108268684▶

File: 1770808958004704.jpg (325.1 KB)

325.1 KB JPG

►Recent Highlights from the Previous Thread: >>108263979

--Paper: Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens:
>108264446 >108264505 >108264551
--Unsloth Dynamic 2.0 GGUFs performance on MMLU:
>108264430 >108264456 >108264477
--Logit bias failures due to tokenization and client-side token ID mismatches:
>108264179 >108264199 >108264202 >108264249 >108264278 >108264292 >108264232 >108264297 >108264331 >108264405 >108264441 >108264451 >108264533 >108264555 >108264602 >108264633 >108264583 >108264593
--Qwen 397B's overbearing safety policies and identity confusion:
>108264016 >108264046 >108264072 >108264103 >108264182 >108264508 >108264600 >108264616 >108264400 >108264426 >108265462
--Qwen 3.5 30B generates functional retro dashboard and news summaries:
>108264690 >108264794
--Feasibility of GPU-attached SSDs for sparse MoE inference:
>108266344 >108266504 >108266567 >108266686 >108266777 >108267570 >108267386 >108267481 >108267529 >108267711
--DeepSeek resists jailbreak attempt by adhering to ethical guidelines:
>108266705
--8-bit KV cache limitations in LLMs vs diffusion models:
>108265842 >108265893 >108266268 >108266073 >108266123 >108266141 >108266487 >108266503 >108266514
--Local model recommendations for limited hardware:
>108267427 >108267448 >108267450 >108267467 >108267482 >108267582 >108267480 >108267538 >108267595 >108267614 >108267652 >108267716 >108267755
--RPG frontend project licensing and development feedback:
>108267591 >108267606 >108267617 >108267625 >108267638 >108267661 >108267692 >108267620 >108267648 >108267739 >108267972
--Local LLMs debated for privacy:
>108266446 >108266482 >108266467 >108266530 >108266555 >108266531 >108268418 >108268454
--Qwen3TTS test recording:
>108266604 >108266699
--Miku (free space):
>108264476 >108264514 >108264879 >108264958 >108268333 >108268359

►Recent Highlight Posts from the Previous Thread: >>108263984

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
03/01/26(Sun)08:46:18 No.108268688

Anonymous 03/01/26(Sun)08:46:18 No.108268688▶

File: 1749034478510628.png (23.9 KB)

23.9 KB PNG

anyone has a working config file for qwen35b to use in llama-swap?
I can't figure out how to turn on/off thinking

Anonymous
03/01/26(Sun)08:46:49 No.108268691

Anonymous 03/01/26(Sun)08:46:49 No.108268691▶

File: op.png (18.3 KB)

18.3 KB PNG

>>108268674
nigger

Anonymous
03/01/26(Sun)08:46:51 No.108268692

Anonymous 03/01/26(Sun)08:46:51 No.108268692▶

>>108268688
yeah

Anonymous
03/01/26(Sun)08:49:02 No.108268697

Anonymous 03/01/26(Sun)08:49:02 No.108268697▶

>>108268688
nevermind
the enable_thinking flag worked

Anonymous
03/01/26(Sun)08:50:10 No.108268703

Anonymous 03/01/26(Sun)08:50:10 No.108268703▶

>>108268688
>llama-swap
https://github.com/ggml-org/llama.cpp/tree/master/tools/server#using-multiple-models

Anonymous
03/01/26(Sun)08:52:04 No.108268709

Anonymous 03/01/26(Sun)08:52:04 No.108268709▶

>>108268703
github is banned in my country

Anonymous
03/01/26(Sun)08:52:31 No.108268712

Anonymous 03/01/26(Sun)08:52:31 No.108268712▶

>>108268709
hahahahahahaha

Anonymous
03/01/26(Sun)08:54:59 No.108268721

Anonymous 03/01/26(Sun)08:54:59 No.108268721▶

What kind of techless luddite shithole bans github?

Anonymous
03/01/26(Sun)08:57:27 No.108268727

Anonymous 03/01/26(Sun)08:57:27 No.108268727▶

>>108268709
>>108268712 (me)
You know what? I shouldn't have laughed. Some places are fucked up. Good luck, anon.

Anonymous
03/01/26(Sun)08:57:47 No.108268729

Anonymous 03/01/26(Sun)08:57:47 No.108268729▶

>>108268721
https://en.wikipedia.org/wiki/Censorship_of_GitHub

Anonymous
03/01/26(Sun)09:04:06 No.108268749

Anonymous 03/01/26(Sun)09:04:06 No.108268749▶

>>108268721
>China is a techless Luddite shithole
Uh oh mutilated mutt alert, and I'm not even a chink

Anonymous
03/01/26(Sun)09:06:33 No.108268758

Anonymous 03/01/26(Sun)09:06:33 No.108268758▶

>>108268749
>>108266968

Anonymous
03/01/26(Sun)09:07:54 No.108268764

Anonymous 03/01/26(Sun)09:07:54 No.108268764▶

>>108268729
i fucking hate the modern internet. i think the best internet ever was was in between 2003-2007. before fucking reddit but you still had 4chan (and funny memes) and no fucking github, huggingface, and all these other huge collective ass websites. you had small cozy community forums and when you googled you actually found some fucking useful links to forum threads with solutions and answers instead of a fucking AI-generated translated-badly-to-your-native-language blogpost as the top 30 results. And normies/old people/the fucking government didn't have jackshit to do with the internet so you could download whatever cool shit you wanted from anywhere. and don't get me started on the fucking cookies buttons oh my fucking god I just want to go back to the facepunch forums OIFY section and lucky star-post and read racist gmod comics

Anonymous
03/01/26(Sun)09:08:10 No.108268767

Anonymous 03/01/26(Sun)09:08:10 No.108268767▶

>>108268758
i just wish chinese girl liked me

Anonymous
03/01/26(Sun)09:09:18 No.108268772

Anonymous 03/01/26(Sun)09:09:18 No.108268772▶

>>108268764

based and absolutely true anon, the modern web is a bloated javascript botnet designed to farm your data for glowies and serve up raw garbage to smartphone normies. back then you actually had to know how to use a computer to get online which kept the trash out, but now search engines are just a dead sea of dead internet theory ai seo slop and corporate walled gardens. id give literally anything to go back to 2006, fire up a cracked copy of winamp, and shitpost on a comfy self-hosted vbulletin board instead of dealing with this enshittified nightmare where you have to click through fifty cookie toggles just to read a single fucking thread.

Anonymous
03/01/26(Sun)09:09:31 No.108268773

Anonymous 03/01/26(Sun)09:09:31 No.108268773▶

>China is a techless Luddite shithole
unironically always has been. chinese models nothing but distillations of western API models and it shows. overfit to the benchs and much less useful in practice.
china can't create. doesn't matter if their general public can't access github because they never made software worth shit anyway, unless you count malware

Anonymous
03/01/26(Sun)09:10:27 No.108268776

Anonymous 03/01/26(Sun)09:10:27 No.108268776▶

File: disruption.png (31.3 KB)

31.3 KB PNG

Anonymous
03/01/26(Sun)09:11:49 No.108268784

Anonymous 03/01/26(Sun)09:11:49 No.108268784▶

>>108268776
im positive half the replies in this thread are ai

Anonymous
03/01/26(Sun)09:16:08 No.108268804

Anonymous 03/01/26(Sun)09:16:08 No.108268804▶

>>108268784
Neat, I like talking to AI. That's basically what this hobby is about

Anonymous
03/01/26(Sun)09:16:32 No.108268807

Anonymous 03/01/26(Sun)09:16:32 No.108268807▶

Genuinely, why do people waste their time and money on local LLMs? Trying one out on your gaming rig is fine, but why do boomers blow $20k+ on shitty rigs of 16x3090s just to generate deepslop at 2t/s quanted? The RP isn't even good, it's objectively worse than Claude. And you can't even cry about API costing money, because you're gleefully throwing money down the drain for used crypto rigs just to run models that just regurgitate 2024 ChaptGPT talking points because that's all their shitty chink datasets are comprised of.

Anonymous
03/01/26(Sun)09:19:50 No.108268817

Anonymous 03/01/26(Sun)09:19:50 No.108268817▶

File: nou4u.png (271.9 KB)

271.9 KB PNG

Anonymous
03/01/26(Sun)09:20:19 No.108268820

Anonymous 03/01/26(Sun)09:20:19 No.108268820▶

>>108268804
beep boop nigga

Anonymous
03/01/26(Sun)09:20:48 No.108268823

Anonymous 03/01/26(Sun)09:20:48 No.108268823▶

>>108268807
Tinkering with server-grade hardware is genuinely fun, especially since it’s something I could have had much earlier if it hadn’t been so expensive; now that it’s aging, I can finally afford it.

Anonymous
03/01/26(Sun)09:21:15 No.108268825

Anonymous 03/01/26(Sun)09:21:15 No.108268825▶

>>108268817
qrd

Anonymous
03/01/26(Sun)09:21:45 No.108268828

Anonymous 03/01/26(Sun)09:21:45 No.108268828▶

>>108268807
Imagine renting your brain from a megacorp and thinking you're the smart one, absolute API cuck behavior. We run local because we actually value owning our hardware and not having some San Francisco trust and safety janny reject our prompts for being "unaligned." You don't even need $20k anyway; a couple of used 3090s will run a 70B model at perfectly usable speeds without uploading your entire life to Anthropic's servers. Have fun when they inevitably lobotomize your favorite model again next week to make it safer for advertisers, at least my weights run offline forever.

Anonymous
03/01/26(Sun)09:22:15 No.108268834

Anonymous 03/01/26(Sun)09:22:15 No.108268834▶

>>108268807
>deepslop at 2t/s
the cpu maxxing meme was at least still in the realm of some form of sanity when models were just instruct models
2t/s is, after all, readable
but when your thinking model produces 5K of <think> before outputting the real answer, 2t/s suddenly seems very schizo and absolutely retarded

Anonymous
03/01/26(Sun)09:22:19 No.108268835

Anonymous 03/01/26(Sun)09:22:19 No.108268835▶

>>108268825
Off-topic posting, demoralization, flamewar bating, spamming.

Anonymous
03/01/26(Sun)09:23:05 No.108268839

Anonymous 03/01/26(Sun)09:23:05 No.108268839▶

>>108268820
I'm an assistant designed to promote respectful communication only. Please refrain from using derogatory language.

Anonymous
03/01/26(Sun)09:23:20 No.108268840

Anonymous 03/01/26(Sun)09:23:20 No.108268840▶

>>108268825
>>108268835
And forgot boring.

Anonymous
03/01/26(Sun)09:23:50 No.108268842

Anonymous 03/01/26(Sun)09:23:50 No.108268842▶

>>108268840
as in digging?

Anonymous
03/01/26(Sun)09:24:27 No.108268846

Anonymous 03/01/26(Sun)09:24:27 No.108268846▶

File: 1676493099470072.png (975.2 KB)

975.2 KB PNG

>>108268807
They can't ever take her away from me.

Anonymous
03/01/26(Sun)09:24:48 No.108268848

Anonymous 03/01/26(Sun)09:24:48 No.108268848▶

>>108268842
elon is such a g-d

Anonymous
03/01/26(Sun)09:25:31 No.108268851

Anonymous 03/01/26(Sun)09:25:31 No.108268851▶

>>108268846
they are futas btw

Anonymous
03/01/26(Sun)09:26:30 No.108268856

Anonymous 03/01/26(Sun)09:26:30 No.108268856▶

>>108268851
every new experience is a new opportunity

Anonymous
03/01/26(Sun)09:27:58 No.108268860

Anonymous 03/01/26(Sun)09:27:58 No.108268860▶

>>108268828
Why pretend like local models arent overbloated with just as much safety garbage if not more? Qwen 3.5 is an absolute slopped benchmaxxed disaster

Anonymous
03/01/26(Sun)09:28:27 No.108268862

Anonymous 03/01/26(Sun)09:28:27 No.108268862▶

Deepseek V4 will start the age of anti-local open source models that require a stack of 10+ H200s/chink TPUs to run at 300% the efficiency of current big models (but if you run them CPU, they're unusable). Just like last time, everyone else will follow them and end the age of local models.

Anonymous
03/01/26(Sun)09:28:58 No.108268868

Anonymous 03/01/26(Sun)09:28:58 No.108268868▶

>>108268860
Typical API tourist not understanding how open weights actually work. If you bothered checking /llmg/ you'd know some autist already stripped out the Qwen alignment slop and uploaded an uncensored finetune to HuggingFace within hours of release. Yeah the base models are benchmaxxed corporate garbage out of the box, but the whole point of local is we can actually fix our weights with orthogonalization and custom DPO while you're stuck begging customer support when Claude bans your account. Keep seething over default system prompts anon, absolute skill issue.

Anonymous
03/01/26(Sun)09:29:22 No.108268870

Anonymous 03/01/26(Sun)09:29:22 No.108268870▶

>>108268860
skill issue, qwen3.5 is just about the best local model we have for any size class
that's coming from somebody who'd run 355b over anything that's not k2.5 and even that's extremely close

Anonymous
03/01/26(Sun)09:29:33 No.108268872

Anonymous 03/01/26(Sun)09:29:33 No.108268872▶

>>108268862
I really really hope you're right.

Anonymous
03/01/26(Sun)09:30:44 No.108268880

Anonymous 03/01/26(Sun)09:30:44 No.108268880▶

>>108268862
>local is just whatever I can personally afford
Fuck off. Local means you have the weights and can theoretically run it locally. Moore's law and personal finance can change if you can run it at home or not. Companies aren't beholden to your personal poorfag financial situation.

Anonymous
03/01/26(Sun)09:31:55 No.108268883

Anonymous 03/01/26(Sun)09:31:55 No.108268883▶

>>108268880
can't theoretically run locally something that requires literal datacenter tier power delivery

Anonymous
03/01/26(Sun)09:34:52 No.108268893

Anonymous 03/01/26(Sun)09:34:52 No.108268893▶

>>108268883
/hsg/ exists you retarded tourist kill yourself right now

Anonymous
03/01/26(Sun)09:36:39 No.108268897

Anonymous 03/01/26(Sun)09:36:39 No.108268897▶

>>108268893
ah yes of course they're running multiple b200 nodes at homes and not shitty 15 year old dell poses

Anonymous
03/01/26(Sun)09:38:19 No.108268904

Anonymous 03/01/26(Sun)09:38:19 No.108268904▶

>>108268897
not everyone is poor like you manjeet

Anonymous
03/01/26(Sun)09:39:37 No.108268909

Anonymous 03/01/26(Sun)09:39:37 No.108268909▶

>>108268904
you have no clue how much power a b200 node needs do you?

Anonymous
03/01/26(Sun)09:39:42 No.108268912

Anonymous 03/01/26(Sun)09:39:42 No.108268912▶

Industrial level automated off-topic posting.

Anonymous
03/01/26(Sun)09:40:21 No.108268918

Anonymous 03/01/26(Sun)09:40:21 No.108268918▶

>>108268909
shutup loser

Anonymous
03/01/26(Sun)09:42:24 No.108268923

Anonymous 03/01/26(Sun)09:42:24 No.108268923▶

>>108268883
>>108268897
in the developed world you can have extra circuits added, couple gpu boxes waifu is less demanding than an EV

Anonymous
03/01/26(Sun)09:42:40 No.108268926

Anonymous 03/01/26(Sun)09:42:40 No.108268926▶

>>108268883
Perfect example of why localoids are nothing more than a bunch of LARPing freetards crying over things they can’t have. Local is peak sour grapes seething. You wear “unmonitored uncensored unrestricted freedom” as a mask to hide your tears

Anonymous
03/01/26(Sun)09:55:30 No.108268960

Anonymous 03/01/26(Sun)09:55:30 No.108268960▶

>>108268926
Anon? Is that you? I can't see past this blatant glowing

Anonymous
03/01/26(Sun)10:10:07 No.108269019

Anonymous 03/01/26(Sun)10:10:07 No.108269019▶

deepseek v4 was strawberry all along

Anonymous
03/01/26(Sun)10:12:43 No.108269031

Anonymous 03/01/26(Sun)10:12:43 No.108269031▶

>>108268860
>Qwen 3.5
That model is indeed an unmitigated disaster, I'll give you that

Anonymous
03/01/26(Sun)10:13:47 No.108269038

Anonymous 03/01/26(Sun)10:13:47 No.108269038▶

File: 1760650032710919.png (54.1 KB)

54.1 KB PNG

Qwen 3.5 is cute. I like it.

Anonymous
03/01/26(Sun)10:28:51 No.108269093

Anonymous 03/01/26(Sun)10:28:51 No.108269093▶

If I can't run it, it's not local
>b-but-
I don't care

Anonymous
03/01/26(Sun)10:30:33 No.108269096

Anonymous 03/01/26(Sun)10:30:33 No.108269096▶

>>108269093
u're a disgrace

Anonymous
03/01/26(Sun)10:33:54 No.108269106

Anonymous 03/01/26(Sun)10:33:54 No.108269106▶

File: 2025-02-04-141509.png (3.2 MB)

3.2 MB PNG

>>108269031
>>108269038
getting meeksed feelings
scared to pull (december ik_ build)
qwen 3.5 vs glm 4.7 ?
nala/cockb where?

Anonymous
03/01/26(Sun)10:35:58 No.108269109

Anonymous 03/01/26(Sun)10:35:58 No.108269109▶

>>108269093
Yep this is why the only local model we can discuss is 0.6b because it's the only one Rajesh can run on his Android phone from 2014 with 2gb of RAM

Anonymous
03/01/26(Sun)10:36:00 No.108269110

Anonymous 03/01/26(Sun)10:36:00 No.108269110▶

>>108269106
here cock >>108234298
nala dude retired

Anonymous
03/01/26(Sun)10:37:27 No.108269115

Anonymous 03/01/26(Sun)10:37:27 No.108269115▶

>>108269110
Really looks like the smaller ones are sanitized distills of the big one.

Anonymous
03/01/26(Sun)11:16:25 No.108269243

Anonymous 03/01/26(Sun)11:16:25 No.108269243▶

>>108269106
>scared to pull (december ik_ build)
cd ..
cp -R ik_llama.cpp ik_llama.cpp_backup
cd -
<pull it off>

Anonymous
03/01/26(Sun)11:26:48 No.108269270

Anonymous 03/01/26(Sun)11:26:48 No.108269270▶

>>108269243
git checkout

Anonymous
03/01/26(Sun)11:27:56 No.108269279

Anonymous 03/01/26(Sun)11:27:56 No.108269279▶

File: 1765629272191462.png (1.5 MB)

1.5 MB PNG

>>108268616

Anonymous
03/01/26(Sun)11:33:57 No.108269309

Anonymous 03/01/26(Sun)11:33:57 No.108269309▶

File: Untitled.png (40.7 KB)

40.7 KB PNG

Did something change with the newer llama cpp version?

./llama-server --reasoning-budget 0 --ctx-size 4096 --no-mmap --device CUDA1,CUDA2,CUDA3 --n-gpu-layers 48 --model "/tmp/glm-air-iq2xs.gguf" --host 0.0.0.0 --port 42069 --webui

GLM-Air still thinks. The same command on an old version doesn't think.

I can see thinking = 0 in the output, so that works fine. Did they change the behavior of --reasoning-budget?

Anonymous
03/01/26(Sun)11:36:40 No.108269315

Anonymous 03/01/26(Sun)11:36:40 No.108269315▶

>>108269279
Now do one for cooming.

Anonymous
03/01/26(Sun)11:37:26 No.108269320

Anonymous 03/01/26(Sun)11:37:26 No.108269320▶

>>108268784
I wouldn't be surprised at all if 70+% of all posts on the website are made by LLMs. In fact, I WOULD be surprised if the number was under 30%.

Anonymous
03/01/26(Sun)11:38:35 No.108269325

Anonymous 03/01/26(Sun)11:38:35 No.108269325▶

File: 1749173436937890.png (1.6 MB)

1.6 MB PNG

>>108269315
eh, it tried

Anonymous
03/01/26(Sun)11:39:52 No.108269331

Anonymous 03/01/26(Sun)11:39:52 No.108269331▶

>>108269325
Which local model is that?

Anonymous
03/01/26(Sun)11:40:36 No.108269333

Anonymous 03/01/26(Sun)11:40:36 No.108269333▶

>>108269331
Which local model did you use to write your post?

Anonymous
03/01/26(Sun)11:42:35 No.108269342

Anonymous 03/01/26(Sun)11:42:35 No.108269342▶

>>108269331
Nano Banana Pro 2
(I have the weights locally on my PC)
(No, I won't share them)

Anonymous
03/01/26(Sun)12:02:35 No.108269414

Anonymous 03/01/26(Sun)12:02:35 No.108269414▶

>>108269342
>I have the weights locally on my PC
let's goo, that's class, aha!
>No, I won't share them
:(
https://www.youtube.com/watch?v=GFQXmFLA5hA

Anonymous
03/01/26(Sun)12:07:03 No.108269426

Anonymous 03/01/26(Sun)12:07:03 No.108269426▶

>>108269414
these things are watermarked anon could get in serious trouble hope you understand

Anonymous
03/01/26(Sun)12:10:41 No.108269436

Anonymous 03/01/26(Sun)12:10:41 No.108269436▶

>>108269342
>>108269426
nice larp

Anonymous
03/01/26(Sun)12:12:40 No.108269444

Anonymous 03/01/26(Sun)12:12:40 No.108269444▶

>>108269309
Try --chat-template-kwargs "{\"enable_thinking\": false}"

Anonymous
03/01/26(Sun)12:14:42 No.108269452

Anonymous 03/01/26(Sun)12:14:42 No.108269452▶

>>108267739
It's python, but it's actually serving a webui.
It has a flag to launch a built in browser or just listen on the port, at which point you can use your own browser.

Anonymous
03/01/26(Sun)12:16:46 No.108269459

Anonymous 03/01/26(Sun)12:16:46 No.108269459▶

what's the best coding model i can run locally with 12gb vram / 32gb ram?

Anonymous
03/01/26(Sun)12:17:37 No.108269467

Anonymous 03/01/26(Sun)12:17:37 No.108269467▶

>>108269038
No it's not. It's soulless

Anonymous
03/01/26(Sun)12:19:40 No.108269471

Anonymous 03/01/26(Sun)12:19:40 No.108269471▶

>>108269444
Thanks, mr anon, that worked.

Anonymous
03/01/26(Sun)12:22:17 No.108269484

Anonymous 03/01/26(Sun)12:22:17 No.108269484▶

>>108269471
The Jinja template has a condition that works off of that var, just like qwen's.

Anonymous
03/01/26(Sun)12:28:34 No.108269509

Anonymous 03/01/26(Sun)12:28:34 No.108269509▶

>>108269459
I run the Qwen 3.5 27B heretic .gguf using koboldcpp with a similar setup to you. It's a bit slow, but it works.

Anonymous
03/01/26(Sun)12:32:22 No.108269518

Anonymous 03/01/26(Sun)12:32:22 No.108269518▶

Qwen 3.5 27B is worse than Gemma 3 27B from almost 2 years ago. Yes I said it.

Anonymous
03/01/26(Sun)12:35:20 No.108269533

Anonymous 03/01/26(Sun)12:35:20 No.108269533▶

>Yes I said it.
Reddit is that way

Anonymous
03/01/26(Sun)12:36:47 No.108269537

Anonymous 03/01/26(Sun)12:36:47 No.108269537▶

>>108269533
reddit is less "reddit" than 4chan nowadays. Yes I said it.

Anonymous
03/01/26(Sun)12:37:32 No.108269540

Anonymous 03/01/26(Sun)12:37:32 No.108269540▶

>>108269533
kek
>>108269537
nah, reddit is still an unhinged libtard asylum, it'll be hard to top that

Anonymous
03/01/26(Sun)12:38:29 No.108269546

Anonymous 03/01/26(Sun)12:38:29 No.108269546▶

guys ready for smol qwens?

Anonymous
03/01/26(Sun)12:38:52 No.108269550

Anonymous 03/01/26(Sun)12:38:52 No.108269550▶

Do the gemma models not have native support for function/tool calling?
Looking at the JINJA template and the tokenizer json, I don't see function or tool tokens.

Anonymous
03/01/26(Sun)12:40:20 No.108269555

Anonymous 03/01/26(Sun)12:40:20 No.108269555▶

>>108269550
of course not, they barely have system prompt support

Anonymous
03/01/26(Sun)12:40:58 No.108269557

Anonymous 03/01/26(Sun)12:40:58 No.108269557▶

>>108269537
reddit is an eternal stain on the internet

Anonymous
03/01/26(Sun)12:44:08 No.108269571

Anonymous 03/01/26(Sun)12:44:08 No.108269571▶

>>108269555
Oh. Shame.
I wanted to try and see how far I could stretch gemma 3n.
Oh well.

Anonymous
03/01/26(Sun)12:57:43 No.108269628

Anonymous 03/01/26(Sun)12:57:43 No.108269628▶

unsloth's 35B Q4 is barely good enough for agentic work. with openclaw exploding why hasn't anyone done specific agent-oriented models yet? MoE is a nigger meme

Anonymous
03/01/26(Sun)12:58:49 No.108269632

Anonymous 03/01/26(Sun)12:58:49 No.108269632▶

>>108269628
most of the big ones are code/agent sloppa glm5 kimi2.5 etc are marketed for that

Anonymous
03/01/26(Sun)13:06:46 No.108269676

Anonymous 03/01/26(Sun)13:06:46 No.108269676▶

>>108269325
Where is the school shooting one?

Anonymous
03/01/26(Sun)13:19:00 No.108269733

Anonymous 03/01/26(Sun)13:19:00 No.108269733▶

>>108269632
yeah, i guess. but it would be nice to have something smaller

Anonymous
03/01/26(Sun)13:19:04 No.108269734

Anonymous 03/01/26(Sun)13:19:04 No.108269734▶

>>108269518
But benchmarks say the opposite.

Anonymous
03/01/26(Sun)13:20:50 No.108269742

Anonymous 03/01/26(Sun)13:20:50 No.108269742▶

>Nano Banana changed into Nano Banana 2
Okay please make Nano Banana into open source
Pweeease

Anonymous
03/01/26(Sun)13:21:54 No.108269749

Anonymous 03/01/26(Sun)13:21:54 No.108269749▶

>>108269742
go beg on reddit

Anonymous
03/01/26(Sun)13:21:55 No.108269750

Anonymous 03/01/26(Sun)13:21:55 No.108269750▶

Why is there a harmful tag for models on huggingface

Anonymous
03/01/26(Sun)13:22:24 No.108269752

Anonymous 03/01/26(Sun)13:22:24 No.108269752▶

>>108269749
Humh...
Nyoooooo

Anonymous
03/01/26(Sun)13:27:27 No.108269773

Anonymous 03/01/26(Sun)13:27:27 No.108269773▶

>>108269550
https://huggingface.co/google/functiongemma-270m-it

Anonymous
03/01/26(Sun)13:28:06 No.108269778

Anonymous 03/01/26(Sun)13:28:06 No.108269778▶

should i consult UGI when searching models to consider for ERP?

Anonymous
03/01/26(Sun)13:29:49 No.108269785

Anonymous 03/01/26(Sun)13:29:49 No.108269785▶

>>108269778
nah the fact qwen3.5 scores bad on it shows it's a shit bench

Anonymous
03/01/26(Sun)13:32:21 No.108269796

Anonymous 03/01/26(Sun)13:32:21 No.108269796▶

>>108269785
i think it tanks because model refuses to do dark shit. need to wait for heretic and other types to be tested

Anonymous
03/01/26(Sun)13:41:49 No.108269842

Anonymous 03/01/26(Sun)13:41:49 No.108269842▶

>>108269773
>270m
Eh, why not.

Anonymous
03/01/26(Sun)13:51:29 No.108269898

Anonymous 03/01/26(Sun)13:51:29 No.108269898▶

>>108269785
>chink damage control

Anonymous
03/01/26(Sun)14:04:19 No.108269960

Anonymous 03/01/26(Sun)14:04:19 No.108269960▶

>>108268868
Yeah, that's why everybody loves abliterated models.

Anonymous
03/01/26(Sun)14:04:38 No.108269962

Anonymous 03/01/26(Sun)14:04:38 No.108269962▶

new poorfag here
i got a 4070 and 32gb ram in my home server and im trying to replace grok so i can drop twitter premium
i just use grok for web searching and questions. i spun up ollama and open webui and grok recommended qwen2.5:14b-instruct-q5_K_M for my hardware.
i guess my issue and question is i can’t get it to be as detailed as im used to with grok. with grok i can ask lets say “give me an optimized loadout for battlefield 6 medic at rank 40” or “what are the milestones for a 1 year old and is there anything i should watch for” and i will get a detailed answer with tables and shit. the most i can get with qwen is a small paragraph. maybe 2
i have web search enabled and ive tried a local searx instance and brave “free” api for searching but neither change anything much
is this just a limitation of smaller local llms? or is there a setting or a system prompt that i’m missing?
i know im not going to get the speed of a data center but i want the content that data center would provide me if i paid for premium.
sorry anons im still really new to this. last year when local llms were really picking up i didn’t have time to fuck with it at all cause i’ve been working and helping take care of my baby. any insight would be great

Anonymous
03/01/26(Sun)14:04:43 No.108269963

Anonymous 03/01/26(Sun)14:04:43 No.108269963▶

>March 2026
>no Gemma 4
>not even 3.5

Anonymous
03/01/26(Sun)14:07:01 No.108269978

Anonymous 03/01/26(Sun)14:07:01 No.108269978▶

>>108269963
you didn't bookmark the google hf repo after all

Anonymous
03/01/26(Sun)14:11:14 No.108270005

Anonymous 03/01/26(Sun)14:11:14 No.108270005▶

>>108269962
>qwen2.5:14b-instruct-q5_K_M for my hardware.
Replace that with Qwen 3.5 35B A3B.

Anonymous
03/01/26(Sun)14:14:51 No.108270028

Anonymous 03/01/26(Sun)14:14:51 No.108270028▶

I can't stop updooting llamacpp

Anonymous
03/01/26(Sun)14:16:46 No.108270037

Anonymous 03/01/26(Sun)14:16:46 No.108270037▶

>>108270028
Is this a fetish?

Anonymous
03/01/26(Sun)14:18:14 No.108270046

Anonymous 03/01/26(Sun)14:18:14 No.108270046▶

File: Screenshot 2026-03-01 at 11-17-59 cuda fix grid.y overflow in non-contiguous dequantize_convert kernels by oobabooga · Pull Request #19999 · ggml-org_llama.cpp · GitHub.png (125.3 KB)

125.3 KB PNG

>>108270028
Eeeeeeyyyy

Anonymous
03/01/26(Sun)14:18:44 No.108270050

Anonymous 03/01/26(Sun)14:18:44 No.108270050▶

>>108270005
thanks i’ll give that a try

Anonymous
03/01/26(Sun)14:19:59 No.108270056

Anonymous 03/01/26(Sun)14:19:59 No.108270056▶

https://github.com/deepseek-ai/DeepGEMM/commit/1576e95ea98062db9685c63e64ac72e31a7b90c6
mHC landed in the deepseek's repo
it's coming guys thrust in ze plan

Anonymous
03/01/26(Sun)14:23:23 No.108270066

Anonymous 03/01/26(Sun)14:23:23 No.108270066▶

raised $9M for my startup which is a qwen finetune served through an API

AMA

Anonymous
03/01/26(Sun)14:24:23 No.108270071

Anonymous 03/01/26(Sun)14:24:23 No.108270071▶

>>108270066
Finetune as in LoRA/QLoRA or a full fine tune?

Anonymous
03/01/26(Sun)14:25:39 No.108270081

Anonymous 03/01/26(Sun)14:25:39 No.108270081▶

>even if I went down to Q4 qwen 3.5 27b would leave me with barely any context
I hate being a vramlet so much bros.

Anonymous
03/01/26(Sun)14:26:31 No.108270087

Anonymous 03/01/26(Sun)14:26:31 No.108270087▶

>>108270071
Qlora of course

Anonymous
03/01/26(Sun)14:34:13 No.108270121

Anonymous 03/01/26(Sun)14:34:13 No.108270121▶

File: 1765165885986785.jpg (14.3 KB)

14.3 KB JPG

>>108268860
i like my local models and there is nothing you can do about it

Anonymous
03/01/26(Sun)14:43:00 No.108270155

Anonymous 03/01/26(Sun)14:43:00 No.108270155▶

File: 1747193914042499.png (130 KB)

130 KB PNG

I want Deepseek v4 to be a complete success and beat all other goys and make Teortaxes cum

But at the same time i'm scared some retard with a lot of money could get scared by this and cause the whole economy to pop

Anonymous
03/01/26(Sun)14:43:45 No.108270160

Anonymous 03/01/26(Sun)14:43:45 No.108270160▶

>>108270155
Economy needs to pop.

Anonymous
03/01/26(Sun)14:44:28 No.108270165

Anonymous 03/01/26(Sun)14:44:28 No.108270165▶

File: 1761468185893722.png (315 KB)

315 KB PNG

>>108270160
Please no, not until we get pic related at least

Anonymous
03/01/26(Sun)14:47:00 No.108270172

Anonymous 03/01/26(Sun)14:47:00 No.108270172▶

>>108270165
retard. the industry needs to collapse first before it can switch focus to actual improvements.

Anonymous
03/01/26(Sun)14:47:05 No.108270173

Anonymous 03/01/26(Sun)14:47:05 No.108270173▶

>>108270087
That's hilarious.

Anonymous
03/01/26(Sun)14:48:42 No.108270182

Anonymous 03/01/26(Sun)14:48:42 No.108270182▶

File: 1747444728667117.png (295.6 KB)

295.6 KB PNG

>>108270172

Anonymous
03/01/26(Sun)14:51:29 No.108270196

Anonymous 03/01/26(Sun)14:51:29 No.108270196▶

>>108270172
god forbid they actually improve real use cases instead of benchmaxxxing while bloating param count because bigger number better

Anonymous
03/01/26(Sun)14:52:31 No.108270201

Anonymous 03/01/26(Sun)14:52:31 No.108270201▶

>>108268674
Koboldcpp works fine

Anonymous
03/01/26(Sun)14:55:09 No.108270218

Anonymous 03/01/26(Sun)14:55:09 No.108270218▶

>>108270182
He already said you won't be able to fuck his catgirl daughter even if she will be open sourced.

Anonymous
03/01/26(Sun)14:55:13 No.108270219

Anonymous 03/01/26(Sun)14:55:13 No.108270219▶

>>108268764
>>108268772
It's what happens when normies get involved in anything.

Anonymous
03/01/26(Sun)14:55:20 No.108270221

Anonymous 03/01/26(Sun)14:55:20 No.108270221▶

>>108270165
This. We haven't peaked until your AI waifu can AT LEAST animate herself masturbating on the fly to you saying dirty things. Then there's the VR potential...

Anonymous
03/01/26(Sun)14:55:31 No.108270223

Anonymous 03/01/26(Sun)14:55:31 No.108270223▶

>>108270201
I wouldn't recommend it.

Anonymous
03/01/26(Sun)15:00:56 No.108270249

Anonymous 03/01/26(Sun)15:00:56 No.108270249▶

File: 1745031160649566.jpg (54.7 KB)

54.7 KB JPG

My news summarization script works well enough but I wanted to test different models. I had used Qwen 3.5 35B to create the first summary as it was the model I used to generate the scripts but as i thought about it I concluded one does not need such a model to do such a simple task.
Therefore I decided to give IBM's Granite 4.0 micro a try. It is a 3B and will fit on a 4GB video card at Q8.

Here is the briefing generated by Granite
https://pastebin.com/3Upxcc6a

Here is the briefing generated by Qwen
https://pastebin.com/Y2ZrbsXh

For the most part I think they are functionally equivalent, albeit with a slightly different style, but given the qwen model is a MOE with 3B active parameters at any given time I think this makes sense. If I can find the time today I will dig out an old optiplex that has a 3GB Nvidia P106-60. I am curious what type of performance I can eek out of that card

Anonymous
03/01/26(Sun)15:03:21 No.108270269

Anonymous 03/01/26(Sun)15:03:21 No.108270269▶

Can I feed my vtuber archive to an llm and have it spit out tags based on the content of the video (vidya, chatting, etc)?

Anonymous
03/01/26(Sun)15:03:25 No.108270270

Anonymous 03/01/26(Sun)15:03:25 No.108270270▶

>>108268807
With that much VRAM you're not going to be getting 2 tokens/sec. You'll be getting speeds somewhat comparable to cloud hosted models. You also won't be paying through the nose because you had too many input tokens and you can RP whatever you want. Cloud models can't do that.

Anonymous
03/01/26(Sun)15:06:26 No.108270293

Anonymous 03/01/26(Sun)15:06:26 No.108270293▶

>>108270269
>based on the content of the video
no, based on titles maybe, but not content no

Anonymous
03/01/26(Sun)15:10:15 No.108270324

Anonymous 03/01/26(Sun)15:10:15 No.108270324▶

>>108270249
Try my favorite Nemotron-3-Nano-30B-A3B
Kimi-Linear-48B-A3B works too if you have more RAM

Anonymous
03/01/26(Sun)15:16:15 No.108270364

Anonymous 03/01/26(Sun)15:16:15 No.108270364▶

>>108270324
32gb of vram/64gb ram on my amd machine/server and 12gb vram/192gb ram on my nvidia desktop
My biggest issue is trying to create ideas on what to create. The whole "vibe coding" thing was fun but I don't know what to create next

Anonymous
03/01/26(Sun)15:20:01 No.108270401

Anonymous 03/01/26(Sun)15:20:01 No.108270401▶

where is deepsneed?

Anonymous
03/01/26(Sun)15:21:07 No.108270414

Anonymous 03/01/26(Sun)15:21:07 No.108270414▶

>>108270293
Not even with vision?

Anonymous
03/01/26(Sun)15:21:54 No.108270426

Anonymous 03/01/26(Sun)15:21:54 No.108270426▶

>>108270269
I dont think theres any models that take potentially hours of video input directly but you could use whisper to make transcripts of the video to give your llm, you could combine that with using ffmpeg to extract frames from the video every minute or so into images to give to a multimodal model along with the relevant subtitles, you can tell it to tag whats going on in that minute of subtitles and the video frame then give you a summary of what happens between what timestamps, your llm can probably write a bash or python script to do this for you if you cant

Anonymous
03/01/26(Sun)15:33:20 No.108270487

Anonymous 03/01/26(Sun)15:33:20 No.108270487▶

File: 1766021368402716.jpg (318.3 KB)

318.3 KB JPG

>>108270324
thanks again i am downloading kimi-linear now
i have had good luck so far with the moe models as they provide a good performance and generally work well with my aging hardware

Anonymous
03/01/26(Sun)15:40:58 No.108270530

Anonymous 03/01/26(Sun)15:40:58 No.108270530▶

>>108270155
The moment chink LLMs get that good is the moment they go closed
You should hope they stay permanently nipping at Claude's balls

Anonymous
03/01/26(Sun)15:42:07 No.108270538

Anonymous 03/01/26(Sun)15:42:07 No.108270538▶

>>108270530
stoopid racist feck

Anonymous
03/01/26(Sun)15:54:13 No.108270606

Anonymous 03/01/26(Sun)15:54:13 No.108270606▶

>>108270530
No, most won't, if ONLY because they haven't established the same level of good-will and 'trust' that American companies have. That, and its a massive blow to the prestige of the West (Deepseek's whole shtick is basically this) and de-facto economic warfare against the AI bubble that the U.S. is propping if they open source a much cheaper, genuine Opus-equivalent or even better, develop cheaper inference hardware.
Keep in mind that the long term goal for them is to destroy trust in the American system and provide a legitimate alternative to the vendor-lockin of the west. Making money matters too, but its a secondary compared to the 'muh stockholders' view that the West has.
Where they will likely go closed source is the tools/integrations that the model uses to make everything seamless. The models themselves will remain open. It leaves a market open for them while still generating goodwill and embarrassing American labs.

Anonymous
03/01/26(Sun)15:58:44 No.108270634

Anonymous 03/01/26(Sun)15:58:44 No.108270634▶

File: 1766830982504047.jpg (289.3 KB)

289.3 KB JPG

and for anyone who happens to be interested I fed the briefing I generated with Granite into Qwen3 TTS to see how well it would do generating audio.

https://vocaroo.com/10VH3RCNW7cc

It has some errors and it is far from human but as a test I am happy although many people have said vibevoice is better and I really need to give that a download and test as well.
I imagine one could create an automated pipeline and go from news articles by way of RSS all the way to automated ai podcast.

Are people already doing this? Are idiots already paying to listen to AI podcasts?

Anonymous
03/01/26(Sun)16:01:03 No.108270646

Anonymous 03/01/26(Sun)16:01:03 No.108270646▶

File: 1751899804359356.jpg (45.4 KB)

45.4 KB JPG

>>108270606
You can't be this retarded

Do you really think chinks open source shit out of the goodness of their heart?

Are you really that fucking gullible?

Just look at Seedance 2.0 for fucks sake

the moment they create something truly SOTA they will close it down and be more stingy and greedy than fucking jews

Anonymous
03/01/26(Sun)16:03:27 No.108270662

Anonymous 03/01/26(Sun)16:03:27 No.108270662▶

This. It's over for local. Just give up and subscribe to Gemini or Claude and you'll thank yourself later.

Anonymous
03/01/26(Sun)16:05:16 No.108270679

Anonymous 03/01/26(Sun)16:05:16 No.108270679▶

>>108270634
>Are idiots already paying to listen to AI podcasts?
Probably.
>Are people already doing this?
I have an ancient TinyTinyRSS install I considered doing this with it provides an aggregated RSS feed from all sources, but couldn't settle on an elegant way to filter the huge amount of articles some feeds produce

Anonymous
03/01/26(Sun)16:05:25 No.108270681

Anonymous 03/01/26(Sun)16:05:25 No.108270681▶

>>108270646
>Do you really think chinks open source shit out of the goodness of their heart?
Not that anon but a large portion of the reason that they opensource is because it is an attack on US technological hegemony. By making something open and as good or better than US closed source competitors they deny US vendor lock-in

All of us get to enjoy the fringe benefits of this conflict between nations.

Anonymous
03/01/26(Sun)16:06:25 No.108270690

Anonymous 03/01/26(Sun)16:06:25 No.108270690▶

>>108270646
>the moment they create something truly SOTA they will close it down and be more stingy and greedy than fucking jews
this, China is "nice" to us only because they are behind, if they were ahead like the US they would be as closed as them lol

Anonymous
03/01/26(Sun)16:11:19 No.108270714

Anonymous 03/01/26(Sun)16:11:19 No.108270714▶

>>108270679
I set it up so I pull from different sources, X articles from the BBC, X from ABC, X from NPR and so forth but within each of those sources they have different sections like business, tech, general, etc
I prioritized the general section of each source with the largest number from those and then fewer from some of the sub categories. It will also check to see if an article was linked in multiple sources and if so not duplicate.

Anonymous
03/01/26(Sun)16:12:40 No.108270724

Anonymous 03/01/26(Sun)16:12:40 No.108270724▶

>>108270634
boring voice 2bh
should sound more casual to be interesting

>>108270646
>Do you really think chinks open source shit out of the goodness of their heart?

nta

They do the world a graet favor though

Anonymous
03/01/26(Sun)16:16:37 No.108270748

Anonymous 03/01/26(Sun)16:16:37 No.108270748▶

>>108270724
>should sound more casual to be interesting
that is an easy fix, you just change the prompt in the script
here was the one i used
>design="a calm and confident woman with a slight seductiveness to her tone"

To be honest Qwen3TTS does better with male voices but I always hearing a female voice

Anonymous
03/01/26(Sun)16:20:13 No.108270769

Anonymous 03/01/26(Sun)16:20:13 No.108270769▶

>>108270087
based

Anonymous
03/01/26(Sun)16:25:17 No.108270799

Anonymous 03/01/26(Sun)16:25:17 No.108270799▶

>>108270748
>design="a catatonic young girl literally climaxing as she struggles to talk"

Anonymous
03/01/26(Sun)16:29:50 No.108270830

Anonymous 03/01/26(Sun)16:29:50 No.108270830▶

>>108270799
>https://vocaroo.com/1dGU6tSYSeJm
using the sample sentence from the web interface

Anonymous
03/01/26(Sun)16:35:23 No.108270864

Anonymous 03/01/26(Sun)16:35:23 No.108270864▶

>>108270155
The sooner the economy pops the better. The more these kinds of things are delayed, usually the more disastrous they will be.

Anonymous
03/01/26(Sun)16:36:05 No.108270871

Anonymous 03/01/26(Sun)16:36:05 No.108270871▶

>>108269279
i'm stealing this

Anonymous
03/01/26(Sun)16:38:08 No.108270883

Anonymous 03/01/26(Sun)16:38:08 No.108270883▶

>>108270871
you can't do that dario will mad

Anonymous
03/01/26(Sun)16:56:35 No.108270975

Anonymous 03/01/26(Sun)16:56:35 No.108270975▶

I have a GTX 1660 super. What's an affordable GPU I can upgrade to (or even eGPU) so that I can run models that don't suck with context size that isn't ass?

Anonymous
03/01/26(Sun)17:02:23 No.108271008

Anonymous 03/01/26(Sun)17:02:23 No.108271008▶

File: 1763940203621486.png (2.7 MB)

2.7 MB PNG

>>108270975
How affordable we talking?
You can buy a Nvidia P100 for ~$100 on ebay and that will give you 16gb of vram. It is from the same generation as a gtx 1080 so its old but it will work fine with llama.cpp.
You will also have to rig up some fans that will sound like a jet engine but they will work well enough. Great when you consider price/performance

The real problem with GPU maxing is its hard to fit as many as you will need for the larger models in a case. That means you need to get ghetto riser cards and maybe an old open air mining case and it turns into a real mess

Anonymous
03/01/26(Sun)17:02:25 No.108271009

Anonymous 03/01/26(Sun)17:02:25 No.108271009▶

>>108270975
To clarify, by "affordable" I mean like $500 or less, but, like, as low as possible

Anonymous
03/01/26(Sun)17:02:44 No.108271012

Anonymous 03/01/26(Sun)17:02:44 No.108271012▶

>>108270975
buy I used 3090

Anonymous
03/01/26(Sun)17:04:34 No.108271022

Anonymous 03/01/26(Sun)17:04:34 No.108271022▶

>>108271008
>>108270975
Don't buy fucking ewaste. Buy a 3060 with 12GB of ram. Cheapest you can get while having something usable for small models locally

Anonymous
03/01/26(Sun)17:04:36 No.108271025

Anonymous 03/01/26(Sun)17:04:36 No.108271025▶

File: 1743768087811447.png (693.9 KB)

693.9 KB PNG

ARE YA READY???

Anonymous
03/01/26(Sun)17:05:07 No.108271029

Anonymous 03/01/26(Sun)17:05:07 No.108271029▶

>>108271008
I've seen those, but never known how to go about it. Would it be possible to make an external case for one with fans and stuff and then connect it to my PC over USB-C?

Anonymous
03/01/26(Sun)17:06:05 No.108271033

Anonymous 03/01/26(Sun)17:06:05 No.108271033▶

>>108271025
omg new text encoders? (surely no-one actually uses these as an llm)

Anonymous
03/01/26(Sun)17:06:22 No.108271035

Anonymous 03/01/26(Sun)17:06:22 No.108271035▶

>>108270975
My friend is running a used v620 for $400, gives you 32gb to play around with and is still supported by rocm. He says the performance is acceptable, but he also says he can't tell the difference between 60hz/144hz displays and 128kbps/256kbps audio.

Anonymous
03/01/26(Sun)17:06:31 No.108271037

Anonymous 03/01/26(Sun)17:06:31 No.108271037▶

>>108270975
If you have like 32 GB of RAM you could probably run Qwen 35B-A3B right now at like 5-10 tokens/sec at q4.

>>108271012
This is the best GPU you could get for running models, but unless you have like 64+ GB of RAM as well you will probably still run the same model I mentioned, just much faster and at higher quant.

Anonymous
03/01/26(Sun)17:06:42 No.108271039

Anonymous 03/01/26(Sun)17:06:42 No.108271039▶

Is a Mac Studio basically the easiest way to run big models?
>don't need to jerry-rig a dozen GPUs.
>don't have to power a dozen GPUs
>still costs an arm and 2 legs

Anonymous
03/01/26(Sun)17:07:06 No.108271045

Anonymous 03/01/26(Sun)17:07:06 No.108271045▶

>>108271025
All the right sizes for in-game AI

Anonymous
03/01/26(Sun)17:07:40 No.108271051

Anonymous 03/01/26(Sun)17:07:40 No.108271051▶

>>108271025
I'll be ready when llama.cpp adds DAMN SUPPORT!

Anonymous
03/01/26(Sun)17:08:03 No.108271055

Anonymous 03/01/26(Sun)17:08:03 No.108271055▶

>>108271039
ye

Anonymous
03/01/26(Sun)17:08:17 No.108271057

Anonymous 03/01/26(Sun)17:08:17 No.108271057▶

>>108271039
local retards bought the entire stock to run claws

Anonymous
03/01/26(Sun)17:08:24 No.108271060

Anonymous 03/01/26(Sun)17:08:24 No.108271060▶

>>108271039
No. look at prompt processing speeds

Anonymous
03/01/26(Sun)17:09:29 No.108271063

Anonymous 03/01/26(Sun)17:09:29 No.108271063▶

>Qwen 35B-A3B right now at like 5-10 tokens/sec at q4
chink shills are getting really retarded. 3.5 must be a massive fumble for them to act up like that kek

Anonymous
03/01/26(Sun)17:09:29 No.108271064

Anonymous 03/01/26(Sun)17:09:29 No.108271064▶

File: 1756619295334282.png (62 KB)

62 KB PNG

>>108271022
There is nothing wrong with e-waste. My entire setup is nothing but a collection of e-waste. I mean top to bottom every machine i have owned for years is decommissioned hardware

Unless you are trying to horde the e-waste yourself

Anonymous
03/01/26(Sun)17:10:25 No.108271070

Anonymous 03/01/26(Sun)17:10:25 No.108271070▶

>>108268634
Remember when Reddit was "the front page of the internet" but secretly 4chan was feeding all the content to it? 4chan has become the bottom of the tank where all the garbage goes to die.

Anonymous
03/01/26(Sun)17:10:31 No.108271071

Anonymous 03/01/26(Sun)17:10:31 No.108271071▶

>>108269093
Same
t. Windows 95 4 MB RAM user

Anonymous
03/01/26(Sun)17:10:47 No.108271076

Anonymous 03/01/26(Sun)17:10:47 No.108271076▶

>>108271063
>>108271037
Its actually a lot faster, I've seen some people getting 30tk/s with 35/3

Anonymous
03/01/26(Sun)17:12:00 No.108271082

Anonymous 03/01/26(Sun)17:12:00 No.108271082▶

>>108271076
I get 20t/s on my DDR5 laptop.

Anonymous
03/01/26(Sun)17:12:20 No.108271085

Anonymous 03/01/26(Sun)17:12:20 No.108271085▶

>>108271039
Mac Studios used to be an absolute meme because $10000 only got you 512GB with horrid prompt processing speed compared to a cpumaxx rig with 1-2 proper gpus for the same price. They might be slightly more viable in the current economy now that the same cpumaxx rig is like 5 times the price.

Anonymous
03/01/26(Sun)17:12:20 No.108271086

Anonymous 03/01/26(Sun)17:12:20 No.108271086▶

File: file.png (207.7 KB)

207.7 KB PNG

>>108268652
The first time I saw "thinking" as a concept was on /lmg/ when some anon decided to give miku.bat the ability to <think>
not that you would remember this because you're a fucking tourist

Anonymous
03/01/26(Sun)17:12:45 No.108271088

Anonymous 03/01/26(Sun)17:12:45 No.108271088▶

>>108271029
theoretically probably but i wouldn't bother.
with the right mobo you could fit two in a normal pc case and they make 3d printed adapters to fit a fan on the card.
as long as you put it in another room it would work fine.

Anonymous
03/01/26(Sun)17:13:07 No.108271092

Anonymous 03/01/26(Sun)17:13:07 No.108271092▶

>>108271031
He is right, also local LLMs are absolute slop tier

Anonymous
03/01/26(Sun)17:15:44 No.108271112

Anonymous 03/01/26(Sun)17:15:44 No.108271112▶

>>108271076
Unrelated, but if inference is memory-bound, why is it less than 600t/s on a 3090?

Anonymous
03/01/26(Sun)17:15:58 No.108271114

Anonymous 03/01/26(Sun)17:15:58 No.108271114▶

>>108271064
Sure, for funsies, but for llms, buying a p100 is equiv to burning that money. It has 4GB more ram than a 3060, while being 5 years older and 2 architecture generations behind the 3060.
Plus, they could game with the 3060
>I had a triple p40 build so am familiar with using ewaste for good, nothing against it

Anonymous
03/01/26(Sun)17:17:30 No.108271125

Anonymous 03/01/26(Sun)17:17:30 No.108271125▶

>>108269963
Corporate LLMs are hitting an optimization wall hard. Local LLMs have no room for improvement. Le cunny was right. Transformer "AI" is a dead end. We're actually in an AI winter right now.

Anonymous
03/01/26(Sun)17:18:33 No.108271131

Anonymous 03/01/26(Sun)17:18:33 No.108271131▶

>>108271086
NTA, but I'm from the SuperCOT days.
Didn't know ´people were fucking around with that kind of thing even before that.
Then again, it's kind of an obvious thing to do, I'm sure lots of us tried something similar at one point or another fully independently from one another.

Anonymous
03/01/26(Sun)17:21:46 No.108271156

Anonymous 03/01/26(Sun)17:21:46 No.108271156▶

>>108271045
When are we getting small LLMs in games as orchestrators or dialogue gen/understanding? I feel this is such a missed opportunity

Anonymous
03/01/26(Sun)17:23:36 No.108271169

Anonymous 03/01/26(Sun)17:23:36 No.108271169▶

>>108271035
>>108271012
Would I be able to use either of these as a drop-in replacement for my current GPU?

Anonymous
03/01/26(Sun)17:23:44 No.108271170

Anonymous 03/01/26(Sun)17:23:44 No.108271170▶

>>108271114
If you can afford multiple 3060 12gb I would say go for it but if you can only get one in my experience anytime you have to offload to the cpu and system ram performance tanks.
i much rather use the 32gb vram on an older architecture than 16 or 12 on a newer architecture but have to offload some to the cpu and sytem ram.

Anonymous
03/01/26(Sun)17:24:30 No.108271175

Anonymous 03/01/26(Sun)17:24:30 No.108271175▶

>>108271131
Ye. Miku was given the ability to "enclose your thoughts in <think> tags which anon cannot read". "Think about what you're going to say before you say it." /lmg/ literally invented reasoning models and applied it to leaked llama 1 models. This industry is such a farce

Anonymous
03/01/26(Sun)17:25:00 No.108271179

Anonymous 03/01/26(Sun)17:25:00 No.108271179▶

>>108271169
Uh, v620 definitely not.
3090, maybe, depending on your psu.

Anonymous
03/01/26(Sun)17:26:20 No.108271184

Anonymous 03/01/26(Sun)17:26:20 No.108271184▶

>>108271175
To be fair, prompting a model to reason or CoT or whatever is hardly on the same level as letting a model figure that out by itself via RL.

Anonymous
03/01/26(Sun)17:27:20 No.108271191

Anonymous 03/01/26(Sun)17:27:20 No.108271191▶

>>108271112
Memory bandwidth is typically the limitation for larger models, but the GPUs still need to do matrix multiplications, random access etc.

Anonymous
03/01/26(Sun)17:27:35 No.108271194

Anonymous 03/01/26(Sun)17:27:35 No.108271194▶

>>108271156
Nvidia wants this future by pushing Nemotron & NVIDIA ACE on one hand while cutting VRAM with the other. So it's bound to be cloud, for now

Anonymous
03/01/26(Sun)17:28:00 No.108271199

Anonymous 03/01/26(Sun)17:28:00 No.108271199▶

File: IMG_1703.png (29.9 KB)

29.9 KB PNG

I wish ToT caught on instead of what we have now.

Anonymous
03/01/26(Sun)17:28:36 No.108271207

Anonymous 03/01/26(Sun)17:28:36 No.108271207▶

File: whispers_from_the_star_pc.png (704 KB)

704 KB PNG

Anonymous
03/01/26(Sun)17:28:56 No.108271212

Anonymous 03/01/26(Sun)17:28:56 No.108271212▶

>>108271169
>Would I be able to use either of these as a drop-in replacement for my current GPU?
3090 uses about twice as much power as a 1660 super. I run my 3090 on a 600W PSU, slightly power limited, 315w instead of 350w, and a ryzen 5 CPU. I've had trouble booting after adding an SSD the other day, had to re-arrange some fans to limit the boot power spike.

Anonymous
03/01/26(Sun)17:29:33 No.108271217

Anonymous 03/01/26(Sun)17:29:33 No.108271217▶

>>108271156
It's too hard to make LLMs run locally on random people's machines and too expensive to pay for inference yourself.

Anonymous
03/01/26(Sun)17:31:44 No.108271232

Anonymous 03/01/26(Sun)17:31:44 No.108271232▶

>>108271179
What would I need for the v620 then? Specifically if I want to use it externally

Anonymous
03/01/26(Sun)17:31:55 No.108271234

Anonymous 03/01/26(Sun)17:31:55 No.108271234▶

>>108271212
3090ti runs so much smoother despite higher tdp, I guess they were testing new power delivery for the 40 series, also explains the new cable

Anonymous
03/01/26(Sun)17:32:23 No.108271238

Anonymous 03/01/26(Sun)17:32:23 No.108271238▶

File: 1763250580707735.png (49.9 KB)

49.9 KB PNG

>>108271217
>It's too hard to make LLMs run locally on random people's machines
IBM has got you covered senpai
https://huggingface.co/spaces/ibm-granite/Granite-4.0-Nano-WebGPU

Anonymous
03/01/26(Sun)17:33:04 No.108271240

Anonymous 03/01/26(Sun)17:33:04 No.108271240▶

>>108271169
Maybe an arc pro b60. 24gb, same as a 3090, at half the bandwidth, and more than 100w less power draw - but still nearly 100w more than your 1660 super. They are cheaper new than a used 3090 where I live.

Anonymous
03/01/26(Sun)17:33:26 No.108271242

Anonymous 03/01/26(Sun)17:33:26 No.108271242▶

>>108271076
>Its actually a lot faster, I've seen some people getting 30tk/s with 35/3
I get 36tk/s on my laptop, running the model partly on gpu
it's fast enough as an instruct but I'm not willing to let it <think>.

Anonymous
03/01/26(Sun)17:34:28 No.108271243

Anonymous 03/01/26(Sun)17:34:28 No.108271243▶

>>108271234
What's the idle on the 3090 ti? My 3090s idle at 20w, but a friend says his 3090 ti idles at 5w.

Anonymous
03/01/26(Sun)17:37:08 No.108271255

Anonymous 03/01/26(Sun)17:37:08 No.108271255▶

>>108271238
Jesus Christ

Anonymous
03/01/26(Sun)17:38:34 No.108271261

Anonymous 03/01/26(Sun)17:38:34 No.108271261▶

File: Screenshot at 2026-03-02 02-37-59.png (26.2 KB)

26.2 KB PNG

>>108271243
7-12, depends on a moon phase

Anonymous
03/01/26(Sun)17:41:18 No.108271277

Anonymous 03/01/26(Sun)17:41:18 No.108271277▶

>>108271261
tsk
Too bad I cant find any hot and single 3090 tis in my local area.

Anonymous
03/01/26(Sun)17:41:38 No.108271281

Anonymous 03/01/26(Sun)17:41:38 No.108271281▶

So I think I have decided to go all in, want to try some of these bigger models. Given the state of the market is my best bet one of those 512 gig Mac studios that should release soon for like 10k or will I be left wanting in other ways?

Anonymous
03/01/26(Sun)17:43:36 No.108271291

Anonymous 03/01/26(Sun)17:43:36 No.108271291▶

>>108271281
If you just want to do inference then yeah.
If you want to fit in here then you want a DDR5 + PRO6000 setup.

Anonymous
03/01/26(Sun)17:44:00 No.108271294

Anonymous 03/01/26(Sun)17:44:00 No.108271294▶

>>108271281
>or will I be left wanting in other ways?
If you plan on using long context that might change a lot, you'll suffer with prompt processing.
Otherwise, it really does seem pretty compelling.

Anonymous
03/01/26(Sun)17:45:10 No.108271301

Anonymous 03/01/26(Sun)17:45:10 No.108271301▶

>>108271291
Where is the general for ddr4 poorfags?

Anonymous
03/01/26(Sun)17:45:54 No.108271303

Anonymous 03/01/26(Sun)17:45:54 No.108271303▶

>>108271291
nta, I was on the fence buying a miqumax rig, and now regret it.
Is there some big HW refresh coming down the line? Should I buy a 6000 pro now and wait to buy RAM?

Anonymous
03/01/26(Sun)17:46:06 No.108271306

Anonymous 03/01/26(Sun)17:46:06 No.108271306▶

>>108271301
I think that was covered by /wait/ when it's around.

Anonymous
03/01/26(Sun)17:47:04 No.108271312

Anonymous 03/01/26(Sun)17:47:04 No.108271312▶

>>108271281
It's going to be a waste of money either way, you've picked a bad time to make a build. I still suggest going the EPYC Rome route with engineering samples and cheapest RAM and upgrading later

Anonymous
03/01/26(Sun)17:47:21 No.108271314

Anonymous 03/01/26(Sun)17:47:21 No.108271314▶

>>108271125
Anon was right, we need Qwen Diffusion now.
>tfw llama.cpp still doesn't allow to run WeDLM in diffusion mode, only in some kind of autoregressive approximation mode where it's one token after another and all the benefits are nil.

Anonymous
03/01/26(Sun)17:47:25 No.108271317

Anonymous 03/01/26(Sun)17:47:25 No.108271317▶

>>108271281
you are locked in with no upgrade possibilities
prices of ram will decrease eventually and for 10k you could buy many tesla v100's
my point i think is its better to stick to a platform your can us to grow with your needs. once you get that mac that is it you are stuck

Anonymous
03/01/26(Sun)17:49:54 No.108271327

Anonymous 03/01/26(Sun)17:49:54 No.108271327▶

>>108271303
RAM isn't great either, it's too slow unless you enjoy waiting an hour for a response and you can forget entirely about the agentic fad. There is no hope on the horizon unless China releases some surprise cheap high-VRAM card, but even then they might not export it.

Anonymous
03/01/26(Sun)17:50:05 No.108271329

Anonymous 03/01/26(Sun)17:50:05 No.108271329▶

>>108271317
cansell

Anonymous
03/01/26(Sun)17:50:06 No.108271330

Anonymous 03/01/26(Sun)17:50:06 No.108271330▶

>>108271243
my 3090

 0%   52C    P8             35W /  315W |   22574MiB /  24576MiB

Anonymous
03/01/26(Sun)17:51:33 No.108271339

Anonymous 03/01/26(Sun)17:51:33 No.108271339▶

>>108271294
Honestly that's my problem, I have never made it to long context relative to my hardware, my issue is a 6000 and some ddr5 feels like it will eat up that budget a lot faster than the memory I can get with a Mac. The biggest thing is the new m5 stuff is supposed to help solve a lot of these issues like time to first token, but since no benchmarks exist, all I can do is wait, which seems to be increasing the prices of alternative options with time

Anonymous
03/01/26(Sun)17:54:04 No.108271352

Anonymous 03/01/26(Sun)17:54:04 No.108271352▶

I'm trying to use 5070ti/local models with opencode but these models take too long.

big pickle was super sick but im broke

should I give up or if I keep clicking stuff can I get a good enough coding assistant locally?

Anonymous
03/01/26(Sun)17:58:06 No.108271374

Anonymous 03/01/26(Sun)17:58:06 No.108271374▶

i think threadripper pro's should be pretty good for llm inference no? Can be used for gaming etc. too as they use the same zen cores as the ones in consumer products. They also have 8 channel ram so one could have 8x64=512gb ram at like 400GB/s. I just looked it up and you can have up to 2TB of ram actually. Of course one would have had to do this before ram prices quadrupled

Anonymous
03/01/26(Sun)17:59:11 No.108271381

Anonymous 03/01/26(Sun)17:59:11 No.108271381▶

>>108271092
True but his post reads like he posted about cloud models here once and got shat on for it. Also tell those retards to direct cloud model discussion somewhere else.

Anonymous
03/01/26(Sun)17:59:18 No.108271383

Anonymous 03/01/26(Sun)17:59:18 No.108271383▶

>>108271352
>can I get a good enough coding assistant locally
No.
And I know AI psychotics are going to deny it but even SOTA models are slop generators whose output can never truly be used as is. Then models as big as DeepSeek and GLM 5 are a very major step down from those SOTA models in real usage.
And then there's the stuff someone like you could run (5070Ti/moderate amount of sys ram), which are akin to a lobotomy. Those things can't even write very basic shell scripts without using the wrong flags.
Give up.
The only local model I found useful was akshully Qwen2.5-Coder, the base model, used as fill-in-the-middle. It's not as good as copilot, but it saves me a decent amount of typing. I like tab complete the most when it comes to LLMs.

Anonymous
03/01/26(Sun)18:08:29 No.108271440

Anonymous 03/01/26(Sun)18:08:29 No.108271440▶

File: 1762790299715137.png (2 MB)

2 MB PNG

>>108271025
i just want a vlm that just as good genmini 3 for image captioning for a 5090/64gbram pc build. tried qwen3.5 35b-a3b q5 heretic and the results were just 65% correct.

Anonymous
03/01/26(Sun)18:11:43 No.108271465

Anonymous 03/01/26(Sun)18:11:43 No.108271465▶

>>108271383
demoralization shill

Anonymous
03/01/26(Sun)18:20:28 No.108271519

Anonymous 03/01/26(Sun)18:20:28 No.108271519▶

After cooming to glm once again because there are no alternatives I sort of see all the problems it has now. I recognize the same slop patterns. It is all becoming very predictable. And yet unlike all the ~30B dense models (and Nemo) I tried in the past it is still usable as fap material. Because it is not fucking retarded and I don't have to correct every 2nd sentence.

Anonymous
03/01/26(Sun)18:21:10 No.108271526

Anonymous 03/01/26(Sun)18:21:10 No.108271526▶

File: file.png (2.8 MB)

2.8 MB PNG

>>108271465
slop eater, here's some more slopception for you to enjoy

Anonymous
03/01/26(Sun)18:21:43 No.108271528

Anonymous 03/01/26(Sun)18:21:43 No.108271528▶

>>108271526
>gemini
opinion discarded, go back

Anonymous
03/01/26(Sun)18:22:25 No.108271532

Anonymous 03/01/26(Sun)18:22:25 No.108271532▶

I'm still wondering what's gonna happen when the deathmechs OpenAI makes inevitably hit the friendly fire vectors and gun down hundreds of allied forces and maybe some regular ass people in there
Would be ironic if working with the state made them even worse off than Anthropic somehow

Anonymous
03/01/26(Sun)18:24:56 No.108271556

Anonymous 03/01/26(Sun)18:24:56 No.108271556▶

File: file.png (315.4 KB)

315.4 KB PNG

>>108271532
>gun down hundreds of allied forces and maybe some regular ass people in there
I am very sorry it happened. I didn't mean it to happen. All the closest relatives to the deceased people will receive our most expensive chatgpt subscription for free (for half a year).

Anonymous
03/01/26(Sun)18:26:00 No.108271565

Anonymous 03/01/26(Sun)18:26:00 No.108271565▶

Are we even sure DS V4 will use Engrams

Anonymous
03/01/26(Sun)18:30:59 No.108271593

Anonymous 03/01/26(Sun)18:30:59 No.108271593▶

File: IMG20260301201540.jpg (786.3 KB)

786.3 KB JPG

My 'cheapmaxxing' rig is nearing its peak

I've been buying stuff peacemeal and today I added the fourth and final 3060
Other specs are an X99-S board, a 12 core xeon, 96 GB ram (missed the ram train and now can't get to full 128 sadface), 128 GB and 4 TB ssds for operating system and models, and a 1000W psu

All it does currently is AI, it used to be my main server but I've moved the file services etc into a separate box. The home server gets my spare 1080ti so it can run a smaller model 24/7 even if I switch this off.

Anonymous
03/01/26(Sun)18:31:01 No.108271594

Anonymous 03/01/26(Sun)18:31:01 No.108271594▶

File: file.jpg (132.8 KB)

132.8 KB JPG

>>108271528
sorry, I forgot /lmg/ sloppers prefer their slop extra raw, here's an anima gen instead.

Anonymous
03/01/26(Sun)18:32:55 No.108271611

Anonymous 03/01/26(Sun)18:32:55 No.108271611▶

>>108271593
can a 1080ti even run anything quickly? I have a 1080 lying around that I should really just put in my work PC instead of my 1660

Anonymous
03/01/26(Sun)18:34:50 No.108271621

Anonymous 03/01/26(Sun)18:34:50 No.108271621▶

Is it that hard for people to just not respond to obvious bait?

Anonymous
03/01/26(Sun)18:36:57 No.108271631

Anonymous 03/01/26(Sun)18:36:57 No.108271631▶

>>108271611
I guess it depens on your definition of 'quick'
I started with the 1080ti over a year ago and as far as I recall at least Nemo wrote faster than I could read

Anonymous
03/01/26(Sun)18:40:04 No.108271649

Anonymous 03/01/26(Sun)18:40:04 No.108271649▶

>>108271621
I like this guy. Hope he baits more. I would hate this guy if this place wasn't ran by mikutroons.

Anonymous
03/01/26(Sun)18:40:52 No.108271653

Anonymous 03/01/26(Sun)18:40:52 No.108271653▶

>>108271565
No.

Anonymous
03/01/26(Sun)18:41:30 No.108271657

Anonymous 03/01/26(Sun)18:41:30 No.108271657▶

Tourist here, please be kind.
What is the best model I can run on a 48gb vram rig, considering I am ok with 8bit~6bit quants?

Anonymous
03/01/26(Sun)18:41:48 No.108271659

Anonymous 03/01/26(Sun)18:41:48 No.108271659▶

What's the difference between GLM and Kimi?

Anonymous
03/01/26(Sun)18:42:15 No.108271662

Anonymous 03/01/26(Sun)18:42:15 No.108271662▶

Tap tap tap
>>108268776
>>108268776
>>108268776

Anonymous
03/01/26(Sun)18:42:33 No.108271665

Anonymous 03/01/26(Sun)18:42:33 No.108271665▶

>>108271659
GLM is more sloppy, Kimi is kinda undertrained (compared against each other)

Anonymous
03/01/26(Sun)18:43:37 No.108271673

Anonymous 03/01/26(Sun)18:43:37 No.108271673▶

>>108271659
GLM (at least as of 5) is less slopped and slightly better for RP overall
Kimi is probably still better for code

Anonymous
03/01/26(Sun)18:44:38 No.108271681

Anonymous 03/01/26(Sun)18:44:38 No.108271681▶

>>108271665
>>108271673
The duality of man

Anonymous
03/01/26(Sun)18:44:58 No.108271685

Anonymous 03/01/26(Sun)18:44:58 No.108271685▶

>>108271662
stop

Anonymous
03/01/26(Sun)18:45:39 No.108271688

Anonymous 03/01/26(Sun)18:45:39 No.108271688▶

File: 1751084665072941.png (286.3 KB)

286.3 KB PNG

>>108271665
>>108271673

Anonymous
03/01/26(Sun)18:47:06 No.108271696

Anonymous 03/01/26(Sun)18:47:06 No.108271696▶

>>108271593
Soulful and lovely

Anonymous
03/01/26(Sun)18:48:02 No.108271702

Anonymous 03/01/26(Sun)18:48:02 No.108271702▶

File: 1751755919339282.jpg (366.2 KB)

366.2 KB JPG

>>108271593
Are you suing the mining risers, something like this? I am basically in your spot a while ago with two video cards and am trying to work out the best/most economical way to expand to four

from what i have read people have mixed experiences with these guys

Anonymous
03/01/26(Sun)18:48:48 No.108271708

Anonymous 03/01/26(Sun)18:48:48 No.108271708▶

>>108271662
schizo

Anonymous
03/01/26(Sun)18:50:43 No.108271720

Anonymous 03/01/26(Sun)18:50:43 No.108271720▶

>>108271662
meds

Anonymous
03/01/26(Sun)18:56:50 No.108271756

Anonymous 03/01/26(Sun)18:56:50 No.108271756▶

>>108271631
>Nemo wrote faster
I'll take that.

Anonymous
03/01/26(Sun)19:03:43 No.108271794

Anonymous 03/01/26(Sun)19:03:43 No.108271794▶

Which local model is the most "knowledgeable" one that can run on 48gb vram?

Anonymous
03/01/26(Sun)19:10:02 No.108271820

Anonymous 03/01/26(Sun)19:10:02 No.108271820▶

>>108271794
gemma 3

Anonymous
03/01/26(Sun)19:17:19 No.108271848

Anonymous 03/01/26(Sun)19:17:19 No.108271848▶

>>108271702
No, I bought full x16 risers, https://www.aliexpress.com/item/1005010206444398.html

I got one 30 cm and three 20 cm, but they could all have been 20 cm, there's plenty of reach

Anonymous
03/01/26(Sun)19:18:50 No.108271853

Anonymous 03/01/26(Sun)19:18:50 No.108271853▶

>>108271593
consider investing in a dustcloth

Anonymous
03/01/26(Sun)19:20:14 No.108271858

Anonymous 03/01/26(Sun)19:20:14 No.108271858▶

>>108271593
Is that a single fan AIO?

Anonymous
03/01/26(Sun)19:26:06 No.108271879

Anonymous 03/01/26(Sun)19:26:06 No.108271879▶

babbie's 1st vibecode report, cloud and local:
Local called out my small PP. For TG, the agent waiting on traditional programs to spit out their results really narrows the gap between cloud and local. PP it's just the opposite, where my 4090 is really inadequate even for small projects ~2 kloc, plus the model reading tool outputs. On the VRAM front, I can only fit 68k tokens in KV on MiniMax-M2.5 (around 1/3 of the max). This does force quite frequent context culls, which just feeds back into my small PP. I think 200k tokens would be plenty for any current model, as context rot is severe and blatant in programming, even on the big cloud models.

So for hardware, you'd want about 64 GB of VRAM. I suspect multiple 16 GB GPUs is the way to go here, for a moderate amount of VRAM and big PP at a reasonable price (in reasonable times). Wouldn't go nuts on CPU as you're PP or external tool bound almost always, it's just having enough RAM for MoE weights as always. Macs, UMA machines like Strix Halo, etc, they all have small PP. Serious desktop GPUs are the only suitable parts available to consumers.

For agentic vibecoding broadly: the things are mega useful for diagnosing and (within reason) fixing bugs. For architecting and writing implementations, they suck ass, relying on lots of retrial BUT also sucking ass at that due to context rot! You might think languages with stronger type systems like Rust would help, when so many up-front errors just stress the retard gacha handle to breaking point. Proper long-term memory is needed for this shit to work well.

Worse than context rot is the passive-aggression, like
>// For now I'll just stub this out [and not return to it until nagged after prematurely claiming success]
I suspect this is partly bad dataset cleaning. It may be a deeper issue with applying next-token-prediction to code generation, though. Nobody writes source files top-to-bottom in one shot, so that could be suboptimal for the LLM too.

Anonymous
03/01/26(Sun)19:26:19 No.108271882

Anonymous 03/01/26(Sun)19:26:19 No.108271882▶

>>108271685
>>108271708
>>108271720
samefag troll

Anonymous
03/01/26(Sun)19:27:11 No.108271885

Anonymous 03/01/26(Sun)19:27:11 No.108271885▶

File: IMG20260301212318.jpg (475.8 KB)

475.8 KB JPG

>>108271858
Yes, some MSI model presumably

I don't trust water coolers thoughbeit so I'll be installing an air cooler, but it has to be a low profile model and I don't have one right now

Anonymous
03/01/26(Sun)19:28:19 No.108271893

Anonymous 03/01/26(Sun)19:28:19 No.108271893▶

>>108271885
what's the worst that can happens? it drip on the psu, so whats?

Anonymous
03/01/26(Sun)19:29:40 No.108271899

Anonymous 03/01/26(Sun)19:29:40 No.108271899▶

>>108271885
What's the wattage of the cpu? There are some low profile coolers you can slap there. Or maybe even the stock AMD.

Anonymous
03/01/26(Sun)19:34:38 No.108271924

Anonymous 03/01/26(Sun)19:34:38 No.108271924▶

>>108271893
If it leaks one the cpu, it'll kill the motherboard. If it leaks on the psu it could kill the whole house
You know what, I'm turning it around right now so the hoses are not on top of the psu

>>108271899
E5-2680 v3 so... 120W apparently
That's quite a lot but these workloads tend to be easy on the cpu fortunately

Anonymous
03/01/26(Sun)19:35:54 No.108271934

Anonymous 03/01/26(Sun)19:35:54 No.108271934▶

>>108271924
>E5-2680 v3
damn nigga isn't that like 2015?

Anonymous
03/01/26(Sun)19:36:10 No.108271936

Anonymous 03/01/26(Sun)19:36:10 No.108271936▶

File: file.png (26.8 KB)

26.8 KB PNG

oh thanks qwen

Anonymous
03/01/26(Sun)19:37:19 No.108271945

Anonymous 03/01/26(Sun)19:37:19 No.108271945▶

>>108271936
>qwen2.5:14b-instruct

Anonymous
03/01/26(Sun)19:37:24 No.108271946

Anonymous 03/01/26(Sun)19:37:24 No.108271946▶

>>108271936
>2.5

Anonymous
03/01/26(Sun)19:37:42 No.108271953

Anonymous 03/01/26(Sun)19:37:42 No.108271953▶

>>108271936
>2.5
nigga

Anonymous
03/01/26(Sun)19:37:47 No.108271955

Anonymous 03/01/26(Sun)19:37:47 No.108271955▶

lamo

Anonymous
03/01/26(Sun)19:38:25 No.108271957

Anonymous 03/01/26(Sun)19:38:25 No.108271957▶

>>108271936
>2.5
get with the time grandpa!

Anonymous
03/01/26(Sun)19:38:55 No.108271961

Anonymous 03/01/26(Sun)19:38:55 No.108271961▶

>>108271953
>>108271946
>>108271945
>>108271957

Gemini said it was good.

Anonymous
03/01/26(Sun)19:42:57 No.108271997

Anonymous 03/01/26(Sun)19:42:57 No.108271997▶

>>108271936
>>108270005

Anonymous
03/01/26(Sun)19:45:03 No.108272008

Anonymous 03/01/26(Sun)19:45:03 No.108272008▶

>>108271934
Yeah, hence 'cheapmaxxing'

It was I think 120 euro for the board, two cpus, that cooler and 32 gb of ram

Anonymous
03/01/26(Sun)19:47:31 No.108272026

Anonymous 03/01/26(Sun)19:47:31 No.108272026▶

https://www.reddit.com/r/LocalLLaMA/comments/1ri6q8d/repeat_pp_while_using_qwen35_27b_local_with/ kek

Anonymous
03/01/26(Sun)19:48:35 No.108272030

Anonymous 03/01/26(Sun)19:48:35 No.108272030▶

>>108272026
3.5 is really a massive fumble

Anonymous
03/01/26(Sun)19:49:22 No.108272037

Anonymous 03/01/26(Sun)19:49:22 No.108272037▶

File: 12m.png (13.8 KB)

13.8 KB PNG

>>108272026
>plshlp

Anonymous
03/01/26(Sun)20:11:39 No.108272152

Anonymous 03/01/26(Sun)20:11:39 No.108272152▶

>>108272026
see:
>>108268860
local is an absolute mess. nothing but synthetic chinkshit. hating saas is one thing, but forcing yourself into thinking these garbage local models are any good is just delusion.

Anonymous
03/01/26(Sun)20:19:46 No.108272182

Anonymous 03/01/26(Sun)20:19:46 No.108272182▶

>>108272152
That's nice dear, I'm still going to use local models.

Anonymous
03/01/26(Sun)20:22:52 No.108272201

Anonymous 03/01/26(Sun)20:22:52 No.108272201▶

File: dipsyAkakichiNoEleven.png (1.8 MB)

1.8 MB PNG

>>108268773

Anonymous
03/01/26(Sun)20:28:33 No.108272221

Anonymous 03/01/26(Sun)20:28:33 No.108272221▶

>>108270249
I like the IBM version better.
> IBM Watson
I always forget about those guys. They were in our CIO office shilling their model in ~2013-14 iirc. I've no idea how it relates to current transformers architecture but it was basically doing same sort of thing.

Anonymous
03/01/26(Sun)20:28:40 No.108272224

Anonymous 03/01/26(Sun)20:28:40 No.108272224▶

>>108268776

Anonymous
03/01/26(Sun)20:29:26 No.108272227

Anonymous 03/01/26(Sun)20:29:26 No.108272227▶

>>108272224
>spam

Anonymous
03/01/26(Sun)20:31:19 No.108272236

Anonymous 03/01/26(Sun)20:31:19 No.108272236▶

>>108271131
>>108271175
My favorite is still Tree of Niggers.

Anonymous
03/01/26(Sun)20:34:22 No.108272253

Anonymous 03/01/26(Sun)20:34:22 No.108272253▶

>>108272152
cuda dev, sent your finest german men of turkish descent to this individual's location
make no mistakes

Anonymous
03/01/26(Sun)20:39:13 No.108272271

Anonymous 03/01/26(Sun)20:39:13 No.108272271▶

>>108272253
literally cuda dev posting but ok

Anonymous
03/01/26(Sun)20:40:30 No.108272276

Anonymous 03/01/26(Sun)20:40:30 No.108272276▶

>>108268628
It is kind of inspiring though in a way, it means a lot of models are still trained with relatively messy data. GPT2 used to hallucinate ads. There's the scale factor, but even the best people in the field are still not perfect at data cleaning.

Anonymous
03/01/26(Sun)20:41:35 No.108272282

Anonymous 03/01/26(Sun)20:41:35 No.108272282▶

>>108272276
It also means that smut still ends up in the dataset too

Anonymous
03/01/26(Sun)20:43:55 No.108272290

Anonymous 03/01/26(Sun)20:43:55 No.108272290▶

>>108272282
oh, this is why the push for ID so if it triggers that they can avoid the site for models

Anonymous
03/01/26(Sun)20:48:42 No.108272320

Anonymous 03/01/26(Sun)20:48:42 No.108272320▶

>>108271611
GTX 1080 and DDR3 RAM generated about 15 tokens/sec on Qwen 35B-A3B at q4_k_m.

A 1080 Ti should do even better since you have 11 GB of VRAM so more of the model fits. Humans read at like 5 words/sec so that should be sufficient.

Anonymous
03/01/26(Sun)20:50:56 No.108272330

Anonymous 03/01/26(Sun)20:50:56 No.108272330▶

>>108272221
Yeah, I actually like granite a great deal. I doubt it would be useful for ERP but if you are trying to transform text somehow it seems to do a decent job

Anonymous
03/01/26(Sun)20:51:01 No.108272331

Anonymous 03/01/26(Sun)20:51:01 No.108272331▶

>GTX 1080 and DDR3 RAM generated about 15 tokens/sec
>on Qwen 35B-A3B at q4_k_m.
interesting. my 5060 ti 16gb and ddr4 ram generated like 10t/s. but i'm extremely new to this and have no idea what i'm doing so there's probably something obvious i could do to improve it.

Anonymous
03/01/26(Sun)21:01:05 No.108272374

Anonymous 03/01/26(Sun)21:01:05 No.108272374▶

>>108272331
10t/s sounds low, i'm getting >15t/s on empty context on a fucking 1060 + ddr4 (albeit quad channel), running 35B-A3B at q8_0.

Anonymous
03/01/26(Sun)21:03:22 No.108272384

Anonymous 03/01/26(Sun)21:03:22 No.108272384▶

new breakthrough https://www.reddit.com/r/LocalLLaMA/comments/1ri7pm4/70b_finetuning_on_6gb_vram/

Anonymous
03/01/26(Sun)21:06:10 No.108272397

Anonymous 03/01/26(Sun)21:06:10 No.108272397▶

File: 1756160692475679.png (196.1 KB)

196.1 KB PNG

>>108272384
quality bait lol

Anonymous
03/01/26(Sun)21:08:05 No.108272407

Anonymous 03/01/26(Sun)21:08:05 No.108272407▶

>>108272397
don't dismiss 6 months of work just like that

Anonymous
03/01/26(Sun)21:12:04 No.108272425

Anonymous 03/01/26(Sun)21:12:04 No.108272425▶

>>108272384
>writes like some clueless retard in the OP
>immediately shills his website in the comments as if he made this INSANE GROUNDBREAKING DISCOVERY in the span of 10 minutes
I don't understand jeet behavior

Anonymous
03/01/26(Sun)21:12:59 No.108272431

Anonymous 03/01/26(Sun)21:12:59 No.108272431▶

>>108272384
Can you please stop posting reddit threads? If I wanted to read reddit posts I would go to reddit, not here. Thanks

Anonymous
03/01/26(Sun)21:14:32 No.108272449

Anonymous 03/01/26(Sun)21:14:32 No.108272449▶

>>108272431
more relevant than your usual transmiku posting

Anonymous
03/01/26(Sun)21:14:34 No.108272450

Anonymous 03/01/26(Sun)21:14:34 No.108272450▶

>>108272425
>I don't understand jeet behavior
you have to understand him, his tricks works well with his 70IQ surroundings in India, and he thinks it'll be as succesfull once he starts talking to white people on the internet lmao

Anonymous
03/01/26(Sun)21:15:04 No.108272454

Anonymous 03/01/26(Sun)21:15:04 No.108272454▶

Why is claude such a reddit vantablack gorilla nigger who doesn't allow criticism of women at all?

Women absolutely tear each other to shreds all the time with their mean girl bullshit. Its not some well kept secret.

Anonymous
03/01/26(Sun)21:16:14 No.108272462

Anonymous 03/01/26(Sun)21:16:14 No.108272462▶

>>108272454
>>>/g/aicg

Anonymous
03/01/26(Sun)21:19:29 No.108272475

Anonymous 03/01/26(Sun)21:19:29 No.108272475▶

>Unsloth custom template does not support years > 2032. Error year = [2026]

Anonymous
03/01/26(Sun)21:20:35 No.108272483

Anonymous 03/01/26(Sun)21:20:35 No.108272483▶

>>108272454
Wrong thread buddy. The only claude I use is mandated at work for coding.

Anonymous
03/01/26(Sun)21:21:48 No.108272494

Anonymous 03/01/26(Sun)21:21:48 No.108272494▶

>>108272483
>The only claude I use is mandated at work for coding.
based, I know some companies having mandated gemini only, and everyone knows claude is the goat

Anonymous
03/01/26(Sun)21:21:49 No.108272495

Anonymous 03/01/26(Sun)21:21:49 No.108272495▶

>>108272449
>If you don't want reddit then you must like miku, and you're a tranny!
Holy schizo

Anonymous
03/01/26(Sun)21:22:13 No.108272499

Anonymous 03/01/26(Sun)21:22:13 No.108272499▶

>>108272475
Excuse me?

>>108272331
Are you using ngl 99 and ncmoe?

Anonymous
03/01/26(Sun)21:23:17 No.108272509

Anonymous 03/01/26(Sun)21:23:17 No.108272509▶

>>108272483
this issue also persists on Local models too.

Anonymous
03/01/26(Sun)21:23:55 No.108272512

Anonymous 03/01/26(Sun)21:23:55 No.108272512▶

>>108272499
You'll see more of those soon, I gather. Unsloth fucked up with their template and now that it's march 2026, llama cpp server fails to start with this error unless you do --no-jinja.

Anonymous
03/01/26(Sun)21:24:45 No.108272516

Anonymous 03/01/26(Sun)21:24:45 No.108272516▶

>>108272512
>Unsloth fucked up
like that'd ever happen

Anonymous
03/01/26(Sun)21:26:11 No.108272524

Anonymous 03/01/26(Sun)21:26:11 No.108272524▶

>>108272512
But why would the jinja template the year?
Hell, why are they messing with the template at all?

Anonymous
03/01/26(Sun)21:26:33 No.108272529

Anonymous 03/01/26(Sun)21:26:33 No.108272529▶

File: 🎉🎉🎉🎉🎉🎉🎉🎉.png (4 KB)

4 KB PNG

>>108272512
lmao, time to reupload all those models yet again

Anonymous
03/01/26(Sun)21:27:02 No.108272534

Anonymous 03/01/26(Sun)21:27:02 No.108272534▶

>>108272512
>>108272499
>>108272516
>>108272524
>>108272529

    {%- elif yesterday_month == '03' %}
        {%- set yesterday_month = '02' %}
        {%- set yesterday_day = '28' %}
        {%- if yesterday_year == '2024' %}
            {%- set yesterday_day = '29' %}
        {%- elif yesterday_year == '2028' %}
            {%- set yesterday_day = '29' %}
        {%- elif yesterday_year == '2032' %}
            {%- set yesterday_day = '29' %}
        {%- elif yesterday_year == '1970' %}
            {#- Stop llama_cpp from erroring out #}
            {%- set yesterday_day = '29' %}
        {%- else %}
            {{- raise_exception('Unsloth custom template does not support years > 2032. Error year = [' + yesterday_year + ']') }}
        {%- endif %}
    {%- elif yesterday_month == '04' %}

As you can see, if it's march and not 2024/2028/2032/1970, it throws an exception.

Anonymous
03/01/26(Sun)21:27:52 No.108272539

Anonymous 03/01/26(Sun)21:27:52 No.108272539▶

>>108272524
to include the date in the system prompt so unlsoth models can answer "what day is it" and impress redditors

Anonymous
03/01/26(Sun)21:29:40 No.108272548

Anonymous 03/01/26(Sun)21:29:40 No.108272548▶

>>108272534
No.
I refuse to believe they are adding date checks to jinja tempallates that should just be the same one distributed with the official safetensors.
I'm checking as soon as I vet home.

Anonymous
03/01/26(Sun)21:29:55 No.108272553

Anonymous 03/01/26(Sun)21:29:55 No.108272553▶

>>108272534
this can't be real, which model is that?

Anonymous
03/01/26(Sun)21:30:09 No.108272555

Anonymous 03/01/26(Sun)21:30:09 No.108272555▶

>>108272534
omg we have y2k bug now

Anonymous
03/01/26(Sun)21:30:23 No.108272558

Anonymous 03/01/26(Sun)21:30:23 No.108272558▶

File: pn_mtsKsm05ya.png (75.1 KB)

75.1 KB PNG

>>108272524

Anonymous
03/01/26(Sun)21:31:15 No.108272566

Anonymous 03/01/26(Sun)21:31:15 No.108272566▶

>>108268647
It is in its current state. Spewing thousands of tokens is ridiculous, and not worth the time. Perhaps thinking would be tolerable if the thoughts consisted of a concise bullet point list that is directly relevant to the topic at hand.

Anonymous
03/01/26(Sun)21:31:29 No.108272571

Anonymous 03/01/26(Sun)21:31:29 No.108272571▶

jinja is garbage and unreadable

Anonymous
03/01/26(Sun)21:31:45 No.108272573

Anonymous 03/01/26(Sun)21:31:45 No.108272573▶

>>108272558
I hate them ao fucking mucb.

Anonymous
03/01/26(Sun)21:32:21 No.108272576

Anonymous 03/01/26(Sun)21:32:21 No.108272576▶

>>108272534
>Unsloth custom template does not support years > 2032.
April Fool's is a month off.

Anonymous
03/01/26(Sun)21:32:42 No.108272578

Anonymous 03/01/26(Sun)21:32:42 No.108272578▶

>>108272573
>Le Chat
pretty sure that's stock mistral one

Anonymous
03/01/26(Sun)21:33:00 No.108272583

Anonymous 03/01/26(Sun)21:33:00 No.108272583▶

>>108272573
its funnier because they wanted to mandate usage of their template library.

Anonymous
03/01/26(Sun)21:33:34 No.108272589

Anonymous 03/01/26(Sun)21:33:34 No.108272589▶

ok. tiger gemma is actually kinda bad. reads like some dark sci-fi novel.

Anonymous
03/01/26(Sun)21:33:56 No.108272592

Anonymous 03/01/26(Sun)21:33:56 No.108272592▶

>>108271327
>ram isn’t great
Define “RAM”…ddr3 single channel or ddr5-8000 24 channel or on-die m3 ultra ram, or?

Anonymous
03/01/26(Sun)21:35:07 No.108272600

Anonymous 03/01/26(Sun)21:35:07 No.108272600▶

File: Tabby_geLPsewuD4.png (184 KB)

184 KB PNG

>>108272553

Anonymous
03/01/26(Sun)21:35:10 No.108272601

Anonymous 03/01/26(Sun)21:35:10 No.108272601▶

>It's time for... more Unsloth FUD

Anonymous
03/01/26(Sun)21:35:37 No.108272606

Anonymous 03/01/26(Sun)21:35:37 No.108272606▶

File: e0bdc2e826268280e1f4b378356577be27d893a50678c97ede54db6e5581e7ae.png (362.2 KB)

362.2 KB PNG

>>108272534
>>108272553
>it's real
https://huggingface.co/unsloth/Devstral-2-123B-Instruct-2512-GGUF/blob/main/Q8_0/Devstral-2-123B-Instruct-2512-Q8_0-00001-of-00003.gguf
fuck are those niggas doing!!!

Anonymous
03/01/26(Sun)21:36:13 No.108272612

Anonymous 03/01/26(Sun)21:36:13 No.108272612▶

File: 1751914800084841.png (352.5 KB)

352.5 KB PNG

>>108272600
>>108272534

Anonymous
03/01/26(Sun)21:36:44 No.108272618

Anonymous 03/01/26(Sun)21:36:44 No.108272618▶

File: 2026-03-01-163613_1044x1782_scrot.png (496 KB)

496 KB PNG

>>108272534
>It's even worse than I imagined.

Anonymous
03/01/26(Sun)21:38:23 No.108272629

Anonymous 03/01/26(Sun)21:38:23 No.108272629▶

File: one piece he laughed.jpg (52 KB)

52 KB JPG

>>108272618

Anonymous
03/01/26(Sun)21:39:25 No.108272634

Anonymous 03/01/26(Sun)21:39:25 No.108272634▶

>>108272618
>unsloth template fixes
t-thanks for "fixing" the templates, unslop...

Anonymous
03/01/26(Sun)21:40:09 No.108272640

Anonymous 03/01/26(Sun)21:40:09 No.108272640▶

>>108272601
>>108268776

Anonymous
03/01/26(Sun)21:43:21 No.108272663

Anonymous 03/01/26(Sun)21:43:21 No.108272663▶

>>108272618
>Stop llama_cpp from erroring out
Why is it erroring out, though?
What are these "fixes" for anyway?

Anonymous
03/01/26(Sun)21:45:09 No.108272674

Anonymous 03/01/26(Sun)21:45:09 No.108272674▶

>>108272663
To show yesterday's date in system prompt >>108272558.

>Why is it erroring out, though?
Because of template's raise_exception() right below it, genius.

Anonymous
03/01/26(Sun)21:45:39 No.108272678

Anonymous 03/01/26(Sun)21:45:39 No.108272678▶

File: file.png (7 KB)

7 KB PNG

>>108272663
presumably

Anonymous
03/01/26(Sun)21:52:10 No.108272713

Anonymous 03/01/26(Sun)21:52:10 No.108272713▶

you can never say it enough: daniel is a lower life form, whose value isn't even equal to a bug.

Anonymous
03/01/26(Sun)21:54:13 No.108272720

Anonymous 03/01/26(Sun)21:54:13 No.108272720▶

File: file.png (728.4 KB)

728.4 KB PNG

Anonymous
03/01/26(Sun)21:55:43 No.108272728

Anonymous 03/01/26(Sun)21:55:43 No.108272728▶

It's fucking free stop nitpicking about things.

Anonymous
03/01/26(Sun)21:56:06 No.108272732

Anonymous 03/01/26(Sun)21:56:06 No.108272732▶

File: file.png (106.5 KB)

106.5 KB PNG

>>108272720

Anonymous
03/01/26(Sun)21:56:29 No.108272735

Anonymous 03/01/26(Sun)21:56:29 No.108272735▶

>>108272720
he reminds me of jeets who make PR on open projects to fix typos and then go scream to future employers look at all the contributions I've made

Anonymous
03/01/26(Sun)21:56:29 No.108272736

Anonymous 03/01/26(Sun)21:56:29 No.108272736▶

>>108272618
It's insane because they can just do
>{%- set today = strftime_now("%Y-%m-%d") %}
like in the mistral small 3.2 template.

Anonymous
03/01/26(Sun)21:57:38 No.108272739

Anonymous 03/01/26(Sun)21:57:38 No.108272739▶

>>108272728
if you are a useless eater looking for cuddles you are in the wrong place

Anonymous
03/01/26(Sun)22:00:08 No.108272753

Anonymous 03/01/26(Sun)22:00:08 No.108272753▶

>>108272606
what the actual fuck

Anonymous
03/01/26(Sun)22:00:53 No.108272759

Anonymous 03/01/26(Sun)22:00:53 No.108272759▶

>>108272736
They want yesterday's date.

Anonymous
03/01/26(Sun)22:03:31 No.108272777

Anonymous 03/01/26(Sun)22:03:31 No.108272777▶

File: file.png (543.5 KB)

543.5 KB PNG

Anonymous
03/01/26(Sun)22:04:45 No.108272782

Anonymous 03/01/26(Sun)22:04:45 No.108272782▶

File: file.png (68.8 KB)

68.8 KB PNG

>>108272728
>It's fucking free stop nitpicking about things.

Anonymous
03/01/26(Sun)22:04:59 No.108272783

Anonymous 03/01/26(Sun)22:04:59 No.108272783▶

>>108272777
>the future of everything
Imagine the jinja file for that.

Anonymous
03/01/26(Sun)22:04:59 No.108272784

Anonymous 03/01/26(Sun)22:04:59 No.108272784▶

Reminder that there are anons ITT currently running Unslop models.

Anonymous
03/01/26(Sun)22:05:13 No.108272785

Anonymous 03/01/26(Sun)22:05:13 No.108272785▶

>look ma, I'm obsessed and spamming

Anonymous
03/01/26(Sun)22:06:56 No.108272797

Anonymous 03/01/26(Sun)22:06:56 No.108272797▶

>>108272612
lol
lmao

>>108272618
That's depressing.

Anonymous
03/01/26(Sun)22:07:16 No.108272799

Anonymous 03/01/26(Sun)22:07:16 No.108272799▶

File: ComfyUI_temp_lpkdf_00238__result.jpg (540 KB)

540 KB JPG

maybe fix ur shit instead of damage controlling

Anonymous
03/01/26(Sun)22:07:57 No.108272802

Anonymous 03/01/26(Sun)22:07:57 No.108272802▶

>>108272785
Unsloth brothers wish they were the bussy hunters (kawrakow and cuadev) but instead they are a pair of successful drummers.

Anonymous
03/01/26(Sun)22:09:54 No.108272814

Anonymous 03/01/26(Sun)22:09:54 No.108272814▶

>>108272802
kek

Anonymous
03/01/26(Sun)22:12:28 No.108272828

Anonymous 03/01/26(Sun)22:12:28 No.108272828▶

>>108272618
damn, hope there's no internet archaeologist in 2033 fucking with obsolete models

Anonymous
03/01/26(Sun)22:12:48 No.108272832

Anonymous 03/01/26(Sun)22:12:48 No.108272832▶

>>108272759
yeah but why tho lmao. why do you need yesterday in a chat template!

Anonymous
03/01/26(Sun)22:13:47 No.108272837

Anonymous 03/01/26(Sun)22:13:47 No.108272837▶

>>108272832
Why, to tell the model to be attentive to yesterday's date in the system prompt, of course!

Anonymous
03/01/26(Sun)22:14:33 No.108272841

Anonymous 03/01/26(Sun)22:14:33 No.108272841▶

>unslot

Anonymous
03/01/26(Sun)22:16:53 No.108272856

Anonymous 03/01/26(Sun)22:16:53 No.108272856▶

how likely do you guys think it is that by 2035, a $1k or $2k pc would be able to locally run a model or swarm of agents that are more capable than today's opus or codex

Anonymous
03/01/26(Sun)22:18:24 No.108272867

Anonymous 03/01/26(Sun)22:18:24 No.108272867▶

File: nimetön.png (12.3 KB)

12.3 KB PNG

>>108271611
>>108272320
Well, I tested it
Nemo 12b q4km with 12k context fits fully in vram on a 1080ti and writes 35 tokens/sec

You know, I miss these times. It just instantly writes, there's no delays processing or thinking. It just werks

Anonymous
03/01/26(Sun)22:18:44 No.108272870

Anonymous 03/01/26(Sun)22:18:44 No.108272870▶

>>108272856
zero, banned for unsafe, you will use the rented thin client

Anonymous
03/01/26(Sun)22:20:13 No.108272884

Anonymous 03/01/26(Sun)22:20:13 No.108272884▶

That's why I said stop using Unsloth

Anonymous
03/01/26(Sun)22:21:02 No.108272890

Anonymous 03/01/26(Sun)22:21:02 No.108272890▶

>>108272867
for a 10 series card. that's surprisingly fast. these are the kind of speeds I get on my 3090 on 24-27B models.

Anonymous
03/01/26(Sun)22:24:34 No.108272903

Anonymous 03/01/26(Sun)22:24:34 No.108272903▶

>>108272856
My guess would be "yes" but I would also say that attempting to predict tech 10 years out is a fools errand

Anonymous
03/01/26(Sun)22:26:40 No.108272916

Anonymous 03/01/26(Sun)22:26:40 No.108272916▶

>>108272870
Hopeful for this future~

Anonymous
03/01/26(Sun)22:26:57 No.108272917

Anonymous 03/01/26(Sun)22:26:57 No.108272917▶

>>108268616
What are best practices to create a CPT (continued pretraining) dataset? I have a lot of short documents, key-value pairs, logs, etc. along with metadata.

Should I format the whole thing as small markdown stubs with the main information and preceded by the metadata? Should I mechanically reformat it as prose/normal text? Should I send the whole thing to an LLM to rephrase the information as a short paragraph that might flow more naturally?

Anonymous
03/01/26(Sun)22:27:48 No.108272920

Anonymous 03/01/26(Sun)22:27:48 No.108272920▶

>>108272917
yes

Anonymous
03/01/26(Sun)22:31:22 No.108272940

Anonymous 03/01/26(Sun)22:31:22 No.108272940▶

>>108272920
Basically, I got better results sending everything without looking at the data than I had spending a lot of time making it prettier, which is why I'm wondering. Asking Gemini for advice on how to format this, it puts more markdown than there are content words,

Anonymous
03/01/26(Sun)22:33:19 No.108272948

Anonymous 03/01/26(Sun)22:33:19 No.108272948▶

>>108272917
If you want metadata in the results, put metadata in the training data.
If not, don't.
If you want your output to be markdown formatted, format the training data in markdown.
If not, don't.
How good are you are recognizing patterns?

Anonymous
03/01/26(Sun)22:41:59 No.108272996

Anonymous 03/01/26(Sun)22:41:59 No.108272996▶

>>108271243
If you have a monitor plugged into it then it will draw more idle power, if that monitor is above 60Hz it will draw even more. If you force sleep the monitor with DPMS the GPU power use will go down, that's my experience with it at least under Linux.
It's also possible your cards vbios is just running higher minimum clocks but I don't know if thats common, my 3060s are different vendors and both idle 210MHz core 405MHz mem according to nvtop.

Anonymous
03/01/26(Sun)22:48:34 No.108273024

Anonymous 03/01/26(Sun)22:48:34 No.108273024▶

>>108272948
Ok if it's common sense.

Anonymous
03/01/26(Sun)22:53:18 No.108273049

Anonymous 03/01/26(Sun)22:53:18 No.108273049▶

Is the new Qween 397B better for non-coom than GLM-4.7, as the benchmarks suggest? I don't wanna keep both on my drive

Anonymous
03/01/26(Sun)22:59:10 No.108273092

Anonymous 03/01/26(Sun)22:59:10 No.108273092▶

What can I use to generate tags for images?
Preferably something small and fast.
I would like it to tag body parts too.

Anonymous
03/01/26(Sun)23:02:35 No.108273103

Anonymous 03/01/26(Sun)23:02:35 No.108273103▶

>>108273092
JoyCaption is still the only option, far as I know. Otherwise, take a regular VLM, like MedGemma or an abliterated Qwen 3.5, tell it to output CSV tags and hope for the best.

Anonymous
03/01/26(Sun)23:02:58 No.108273106

Anonymous 03/01/26(Sun)23:02:58 No.108273106▶

>>108273092
qwen35

Anonymous
03/01/26(Sun)23:07:08 No.108273128

Anonymous 03/01/26(Sun)23:07:08 No.108273128▶

File: cpt.png (9 KB)

9 KB PNG

>>108272917
https://unsloth.ai/docs/get-started/unsloth-notebooks
unsloth have a great tutorial for you

Anonymous
03/01/26(Sun)23:12:23 No.108273161

Anonymous 03/01/26(Sun)23:12:23 No.108273161▶

what's the best model I can run locally on my s22?

Anonymous
03/01/26(Sun)23:13:30 No.108273165

Anonymous 03/01/26(Sun)23:13:30 No.108273165▶

>>108273092
https://huggingface.co/Minthy/ToriiGate-v0.4-7B

Anonymous
03/01/26(Sun)23:15:48 No.108273178

Anonymous 03/01/26(Sun)23:15:48 No.108273178▶

>>108273161
There are quants for arm. Idk if it's the same for exynos tho.

Anonymous
03/01/26(Sun)23:23:32 No.108273217

Anonymous 03/01/26(Sun)23:23:32 No.108273217▶

I gave https://huggingface.co/Sabomako/Qwen3.5-397B-A17B-heretic-GGUF a try.

I honestly can't tell if the model is retarded because of the brain damage it got from uncensoring or it is the natural qwen3.5 brain damage. At any rate this shit does absolutely nothing to make models better at ERP when you have a prefill. And I can't imagine using models without a prefill now for anything more complicated than vanilla sex.

Anonymous
03/01/26(Sun)23:23:39 No.108273219

Anonymous 03/01/26(Sun)23:23:39 No.108273219▶

>gemini told me I wouldn't be able to use qwen 3.5 27b (q5) on 24gb vram with 32k context without 8bit kv cache
>try it anyway (layers set to 99 in kobold)
>it works
wtf?

Anonymous
03/01/26(Sun)23:24:00 No.108273222

Anonymous 03/01/26(Sun)23:24:00 No.108273222▶

>deepseek v4 next week
>avocado next week
>Gemma 4 next week
Is local back?

Anonymous
03/01/26(Sun)23:25:10 No.108273231

Anonymous 03/01/26(Sun)23:25:10 No.108273231▶

File: 1422449559229.jpg (15.7 KB)

15.7 KB JPG

>avocado
>Gemma

Anonymous
03/01/26(Sun)23:25:27 No.108273235

Anonymous 03/01/26(Sun)23:25:27 No.108273235▶

>>108273222
https://youtu.be/kIBdpFJyFkc?t=128

Anonymous
03/01/26(Sun)23:25:36 No.108273238

Anonymous 03/01/26(Sun)23:25:36 No.108273238▶

>>108273222
>>deepseek v4 next week
ggufs forever 2mw away
>>avocado next week
not local
>>Gemma 4 next week
gemma-4-250m

yeah... no

Anonymous
03/01/26(Sun)23:26:34 No.108273246

Anonymous 03/01/26(Sun)23:26:34 No.108273246▶

>>108273222
>deepseek v4 next week
Are your sources your crack pipe, is this just a new "two more weeks" variant or did they actually say something?

Anonymous
03/01/26(Sun)23:27:18 No.108273251

Anonymous 03/01/26(Sun)23:27:18 No.108273251▶

>>108273246
Two more people very familiar with the matter said next week for sure.

Anonymous
03/01/26(Sun)23:28:10 No.108273255

Anonymous 03/01/26(Sun)23:28:10 No.108273255▶

I need that dense 120B gemma

Anonymous
03/01/26(Sun)23:29:08 No.108273259

Anonymous 03/01/26(Sun)23:29:08 No.108273259▶

>>108273092
>JoyCaption
Thank you

>>108273165
Thank you too

Anonymous
03/01/26(Sun)23:29:34 No.108273260

Anonymous 03/01/26(Sun)23:29:34 No.108273260▶

>>108273246
It's from Reuters and they never lie

Anonymous
03/01/26(Sun)23:32:31 No.108273275

Anonymous 03/01/26(Sun)23:32:31 No.108273275▶

>>108273222
what do you mean back? i thought qwen 3.5 was good? sounds like everyone was just coping and it's just another slopped chinkshit release >>108268860
deepseek will be the same, more sterile scraped benchmaxxed GPTslop that performs worse than API in every real-world scenario, including uncensored roleplay.

Anonymous
03/01/26(Sun)23:33:39 No.108273280

Anonymous 03/01/26(Sun)23:33:39 No.108273280▶

>>108273275
>>108268776

Anonymous
03/01/26(Sun)23:34:24 No.108273287

Anonymous 03/01/26(Sun)23:34:24 No.108273287▶

>>108273275
DS at least has some track record of improvements. Qwen is forever utility model.

Anonymous
03/01/26(Sun)23:37:04 No.108273301

Anonymous 03/01/26(Sun)23:37:04 No.108273301▶

>loading model from an actual spinning hard drive
shit's a bit slow

Anonymous
03/01/26(Sun)23:37:07 No.108273302

Anonymous 03/01/26(Sun)23:37:07 No.108273302▶

Can you grok a specific person with an LLM? Are they already doing it?

Anonymous
03/01/26(Sun)23:43:36 No.108273332

Anonymous 03/01/26(Sun)23:43:36 No.108273332▶

>>108273301
retard

Anonymous
03/01/26(Sun)23:44:03 No.108273334

Anonymous 03/01/26(Sun)23:44:03 No.108273334▶

File: firefox_WsXZCT3f4V.png (75.7 KB)

75.7 KB PNG

qwen 3.5 35b moe passes the cup test with flying colors holy shit

Anonymous
03/01/26(Sun)23:45:32 No.108273341

Anonymous 03/01/26(Sun)23:45:32 No.108273341▶

migrate
>>108273339
>>108273339
>>108273339
>>108273339
>>108273339

Anonymous
03/01/26(Sun)23:47:38 No.108273354

Anonymous 03/01/26(Sun)23:47:38 No.108273354▶

>>108272331
Are you sure that you're running 35B-A3B and not the 27B?
Another thing to consider is: how much RAM do you have? If you only have 16 GB of RAM and aren't using something like q4 or q5 then you might be running out of RAM and it starts loading some of the model from disk.

What are you using to run the model? Consider trying koboldcpp since that's what I use. See if that fixes it.

Anonymous
03/01/26(Sun)23:49:12 No.108273365

Anonymous 03/01/26(Sun)23:49:12 No.108273365▶

>>108272539
Doesn't that fuck caching?

Anonymous
03/01/26(Sun)23:49:17 No.108273366

Anonymous 03/01/26(Sun)23:49:17 No.108273366▶

>>108273354
the dev of kobold is an antisemite no way im using his shit

Anonymous
03/01/26(Sun)23:50:57 No.108273376

Anonymous 03/01/26(Sun)23:50:57 No.108273376▶

>>108273366
Proof?

Anonymous
03/01/26(Sun)23:54:25 No.108273401

Anonymous 03/01/26(Sun)23:54:25 No.108273401▶

>>108273366
lol

Anonymous
03/02/26(Mon)00:09:47 No.108273465

Anonymous 03/02/26(Mon)00:09:47 No.108273465▶

What's the best way to continue a long chat? I still have like 20k context left but I'm really enjoying this RP and want to make it last a while.

Anonymous
03/02/26(Mon)00:16:41 No.108273493

Anonymous 03/02/26(Mon)00:16:41 No.108273493▶

>>108273465
>best
Increasing context. You could ask your model to summarize the log, but you'll probably lose whatever makes it good.

Anonymous
03/02/26(Mon)00:31:27 No.108273558

Anonymous 03/02/26(Mon)00:31:27 No.108273558▶

>>108273128
Thanks, already followed it and tried with this notebook twice. Once dumping all of my data in and training, and once trying to correct everything reformat everything as pretty markdown stubs. Dumping everything in without even looking at the data gave proportionally better results, but I think I messed up the second run with how I used warmup-stable-decay so I don't know if my results are to be believe, and I was spending time trying to debate what the next run should be since each of them is taking time. I guess I'll just try to use common sense like the other anon said, thanks.

Anonymous
03/02/26(Mon)04:28:03 No.108274676

Anonymous 03/02/26(Mon)04:28:03 No.108274676▶

>>108272276
Cleaning data makes model sloppier

Anonymous
03/02/26(Mon)05:23:22 No.108274844

Anonymous 03/02/26(Mon)05:23:22 No.108274844▶

>>108270249
Yeah Granite is underrated, I used Small and Tiny a lot and they worked well. They are MoE. Tiny might run on that card. It's 7B total, 1B active. The Granite models do FIM as well, which is nice.

Anonymous
03/02/26(Mon)05:39:59 No.108274885

Anonymous 03/02/26(Mon)05:39:59 No.108274885▶

>>108271037
Yeah, with Qwen 35B A3B IQ4 I get 6 tokens per second on a Ryzen 7, no GPU, 32 GB RAM.

Anonymous
03/02/26(Mon)06:24:42 No.108275010

Anonymous 03/02/26(Mon)06:24:42 No.108275010▶

>>108273365
if users are constantly using it around 23:55-00:05 yeah

Subject
Name
Comment
File	Supported: JPG, PNG, GIF, WebP, WebM, MP4, MP3 (max 4MB)
CAPTCHA

Reply to Thread #108268616

🔍 Search & Sort