Thread #108659983
File: 2026-04-16_190147_seed1_00001_.png (1.3 MB)
1.3 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108655009 & >>108650825
►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
469 Replies
>>
File: reward function.jpg (183.8 KB)
183.8 KB JPG
►Recent Highlights from the Previous Thread: >>108655009
--Optimizing MoE throughput via expert-specific VRAM placement and KTransformers:
>108657760 >108657782 >108658414 >108658470 >108658530 >108658575 >108658621 >108658586 >108658667 >108658692 >108658791 >108658869 >108659015 >108659058 >108659103 >108659121 >108659225 >108659116 >108658708 >108657857 >108658768 >108658845
--Discussing GPT-Image-2 performance, agentic RP frontends, and prose refinement:
>108655453 >108655622 >108655648 >108655651 >108655698 >108655674 >108655760 >108655836 >108655927 >108656114 >108656231 >108656247 >108656273 >108656305 >108656444 >108656521 >108656543 >108656550 >108656581 >108656955 >108656988 >108657014 >108657067 >108658723 >108657487 >108655952 >108655999 >108656025 >108656045 >108656052 >108656077 >108656111 >108656050
--Discussing llama-server prompt re-processing and KV-cache checkpoint issues:
>108655857 >108655885 >108655892 >108657373 >108657410 >108655920
--Speculating on Engrams and the adoption of Mamba-hybrid architectures:
>108655522 >108655563 >108655575 >108655607 >108655690 >108655839 >108655652 >108655664 >108655696
--Discussing Heretic's string matching limitations and soft refusal detection:
>108657013 >108657036 >108657050 >108657098 >108657128 >108657078 >108657135 >108657469
--Comparing Qwen 3.6 and Gemma 4 performance and tool use:
>108655272 >108655289 >108655291 >108655338 >108656365 >108657503 >108659276
--Comparing Kimi-K2.6 performance and hardware requirements against Gemma:
>108656722 >108656741 >108656785 >108656867 >108656855 >108656868
--Logs:
>108655476 >108655552 >108655622 >108655698 >108656305 >108656399 >108657859 >108657977 >108658392 >108658439 >108658584 >108658643 >108659124
--Teto, Miku (free space):
>108655633 >108655652 >108656114 >108658404 >108658791
►Recent Highlight Posts from the Previous Thread: >>108655011
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
I never really dabbled with local LLM's and there are so many models idk what I should go with.
>3090, 24GB VRAM
>something tuned for troubleshooting tech problems
>something tuned for language agnostic coding
>maybe something small and simple for Illustrious/NoobXL prompting
>ideally both something to fill up all of my VRAM and something lighter (8-12GB) depending on how much buffer I'll need
>>
>>
>>
>>
File: kek.png (93.6 KB)
93.6 KB PNG
https://xcancel.com/zerohedge/status/2046706218924691894#m
I hope you're ready to heat up your 4TB RAM machine anon, good times are coming
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108660268
>As long as the shared experts are in VRAM
Also I don't know shit about fuck about LLM terminology or how to properly run it. I'm guessing the "experts" are an inherent part of the model and I can pass a CLI argument to llama.cpp to offload that part of it to VRAM?
>>
File: elephant strawberry.jpg (70.4 KB)
70.4 KB JPG
you now remember "strawberry model" hype cycle
>>
>>
>>108660279
Qwen 35b a3b for example: 35b total parameters but only 3b are active per token as experts, and you have many of these experts. Very fast vs dense.
Dense models are smarter, but they're slower and offloading a dense model across different gpus/cpu is slow as balls, while MoE trades some of that quality for speed.
>>
>>108660279
NTA but yes, MoE (Mixture of Experts) models are a type of LLM which can feasibly be split between VRAM and RAM without a massive speed penalty. llamacpp has an -ncmoe argument for automatically shoving the expert weights into ram (it takes the number of layers whose experts to keep there), so you'd have something like -ngl 99 -ncmoe <N> in your args.
Conversely a 'dense' model is the original format and slows to a crawl if you put it into ram rather than vram, even partially.
You can usually tell the two types apart at a glance because a dense model will just be called whatever-100b whereas an MoE model will be whatever-100b-a5b because it's separated into total and active parameters
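To make it concrete, a minimal sketch of the invocation (filenames and the -ncmoe count are placeholders, raise or lower the number until your VRAM is nearly full):
llama-server -m Qwen3.6-35B-A3B-Q4_K_M.gguf -ngl 99 -ncmoe 20 -c 32768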
>>
>>
>>108660302
nvm, googled it up and got my answer lol
https://gist.github.com/DocShotgun/a02a4c0c0a57e43ff4f038b46ca66ae0
>>108660317
>>108660312
Good to know, thanks. Guess it makes plenty of sense to design them like this nowadays since the quality of the model depends on how much context data you have, but you don't necessarily need all of it loaded all the time for it to be speedy and accurate, so the RAM offload is a good compromise. idk I mainly dabbled more in diffusion models so my LLM knowledge is a bit sparse
>>
>>
File: Screenshot_20260422_131828_Chrome.jpg (636 KB)
636 KB JPG
>>108660319
Bro use ai mode or sumfin. Copy paste terminal errors and shit in there until it works
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1757612581632355.png (21.8 KB)
21.8 KB PNG
>>108660349
We webui engineers are gooder
>>
>>
>>108660360
It actually does. If you want to specifically put the shared expert on a device (i.e. a gpu), -ot with a regex is the correct answer.
More generally though, the shared expert should automatically go to vram with -ngl
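Rough sketch of what that looks like (the regex and device name here are assumptions, check the actual tensor names printed at load time before copying it):
llama-server -m model.gguf -ngl 99 -ncmoe 30 -ot "ffn_.*_shexp=CUDA0"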
>>
File: GPT Image 2.png (1.2 MB)
1.2 MB PNG
kek
>>
>>
>>
>>
>>
https://docs.cactuscompute.com/latest/blog/turboquant-h/
>TurboQuant-H shares the core insight with TurboQuant; rotation concentrates coordinates into a well-behaved distribution, enabling aggressive scalar quantization, but simplifies the pipeline for offline weight quantization. Follow the link for a deeper dive into the technique.
>Cactus baseline used INT4 linears + INT8 embedding, yielding 4.8GB for E2B (5B total params). TurboQuant-H squishes this to INT4 linears + INT2 embeddings, reducing to 2.9GB. The perplexity on our calibration went from 1.8547 to 1.9111, complete evaluation coming in the paper.
desu if I can go from Q8 to Q4 KV cache I'll be happy, this shit is eating so much VRAM
>>
File: 1763251779552153.jpg (187.2 KB)
187.2 KB JPG
>>108660349
Cenile interface
>>
>>
Threadly reminder if you're using your llm for coding or anything that requires repeating something in context almost verbatim and you're not using --spec-type ngram-mod --spec-ngram-size-n 24 --draft-max 64
You're leaving a shitload of performance on the table.
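i.e. tacked onto a normal llama-server line, something like this (sketch; flag names as above, assuming your build is recent enough to have them):
llama-server -m model.gguf -ngl 99 --spec-type ngram-mod --spec-ngram-size-n 24 --draft-max 64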
>>
>>
>>
>>
File: 00001-1378487878.png (1.4 MB)
1.4 MB PNG
> Another day
> Still no V4
Happy Wednesday I guess.
>>108660349
I'd really like a terminal interface / green screen RP engine. But not enough to spend the tokens vibe coding one.
>>
>>
>>
>>
>>
>>
>>108660589
I think it's a pretty neat idea, but it's completely anathema to how I do my best to stop context from reprocessing ever. If I was running my backend with parallel requests and it were properly designed around that, I'd probably use it over other rp frontends.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1749908189602018.gif (1.5 MB)
1.5 MB GIF
>Mythos leaked
>turns out to be an overhyped dud
>either not as good as Anthropic claimed it was or so oversized it's basically useless without a gargantuan datacenter only a few US corpos have making it more of a money sink than an existential threat
>major investor doubt arises when the model that was supposedly """too dangerous to release""" was yet another overhyped trashfire that was never financially viable to begin with
please god let this happen it would be so fucking funny
>>
>>
>>
>>
Any suggestion to improve my kobold batch file for gemma 4 31b?
=========================
@echo off
SET KOBOLD_EXE=koboldcpp.exe
"%KOBOLD_EXE%" ^
--model "D:/Models/LLM_Models/lmstudio-community/google_gemma-4-31B-it-Q6_K-gg uf/google_gemma-4-31B-it-Q6_K.gguf" ^
--mmproj "D:/Models/LLM_Models/lmstudio-community/coder3101_gemma_4_31b_it_here tic-Q6_K.gguf/mmproj-model-f16.gguf " ^
--port 5001 ^
--threads 8 ^
--usecuda 0 mmq ^
--contextsize 32768 ^
--gpulayers 99 ^
--tensor_split 8.0 32.0 ^
--maingpu 1 ^
--batchsize 512 ^
--noshift ^
--useswa ^
--usemmap ^
--multiuser 1 ^
--highpriority ^
--jinja ^
--jinja_tools ^
--jinja_kwargs "{\"enable_thinking\":true}" ^
--draftamount 8 ^
--draftgpulayers 999 ^
--chatcompletionsadapter AutoGuess ^
--defaultgenamt 1024 ^
--maxrequestsize 32
pause
>>
>>108660075
>>108660365
>>108660630
>>108660674
for a text gen general, reading comprehension skills in here are abysmal
>>
>>
>>
File: hlsgb1l7ycwg1.png (845.5 KB)
845.5 KB PNG
>>108660701
yeah, why are you using that specific quant?
pretty atrocious graph, i doubt 31b is better.
>>
>>
>>
>>108660735
>sloppy, looping mess
stop using text completion
>>108660741
>unsloth-made graph promotes unsloth
noway??
>>
vramlets it's your time to shine
https://itayinbarr.substack.com/p/honey-i-shrunk-the-coding-agent
https://github.com/itayinbarr/little-coder
>>
>>
>>
>>
>>108660749
yeah they are faggots but you trust closed lmstudio "community" more? who even is that
well suit yourself then.
>"Element Labs can provided terminate the Agreement at its discretion upon no less than 10 days-notice via any reasonable means, including by posting a notice on the website..."
Just another company that is a llama.cpp wrapper.
>>
File: file.png (238.1 KB)
238.1 KB PNG
>>108660741
I just downloaded whatever was most downloaded on HF at the time, i will definitely try another one thank you anon!
>>108660743
I am not sure, still learning what most flags do.. would you recommend to set up a draft model or just remove it? use case for now is mostly rp/assistant
>>
File: 00004-1260451778.png (1.4 MB)
1.4 MB PNG
>>108660571
Not sure how I'd proceed, but certain that I would not use ST as a starting point. lol. That whole thing is a big ole mess. To me the entire purpose of doing it as text-only / CLI is to debloat it, since I'd only have one inference connection. I'd probably make all the configurations as plain text files you'd update using Nano while out of interface.
>>108660600
/lmg/ would never claim me as one of their own.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108660787
>I am not sure, still learning what most flags do.. would you recommend to set up a draft model or just remove it? use case for now is mostly rp/assistant
A draft model can give you a notable speedup if you have spare vram for it, but it's incompatible with having an mmproj loaded, so no multimodal.
For reference, I use gemma 31b at q8, without a draft model (and with the mmproj) my output is around 25 t/s. When using it with the 26b q2 as a draft model, I get around 41 t/s doing rp and 80 t/s doing technical work.
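In llama.cpp terms that setup looks roughly like this (sketch, filenames are placeholders; kobold exposes the same idea through its draftmodel/draftgpulayers options):
llama-server -m gemma-4-31B-it-Q8_0.gguf -md gemma-4-26B-A4B-it-Q2_K.gguf -ngl 99 -ngld 99 --draft-max 16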
>>
What happened to Elon? He used to do many amazing things for society. But his actions look increasingly selfish. Instead of delivering superior products, he attacks his competitors. He calls OpenAI "ClosedAI", yet they have contributed much more to open source than xAI. He calls Anthropic misanthropic, but they have the most mature efforts to ensure AI benefits all humans while Elon's position boils down to UBI and "truthseeking AI will probably not kill us" with no justification. It feels like he wants to be the one in charge of AGI without providing any reason why the rest of mankind can trust him with this responsibility.
>>
>>
>>
>>
>>
>>108660663
>4000b
https://huggingface.co/unsloth/Claude-Mythos-GGUF/blob/main/Claude-Mythos-TurboCuqed-Bonsai-Stretched-Rotated-smol-UD-Q0.5_XS.gguf
>>
>>
>>
>>
File: 2649435.png (125.5 KB)
125.5 KB PNG
https://huggingface.co/Jongsim/gemma-4-26B-A4B-it-upcycled-192-pretrained
What is this schizophrenia?
>>
>>
>>
>>108660922
China saw the success of https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H and decided to copy it
>>
>>108660908
OpenAI was co-founded by Elon was it not? Whisper is open source, and so are the early GPTs. They only started their for-profit jewish tactics after Elon left, didn't he try to sue the board because of it?
>>
>>
>>
>>
>>
>>108660961
first u need to use bahrat sovregirn models please god name model 4 trillion params made for good looks and pr.
then u ask model to produce money
sir if you need of support u can find me on fiver and i can be of guide to u for your jurney to make real money :rocket: :rocket:
>>
>>
>>108660908
Say what you will about the guy, but it's practically exclusively because of Musk's autism about space that we have self-landing rockets and even the faintest modicum of interest towards space exploration in the West.
>>
>>108660934
>Not that guy but do you need some snowflake tunes for draft model or it can be anything?
It just needs to be a model with similar logits, so ideally it's a model from the same company and series but smaller than your main one, hence why I'm using the gemma4 26b-A4b as a draft for gemma4 31b.
Same logic applies for a qwen model, want to use drafting for a big qwen 3.5? use a smaller qwen 3.5.
>>108660934
>Also does quanting matter for those?
Yes, for two reasons.
1. is that a quanted model is less and less likely to generate the same tokens as a larger one the more brain damaged it gets, so the acceptance rate (the big model going "yes, those are correct, send it") is going to be lower.
2. is that quanted models generally run faster on your hardware, and your draft model NEEDS to be faster than your main one to be of any use.
That said, I'm getting an acceptance rate of 0.6 to 0.75 with a model quanted down to q2 (which is really good), so you can get away with a lot.
I'd consider it worth playing around with if you have the patience, but 1.5gigs of VRAM is quite tight, I don't see it as likely that you can fit any quanted model in that space without cutting your context from your main model.
If your use case is technical, consider using an ngram like >>108660554 mentions, as that costs basically nothing and just uses your existing context to predict upcoming tokens for basically free speed.
>>
>>
File: 1758609192749490.jpg (129.3 KB)
129.3 KB JPG
https://huggingface.co/Qwen/Qwen3.6-27B
>>
>>
>>108660554
>>108660990
I'm using kobold as backend. Does it inherit these flags as well? I don't think the gui launcher has any. Manually editing the preset file maybe?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1762403533077253.jpg (282.5 KB)
282.5 KB JPG
>>108660821
Not really the flex you think it is
>>
>>
>>
>>
File: 1758027777728551.png (93.9 KB)
93.9 KB PNG
>>108660998
https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/tree/main
FASTER UNSLOP I WANNA TRY THIS
>>
>>
>>
>>
File: mao mao meow nigga.png (420.8 KB)
420.8 KB PNG
>>108660998
>Only GGUFs are Unsloth.
I'll wait, they're probably going to be reuploaded 4 times in the next 3 hours anyway.
>>
>>
>>
>>108661108
To be fair, there are extraordinarily few models that focus on the chat assistant aspect, rather than codeslop.
I wouldn't be surprised at all if Gemma 31b had better trivia and creative writing knowledge than Qwen 397b.
>>
>>108660821
that makes you a cuck
https://github.com/SillyTavern/SillyTavern-Pronouns
>>
>>
File: hjf8iEb99UIDWrVXlgEGh.png (1.5 MB)
1.5 MB PNG
>>108661101
Not even. They did the usual empty repo "LOOK AT ME!!"
This is the only quant: https://huggingface.co/sm54/Qwen3.6-27B-Q6_K-GGUF/tree/main
>>
>>
>>
>>108661168
I don't hate them, but their quantization snake oil wouldn't exist if llama-quantize from llama.cpp quantized models optimally on its own. The Unsloth bros now have priority access to unreleased models mostly because of that.
>>
>>
>>
File: 1776304754397567.png (485 KB)
485 KB PNG
>>108660284
Remember this? Sounds familiar...
>>
>>
>>
>>
>>108661211
Anon they're getting paid by github to run llama-quantize.
A project that they did not write, which is on github.
ikawrakov is a whiny pissbaby but he has infinitely more claim to their success than they do, because he implemented imatrix quantization.
>>
>>
File: 00106-3050314564.png (321.3 KB)
321.3 KB PNG
>ask LLM for instructions to re-season iron skillet
>Seasoning isn’t just “oil coating”—it’s polymerization.
Why do we carry on like this?
>>
>>
>>
>>
>>108661238
A true mystery https://archive.ph/Mjynm
>>
>>
>>
>>
>>
>>
>>108661252
Because there's a specific nuance of using a solid at room temperature fat that I needed to be addressed that isn't necessarily addressed since undoubtedly most of these tutorials are written by American boomers who will suggest using sneed oils and taking the added step of sacrificing my foreskin to Benjamin Netanyahu
>>
>>108661281
>>108661273
this dude is ODing / bipolar
>>
>>
>>
>>108661274
>>108661287
meant for this, oof
>>
>>
>>
>>108661238
Here's a thinking process:
1. **Identify Core Steps (Essential Info):**
- Scrub clean (steel wool/salt + soap)
- Dry completely (stove/oven)
- Apply thin oil layer, wipe off excess
- Bake upside down at 450°F for 1 hour
- Cool in oven
- Repeat 2-3x for best results
- Use high-smoke-point oil (canola/grapeseed)
2. **Adapt to /g/ Style:**
- /g/ posts are typically:
- Anonymous, direct, no greeting/signoff
- Use code blocks or minimal formatting
- Slightly technical but practical
- Might include subtle board humor/cynicism ("not even a hard ask", "basic metallurgy/polymerization", "try again anon")
- Short paragraphs or bullet-like structure
- Avoids over-explaining
- Uses terms like "anon", "OP", "try", "do it right"
4. **Check Against Requirements:**
- 4chan /g/ style? Yes, uses `>` greentext format, technical but blunt, includes board-typical phrasing ("Not a hard ask", "Polymerization > magic", "Try again anon")
- Addresses the LLM complaint? Implicitly by delivering a flawless, concise answer without fluff
5. **Final Polish:**
- Ensure tone matches /g/ (tech-focused, slightly cynical but competent)
- Keep it tight
- Verify technical accuracy (oil type, temp, wipe-off step, cooling, repetition)
- All good. Output matches draft.
>scrub rust/flaking seasoning off with steel wool + dish soap
>dry completely on stove or 200°F oven
>apply thin coat of high-smoke-point oil (grapeseed/canola), then wipe until it looks nearly dry
>bake upside down at 450°F for 1 hour, cool in oven
>repeat 2-3x
>hand wash, dry immediately, light oil coat after use
Not a hard ask. It's just lipid polymerization. Wipe off excess oil or you get gunk. Try again anon.
>>
>>108661274
>>108661287
>>108661298
Also this.
Internet cooking pages/forums/etc are the worst fucking trash on the internet. If anything needs to be replaced by LLMs it's this. Go to a recipe website on a 64 core threadripper and it'll still struggle to run from all the trackers and ads and other bullshit all over the page. Just to find the one piece of information you're looking for.
>>
>>
>>
Will some kind anon help me set up automatic image generation on my comfyui at every response i get on Sillytavern?
So far, at the end of every message, my character writes...
[Prompt: light blue hair, medium hair, center-flap bangs, blue eyes..
where i'm stuck is how do i get sillytavern to capture only that part and send it to comfyui? Gemini recommends regex but a lot of the information it gives me is outdated so many settings aren't actually available..
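For what it's worth, the capture itself would presumably be something like \[Prompt:\s*([^\]]*)\] (assuming the bracket actually closes) with group 1 forwarded as the image prompt, I just don't know which ST box that goes in.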
>>
>>
>>
>>108661252
>>108661314
A key detail I never hear people talk about is that with manual search engines you can assess the quality of the sources yourself
You can see who wrote and edited wikipedia articles. You can click through on blogs to see related articles, you can judge websites by the amount of slop on there. You can read books and academic articles yourself.
With LLMs you kinda just have to trust that the synthesized output based on its training data is accurate. There is no way to know where the machine got its information, because LLMs are a black box by design.
Granted some chatbots have "sources" now but I've found those are often unreliable and added post-hoc, i.e. the AI generates something plausible and then after the fact tries to find a reddit thread or something that vaguely addresses the same problem. They are not sources in the traditional sense.
I can see a future where the internet completely sloppifies and becomes impenetrable without AI agents because companies are training AI on itself. A future where quality, vetted information becomes privatized again, with research groups hosting their own private knowledge bases in Logseq/Obsidian like the encyclopedias of old
>>
>>108661339
How about your read the docs? https://docs.sillytavern.app/extensions/stable-diffusion/
>>
File: Code_kwZLwtIdVo.jpg (37.4 KB)
37.4 KB JPG
Is camofox copium or does it work for ai automated browsing?
>>
>>
File: 1776660989723673.png (130.4 KB)
130.4 KB PNG
wtf is this shit? I just want to talk to my LLM dude
>>
>>108661375
I'm sure this is a feature in the eyes of many in power. Those who control the AI can rewrite history subtly and it would be difficult or impossible for most to disprove the "truth" as presented by them. Most people are already more than happy to offload their critical thinking, before it was result #1 on google or wikipedia, now chatbots. Memory hole-ing made easy.
>>
>>108661358
i mean why steal second hand with sloppy way when you can torrent a really nice recipe directly
>>108661375
pmuch this
>>108661397
open webui is for corpo intranet/grifter tier hosting solution, not 'local first'
>>
>>
>>
>>
>>
Before another wave of Chinese shills floods /lmg/ once the goofs for the benchmaxed garbage that is Qwen 27B (this time it's 3.6 soi it's gooder than 3.5!!!), remember
Qwen did not "fucking cook"
Qwen 3.5 was not good at code (people who say that here are either shills, or never used it and parrot others), Qwen 3.6 isn't either
If you do fall for the meme and download the model, every issue you have with it is likely the model, and not the meme samplers they want you to use so that their garbage stops thinking for 39045639085 tokens when prompted with "hi" (imagine RECOMMENDING rep pen, Jesus Christ)
If you're a vramlet, just use the smaller Gemma models. If you REALLY want Qwen (why?) and you're vramlet-lite, use Qwen Coder Next. They have no other good models.
>>
>>
>>108661077
>>108661437
Why even come here? Reddit is more your speed. They love to slopcode and suck off qwen all day.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108661511
>>108661509
>cooking mentioned
deranged bot?
>>
>>
>>
>>
>>
>>
>>
>>
File: Screenshot from 2026-04-22 07-30-47.png (71 KB)
71 KB PNG
so, yes, I have a local inference stack where I roll a qwen 3.6 31b as well as a gemma 4 26b moe. Right now, they both crank out 100 t/s, and the content is actually good. I've been doing local model shit on a pair of a6000s for a while now, and for the first time, these things are actually usable. I was running Kimi K, with it spilling onto memory, and it was good (tool calling, coding etc.) but it was so slooow. I use opencode with the oh my opencode plugin. It works; I can make codeslop all day long with this setup. I can fix broken code and implement new features for work. Is it as fast as claude? no. Accurate as claude? also no. Am I ever going to get back the $10k I dropped in this stupid thing? probably not... Do the subagents fight it out and make things that work... Yes. For the first time, it feels like these things aren't toys.
I actually have cancelled my max subscription... I also opened an account on fireworks so I can run kimi k with the quickness if I really want to.
>>
>>108661545
I hate the loli Gemma posters as much as the next guy, but Qwen 3.5 is a retard. I used it at Q8 with every harness imaginable and it failed every single time when the task wasn't something a drunken junior could do.
>for 96gb vram
bloody benchod oyu have brahmin amount of vram and you use a model for DALITS, did you really not manage to find a BIGGER MODEL?
fuck i think i got baited again
>>
>>
>>
>>
>>108661533
A few years ago I used to have regular drunk discussions with researchers in an AI Explainability and Ethics group from which the conclusion basically was that the field is in its infancy and getting any kind of sensible interpretations out of the activations of trillion parameter models is extremely difficult. I should maybe try to catch up on the latest research though.
>>
>>
>>
>>
>>
>>
>>
>>108661606
Final draft:
<The entirety of the response with one word changed>
Okay, let's write
<Doesn't end the thinking block, outputs the entire response again>
</think>
<The entire response>
Yeah, pisses me off too. If you use the superior Text Completions endpoint, use a <think> prefill that gives it a plan for its own plan. I always put "I will jot down a response plan without writing out the entire response or doing any drafting or polishing" in there. Look at how GLM formats its reasoning and adapt to that.
If you're on Chat Completions, what the fuck are you doing
>>
>>
>>
>>
>>
>>108661664
Gooning is *the* usecase for gooning.
I wonder why these retards shoot themselves in the foot by using and suggesting others use Chat Completions.
You do not need a template to be added to Silly Tavern with an update to read the docs and figure it out on your own. aicgsirs are braindead.
>>
>>108660998
>https://huggingface.co/Qwen/Qwen3.6-27B
Oh wow another benchmaxxed model. can't wait for it to be in reality worse than Gemma.
>>
>>
>>
>>
File: 1432498179182.png (296.2 KB)
296.2 KB PNG
><|turn>model\n<|think|><|channel>thought
So on gemmy do I prefill with this entire thing?
>>
>>
>>
>>
>>
>>
>>
File: 2026-03-01-163613_1044x1782_scrot.png (496 KB)
496 KB PNG
>>108661168
Because they're supposedly the expert in quantization and get a shit ton of money for it, yet their quants are actually extremely mid and are often broken.
>>
>>
>>
>>
>>
>>
>>108661562
>bloody benchod oyu have brahmin amount of vram and you use a model for DALITS
Well I'm not going to offload to cpu or shrink the 256k context for coding am I??
I'm using the 112b at q4. It replaced minimax because it just works better (c#)
Also, I hate every prior Qwen model.
>>
>>
>>108661745
My raging hate boner for Qwen's socially irresponsible marketing practices aside, have you tried the Coder Next model or 27B?
I used 112b too, it felt irredeemably retarded for an MoE its size. The above models performed much better, with QCN remaining the only model I actually enjoyed using out of their entire lineup.
>>
>>
>>
>>108661680
>>108661686
Well yeah but they're genuinely too stupid to set up the template. Gemma 4 release I figured it out in 2 seconds reasoning on and off by eyeing the jinja, meanwhile half the thread couldn't and were crying hard. Sad!
>>
>>
>>108661743
It has the same access to an LLM, plus the ability to actually edit the prompt instead of asking the underlying API to hold your hand.
Converting images to embeddings is not magic. Neither is parsing tool calls. For the model - it's all tokens, for you - it's all text.
Figure it out.
>>
>>
File: 1758105037530551.png (927.4 KB)
927.4 KB PNG
managed to get llama cpp server Ui to finally display the image when the LLM is sending it lol
>>
>>
File: cbclyf.png (479.9 KB)
479.9 KB PNG
turns out I had day0 gemma this whole time??
sha256sum eafb...b720 gemma-4-31B-it-UD-Q4_K_XL.gguf
compare to new hash:
sha256sum 6340...6b88 gemma-4-31B-it-UD-Q4_K_XL.gguf
>>
>>
>>108661763
>Coder Next
Yeah I tried that when it was released. It seemed to get confused more with c# syntax.
I don't think that had vision support either right? I tend to use that sometimes.
>27B
Yes, this is actually smarter and I did use it for a while. But it's also slower on my system, even with tensor-parallel. 112b is "smart enough" and so much faster, meaning I spend less time working lol.
I haven't tried the 2.7 Minimax or Qwen 3.6 yet. I'm guessing the 27b will be smart but slow, and the small MoE will be retarded like the 3.5 one.
>>
>>
>>
>>
>>
>>
>>108661828
https://github.com/BigStationW/Local-MCP-server/blob/main/docs/Use_on_llamacpp_server.md
>>
>>
>>
>>
>>
>>108661664
Periodic reminder that /lmg/ was created in early 2023 by /aicg/ anons who wanted a place for discussing local models (Pygmalion, ...) without the cloud model/proxy background noise. It's always been for gooning by gooners.
>>
>>108661866
Most chat completion proponents here see it as an impediment to their cooming in the only frontend they know, which is ST.
Filling in 4 fields there is not hard at all and gives you an ability to do whatever the hell you want with the entire prompt, including thinking edits and messing with special tokens.
Fine, image input, you can't be bothered to use libmtmd directly. But giving up on the template configuration step because ST didn't add it with an update is pathetic. If I couldn't do even that, why would I bother with local models when cloud ones just werk better, cheaper and faster?
>>
File: 1749622181770588.png (278.4 KB)
278.4 KB PNG
>>108661743
>Why the fuck would you use text completion in the current year? It has no vision or tool calling.
I have no idea but switching to chat completion has made my life way better
>>
>>
>>
>>
>>108661902
Oh yeah for llama-server mtmd I might as well ask, do I run images through stb and pass the rgb as b64 to /completion or does it just take regular image files (still b64) and do the processing itself?
I would assume the latter from what I'm seeing.
>>
>>
>>
>>
>>
File: 1748907056142177.png (122.4 KB)
122.4 KB PNG
Which goof do I get bros? I don't recognize any of these names, except unslop which I refuse to touch
>>
>>
>>
>>
>>
>>108662039
I wait for bartowski personally; his 122b / 35b / 27b of 3.5 have all been good.
>>108662052
why make your own gguf, seems like a lot of work for no reason
>>
File: 1746154015310533.png (46.3 KB)
46.3 KB PNG
>>108662052
>>108662053
Fugg...
>>
>>
>>
>>
>>108661970
They already pass images into stb under the hood, if that's what you were wondering
https://github.com/ggml-org/llama.cpp/blob/master/tools/mtmd/mtmd-helper.cpp#L500
>>108662025
Too finicky and should be something doable on the frontend.
>>
>>
>>108660922
That's actually pretty interesting.
Basically an upcycle + pretrain on some public dataset to initialize the router and make use of the new experts.
With more data and actual compute, you essentially get a new model.
>>
File: 1776689080363464.jpg (125.9 KB)
125.9 KB JPG
>>108662053
It's not OK. They work, but you're not getting the best performance/size ratio.
>>
>>
File: 1758787701709512.gif (140.9 KB)
140.9 KB GIF
>>108662076
I look like this doe
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108662113
From the same graph you can easily see that of those tested there, Bartowski's are the second best choice.
I'm not shilling anybody, if anything I wish we didn't have to rely on "quanters" with their own llama.cpp forks for the best possible quantizations.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108662176
I have to download a bunch of shit then wait an unspecified amount of time, it's not just 2 commands
Also there is clearly a nonzero chance of fucking it up, or else there wouldn't be so many jeets on huggingface posting scuffed quants
>>
>>
File: 6391246783178153927041694.png (217.9 KB)
217.9 KB PNG
>>108662179
local is fucked though if spud 5.5 is coming and is as good as it seems
>>
>>
>>
>>
>>
>>
>>
File: 1774496776426871.png (392.4 KB)
392.4 KB PNG
>>108662208
Thanks anon, I never considered it, because all the other boards I use have links that are short for English words...
>>
>>
>>
>>
>>
>>
>>
>>
File: 360_F_18423836_VRNpZI1jrIVnD72dhSWYts93Pc4ahLi3.jpg (31.2 KB)
31.2 KB JPG
Completely cucked.
>>
>>
>>
>>108662190
>>108662207
It's because they are doing snowflake quants and schizos want to be the next unsloth so they are also doing snowflake quants.
>>
>>
>>
>>
>>108662295
>>108662298
it's the first time I'm seeing that, Gemma never did that shit lol
>>
>>
>>108662295
>>108662298
well why would it search twice? was it not happy with what it found the first time? do you normally make multiple searches if you want to look something up? sounds like the chinese shit is just broken and you're shills making excuses
>>
>>
>>108662257
There ought to be a "multipass" mode in llama-quantize that first creates a logfile with the quantization error and size measured for all tensors at various quantization levels, and then in a second pass aims for a specific filesize using that information (and/or optionally the saved tensors so you don't have to quantize them again, at the cost of storage).
If niggerganov can't be bothered implementing quantization advancements because ikawrakov implemented them first in his fork and/or he's not capable of it, at least he should improve llama-quantize's default quantization schemes.
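A dumb manual version of that first pass is already doable today, something like this (sketch; assumes you already have the bf16 gguf, an imatrix and wiki.test.raw lying around):
for q in Q2_K Q3_K_M Q4_K_M Q5_K_M Q6_K; do
  llama-quantize --imatrix imatrix.gguf model-BF16.gguf model-$q.gguf $q
  llama-perplexity -m model-$q.gguf -f wiki.test.raw -ngl 99 2>&1 | tee -a quant-sweep.log
done
then the "second pass" is just you eyeballing the size/PPL tradeoff in the log instead of the tool doing it per-tensor.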
>>
>>
>>108662314
>well why would it search twice? was it not happy with what it found the first time?
it's not even that, it just launched the two tools at the same time, that's weird, usually you do one search, then you reflect on that
>>
>>
File: 1761933552977027.png (68.3 KB)
68.3 KB PNG
>>108662260
bruh, there's a fucking "OR" in that tool, gemma knows it so that it can just use the tool once and spam "OR"s, and qwen prefers to spam the tools instead
>>
>>
>>
File: 1754840524102248.png (3.5 MB)
3.5 MB PNG
>>108662252
you're right, making your goofs should be democratized.
bart's calibration data:
https://gist.github.com/bartowski1182/82ae9b520227f57d79ba04add13d0d0d
first step:
PRODUCING THE BASE GOOOF:
>checkout llama.cpp repo
>do uv venv --python 3.12
>uv pip install -r requirements/requirements-convert_hf_to_gguf.txt
alternatively install manually the libraries you need, sometimes the requirements are outdated, which means do uv pip install ggml transformers accelerate sentencepiece torch protobuf --extra-index-url https://download.pytorch.org/whl/cpu
>download the weights of the model you want
>uvx hf download qwen/qwen3.6-27b --local-dir . (this will download the model into the current path, replace the . with whatever path you want, be it relative or absolute)
now it's time for the base conversion
>uv run convert_hf_to_gguf.py $PATH_TO_MODEL --outfile $OUTPUTFILE-BF16.gguf --outtype bf16
congrats! you created your first base bf16 gooof!!!!!!!!!!!!
now time to do imatrix shit
>llama-imatrix -m $PATH_TO_BF16_MODEL -f $PATH_TO_CALIBRATION_DATA -o imatrix.gguf -t $CPU_THREADS -b $BATCH_SIZE(2048) -ngl $GPU_OFFLOAD_LAYERS --parse-special
now you created the imatrix.gguf file!
from the bf16 you created earlier you can now create all the subquants you want!
for q8_0 you don't really apply the imatrix, so you do:
>llama-quantize $PATH_TO_BF16_MODEL $OUTPUT_QUANT_FILENAME Q8_0 $CPU_THREADS
to apply imatrix instead
>llama-quantize --imatrix $PATH_TO_IMATRIX_FILE $PATH_TO_BF16_MODEL $OUTPUT_QUANT_FILENAME Q4_K_M $CPU_THREADS
of course you can replace the Q4_K_M with whatever quant level you desire. you're welcome!
>>
File: 3d printer.png (37.4 KB)
37.4 KB PNG
>>108662175
The few months of overlap after the nerds invaded when this was simultaneously both a tech and guro board was the peak of /g/.
>>
>>
>>
>>108662162
the screenshot of quants didn't have the typical quanters available, I usually shill for barts or mrader. I assumed the person making the inquiry was looking for one of those guys, so I directed him to ggml, the only other name on the list I could identify. if you have the hard drive space and internet bandwidth you can just make your own, you don't need enough ram or vram for the safetensors, just disk space.
>>
>>
>>
>>
>>108662357
https://www.reddit.com/r/LocalLLaMA/comments/1sc8kdg/running_gemma4_26b_a4b_on_the_rockchip_npu_using/
>>
>>
>>
>>
>>
File: dd964229a55cf670ddcb6693e739f694.jpg (211.1 KB)
211.1 KB JPG
>>108662353
you are a good man, thanks
>>
>>
>>
>>
>>
>>
>>
>>
>>108660554
>ngram-mod
The ngram cache resets way too early by default. Change this from 3 to 64 and recompile. Makes it usable.
>if (n_low >= 3) {
https://github.com/ggml-org/llama.cpp/blob/8bccdbbff9d0d91d54838471f6eea182b9ab1b79/common/speculative.cpp#L747
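i.e. the entire patch is just that line becoming (sketch, the exact line number will drift between commits):
>if (n_low >= 64) {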
>>
>>108662353
adding to this, if you want to get the mmproj, you do:
>uv run convert_hf_to_gguf.py $PATH_TO_MODEL --outfile $mmproj-OUTPUTFILE-BF16.gguf --outtype bf16 --mmproj
also with llama quantize you can get cheeky and do specific quant levels on the layers you want by passing --tensor-type (or multiple of them) if you want to keep certain layers at q8_0 bf16/f16
you did imatrix shit but HOW DO YOU KNOW IT'S WORKING????? BY MEASURING PPL!!!!!!!
first you need to obtain wikitext shit!
https://cosmo.zip/pub/datasets/wikitext-2-raw/wiki.test.raw
https://cosmo.zip/pub/datasets/wikitext-2-raw/wiki.train.raw
https://cosmo.zip/pub/datasets/wikitext-2-raw/wiki.valid.raw
then you run this shit, $DATASET should be the wiki.test.raw file
>llama-perplexity -m $PATH_TO_MODEL -f $DATASET -ngl $GPU_LAYERS_OFFLOAD
run this on both an imatrix'd and non imatrix'd quant to see how much of an effect your shit did!
>b-but muh KLD
I actually didn't look into how to do kld calculations so... sorry! just ask your fave llm or hit google lmao!
>>
OpenAI released an open source model!
https://huggingface.co/openai/privacy-filter
>1.5B parameters total and 50M active parameters.
>sparse mixture-of-experts feed-forward blocks with 128 experts total (top-4 routing per token)
>>
>>
>>
>>
>>
>>
>>
File: 8x ddr4 2400 vs 12x ddr5 6400.png (199.6 KB)
199.6 KB PNG
>>108662533
Here's hoping.
The best we can expect is competition and rivalries where they keep trying to one up each other by releasing increasingly more capable open weight models.
>>
>>
>>
>>
>>
>>
File: 1775155673440.png (1.4 MB)
1.4 MB PNG
>>108662589
>>108662594
>>
File: 1751577236872156.png (261.2 KB)
261.2 KB PNG
@kache shut the fuck up you leaf faggot
>>
>>
>>
>>
File: 1774443294977572.png (81.6 KB)
81.6 KB PNG
https://www.reddit.com/r/LocalLLaMA/comments/1ssl1xh/qwen_36_27b_is_out/
kek
>>
>>
>>
>>108662489
I know people like to shit on oai releases, but this strikes me as actually somewhat useful. I'd feel better about letting my """agents""" that have access to my personal notes interact with other people with more filters in place. Especially when using comparatively-more-retarded local models that are easier for other people to fool than some 10T cloud abomination.
That being said, for the personal usecase, idk if stacking a filter model is really more effective than a regex filter with my name in it.
>>
File: quant_error.png (784 KB)
784 KB PNG
>>108662321
>quantization error
Getting closer; turns out it could be easily vibe coded, in one way or another.
>>
File: 1774790474325284.png (1.2 MB)
1.2 MB PNG
>>108662752
lmaooo
>>
File: HGQXeUBWUAAuLgm.png (30 KB)
30 KB PNG
>>108661019
>>108661023
BEHOLD
https://x.com/saltjsx/status/2045874466958270903
>>
File: 1769170576230608.png (603.7 KB)
603.7 KB PNG
https://www.reuters.com/world/asia-pacific/tencent-alibaba-talks-invest-deepseek-information-reports-2026-04-22/
Deepseek's comeback??
>>
>>108662813
https://arxiv.org/abs/2309.08632
must have implemented this old paper
>>
>>
File: HGX7Z0YWkAANkJ3.jpg (83.1 KB)
83.1 KB JPG
Should I pull the trigger on a second GPU bros..
Models are getting so much better every week (days, even) that one card might be all I needed after all.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>[59301] ggml_cuda_init: failed to initialize CUDA: unknown error
>[59301] load_backend: loaded CUDA backend from /app/libggml-cuda.so
>[59301] load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
>[59301] warning: no usable GPU found, --gpu-layers option will be ignored
>[59301] warning: one possible reason is that llama.cpp was compiled without GPU support
What the fuck? it worked fine yesterday bitch!
>>
>>
File: 1750659002865044.png (49.7 KB)
49.7 KB PNG
lol
>>
If your qwen starts going
Wait… but I should
Wait… but maybe
Wait… what if
isn't that a sign your quantization is garbage? Mine will think for a long time potentially but I stopped getting the Wait… shit when I switched from unsloth to a non shitty quant provider. That and --reasoning-budget 4096 / --reasoning-budget-message "Thinking time exceeded. Output answer now\n"
>>
>>
>>
>>
File: 1768183262130103.png (173.6 KB)
173.6 KB PNG
>>108663091
Yes, and?
>>
>>
>>
>>
>>
File: glow-shine.jpg (210.3 KB)
210.3 KB JPG
>>108662957
buy a 3090 and a cloud sub. Wait like the rest of us who didn't CPUMAXX and wait...
>>
>>108662614
>>108662594
Wonder if I'd be able to run it with my 7900xtx and 32gb ram...
>>
>>
>>
>>
>>
>>
Thoughts on new Qwen 27B.
I almost like the thinking process it does. Everything makes sense. I don't even mind it drafting the whole response and then checking / fixing some things in post.
But, 2000 tokens of thinking take so fucking long, man. If this ran at 100+ tk/s it would be pretty fun.
It writes a lot like Gemma 4 31B does but thinks 1500 more tokens while running 5 tk/s faster.
If this released a month ago I'd be using it but now it's just not worth it.
>>
>>
>>
>>108662654
Yooo, @yacineMTB / kache is a pedophile / hates niggers like us? Based! I've heard roon also visits here and posts pics from his personal stash.
>>108662783
Reddit cooked with this epic cuckoldry meme
>>
>>
File: Screenshot048.png (5.2 KB)
5.2 KB PNG
>>108663256
ngmi
DOA
>>
File: 1776345292760771.png (36.6 KB)
36.6 KB PNG
>>
>decide to migrate to emacs, ditch vim and micro
>spend 8 hours configuring init.el and still not sure about everything...
Finally I can begin editing my text files! Thinking about making an elisp version of my llm client. That will probably be spaghetti.
>>
>>
>>
>>
>>
File: 1763559247414133.jpg (25.7 KB)
25.7 KB JPG
>>108663621
>>
>>
>>
>>
File: 00000-1378487878.png (1.3 MB)
1.3 MB PNG
>>108662834
Sounds more like a potential tech sharing agreement in the works. West acts like China's all kumbaya and shares everything, but ofc they are hyper competitive.
If you're the DS founder, gotta love the $20B valuation on his side-gig. lol. Idk what that guy makes at his real job but this implies he's now a multibillionaire at minimum.
>>
>>108664332
That would be more of something the CCP would arrange under cover; an investment opportunity like this wouldn't cover tech, and it's not like they can't just look at what Deepseek publishes to get something out of whatever they are planning.