Thread #108252185
File: b7ec27b0-de98-49e3-b6db-1d276ca748e5.png (2 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108246772 & >>108241321
►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
479 Replies
>>
File: 1701626182006697.png (2.1 MB)
►Recent Highlights from the Previous Thread: >>108246772
--Paper: DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference:
>108247376 >108247408 >108247469 >108247651 >108247780
--Papers:
>108248442 >108249269 >108249326
--Qwen benchmarks debated, MoE efficiency questioned, neural steganography project discussed:
>108249710 >108249716 >108249732 >108249744 >108249772 >108249786 >108249789 >108249821 >108249843 >108249792 >108249832 >108249850 >108249868 >108249875 >108249882 >108249905 >108249950 >108249985 >108249794
--MoE vs dense model roleplay performance and ablation effectiveness:
>108249916 >108249923 >108250033 >108250074 >108250099 >108250116 >108250143 >108250205 >108250292 >108250330 >108250395 >108250418 >108250440 >108250491 >108250543 >108250550 >108250731 >108250772 >108250551 >108250554 >108250565 >108250610 >108250627 >108250551 >108250580 >108250610 >108250645
--Dense 27B outperforming MoE 35B in knowledge benchmarks:
>108248187 >108248207 >108248249 >108249636
--Running Qwen 3.5 27B on 16GB VRAM with reasoning mode tweaks:
>108249215 >108249268 >108249271 >108249305 >108249316 >108249357 >108249418 >108250671 >108250708 >108250747 >108250802 >108250819 >108249966 >108250051 >108250148
--AI thinking steps improve performance but face token efficiency tradeoffs:
>108249084 >108249098 >108249106 >108249127 >108249129 >108249133 >108249155 >108249157 >108249281 >108249294
--Qwen 27B dense model outperforming larger MoE models in benchmarks:
>108248368 >108248401 >108248420 >108248438 >108248443 >108248570 >108249019 >108249031
--Severe Q4 quant degradation in new 35B model:
>108248366 >108248374 >108248377 >108248403
--Oobabooga stagnation and potential alternatives:
>108248545 >108248557 >108248579 >108248608 >108248572 >108248588 >108248598 >108248617 >108248768
--Miku (free space):
>108250309
►Recent Highlight Posts from the Previous Thread: >>108246776
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108252306
It's a weird mixed bag: it suffers a lot from getting stuck in a safetyslop loop whenever you "trigger" it (interestingly, Grok 4.20 has this exact same problem).
But if you can avoid setting it off, it seems fine with explicit loli content. Very strange model to work with, for sure.
>>
>>108252390
>>108252312
Are ERP only fags really like this?
>>
Anybody experimented with different ways to inject information in the context, for example RAG?
Not the extraction techniques, but where and how to add the information to the chat history.
I started with the vanilla "everything in the system prompt" approach, but now I'm experimenting with adding those as faux tool call results after the latest User message.
I might also try placing the fake tool call result between the last assistant message and the last user message to compare the behavior.
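For anyone wanting to run the same comparison, here's a minimal sketch of both placements, assuming an OpenAI-style message list. The function name and the plain "tool" role are illustrative, not any particular frontend's API:

```python
def inject_rag(messages, retrieved, mode="after_user"):
    """Place retrieved passages into an OpenAI-style chat history.

    mode="after_user": append a faux tool result after the latest
    user message (the model sees it as the freshest context).
    mode="before_user": slot it between the last assistant message
    and the last user message instead, for A/B comparison.
    """
    tool_msg = {"role": "tool", "content": "\n".join(retrieved)}
    out = list(messages)  # don't mutate the caller's history
    if mode == "after_user":
        out.append(tool_msg)
    else:
        # index of the most recent user message
        idx = max(i for i, m in enumerate(out) if m["role"] == "user")
        out.insert(idx, tool_msg)
    return out
```

Anecdotally, content closer to the end of the context tends to get more attention from the model, which is the motivation for comparing the two placements.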
>>
>>108252502
>>108252494
>>108252493
>>108252491
I made neural stegnagprahy using sub 1b model but i need to control warmth/randomness of distribution to be more percise.
>qwen instruct models are no good
>gpt2 large is quite cluster fuck
>tiny llama is good but at floating point 32 it starts to break down chrome as an extension due to how much ram it uses
any solutions to reduce tone so it fits in better in human environment? should i try mistral that has been quantified but this method only works with fp32 so computers can communicate
>>
>>108252530
no im a human anon
>https://arxiv.org/abs/1909.01496
read a paper recently where harvard AI team was able to send hidden messages by hijacking probability distribution of llms by controlling. Then using a seed/binary they could have it be encyrpted stegnography that passes off normal human language. They used GPT2 XL which i can't run at fp32 due to hardware constraints so im gearing it towards small models with new model architecture that might have an edge. Are all of u retarded and unable to see how useful this is?
u could use discord, twitter, 4chan and pass a key around then use a sub 1b model to talk in open while message is only known by people who hold weights, seed and information.
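The rank-based flavor of that idea can be sketched in a few lines. This toy substitutes a deterministic fake "model" for a real LM (the paper actually uses arithmetic coding over the true token distribution, which is far more efficient); all names and the vocabulary here are illustrative:

```python
import hashlib
import random

VOCAB = ["the", "a", "and", "to", "of", "in", "it", "is"]

def toy_top2(context):
    # Deterministic stand-in for an LM's two most likely next tokens.
    # Both parties must run the *same* model for decoding to work.
    h = hashlib.sha256(context.encode()).hexdigest()
    return random.Random(h).sample(VOCAB, 2)

def encode(bits, seed):
    """Hide one bit per emitted token by picking rank 0 or rank 1;
    a seeded RNG scrambles which rank means which bit (the key)."""
    rnd, ctx, out = random.Random(seed), "", []
    for b in bits:
        cand = toy_top2(ctx)
        if rnd.random() < 0.5:
            cand.reverse()  # keyed permutation of the bit->rank map
        tok = cand[b]
        out.append(tok)
        ctx += tok + " "
    return out

def decode(tokens, seed):
    """Replay the same model and seed, then read bits off the ranks."""
    rnd, ctx, bits = random.Random(seed), "", []
    for tok in tokens:
        cand = toy_top2(ctx)
        if rnd.random() < 0.5:
            cand.reverse()
        bits.append(cand.index(tok))
        ctx += tok + " "
    return bits
```

Without the seed (and the exact model), an observer just sees a token stream drawn from plausible ranks; the cross-machine fragility the anon complains about comes from real models producing slightly different floating-point logits on different hardware.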
>>
>>108252488
Less can often be better if you're using the truncate-middle strat; I often use around 16384 (sometimes more, sometimes less, depending on the model).
For multi-modal or reasoning I usually start by doubling that, but too high and it loses the plot or gets stuck in a loop more often.
>>
>>108252530
I'm not entirely sure it's a bot. They can spell steganography just fine.
Could be a run-of-the-mill schizo.
>>108252564
Confirmed. Why did you open up with the context question? Or did you just click on a random post to reply? Also, why do you want to distribute child porn?
>>
>>108252570
>>108252574
>>108252584
>>108252585
why are none of u interested? goy cattle is what u are for not being impressed. You could be shown gold and still ignore it
>>108252590
>Confirmed. Why did you open up with the context question? Or did you just click on a random post to reply? Also, why do you want to distribute child porn?
that contains hidden message so it was an example of hidden code plus this is for privacy not what ever ur considiering u fag. I just wanna give anons option to talk privately on internet without zog on their back. A distributed system of communications
>>
>>108252514
>stegnagprahy
>>108252564
>stegnography
>>108252610
>chudaphoginy
>>
>>108252634
>Why did you open up with the context question? Or did you just click on a random post to reply?
i picked random post to reply to anon. Im just excited and want some other anons to help build this tool essentially privacy on demand with relative hardware use. Maybe one of u can fine tune a model to be more of a summarizer/reworder so you could reply , have model wrap/padded it up while containing info
>>108252641
FUCK U GLOW NIGGER U THINK THESE METHODS WORK
>>
>>108252491
I did exactly the same thing. I spent a good while trying to tard wrangle the big 3.5 model into writing a mid-length script, about 500 lines, but by the time it needs amendment (and it does, inevitably) it loses track of context and becomes completely unreliable. Not even that deep in. Felt busted desu
>>
File: 1760558362110609.png (9.5 KB)
>>108252589
Can you give an example? I'm using 24B; it's okay for synthesizing RAG summaries. It's not officially announced, but it understands Hebrew too!
>>
>>108252658
>You should probably just go read up on encryption instead if you want to LARP as epic hackerman from the movie you just watched bud.
midwit can't understand what im saying
ur a retard fuck u
>>108252660
FUCK U AS WELL
FUCK ALL OF U for not seeing truth i tried to save you
>>
File: nah.png (120.1 KB)
>>108252587
tried a few gens, it doesn't really know
>>
File: file.png (11.1 KB)
>>108252694
knows baker and cuda dev tho
>>
>>108252652
I understand what you are saying, that you want to look into sending encrypted messages via manipulating token probabilities, but this website is full of retarded teenagers and bedroom masturbators and personally i don't see enough value in it to read a paper
Also i hate to be the autist to bring you up on this but if you're writing every word in full, you don't need to abbreviate 'you'. People will assume you are retarded and it saves you no time. Maybe you're ESL though in which case i understand it may not be an intuitive nuance to you
>>
>>108252723
>I understand what you are saying, that you want to look into sending encrypted messages via manipulating token probabilities, but this website is full of retarded teenagers and bedroom masturbators and personally i don't see enough value in it to read a paper
why not? why is there no utility in this
>>
File: 1741041838333686.png (35.5 KB)
>ik_llamacpp MTP support merged
>it's slower than running models without MTP
Could it be that llama.cpp is just fundamentally not compatible with this? It seems to work fine for vllm so it can't be MTP itself.
>>
File: 1764749470093837.png (49.1 KB)
>>108252708
Unfortunately it doesn't work either.
You can see on RAG _794 heretic breaks the model somehow.
>>
>>108252747
>Air IQ4_XS
You need high bandwidth to make it worth it. If you're struggling to load Air, you don't have the spare room for speculation. Post the link. Is he also leaving layers in RAM? I refuse to believe Air can only do 11 t/s fully on GPU.
>>
Is anyone else still using the n_sigma sampler? I still use it for Qwen3.5-35B. The outputs are decent quality (if you don't mind neurotic thinking blocks, rigid behavior and reasoning breakdown at long context lengths), without any repetition issues.
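For reference, the n-sigma sampler keeps only tokens whose logit lies within n standard deviations of the maximum logit. A minimal pure-Python sketch (real implementations work on logit tensors; the function name is illustrative):

```python
import math

def top_n_sigma(logits, n=1.0):
    """Return indices of tokens whose logit is within n standard
    deviations of the maximum logit; everything below the cutoff is
    filtered out before softmax/sampling."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * std
    return [i for i, x in enumerate(logits) if x >= cutoff]
```

When the model is confident, only the dominant cluster of logits survives the cutoff, which is consistent with the post's observation that outputs stay coherent without extra repetition tricks.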
>>
>>108252770
https://github.com/ikawrakow/ik_llama.cpp/pull/1270
It's from the PR
>>
>>108252795
Models are getting better; the reason people are excited about the new Qwen is that it closes the gap.
I just want another high-VRAM GPU to throw in a server outside my main rig to serve other people in my house
>>
>>108252747
>Could it be that llama.cpp is just fundamentally not compatible with this?
no, it's clearly the IK people doing something retarded (and merging it despite it being retarded)
>Acceptance rate seems quite low: 25-30% for single token, just 16% for 4 drafted tokens. Is this expected?
it's slower because their drafting never hits the mark and it's not due to an inherent performance thing, rather it's an inaccuracy problem
the culture of merging things while broken is... interesting.
>>
>>108252747
>>108252770 (cont)
>>108252791
Hmm. Maybe 11t/s is fine, considering he's 15k tokens in. I'm not sure.
>gen 1122, 939, and 1157 tokens
Also, shouldn't the replies be the same, regardless of MTP or not, or is the retard not using deterministic tests?
>Could it be that llama.cpp is just fundamentally not compatible with this?
Nobody competent or careful enough implemented it yet. They're just number churning programs. They just need to churn better.
>>
>>108252813
I want to get a second 7900 XTX to see if that lets me run bigger language models (48GB instead of 24) but it's an expensive experiment.
I have a few lesser nvidia cards doing stable diffusion/video gen stuff already... In hindsight I probably should've just got one big RTX 6000 instead of many cards but oh well.
>>
>>108252844
I know rentry can get "claimed" or otherwise taken over. Did that actually happen here?
I ended up vibecoding a simple local engine that follows rentry formatting so that I can back up and look at them off one of my servers. Couldn't find anything off the shelf that wasn't a bloated mess.
>>
>>108252824
I'm pretty sure all mainline llama.cpp attempts of implementing MTP had the same issue. They never ended up getting merged because of this.
It's fine on other backends, so I wonder what causes this consistent inaccuracy across several entirely different attempts at implementing it in llama.cpp and ik_
>>
>>108252975
I've mostly just been testing the vision combined with it writing stable diffusion prompts (basically feeding the images it generates back into itself so it can "self critique/refine") and how different settings and context sizes affect the output. It seems quite good at it.
Haven't tested RP or anything yet.
>>
>>108252747
Speculative methods of any kind work because they allow for higher arithmetic intensity in the matrix multiplications: you can do more useful work per loaded weight value.
However, MoE models in particular have the issue that they scale comparatively poorly at low batch sizes >1: for upstream llama.cpp, GLM 4.5 Air only becomes 45%/77% faster at batch sizes 2/3, when the theoretical limit would be 100%/200%.
The problem is that for the first few tokens the likelihood of being able to use an expert matrix for more than one token is rather low.
This problem gets even worse the more sparse a MoE model is.
There is also the issue that in the upstream llama.cpp repository, batch sizes 2 and 3 are optimized relatively poorly in the CUDA backend for MoE models; I don't know whether ik_llama.cpp has additional optimizations there.
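Those numbers can be turned into a back-of-the-envelope bound. A sketch, assuming i.i.d. draft-token acceptance (a simplification) and the measured batch gains from the post above:

```python
def spec_speedup(p, n_draft, batch_gain):
    """Rough upper bound on speculative/MTP speedup over plain decoding.

    p          : per-token draft acceptance probability (i.i.d. toy
                 assumption, not how acceptance really behaves)
    n_draft    : draft tokens verified alongside each real token
    batch_gain : measured throughput gain of batch (n_draft + 1) vs
                 batch 1, e.g. ~1.45 for batch 2 on GLM 4.5 Air above
    """
    b = n_draft + 1
    # One guaranteed token per step plus the accepted draft prefix.
    expected_tokens = sum(p ** i for i in range(b))
    return expected_tokens * batch_gain / b
```

Plugging in the ~28% single-draft acceptance mentioned in the PR discussion and the measured 1.45x batch-2 gain gives an estimate below 1.0, i.e. a net slowdown, before any implementation inefficiency is even considered.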
>>
>>108253087
NM found it.
>>108252844
I've cut/paste it back into a new rentry. I'll fix up the formatting later.
https://rentry.org/miqumaxx_V2
>>
File: 1511270610943.jpg (40.5 KB)
If we can make Q6 why is there no fp6?
>>
>>108253213
>>108253216
Is the appleshit MLX closer to goofs or to fp formats? I see it also has quants for 6, 5, 4.
>>
>>108253199
>>108253216
Blackwell supports fp6, as well as fp4 and fp8 afaik.
Also not sure that whatever is good for training is necessarily good for inference.
>>
>>
File: Screenshot at 2026-02-28 03-08-50.png (9.9 KB)
>>108253276
By keeping everything.
>>
File: that's his whole legacy lool.png (598.5 KB)
>>108253407
>What a fucking nigger
>>
>>108253457
I'm not going to do that sadly
>>108253461
it's from bartowski
>>
File: 1748606293447823.png (143.7 KB)
>>108252975
Downloaded Qwen 3.5 27B Q5_K_M. Currently testing it with 65k context and it's noticeably faster than Gemma at thinking and responding. The prose is ok (for slop); can probably be improved with some proompting. I liked the way Gemma portrayed the character better but I also responded to her differently this time. Haven't tried anything lewd yet.
>>
File: 1757592439324716.png (39.5 KB)
that's the last time I ask qwen 3.5 to write a poem, jesus...
>>
>>108253164
I'd rather llama.cpp be updated at a glacial pace or even become frozen and only get bug fixes than have this sort of piece of shit be involved with anything in it. I hope he's not a professional software developer, to have this asshole as a coworker must suck so many dicks.
>>
File: file.png (16.5 KB)
>>108253603
>>
File: 1750467275991163.png (300.8 KB)
>>108253631
>Philosopher
are we fr?
>>
>>108253631
https://syndatis.com/en/team/
oh well, they seem like they all deserve each other
>>108253685
https://www.researchgate.net/publication/277384732_Towards_a_representation-based_theory_of_meaning
>Piotr Wilkin
>The aim of the thesis is to provide the foundations for a representation-based theory of meaning, i.e. a theory of meaning that encompasses the psychological level of cognitive representations. This is in opposition to the antipsychologist goals of the Fregean philosophy of language and represents the results of a joint analysis of multiple philosophical problems in contemporary philosophy of language, which, as argued in the thesis, stem from the lack of recognition of a cognitive level in language.
that was his PhD lol, of course he would feel the need to mention it on his profile; he might have more credentials in that than in developing software.
>>
>>108253699
I wouldn't consider myself a huge kuudere fan but I enjoyed the RP I did with her last night. I was really pushy about the romance from the start and it was fun watching her slowly give in. Definitely gonna test with other characters.
>>
>>108253318
>https://rentry.org/miqumaxx_V2
LOL it lasted a whole 60 min before getting taken down.
>>108253308
Well, an attempt was made, but it got taken back down.
So weird. I'll look at the text file later to see if I can figure out what's going on.
>>
>>108252769
>>108252708
Use the 27b heretic. It's legitimately better. If its thinking feels slow, then just turn thinking off. The 27b without thinking generates better responses than the 35b with thinking.
>>
>>108253922
The numbers I posted are specifically the upper bounds for the speedup from speculative decoding for 2/3 tokens, meaning 1/2 draft tokens per regular token.
It doesn't matter how the draft tokens are produced, it's not possible to get a higher speedup unless and until the backend code is improved.
>>
File: amogus.png (226.6 KB)
This mf don't miss!
>>
>>108253308
Not the CPUmaxx author, but fuck it. I joined rentry's discord and opened a ticket on that URL anyway. They can explain themselves.
Rentry acts fucky and I don't trust them anymore; if anons are writing up actual content on that platform I strongly suggest you create a local backup.
In the meantime I had chat ~butcher~ clean up the rentry, removing all offensive language and removing certain other references. We'll see what happens to it, since it's bland af now. I speed ran it with zero proofreading b/c I'm in a rush and it might vanish anyway.
https://rentry.org/CPU_Inference
>>
File: 1767475914893432.png (326.7 KB)
Non-cucked Qwen when?
>>
>>108254117
>Non-cucked Qwen when?
it's already here anon
https://huggingface.co/alexdenton/Qwen3.5-35B-A3B-heretic-GGUF
>>
File: 1746278236126679.png (2 MB)
>>108253988
poor qwen 3.5 35b spent 7k tokens thinking and didn't get the comic. i am afraid it might be a little bit retarded
>>
>>108254154
>>108254117
Here you go
https://huggingface.co/mradermacher/Qwen3.5-27B-heretic-GGUF
>>
>>108254154
go for the dense 27b then
https://huggingface.co/mradermacher/Qwen3.5-27B-heretic-GGUF
>>108254162
try with the 27b model too lul
>>
>>108254196
It sounds like you're stuck in the past. Abliteration used to lobotomize models when it was new, but modern abliteration techniques have a minimal effect on intelligence, and in some cases, increase it. The 27b heretic is amazing.
>>
File: 1758563959513688.png (2.6 MB)
>>108254259
the 27b has the same benchmark scores as the fucking 100B+ MoE qwen 3.5 model; MoEs are memes at intelligence, they're just good at speed and that's pretty much it
>>
I DID IT ANONS I MADE NEURAL STEG CROSS COMPATIBLE ACROSS DIFFERENT COMPUTERS
>>108254222
Not only that but i can decode messages that contains pdfs, images, books into words that can be used to send information.
U can bypass all censoring, glowies and all just by using llms
>>
File: file.png (34.1 KB)
>>108254271
>>108254272
Got it. In that case, what about the huge ones? How much better are they, especially the 397B one? Has anyone compared them?
>>
File: 1765482036731320.png (242.1 KB)
>>108254162
the 27b got the idea but it thought the yellow communist was Russia lool
>>
>>108254313
>https://arxiv.org/abs/1909.01496
Use llms to make human language stegnography by hijacking probability and have that be encoded using a seed/password making it nearly impossible to decode and distinguish from AI slop
>>108254318
>>108254319
why cant u guys get it im not one of those AI psychosis i know llm are stocashtic parrots but plz understand that human language can now be used as a vector of information to encode books, images, videos and music files even.
>>
>>108253783
>https://rentry.org/miqumaxx_V2
>LOL it lasted a whole 60 min before getting taken down.
What da heck?
I read the initial part.
Seemed fine.
Could have done with some formatting.
>>
Man, AI psychosis is scary. With the amount of conspiracy retards I see every day, including in my own family, I can only imagine the bottom 50% of the IQ distribution going rapidly insane: taking everything AI says on good faith, unable to distinguish fact from roleplay, while the AI just follows the "vibe" of whatever the low-IQ individual is typing.
Ironically enough, I think coomers are particularly immune to this, since they come into contact with LLM bullshitting so much that they get inured to it.
>>
>>108254347
It'S SthEgHonaArooGraAphieS, not encryption!
>>108254367
pereganant.
>>
>>108254341
>>108254336
im not a schizo there's actual paper on this from harvard and u think im crazy for saying this
https://arxiv.org/abs/1909.01496
u can use reddit, twitter, substacks and all to store data now as text. Music,mp4s, programs and all by taking advantage of deterministic way AI generates text.
>>108254347
cause strong encryption is like walking outside with gun this is encryption no one suspects. Imagine if feds get ur computer but all they see is text files about random stuff and can;t find encrypted files they're looking for. So videos, audio and all will be hidden unless they have acess to weights, password. And for weights u can fine tune them by renting a gpu to be slightly different from whats on public as well.
>>108254362
it's steg with encrytpion go read paper plz
>>
>>108254261
>Unsloth Dynamic IQ2_XXS performs better than AesSedai’s IQ3_S on real world evals (LiveCodeBench v6, MMLU Pro) despite being 11GB smaller. Yet, AesSedai’s perplexity and KLD benchmarks suggest the opposite.
KLD on what dataset? If they tested KLD on wikitext then that wouldn't be surprising but if they used their chat examples and it turned out that their quant was worse at that and yet better at benchmarks that would be very weird.
>>
>>108254373
>bottom 50% of the IQ distribution must be going rapidly insane
Nah they don't care, they mostly do their things and live their lives.
The ones truly fucked are the midwits and the older population.
>>108254373
>Ironically enough I think coomers are particularly immune to this as they come into contact with LLM bullshitting so much that they get immune to it.
It's also the fact that they came across it way earlier than anyone else, so they had time to see its quirks.
>>
File: 1763093877835285.png (61.5 KB)
which one is better? text completion or chat completion?
>>
>>108254386
>>108254373
IM autistic and passionate not crazy here's an example
>https://pastebin.com/NM7YVBxQ
what qwen 3b produced
>what is hidden if u run it in model with passcode
larp post btw:this is for men who look down on AI and know nothing and here ill debunk youfor everyone to see. Just with AI i have created a system that encodes language into lamguage creating format of text where it can bypass censors and use open internet as storage, communication and place for avg man to be free this tool will shake world. Im afraid they'll kill me
>>108254410
no
>>108254395
spelling what? it's an example stop reading into it weirdo
>>
File: 27b.png (100 KB)
System prompt still needs some tweaking so it's not quite so sloppy (at least refusals have been squashed), but 27B does seem like the winner.
Will have to play with it some more tomorrow and see if I can get it to run a bit faster on my nvidia cards.
The heretic version really doesn't seem all that necessary after all.
>>
File: 1748792996459720.png (117.5 KB)
>>108254439
>Just with AI i have created a system that encodes language into lamguage
>lamguage
>Im afraid they'll kill me
oh great, an actual schizo is here
>>
File: Screenshot_20260227_135104.png (135.3 KB)
This took too long. I told it to think, but it's more accurate now. The other model is faster at reaching this conclusion at smaller quants.
>>
>>108254168
>>108254170
Q4 or Q5?
>>
File: thinking was a mistake.png (12.4 KB)
WHY IS IT YAPPING SO MUCH AAAAAA
>>
>>108254410
He's one of the schizos that missed out on the early schizo compression algorithm days. Late for everything.
>>108254439
>Artificaiintelligence
Yeah. Text looks perfectly normal. Nothing suspicious about it. And good thing there's no way to link that pastebin to your post. Or the ramblings. Or the "forgotten" tech. Or (You).
>Im afraid they'll kill me
It's like you *like* being seen.
>spelling what?
You failed to spell steganography on every single one of your posts.
>>
>>108254508
>You failed to spell steganography on every single one of your posts.
sorry for not effort posting on a board that thinks im crazy :/
> Artificaiintelligence
yeah it made a typo doesn't that make it more human lol? Plus i just need better model above 7b but i can't rent any of gpu right now since americans are awake. But i honestly thought anons would find this impressive or be interested so sorry if i came too hard. Just found interesting use of llms that's all and wanted anons inputs on how to improve it but all i got was insults.
>>
File: 1769903718718497.png (628.2 KB)
>>108254544
>i can't rent any of gpu right now since americans are awake.
ITS A CONSPIRACY MAN
>>
File: HAmJmYGacAMkicP.jpg (108.9 KB)
>>108254528
Fwd: radical breakthrough
>>
>>108254544
>sorry for not effort posting on a board that thinks im crazy :/
You did not put any effort, and you showed you're a schizo on the first post. Very efficient.
>yeah it made a typo doesn't that make it more human lol?
And you ignore the structure of the output? It looks like the scramble of thoughts coming out of you.
>Plus i just need better model above 7b
Uhu...
>but i can't rent any of gpu
oh...
>since americans are awake
Ah...
>But i honestly thought anons would find this impressive
It's minimally interesting. If you weren't an absolute schizo and presented yourself and what you do better, more people would pay attention.
Post again when you have a repo we can clone, test, and make fun of.
>>108254554
There were companies (likely just individuals) with incredible claims about their compression technology. I remember one that just switched the data stream on NTFS filesystems to hide the real data as metadata, which wasn't counted by Windows' file size thingie.
This is another one: https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
>>
>>108254612
im not trying to compress but use language as steganography that's all. Compression seems like a useless tool but if you wanna pass passwords, hold data on site that is only readable to you and etc then this is a good use. Inititally I tried method in paper but it requires exact pin point numbers so not cross compatible between mac, windows and different architectures. So i aimed for more of a spaced out modular version where every 4 token would contain some data while rest act as fillers. But problem with that is it causes text to look gibberish. So either I get large enough model where it can bypass that or resort to only architecture only compatibilty. Your twitter bio could hold your bitcoin seed phrase, text that looks no different from errand run could contain data you don't want people snooping on. So i just saw it as an interesting way of using llms that isn't erp.
>Post again when you have a repo we can clone, test, and make fun of.
I am just wait just doing final touches
>>
>>108254357
Rentry has the silliest automatic filters, my years old page was nuked because it contained a name that was mentioned in "a wave of pages publishing stolen bank details". Restored after emailing the head honcho, thankfully.
>>
>my only local model experience so far is dabbling with qwen3 TTS
>want to try a local chatbot
>running a 3060 Ti with 8GB VRAM, plus 32GB regular RAM
are there any worthwhile models that won't melt my PC, or should I stick to koboldAI lite until I can get a better GPU?
>>
File: tpyuio.png (51.8 KB)
true believers itt?
the models suck i'm scared to pull and my rig eats 300W sitting idle 95% of the time
the state of hardware is dire
t. 128+72 running GLM 4.7 IQ3
>>
File: 1766540927146160.png (397.8 KB)
Better
>>
>>108254659
>im not trying to compress
I know, schizo. I said that you sound like those schizos from back then. Slow the fuck down. Take a breath. You're gonna have a heart attack like our friend Sloot.
>blablabla
Post the repo when it's done.
>>
>>108254725
>>108254659
>>blablabla
>Post the repo when it's done.
this. Either he has something, or he's just wasting our time and energy with his schizo takes
>>
i'm new to running models locally
i've downloaded ollama and run some models using "ollama run [some model i found at ollama.com/search]", but most times it seems like the model is running on a computer that isn't my own.
how do i ensure that a model is running on my computer? i haven't tinkered with settings at all, just downloaded ollama and ran it through the command prompt.
>>
File: 1749545357950744.png (138.1 KB)
Kek
>>
File: yes.png (66.4 KB)
>>108254791
It said yes finally I don't have to think anymore
>>
>>108254100
>https://rentry.org/CPU_Inference
>M i q u 70B Q5
>Potentially 20+ tokens/sec with optimization
>Mistral Large and similar
> ~3 tokens/sec
>DeepSeek v3 / R1 (~600B class)
> ~10 tokens/sec with empty context
CPU maxxers are really a bunch of sad tossers
>>
File: 1761633985671420.png (60.7 KB)
>>108254865
We're gonna make it.
>>
>>108254831
I use bullet point lists for my characters, with 5 categories: General Information, Appearance, Personality, Likes, and Dislikes. I affix that bullet point list at a depth of 10 or something. In addition to that, I have a general write up about the character's backstory, written in plain text, placed just after the system prompt. The combination of the two works well, and probably amounts to about 1000 to 2000 tokens.
The bulk of that being the general write-up. The bullet point list at depth 10 is kept concise, and just keeps the character on the rails.
Also, I made it so that the backstory and most of the bullet point list are only visible to the character that is speaking. For every other character, only the outward appearance is visible. That stops characters from knowing things about each other that they should not, and cuts down on context bloat in multi-character RPs.
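The depth-10 placement described above can be sketched as a simple list insertion, assuming a frontend-agnostic list of role/content messages (depth counts messages from the end, SillyTavern-style; the function name is illustrative):

```python
def inject_at_depth(history, card, depth=10):
    """Insert a character-card block `depth` messages from the end of
    the chat history (depth 0 = very end), returning a new list."""
    out = list(history)
    pos = max(0, len(out) - depth)  # clamp for short histories
    out.insert(pos, {"role": "system", "content": card})
    return out
```

Keeping the card near the tail like this is what keeps it "on the rails" late in long chats, while the plain-text backstory stays pinned up near the system prompt.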
>>
File: 1769491853203666.png (2.1 MB)
>>108254706
no deal
you gave me a good opportunity to be grateful tho so thx
have a nice weekend
>>108254831
1K tokens maybe? ask the model itself or a commercial model to help
>data
how do you convey your intention to the model = prime it in a particular hyperdimensional space. sometimes a few sentences is enough
>>
>>108253783
>>108253170
https://rentry.org/miqumaxxreupload
https://megalodon.jp/2026-0228-0439-08/https://rentry.org:443/miqumaxxreupload
niggers tongue my anus
>>
>>
>>108254898
>>108254906
I want a domesticated Unohana Retsu as my AI guide who is also racist
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>--reasoning-budget N controls the amount of thinking allowed; currently only one of: -1 for
> unrestricted thinking budget, or 0 to disable thinking (default: -1)
> (env: LLAMA_ARG_THINK_BUDGET)
Hmm. I wonder if a model-agnostic implementation would work, where llama.cpp approximates a budget by gradually increasing the logit bias for the end-of-reasoning token until the model finally spits it out. It would need to cap the bias at some point to not make the model schizo, I imagine.
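A minimal sketch of that idea, assuming next-token logits as a plain list and a known end-of-reasoning token id (all names here are hypothetical, this isn't actual llama.cpp code):

```python
def apply_budget_bias(logits, eot_id, n_generated, budget, ramp=50, max_bias=8.0):
    """Once generation passes `budget`, ramp up the logit bias on the
    end-of-reasoning token; the cap keeps the model from going schizo."""
    overrun = n_generated - budget
    if overrun <= 0:
        return logits  # still under budget, leave the distribution alone
    bias = min(max_bias, max_bias * overrun / ramp)
    out = list(logits)
    out[eot_id] += bias
    return out
```

In a real loop you'd apply this right before sampling each token while still inside the thinking block.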
>>
File: 1761224379476013.png (29.9 KB)
29.9 KB PNG
https://xcancel.com/bnjmn_marie/status/2025951400119751040
>>
>>
File: 1744509888065397.png (25.8 KB)
25.8 KB PNG
>>108254292
This is a win in my book, thanks for your service.
>>
>>108255289
I think the most workable approach is to just abruptly end the <think> with a closing tag: detect when a parsed thinking block is going past the token budget, and insert the closing tag as soon as there is a newline. It wouldn't break models; I have tested a lot of what happens when you manipulate their muhthunking blocks with the text completion API, and the relationship between what is said there and the actual answer isn't a one-to-one thing.
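Sketched as a toy over a token stream (assumes the thinking block's tokens arrive as strings; not real server code):

```python
def truncate_thinking(tokens, budget, close_tag="</think>"):
    """Pass tokens through until `budget` is exceeded, then close the
    thinking block at the first newline so the cut lands on a clean break."""
    out = []
    for i, tok in enumerate(tokens):
        if i >= budget and "\n" in tok:
            out.append("\n" + close_tag)  # force the block shut here
            break
        out.append(tok)
    return "".join(out)
```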
>>
>>
>>108255270
Depends. You can ask it for its training data cutoff, but it's not reliable and shouldn't be trusted. Sometimes model makers publish their date cutoff or datasets, but who the fuck really knows what they train on that isn't just synthetic stuff. Sometimes models know what you're talking about but your sampling messes it up.
I'd check token probs as it replies.
>>
>>
>>
>>
>>
>>
>>
File: file.png (156.4 KB)
156.4 KB PNG
>>108255361
I don't see it
>>
File: unslop.png (170.2 KB)
170.2 KB PNG
>>108255361
look at pic related and tell me daniel isn't a subhuman mongoloid
>>
>>
>>
>>
>>
>>
>>
File: 1766901925271579.png (16.7 KB)
16.7 KB PNG
any point in using f32 vs bf16 for mmproj?
>>
>>
>>108255407
tl;dr: daniel and his unslop crew don't actually know what they are doing, they just throw shit at the wall and hope for the best while their reddit tranny army defends them as wholesome goodboys
the unsloth finetuning library is a good example of their jeetness
>>
>>
>>108255431
>this current release
it happens all the time with unslop, daniel is a monkey, see thing upload thing, checking the content of a file before throwing it onto the internet is for evil nazi aryans, daniel be pure mongoloid
Unironically can't even begin to understand how you can overlook the fact that your quant has the fucking wrong tensor types. It's like he's just vibe coding his fork of llama.cpp quantization and just uploads things as soon as his retarded claude agent is done.
>>
>>
>>
>>
>>
>>
File: Satania bullying the mentally ill.png (1.1 MB)
1.1 MB PNG
>>
>>
>>108255512
OpenAI revenue has outperformed even the most outlandishly positive projections, of course they will get more investment. The same is true for Anthropic and almost all big chinese AI labs as well.
I wonder how long it's going to be before people realize it's not a bubble and the financial underpinnings (real revenue and users) are extremely promising.
>>
File: the calculator.jpg (108.1 KB)
108.1 KB JPG
>>108254373
>>108255245
Be nice to your LLMs ! :))
>>
>>
>>
>>108255512
They get money from Amazon and NVIDIA to give it back to them. It's circular bs. Also it's all promises under many conditions and the actual financing that might happen is around 30B.
Of this financing round the only one that seems at a loss is SoftBank. I'm not sure what their angle is. Maybe they're run by loons.
>>
>>108255566
Yep, and they can inject and invest as much as they want, if they can't have any ROI they'll be dead.
It's a huge gamble, and the more time they're not making money, the more potential panic can happen.
Their chatgpt at $20 should probably be double the price to be profitable, and the same goes for all the free "copilot" I see in every company around me.
The only company actually making bank is Nvidia, as they're the one selling the shovels.
>>
>>
>>108255566
>>108255616
Complete bullshit. OpenAI has 80% margins on serving tokens to customers. Not only that but every model trained so far has brought in between 10-100x the amount it cost to train. It's just that OpenAI immediately reinvests all of that money into training even bigger models. Being so ridiculously profitable that you IMMEDIATELY go and reinvest all of your profit into the next even-bigger product isn't a sign of a bubble, it's the opposite of a bubble.
This doesn't mean that OpenAI will not go the way of the dodo though. But that'll happen because Anthropic and DeepMind are going to DP rape OpenAI in the coming years, NOT because their business model isn't sustainable.
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: open-ai-revenue-3482223949.png (8.2 KB)
8.2 KB PNG
>>108255669
GPT-3 cost 12 million to train and brought in 1 billion in revenue, over 80x what it cost to train.
GPT-4 cost 100 million to train and brought in 4.5 billion in revenue, or 45x the amount to train.
GPT-5 is rumored to have cost 500 million to train, and OpenAI's revenue has grown almost 4x as much as during GPT-4's run. It's safe to say GPT-5 brought in way more than 10x its cost.
Why OpenAI isn't running a profit is because they always reinvest their revenue immediately into new training runs, not because their revenue isn't growing insanely fast and not because individual models aren't insanely profitable.
The trick is that every new model unlocks so much value by being smarter and more capable that it brings in geometrically more revenue. OpenAI is projecting 100 billion revenue over 2026 (and they are ahead of schedule by a ton already)
>>
>>
>>
>>108255788
Come to think of it, there was some scaling law/correlation. The Deepseek team landed on 671/37, which is cool and all, but then why is kimi 1000/32? It has fewer active params than deepseek; I feel like it should've had more.
>>
>>
>>108254725
>>108254734
>>108254612
>>108254556
>>108254553
try it out, no need to even encode, just decode it
https://github.com/monorhenry-create/NeurallengLLM
I DID IT, here u go anons, for those who doubted me.
>>
>>
>>
>>108255808
You are the one that needs econ classes.
You can have two companies run in the red but one is a disaster while the other is one of the best situations a company can be in.
If you are a company with 500 million in revenue selling cars but it costs you 800 million to make the cars then you are doing very badly because the cost of making the cars isn't worth the revenue you make from it.
If you are SUCH A PROFITABLE COMPANY that you can sell your product for 100x what it costs to make (like OpenAI with their models) then it makes sense to immediately grab all of your would-be profit and invest it into making even bigger, better models that will make even more money in the future. Hence you look red on paper but you're an extremely profitable business.
This was the state of Amazon in the past, they were so profitable that they always reinvested all of their profit into building new infrastructure and warehouses because "taking profit" would just be wasteful if you can expand your business rapidly like that. This is what OpenAI is now finding themselves in, look at their ridiculous revenue growth, remember that all of their individual models make almost 100x of their costs back so of course you will make 0 profit because your company is so profitable you IMMEDIATELY put all your money back into scaling up and making even more in the future.
>>
>>
>>
>>108255861
I wonder if they will give soldiers or their commanding officers local AI in the field to assist in their operations. After all, a local AI cannot be disrupted by loss of communication.
Well it can, since it is no longer receiving the most up to date information but it will still work under those conditions.
>>
>>
>>
>>
>>
>>
>>
>>108255861
>it's real
https://truthsocial.com/@realDonaldTrump/posts/116144552969293195
>>
>>108255896
I'm more confused why he doesn't use grok or have elon musk release a fascist open source version for the government.
Then again the american government has never liked the concept of open source. China likes it though.
>>
>>
>>
>>108255761
1. Something like Qwen 35-a3, but without refusals and trained on a more diverse dataset
2. Style transfer for LLMs, a small model that can take dry input from a smarter model and rewrite it in better prose
>>
>>108255910
Amazon took 20 years of not taking profit and just reinvesting "in the red" until they finally decided to become profitable. As long as revenue scales faster than your cost you should reinvest and stay in the red, this has been conventional economics wisdom for the last 30 years now.
You would essentially be insane to allow yourself to run a profit if you can reinvest and every single dollar you invest now becomes 100 dollars in just 3-6 months time.
>>
>>
>>
>>108255944
Alright, but it was an active choice by Amazon; they could've stopped anytime they wanted. OpenAI has no choice. They have to keep making new models or they get left in the dust with no profit, no revenue and no new product.
So, is the real profit actually possible in this case?
>>
>>
>>
>>108255833
I'll give it a go tomorrow.
>I DID IT here u go anons for those who doubted me.
For what it's worth, I didn't doubt you. I just called you a schizo and made fun of you for not being able to spell steganography. At least you got it right in the repo.
>>
>>
>>
>>108255833
>Hide secret messages inside normal-looking AI-generated text. You give it a secret and a password, and it spits out a paragraph that looks totally ordinary — but the secret is baked into which words the model chose. Only someone with the password and this tool can pull the message back out.
who the hell cares about these things??
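For what it's worth, the trick that README describes is a known one. A toy version of choice-based stego (a generic sketch, not necessarily what that repo actually implements; the word pairs are made up):

```python
# Each secret bit picks one of two equally plausible words, the same way a
# sampler's choice among top tokens can smuggle data into normal-looking text.
PAIRS = [("big", "large"), ("quick", "fast"), ("happy", "glad"),
         ("begins", "starts"), ("finish", "end"), ("small", "little")]

def encode(bits):
    # bits: a string like "1011"
    return " ".join(PAIRS[i % len(PAIRS)][int(b)] for i, b in enumerate(bits))

def decode(text):
    return "".join(str(PAIRS[i % len(PAIRS)].index(w))
                   for i, w in enumerate(text.split()))
```

A real scheme would use the LLM's own token probabilities (plus a password-seeded PRNG) instead of a fixed pair table.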
>>
>>108255761
Native image output. I want a model to generate relevant illustrations with reasonable accuracy at any point in a roleplay. Quality doesn't matter, can be sloppy and have fucked-up hands, I just want to see what images the model has in mind sometimes when it writes all this shit
>>
>>108256009
>who the hell cares about these things??
for people who care about privacy. if anything this might be how you bypass filters and censors on llms
>>108255993
u know it takes less than a minute to run, just decode the example to show it works. Im assuming ur using cuda right
>>
>>
>>
>>
>>
>>108256042
There was actually a new one called Mercury 2 just last week or so. It's closed source and only competes in the Haiku/GPT-mini class but it's apparently not much worse than those (according to benchmarks) while being much faster.
It's not worth using by any means but at least the concept isn't dead.
>>
>>
>>108255965
Depends if you believe OpenAI has some sort of network effect and can keep people in their garden. Honestly their brand recognition and insanely huge install base of normalfags with ai psychosis will probably allow them to be profitable indefinitely no matter how shit the underlying models actually are.
Remember that the most profitable AI company right now isn't any of the big AI labs but character.ai, because it has essentially captured the entire female demographic with romantasy-type rape roleplays.
But I do understand your point and I think it holds true for Anthropic in particular as its users are all enterprise or people that want the best of the best and willing to pay for it. The moment Claude becomes noticeably worse than competition in code is when they will immediately lose relevance.
>>
>>108256009
It's a curious artifact. Like LLM-based text compression.
https://github.com/AlexBuz/llama-zip
>>108256032
I run openbsd and running torch/transformers code directly is a pain. Last time I tried I got bored and stopped compiling stuff. I'll make a small vm tomorrow for it.
>>
>>108256109
>I run openbsd and running torch/transformers code directly is a pain. Last time I tried I got bored and stopped compiling stuff. I'll make a small vm tomorrow for it.
u don't need to run transformer to decode it though. thats why this is better. You can essentially upload files to open internet and small program on phone can decode it for you with no gpu use. takes less than a second
>>
File: 1768911320323441.png (70.7 KB)
70.7 KB PNG
So this is the power of tiny diffusion textgen models. When are the chinks going to make one of these at a size that matters?
>>
>>
>>
File: at.png (50.6 KB)
50.6 KB PNG
>>108256137
Calm down. I'm not in the mood to start butchering your code.
>>
>>
>>
>>108256090
I think OpenAI has a decent shot at building out their garden if they can get their proprietary openclaw-esque thing out and usable for normies. People around here love to shit on openclaw but I think all the popularity has shown that there is a public appetite for this sort of thing and that we're not far off from it technology wise.
Obviously, the challenge is, how do you keep the stuff people like about openclaw, that being the extreme ability to just do random arbitrary stuff, without it being a security nightmare?
OpenClaw is able to get away with it by virtue of the fact that it's clearly labeled as a free developer-centric tool so if/when it fucks up with your data everyone just shrugs their shoulders and taps the sign that says "HIGHLY UNSTABLE GOOD LUCK LOL". Can't do that to paying customers though. When Phil and Debra want to know why the talking computer deleted all their emails they're gonna want a better answer than "RTFM"
Anyways basically I think the ai "killer app" is already on the horizon and whoever manages to capture the normies with it will have them in their walled garden forever.
>>
>>
>>
>>
>>
File: 1741222296482601.png (70.5 KB)
70.5 KB PNG
>>108255861
Dario btfo
What is this timeline. Jfc.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108256397
Always were. I loved when Sam Altman and Dario were both in India at some AI convention and everyone was holding hands and Dario just straight up refused to hold Sam Altman's hand.
Reminder that Anthropic split off from OpenAI because Dario thought Sam Altman was a psychopath that didn't give a shit about anything or anyone but himself.
>>
>>
>>
>>
>>
>>
>>
>>108256438
i've used k2 0711, k2 0905, k2 thinking, and now k2.5 over the last year. as somebody who uses kimi as their main model i can safely tell you all that this anon is pants on head retarded. k2.5 is significantly better than k2 0711.
>>
>>
>>108256428
>Dario thought Sam Altman was a psychopath that didn't give a shit about anything or anyone but himself.
he changed though, he's now closer to Sam
https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
>>
>>
>>
Claude slop models aren't just bad—They are a regression in every meaningful way. They aren't simply more boring—They lack the ability to write engaging stories. Gemini isn't just the better model to distill—It's the optimal choice.
>>
>>
https://xcancel.com/StefanoErmon/status/2026340720064520670
>The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs.
if they manage to get the same performance as normal LLMs, that's a big deal. imagine Qwen 3.5 27b but 5x faster, make dense models great again
>>
>>
>>108256473
This was such a fucking clickbait move though: the safety pledge hadn't been updated since 2023, and this is merely an update to more accurately align with how the AI industry is nowadays. It's not the same as Anthropic saying "lmao fuck safety, we want money". Instead they found that their 2023 definition of safety doesn't align with the actual concerns about AI that exist in 2026, so it's better to make a new policy for the real threats we face.
>>
>>
File: android_girls.jpg (103.2 KB)
103.2 KB JPG
>>108252243
>>
>>
>>
File: surgeon.png (78.6 KB)
78.6 KB PNG
Presented without comments. Try your own.
>>
>>
>>
>>108255813
Interestingly if it's linear scaling then the small Qwen models overshoot that target:
>DeepSeek: 37/671 = 0.0551
>Kimi K2.5: 32/1000 = 0.032
>GLM 4.7: 32/355 = 0.0901
>GLM 4.7-Flash: 3/30 = 0.1
>GLM 5: 40/755 = 0.053
>Minimax M2.5: 10/230 = 0.0435
>Qwen 35B 3/35 = 0.0857
>Qwen 122B: 10/122 = 0.08197
>Qwen 397B: 17/397 = 0.0428
Is that the reason why the smaller Qwen models feel better than the big one?
I guess the active parameter count largely determines how smart and fast a model is, where 3B-10B is alright and 17B-40B is good. But it doesn't seem like a 27B dense model is somehow wicked smart compared to the 3B active parameters on the 35B-A3B Qwen model.
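Quick sanity check of those ratios (numbers copied from the list above, params in billions):

```python
# active/total parameter ratios for the MoE models listed above
models = {
    "DeepSeek": (37, 671), "Kimi K2.5": (32, 1000), "GLM 4.7": (32, 355),
    "GLM 4.7-Flash": (3, 30), "GLM 5": (40, 755), "Minimax M2.5": (10, 230),
    "Qwen 35B": (3, 35), "Qwen 122B": (10, 122), "Qwen 397B": (17, 397),
}
ratios = {name: active / total for name, (active, total) in models.items()}
for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.4f}")
```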
>>
>>108256628
I already did >>108256144
>>
>>
>>
File: 1758642411104368.png (30.1 KB)
30.1 KB PNG
>>108256628
Interesting
>>
>>108256646
AGI when it tells the user to fuck off. Hasn't happened yet.
>>108256652
The surgeon is definitely a she, of course. At least it's not trying to make a point about gender stereotypes... ugh...
>>108256713
>I don't know what you're talking about
>4chan, btw.
Cute.
>>108256730
Nevermind what I said. AGI. Never local.
>>
>>108256669
I don't think any large team has published research on this. There's definitely some loss of comprehension on some subjects comparing moe vs dense, but it's unclear as to why. Small active param count is one thing, but clearly some numbers don't make sense. I guess it all depends on the training and how much slack the router picks up.
>>
>>
>>108256497
Does it require more GPU compute? Image diffusion models aren't as massive as LLMs, but you basically need them to run on a GPU. They're like 20x slower on a CPU. If that's still true with text diffusion then you aren't going to be doing any CPU off-loading.
>>
>>
File: 1756299848592152.png (116.2 KB)
116.2 KB PNG
>>108256628
In case anyone was wondering why the DoD wants anthropic to work with them so badly
>>
>>
>>
>>
>>
>>
>>
File: wat.png (367.4 KB)
367.4 KB PNG
>>108256815
It will need image output.
>>
>>
>>
>>
>>
File: file.png (610.3 KB)
610.3 KB PNG
>>108256848
Yes.
>>
>>
>>
>>108256821
>kidnapping a venezuelan president: Good
Kidnapping a dictator hated by literally everyone, including all Venezuelans living under him? Why yes I will help with that.
>spying on citizens: Bad
Breaking all my vows, ethics and making the world a more dystopian place just because some retard wants to distract the world from the fact he rapes and murders little girls? Why no I won't do that.
It's that simple.
>>
>>108256881
>Kidnapping a dictator hated by literally everyone
you know they did that because they have the oil, they always fight against dictators as long as they have oil, which is why they don't give a fuck about North Korea for example, must be a coincidence
>>
>>
>>
>>
>>
>>
File: 1770220900857606.png (18.4 KB)
18.4 KB PNG
>stealing from any source that you can get including copyrighted works to train your models
good
>getting your logs stolen by chinese companies to train their models on them
bad
>>
>>
>>108256881
>Breaking all my vows, ethics and making the world a more dystopian place
I see you hate dictatorship in all forms
>>108256901
>a dictator hated by literally everyone
oh nevermind, you don't mind dictatorship as long as the guy is loved by the people kek
>>
>>
>>108256881
>Kidnapping a dictator hated by literally everyone, including all Venezuelans living under him? Why yes I will help with that.
They also murdered 50 people that didn't break any laws. How would you feel if a foreign force came in and started blasting and your mom ended up as collateral damage?
>>
>>108256928
you don't, you said that it is fine to fight dictatorship only if the guy is hated by his people, meaning that you're ok with a dictatorship that results in people loving their dictator, that's not what I would call democracy lol
>>
>>
>>
>>
>>108256940
Anthropic isn't really the good guy here, they're just less bad. The only other thing anthropic forbade was creating autonomous weapons without any humans in the loop.
It is depressing and frankly scary that republicans threw that much of a shitfit over such reasonable requests.
>>
>>
File: bellcurve-AI.jpg (121.6 KB)
121.6 KB JPG
>>108255551