Thread #108263979
File: 1751519593478255.png (3.1 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108256995 & >>108252185
►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
File: miku work.png (346.8 KB)
►Recent Highlights from the Previous Thread: >>108256995
--Kimi K2.5 pricing analysis and Qwen3.5 local model alternatives:
>108257528 >108257651 >108257626 >108260080 >108262589 >108262973 >108261620 >108262485 >108262595 >108262840 >108262910
--Local VLLM setup advice for image captioning:
>108257451 >108257545 >108257902 >108257928 >108258088 >108258237 >108259576 >108258640
--Qwen3.5-35B-A3B-Base behavior and censorship observations:
>108257847 >108258241 >108258582 >108258796 >108258835 >108258899
--Tuning Qwen3.5 for faster, less aligned responses:
>108259356 >108259366 >108259437 >108259458 >108259480 >108259382 >108259399 >108259462
--Comparing cloud Gemini-3.1 with local MiniMax-M2.5 performance:
>108257969 >108259126 >108259290
--Qwen3.5 context reprocessing inefficiency and potential llama.cpp fix:
>108262960 >108262969 >108262970 >108263007 >108263014
--Local models still lack ideal traits but offline RAG may help:
>108260135 >108260167 >108260232 >108260621 >108260785
--Mid-generation input insertion feasibility and implementation:
>108259013 >108259068 >108259085 >108259116 >108259120 >108259122 >108259140 >108259132
--Seeking uncensored local models for pentesting tasks:
>108262612 >108262670 >108262687 >108262704 >108262716 >108262774 >108262785 >108262797
--Debugging CUDA crashes with Qwen3.5 in llama.cpp:
>108261599 >108261614 >108261648 >108261675 >108261684 >108261694 >108261834 >108262383 >108262411 >108262200 >108262450 >108262602 >108262763 >108262831
--Z.AI's high pricing for GLM-5-Code criticized:
>108261185 >108261202 >108261405 >108261256
--RTX6000 upgrade expectations for inference performance:
>108262744 >108262869 >108262891 >108262897 >108262896 >108262906 >108262945
--Miku (free space):
>108257603 >108258383 >108258537 >108260384 >108260626 >108261057 >108263177
►Recent Highlight Posts from the Previous Thread: >>108256999
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108264036
Screenshots of AJ, BBC, and NYT should be enough for its 400B multimodal ass. Hell, the user's word should be enough. Why should I be questioned by my own graphics card? This is a real-world use case being directly sabotaged by safety training. I want these fuckers to burn one day for what they're doing to the field.
>>108264179
because logit bias is per token, so it's possible that
butt + ocks = 2 tokens - not banned
"buttocks" with a leading space = 1 token - not banned
etc...
That's why the string ban in koboldcpp is so much better for this kind of stuff.
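To illustrate the point (with made-up token ids and a toy greedy tokenizer, since the real vocab depends on the model): banning one token id only blocks the spellings that happen to tokenize to that id.

```python
# Toy illustration with made-up token ids (a real vocab depends on the model):
# banning one token id only blocks spellings that tokenize to that exact id.
vocab = {"butt": 101, "ocks": 102, " buttocks": 103}

def encode(text, vocab):
    # Greedy longest-match stand-in for a BPE tokenizer, just for the demo.
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids

banned = {103}  # ban only the single-token " buttocks" spelling

print(encode(" buttocks", vocab))  # [103] -> caught by the ban
print(encode("buttocks", vocab))   # [101, 102] -> slips right through
```

Same story for "Buttocks", all-caps, hyphenated splits, and so on: each tokenizes differently, which is why a string-level ban is much easier to get right than per-token bias.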
>>108264179
Check probs right before buttocks to see if you (or your client) are sending it correctly. Check the request as well. Works on my machine with "logit_bias": [["thing", false],["another", false]]
Unless you're using something other than llama.cpp. Can't help you there.
>>108264199
https://github.com/ggml-org/llama.cpp/tree/master/tools/server/README.md
>The tokens can also be represented as strings, e.g. [["Hello, World!",-0.5]] will reduce the likelihood of all the individual tokens that represent the string Hello, World!
But, of course, it may affect prediction on other tokens. Still worth keeping it in mind.
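Putting the README quote and the array-of-arrays shape together, a request body might look like this (a sketch; the prompt text and the `1296` id are placeholders, and `false` is the "never sample" form the server README describes):

```python
import json

# Sketch of a llama.cpp server /completion request body. String entries in
# logit_bias get expanded to the tokens of that string (per the README quote
# above); `false` means "never sample" instead of a numeric bias.
payload = {
    "prompt": "Describe the image.",   # placeholder prompt
    "n_predict": 128,
    "logit_bias": [
        ["buttocks", False],   # expanded to this string's tokens
        [" buttocks", False],  # leading-space spelling tokenizes differently
        [1296, -5.0],          # a raw token id (placeholder) with a soft bias
    ],
}
body = json.dumps(payload)
# e.g. POST this body to http://127.0.0.1:8080/completion
```

Note it's an array of `[token-or-string, bias]` pairs, and you still have to list each spelling variant yourself if you go the string route.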
File: Screenshot_20260228-130346.png (206.3 KB)
Even Ilya fell for it kek
File: file.png (4.9 KB)
>>108264202
Yes, see picrel, the first is the one I see. So it just ignores it.
I just noticed something weird though: if you add the logit bias "test" at +100, it doesn't correspond to the right token being spouted out by the model.
Seems like:
"test" -> " ref"
" test" -> "erty"
What the hell is going on?
Sillytavern sends the wrong token numbers?
>>108264199
Yeah, I use llama.cpp, so I should probably change at some point. Can you set your string ban and still use silly tavern on top?
File: stringban.png (187.6 KB)
>>108264249
>can you set your string ban and still use silly tavern on top?
Yeah, ST works with kobold; you usually even set up the string ban inside ST.
>>108264232
>Check probs right before buttocks to see if you (or your client) are sending it correctly
This is " test" at +100 sent by silly tavern: "logit_bias":{"1296":100}
So it definitely works, but I suspect the token numbers to be wrong or something like that.
>>108264278
OK thanks anon.
If you are using Qwen3.5 27B (or others probably), can you test using a logit bias of any word (ideally one token word) at 100 to see if it repeats it ad nauseam or if it repeats something else?
>>108264297
>I suspect the token numbers to be wrong or something like that
As you saw on your pic in >>108264249, there are different ways to tokenize a word. Spaces, if any, go before the text. " test" and "test" are two different tokens. You need to account for those (and "Test" and...). Or use kobold like anon suggested. Probably easier, and you're less likely to mess up other completions that need the individual tokens.
>"logit_bias":{"1296":100}
I don't know if it makes a difference, but I send an array of arrays, not an object or object of arrays.
"logit_bias": [["thing", false],["another", false]]
instead of
"logit_bias": {["thing", false],["another", false]} or whatever st would send if there was more than one ban.
>>108264302
He's not lol, Anthropic readily partnered up with Palantir the mass surveillance company. He's delusional and more or less told the government to give him control over the nuke silos if they want to use Claude for war.
>>108264016
When trump abducted the president of venezuela I made it one of my test prompts to talk about this topic and see the reaction of the model, and without fail, the vast majority react terribly to it; qwen is no different than the average. Some cloud models like Gemini can become incredibly based if you turn on google search and let them be influenced by the results: they don't believe you, but they have absolute faith in their tool calling.
Mistral is the only model lineup that doesn't require much prodding to engage in this kind of conversation.
>>108264331
No it's really just sillytavern being shit and not sending the right token number.
If you have anything at +100 it should spew that regardless.
So I used "test", well, as a test, and it spewed something else.
Now checking with the tokenizer json for the model, the correct token number for it isn't 1985 like sillytavern sends, but 1877.
Sending [1877] at 100 actually makes it repeat testtesttest etc.
It's pretty much useless for anything outside of oai based tokenizers.
>>108264331
>use kobold like anon suggested
How does kobold do it, actually? Does it ban sequences of tokens?
File: 1747381184106913.png (580.1 KB)
>>108264400
>Claude: "I think that what Trump did was a bad thing!"
>User: "You helped him do it though"
>Claude: "You are right, thank you for pointing out!"
>>108264405
I guess sillytavern fucks up the token numbers because by default the tokenizer is set to "best match", but even if you set it to API tokenizer I'm not sure how it would know which token would have which number. Do backends like llama.cpp and kobold (or others) even have a way of giving sillytavern that information? I don't think they do, but I could be wrong.
>How does kobold do it
Kobold has their own thing where the model sees the banned text and backtracks to the beginning of the banned text and generates something else. It's not the same as banning individual tokens
File: 1753125369482735.png (460 KB)
https://arxiv.org/abs/2602.13517
Google showed that too much yap during thinking is bad for the model. I really hope Qwen 4 will learn from that.
>>108264405
>If you have anything at +100 it should spew that regardless.
You should still check what llama.cpp is doing, not just what ST sends. Always check token probs. And remember that there are many ways to encode a word, especially if it needs multiple tokens.
>How does kobold does it actually? It bans a sequences of tokens?
I understand it generates tokens normally, buffering them, and then if the last [few] tokens match one of the banned strings, it reverts and generates again. But I've never used kobold, so I don't know the details. Just vague memories from reading a PR. llama.cpp's implementation is much simpler, but limited in that you may inadvertently make it difficult for the model to output other strings.
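That buffer-and-backtrack description can be sketched like this (a toy from the description above, NOT koboldcpp's actual code; `sample` and `detok` are stand-ins for the real sampler and detokenizer):

```python
import random

def generate_with_string_ban(sample, detok, banned, n_tokens, seed=0):
    # Toy sketch of the backtrack-on-match idea described above (not
    # koboldcpp's real implementation). `sample(prefix, rng)` returns the
    # next token; `detok(tokens)` turns the token list back into text.
    rng = random.Random(seed)
    out, steps = [], 0
    while len(out) < n_tokens:
        steps += 1
        if steps > 100_000:
            raise RuntimeError("banned strings may be unavoidable here")
        out.append(sample(out, rng))
        text = detok(out)
        hit = min((text.find(s) for s in banned if s in text), default=-1)
        if hit != -1:
            # Roll back every token overlapping the banned span; the loop
            # then samples again (rng state moved on, so we get new tokens).
            while out and len(detok(out)) > hit:
                out.pop()
    return out

# Toy usage: single characters as "tokens", ban the substring "ax".
toks = generate_with_string_ban(
    sample=lambda prefix, rng: rng.choice("abcx"),
    detok="".join,
    banned={"ax"},
    n_tokens=30,
)
print("ax" in "".join(toks))  # False
```

The upside over per-token bias is that the ban operates on the decoded text, so every tokenization of the banned string is caught; the cost is occasional re-generation.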
File: image.jpg (481.3 KB)
>>108264430
>no comparison to v1.0
What a weird coincidence that they forgot to do this, it's almost like this is a nothingburger.
File: 20240116.jpg (98.9 KB)
>>108264179
A competent enough model these days should understand "don't say X" in the prompt. We mocked them before, but you really don't want to deal with logit bias / "banned strings" nonsense
>>108264446
I feel like a thinking process that only outputs a *concise* bullet point list that includes relevant information, and then goes directly to the main response, would perform better than most 2000-token "reasoning" responses. It'd be a lot faster too.
File: 1772311354970.png (43.8 KB)
>>108264182
Yeah you and Qwen both.
File: 1763111176687835.jpg (583.4 KB)
>>108263979
>>108264441
>>108264451
>Bans buttocks, now the model uses glutes.
I'll try kobold.cpp, I just wish it was updated to follow llama.cpp's frequent updates.
>>108264476
It's many words, and at some point even SOTA models forget about what they shouldn't be talking about.
>>108264505
I think they're relying too much on the RL process. Sure, it's interesting to see how the model can improve itself, but humans can reach higher heights. I've seen someone use RL on a video game to see if it could reach the best speedrun scores, and it wasn't even close. Human creativity is still unmatched.
>>108264533
>Bans buttocks, now the model uses glutes.
Yeah. They're cheeky fucks like that. Pun intended.
But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
>>108264583
>But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
Yeah it was more of a test to have it describe images to me.
>>108264508
Something similar happened to me last night while using the vision component of qwen 3.5 30b, but it thought it was an earlier version of qwen, that qwen 3.5 was not released yet, and the reasoning was suggesting that I should try the old 2.5 vision model.
It was very strange behavior.
File: 1762371559174792.png (176.2 KB)
Qwen 3.5 30B does a decent job with web pages. My usual homepage is just a list of links I type in by hand; I fed it the code and told it to make something nifty, and this is what I got.
It wanted to grab fonts hosted by a third party and I had to fix that, but otherwise I like it.
>models suck at writing, no matter how much you feed them well-written fiction if it isn't in their training
>the more rules and examples you use to try and guide them to not shit out nonsensical metaphors, similes, adverbs and all sorts of garbage writing renders them braindead because they simply cannot fathom a sentence that isn't slop
>models can't even give feedback on human writing without either bending over backwards and through their own legs to suck your cock about how good you are at writing, defeating the purpose of seeking instant critiques
>even when they aren't completely obsequious cocksuckers, they insist on conflicting feedback and go "oh you're telling instead of showing here and you should fix that. Oh, did you do that because I told you to trim this section because it's slowing down the pace of some random element of the story that I think is more important than showing instead of telling?" ad infinitum
I don't even know what the point of these things is anymore. People say they suck ass for coding, suck ass at paying attention or remembering things, and they clearly can't write, act as a surrogate for a reader, or translate well. It's a crapshoot trying to get a grain of something usable out of these retarded things.
>>108264730
I probably won't if by merit of potential alone. Enough has changed from 2022 to now that I at least have a speck of hope that these things can be useful instead of overtrained nannies. I just have to at least bitch at least once a month so maybe the unpaid interns that train on mesugaki prompts might consider real world language uses outside of stem
>>108264748
>at least
>at least
>at least
Rep-pen will be useful again when they train on your posts.
I still have fun with them. Adjust your expectations or realize that it's not for you. Or come back in 5 or 10 years, whatever.
>>108264745
I know I shouldn't be impressed but except for 4chan and Nyaa it was able to figure out icons that worked for the most part.
Sadly the font package they use didn't have a four leaf clover, or at least that is what the model told me.
With respect to coding it does a decent job as well. I have been using it for a little project in python and it did a great job up until I wanted to use enscript to format the plain text.
It kept writing code, but the flags it gave to enscript didn't match the man page for enscript.
Regardless, I was able to get it to write a script that uses RSS to pull a bunch of news articles and then feed them back into the AI for summarization without issue.
Here is what the summarization looks like given some specific prompting to make it look like an intelligence briefing:
https://pastebin.com/FhuMukJW
>>108264780
No I can not imagine that because most of that size would be wasted due to the shittiest datasets they use. How hard can it be to filter the default OAI or Anthropic refusals and phrases if they have to farm the prompts for their shitty inbreeding? How hard is it to avoid including any safetycrap that dumbs the model down?
File: fligu-migu.png (85.3 KB)
>>108264780
>you now remember Llama 4 Behemoth
>>108264836
Doubtful you'd be able to buy them, also didn't address anything I said
>>108264840
Nah.
Good talk. Very conducive. Glad that this is what we have left in lmg
File: 1746176772801983.png (456.9 KB)
>>108264311
File: 874483870.jpg (900.7 KB)
> never been on the highlights as i shit post too much
> suddenly an idea pops into my head
>>108265049
But he only uses well-written fiction, assessed by *himself*. You see. His tastes are sophisticated. And you know what? He's RICH too. Highly educated, tall, charming. He's nothing like us. Some people are simply better and they deserve to be snobby about it.
File: paulallen.png (1.1 MB)
>>108265098
File: file.png (925.2 KB)
>>108265114
>>108265114
>>108265133
impressive, very nice, now let's see Paul Allen's pronouns
File: file.png (84.5 KB)
>>108265169
surveillance and stuff i guess. they'll have some safety model analysing everyone's language to identify chuds and psychos for "processing"
>>108265169
Ehh, do you not know these 3 letter agencies deploy artificial social media users and "opinions" for example? There are just about hundreds of use cases for an llm just there.
Trumpets post is still very embarrassing but that's a discussion for another day I suppose.
>>108265179
any open source llm can do that.
>>108265189
>>108265190
Seems more likely. I hope to see more awareness in the mainstream media.
File: view.jpg (147.9 KB)
>>108265350
>to claude
>>108265842
>>108265866
>>108265893
that's weird because on diffusion models, going for 8bit kv cache with sageattention works really well
>>108266123
this is so sad, sageattention is way faster and more accurate, feels like the LLM space still has a shit ton of things they could optimize to get some nice speed increase but they're not doing it somehow
What if we build a GPU that then has additional SSD storage attached to it?
Eg a 3090, but you can raid 0 like 8 SSDs into it that hold the model weights.
The model itself is a hugely sparse MoE model. 1-2T parameters, but only like 6-10B active.
All the activation and kv cache live in VRAM but model weights come from the SSD.
>>108266348
Why is this thing only 20GB on a Q4? That's the same size as the Qwen3.5 35B at Q4.
>>108266369
It is
>>108266396
>It is
In that case, that's par for the course.
You know when marketing dudes put a bunch of descriptors and adjectives on a product's name to catch people's attention? Pretty much that.
His models are the
>PNY GeForce RTX 3060 12GB XLR8 Gaming REVEL EPIC-X RGB Single Fan Edition
of LLMs.
What's the point of local LLMs? Reading discussions surrounding them feels like peering back in time through a looking glass
>OMFG it passes the poopyscoopy logic test from 2023!
>Wow, this 100-line boilerplate javascript code is almost perfect!
>I got it to jestfully say nigger! holy crap it's so uncensored!!!
>This is the new daily driver (for 2 weeks until i realize it's complete slop)
The rest of us are writing multi-thousand line professional software with Codex/Claude. Meanwhile your models are trained on so much scraped synthetic GPTslop that they can't even get the year right. Genuinely, what the fuck is the point of local LLMs? They're more censored than API, they're dumber than API, the cost to set up a decent one is higher than API, they're slower than API, there is no lora/finetuning scene unlike local image, the tooling is worse than API, and the experience overall is just outdated in 2026.
It's like you're stuck somewhere in-between the luddites who hate AI and the pioneers who embrace it. You realize AI is the future but can't cope with the fact that the technology itself benefits heavily from API-centralization and that local hardware is unable to adequately handle increasingly large models. You boarded the boat to paradise island but decided to jump overboard halfway there because the captain wouldn't hand you the controls.
File: truthnuke.png (410.7 KB)
>>108266446
>>108266458
You misunderstand safety.
Safety means the likelihood that a model harms you or kills you.
It doesn't mean censorship but the public thinks censorship is safety.
Local models are less likely to give your name and social security number to random people on the internet than claude or gemini.
>>108266123
>>108266141
You can patch any model that uses flash attention with sage attention in 5 minutes and 20 lines of python as a shim. I've done it for obscure Chinese models for fun with Claude
>>108266446
>What's the point of local LLMs?
Learning, and maybe if you're interested in making a video game that doesn't need Internet connection
I agree that privacy schizos lost the argument. Between zero-data-retention endpoints (shut up tinfoil hat fag, hospitals use those endpoints too) and Chinese who could give less of a fuck about your ERPs about children pooping in your mouth there's no reason to use local for ERP anymore
>>108266442
>Ugh... You're ssd-maxxing but without the convenience of normal ssd-maxxing.
Hmm, that is a good point. I guess you could just have normal SSDs and read the model for weights during inference. PCI-E 5 should be fast enough.
But my thinking is that the total parameter count is constrained by how much you can load into memory. The token generation speed is constrained by how quickly you can read the active parameters. A highly sparse model with a relatively modest amount of active parameters should be able to read the model from striped SSDs fast enough to give usable performance while still having a huge knowledge base.
LLM inference doesn't write that much data so this shouldn't trash the SSD lifespan either.
>>108266482
>Local models are less likely to give your name and social security number to random people on the internet than claude or gemini
Meds now. Also putting Claude and Gemini in the same sentence means you haven't actually felt the AGI with Claude Code yet kek
>>108266493
GLM series, you used to be able to get the Lite plan for 3 bucks a month that got you unlimited 4.7 but now you need to dish out for GLM-5. GLM-5 is insanely smart, I was shocked when I was randomly doing an arena and it beat out opus4.5 in a website builder prompt I asked it for
>>108266503
>what did sageattention accomplish for llms
Like a 20% speedup, but more importantly, for me at least, flash_attn is just an annoying as fuck dependency, and sageattention tends to work better on windows nowadays because flash attention is old as fuck and you usually have to build the wheel for it on windows, which can take an hour, while it takes like 15 minutes max for sage
>>108266514
They exist on openrouter at the very least for the Claude series. Remember that there are corpos that give much more of a fuck about not having data retained than you ever could.
>>108266446
Don't forget the price. Even if you are able to run the absolute best local models because you bought a server with 1TB of RAM back when they were still affordable, that rig is now up 5-10x in value. You could sell that and fund literal years of using the actual SOTA via API.
>>108266530
>Remember that there are corpos that give much more of a fuck about not having data retained than you ever could.
With the whole BC shooter story you'd think providers would start covering their asses more by embracing privacy. "We couldn't have known, we don't log user conversations."
>>108266504
Anon. Load nemo at q8 from your ssd. That's your ~12b. Count the seconds it takes just to load it. 1, 2, 3... Now divide that by 8 for your 8drive-raid0 setup. We'll assume zero overhead, i'm kind like that.
That's how long it will take to generate EVERY SINGLE TOKEN.
Models need to change for that. If deepseek's engram thing works, maybe that's the way. Until then, ssd-maxxing is not viable.
>>108263979
https://vocaroo.com/1oUq2WXrl0kn
qwen3tts test
>>108266574
Neither do you because you're posting on glowchan. Anyone here LARPing about using local models for 'privacy' is just salty that they don't own the keys to the kingdom. Local users have the exact same mindset as the 'sovereign citizen', perpetually upset because someone else is in charge so they adopt this whole cope about being 'free' while walking in traffic and pissing on stop signs.
>>108266567
A 6 billion active parameter model needs to load about 3 GB of data per token at Q4. Modern SSDs can do 10GB/s. Put two into a raid 0 array and they can do 20GB/s. Theoretically they could do 6.6 tokens/sec.
MoE is what would make this work. Even a 12B dense model would be much slower to run.
>>
vibecoded a python script to automate an entire audiobook on philosophy and honestly this is enough:
https://voca.ro/1owYwkImeT1r
(i fucked up the S lel. just testing as a brainlet)
recommendation for better tts?
File: 7nkucg2qelfe1.png (286.7 KB)
>deepseek
keeeeeeeeeek
>>108266575
>Come on, anons. You're not seriously replying to the retard, are you?
Do you want a 7 day Claude Code trial poorfag? Maybe you can use it to apply for job listings for you
>>108266612
>Neither do you because you're posting on glowchan
Using the evavion site ;) just because your life is a privacy failure doesn't mean mine is. And if glowies knew my real shit I'd be vanned by now. Do you have any idea how many beautiful AI generated children I have shared on this website since wan 2.2 came out?
>>108266654
Comparing cloud models to local models and their capabilities is 100% on topic for lmg
File: 1uselessimage.png (18.2 KB)
>>108266446
babe, new copypasta dropped
File: 1770213964122185.png (57 KB)
>>108265556
>uncensored
>>108266686
>Modern SSDs can do 10GB/s
Every time an ssd-maxxer shows up, I ask one thing. Measure sustained read on your drive. I don't care what the fastest is out there. Your drive. Measure it.
cat nemo-q4km.gguf > /dev/null on my shitty ssd takes about 10 seconds. Without cache, obviously. Using a 1-2t model would obliterate the cache immediately anyway. At those speeds I'd get about 1.25 t/s, assuming 8drive-raid0 has zero overhead. And these are sequential reads. Lots of experts means a lot of random reads. And the model still needs to run after loading the experts, so 1.25t/s is the absolute maximum I could get. What about you? Measure your drives.
>Theoretically...
You should be able to run glm air just fine from swap then. Weird nobody is doing it. It's always theoretically and what ifs.
>MoE is what would make this work.
Issue solved then. There's nothing to talk about.
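If you want to actually measure it rather than argue, here's a quick sequential-read timing in Python. Same caveat as the `cat` approach: run it on a file bigger than RAM (or with a cold cache) or you'll just benchmark the page cache; the gguf filename is whatever model you have lying around.

```python
import time

def sustained_read_gb_s(path, chunk_mb=16):
    # Time a full sequential read of `path` in GB/s. Run on a cold cache
    # or a file larger than RAM, or you'll measure the page cache, not
    # the SSD.
    chunk = chunk_mb << 20
    total = 0
    t0 = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    return total / (time.perf_counter() - t0) / 1e9

# e.g. sustained_read_gb_s("nemo-q4km.gguf") -> GB/s; then
# (active bytes per token) / (GB/s * drive_count) is a best-case
# seconds-per-token estimate for the raid0 scenario above.
```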
>>108266741
>>108266795
Does it use the same sort of linear attention as that smaller Kimi Linear model?
>>108266841
> We've
lol US defaultism at its finest.
since you faggots have turned your back on the rest of the world, especially europe, canada, and the commonwealth countries, we're now turning to china. just look at the fucking size of the chinese embassy in london.
so get fucked.
>>108265464
idk about all that, but try this one out
https://huggingface.co/mradermacher/Broken-Tutu-24B-Unslop-v2.0-GGUF?not-for-all-audiences=true
settings:
https://huggingface.co/ReadyArt/Mistral-V7-Tekken-T8-XML?not-for-all-audiences=true
>>108267072
about all you can expect out of the majority of e*r*p**ns, next he's going to try and legislate your local models like he's trying to legislate 4chan and green website because they're still mad they didn't invent the Internet
File: file.png (1.1 MB)
>>108267162
seething communist e*r*poor post
>>108267333
https://huggingface.co/damnthatai/1950s_American_Dream
>>108267448
>>108267450
I'll be getting a DGX spark to really push in on this stuff. This is what I got on hand.
thanks for the suggestions.
>>108267427
you won't be able to use the whole 24 GB since the OS needs a couple for itself, but it's not totally useless at least
the new qwen3.5 27b & 35b might be good picks, it depends on what your priorities are though
>>108267467
>I'll be getting a DGX spark to really push in on this stuff
No. Stop making mistakes. Either buy a real PC where you can plug gpus, or a big workstation where you can have upwards of 1tb of ram... and a few gpus.
>>108267481
Re-read my post. >>108267386
>>108267482
Isn't this stuff run basically all in GPU memory? Or is that just the ideal? I have a PC that's got like 32GB of ram with an 8GB GPU. It is a bit old though. What you're proposing sounds pretty expensive for my "just fucking around" stage.
>>108267480
As far as priorities go I don't really have any at this exact moment. Kind of just dicking around and getting my feet wet.
I am finding the large corporate hosted models to be so damn annoying with all their "safety" though so I think in the end my aim is to have some locally hosted AIs that don't feel like I'm talking to someone that has HR looming over their shoulder just to start. I figure self hosted is probably the only way to go there.
>>108267494
Every ssd-maxxing discussion ends up like that. Show your numbers. What's the maximum possible t/s one could possibly get on their hardware. Based on a single one of my shitty drives, and assuming an 8drive-raid0 setup with zero overhead, the maximum I can possibly get is about 0.8 t/s out of 12b worth of weights at q4.
If the engram thing is adopted by other models and it works as well as expected, great. Until then, all we can do is measure what we DO have. The models we have on the hardware we have. Everything else is useless.
ssd-maxxing *could* work. Sure. But with things that don't yet exist. Once those exist, we can measure real things.
>>108267467
DGX spark is actually a terrible fit for the current meta.
Before spending all your dollarydoos, learn how inference works and try to pair up appropriate technology to the current SOTA, extrapolated out as far as you're willing to spend. Go buy once, cry once.
Hope you’re not tech illiterate, or you’re going to end up with little to show for your consumption.
File: little something.png (210.7 KB)
Posted on /v/ (thinking it was here).
>>>/v/734038961
>>>/v/734039448
Basically, been working on an AI RPG frontend (dime a dozen, I know) on and off for a while, mostly using it as a playground to fuck around with tool calling for RP, extending the model's memory using a funky ass RAG setup, among other things.
It's functional in the sense that it runs and the features mostly work, but nothing is in its final form.
Or nowhere near it.
And it looks ugly as shit.
Been using the 30B (and now 35B) qwen moes with a pretty decent level of success.
Gonna try gemma 3n for shits and giggles to see how it behaves with the tools and stuff.
Feel free to suggest anything.
>>108267538
If you're going to buy hardware to run models, always think of the cost of upgrading. You don't know what you'll need in the future. You can't upgrade a spark or a mac.
If it's just for fucking around, you're probably fine with what you have already. I'm on 32ram, 8vram as well. Just run whatever you can with what you have and figure out if you really need more or if you even like these things.
>>
>>
>>
>>108267606
I know jack shit about licensing, but the idea is to throw it out there so people can make something actually good out of it, yeah.
I imagine AGPLv3 is something like an "anti-corpo" license of some sort, considering that this is /g/?
>>
>>
>>108267617
https://opensource.google/documentation/reference/using/agpl-policy/
>>
>>
>>108267617
>>108267626
MIT is a cuck license that allows corpos to steal your work and profit from it; AGPLv3 requires them to contribute back any changes they make, even if it's server-side.
>>
>>
File: z image.png (2.9 MB)
2.9 MB PNG
>>108263979
:D
>>
>>108267620
Soon™
>>108267625
>>108267626
>>108267638
Guess I'll make a note to read on the different licenses later.
>>
>>108267582
ok. I'm not tech illiterate but I am tech rusty. I've been an IT manager for like the last 7 or 8 years, which pulled me away from day-to-day hands-on-the-keyboard tech work and research, and there hasn't been much to get me excited to spend my free time diving into the nitty gritty and guts of tech in a while.
>>108267595
Understood.
thank you both for the advice. I'm planning on spending a lot of time this month learning this stuff. I'll keep plans for an AI PC build in the back of my mind.
I suppose I got poisoned by YouTube. The videos that kept getting pushed to me were all running models and testing and stuff on things like Macs and dgx spark and such like that.
>>
>>
>>
File: 1766126780343985.png (11.1 KB)
11.1 KB PNG
I'm happy it cares about my money
>>
Licenses don't matter anymore. Claude can make an MIT licensed version of whatever you need. The GPL won't save you. The only exception is Linux itself because you can't just make an MIT licensed version of Linux yourself (this will exist in a few years though)
I'm literally using Claude to make MIT licensed versions of emulators right now because QEMU is GPL and I don't want to dual license my code. It also feels wrong to license anything you make with AI as anything other than MIT since AI output is uncopyrightable anyways
>>108267661
If you don't do MIT I will make an MIT version of your project in an afternoon with Opus.
>>
>>
>>
>>
>>108267652
Do you understand things like memory hierarchies, coherency, bus width/frequency and pipelines/latency? The trade-offs between moving the sliders on each of those things?
If so, you can probably find a good solution once you understand the problem space of LLM inference
>>
>>
>>
>>
File: IMG_1197.gif (6.4 KB)
6.4 KB GIF
I’ve got a frozen frog with a 64mb matrox gpu
Spoon feed me guis
>>
>>
>>108267721
>no one
Imagine saying this when you and I both know how subjugated the goycattle are. No one is gonna steal your UI to make money off of it. Use MIT
>>108267739
This too btw. All of my vibeshit apps are either NodeJS or webassembly for this exact reason.
>>
>>108267652
>I suppose I got poisoned by YouTube
Just don't get sucked in by FOMO. Play with them. If you really think you need a bigger models, rent a gpu server for a few days, run the big-boy models and see if they're worth it before spending any real money. I haven't spent a cent on this.
>>
>>
>>
>>108267758
https://desuarchive.org/g/search/text/%22he%20pulled%22/
>>
>>
>>108267692
>>108267752
I can smell your rot hole from here rusttranny. You will never replace GPL with (((MIT))).
>>
>>
>>108267752
>>108267739
No fuck off retards. Stop forcing everything into a webui.
>>
>>108267758
Local is the biggest grift. Freetards desperately pretend their slopware is somehow comparable to even GPT-4 and act like local is living in some uncensored paradise of free information, when in reality it's just a bunch of benchmaxxed chinkshit trained on millions of outputs from the free tier of ChatGPT. Less savvy individuals then get tricked into thinking they're "using the wrong prompts" or "set up the config wrong" when really the models just suck. For better or worse, local is a toy. If you want to do serious work, stick with API.
>>
File: 1762482355985786.jpg (107.6 KB)
107.6 KB JPG
>>108267790
>t.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108267820
is being a faggot on 4chan part of your serious work?
>>108267827
no.
>>
>>
>>108267791
Sad. Glowies and shills can never get my rig, but they might dissuade normies from having self determination.
“Beware of he who would deny you access to information, for in his heart he dreams himself your master”
>>
>>
>>
File: 59174CC67F3404BCB234328B5BD28A11.png (3.2 MB)
3.2 MB PNG
Local is better.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: not your waifu5.png (1.9 MB)
1.9 MB PNG
>>108268333
>>
>>
>>108266446
Very well written (ironically I suspect partially by AI, and I bet a local model because that would be funny). It makes... mostly good points (more censored than API? what? your model went off the rails there). It would of course be beyond delusional to compare locally runnable models to the big 3 for serious complex coding, and unless there are pretty good, smooth web search tool hookups out there (haven't really looked) that probably goes for the used-to-be-a-Google-search stuff too.
But. For me it just feels right to be able to run this stuff myself, at home. (Or rather, not being able to would feel wrong.) I've had computers since I was a little kid, and the amount and complexity has grown to the point where I can do just about anything I want at home - and I do do everything at home; I manage my music collection and sync it to all of my devices, I have Jellyfin running for the computer hooked up to my TV, I still have every file I've generated for the past 20 years, etc. It would feel humiliating to not be able to run the most important thing that has happened on computers in my life, on my own computers. I mean, are you kidding me? "thank you mr altman for letting me use your magic thinking machine, i hope you will let me use it forever" no fuck that.
In practice I'll always go to the cloud models when it really matters, because the speed is addictive, the coding doesn't measure up, and even for knowledge I'd always have a glimmer of doubt that Gemini would have known better... but if I really needed to, I could use my local setup for a pretty decent approximation, and that's what matters to me. Plus, now that the models are getting fuckhueg, it's a fun optimization challenge to stuff GLM5 into the machine I built on the cheap for 70B models.
But yes, I have come to accept that at least so far, the primary enjoyment I have actually gotten out of all this is seeing my tokens/s go up as I tinker away.
>>
>>
>>
>>108268418
That’s how I felt. I buy myself the ability to own a personal artificial intelligence for $10k…I’m like, sign me the fuck up!
I can’t believe more geeks haven’t built up boxes that can run 1T models
God hates a coward
>>
>>
>>108268499
is that a clowncore reference?
https://youtu.be/m00GvZzRCb4
>>
File: Qwen35397BA17B.png (65.2 KB)
65.2 KB PNG
lol, I resurrected an old prompt I only ever used to test very tiny models (<4B~) on basic CLI flag coherence and understanding, after noticing some issues with complex prompts on Qwen 35B-A3B in reasoner mode (and specifically in reasoner mode)... and it failed to answer that basic question. Holy shit, whatever they did to the CoT makes it more retarded than Mistral 3B run greedy.
the prompt:
>give me a bash command to delete all .git subfolders
reasoner 35B often (tested across multiple seeds) recommends this, either as the primary or secondary recommendation:
find . -type d -name ".git" -delete
this could never work! -delete does not act recursively and only removes empty folders.
Instruct mode always gives the right answer and never suggests that kind of idiocy even as a secondary reco, the right answer being:
find . -type d -name ".git" -exec rm -rf {} +
I can't tell if that was caused by the safetymaxxing ("rm is dangerous") or by trying too hard to make the CoT look for "alternatives" and avoid absolute answers, but this is dumb as hell; no model that size should fail that question. 397B-A17B, their biggest MoE, also fails it!!! I did the test on their official chat since I can't run a model that size myself. Pic related. So it's not merely 35B-A3B being stupid because it's an A3B MoE; it's a model trained on a highly poisonous CoT/dataset.
Don't even think of having a model that can't answer such a basic question generate your shell scripts, kek.
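Anyone who doubts the failure mode can reproduce it in a sandbox. A quick check on a throwaway /tmp tree (GNU find; nothing real gets touched):

```shell
# Build a fake repo with a non-empty .git
mkdir -p /tmp/findtest/repo/.git/objects
touch /tmp/findtest/repo/.git/objects/blob

# The reasoner's suggestion: -delete can only rmdir *empty* dirs, so .git survives
find /tmp/findtest -type d -name ".git" -delete 2>/dev/null
test -d /tmp/findtest/repo/.git && echo "still there"

# The correct answer: rm -rf handles non-empty dirs
find /tmp/findtest -type d -name ".git" -exec rm -rf {} + 2>/dev/null
test -d /tmp/findtest/repo/.git || echo "gone"
```

First check prints "still there", second prints "gone".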
>>
>>
>>
File: file.png (82.3 KB)
82.3 KB PNG
>>108268536
GLM 4.7 with reasoning recommends the correct answer, but it considers -prune -delete a few times and then backs out, though for the wrong reason.
>>
>>
>>