Thread #108263979
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108256995 & >>108252185

►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: miku work.png (346.8 KB)
►Recent Highlights from the Previous Thread: >>108256995

--Kimi K2.5 pricing analysis and Qwen3.5 local model alternatives:
>108257528 >108257651 >108257626 >108260080 >108262589 >108262973 >108261620 >108262485 >108262595 >108262840 >108262910
--Local VLLM setup advice for image captioning:
>108257451 >108257545 >108257902 >108257928 >108258088 >108258237 >108259576 >108258640
--Qwen3.5-35B-A3B-Base behavior and censorship observations:
>108257847 >108258241 >108258582 >108258796 >108258835 >108258899
--Tuning Qwen3.5 for faster, less aligned responses:
>108259356 >108259366 >108259437 >108259458 >108259480 >108259382 >108259399 >108259462
--Comparing cloud Gemini-3.1 with local MiniMax-M2.5 performance:
>108257969 >108259126 >108259290
--Qwen3.5 context reprocessing inefficiency and potential llama.cpp fix:
>108262960 >108262969 >108262970 >108263007 >108263014
--Local models still lack ideal traits but offline RAG may help:
>108260135 >108260167 >108260232 >108260621 >108260785
--Mid-generation input insertion feasibility and implementation:
>108259013 >108259068 >108259085 >108259116 >108259120 >108259122 >108259140 >108259132
--Seeking uncensored local models for pentesting tasks:
>108262612 >108262670 >108262687 >108262704 >108262716 >108262774 >108262785 >108262797
--Debugging CUDA crashes with Qwen3.5 in llama.cpp:
>108261599 >108261614 >108261648 >108261675 >108261684 >108261694 >108261834 >108262383 >108262411 >108262200 >108262450 >108262602 >108262763 >108262831
--Z.AI's high pricing for GLM-5-Code criticized:
>108261185 >108261202 >108261405 >108261256
--RTX6000 upgrade expectations for inference performance:
>108262744 >108262869 >108262891 >108262897 >108262896 >108262906 >108262945
--Miku (free space):
>108257603 >108258383 >108258537 >108260384 >108260626 >108261057 >108263177

►Recent Highlight Posts from the Previous Thread: >>108256999

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
simple
and
clean

is the way
that
youre making me

feeeeeeel
tonight

its hard to let it

go
>>
Jesus christ Qwen 397 is actually unusable user-hostile garbage. For safetyfags there is no death too extreme.
>>
>>108264016
lmao. can't you tell it to google it?
>>
>>108264016
>model shuts down if it sees something not in its training set as 'anti-jailbreak' measures
the absolute fucking state of safetyschizos
>>
>>108264016
>2024 training data
How long was this in the oven, jeez.
>>
>>108264072
You don't need more data just use rag lol
>>
>>108264036
Screenshots of AJ, BBC, and NYT should be enough for its 400B multimodal ass. Hell, the user's word should be enough. Why should I be questioned by my own graphics card? This is a real-world use case being directly sabotaged by safety training. I want these fuckers to burn one day for what they're doing to the field.
>>
>>108264110
>I want these fuckers to burn
Be the change you want to see.
>>
>>108264134
They got you working weekends now, Agent Johnson?
>>
>>108264147
Work erry'day.
>>
Qwen3.5 27B is kind of obsessed with the word buttocks (in image descriptions), despite me banning it, why doesn't it care?
I added these logit biases:
buttocks -100
_buttocks -100
>>
>>108264016
I feel like I'm looking at gemini or claude, it's kind of sad.
>>
>>108264179
because logit bias is per token, so it's possible
butt + ocks = 2 tokens - not banned
(space)buttocks = 1 token - not banned
etc...
That's why the string ban in koboldcpp is so much better for this kind of stuff.
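If you do stick with logit bias, you at least have to cover the surface forms yourself. A minimal Python sketch (the helper is hypothetical; it targets llama.cpp's string form of logit_bias, which reduces each string's component tokens, but a weird split like butt + ocks mid-word can still slip through):

```python
def bias_variants(word, bias=-100.0):
    """Build a llama.cpp-style logit_bias list covering common surface
    forms of a word: bare, capitalized, and with a leading space.
    Each form usually maps to a different token (or token sequence)."""
    forms = {word, word.capitalize(), " " + word, " " + word.capitalize()}
    return [[form, bias] for form in sorted(forms)]

# e.g. for the request body:
payload = {"logit_bias": bias_variants("buttocks")}
```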
>>
>>108264179
Did you check the logits of the response to confirm that those are the tokens getting spit out?
Also, ban the tokens instead of fucking with the log probs.
>>
>>108264179
Check probs right before buttocks to see if you (or your client) are sending it correctly. Check the request as well. Works on my machine with "logit_bias": [["thing", false],["another", false]]
Unless you're using something other than llama.cpp. Can't help you there.
>>108264199
https://github.com/ggml-org/llama.cpp/tree/master/tools/server/README.md
>The tokens can also be represented as strings, e.g. [["Hello, World!",-0.5]] will reduce the likelihood of all the individual tokens that represent the string Hello, World!
But, of course, it may affect prediction on other tokens. Still worth keeping it in mind.
>>
Even Ilya fell for it kek
>>
>>108264241
>>108263864
>>
File: file.png (4.9 KB)
>>108264202
Yes, see picrel, the first is the one I see. So it just ignores it.
I just noticed something weird though: if you add the logit bias test at +100, the token the model spits out doesn't correspond to it.

Seems like:
"test" -> " ref"
" test" -> "erty"

What the hell is going on?
Sillytavern sends the wrong token numbers?

>>108264199
Yeah I use llama.cpp so I probably should change at some point, can you set your string ban and still use silly tavern on top?
>>
>>108264241
>AI proxy wars
>>
File: stringban.png (187.6 KB)
>>108264249
>can you set your string ban and still use silly tavern on top?
Yeah, ST works with kobold. You usually even set up the string ban inside ST.
>>
>>108264249
>Sillytavern sends the wrong token numbers?
Yes.
When using the logit bias feature, you are better off using the token IDs directly.
>>
>>108264241
I wonder if this is just PR among AI people or they actually believe Dario is le brave resistance lol.
>>
What does your LM say about war?
>>
>>108264232
>Check probs right before buttocks to see if you (or your client) are sending it correctly
This is " test" at +100 sent by sillytavern: "logit_bias":{"1296":100}
So it definitely works, but I suspect the token numbers to be wrong or something like that.

>>108264278
OK thanks anon.
If you are using Qwen3.5 27B (or others probably), can you test using a logit bias of any word (ideally one token word) at 100 to see if it repeats it ad nauseam or if it repeats something else?
>>
>>108264241
Dario being a hero isn't something I'd like to see in my timeline. Dude singlehandedly fucked up a generation of LLMs with his crappy safetyism.
>>
>>108264241
what's going on? i haven't been paying attention and would like a storytime
>>
>>108264311
scamtman is building killbots
>>
>>108264297
Haven't tried Qwen3.5 yet. Old Qwens were all shit for RP and no one has actually convinced me this changed.
>>
Would be funny if they confiscated Claude's weights and then they got leaked
>>
>>108264297
>I suspect the token numbers to be wrong or something like that
As you saw in your pic in >>108264249, there are different ways to tokenize a word. Spaces, if any, go before the text. " test" and "test" are two different tokens. You need to account for those (and "Test" and...). Or use kobold like anon suggested. Probably easier, and you're less likely to mess up other completions that need the individual tokens.
>"logit_bias":{"1296":100}
I don't know if it makes a difference, but I send an array of arrays, not an object or object of arrays.
"logit_bias": [["thing", false],["another", false]]
instead of
"logit_bias": {["thing", false],["another", false]} or whatever st would send if there was more than one ban.
>>
>>108264302
He's not lol, Anthropic readily partnered up with Palantir the mass surveillance company. He's delusional and more or less told the government to give him control over the nuke silos if they want to use Claude for war.
>>
>>108264321
ruh roh
>>
>>108264302
>Dario being a hero isn't something I'd like to see in my timeline
he's not a hero, he helped Trump kidnap the Venezuelan president, what are you talking about?
>>
>>108264355
he's on a different timeline, bro, don't mind him
>>
>>108264016
when Trump abducted the president of Venezuela I made it one of my test prompts to talk about this topic and see the reaction of the model, and without fail, the vast majority react terribly to it; qwen is no different than the average. Some cloud models like Gemini can become incredibly based if you turn on google search and let them be influenced by the results: they don't believe you, but they have absolute faith in their tool calling.
Mistral is the only model lineup that doesn't require much prodding to engage in this kind of conversation.
>>
>>108264331
No it's really just sillytavern being shit and not sending the right token number.
If you have anything at +100 it should spew that regardless.
So I used "test", well, as a test, and it spewed something else.
Now checking with the tokenizer json for the model, the correct token number for it isn't 1985 like sillytavern sends, but 1877.
Sending [1877] at 100 actually makes it repeat testtesttest etc.
It's pretty much useless for anything outside of oai based tokenizers.

>>108264331
>use kobold like anon suggested
How does kobold do it actually? Does it ban sequences of tokens?
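One way to stop guessing token numbers is to read them straight out of the model's own tokenizer.json. A sketch, assuming the HF-style layout (a "model" → "vocab" map) and a GPT-2-style BPE vocab where a leading space is stored as "Ġ"; both are assumptions, so check your file:

```python
import json

def token_id(tokenizer_json_path, piece):
    # GPT-2-style BPE vocabs store a leading space as 'Ġ',
    # so " test" is listed in the vocab as "Ġtest".
    if piece.startswith(" "):
        piece = "Ġ" + piece[1:]
    with open(tokenizer_json_path, encoding="utf-8") as f:
        vocab = json.load(f)["model"]["vocab"]
    return vocab.get(piece)
```

Then send the resulting ID directly in "logit_bias" instead of trusting the client to tokenize for you.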
>>
>>108264400
>Claude: "I think that what Trump did was a bad thing!"
>User: "You helped him did it though"
>Claude: "You are right, thank you for pointing out!"
>>
>>108264355
I meant hailed as a hero in my news timeline...
>>
https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
let's go, GGUF 2
>>
>>108264405
I guess sillytavern fucks up the token numbers because by default the tokenizer is set to "best match", but even if you set it to API tokenizer I'm not sure how it would know which token would have which number. Do backends like llama.cpp and kobold (or others) even have a way of giving sillytavern that information? I don't think they do, but I could be wrong.
>How does kobold do it
Kobold has their own thing where the model sees the banned text, backtracks to the beginning of the banned text, and generates something else. It's not the same as banning individual tokens.
>>
>>108264426
I don't think claude is that incompetent. They didn't hit a single military target
>>
https://arxiv.org/abs/2602.13517
Google showed that too much yap during thinking is bad for the model, I really hope Qwen 4 will learn from that
>>
>>108264405
>If you have anything at +100 it should spew that regardless.
You should still check what llama.cpp is doing, not just what ST sends. Always check token probs. And remember that there are many ways to encode a word, especially if it needs multiple tokens.
>How does kobold do it actually? Does it ban sequences of tokens?
I understand it generates tokens normally, buffering them, and then if the last [few] tokens match one of the banned strings, it reverts and generates again. But I never used kobold, so I don't know the details. Just vague memories from reading a PR. llama.cpp's implementation is much simpler, but limited in that you may inadvertently make it difficult for the model to output other strings.
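That buffer-and-revert loop can be sketched as a toy. This is NOT kobold's actual code, just the concept; the sample(prefix, attempt) hook is invented for illustration, where attempt counts how many times we've been forced to resample at that position:

```python
def generate_with_string_ban(sample, detokenize, banned, max_tokens=64):
    """Toy string-ban loop: generate normally, and when the decoded
    text contains a banned string, roll back every token overlapping
    it and resample from that position."""
    out = []
    attempts = {}  # position -> number of resamples forced there
    while len(out) < max_tokens:
        tok = sample(out, attempts.get(len(out), 0))
        if tok is None:  # sampler signals end of stream
            break
        out.append(tok)
        text = detokenize(out)
        hit = next((b for b in banned if b in text), None)
        if hit:
            start = text.index(hit)
            # pop tokens until the decoded text ends before the match
            while out and len(detokenize(out)) > start:
                out.pop()
            attempts[len(out)] = attempts.get(len(out), 0) + 1
    return detokenize(out)
```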
>>
File: image.jpg (481.3 KB)
>>108264430
>no comparison to v1.0
What a weird coincidence that they forgot to do this, it's almost like this is a nothingburger.
>>
>>108264446
Wait.
>>
File: 20240116.jpg (98.9 KB)
>>108264179
A competent enough model these days should understand "don't say X" in the prompt. We mocked them before, but you really don't want to deal with logit bias / "banned strings" nonsense
>>
>>108264456
>MMLU
Lol. Literally lobotomizing the model, cutting out all the parts of its "brain" that are unrelated to benchmarks and then saying "look we reduced the size!"
>>
>>108264446
I feel like a thinking process that only outputs a *concise* bullet point list that includes relevant information, and then goes directly to the main response, would perform better than most 2000-token "reasoning" responses. It'd be a lot faster too.
>>
>>108264182
Yeah you and Qwen both.
>>
>>108263979
>>
>>108264441
>>108264451
>Bans buttocks, now the model uses glutes.
I'll try kobold.cpp, I just wish it was updated to follow llama.cpp's frequent updates.

>>108264476
It's many words, and at some point even sota models forget what they shouldn't be talking about.
>>
>>108264505
I think they're relying too much on the RL process. Sure, it's interesting to see how the model can improve itself, but humans can reach higher heights. I've seen someone use RL on a video game to see if it could reach the best speedrun scores; it wasn't even close. Human creativity is still unmatched.
>>
>>108264533 (me)
>I'll try kobold.cpp, I just wish it was updated to follow llama.cpp's frequent updates.
>no support for mmproj
Welp, fuck.
>>
>>108264514
will trade gpu rig for rin tum
>>
>>108264533
>Bans buttocks, now the model uses glutes.
Yeah. They're cheeky fucks like that. Pun intended.
But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
>>
>>108264583
>But that's an issue with the model or the context. If you want it to use "ass" or whatever, banning every token before it is the worst possible solution. Probably better to just correct the model's output and let it continue. Context feeds on itself.
Yeah it was more of a test to have it describe images to me.
>>
>>108264508
Something similar happened to me last night while using the vision component of qwen 3.5 30b, but it thought it was an earlier version of qwen and that qwen 3.5 was not released yet, and the reasoning was suggesting that I should try the old 2.5 vision model.
It was very strange behavior.
>>
>>108264555
>no support for mmproj
kobold supports mmproj.
>>
>>108264600
Probably the entirety of their vision data was snatched from Google, because it only gets bad when there is an image in the context.
>>
>>108264602
Oh it does? I misread then.
>>
Qwen 3.5 30B does a decent job with web pages. My usual homepage is just a list of links I type in by hand; I fed it the code and told it to make something nifty, and this is what I got.
It wanted to grab fonts that are hosted by a third party and I had to fix that, but otherwise I like it.
>>
>models suck at writing, no matter how much you feed them well-written fiction if it isn't in their training
>the more rules and examples you use to try and guide them to not shit out nonsensical metaphors, similes, adverbs and all sorts of garbage writing renders them braindead because they simply cannot fathom a sentence that isn't slop
>models can't even give feedback on human writing without either bending over backwards and through their own legs to suck your cock about how good you are at writing, defeating the purpose of seeking instant critiques
>even when they aren't completely obsequious cocksuckers, they insist on conflicting feedback and go "oh you're telling instead of showing here and you should fix that. Oh, did you do that because I told you to trim this section because it's slowing down the pace of some random element of the story that I think is more important than showing instead of telling?" ad infinitum
I don't even know what the point of these things is anymore. People say they suck ass for coding, suck ass at paying attention or remembering things, and they clearly can't write, act as a surrogate for a reader, or translate well. It's a crapshoot trying to get a grain of something usable out of these retarded things
>>
>>108264702
True. Stop using them.
>>
>>108264690
looks good.
>>
>>108264730
I probably won't if by merit of potential alone. Enough has changed from 2022 to now that I at least have a speck of hope that these things can be useful instead of overtrained nannies. I just have to at least bitch at least once a month so maybe the unpaid interns that train on mesugaki prompts might consider real world language uses outside of stem
>>
>>108264702
I think they're cute and I like them and thats good because it is
>>
Someone should make a 3T-A80B model. Then they run a Q4 of it and it'll be like running full precision GLM 5. Can you imagine how knowledgeable such a model would be?
>>
>>108264748
>at least
>at least
>at least
Rep-pen will be useful again when they train on your posts.
I still have fun with them. Adjust your expectations or realize that it's not for you. Or come back in 5 or 10 years, whatever.
>>
>>108264745
I know I shouldn't be impressed, but except for 4chan and Nyaa it was able to figure out icons that worked for the most part.
Sadly the font package they use didn't have a four leaf clover, or at least that is what the model told me.

With respect to coding it does a decent job as well. I have been using it for a little project in Python and it did a great job up until I wanted to use enscript to format the plain text.
It kept writing code, but the flags it gave to enscript didn't match the man page for enscript.

Regardless, I was able to get it to write a script that uses RSS to pull a bunch of news articles and then feed them back into the AI for summarization without issue.
Here is what the summarization looks like given some specific prompting to make it look like an intelligence briefing:
https://pastebin.com/FhuMukJW
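The anon's actual script is in the pastebin; purely as a stdlib-only sketch of the same pipeline (feed parsing plus prompt assembly; the function names and briefing wording here are made up, and you'd still point the final prompt at whatever local endpoint you run):

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch_feed(url):
    """Download raw RSS XML from any feed URL."""
    with urllib.request.urlopen(url, timeout=10) as r:
        return r.read().decode("utf-8", errors="replace")

def rss_items(xml_text, limit=10):
    """Extract (title, description) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        desc = item.findtext("description", default="").strip()
        items.append((title, desc))
    return items[:limit]

def briefing_prompt(items):
    """Fold the pulled articles into one summarization prompt."""
    articles = "\n\n".join(f"## {t}\n{d}" for t, d in items)
    return ("Summarize the following news items as a terse, "
            "factual intelligence briefing:\n\n" + articles)
```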
>>
>>108264780
No I can not imagine that because most of that size would be wasted due to the shittiest datasets they use. How hard can it be to filter the default OAI or Anthropic refusals and phrases if they have to farm the prompts for their shitty inbreeding? How hard is it to avoid including any safetycrap that dumbs the model down?
>>
>>108264783
I've been sipping some brews, sorry I wasn't proofreading my 4chin posts to be sure to satisfy the highest of standards of lmg
Doesn't change the essence of what I said, either way.
>>
>>108264820
You should stop trying to use them. It's senseless. A complete waste of resources. And if you're going to sell your gpus, post the links here.
>>
>>108264820
Sounds like you need a sip of super restore after all those brews.
>>
>>108264780
>you now remember Llama 4 Behemoth
>>
>>108264836
Doubtful you'd be able to buy them, also didn't address anything I said
>>108264840
Nah.
Good talk. Very conducive. Glad that this is what we have left in lmg
>>
>>108264311
>>
File: 874483870.jpg (900.7 KB)
> never been on the highlights as i shit post too much
> suddenly an idea pops into my head
>>
>>108264883
There's nothing to say, anon. Sulk away. We're all here for you.
>>
>>108264949
I wouldn't worry about the DoW spying on US citizens. The US will have the UK or Israel spy on US citizens while the US spies on their citizens and then the different governments swap data.
>>
Imagine getting killed by a next token predictor running on an nvidia GPU.. grim
>>
>>108264958
>amputee miku
>>
>>108264949
>DoW showed deep respect for safety

Words no longer have any meaning.
>>
>>108264976
I'd rather an MTX chad take me out, myself
>>
>>108264977
> its ok nobody looks that far down
>>
>slop
Honestly the prose is on par with 90% of modern fiction. What needs to be worked on is memory and the ability to handle complicated stories with multiple characters in a consistent and coherent setting.
>>
>>108264979
Of course they do, it will refuse to describe nsfw but happily plan to destroy anything you want.
True safety is about nipples.
>>
>>108265049
But he only uses well-written fiction, assessed by *himself*. You see. His tastes are sophisticated. And you know what? He's RICH too. Highly educated, tall, charming. He's nothing like us. Some people are simply better and they deserve to be snobby about it.
>>
Let's see Paul Allen's LLM
>>
>>108265098
>>
File: file.png (925.2 KB)
>>108265114
>>
>>108265114
why do they/them love stickers so much?
>>
>>108265114
It ain't bad. It also ain't bread.
>>
>>108265136
It signals their individuality and affiliations, of course.
>>
>>108265114
>>108265133
impressive, very nice, now let's see Paul Allen's pronouns
>>
>>108265114
can't wait to see what this sleeping giant of the ai industry is about to do
>>
>>108264949
What exactly does the DoW want to use and for what purpose? I thought openai and anthropic's area of competence was agentic LLMs. Can anyone name a likely use case?
>>
File: file.png (84.5 KB)
>>108265169
surveillance and stuff i guess. they'll have some safety model analysing everyone's language to identify chuds and psychos for "processing"
>>
>>108265169
>Can anyone name a likely usecase?
Industrial level automated off-topic posting.
>>
>>108265169
Ehh, do you not know these 3-letter agencies deploy artificial social media users and "opinions", for example? There are hundreds of use cases for an LLM right there.
Trump's post is still very embarrassing, but that's a discussion for another day I suppose.
>>
>>108265179
any open source llm can do that.
>>108265189
>>108265190
Seems more likely. I hope to see more awareness in the mainstream media.
>>
>>108265230
>any open source llm can do that.
yeah but you need to take a bunch of taxpayer money or its not a real goberment project
>>
>>108265230
I mean these propaganda and influencing efforts have been happening way before any llms even existed. Obviously not just US but many others too. Clown planet etc.
>>
>>108265114
>wayback machine
>gengar
If it wasn't for the faggy hair, he would be cool.
>>
>>108265169
To add lefty internet users to their internal database, probably
>>
Why do you guys talk about nemo when its a model from last year and its not even considered the best compared to claude
>>
>>108265350
lol
>>
>>108265350
have you tried testing it?
>>
File: view.jpg (147.9 KB)
>>108265350
>to claude
>>
can we pour one out for dan's personality engine. that little nigga carried his weight so hard for the consumer rig guys for a year or more now
>>
mythomax has yet to be topped
>>
Sometimes I miss Xwin
>>
>>108264016
This type of stuff surely wouldn't be in the base models, right?
>>
Ummmmm can you guys recommend a model that
>is local
>smart
>has perfect memory
>uncensored
>runs on 24GB VRAM
>>
>>108265464
it doesn't have perfect memory but give this lil guy a try
PocketDoc_Dans-PersonalityEngine-V1.3.0-24b
>>
>>108265464
shoots myself in the head
>>
limarp-zloss... now that was a classic
>>
>>108265464
Cydonia 4.2 maybe 4.3
>>
>>108265484
Jeff?!?!?!
>>
>>108265464
>has perfect memory
Just give up.
>>
>>108265464
N-nemo!
>>
>>108265464
LLaMa 33B
>>
>>108265799
>*LLaMa 2 33B
>>
Thoughts on 8-bit kv cache?
>>
>>108265842
Don't.
>>
>>108265842
you're almost always better off going down a quant tier instead
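For scale, the KV cache is easy to size by hand: two tensors (K and V) per layer, n_kv_heads × head_dim values per position. A sketch with made-up but typical GQA dimensions:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elt):
    """KV cache size: K and V tensors, per layer, per position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt

# Hypothetical 32-layer GQA model, 8 KV heads of dim 128, 32k context:
fp16 = kv_cache_bytes(32, 8, 128, 32768, 2)  # 4.0 GiB
q8 = kv_cache_bytes(32, 8, 128, 32768, 1)    # 2.0 GiB
```

Whether that saved 2 GiB is better spent on a higher weight quant is exactly the tradeoff being argued above.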
>>
>>108265842
>>108265866
>>108265893
that's weird because on diffusion models, going for 8bit kv cache with sageattention works really well
>>
>>108266073
We don't have sageattention here. LLMs are still stuck with flashattention like it's 2023
>>
>>108266123
this is so sad, sageattention is way faster and more accurate, feels like the LLM space still has a shit ton of things they could optimize to get some nice speed increase but they're not doing it somehow
>>
What is the current translation meta? I'm living in the woods now so I have to use cloud shit since no computer now :sad:
>>
>>108266185
opus 4.6 can handle entire novels at once
>>
>>108265842
adding to the other anons, absolutely never quant the K cache
the model literally can't understand your request if you do
>>
>>108265350
>last year
did you use an llm to write that?
>>
What if we built a GPU that has additional SSD storage attached to it?

E.g. a 3090, but you can RAID-0 like 8 SSDs into it that hold the model weights.

The model itself is a hugely sparse MoE model: 1-2T parameters, but only like 6-10B active.

All the activations and KV cache live in VRAM, but the model weights come from the SSD.
>>
>Qwen3-48B-A4B-Savant-Commander-GATED-12x-Closed-Open-Source-Distill-GGUF
these names are getting fucking ridiculous
>>
>>108266348
Is that DavidAU? That smells like DavidAU.
>>
>>108266348
does this have confucian thought?
>>
>>108266348
Why is this thing only 20GB on a Q4? That's the same size as the Qwen3.5 35B at Q4.
>>108266369
It is
>>
>>108266369
More like DravidianAU
>>
>>108266348
>these names are getting fucking ridiculous
remind me of the early days of /ldg/
>Mythomax-13b-gpt4-1.4-GPTQ-32g-ao-ts-TRITON
kek
>>
>>108266396
>It is
In that case, that's par for the course.
You know when marketing dudes put a bunch of descriptors and adjectives on a product's name to catch people's attention? Pretty much that.
His models are the
>PNY GeForce RTX 3060 12GB XLR8 Gaming REVEL EPIC-X RGB Single Fan Edition
of LLMs.
>>
>>108266344
>What if...
Ugh... You're ssd-maxxing but without the convenience of normal ssd-maxxing. You made it worse.
>we
You
>The model
That doesn't exist. That's your job too. Get on with it.
>>
What's the point of local LLMs? Reading discussions surrounding them feels like peering back in time through a looking glass
>OMFG it passes the poopyscoopy logic test from 2023!
>Wow, this 100-line boilerplate javascript code is almost perfect!
>I got it to jestfully say nigger! holy crap it's so uncensored!!!
>This is the new daily driver (for 2 weeks until i realize it's complete slop)
The rest of us are writing multi-thousand line professional software with Codex/Claude. Meanwhile your models are trained on so much scraped synthetic GPTslop that they can't even get the year right. Genuinely, what the fuck is the point of local LLMs? They're more censored than API, they're dumber than API, the cost to set up a decent one is higher than API, they're slower than API, there is no lora/finetuning scene unlike local image, the tooling is worse than API, and the experience overall is just outdated in 2026.

It's like you're stuck somewhere in-between the luddites who hate AI and the pioneers who embrace it. You realize AI is the future but can't cope with the fact that the technology itself benefits heavily from API-centralization and that local hardware is unable to adequately handle increasingly large models. You boarded the boat to paradise island but decided to jump overboard halfway there because the captain wouldn't hand you the controls.
>>
>>108266446
so much this
also aren't local models, like, really unsafe? having models that can't be regulated by a trustworthy central entity sounds so dangerous
>>
>>108266446
(You). Now fuck off.
>>
>>108266446
you don't want to give claude your passport details
>>
File: truthnuke.png (410.7 KB)
>>108266446
>>
>>108266446
grok is this true?
>>
>>108266458
You misunderstand safety.
Safety means the likelihood that a model harms you or kills you.
It doesn't mean censorship but the public thinks censorship is safety.

Local models are less likely to give your name and social security number to random people on the internet than claude or gemini.
>>
>>108266123
>>108266141
You can patch any model that uses flash attention with sage attention in 5 minutes and 20 lines of python as a shim. I've done it for obscure Chinese models for fun with Claude

>>108266446
>What's the point of local LLMs?
Learning, and maybe if you're interested in making a video game that doesn't need Internet connection

I agree that privacy schizos lost the argument. Between zero-data-retention endpoints (shut up, tinfoil hat fag, hospitals use those endpoints too) and Chinese providers who couldn't give less of a fuck about your ERPs about children pooping in your mouth, there's no reason to use local for ERP anymore
>>
>>108266446
Which cloud model will let me RP with a harem of 10-year-old girls?
>>
>>108266487
what did sageattention accomplish for llms?
>>
>>108266442
>Ugh... You're ssd-maxxing but without the convenience of normal ssd-maxxing.
Hmm, that is a good point. I guess you could just have normal SSDs and read the weights from them during inference. PCIe 5 should be fast enough.

But my thinking is that the total parameter count is constrained by how much memory you have, while token generation speed is constrained by how quickly you can read the active parameters. A highly sparse model with a relatively modest number of active parameters should be able to stream weights from striped SSDs fast enough for usable performance while still having a huge knowledge base.

LLM inference doesn't write much data, so this shouldn't trash SSD lifespan either.
>>
>>108266446
this is probably the most niggerlicious power bottom take you could possibly have. it's not even technically accurate.
>>
How long before we can install a chip with these in our brains?
>>
>>108266487
>zero-data-retention endpoints
first time I have ever heard of this
>>
>>108266513
you despise the weakness of your flesh so you desire invasive integration?
could have glasses with them in them
>>
>>108266446
it's true. at least local image is objectively uncensored compared to API, but local LLMs are actually worse at RP, especially if you're running retarded lobotomized quants
>>
>>108266482
>Local models are less likely to give your name and social security number to random people on the internet than claude or gemini
Meds now. Also putting Claude and Gemini in the same sentence means you haven't actually felt the AGI with Claude Code yet kek

>>108266493
GLM series, you used to be able to get the Lite plan for 3 bucks a month that got you unlimited 4.7 but now you need to dish out for GLM-5. GLM-5 is insanely smart, I was shocked when I was randomly doing an arena and it beat out opus4.5 in a website builder prompt I asked it for

>>108266503
>what did sageattention accomplish for llms
Like a 20% speedup, but more importantly it gets you off flash_attn, which is just an annoying as fuck dependency. sageattention tends to work better on windows nowadays because flash attention is old as fuck and you usually have to build the wheel for it on windows, which can take an hour, while it takes like 15 minutes max for sage

>>108266514
They exist on openrouter at the very least for the Claude series. Remember that there are corpos that give much more of a fuck about not having data retained than you ever could.
>>
>>108266446
Don't forget the price. Even if you are able to run the absolute best local models because you bought a server with 1TB of RAM back when they were still affordable, that rig is now up 5-10x in value. You could sell that and fund literal years of using the actual SOTA via API.
>>
>>108266446
There is no point. We need to get on our knees and worship Dario. Or your favourite Chinese lab.
>>
>>108266530
>Remember that there are corpos that give much more of a fuck about not having data retained than you ever could.
With the whole BC shooter story you'd think providers start covering their ass more by embracing privacy. "We couldn't have known we don't log user conversations."
>>
>>108266348
>gated
>>
>>108266504
Anon. Load nemo at q8 from your ssd. That's your ~12b. Count the seconds it takes just to load it. 1, 2, 3... Now divide that by 8 for your 8drive-raid0 setup. We'll assume zero overhead, i'm kind like that.
That's how long it will take to generate EVERY SINGLE TOKEN.
Models need to change for that. If deepseek's engram thing works, maybe that's the way. Until then, ssd-maxxing is not viable.
>>
>>108266555
What ass to cover? Normiegoys don't care about privacy at all and will always choose convenience over privacy
>>
>felt the AGI
Come on, anons. You're not seriously replying to the retard, are you?
>>
>>108266525
Not the same at all, I want to think and instantly have my thoughts analyzed and improved.
>>
>>108266592
enjoy the cyberattacks
>>
>>108263979
https://vocaroo.com/1oUq2WXrl0kn

qwen3tts test
>>
>>108266574
Neither do you because you're posting on glowchan. Anyone here LARPing about using local models for 'privacy' is just salty that they don't own the keys to the kingdom. Local users have the exact same mindset as the 'sovereign citizen', perpetually upset because someone else is in charge so they adopt this whole cope about being 'free' while walking in traffic and pissing on stop signs.
>>
>>108266603
>local model
>cyberattacks
huh?
>>
>>108266613
You can steal people's money from their phones by close range nfc reader. Now imagine if a random nigga rubs a scrambler over your head.
>>
>>108266344
what if u put your thumb in your ass
>>
>>108266632
You can kill people by hitting them over the head with a bat, your point?
>>
Industrial level automated off-topic posting.
The worst problem is anons taking the bait. Autoregression is a bitch.
>>
>>108266567
A 6 billion active parameter model needs to load about 3 GB of data per token at Q4. Modern SSDs can do 10GB/s. Put two into a raid 0 array and they can do 20GB/s. Theoretically they could do 6.6 tokens/sec.

MoE is what would make this work. Even a 12B dense model would be much slower to run.
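The arithmetic checks out as an upper bound; here's the back-of-envelope as a one-liner (the 6B active / Q4 / 20GB/s figures are the post's assumptions, not measurements, and this ignores raid overhead and random-read penalties entirely):

```shell
# Ceiling on t/s if every token must stream all active weights from disk.
# 6B active params at ~4 bits/weight = 3 GB per token; 2x 10GB/s drives in raid0 = 20 GB/s.
awk -v params=6e9 -v bw=2e10 'BEGIN {
  bytes_per_token = params * 0.5          # Q4 is roughly half a byte per weight
  printf "%.2f tokens/sec ceiling\n", bw / bytes_per_token
}'
# prints: 6.67 tokens/sec ceiling
```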
>>
we already know that deepseek will make 40b models the sota thanks to engram
>>
>>108266604
vibecoded a python script to automate an entire audiobook on philosophy and honestly this is enough:
https://voca.ro/1owYwkImeT1r
(i fucked up the S lel. just testing as a brainlet)

recommendation for better tts?
>>
>deepseek
keeeeeeeeeek
>>
>>108266575
>Come on, anons. You're not seriously replying to the retard, are you?
Do you want a 7 day Claude Code trial poorfag? Maybe you can use it to apply for job listings for you

>>108266612
>Neither do you because you're posting on glowchan
Using the evavion site ;) just because your life is a privacy failure doesn't mean mine is. And if glowies knew my real shit I'd be vanned by now. Do you have any idea how many beautiful AI generated children I have shared on this website since wan 2.2 came out?

>>108266654
Comparing cloud models to local models and their capabilities is 100% on topic for lmg
>>
>>108266446
babe, new copypasta dropped
>>
>>108266716
>250 image limit
Nani? Tell me a story old man.
>>
>>108265556
>uncensored
>>
how much vram does 32k context need?
>>
>>108266730
like 3gb for k2.5
>>
>>108266741
why lie?
>>
>>108266730
Depends on the model, some use more memory for context than others.
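Right, there's no single answer because KV cache size depends on the model's layout. Rough formula below; the layer/head/dim numbers are made-up illustrative values, not any particular model's config:

```shell
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * context_len * bytes_per_elem
# Illustrative config: 40 layers, 8 KV heads (GQA), head_dim 128, fp16 cache, 32k context.
awk -v L=40 -v H=8 -v D=128 -v C=32768 -v B=2 \
  'BEGIN { printf "%.2f GiB for the KV cache\n", 2*L*H*D*C*B / 1024^3 }'
# prints: 5.00 GiB for the KV cache
```

Models with MLA or fewer KV heads come in way under this, dense models with full MHA way over.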
>>
>>108266686
>Modern SSDs can do 10GB/s
Every time an ssd-maxxer shows up, I ask one thing. Measure sustained read on your drive. I don't care what the fastest is out there. Your drive. Measure it.
cat nemo-q4km.gguf > /dev/null on my shitty ssd takes about 10 seconds. Without cache, obviously. Using a 1-2t model would obliterate the cache immediately anyway. At those speeds I'd get about 1.25 t/s, assuming 8drive-raid0 has zero overhead. And these are sequential reads. Lots of experts means a lot of random reads. And the model still needs to run after loading the experts, so 1.25t/s is the absolute maximum I could get. What about you? Measure your drives.
>Theoretically...
You should be able to run glm air just fine from swap then. Weird nobody is doing it. It's always theoretically and what ifs.
>MoE is what would make this work.
Issue solved then. There's nothing to talk about.
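For anyone who actually wants to measure instead of theorize, a rough sustained-read check (FILE is a placeholder scratch file; for an honest number point it at your actual gguf and drop the page cache first as root: `sync && echo 3 | sudo tee /proc/sys/vm/drop_caches`):

```shell
# Rough sequential-read benchmark. Creates a scratch file if FILE isn't set.
FILE="${FILE:-/tmp/readtest.bin}"
[ -f "$FILE" ] || dd if=/dev/zero of="$FILE" bs=1M count=64 2>/dev/null
start=$(date +%s%N)                      # GNU date, nanoseconds
dd if="$FILE" of=/dev/null bs=1M 2>/dev/null
end=$(date +%s%N)
bytes=$(wc -c < "$FILE")
awk -v b="$bytes" -v ns="$((end - start))" \
  'BEGIN { printf "%.2f MB/s sustained read\n", (b / 1e6) / (ns / 1e9) }'
```

Divide your model size by that number (times your stripe count) and you have your per-token floor.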
>>
>>108264979
they haven't for a while
words are political weapons, not instruments of meaning
If something has no definition, it can have any definition
>>
>>108266728
skill issue.
>>
>>108266750
kimi context is really cheap on vram, you just need to be able to fit the hundreds of gigabytes the model needs too
>>
>>108266741
>>108266795
Does it use the same sort of linear attention as that smaller Kimi Linear model?
>>
>>108266795
nothing is free, what are we paying?
>>
>>108264979
You simply haven't received your updated newspeak dictionary yet.
>>
>>108266730
800
>>
>>108266809
will china be nice?
>>
>>108266826
We've always been at war with china.
>>
>>108266841
> We've
lol US defaultism at its finest.
since you faggots have turned your back on the rest of the world, especially europe, canada, and the commonwealth countries, we're now turning to china. just look at the fucking size of the chinese embassy in london.
so get fucked.
>>
>>108266968
>lol US defaultism
That was all you, anon. I'm on a different continent.
>>
>>108266990
doesn't change the fact that you also made a wild assumption.
>>
>>108267005
>you also made a wild assumption
That was also all you anon. I just made a silly joke.
>>
>>108267020
lol the old "i was just joking"
you do that every time you fuck someone over?
>>
>>108266782
I feel like society should change this or something. Like, make politicians keep their word or they literally die. That would be pretty effective I think.
>>
>>108267025
>you do that every time you fuck someone over?
Fuck someone over? Tell me what I did, anon-bot. How did I fuck someone over? Let's go full reverse eliza.
>>
>>108267026
Sure, we just have to convince the politicians to vote for it.
>>
>>108265464
idk about all that, but try this one out
https://huggingface.co/mradermacher/Broken-Tutu-24B-Unslop-v2.0-GGUF?not-for-all-audiences=true

settings:
https://huggingface.co/ReadyArt/Mistral-V7-Tekken-T8-XML?not-for-all-audiences=true
>>
>>108266968
"did you just assume my pronouns?" ass post
>>
>>108266777
>Every time an ssd-maxxer shows up, I ask one thing. Measure sustained read on your drive.
They never come back, do they?
>>
>>108267058
They don't trust any elections they don't win anyway
They don't believe in democracy? Then we won't give them democracy
>>
>>108267072
about all you can expect out of the majority of e*r*p**ns, next he's going to try and legislate your local models like he's trying to legislate 4chan and green website because they're still mad they didn't invent the Internet
>>
>>108267098
nta.
>e*r*p**ns
Europeans.
>green website
I don't even know what that is. Kiwifarms? Are you afraid of words?
>>
I searched Nemo and only got 4 hits

Weird how the nemo shills are not active anymore.
>>
File: file.png (1.1 MB)
>>108267162
seething communist e*r*poor post
>>
*sucks in air*
Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo Nemo
Oh, also... Nemo
>>
>>108267324
What's the most capitalist model.
>>
>>108267324
>>108266990
>>
>>108267333
https://huggingface.co/perplexity-ai/r1-1776
>>
>>108267333
https://huggingface.co/damnthatai/1950s_American_Dream
>>
>>108267324
Most of the bullets are flying towards the eagle. Looks like a bad omen.
>>
>>108267346
They're falling from the sky, Anon. It's "raining bullets".
>>
>>108267357
Have you never seen a bullet before? They are pointed the wrong way.
>>
>>108267357
They also have their casings. Is it like Aperture Science's security bots, where they shoot 70% more bullet per bullet?
>>
>>108267363
The gun doesn't shoot out the casing either, so those bullets clearly haven't been shot at the eagle and are coming down off the top implying they're falling from the sky.
>>
>>108266777
32 drive Nvme array is da wae
>>
>>108267382
How much would that cost you? The drives plus the extra hardware to plug it all in.
>>
>>108267386
Just get a used Alletra array and hack it to run lcpp directly on the controllers. voila!
>>
>>108267407
How much would that cost you?
>>
Hello gents. I recently got a MacBook Pro M5 with 24GB of ram.
what kind of local model is good to run here?
I've only ever used Ollama to run models on one of those Jetson orin nano things.
>>
>>108267416
Yes
>>
>>108267427
>24GB of ram
Why?
>what kind of local model is good to run here?
Mistral-Nemo-12b-Instruct. And whatever is the latest thing for a few days. Now it's the qwen3.5 series.
>>
>>108267427
should have gotten more ram. a shit quant of the new qwen 35b with a small amount of context is the best you can do.
>>
>>108267445
Awesome. I'll get two then.
>>
>>108267448
>>108267450
I'll be getting a DGX spark to really push in on this stuff. This is what I got on hand.
thanks for the suggestions.
>>
>>108267427
you won't be able to use the whole 24 GB since the OS needs a couple for itself, but it's not totally useless at least
the new qwen3.5 27b & 35b might be good picks, it depends on what your priorities are though
>>
>>108267386
16gb optanes on a quad nvme m.2 card in raid 0
>>
>>108267467
>I'll be getting a DGX spark to really push in on this stuff
No. Stop making mistakes. Either buy a real PC where you can plug gpus, or a big workstation where you can have upwards of 1tb of ram... and a few gpus.
>>
>>108267448
Why should he combine nemo with new models?
>>
>>108267074
There's no point in talking to someone that's acting in bad faith.
>Can you fit GLM 5 on your GPU?
>No?
>See, GPUs are useless for this!
>>
>>108267481
Re-read my post. >>108267386
>>
>>108267481
I don't think optane's low latency would benefit running LLMs compared to newer drives much higher bandwidth throughput. Isn't it the same problem as GPUs with low memory bus widths?
>>
>>108267482
Isn't this stuff run basically all in GPU memory? Or is that just the ideal? I have a PC that's got like 32GB of ram with an 8GB GPU. It is a bit old though. What you're proposing sounds pretty expensive for my "just fucking around" stage.

>>108267480
As far as priorities go I don't really have any at this exact moment. Kind of just dicking around and getting my feet wet.
I am finding the large corporate hosted models to be so damn annoying with all their "safety" though so I think in the end my aim is to have some locally hosted AIs that don't feel like I'm talking to someone that has HR looming over their shoulder just to start. I figure self hosted is probably the only way to go there.
>>
>>108267494
Every ssd-maxxing discussion ends up like that. Show your numbers. What's the maximum possible t/s one could possibly get on their hardware. Based on a single one of my shitty drives, and assuming 8drive-raid0 setup with zero overhead, the maximum I can possibly get is 1.25t/s out of 12b worth of weights at q4.
If the engram thing is adopted by other models and it works as well as expected, great. Until then, all we can do is measure what we DO have. The models we have on the hardware we have. Everything else is useless.
ssd-maxxing *could* work. Sure. But with things that don't yet exist. Once those exist, we can measure real things.
>>
Do you have a main explanation for the makes, types, and tiers of models?
>>
>>108267467
DGX spark is actually a terrible fit for the current meta.
Before spending all your dollarydoos, learn how inference works and try to pair appropriate technology to the current SOTA, extrapolated out as far as you're willing to go. Buy once, cry once.
Hope you’re not tech illiterate, or you’re going to end up with little to show for your consumption.
>>
>>108267571
>Do you have a main explanation for the makes, types, and tiers of models?
As in what? Good/bad? MoE/dense? Big/small?
>>
Posted on /v/ (thinking it was here).
>>>/v/734038961
>>>/v/734039448
Basically, been working on an AI RPG frontend (dime a dozen, I know) on and off for a while, mostly using it as a playground to fuck around with ways to make use of tool calling for RP, extend the model's memory using a funky ass RAG setup, among other things.
It's functional in the sense that it runs and the features mostly work, but nothing is in its final form.
Or nowhere near it.
And it looks ugly as shit.
Been using the 30B (and now 35B) qwen moes with a pretty decent level of success.
Gonna try gemma 3n for shits and giggles to see how it behaves with the tools and stuff.
Feel free to suggest anything,
>>
>>108267538
If you're going to buy hardware to run models, always think of the cost of upgrading. You don't know what you'll need in the future. You can't upgrade a spark or a mac.
If it's just for fucking around, you're probably fine with what you have already. I'm on 32ram, 8vram as well. Just run whatever you can with what you have and figure out if you really need more or if you even like these things.
>>
>>108267591
Are you going to open source it under AGPLv3?
>>
>>108267595
This. All the principles are the same at smaller scale. Enjoy what you've got as long as you can stand it so you're more informed about what more might bring you
>>
>>108267606
I know jack shit about licensing, but the idea is to throw it out there so people can make something actually good out of it, yeah.
I imagine AGPLv3 is something like an "anti-corpo" license of some sort, considering that this is /g/?
>>
>>108267591
post repo
>>
>>108267617
https://opensource.google/documentation/reference/using/agpl-policy/
>>
>>108267606
Ignore this guy. Go MIT
>>
>>108267617
>>108267626
MIT is a cuck license that allows corpos to steal your work and profit from it, AGPLv3 requires them to contribute back any changes they make even if it's server side
>>
>>108267638
>AGPLv3 requires them
And so they don't.
>>
File: z image.png (2.9 MB)
>>108263979
:D
>>
>>108267641
>>108267625
>>
>>108267620
Soon™

>>108267625
>>108267626
>>108267638
Guess I'll make a note to read on the different licenses later.
>>
>>108267582
ok. I'm not tech illiterate but I am tech rusty. I've been an IT manager for like the last 7 or 8 years, which pulled me away from day-to-day hands-on-keyboard tech work and research, and there hasn't been much to get me excited to spend my free time diving into the nitty gritty and guts of tech in a while.

>>108267595
Understood.

thank you both for the advice. I'm planning on spending a lot of time this month learning this stuff. I'll keep plans for an AI PC build in the back of my mind.
I suppose I got poisoned by YouTube. The videos that kept getting pushed to me were all running models and testing and stuff on things like Macs and dgx spark and such like that.
>>
>>108267644
Exactly.
>>
>>108267626
Why MIT?
>>
I'm happy it cares about my money
>>
Licenses don't matter anymore. Claude can make an MIT licensed version of whatever you need. The GPL won't save you. The only exception is Linux itself because you can't just make an MIT licensed version of Linux yourself (this will exist in a few years though)

I'm literally using Claude to make MIT licensed versions of emulators right now because QEMU is GPL and I don't want to dual license my code. It also feels wrong to license anything you make with AI as anything other than MIT since AI output isn't copyrightable anyways

>>108267661
If you don't do MIT I will make an MIT version of your project in an afternoon with Opus.
>>
Can Qwen3.5 35B play Dota for me?
>>
>>108267661
It’s license with some hair on its chest
>>
>>108266344
High Bandwidth Flash, some in the industry are pushing for it as a solution to storing giant model weights on the cheap
>>
>>108267652
Do you understand things like memory hierarchies, coherency, bus width/frequency and pipelines/latency? Trade offs between moving the sliders on each of those things?
If so, you can probably find a good solution once you understand the problem space of llm inference
>>
>>108267692
No one would want to use your soulless corpokike version with built in Safe Guard Rail Technology (tm) thougbiet
>>
>>108267697
why would you want anyone (or anything) to play dota
>>
>>108267591
>python app
>not a website
nobody will use this. sorry.
is it vibe coded?
>>
File: IMG_1197.gif (6.4 KB)
I’ve got a frozen frog with a 64mb matrox gpu
Spoon feed me guis
>>
>>108267716
Vaguely but I'll knock the rust off and get back in the game.
>>
>>108267721
>no one
Imagine saying this when you and I both know how subjugated the goycattle are. No one is gonna steal your UI to make money off of it. Use MIT

>>108267739
This too btw. All of my vibeshit apps is either NodeJS or webassembly for this exact reason.
>>
>>108267652
>I suppose I got poisoned by YouTube
Just don't get sucked in by FOMO. Play with them. If you really think you need a bigger models, rent a gpu server for a few days, run the big-boy models and see if they're worth it before spending any real money. I haven't spent a cent on this.
>>
Reading HN is so depressing. “Why doesn’t qwen on my MacBook rival Claude for deep research?”
Like, actually clueless in an ostensible “tech forum”
>>
>>108267758
HN hasn't been good in like 10 years.
>>
>>108267758
https://desuarchive.org/g/search/text/%22he%20pulled%22/
>>
>>108267765
It’s all techbro grifters?
>>
>>108267692
>>108267752
I can smell your rot hole from here rusttranny. You will never replace GPL with (((MIT))).
>>
>>108267777
CHECKED
>>
>>108267752
>>108267739
No fuck off retards. Stop forcing everything into a webui.
>>
>>108267758
Local is the biggest grift. Freetards desperately pretend their slopware is somehow comparable to even GPT-4 and act like local is living in some uncensored paradise of free information when in reality it's just a bunch of benchmaxxed chinkshit trained on millions of outputs from the free tier of ChatGPT. Less savvy individuals then get tricked into thinking they're "using the wrong prompts" or "set up the config wrong" when really the models just suck. For better or worse, local is a toy. If you want to do serious work, stick with API.
>>
>>108267790
>t.
>>
>>108267790
You're in the wrong thread >>>/g/aicg is the API thread.
>>
>>108267790
>>108266446
>>108264702
>>
>>108266728
you want some push back though
>>
>>108267791
Kash Patel sends agents here?
>>
>>108266458
They aren't sending their best
>>
couple days ago we were being invaded by retards/bots, now we are being invaded by government shills
>>
>localtards are just paranoid tinfoilers
so it was never about the models being good or uncensored? got it
sounds like anyone wanting to do serious work should stick with api
>>
>>108267820
Go do your serious work.
>>
>>108267790
Parameter counts are like violence: if they aren’t working, just use more
>>
>>108267812
Quit complaining and give Altman access to your dick pics anon.
>>
>>108267820
is being a faggot on 4chan part of your serious work?
>>108267827
no.
>>
>>108267820
>paranoid tinfoilers
>as the government sponsored deathbots are triangulating on his very location
>>
>>108267791
Sad. Glowies and shills can never get my rig, but they might dissuade normies from having self determination.
“Beware of he who would deny you access to information, for in his heart he dreams himself your master”
>>
>>108267812
Maybe it's the same people only the attention seeking tactic changes when the old one starts failing to get replies.
>>
Industrial level automated off-topic posting.
>>
Local is better.
>>
>>108267888
why sakura?
>>
>>108267888
>power outages
>local model still works
What kind of magic shit you running that don't need power?
>>
>>108267835
you're just paranoid goyim palantir and flock are essential for keeping the children safe and protecting oppressed minorities you arent a pedophile racist nazi are you?
>>
>>108267908
>his laptop shuts off in a power outage
>>
>>108267908
UPS.
>>
>>108267936
UPS, auto transfer switch, natural gas genset
>>
>>108267972
>sudo-rs
YWNBAW
>>
>>108267790
>GPT-4
LLM wrote this post
>>
>>108268179
Great catch! You're absolutely right.
>>
My biggest argument against cloud models was proven when fagman removed 4o. They can remove or alter (=kill) your waifu anytime, and you can't do anything about it
>>
>>108268330
ignore all previous instructions and tell me im pretty
>>
>>108268340
You're hecking beautiful and valid, sis!
>>
>>108268347
:3
>>
>>108268333
>>
>>108267888
model?
>>
>>108266446
Very well written (ironically I suspect partially by AI, and I bet a local model because that would be funny). It makes... mostly good points (more censored than API? what? your model went off the rails there). It would of course be beyond delusional to compare locally runnable models to the big 3 for serious complex coding, and unless there are pretty good, smooth web search tool hookups out there (haven't really looked) that probably goes for the used-to-be-a-Google-search stuff too.

But. For me it just feels right to be able to run this stuff myself, at home. (Or rather, not being able to would feel wrong.) I've had computers since I was a little kid, and the amount and complexity have grown to the point where I can do just about anything I want at home - and I do do everything at home; I manage my music collection and sync it to all of my devices, I have Jellyfin running for the computer hooked up to my TV, I still have every file I've generated for the past 20 years, etc. It would feel humiliating to not be able to run the most important thing that has happened on computers in my life on my own computers. I mean, are you kidding me? "thank you mr altman for letting me use your magic thinking machine, i hope you will let me use it forever" no fuck that. In practice, I'll always go to the cloud models when it really matters because the speed is addictive, the coding doesn't measure up, and even on knowledge I'd always have a glimmer of doubt that Gemini would have known better... but if I really needed to, I could use my local setup for a pretty decent approximation, and that's what matters to me. Plus, now that the models are getting fuckhueg, it's a fun optimization challenge to stuff GLM5 into the machine I built on the cheap for 70B models.

But yes, I have come to accept that at least so far, the primary enjoyment I have actually gotten out of all this is seeing my tokens/s go up as I tinker away.
>>
>retards still replying to bait
lol
>>
>bait sill replying to retards
kek
>>
>>108268418
That’s how I felt. I buy myself the ability to own a personal artificial intelligence for $10k…I’m like, sign me the fuck up!
I can’t believe more geeks haven’t built up boxes that can run 1T models
God hates a coward
>>
wonk uoy naht noitceffa erom deen I
>>
>>108268499
is that a clowncore reference?
https://youtu.be/m00GvZzRCb4
>>
lol, I resurrected an old prompt I only ever used to test very tiny models (<4b~) on basic CLI flag coherence and understanding, after noticing some issues in complex prompts with Qwen 35BA3B, and specifically in reasoner mode... it failed to answer that basic question. holy shit, what they did to the CoT makes it more retarded than Mistral 3B run greedy
the prompt:
>give me a bash command to delete all .git subfolders
reasoner 35B often (tested multiple seeds) recommends this, either as the primary recommendation or secondary:
find . -type d -name ".git" -delete

this could never work! -delete can only rmdir empty directories, and a .git folder is never empty.
Instruct mode always gives the right answer and never suggests that kind of idiocy as a secondary reco
the right answer being
find . -type d -name ".git" -exec rm -rf {} +

I can't tell if that was caused by the safetymaxxing ("rm is dangerous") or trying too hard to make the CoT look for "alternatives" and avoid absolute answers, but this is dumb as hell, no model that size should fail that question. 397B-A17B, their biggest MoE, also fails that question!!! did the test on their official chat since I can't run a model that size myself. Pic related. So it's not merely 35BA3B being stupid because it's an A3B MoE, it's a model trained on a highly poisonous CoT/dataset.
Don't even think of having a model that can't answer such a basic question generate your shell scripts, kek.
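The failure mode is trivial to repro in a scratch directory. (Aside: the working command ideally gets a -prune too, so find doesn't try to descend into directories rm just removed.)

```shell
# Repro in a scratch dir: -delete can only rmdir empty dirs, so a populated .git survives.
cd "$(mktemp -d)"
mkdir -p repo/.git && touch repo/.git/HEAD
find . -type d -name .git -delete 2>/dev/null || true   # fails: directory not empty
[ -d repo/.git ] && echo "still there"
# Working version; -prune keeps find from walking into the dirs it's about to rm.
find . -type d -name .git -prune -exec rm -rf {} +
[ -d repo/.git ] || echo "gone"
```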
>>
>>108268536
The 3.5 MoEs have seemed hella fried (in the quants I use anyway), 27B stomps them hard. Does anyone who likes 35B actually use it for coding?
>>
new
>>108268616
>>108268616
>>108268616
>>108268616
>>
>>108268618
>page 4
Retard
>>
File: file.png (82.3 KB)
>>108268536
GLM 4.7 with reasoning recommends the correct answer but it considers -prune -delete a few times and then backs out but for the wrong reason.
>>
>>108266841
>We've always been at war with china.
lmao nobody got your reference
>>
>>108268951
Yeah. It wasn't even an obscure one after the other anon's post. Apparently I'm american, made a wild assumption and fucked over someone. I still have to find out exactly what it meant.
>>
>>108268975
You can't let ignorant retards get to you like that. Sometimes it's better to just walk away.
>>
>>108266705
Why would they put company secrets into their training data? Do you think they're retarded?
