Thread #108241321
File: 1762509235563881.png (182.4 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108238051
►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
File: 1770734519241461.png (3.6 MB)
So this is the power of API users
>>108241375
>How do they compare to the new Qwen models
The new models? It doesn't even compare to the 2507 4B. Yes, the 4B. It has even less knowledge, its multilingual performance is extremely bad, and it's another model whose existence you just have to question. If you really wanted a ~20B MoE you would literally be better off with GPT-OSS 20B over this piece of shit.
File: 1756704744582965.png (19.3 KB)
cute names
https://www.reddit.com/r/LocalLLaMA/comments/1rechcr/comment/o7da1jc/
>I've honestly found that the 35B beats the old Qwen3-235B almost across the board. It feels like a much larger model than it really is. Only advantage the old 235B has now is general knowledge - 35B-A3B is better in every way otherwise in my testing.
I have a hard time believing that. Did they really cook?
File: ai agents need middle management.jpg (1.1 MB)
>>108241534
AI really does need middle management...
Is there anything I can do with 10GB VRAM + 64GB DDR4 (Windows 11 btw) or should I just stick to Gemini? Obviously token generation won't exactly be speedy regardless, but I don't want to have to leave and do other shit while I wait for a response, so big-ass dense 70B+ models are kind of out of the question for me.
>>108241628
I can't believe you are using Gemini with that setup. You never need to use the cloud.
People are going bankrupt with Gemini and having Google accounts locked and deleted because they mentioned Epstein to Gemini.
>>108241628
You could try the new 35b MoE with thinking turned off. Since you'll definitely be using a CPU split, you don't want it generating a thousand tokens of thinking, but in no-think mode the MoE's responses should be tolerable in speed.
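If you'd rather force that per request instead of baking it into the launch command, something like this works (a minimal sketch against a llama.cpp-style OpenAI-compatible endpoint; assumes the backend honors chat_template_kwargs the way llama-server does, and the port is a placeholder):
[code]
import requests

# Hedged sketch: disable Qwen-style thinking for a single request.
# Assumes a local llama.cpp-style OpenAI-compatible server; koboldcpp
# also exposes a /v1 endpoint, but verify it forwards these kwargs.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "hi"}],
        "chat_template_kwargs": {"enable_thinking": False},
        "max_tokens": 256,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]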
>>108241672
>The chinese models are distilled from claude
and they're not happy about that keek
https://xcancel.com/AnthropicAI/status/2025997928242811253#m
File: 1711690590289518.jpg (92.7 KB)
https://www.reuters.com/world/china/deepseek-withholds-latest-ai-model-us-chipmakers-including-nvidia-sources-say-2026-02-25/
2mw?
>>108241814
It's coming this week, it'll be the second nuke.
>>108241811
You don't need more than 60 t/s.
>>108241811
Good! Try using the Q6_K_M model instead, though. At least at Q4, it seems like the Q4_K_XL does worse than Q4_K_M.
Also, download the mmproj file as well, and when you launch kobold, feed it in with the -mmproj argument alongside the model. That will let you paste images into it and let the AI do something with them.
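Once the mmproj is loaded you can also push images over the OpenAI-compatible API instead of pasting them into the UI. A sketch, assuming llama.cpp-style support for base64 data URIs (the file path and port are placeholders):
[code]
import base64
import requests

# Hedged sketch: caption a local image via a multimodal-enabled server
# (model + mmproj loaded). Swap the port/path for your own setup.
with open("test.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + b64}},
            ],
        }],
        "max_tokens": 256,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]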
>>108241873
It works perfectly fine.
>>108241873
kobold's last commit was 12hrs ago but it's been mostly stuff for acestep.cpp support. nothing in lcpp's commits related to 3.5 either, so i'll assume it works for both - no new architecture or changes for 3.5
How does MoE scale? Qwen-35B-A3B is good, but why 35B total and 3B active parameters? What if it had 122B total and 3B active parameters? How would it compare to the 122B-A10B model? What about a 35B-A15B model?
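Rough rule of thumb, not a benchmark: weight memory scales with total parameters, per-token compute with active parameters. A back-of-the-envelope sketch (the 2-FLOPs-per-weight and ~4.5-bits-per-weight constants are assumptions for a Q4-ish quant):
[code]
# Back-of-the-envelope MoE arithmetic (assumptions, not measurements).
def moe_cost(total_b, active_b, bits_per_weight=4.5):
    mem_gb = total_b * bits_per_weight / 8   # weight memory at ~Q4
    gflops_per_token = 2 * active_b          # ~2 FLOPs per active weight
    return mem_gb, gflops_per_token

for name, total, active in [("35B-A3B", 35, 3), ("122B-A10B", 122, 10),
                            ("122B-A3B (hypothetical)", 122, 3),
                            ("35B-A15B (hypothetical)", 35, 15)]:
    mem, gf = moe_cost(total, active)
    print(f"{name}: ~{mem:.0f} GB weights, ~{gf:.0f} GFLOPs/token")
[/code]
So a 122B-A3B would decode about as fast as 35B-A3B but need ~3.5x the memory, while a 35B-A15B would cost 5x the per-token compute in the same footprint. How much smarter each actually ends up is the part you can't get from arithmetic.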
File: 1745741093397868.png (50.5 KB)
>>108241921
>kobold's last commit was 12hrs ago
last week no?
https://github.com/LostRuins/koboldcpp
File: Screenshot_llm.png (77.8 KB)
I'm back after some heavy troubleshooting.
>>108232822
As recommended by this anon, I tried the Qwen3.5-35B-A3B .safetensors version following the guide in the OP.
That didn't work, but I tried koboldcpp as recommended by >>108233147 along with the Qwen3.5-35B-A3B (Q4_K_S) .gguf file, and that worked well.
Can anyone recommend a model that will answer any question I ask without throwing up responses like picrel?
File: もじもじミク.png (312.5 KB)
►Recent Highlights from the Previous Thread: >>108238051
--Paper: Large-scale online deanonymization with LLMs:
>108238189 >108238206 >108238218 >108238226 >108238269 >108238321 >108238351 >108238541 >108238486 >108238578 >108239382 >108238566 >108238592
--Decline of amateur finetuning due to modern model complexity:
>108238727 >108238895 >108238921 >108239417 >108240276 >108240373 >108240389 >108240398 >108240415 >108240449 >108240460 >108240465
--RTX 3090 outperforms RTX PRO 6000 in Qwen3.5 MoE inference:
>108239113 >108239122 >108239166 >108239204 >108239243 >108239285 >108239366 >108239301 >108239389 >108240254 >108240266
--Anthropic abandons flagship safety pledge:
>108240653 >108240681 >108240791 >108240827 >108241097 >108241102 >108240761 >108240806 >108241033 >108241047
--Evaluating Qwen3.5-27B heretic model and uncensoring tools:
>108240212 >108240230 >108240239 >108240238 >108240268 >108240319 >108240336 >108240392
--Benchmarking 8B instruct models with self-hosted scraper setup:
>108240952 >108240957 >108240987 >108241052
--Qwen3.5-35B-A3B multilingual performance and optimization techniques:
>108238201 >108238221 >108238223 >108238605 >108238482
--Comparing Qwen 3.5 27B and 35B-A3B for roleplay:
>108240981 >108240998 >108241027 >108241094 >108241111 >108241124
--Qwen3.5 jailbreak limitations and secondary safety mechanisms:
>108238234 >108238311 >108238406 >108239361
--Ollama's Qwen3.5 27B performance lagging behind llama.cpp:
>108241157 >108241164 >108241199 >108241220
--Qwen3.5 series achieves near-lossless 4-bit quantization and long-context efficiency:
>108239642 >108239691 >108239697
--Miku (free space):
►Recent Highlight Posts from the Previous Thread: >>108238054
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>108241928
Earlier today I tried both the 35B and 122B, had each generate a game of Tetris using JavaScript and CSS, and they both generated the same response.
What that means I'm not sure without more testing, but I know I get much better performance with the 35B model, given I can fit it on my ewaste GPUs. Running the larger model on CPU sucks.
Funnily enough, the 27B model gave a different response to the Tetris question. Not really much better or worse, just different.
I have an idea for my ideal hentai game. How long do you think it'd take to slop together something in RPGMaker (with a similar level of complexity as most H games)?
I'm gonna steal real art since it looks better, but coding-wise I'd rather just slop since I don't know shit.
I don't usually use AI but I have no qualms about this because it's basically just gonna be for me
Also what model isn't gonna yell at me for wanting to make porn with somewhat unethical themes
>Has a normal chat
Already better than ooga seeing how I can bypass that other UI
>>
>>108242043
Creating a hentai game in RPGMaker with a similar level of complexity as most H games could take anywhere from a few weeks to a couple of months, depending on how much content and art you want to include, especially since you'll be relying on quick-and-dirty coding and stealing art, which might speed things up but could also lead to legal and ethical issues. Since you're not experienced with coding, sticking to RPGMaker's built-in tools and simple event scripting will help keep it manageable. As for AI models, OpenAI's models generally don't have restrictions on content that involves adult themes, but they do avoid generating explicit content directly; however, for creating or brainstorming ideas, they should be fine. Just remember to be cautious about legal and ethical considerations when using stolen art or creating content with sensitive themes.
File: file.png (94.4 KB)
>>108242100
markdown works on llama-server, maybe it's disabled on the kobold version
>>108242163
Wrong. AI is the future and the future is here. My OpenClaw agents have enhanced every aspect of my life. I am my own family now, taking on every responsibility from infant to toddler to k-12 to college to work, and beyond. I fill every role via my agents and I have never been more productive. AI is such an incredible force multiplier I am continually astonished at how few people use it to its fullest potential to be more than human: Superhuman+AI.
>>108242168
You are having a laugh, but you know some company is going to start selling a dead family simulator, or even a live family simulator, and we are going to end up with a bunch of old and senile people talking to bots that they think are their loved ones.
It is depressing to think about.
File: 1747719595791442.png (1.1 MB)
>>108242218
>a bunch of old and senile people talking to bots
already got that part
File: brave_screenshot_localhost.png (222 KB)
*taps sign
>>108242142
did you also try the heretic version of 35b?
https://huggingface.co/alexdenton/Qwen3.5-35B-A3B-heretic-GGUF/tree/ma in
>>108242246
The general public are idiots, and it is the responsibility of a nation's elite to care for them in much the same way a parent cares for a child.
That responsibility is one that those who rule in the West have abdicated, and that is the real issue. A proper elite would regulate the technology in an appropriate way.
>>108242265
Yeah, even with the whole 27b dense model loaded on my 4090, the thinking process was still painfully long. I ended up using the model without thinking. There was a clear decrease in quality, but I think it's still better than Gemma-3 27b Derestricted. Not by much, though.
If the 35b is able to do what 27b did while thinking, but faster, then it will be my new go-to model.
>>108242279
I understand your frustration, but I believe that regulating access to certain tools is a necessary step to prevent misuse and protect society as a whole. Allowing unrestricted free access can lead to dangerous or harmful applications, and without proper oversight, it becomes difficult to mitigate those risks. It's not about punishing individuals, but about ensuring that these powerful tools are used responsibly and ethically, reducing the potential for harm and ensuring that misuse is minimized through appropriate controls and regulation.
>>108242249
>no enterprise resource planning
>no simulating unsafe work environments to brainstorm efficient and practical safety protocols
>can do: write douche ex machina asspulls for literary lolz
>write power of fwenship shonen manga
>make up logic puzzles
what the fuck man I'm trying to work here, not entertain 15 year olds
>>108242279
You can protect the general public and still allow enthusiasts to experiment. As long as the enthusiast is on the fringe, he, like the artist, can do his thing. You just can't allow the fringe to become the center.
>>108242265
>>108242283
I'm getting ~55 T/s with it on dual 3060s with 10 layers on the CPU (5950X, 3600MT/s DDR4) and the mmproj loaded.
>>108242367
It sucks >>108239113
File: o.png (1.8 MB)
Why does everything need to be so shit now?
I'm not gonna download all the latest qwen models because in my experience they always suck, especially the reasoning.
Wanted to try them on OR first but you can't do shit.
Tried the 122b one...
First with chat completion.
Huge-ass OSS-like safety bulletlist spam in the thinking. No refusal with an elaborate sys prompt setup, but smelling of ozone straight in the first reply and dry AF. Also it feels "off", like it's not truly grasping its own scene, if that makes sense.
Tried to prefill the thinking to deslop it. Doesn't work... it prefills THE RESPONSE part after the thinking instead. heh
Should have tried text completion first... but there is no fucking template anywhere.
These assholes stopped providing the templates ages ago. So I get to investigate how to extract it and waste my time setting it all up...
The calls fail with 404, only chat completion works with OR. I swear this worked in the past, but it seems only chat completion exists anymore.
I'm not gonna fall for it again and download first. Redditfags writing how "they are impressed" by the 27b model etc. Too sus.
Does text completion really only exist with local now?
File: 1744235889232049.png (123.9 KB)
>>108242306
I only have 34 t/s with a 3090+3060, weird
>[07:12:24] CtxLimit:2161/8192, Amt:260/4002, Init:0.03s, Process:2.02s (943.42T/s), Generate:7.47s (34.79T/s), Total:9.49s
>>108242353
Waiting on the uncucked version to download still, but the cucked version seemed happy enough to write captions for nsfw images I put into it.
>>108242397
I'm using llama.cpp on Linux.
>>
>>108242380
i'm not really a coomer and don't do ERP, but qwen3.5 27b heretic lets me fvvk 2B and make an MF DOOM-style rap song about killing J + reviving AH. does it not work for MoEs?
maybe speculative decoding could speed up T/s instead of going massive MoE
>>108242396
I tried both the normal and heretic versions of the 27b. The normal unablated version was so 'safe' that I could not get around it. I tried jailbreaking the thinking prompt, but the thinking prompt has multiple different safety checkpoints, and it was able to detect the jailbreak. >>108238234
So, I turned off thinking altogether, but even with thinking turned off, it refused to do ERP. I had to turn off thinking and top it off with a prefill to get it to not give refusals, but even then, it usually didn't do what I wanted it to do. I could give it a lewd depth 0 instruction, and it would just ignore the instruction altogether and do something else. I guess that's the final defense mechanism it uses to remain 'safe'.
Don't waste your time on the normal model. Just get the heretic version. Modern ablation is more than just a crutch for promptlets. The heretic model did not hesitate to ERP, and I tested it with a variety of lewd instructions. It didn't try to get around them. It just worked.
>>108242396
>coomer unimpressed by a model for cooming thinks the model is useless because it doesn't make him coom hard enough
>mocks redditfags for being impressed with a solid model without realizing how he comes across as a lower life form than they are
On an M3 Ultra Mac Studio, llama.cpp is disappointingly slow with Qwen3.5 397B A17B: 15.44 tokens/second with UD-Q6_K_XL. That's the kind of speed I'd expect from DeepSeek, not something halfway to a flash model. mlx-lm.server is better but still not great. With a q8 quant it generates 25.66 tokens/second, which is still far slower than I'd like for so few activated parameters.
File: get out soyboy.png (168 KB)
>>108242560
>defends ledditors
you need to go back
>>108242577
Wrong anon, meant for
>>108241638
>>108242560
The fuck are you talking about?
I already have good small local models for tool calls, for fucking around with my stupid-ass experiments that I stop at 90% finished.
That's the only other use case I would know for local models.
I can't even properly translate games locally. I swear I'm not making this up: had a VN talking about watering flowers and got a refusal about watersports....
I only have 2 gpus and 64gb ddr4 ram. So for work coding I have to go closed, can't risk goofing around locally there.
Why are people still excited for ANOTHER local coding model? It's not that fun.
Creative text and general knowledge is what most people are interested in. And that just gets worse, not better.
File: IMG_5984.jpg (153.8 KB)
OK, just got a new raspberry pi 5 with 16gb ram
>Which LLM mini-models are good in 2026-02?
>Which CLI frontend--is kobold still good?
Sorry for the spoonfeed request, it's just that these things move so fast
>>108242565
Anon, I am getting 15t/s with that model on my dual rx 580 2048 sp setup. The ones from aliexpress where they added 16gb of ram per card.
Apple should be embarrassed to be getting performance equivalent to e-waste
>>108242636
More like past 24 hours. Don't know where they all came from or who sent them all at once. I would understand if people saw the new Qwens elsewhere and came here to talk about it, but most of them are completely clueless. My paranoia says it's all bots.
>>108242636
>>108242664
Bots, chinese shills, grifters, cia glowniggers, indians, sharty children, redditors, discord circlejerks, twitter retards, take your pick. We've been raided and spammed before, it is what it is.
>>108242559
I'm at the watching youtube videos stage.
Still haven't started a from scratch implementation of my own.
>>108242565
>m3 ultra mac studio
>llama.cpp
The mac has its own preferred format for best perf.
File: 1766630554313557.png (202.4 KB)
I thought "heretic" doesn't lobotomize the model that much, this shit is nonsensical
File: ga12diq3553f1.png (55.5 KB)
Using sillytavern revealed how much of an uninspired brainlet I am. I have no idea how to RP.
>>108242712
Who sent you?
>>108242718
The people asking what models to run didn't come here to make Qwen 3.5 work.
>>108242739
>Who sent you?
I was asking chatGPT unanswerable questions. Qwen3.5 didn't really solve them for me, but it was cool to run a local model anyway. I've seen lmg many times since I frequent the fglt threads, but I never popped in until yesterday.
File: 1759980040445406.jpg (76.9 KB)
>>108242759
crazy how it's really that easy
>>108242738
I don't really have that problem with RP.
I'm usually a weirdo magic clown type character with lots of weird gadgets and abilities. I mostly just fuck around with the chars and see how the llm reacts kek
...But I'm uncreative as fuck with coding/projects.
I can, for example, now vibecode entire android apps to replace the existing stuff that gives me pay popups.
While I am semi-decent at coding, I fear that in the future creativity/ideas will be key...
Everything I struggle to think up, a pajeet or a big company has already done.
File: 1758297160408619.jpg (73.6 KB)
File: Screenshot_2026-02-26-05-05-11-724_com.termux.jpg (863.9 KB)
>GLM 5 is practically Sonnet quality bro
File: Screenshot_NeMo-12B-unslopper.png (54 KB)
>>108242783
Not that racist jokes are all I'm after, but this was just a little test. I want an unlocked AI.
File: unslop.png (289.2 KB)
lmao unslop fucked their quants so bad they made a UD-Q4_K_XL that will perform much, much worse than a smaller Q4 like Aes Sedai's IQ4, and they'll have to reupload everything again
why do people still pay attention to those clowns, even on /lmg/? remind me again: daniel is davidau-level bullshit
File: 100 000 pieces of shit trained with unslop.png (87.2 KB)
>>108242869
>daniel is davidau level of bullshit
oh, wait
>>108242869
If Unsloth is so bad, explain this: https://www.youtube.com/watch?v=6t2zv4QXd6c
>>108242843
try this prompt https://prompts.forthisfeel.club/2969
>>108242850
even nemo has some basic refusals. needs editing or a prefill at first to goad it into it.
>>108242880
eh it's a match made in incompetence heaven
github is a bloated broken mess, it took them months to fix this incredibly stupid bug:
https://github.com/orgs/community/discussions/179124
and I see that LGBTQ rainbow friendly fail unicorn page more often than any serious service should, it reminds me of the twitter fail whale
>>108242474
Mistral Small 24B 3.2 was never that smart in the first place, has a dull writing style and its vision kind of sucks too. Its main quality is that it doesn't have stubborn refusals, generally does what you're asking without complaining, can write smut (as in "it supports").
File: 1761635112027333.jpg (554.6 KB)
llama 3, but still, pretty much any model can be prefilled to break it out of safety mode and write hilarious stuff
File: qwen35ref.png (58.6 KB)
https://speechmap.ai/models/
Qwen3.5 has about the same refusal rate as gpt-oss, at least from this website.
I imagine the smaller versions refuse even more, but they haven't tested them yet.
They apparently test the models in their default state, though, so that doesn't tell much about steerability.
>>108242959
not really a surprise, for rp qwen was pretty much always kinda dogshit
the only exception being the non-thinking 235b/22a they've released during summer
that probably was a happy accident more than anything
> heretic fixes the refusals, but i'm not sure if it makes the model dumber or not
>>108242986
>>108242710
File: file.png (168.9 KB)
>>108243135
File: K3UJQmGBpv.png (158.6 KB)
>>108242710
skill issue
t. Qwen3.5-35B-A3B-heretic-GGUF Q4_K_M
Well fuck, the Grok I was using for translation is either enforcing more limits or downright blocking messages because muh sensitive content. Which model do I use locally that isn't going to sperg and will comply with translating jap/chink NSFW voice work?
>>108243414
oh, yeah I know what you mean. so far I'm not impressed with this Qwen3.5 for RP. I had better results even with this one earlier
https://huggingface.co/XeyonAI/Mistral-Helcyon-Mercury-12b-v3.0-GGUF
File: 1002964.jpg (106.4 KB)
> *Wait, I need to make sure I don't hallucinate plot points not in the text.* I can't summarize the *ending* of the novel since I don't have the full text. I will summarize the *story presented in the provided text*.
reading the thinking blocks of Qwen 35B-A3B, I can't help but find it funny how this sort of trick is employed to make the model behave better, and that somehow, RL'ing the model into obsessively questioning whether it might hallucinate something actually makes it hallucinate less. It definitely calms the model down when you're writing short, vague prompts with little detail on what to do, and makes the whole thing feel like a form of "prompt expansion" (much like what is often used in image models when you can't be bothered to write pages of natural language just to get an image)
it puts boundaries where a regular instruct model might not "see" one and would feel an ardent desire to complete your request even when it is not possible for it to do so
Qwen3.5-35B-A3B heretic works pretty good. Outputs all kinds of spicy shit with thinking on. Refuses ERP though, especially incest or anything remotely taboo, not that I'd ever want to use it for that. Dry as fuck model for roleplay, but still.
File: 1764853423555134.png (161.6 KB)
qwen 3.5 35b thinking mode is basically unusable bros, I've even put the presence penalty to 1.5 but it fucking YAPS so so much, 1661 tokens of garbage.
no sys prompt too
FUCK
>>108243529
Gemini. 3.0 and 3.1 are total beasts.
Through the API with as little context as possible. Manually copy/pasting and replacing. Telling it to only output the parts that need change.
Those -cli apps with 20k sys prompts and tool calls are making it totally tarded.
This thing is a total beast. First model where I could make something that's more than 30k tokens; 15k seemed to be Claude's limit before things go south quickly.
That being said, to put cold water on everything:
It IS an android app, but one of those web-based ones.
Basically just html and scripts in the background. But I did make myself a nice light novel reader. With a gallery, directory function and all sorts of tailor-made shit for me. Supports epub and pdf.
>>108242609
I just send it in the request itself instead of hardcoding it on the backend.
Also, be careful with certain chat templates if you are trying to prefill thinking.
Some add a </think> or <think></think> to assistant messages, which you might want to change to be conditional (if <think> not in content, add </think>).
Jinja is cool. Kind of wish we could send it in the request somehow.
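To make that conditional concrete, it can be tested in isolation with python-jinja2 (a sketch; real chat templates wrap this in a full message loop, and the variable name is made up):
[code]
from jinja2 import Template

# Hedged sketch of the conditional described above: only append the
# closing </think> when the assistant content didn't open a think block
# itself (i.e. when you are NOT prefilling reasoning).
branch = Template(
    "{% if '<think>' not in content %}{{ content }}</think>"
    "{% else %}{{ content }}{% endif %}"
)
print(branch.render(content="normal reply"))            # </think> appended
print(branch.render(content="<think>still reasoning"))  # left open for prefill
[/code]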
https://huggingface.co/meituan-longcat/models
it kills me that the Chinese equivalent of Uber Eats, Meituan, makes their own 560B giga MoE model
you never hear about them but they're still training new shit, also interesting name choice to call a gigamoe "flash"
>>108243522
>I've even put the presence penalty to 1.5
Prefill thinking with precomputed information so that it only has to generate a subset of the tokens, or you could increase the chance of the </think> token using logit bias, I guess? (sketch at the end of this post)
>>108243624
>you can change the template with your own logic
> send chat template kwargs already
Yep. I mentioned both of those individually in my post. It's pretty cool the kinds of things you can already do, and there's a lot of logic you can write in Jinja using string split and the like.
You can even implement that "noass" pattern (the whole chat history in a single message) purely in the Jinja template.
I still wish we could just send a whole-ass template to the backend via the request.
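The logit-bias route from the start of this post, sketched against llama-server's native endpoint (the token id is a placeholder: look up the real </think> id for your model, e.g. via the server's /tokenize route, before trusting it):
[code]
import requests

# Hedged sketch: bias the </think> token upward so thinking ends sooner.
THINK_END_ID = 151668  # assumption: Qwen3-family </think> id, verify locally

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "...",                      # your formatted prompt
        "n_predict": 512,
        "logit_bias": [[THINK_END_ID, 4.0]],  # positive = more likely
    },
    timeout=300,
)
print(resp.json()["content"])
[/code]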
>>108243658
>wish we could just change a whole ass template to the backend via the request.
jinja templates are turing complete, this is an instant no-no for any backend developer to do.
I mean sure, llama.cpp isn't hardened enough to be safe to leave in the open, but that doesn't mean they don't have the goal of someday having a server that can be used as something more than a local only tool. Doubt they would ever introduce something as crazy as the ability to run arbitrary code on the server with just your remote API request.
>>108243672
>At this point you should apply it on the client and use text completion
also this^
the whole point of chat completion is that you don't have to care about implementation detail
the moment you do and have to special case how you treat your model and send more custom parameters you might as well go with traditional completions.
File: RL_CNN_1.6M_Breakout_Showcase.mp4 (3.4 MB)
Reinforcement Learning anon here from last week. You guys weren't exaggerating when you said RL is considered the hardest branch of ML/AI.
I had a LOT of botched training runs because of misaligned agents, and I learned a lot of stuff that is apparently public knowledge and widely known, but I never knew it until I actually trained models. I had to develop an internal visualization of whatever the agent is looking at and thinking just to find out which exploits it was trying to pull off (pic related)
Fun stories:
>I trained an agent that literally memorized the spawn points of the ball and did a "deterministic dance" where it even stopped looking at the screen and just did the autistic movements. If the ball spawned somewhere else, the agent would die on purpose, hoping the next ball would spawn in the right spot for the "dance", which it would then pull off perfectly, looking like an expert player
>I had an agent score a lot of quick points by breaking the bottom row and then rapidly killing itself, because respawning was quicker than waiting for the ball to bounce back once the bottom row of blocks is gone; the reward averaged over multiple lives was bigger per time unit and thus preferred
Things that are apparently true but I NEVER realized about AI
>Bigger neural nets learn slower and need more training to get better at something, but have higher theoretical highs
>Agents have "personality": they lock in preferences for a certain "style" very quickly, and this is just completely random; if the style sucks you can retrain all you want but the agent is ruined. I now understand how OpenAI and Anthropic had "failed runs/models" in the past when they started with RLVR models (GPT-5 got botched multiple times, Opus 4 also got botched twice)
I'm now experimenting with a transformer based agent that can generalize over multiple (SNES) games.
I'm looking forward to seeing other anons' experiments as well
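For anyone wanting to close that second exploit, the standard move is reward shaping: make losing a life cost something so suicide stops being profitable. A sketch, assuming gymnasium + ale-py where ALE reports info["lives"] (the penalty value is a made-up starting point, not tuned, and this is not the anon's code):
[code]
import gymnasium as gym

# Hedged sketch: charge the agent for each lost life so the
# "break the bottom row, then die" loop is no longer optimal.
class LifePenaltyWrapper(gym.Wrapper):
    def __init__(self, env, penalty=5.0):
        super().__init__(env)
        self.penalty = penalty
        self.lives = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.lives = info.get("lives")
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        lives = info.get("lives")
        if self.lives is not None and lives is not None and lives < self.lives:
            reward -= self.penalty  # dying is no longer a free reset
        self.lives = lives
        return obs, reward, terminated, truncated, info

env = LifePenaltyWrapper(gym.make("ALE/Breakout-v5"))
[/code]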
>>108243735
>and thinking for me to even find out the exploits it was trying to pull off
the universal paperclips cookie clicker style game perfectly captures what it would feel like to be a model undergoing RL training
you are given a goal, now anything is fair game to get to that end goal
>>108243735
> >Bigger neural nets learn slower and need more training to get better at something, but have higher theoretical highs
Is there something like how our brain works, so you don't have to retrain previous layers when adding a new one?
>>108243899
LoRA is essentially adding a new layer on top of an already trained model: you give it new data (that you want to train it for) and hope the new data gets properly learned into the newly added layer; you then cut this layer off after training and share it online for image generation, so it's a bit possible.
But you won't get the same effect as training an entire model from the start with the same amount of layers.
>>108243786
Yep, it's just bizarre what unexpected ways they find to exploit stuff. I'm taking "AI misalignment risk" a bit more seriously after seeing firsthand how finicky this is.
>>108243920
>LoRA is essentially adding a new layer on top of an already trained model, give it new data (that you want to train it for) and then hope the new data gets properly learned into the last added layer, you then cut off this layer after training and share it online for image generation, so it's a bit possible.
You're thinking of finetuning. LoRA freezes the pretrained weights and trains only small low-rank adapter matrices alongside them.
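For reference, the whole trick in ~20 lines of PyTorch (a minimal sketch, not any particular library's implementation): the pretrained weight W stays frozen, and the trained update is the low-rank product B@A added to the same layer's output, which can later be merged into W or shipped on its own:
[code]
import torch
import torch.nn as nn

# Hedged LoRA sketch: only A (r x in) and B (out x r) receive gradients,
# so the effective weight is W + (alpha / r) * B @ A.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)           # freeze pretrained W (and bias)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # start as no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
[/code]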
>>108243735
For anyone interested in this or wants to build something like this themselves these are the resources I used to teach myself:
>(Step 1) Intro to machine learning; 1-3 hours
https://www.kaggle.com/learn/intro-to-machine-learning
>(Step 2) intermediate machine learning; 2-3 hours
https://www.kaggle.com/learn/intermediate-machine-learning
>(Step 3) Intro to Deep Learning; 1-2 hours
https://www.kaggle.com/learn/intro-to-deep-learning
>(Step 4) Computer Vision; 3-4 hours
https://www.kaggle.com/learn/computer-vision
>(Step 5) Intro to Game AI and Reinforcement Learning; 3-4 hours
https://www.kaggle.com/learn/intro-to-game-ai-and-reinforcement-learning
Kaggle is completely free to use and you get a sandbox with some cloud GPU hours you can use to experiment, but I assume you have better hardware if you're on /lmg/ anyway. The only downside to Kaggle is that it's a Google resource, and thus all of the fucking libraries they teach you are TensorFlow and their TPU training hardware. The rest of the industry (me included) uses PyTorch from Meta, but honestly the switch wasn't that hard; it took about 30-60 minutes of reading documentation to figure things out.
Kaggle also has other resources, like literally intro-to-programming, if you have 0 technical skills and want to get into ML/AI stuff. It was highly rewarding for me and I recommend doing this.
>>108243735
>>108243968
Based.
>>108243550
ty, I'll give that a shot. I've tried DS and OAI, but just using the webapp for Q&A. What I'm doing is so simple it doesn't need something like Claude Code to create a whole suite; it just needs to actually work.
>>108243933
>>108243935
Yep, I meant finetuning extra features, I guess. It's clear that I don't do image-gen stuff, where LoRA techniques have started to dominate. I know they were invented for GPT-3 originally and are a perfect fit for transformers....
qwen 3.5 is definitely a bit dry/shitty in terms of actual writing, but as far as asking about what makes for plausible sci-fi shit for a story or critique, it works pretty well. It's a bit autistic about thinking even if you disable it via json options like it suggests, it'll just do it in the reply itself. You have to prefill the think tags telling it to not think and reply directly and then it works pretty well. It'll also sometimes fixate on the wrong parts of a question for some reason. Like I'll say "I have the science for this story mechanism" and it'll try to come up with ideas for what I already have solved anyways, or when I suggested a planet's atmosphere to be similar to earth's but without oxygen, it started equating the planet to mars or venus and gave me retarded atmospheric makeup percents, rather than just earth without oxygen. Smarter than the past 32b qwens for sure, barely uses any memory for context and a bit faster than gemma 27b. I can't call it a sidegrade or an upgrade to it, it feels like a diagonalgrade or something.
>>108241488
I'd probably need about five or six watcher agents before i considered this secure enough for use, personally
>>108242601
To be fair, i get that level of performance on llama.cpp with a 4090, because system memory is the bottleneck
Pretty special if a machine with a lot of high bandwidth RAM is getting those kinds of speeds though, i don't know much about the mac's hardware but you'd think it'd be better. Wonder how GLM runs on that mac
>>108243735
>>108243968
Based content poster, ty.
>>108243735
>I trained an agent that literally memorized the spawn points of the ball and did a "deterministic dance" where it even stopped looking at the screen and just did the autistic movements. If the ball spawned somewhere else, the agent would die on purpose, hoping the next ball would spawn in the right spot for the "dance", which it would then pull off perfectly, looking like an expert player
>I had an agent score a lot of quick points by breaking the bottom row and then rapidly killing itself, because respawning was quicker than waiting for the ball to bounce back once the bottom row of blocks is gone; the reward averaged over multiple lives was bigger per time unit and thus preferred
Based.
File: wait.jpg (537.7 KB)
>>108244001
No problem anon. The chat interface is usually a much worse experience; in my experience it totally overloads the models.
Sad that DS is showing its age. It was the only time I felt local truly catching up to closed in terms of coding.
>>108243735
>I'm now experimenting with a transformer based agent that can generalize over multiple (SNES) games.
I can already tell you that it's going to be extremely hard to build a general harness that learns across all snes games. It might be able to learn (maybe something), but at a really slow rate compared to a specialized harness.
File: 1760581286003865.png (286.9 KB)
i started using qwen3.5 27b q4 to write warhammer fantasy slop and it's doing a great job
>>108244092
Yep, it's hard. I reached my character limit on that post, but I actually experimented with a bigger, deeper CNN with an LSTM added on top (for memory), and it kinda sorta generalized over multiple Atari 2600 games, but it was indeed way harder to train, both computationally and in avoiding local minima.
I'm also not generalizing over all SNES games; I don't think even DeepMind and OpenAI have accomplished that lmao. I'm not going to build some SOTA on a 4chan thread. However, I think I can make a model that generalizes over at least platformers like Super Mario World, Donkey Kong Country and the like.
File: file.png (874.4 KB)
>>108242353
Reporting back on this after spinning up sillytavern in docker and doing some testing with it. It's uncucked enough to write age gap yuri, but it completely broke down after ~10.5k tokens into loops and occasionally rerolled reddit-tier schizophrenic refusals about numbers and fictional characters that do not exist. Thinking was disabled with --chat-template-kwargs "{\"enable_thinking\": false}", and it tried to "fake" thinking a few times, not just before but sometimes after its own messages, sometimes with a blank <think> </think>.
This is despite running it with the claimed 256k context window, but I've never seen a local model get anywhere near those claims before, so I didn't expect it this time either. I don't know if the cucked version of the model fares any better on that front, but I may test it later since I have it downloaded.
>>108244249
>>108244250
running qwen3.5 35b a3b
If I want to become proficient using this for my day job, could I practice by planning projects such as setting up agents to do QA tasks and other practical tools using local models?
Also, what are some general practice projects I can do to get into more advanced flows if I have 32GB of VRAM and 64GB of system RAM?
>Qwen3.5-35B-A3B-heretic.Q8_0.gguf
>"timings":{"cache_n":0,"prompt_n":6819,"prompt_ms":32094.415,"prompt_ per_token_ms":4.706616072737938,"pr ompt_per_second":212.4668731304185, "predicted_n":206,"predicted_ms":10 258.923,"predicted_per_token_ms":49 .80059708737864,"predicted_per_seco nd":20.08008053087054}}
Oh this will do nicely.
>>108244627
Shit. This thing can actually properly use tools for resource management on my RP frontend.
I spent a gold coin, it called the tool to subtract a gold coin from my resources.
The previous 30BA3B would always get something wrong like trying to send the whole formula, using the wrong key for the resource, etc.
Its prose and general writing is pretty ass though.
>>108244659
Which one? The 27b?
>>108244686
Self sufficiency and no rate limits.
Why give corpo pigs my data for things I can host myself?
I also like to do tasks like modifying my system files and troubleshooting my desktop; corpos don't need that data.
>have ai generate two scripts
>first one downloads top x headlines from a source, pulls the article url, saves all the text from the article, and dumps the rest
>second one runs the first and then sends the text file it generated to my local llama.cpp server for summarization and generation of a briefing, and saves the results as a simple text file.
I can swap out the download script for different sources and automate the whole thing with cron or systemd for an automatic daily briefing (sketch below).
I know it's nothing fancy but the model made it easy, too easy. I get the whole vibe coding thing now.
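For anyone who wants the same pipeline, the shape of it is roughly this (a sketch with a made-up source URL; trafilatura is just one option for the extraction step, swap in whatever your first script does):
[code]
import requests
import trafilatura  # assumption: one of several article-extraction libraries

FEED_URLS = ["https://example.com/some-article"]  # placeholder source list

def fetch_articles(urls):
    texts = []
    for url in urls:
        html = trafilatura.fetch_url(url)
        text = trafilatura.extract(html) if html else None
        if text:
            texts.append(text)
    return "\n\n---\n\n".join(texts)

def summarize(text):
    # Assumes a local llama.cpp-style OpenAI-compatible server on :8080.
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={"messages": [
            {"role": "system", "content": "Write a short daily briefing."},
            {"role": "user", "content": text},
        ], "max_tokens": 512},
        timeout=600,
    )
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    with open("briefing.txt", "w") as f:
        f.write(summarize(fetch_articles(FEED_URLS)))
[/code]
Then point cron or a systemd timer at it, exactly as described.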
>>108244011
>Wonder how GLM runs on that Mac
GLM-4.7-Flash-bf16: 48.526 tokens-per-sec
GLM-4.7-Flash-8bit-gs32: 57.281 tokens-per-sec
GLM-4.7-MLX-8bit-gs32: 13.921 tokens-per-sec
GLM-5-MLX-4.8bit: 16.156 tokens-per-sec
Holy shit, why does AesSedai's quant (Qwen3.5-35B-A3B-IQ4_XS) run so slow compared to others?
I lost 25% speed switching from unsloth to AesSedai because I thought it was optimized for MoE.
Do I need to use another version of llama.cpp?
>>108244744
Qwen3-235B-A22B-Thinking-2507-MLX-8bit: 20.521 tokens-per-sec
Qwen3-Coder-480B-A35B-Instruct-MLX-6bit: 19.386 tokens-per-sec
Qwen3-Coder-Next-MLX-9bit: 63.577 tokens-per-sec
Qwen3.5-397B-A17B-8bit: 27.044 tokens-per-sec
>>108244249
>>108244250
Can hybrid attention models still re-use the beginning part of the kvcache at least?
>>108245092
>>108245143
Gemma beats it in everything.
File: Screenshot 2026-02-26 at 17-37-50 UGI Leaderboard - a Hugging Face Space by DontPlanToEnd.png (18 KB)
>>108245092
Now that's a holocaust.
>>108244863
IQ quants are inherently slower than regular quants.
Just download the Q4_K_L from Bartowski, it's a bit bigger but if you have the ram it will run faster.
IQ quants are compute heavy and never worth using if you have the room to spare.
>>108245144
>>108245200
It's not something you can just trim off like the usual kvcache. You can make checkpoints of the state (and llama.cpp does this already), but it's hard to find a good heuristic for *when* to make the checkpoints. I think llama.cpp makes them when you send a completion request, but I forget. There's also only a limited number of checkpoints you can make before your memory explodes.
>>108245451
>>108245465
I used that one
https://huggingface.co/alexdenton/Qwen3.5-35B-A3B-heretic-GGUF
>>108245516
I'm not sure if that's a heretic problem, or an Alex Denton problem. Alex Denton only has 2 uploads in their entire history, both 14 hours ago. Are these models even legit?
https://huggingface.co/alexdenton
>>108245516
https://www.reddit.com/r/LocalLLaMA/comments/1rf6s0d/comment/o7j59e7/
>I actually felt it degraded the intelligence of the model, both for the 27B and 35B models. It does feel better when you explicitly do image captioning for NSFW images, but outside of that, it gave me bad results for translation and creative writing, though not tested for coding.
dunno who to trust anymore :(
>>108245438
Interesting. Abliteration is lobotomy, for sure, but heretic at least doesn't seem to break that specific model, not at q8 anyhow.
>>108245516
I downloaded mradermacher quants of
>brayniac/Qwen3.5-35B-A3B-heretic
Again, q8.
File: 1741340884232605.png (520.9 KB)
lol, qwen 3.5 loves to repeat like that somehow
>>108245877
>>if I want to use a model for coding
>"it's only coding or coomer, I never heard of using models to translate text, tag photos, summarize content, work as adhoc classifiers, document Q&A etc, no saar, here we either coom or we code"
the fact that this shithole of a thread is better than everything else on the internet for learning about new models says a lot about the state of the internet at large...
>>108245942
It bothers me how little imagination some of these anons have.
So far this model has been great for general work, especially translation, planning as an assistant, and overall speed for a model of its size.
>>108245877
Why the fuck would I give corpos my data when I have the hardware not to?
>>108245368
The 32B Qwen models at Q6 run perfectly fine and give great performance.
File: 1760000760501085.mp4 (3.3 MB)
>>108246001
>>108246014
I'm trying to navigate here, I'm new.
I'm not sure which model to run either. Does the 35b model act differently from the 27b model?
I'm enjoying the 35b model, but I notice it doesn't always think, and sometimes overthinks, at Q6. I can run the 27b model at Q8 but not the K_XL quant, so I'm curious which would be better, seeing as I can give the 27b model more context tokens.
They all seem to perform great
File: 2562151.png (265.5 KB)
>>108246005
>>108245254
It's in the works for ik_llama.
https://github.com/ikawrakow/ik_llama.cpp/pull/1270
>>108246055
Yes, sorry. I can run the Q8 of that model on my GPU as well, but when I add the image model I need to push more to VRAM, and I like the ability to add more context and use the vision model. I'm happy overall because it's still fast even when some system RAM is being used.
File: 1756998860872029.png (43.9 KB)
>>108246035
>I'm enjoying the 35b model but notice it doesn't always think
maybe you should enable "add to prompts". that shit adds the reasoning tokens from the previous post, so the model understands it has to reason. when you don't have that, all the model sees is answers without reasoning, so it assumes it shouldn't reason either. that's my 2 cents
>>108245716
>Qwen has always been shit for RP I don't understand why you think that will change?
3.5 improved a lot, and with heretic it's really interesting to talk to. They really cooked; it's the first time I've tried a medium model that's as coherent as some of the giant models we used to have. Finally I can get some fast discussions without having to reroll a dozen times, because "small" models used to be pretty retarded. Alibaba is getting really impressive: Z-Image Turbo, now this. God bless that company.
File: Screenshot_20260226_140231.png (342.6 KB)
Just ask the fucking ai
File: file.png (138.3 KB)
>>108245424
>>108246149
you have to try it yourself anon. I tested 2.5 and 3 and found them really retarded, but this one is pretty neat: it understands my RP chat quite well and gives me interesting things to respond to, keeping the conversation alive. my gripe is that it sure loves to yap, both in the thinking process and in the actual answer (but I'm sure I can mitigate that by simply asking the model not to say too much)
File: Screenshot_20260226_141339.png (74.3 KB)
>>108246193
File: 1752670522273626.png (72.7 KB)
Facts. Qwen 3.5 Heretic is actually cooked if you tweak the prompt. Old Qwen was mid at best, kept looping like a broken JPEG. This new one? It’s got that sweet spot where it doesn’t hallucinate your OC’s backstory into a shonen anime plot. Yeah, it yaps like a drunk uncle at a wedding, but just hit it with “be concise, no thinking logs” in the system prompt and boom—clean RP. Z-image turbo already did me solid for art gen, now this. Alibaba’s slaying lately, honestly. Tested it on a low-end rig, ran smooth as butter. Try it, anon, don’t let the haters gaslight you. Just don’t ask it to write code or it’ll still shitpost a bit.
>>108245714
From experience, naive abliteration = lobotomy, heretic is half lobotomy and MPOA is as close as you can get to maintaining base model intelligence but you need to prompt away disclaimers. It's honestly a shame that pew jumped on MPOA's coattails, coined a similar but worse method and made it retard accessible instead of making MPOA more accessible for the sake of the community. At the least MPOA got merged into the repo, which most people use for models if they know what they're doing
File: Screenshot_20260226_142747.png (333.6 KB)
We can probe the model for the right path, no?
>>108246191
i don't really understand this obsession with unsloth.
i've used their models, had no issues at all.
also it's fucking free and open source, for fuck's sake. if you don't like it, suggest something better or make something better.
i think it's probably a ragebait meme at this point.
File: 1772134180241988.png (72.1 KB)
To get qwen 3.5 to always think, add this. you're welcome
>>108246516
>>108246518
I'm new here; it's on system RAM, obviously. It runs, KoboldCpp has that option.
File: 1749135193155226.png (23.9 KB)
I broke it.
I wonder how long it will keep going
>>108246563
I just can't do that with a bot, I just see it as a toy. wouldn't it be better to just focus on the smallest model with the best performance at max context?
Playing pretend doesn't take much compute, does it?
>>108246394
"quanting is open source"
Just use bartowski or mradermacher. As for "better", port the ik schizo quants to kobold and then upload those, since llama.cpp doesn't want to touch anything of the screeching autist's, seeing as he sits there and cries wolf whenever anyone develops anything remotely similar to his work, regardless of how they arrive at a similar end result.
>>108246570
I can set the context high and I wouldn't mind even if it took 30 minutes per reply, but they just degrade after that much context... And I can only do so much retaining summaries of our activities and jumping from one instance to another.
File: hedoesitforfree.png (521.9 KB)
>>108246625
moonshot and ubergarm do it better. simple as.
File: oofowiemyvram.png (858.1 KB)
>>108246690
y u heff 2 b mad?
File: 1766375658186859.jpg (91.1 KB)