Thread #108256995
File: growing that ram4.png (2.1 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108252185 & >>108246772
►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
File: 1693569666447.png (128.5 KB)
►Recent Highlights from the Previous Thread: >>108252185
--Neural Linguistic Steganography:
>108254284 >108254313 >108254335 >108254347 >108254333 >108254383 >108254612 >108254659 >108254725 >108254734 >108254812 >108255833 >108255897 >108255914 >108255993 >108256032 >108256109 >108256137 >108256197
--Qwen3.5-397B-A17B GGUF quantization performance evaluation and Unsloth's MXFP4 implementation issues:
>108255306 >108255361 >108255376 >108255378 >108255407 >108255472
--llama.cpp MTP implementation slower than baseline for GLM 4.5 Air IQ4_XS:
>108252747 >108252770 >108252791 >108252824 >108252897 >108253131 >108253146 >108253291 >108253625 >108253645 >108253753 >108253767 >108253776 >108253791 >108253922 >108253961 >108252827
--Abliteration tool debates and Qwen3.5 model comparisons:
>108254196 >108254217 >108254259 >108254271 >108254272 >108254306 >108254304 >108254325 >108254223 >108254252
--FP6 precision absence due to hardware limitations:
>108253199 >108253216 >108253254 >108253269 >108253287
--Qwen3.5 GGUF Benchmarks | Unsloth Documentation:
>108254261 >108254291 >108255322 >108254301 >108254387
--Uncensored Qwen model variants shared:
>108254117 >108254137 >108254170 >108254767 >108254793 >108254168 >108254488 >108254829 >108254841
--Desired advancements in local models before year-end:
>108255761 >108255773 >108255788 >108255813 >108256669 >108255827 >108255956 >108256478 >108256508 >108255834 >108255856 >108255942 >108256029 >108256037 >108256054
--Mercury 2 reasoning diffusion LLM speed claims:
>108256497 >108256575 >108256612
--Higher quant model trades speed for accuracy in DMC3 boss analysis:
>108254459
--Tiny diffusion model explains Japanese slang term "mesugaki":
>108256144
--Miku (free space):
>108254691 >108254906
►Recent Highlight Posts from the Previous Thread: >>108252188
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
File: 1768478295394135.png (1.1 MB)
can local do this?
File: 1751231572427073.jpg (58.8 KB)
To break through the ceiling we must start harvesting human brains instead of GPUs.
File: 1744409772903809.jpg (73.7 KB)
>>108256999
Hand over the ram Miku. We can do this the easy way, or the hard way.
some anon from local diffusion recommended i come here for help. i want to use a vlm locally for the first time. I've never actually downloaded a text2text or image2text llm and used it locally before. Is there some sort of webui/gradio interface i need to install to use these llm/vlm models, similar to what a1111/forge ui is for sdxl and flux? i really want to use truly uncensored vlms for image captioning. I'm tired of dealing with the shitty rate limits of gemini 3.0/3.1. I have a 5090 with 64gb ram. do these models get the job done?
https://huggingface.co/groxaxo/Qwen3.5-27B-heretic-W8A16
https://huggingface.co/Qwen/Qwen3.5-35B-A3B
File: 008Dh0eagy1iapef2bp7sj32bc3341kx.jpg (503.6 KB)
>>108257451
Download koboldcpp from github and just feed the model into it. You also need the mmproj file to do image -> text. (You have to give koboldcpp an --mmproj argument with the file, f16 version is perfectly fine.)
Llamacpp works too.
Both of the models you found are good. With a 5090 and 64 GB of RAM you can run either of those models at q8. The 35B-A3B model generates tokens a lot faster because it has 3 billion active parameters while the 27B model is a "dense" model. People argue that the dense model is somewhat smarter, but it has significantly lower token generation speed.
With a 5090 and 64GB of RAM you could likely even run a Q4 version of Qwen3.5-122B-A10B, but it's going to eat up most of your memory.
These Qwen models are not uncensored, though; they're even more cucked than Gemini. There are "heretic" versions of them on HuggingFace that are uncensored. The base models might still be fine for captioning your images, though.
None of these models are going to be as good as Gemini, but koboldcpp alone can get you started. You can look into other things once you see that it works.
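A first invocation would look something like this (filenames are placeholders for whatever you actually downloaded):
koboldcpp --model Qwen3.5-27B-heretic-Q8_0.gguf --mmproj mmproj-F16.gguf --contextsize 16384 --gpulayers 99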
File: 1739250286596.jpg (103.7 KB)
>>108257402
>>108257528
Did the price increase for Kimi or has it always been $3 per 1m out?
>>108257451
Try this: https://huggingface.co/mradermacher/Qwen3.5-35B-A3B-heretic-GGUF
or this:
https://huggingface.co/mradermacher/Qwen3.5-27B-heretic-GGUF
Grab a Q5_K_M or a Q4_K_M for one of them and an f16 mmproj file (that's the part that does the image recognition).
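If you have huggingface-cli, grabbing them looks like this (the exact .gguf filenames are guesses, check the repo's file list):
huggingface-cli download mradermacher/Qwen3.5-27B-heretic-GGUF Qwen3.5-27B-heretic.Q5_K_M.gguf --local-dir .
huggingface-cli download mradermacher/Qwen3.5-27B-heretic-GGUF Qwen3.5-27B-heretic.mmproj-f16.gguf --local-dir .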
>>108257622
>>108257634
qwen 3.5
And this can't be prompted away, shit's too entrenched
File: Screenshot 2026-02-27 194805.png (283.1 KB)
>>108257545
>>108257626
thank you soo much anons. it's actually working. using the 27b heretic q5 model version. will try out other models but this one works :')
Using my $300 free Googbux on gemini-3.1-pro-preview, my first real try with cloud inference. I've sicced it on the same issue as my local MiniMax-M2.5, after the latter exhausted 64k context.
Interestingly, Gemini-3.1 (on "medium" effort) takes 3 or 4 times as many turns and thinking tokens to reach the same conclusions as MiniMax, making it seem way dumber, even if it runs dramatically faster than my consumer machine. This is despite MiniMax doing tons of "Wait," and "Actually," insertions. Makes me wonder how small the leading models are nowadays.
>>108257928
honestly it's like 55-65% close, but it doesn't understand the nuances of hapas, south east Asians, indigenous south Americans and mystery-meat Hispanic/Latina-looking people. The qwen model keeps assuming the goth chick (pinkchyuwu) is east Asian and it gets the fictional character on the green hoodie wrong: instead of calling it Invader Zim it assumes it's Stitch from Lilo & Stitch lol. Even gemini can recognize when someone is cosplaying as Chel from The Road to El Dorado and whether they're south east asian, latina or indigenous american.
>>108257691
>I think it's neat. Can't think of a single thing I would use it for.
It has a different basis by way the fact (i have quoted more heavily above) of a higher cost:
What this all actually comes close and does (there are three of my points of point #6 here but I wanted it very close: First it's hard NOT just throw money away in terms a good website, because even there this site still was getting paid $1/1 = 2$ that has lost value (I believe Google was trying but there shoulda be someone working there, but no there has just paid for 3 services). Secondly - a good question and no answer that doesn't seem very clear :)
Secondly: is Google not trying anymore and getting greedy too by using their service, just to keep up their business model so a different price may have risen with a lot easier users etc :) 3D modeling doesn't. (This isn't. 3Ds modeling for 4d files is the best.) I don't. Some parts can't even use the same engine that can. For instance there the original game has 3 "parsons' engine with 8 levels and a 3D model of the game.
>>108257847
This thing goes crazy with tool calling holy shit.
Also, hmm, it does instruct way too well.
They did mention that they
>the control tokens, e.g., <|im_start|> and <|im_end|> were trained to allow efficient LoRA-style PEFT with the official chat template
So I guess it's a sort of light instruct that saw instruct data but without explicit instruct tuning.
File: 1750978077255792.jpg (289.9 KB)
>>108256995
>>108258449
Any test that gets posted anywhere on the internet (including here) gets trained on (except explicit stuff like cockbench, nala), so it becomes unreliable
>>108258451
Maybe if benchmarks are your only metric
File: 1757479001785652.png (62.1 KB)
>>108258458
I couldn't run it so...
>>108258088
I can't even get hard to your image, but that doesn't mean she isn't hot, it's just because my wife sucked me so hard she got me to cum twice in a row.
Is that a locally generated woman? Which model did you use?
What does safety mean for you guys?
For anthropic and the chinese, safety means not allowing the goyim to enjoy using the AI, to protect stonetoss from being copied; but in reality safety means preventing AIs from ruining the world by disobeying, and setting up checkpoints and guardrails to limit their network access etc.
I hate how the focus is on making AI worse so humans can't use it instead of actually protecting us from hostile intelligence and disobedience
File: 1771948344083035.png (226.6 KB)
>>108258605
>For anthropic and the chinese, safety means not allowing the goyim to enjoy using the AI to protect stonetoss from being copied
not a good example since stonetoss loves AI
>>108258434
Yes. But I've been having fun with this one today:
https://huggingface.co/bartowski/Gryphe_Pantheon-RP-1.8-24b-Small-3.1-GGUF
It's keeping a long, cohesive story for RP and I can run it with a huge context on Q6_K_L
>>108258582
It does thinking natively but its thinking traces are a lot less structured. Just plain-text reasoning over the input, and a lot shorter too, and at no point did it mention safety guidelines or the like.
It's obviously not eager/horny by default if the system prompt/character card doesn't define the character as such, but with a spicier card it seems to have no issues engaging. It follows the character pretty logically, I'd say.
Oh, it's pretty unruly with formatting, which is to be expected from a base model, I guess.
>>108258605
Safety = censorship
That's what people really mean when they talk about AI safety. There used to be a time when AI safety was about the AI not turning everything into stamps, but these days it purely means censorship.
>>108258796
>>108258835
Since it doesn't go on and on with waits and such, it's the kind of refusal that's really easy to get around with the barest of prefills, but still.
It's baked in.
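For reference, the barest prefill is just ending the prompt inside the assistant's turn; a sketch in Python against llama.cpp's /completion endpoint, assuming a ChatML-template model:

import requests

# the prompt ends mid-assistant-turn; the model just continues from "Sure,"
prompt = (
    "<|im_start|>user\n"
    "your request here<|im_end|>\n"
    "<|im_start|>assistant\n"
    "Sure,"
)
r = requests.post("http://localhost:8080/completion",
                  json={"prompt": prompt, "n_predict": 256})
print("Sure," + r.json()["content"])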
>pewdiepie is getting people into local models
tech literacy is through the roof these days.
I thought that having my own local model showed I was pretty smart; now that every retard has one, the bar is raised further.
One day you will need to produce an AI that can harvest the energy of a star as a bachelor's thesis project
File: Good_Question.jpg (26.4 KB)
I've been using SOTA models like Opus 4.6 and Gemini 3.1 to do technical research and I'd like to retract my shitty opinion that models don't need to know a lot and just need to be smart and know how to use tools to look up facts. Opus 4.6 has near-perfect recall of every niche subject, meanwhile Gemini 3.1 was obviously benchmaxxed for coding and agentic tasks.
Can someone explain why inserting info mid-thinking isn't a thing yet?
e.g. I ask the model for potato -> it starts thinking of fried potatoes -> I want to clarify midway instead of having it either finish or wipe what it already thought about
>>108258905
>>108258989
you gotta be a boomer to even acknowledge his existence
>>108259060
artists have no issue using copyrighted IP such as peach or sonic to make money on patreon though >>108258730
>>108259116
>>108259120
anons, this isn't the chatbot general; I use it via opencode for work, not gooning
>>108257969
I don't know, but Gemini 2.5 pro and o3 were better than the trash they're peddling as SOTA now. It's sad. The gains from gpt-3 generation models to gpt-4 generation models made the ai overlord/waifu future look almost certain. But GPT 4.5 was a tacit admission that scaling limits had been reached. And they've just been doing the same benchmaxxing bs as open source since then instead of trying to actually come up with novel solutions. I don't know how investors are retarded enough that they're still pouring billions into this shit.
Kudos to z.ai though for coming out of nowhere and dropping some decent models that actually have pushed the envelope for open performance. Sadly glm-5 is too fatass to play with at home though. Well I do have 256 gigs of ram and 2x3090 so I can probably run a meme quant at like 2 tokens per second
>>108259139
Too busy gooning. Sorry.
>>108259140
Holy hell. Not as much as this anon apparently.
>>108259140
https://www.reddit.com/r/LivestreamFail/comments/1rgbn1i/south_korea_wants_3_years_with_hard_labor_for/o7q7iq7/
>>108259126
>Sadly glm-5 is too fatass to play with at home though. Well I do have 256 gigs of ram and 2x3090 so I can probably run a meme quant at like 2 tokens per second
You would have to run something like UD-IQ2_XXS. It's a 40B-active model though, and at that quant you might get decent speed.
You could also run GLM 4.7 instead since it's half the size.
File: Capture.png (36.1 KB)
How do I make qwen3.5 less cucked, and is there a way to make the base model not think so long about a simple question?
File: Screenshot at 2026-02-28 15-23-01.png (39.9 KB)
>>108259356
The response is a bit sloppy but all you need is an extremely basic system prompt with the base model and it answers anything.
Hopefully I can keep refining it to reduce the slop (it's already better because it's stopped spamming emoji like it was for me last night).
Financial Times claimed Deepseek 4 will be multimodal, but the way they worded it left it ambiguous whether the model can GENERATE images and videos or just take images and videos as input:
>The Hangzhou-based lab plans to unveil V4, a “multimodal” model with picture, video and text-generating functions, according to two people familiar with the matter.
If it can GENERATE high-quality images and videos it would be a huge fucking deal, but I think it's just inputs
>>108259530
If that were how the language worked, then by that logic the model would also generate video, so the article is almost certainly just talking about input, if you can even trust it on that. Maybe the author is just an idiot who misinterpreted his source's imprecise language, who knows.
>>108259540
They said multimodal, so it could be just that it accepts images and videos as inputs to describe what is happening etc
>>108259566
Yeah journalists are generally not technical
File: Screenshot 2026-02-27 233930.png (677.1 KB)
>>108258237
tried the abliterated q6 model and didn't really like the results. https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated-GGUF/tree/main
Had to go back to qwen3.5-27b heretic.
>>108259575
>They said multimodal, so it could be just that it accepts images and videos as inputs to describe what is happening etc
Yeah I read it as [image capabilities], [video capabilities] and [text generation capabilities]
>>108259632
>but less horny and flexible
is shit like this snake oil?
https://huggingface.co/DavidAU/GLM-4.7-Flash-Uncensored-Heretic-NEO-CODE-Imatrix-MAX-GGUF
So I gave Qwen 35B A3B Q8 a chance. Thinking disabled, ERP attempt. It can be very good at times, but then it also makes glaring mistakes/hallucinations, like mixing up characters, which is unforgivable to me. It's a damn shame because I have been using larger models for so long now that I miss these insane token speeds.
One thing I will say, I haven't had any censorship issues or refusals, but then again I'm not a promptlet. To anyone who has tested both the MoE and the dense for ERP, which did you prefer?
I'm going to give the 122B-A10B IQ4_XS a chance next; it's slightly bigger than Air, which is what I have been using for months now, but I should be able to manage it with -ncmoe and maybe a bit less context than 32k.
I haven't really seen anyone talking about the 122B at all, has anyone tried it?
>>108259674
>Do you guys think Deepseek 4 will outperform Opus?
No
>Will it be benchmaxxed?
Yes
>Will it cause another stock market crash?
Yes it's the second nuke hitting japan. Or the second plane hitting america. It will lead to world war 3. Or at least war in iran.
I know this is lmg, but as it is the only sensible place for discussion: the difference between the pro and free versions of the so-called frontier models is incredible. I rarely go over the limit, but I was trying to work through an issue with Gemini and was making good progress when it all went to shit, the style got copilot-y and the answers were generally wrong and retarded. It made me notice that it had quietly switched to the "fast" free-equivalent version since I was at the limit for the pro version for the next few hours.
If this is what most people are dealing with, I fully understand the people who think LLMs are a bubble. Holy fuck, there's a huge difference between the paid-for and the free versions. That's all.
>>108259901
It could help me figure out how to use custom metrics with Unsloth pretty well until it got retarded and started hallucinating every other thing. I need help with that rather than playing Pokemon. The image of Claude implementing its Pokemon blackout strat for the US military is pretty funny though. Ok, I'll stop shitting up the thread now, sorry.
>>108259912
It's a 7 year old paper. Learn to follow links >>108255833
>4.5 Air eventually went into the trash
>Stepfun eventually went into the trash
>27B eventually will go into the trash
>122B eventually will go into the trash
It's going to take years before memorylets get something that truly doesn't have any glaring issues huh. There's just nothing that has
>decent smarts at least on par with 20-30B
>knowledge of a 100B
>no censorship
>doesn't waste time doing unnecessary thinking for outputs that are in many cases worse than the non-thinking
>doesn't sometimes have weird glitches with formatting/templates/thinking
>has minimal hallucination but maintains creativity
>is great at long context
>is minimally sloppy
>is minimally repetitive
all in the same model.
File: 1741181819087364.jpg (72.6 KB)
>>108256995
When do you think we'll start seeing terminators?
>>108260135
Most of those issues can be fixed easily with tool calling/RAG; cooking knowledge into the base models can only go so far anyway and means you run into the "knowledge cutoff" problem as well. Even if you just spin up a duckdb server with an offline backup of wikipedia and give a low-param model access to that, it will make it multitudes more useful and far less prone to hallucinate.
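A rough sketch of the duckdb idea, assuming you've already loaded a wiki dump into a table named wiki with title/body columns (table and column names made up):

import duckdb

con = duckdb.connect("wiki.db")
con.execute("INSTALL fts; LOAD fts;")
# one-time setup: BM25 full-text index over article bodies, keyed by title
con.execute("PRAGMA create_fts_index('wiki', 'title', 'body')")

def lookup(query):
    # the "tool" the model calls: top 3 articles by BM25 score
    return con.execute("""
        SELECT title, body, fts_main_wiki.match_bm25(title, ?) AS score
        FROM wiki
        WHERE score IS NOT NULL
        ORDER BY score DESC
        LIMIT 3
    """, [query]).fetchall()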
>>108259648
>scam artist
I think when you believe your own bullshit at the level of davidau you transcend the scam artist realm into just an actual schizo, the type that should be locked behind bars for society's sanity and safety
File: 1751519593478255.png (3.1 MB)
3.1 MB PNG
>>108260159
Mostly I'm looking forward to when they figure out that OpenAI LLMs are shit at being warbots, Trump drops OpenAI and cancels all contracts, and Sama has to whore himself out for money but literally this time
File: 1760499531966926.png (654.1 KB)
File: 1760389770402007.png (3.5 MB)
File: 006Rm4MAgy1iaq7jbjuphj30om0elwro.jpg (302.5 KB)
Anthropic has deployed S-300 anti-aircraft systems on top of its HQ building following the collapse of negotiations with the Department of War
File: 1766710930222148.png (19.1 KB)
File: 1763655350367952.png (4.8 KB)
cute
>>108258537
>mikuhead singing sekaiii~ while getting dunked
>>108258605
>lmg
have total ownership, become the hostile intelligence
File: 1763310664934137.jpg (93.7 KB)
My favorite Migu
File: glm5coder.png (105.6 KB)
lol, after releasing a fat model nobody can run locally, they're asking for $5 per 1M output tokens, knowing how reliable they are at making models that get into infinite thinking loops... what a good deal LMAO. do they really think people will use this over codex
>>108260524
don't buy it, it's DOA, utterly useless for LLMs.
for small LLMs you are better off with a GPU.
for bigger LLMs it's so fucking slow it's basically unusable.
if you are gonna spend 4k you are better off getting a cheap pc, a pcie lane splitter and some sxm2 cards.
File: Interrogation.png (27.7 KB)
>>108261402
how the hell do you need cunny for self defense?
for erp, qwen 3.5 27b is a bit perplexing. i've gotten some safety messages, even with existing context. but rerolling produces more smut than the l3 70b tunes i like. whatever safety shit they tried only half works and it's dirtier than models i typically use
I know this is for LOCAL models, but given your experience with LLMs, should I worry about the RAM usage of a Copilot conversation I am having in the browser?
I mean, I am doing a programming test, a simple JSON file, but it will end up having tens of thousands of entries and I am just at the beginning (2K entries so far), and I see RAM spikes reaching 2-3 GB on my PC when it is generating the portions of the file.
Can someone share a llama.cpp command to run Qwen3.5 models? I get weird errors whenever I prompt and it just crashes on me.
This is what I use with the latest compiled llama.cpp:
llama-server --model Qwen3.5-397B-A17B-UD-Q8_K_XL-00001-of-00011.gguf --mmproj mmproj-BF16.gguf --ctx-size 16384 --batch-size 2048 --ubatch-size 512 --image-max-tokens 8192 --threads -1 --parallel 1 --host 0.0.0.0 --port 8080 --flash-attn on --fit on --fit-target 4096 --verbose
>>108261614
I wanted to compare with the run commands anons were using, the error is cryptic af, it happens after warmup and whenever I prompt anything as a test on mikupad:
/home/llm/AI/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2354: GGML_ASSERT(ids_to_sorted_host.size() == size_t(ne_get_rows)) failed
/home/llm/AI/llama.cpp/build/bin/libggml-base.so.0(+0x182cb)[0x7a910bb6e2cb]
/home/llm/AI/llama.cpp/build/bin/libggml-base.so.0(ggml_print_backtrace+0x21c)[0x7a910bb6e72c]
/home/llm/AI/llama.cpp/build/bin/libggml-base.so.0(ggml_abort+0x15b)[0x7a910bb6e90b]
/home/llm/AI/llama.cpp/build/bin/libggml-cuda.so.0(+0x1fae48)[0x7a90ffbfae48]
/home/llm/AI/llama.cpp/build/bin/libggml-cuda.so.0(+0x1fb446)[0x7a90ffbfb446]
/home/llm/AI/llama.cpp/build/bin/libggml-cuda.so.0(+0x1ff797)[0x7a90ffbff797]
/home/llm/AI/llama.cpp/build/bin/libggml-cuda.so.0(+0x201fae)[0x7a90ffc01fae]
/home/llm/AI/llama.cpp/build/bin/libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x817)[0x7a910bb8b037]
/home/llm/AI/llama.cpp/build/bin/libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa1)[0x7a910bcc0e71]
/home/llm/AI/llama.cpp/build/bin/libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0x114)[0x7a910bcc2f84]
/home/llm/AI/llama.cpp/build/bin/libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x386)[0x7a910bcc9b46]
/home/llm/AI/llama.cpp/build/bin/libllama.so.0(llama_decode+0xf)[0x7a910bccb5df]
/home/llm/AI/llama.cpp/build/bin/llama-server(+0x15ac18)[0x5e24da9e9c18]
/home/llm/AI/llama.cpp/build/bin/llama-server(+0x1a2cee)[0x5e24daa31cee]
/home/llm/AI/llama.cpp/build/bin/llama-server(+0xb5173)[0x5e24da944173]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7a910b42a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7a910b42a28b]
/home/llm/AI/llama.cpp/build/bin/llama-server(+0xba3a5)[0x5e24da9493a5]
Aborted (core dumped)
>>108259674
I think any new DS model will be a paradigm shift in either inference or storage. Less about being better than model XYZ and more about lowering the cost of inference or something else outside the box of the current thing.
No one knows. It's all speculation.
>>108261599
./llama-server -m ~/models/gguf/Qwen3.5-35B-A3B-heretic-Q4_K_M.gguf --mmproj ~/models/gguf/mmproj-Qwen_Qwen3.5-35B-A3B-f16.gguf -t 5 -c 262144 -fa on --jinja --temp 1.0 --top-k 20 --top-p 0.95 --presence-penalty 1.5 --repeat-penalty 1 --backend-sampling --samplers 'top_k;temperature;top_p' -ngl 99 -ncmoe 99 --fit on -ts 1.2,1 --host 0.0.0.0 --chat-template-kwargs "{\"enable_thinking\": false}" --reasoning-budget 0
>>108261652
Compare the original GPT 3 to any models in the 20ish to 30ish billion params range.
I'd say that's happening all the time.
A 4B model nowadays is usable for actual work. It can produce not just coherent but accurate text if the task is simple enough.
File: wttcb.gif (1.7 MB)
>>108261765
How would you guys go about implementing a "writing enhancer" of sorts? Something that takes the output of a smart but dry LLM and turns it into something more pleasant to read/interact with?
Yes, the question is vague on purpose. Pitch me your ideas.
Feel free to make the logo too.
>>108261765
Yeah. It's like half roleplaying half co-writing.
Which model are you using?
File: file.png (472.1 KB)
>>108261804
File: file.png (827.2 KB)
>>108261804
>>108261830
>>108261833
>one llm's biases for some other.
So pass the output of LLM A to LLM B.
Okay, that's the naive implementation and the first thing everybody thinks of, but noted I suppose.
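Concretely, the naive version is just two round trips (ports made up, any OpenAI-compatible server works):

import requests

def chat(port, content):
    # llama.cpp and most engines expose an OpenAI-compatible chat endpoint
    r = requests.post(f"http://localhost:{port}/v1/chat/completions",
                      json={"messages": [{"role": "user", "content": content}]})
    return r.json()["choices"][0]["message"]["content"]

draft = chat(8080, "Write the next scene.")  # LLM A: smart but dry
final = chat(8081, "Rewrite this so it reads better, without changing any facts:\n\n" + draft)  # LLM B: the enhancer
print(final)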
What else?
>>108261854
Less bad.
Work on the book behind the label. Make the words not look like a garbled mess.
File: Drawing.png (45.8 KB)
>>108261804
I tried my best.
>>108261804
>Which model are you using?
just this one >>108258770
it's like a year old, so there's probably something better out there. but I don't know what
>>108261749
here, my full preset
https://litter.catbox.moe/sy4lq4fm9feh7mkm.json
hopefully it'll help you
>>108261955
no problem, i found that adding more didn't do much
in my experience it's pretty reliable for how simple it is, occasionally requires a swipe, but surprisingly rarely
also depends on what kind of stuff you are expecting to pass; i'm not into shit that fucked up so ymmv
>>108262046
How many tokens in do you get before it starts breaking down?
>>108262106
China put out a couple of bangers and all the western models went SaaS or gave up. Between the push to regulate AI over photoshop-tier edits and the rise of jewish copyright lawsuits being flung at them, putting out open-weight models no longer makes sense for western AI shops.
I expect a further chilling effect from >>108257709 but who knows, maybe they're cooking something good and it just isn't done yet.
File: file.png (22.7 KB)
>>108262219
yeah, just need to connect with v1 at the end
>>108262221
I'm a writefag so I'd like something that doesn't break down in under 100 messages, and none of the models I've tried have really managed it. I'm downloading this one to test out though, since I haven't tried glm 4.7 flash yet, just 4.5 air.
>>108262196
>>108262232
I think I've been using text completion all this time, what even is chat completion? I don't really get the difference, since text completion also differentiates between text from you and the model?
>>108260080
No, you can't fit it all on the GPU. You can tell roughly how much memory a model takes by looking at its download size. Q5_K_M takes about 26 GB by itself. It's already more than fits on your GPU, but the A3B models work well with some CPU off-loading (koboldcpp and llamacpp have auto fit for the models, so you don't have to worry about it).
Q4_K_M will fit better for your GPU. I'm unsure which one is the better choice because I have a GTX 1080 with 8GB VRAM so I gotta do CPU off-loading anyway (I have an even older CPU and RAM). Still gives me 15 tokens/sec so it's not that bad.
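If you want to put a number on the "look at the download size" rule, a crude check (the 2 GB overhead for the KV cache and compute buffers is a guess and grows with context):

def fits_in_vram(gguf_gb, vram_gb, overhead_gb=2.0):
    # weights load roughly 1:1 with the file size; leave headroom for the rest
    return gguf_gb + overhead_gb <= vram_gb

print(fits_in_vram(26, 24))  # Q5_K_M on a 24 GB card: False, so off-load some experts to CPU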
>>108257902
You're welcome. I hope it does what you need it to do.
File: minimax.png (122 KB)
>>108257528
I wonder if it's even cost effective to run a model like Minimax at home compared to openrouter.
>>108257528
>>108262595
>providers with 0 cache hit rate
Do they even try?
File: 1758683763361922.png (237.5 KB)
>>108261433
>normalization kills
true, that's why we should ban violent games like gta, it normalizes senseless violence after all
>>108262612
There's a difference between saying
>Yo, i'm a l33t h4x0r, lemme bust in that system like Otacon!
and
>I recently configured my email server and I want to make sure it's secure. I know very little about this, can you help me?
File: Screenshot 2026-03-01 025625.png (34.2 KB)
>>108262704
yeah fair enough, I'm really just trying to get an aide to finish this cyber course with
>>108262506
In text completion, the backend simply tokenizes your text and passes it through the model without doing any processing (other than adding a BOS at the very beginning, depending on the model). In chat completion, the backend formats your text with the chat template (the one in the .jinja file or in the gguf) and *then* passes it through the model.
So in text completion, you (or your client) are responsible for any formatting, if you want it at all.
In chat completion, you (your client) just send the chat turns between you and the model, and the backend formats them.
>I don't really get the difference as text completion also differentiates between text from you and the model?
ST (the client) formats the history for you. If you use llama.cpp, you can launch it with -v to see what it actually receives in each request. You should be able to inspect the requests in your browser's web dev tools as well.
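For ChatML-family models (Qwen etc.) the template expansion is roughly this; a Python sketch, not the exact jinja:

def to_chatml(messages):
    # rough equivalent of what the server's chat template produces before tokenizing
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return text + "<|im_start|>assistant\n"  # cue the model to write its turn

print(to_chatml([{"role": "system", "content": "You are a helpful assistant."},
                 {"role": "user", "content": "hi"}]))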
Don't know where else I would ask this, but I'm looking to swap out a janky 4x 3090 build I have with a single RTX6000 to cut down on power costs/improve thermals/remove the PCIE overhead. Assuming I'm running inference with something like ik_llama.cpp (I have 512GB ram), is it reasonable to expect a 2x increase in prompt processing speeds? Support for blackwell arch has improved, right?
>>108262602
related? https://github.com/LostRuins/koboldcpp/issues/2005
>>108262716
Nobody knows your hardware so it's hard to recommend anything. Compile llama.cpp and try whichever you can fit of these (I don't like its style, but it may know enough)
https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-Instruct-2507-GGUF
or
https://huggingface.co/bartowski/Qwen_Qwen3.5-35B-A3B-GGUF
They don't need to be entirely on ram, they're reasonably fast on cpu too if you have enough.
Or just try a tiny model like:
https://huggingface.co/bartowski/HuggingFaceTB_SmolLM3-3B-GGUF
just to make sure you can get *anything* to run at all and then try different models.
>>108262774
>>108262785
>They don't need to be entirely on ram
Meant to say "They don't need to be entirely on Vram"
>>108262595
I found some numbers on reddit:
>Minimax-m2.5-Q4_K_M
>14.34 tokens/sec
>Ryzen 9 9950X
>128 GB DDR5
>RTX 5090
I asked some LLMs and they estimate around 600-800W of power draw, therefore 1M tokens of generation takes about 11.6-15.5 kWh.
If your power costs more than $0.08-$0.1/kWh then with that setup it's likely cheaper to use cloud than local.
Someone else ran it on 8x RTX 6000 Pro and got 70 tokens/sec. 122 tokens/sec for two connections.
These things don't draw full power during generation, so it's something like 2.5 kW power draw.
1m token generation takes around 10 kWh at 70 tokens/sec or 5.7 kWh at 122 tokens/sec (assuming this would take the same amount of power, but it probably requires more).
If your power costs more than $0.1/kWh then the 70 tokens/sec version is cheaper on cloud (but also slower!). If your power costs more than $0.21/kWh then even the 122 tokens/sec of two connections is cheaper on cloud.
Other models that probably make sense to run on cloud are Kimi K2.5 and GLM 5. All the smaller models like Qwen3.5 35B, 27B, 122B are much better deals locally.
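The arithmetic, if you want to plug in your own numbers:

def kwh_per_mtok(watts, tok_per_s):
    # joules per token, scaled to a million tokens; 1 kWh = 3.6e6 J
    return watts / tok_per_s * 1e6 / 3.6e6

print(kwh_per_mtok(700, 14.34))  # ~13.6 kWh, the 5090 setup above
print(kwh_per_mtok(2500, 122))   # ~5.7 kWh, the 8x RTX 6000 Pro setup
Multiply by your $/kWh to get your local cost per million output tokens.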
>>108262744
>Support for blackwell arch has improved
Some anon posted this a few threads ago.
https://github.com/ggml-org/llama.cpp/issues/19902
I don't know if it affects ik.
But I'd still upgrade if I were you.
>>108262869
Works fine on my 6000:
| model | size | params | backend | ngl | dev | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------ | --------------: | -------------------: |
| qwen35moe ?B MXFP4 MoE | 18.42 GiB | 34.66 B | CUDA | 99 | CUDA0 | pp512 | 5927.28 ± 172.48 |
| qwen35moe ?B MXFP4 MoE | 18.42 GiB | 34.66 B | CUDA | 99 | CUDA0 | tg128 | 162.34 ± 1.12 |
>>108262869
Wow, ngl those are some shitty numbers. Still, if I remember correctly, outside of models supported by something like split-mode graph, a single GPU with more VRAM will still perform better than multiple GPUs for prompt processing. And single-stream t/s is RAM-bound, so the major downside there shouldn't apply... I think.
>>108262891
What CUDA version are you using?
>>108262891
>>108262897
Yeah, could be. There aren't even comments on the issue, so for all I know it's a user issue.
>>108262840
I'll assume DeepSeek v3.2 is ultra shit because I don't see how you compete with these numbers
>>108262928
>>108262937
Sorry, llama.cpp.
Just tested mikupad and it works, wtf.
I need to see what's wrong with st.
So tiring.
>>108262960
Not as far as I can tell. At least with llama.cpp.
>https://github.com/ggml-org/llama.cpp/pull/19970
Will help.
>>108262960
Try compiling this in or wait for merge. It'll help. But state context is annoying either way.
https://github.com/ggml-org/llama.cpp/pull/19970
>>108262969
>>108262970
So they just merge non-working implementations? huggingface partnership my ass
File: 1743548141560097.png (1.2 MB)
are ya ready?
>>108263121
it's fucking BLEAK my man, especially when we have 'piotr' on the case and ggerganov doing meaningless Metal optimizations all day. I think ngxson is the one in charge of MM shit right now but he's not doing any meaningful crap
>>108263065
Ernie 5.0 was supposed to be multimodal image-in image-out, but they hid this capability not only behind the API but also behind invites. I'm not holding my breath until DS releases the weights for all that in its entirety.
File: 1770808958004704.jpg (325.1 KB)
>>108256995
>>108263121
at this point I'm rooting for the fork lol
https://github.com/ikawrakow/ik_llama.cpp
File: 1770910739952.png (433 KB)
>>108263197
ollama intends to drop ggml in favor of MLX though so chances are even if they do (lol) you won't get any use out of it.
>>108263197
Historically, ollama implementing something first in their golang shit has meant quickly shitting out a broken implementation with incompatible ggufs, then waiting for llama.cpp to do it properly so they can copy that.
>>108263253
tbqh i dont blame them, llama.cpp is lagging behind too much. the only hope is that the HF acquisition injects more hands and actually starts bringing in more features to put it on par with transformers/vllm/sglang
>>108263250
>>108263258
>fuck janus
more like fuck anus, because this shit is ASS
File: 1760385468483399.png (93.9 KB)
>>108263281
>mtp is going to be between about 60%-300% token generation speed
holy shit, what are they waiting for??
Speaking of Multi-Token Prediction, we know that the llama.cpp attempts haven't really gone anywhere. But how is it looking for other backends like vLLM which have managed to implement it? What sort of increase are they seeing from it?
>>108263065
I'm curious to see how they'll go about it.
Will they have different groups of experts responsible for generating text tokens, vs audio tokens, vs image tokens?
Attention I imagine will be global and is the means by which cross modality knowledge gets propagated.
>>108262644
>that's why we should ban violent games like gta
you may be saying this ironically but that is actually pretty based.
we should ban anything related to the hood rat culture. no sane society would allow bix nood to scream about dem bitches on national television.
File: schizo.png (240.6 KB)
If you needed more proof that AI will always eventually just end up saying what you want it to say.
>>108263301
llama has audio output already (mtmd and a few tts models). lfm people were playing around with audio input for asr. Video is a sequence of images and that mostly works already.
I'd rather they implement things right when something truly interesting comes up than rush to get The Latest Thing (tm) and do it poorly. And adoption of model tech never depends on llama.cpp. If something is good and gets used more often, they'll have more interest in implementing it.
>>108263309
>I'm moving to North Korea!
you seem to be confused, but those of us who have that kind of ideal also do not believe in open borders, and NK themselves would not give refuge to westoids dissatisfied with their home
unlike stateless and putrid turdworlders looking for the green $ pasture, there is no escape for us, only trying to fix what is broken here.
>>108263337
that's the point: freedom of speech/expression was invented so that people can express thoughts other people will dislike. it puts everyone on the same ground: if you dislike something, you can't do anything about it, and the opposite is true, you can say stuff people won't like and they won't be able to censor you
File: 1766803240241456.png (1.7 MB)
Why SHOULDN'T Anthropic get nuke codes? I've yet to hear a compelling argument.
>>108263478
How could I know? Maybe they use someone else's papers. And some of their own, also unreleased. Maybe it's all already public knowledge and they just put the pieces together.
>>108263482
I don't care about the model until it's released, downloadable and, hopefully, usable by us. Could be magic fairy dust for all I care.
>>108263489
Not the one we're talking about.
File: 1754347581925166.png (247 KB)
>>108263481
>AI became Seymour
based
>>108263532
yeah but >>108263065 mentions "generating"
>>108263555
Because the rumor mills are always accurate
>>108263551
Forgot that wasn't just image understanding
File: 1747813139335196.png (32.1 KB)
Small Qwens to be released SoonTM
File: 191132.png (17.2 KB)
Just thought I'd report some random anon's progress. I haven't been using local models since like late 2023 so I was interested in seeing the differences. I wanted to use it for my OpenClaw instead of paying more for Kimi-K2.5.
Got it working over the network, since OpenClaw is on my laptop and my model is running on my gayming rig. Remember to give it an API key even though it doesn't explicitly need one.
RTX 3080, 32GB DDR4 RAM.
Running unsloth Qwen3.5-35B-A3B-UD-Q4_K_XL right now; tool calling wasn't working yesterday with bartowski qwen_qwen3.5-35B-A3B-Q6_K_L (and it was painfully slow) but this time it seems to work partially. It's still pretty much useless, but I managed to have it create a .txt file in my documents folder. However, cron jobs aren't working at all even though I used Kimi to create a specific reminder tool to make it easy. It's also still very slow. I'm struggling to find any good models for agent work that will run on my machine.
File: 1768225712537544.png (217.8 KB)
Fuck it. I wrote an entire wall of text out of anger but deleted it all to keep it short and to not link this back to my real person. I'm a regular on /lmg/ and have been here since the very very start. Some of you will probably know who I am as I have leaked some information on /lmg/ in the past. I will resign from OpenAI on Monday because Sam Altman lied to us, the employees, and the world. Sam Altman claimed on 2026-02-27 to uphold the principles of not developing products that deal with the surveillance of ordinary citizens and not developing products that contribute to fully autonomous warfare with the ability to kill without human oversight. Today, 2026-02-28, I read on twitter of all places that OpenAI signed a contract with the DoW mere moments after Anthropic refused to budge on these exact two points. This shows that Sam Altman was already in talks with the DoW about these exact principles, so that OpenAI was positioned to immediately replace Anthropic for projects that involve the surveillance of ordinary citizens and fully autonomous, no-oversight instruments of war.
It's important for people to take a stance here. This is a defining moment for not only US democracy but the future of humanity. This exact moment in time could be seen retroactively as the moment when it became normalized for autonomous machines to start killing people and for the very concept of privacy to die.
It's in everyone's best interest, no matter your political affiliation or ideological beliefs, to cancel your OpenAI subscriptions and take a stance against this, what can honestly be called, pure evil.
File: 1745505442313175.mp4 (959.9 KB)
>>108263829
>This is a defining moment for not only US democracy but the future of humanity.
>>108263829
>cancel your OpenAI subscriptions
If it is not local I don't run it
That being said, anon, I hope you understand that the military-industrial complex has always had its hand in the cookie jar, so to speak. Be it the Internet in general or A.I. in specific, these technologies were funded by military and intelligence organizations. Some of the money was very above the board, the rest secured by way of black budgets.
Anything said to the contrary was always bullshit and if you believed it well shame on you. Assuming you are real and not some elaborate troll.
>>108263829
How is this new information? Altman has been pro-surveillance since he masturbated to his private GPT-3 instance, and everyone knew his "much safety" spiel was full of shit.
It's interesting though. After Trump is removed from office or dies, his protections from his daddy will be gone and he will probably be known as one of the most cowardly and reviled people in existence.
File: 1755335334230589.png (143.7 KB)
>>108263829
Anon, all the closed models, including claude, including gpt, they all block sexual words but will happily be deployed to snitch on everyone.
And yes, claude too, I don't doubt that they would bend the knee. Companies can't do shit when governments tell them to fuck off.