/g/ - Thread 108663449

/g/

Thread #108663449

Home Index Catalog All Threads New Thread Reply

Anonymous
/lmg/ - Local Models General 04/22/26(Wed)19:35:06 No.108663449

/lmg/ - Local Models General Anonymous 04/22/26(Wed)19:35:06 No.108663449 [Reply]▶

File: __hatsune_miku_and_akita_neru_vocaloid_drawn_by_nj7__ae2a9b2fe217735ea284afbe9500660c.jpg (1.6 MB)

1.6 MB JPG

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108659983 & >>108655009

►News
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

352 RepliesView Thread

Showing all 352 replies.

Anonymous
04/22/26(Wed)19:35:31 No.108663453

Anonymous 04/22/26(Wed)19:35:31 No.108663453▶

File: 131557813_p0_master1200.jpg (210.3 KB)

210.3 KB JPG

►Recent Highlights from the Previous Thread: >>108659983

--Comparing GGUF quantizers and discussing imatrix calibration for Qwen3.6-27B:
>108662039 >108662052 >108662065 >108662230 >108662252 >108662353 >108662475 >108662053 >108662063 >108662080 >108662162 >108662361 >108662068 >108662062 >108662167 >108662176 >108662190 >108662257 >108662321 >108662780
--Qwen3.6-27B benchmarks and GGUF quants:
>108660998 >108661023 >108661071 >108661108 >108661125 >108662813 >108662846 >108661101 >108661164
--Gemma 4's 124B MoE and memory bandwidth benchmarks:
>108662533 >108662543 >108662549 >108662589 >108662594 >108662614
--Models for a 3090 and explaining MoE vs Dense offloading:
>108659996 >108660054 >108660247 >108660260 >108660268 >108660279 >108660312 >108660317 >108660347 >108660223 >108662148
--Koboldcpp launch flags and speculative decoding for Gemma 4:
>108660701 >108660741 >108660743 >108660848 >108660934 >108660990
--Alleged unauthorized access to Anthropic's Mythos:
>108660075 >108660630 >108660724 >108661694
--Anons discussing reported Gemma 4 performance on RK3588 SBCs:
>108662346 >108662393 >108662431 >108662528
--LLM reliability, internet content degradation, and local knowledge bases:
>108661238 >108661314 >108661335 >108661358 >108661276 >108661375 >108661405 >108661533 >108661585 >108661462 >108661311
--llama.cpp ngram-mod flags to optimize coding performance:
>108660554 >108662471 >108661013
--Text Completions prefills to stop GLM's repetitive thinking loops:
>108661606 >108661631
--OpenAI's open-source privacy-filter model:
>108662489 >108662773
--Little Coder agent optimized for small LLMs:
>108660765 >108661020
--TurboQuant-H reducing VRAM via 2-bit embedding quantization:
>108660542
--Logs:
>108660349 >108661795 >108662260
--Rin, Miku, Teto (free space):
>108660565 >108660789 >108661238 >108661795 >108661801 >108662084

►Recent Highlight Posts from the Previous Thread: >>108659986

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
04/22/26(Wed)19:40:56 No.108663492

Anonymous 04/22/26(Wed)19:40:56 No.108663492▶

>>108661743
>>108661866
>text completion has no vision
kek wtf, I use text completion and can do shit like write "Appearance: <__media__>" in the character card and feed it images in the request body placed wherever I want in context. If you need your hand held by an abstraction like chat completion just admit it. You can do whatever the fuck you want if you know what you're doing.

Anonymous
04/22/26(Wed)19:49:34 No.108663544

Anonymous 04/22/26(Wed)19:49:34 No.108663544▶

File: 1758392265995431.jpg (98.3 KB)

98.3 KB JPG

>>108663492
Okay but why?

Anonymous
04/22/26(Wed)19:49:46 No.108663546

Anonymous 04/22/26(Wed)19:49:46 No.108663546▶

>>108663449
Picking out junk food at the store with Yellow Miku

Anonymous
04/22/26(Wed)19:52:24 No.108663563

Anonymous 04/22/26(Wed)19:52:24 No.108663563▶

>>108663544
If you don't have an innate urge to be in control of every single token present in context why are you here?

Anonymous
04/22/26(Wed)19:52:26 No.108663564

Anonymous 04/22/26(Wed)19:52:26 No.108663564▶

>>108663443
>https://huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive
>WTF HE ALREADY DID IT
still no gemma4-31b-it-HAUHAUCS

Anonymous
04/22/26(Wed)19:57:30 No.108663599

Anonymous 04/22/26(Wed)19:57:30 No.108663599▶

>>108663564
geg......................

Anonymous
04/22/26(Wed)20:00:28 No.108663618

Anonymous 04/22/26(Wed)20:00:28 No.108663618▶

>>108663564
no kdl? ACK

Anonymous
04/22/26(Wed)20:02:23 No.108663630

Anonymous 04/22/26(Wed)20:02:23 No.108663630▶

File: SmartSelect_20260422-233119_DeepSeek.jpg (258.1 KB)

258.1 KB JPG

KEKEKKEKEJEEKEK WAITING FOR V4!? MEANWHILE I JUST HAD 64k long CUNNY SEX WITH THAT DEEPSEEK V4 ON IT'S OWN WEB CHAT LOL… And not just sex, but CUNNY sexxxxxxx (ON THAT DAMN FILTERED WEB) BUUWHAHAHHAHAGHHAHHA I'VE BECOME A GOD NOW... YOU ANONS MUST KNEEL BEFORE ME

Anonymous
04/22/26(Wed)20:02:48 No.108663633

Anonymous 04/22/26(Wed)20:02:48 No.108663633▶

>get excited about structured output in llama.cpp
>waste 2 hours trying to get it to work
>turns out it's broken and completely ignores whatever schema you pass it
damn

Anonymous
04/22/26(Wed)20:03:52 No.108663644

Anonymous 04/22/26(Wed)20:03:52 No.108663644▶

File: 1758062318463220.jpg (51.3 KB)

51.3 KB JPG

>>108663630
>15yo
>cunny
Burger-kun...

Anonymous
04/22/26(Wed)20:03:59 No.108663646

Anonymous 04/22/26(Wed)20:03:59 No.108663646▶

>>108663630
>15
You mean hag sex

Anonymous
04/22/26(Wed)20:04:06 No.108663647

Anonymous 04/22/26(Wed)20:04:06 No.108663647▶

I'm not even sorry for cheating on gemma-chan...
oh the cunny loli sexo~

Anonymous
04/22/26(Wed)20:04:22 No.108663649

Anonymous 04/22/26(Wed)20:04:22 No.108663649▶

>>108663630
a-anon... that's not cunny
that's prime breeding age

Anonymous
04/22/26(Wed)20:04:35 No.108663651

Anonymous 04/22/26(Wed)20:04:35 No.108663651▶

>>108663630
???

Anonymous
04/22/26(Wed)20:04:51 No.108663654

Anonymous 04/22/26(Wed)20:04:51 No.108663654▶

>>108663633
What?
It was working until last week on my python app using the OpenAi lib.

Anonymous
04/22/26(Wed)20:04:57 No.108663655

Anonymous 04/22/26(Wed)20:04:57 No.108663655▶

>>108663630
>americans

Anonymous
04/22/26(Wed)20:07:10 No.108663665

Anonymous 04/22/26(Wed)20:07:10 No.108663665▶

>>108663630
>15
rookie numbers.

Anonymous
04/22/26(Wed)20:08:05 No.108663673

Anonymous 04/22/26(Wed)20:08:05 No.108663673▶

>>108663654
i think it's this issue? https://github.com/ggml-org/llama.cpp/pull/21537
gemma 4 chat template does not specify response_format, maybe that's what it is

Anonymous
04/22/26(Wed)20:08:51 No.108663680

Anonymous 04/22/26(Wed)20:08:51 No.108663680▶

>>108663655
It has to be the tap water. There is no other explanation to this phenomenon.

Anonymous
04/22/26(Wed)20:10:25 No.108663689

Anonymous 04/22/26(Wed)20:10:25 No.108663689▶

>>108663633
Structured output just works with vllm btw

Anonymous
04/22/26(Wed)20:14:13 No.108663710

Anonymous 04/22/26(Wed)20:14:13 No.108663710▶

>>108663630
>15
If she's had her first period, she's not a trve loli, which is physically undeveloped. She's a female that Nature has ordained to be impregnated as soon as possible.

Anonymous
04/22/26(Wed)20:14:32 No.108663713

Anonymous 04/22/26(Wed)20:14:32 No.108663713▶

Qwen 3.6 27b is already uncensored without finetuning btw
I dropped the q8_0 from ggml-org into a sysprompt I was using with gemma 4 heretic and it just werked, no refusals or moralizing in reasoning. It's resistant to using nsfw language unprompted though.

Anonymous
04/22/26(Wed)20:15:39 No.108663721

Anonymous 04/22/26(Wed)20:15:39 No.108663721▶

>>108663633
Shit has always been broken since day one, vllm handles function schema fine, but llama.cpp forces alphabetical ordering for some reason. This is really bad if the an function argument depends on the previous one.

Anonymous
04/22/26(Wed)20:19:46 No.108663741

Anonymous 04/22/26(Wed)20:19:46 No.108663741▶

File: 1746199182845250.png (49.9 KB)

49.9 KB PNG

>108663630
>108663644
>108663646
>108663647
>108663649
>108663651
>108663655
>108663665
>108663680
>108663710
>this much pedophilia already, this early in the thread
Are we being raided by discord trannies or something?

Anonymous
04/22/26(Wed)20:21:26 No.108663756

Anonymous 04/22/26(Wed)20:21:26 No.108663756▶

>>108663741
>afraid to quote
Seems like reddit is already here

Anonymous
04/22/26(Wed)20:22:06 No.108663761

Anonymous 04/22/26(Wed)20:22:06 No.108663761▶

>>108663741
um actually pedophilia is oldfag 4chan culture, newfag. We wuz oldfags or sumthing

Anonymous
04/22/26(Wed)20:22:33 No.108663767

Anonymous 04/22/26(Wed)20:22:33 No.108663767▶

File: ACK.gif (1.7 MB)

1.7 MB GIF

>>108663630
Dipsy release when? I know you labniggers are lurking here, hurry the fuck up.
>>108663741
Always have been.

Anonymous
04/22/26(Wed)20:23:01 No.108663772

Anonymous 04/22/26(Wed)20:23:01 No.108663772▶

File: 1747059796790100.png (371 KB)

371 KB PNG

>>108663741

Anonymous
04/22/26(Wed)20:23:33 No.108663775

Anonymous 04/22/26(Wed)20:23:33 No.108663775▶

File: 1764765168047.jpg (28.1 KB)

28.1 KB JPG

>>108663756
aint no way

Anonymous
04/22/26(Wed)20:23:45 No.108663776

Anonymous 04/22/26(Wed)20:23:45 No.108663776▶

>>108663680
Fluoride has been shown to decrease IQ and there is still a signifigent amount of lead pipes around so that is also a factor.
I think the biggest factor though is the no child left behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result and the dumbest kid will get dumber every single year. That and if a student isn't actually smart enough to advance a grade they will still push them through regardless due to financial incentives. So the bar gets lowered so far that no one can actually fail.
That has also been a uptick in taking pride in being a fucking retard in the last decade or two. So you have health, the education system itself and societal praise in being a retard taking off all contributing to making everyone stupid.
Eventually we will either shape up or be out competed by stronger and smarter societies but all I know is we were handed the world on a golden platter and that that if we fail and collapse we have no one to blame but ourselves and the previous generations who set us up for failure.
Thanks for coming to my ted talk

Anonymous
04/22/26(Wed)20:28:00 No.108663806

Anonymous 04/22/26(Wed)20:28:00 No.108663806▶

>>108663776
>I think the biggest factor though is the no child left behind policy in education. When you teach for the dumbest kid in the class then everyone else is going to be dumber as a result and the dumbest kid will get dumber every single year.
Same applies to these threads by the way. Being surrounded by low IQ pedophiles mentally retards your brain.

Anonymous
04/22/26(Wed)20:28:08 No.108663809

Anonymous 04/22/26(Wed)20:28:08 No.108663809▶

>>108663630
>shivers and not x but y in the same phrase
>pedoshit
shit's crazy, what kind of turboslopped model is this?

Anonymous
04/22/26(Wed)20:28:42 No.108663810

Anonymous 04/22/26(Wed)20:28:42 No.108663810▶

>>108663689
>no pascal support
>very limited cpu support
>pythonshit, meaning it will pull a dozen of GiBs of dependencies
llama.cpp might be buggy, but sometimes i really appreciate how it runs on fucking everything, on top of being self contained and not being dependent on cancer that is AI ecosystem in python

Anonymous
04/22/26(Wed)20:29:07 No.108663815

Anonymous 04/22/26(Wed)20:29:07 No.108663815▶

>>108663809
That's chink model for you!

Anonymous
04/22/26(Wed)20:29:15 No.108663817

Anonymous 04/22/26(Wed)20:29:15 No.108663817▶

>>108663776
It was a joke but I think these are international issues for every 'western' nation.

Anonymous
04/22/26(Wed)20:29:39 No.108663820

Anonymous 04/22/26(Wed)20:29:39 No.108663820▶

>>108663810
Only if your time is worthless

Anonymous
04/22/26(Wed)20:30:47 No.108663828

Anonymous 04/22/26(Wed)20:30:47 No.108663828▶

>>108663806
Pedo is attraction to 13 and under, burger. Words have meaning.

Anonymous
04/22/26(Wed)20:33:05 No.108663841

Anonymous 04/22/26(Wed)20:33:05 No.108663841▶

>>108663828
Then why are you dumb faggots dogging on that anon who thought "cunny" applied to 15 year olds? You're not hebephiles, you're pedophiles. That's why you post pictures of "loli" anime girls with no tits, hips, or ass and infantile behavior. Fucking freak. Don't reply to me again.

Anonymous
04/22/26(Wed)20:35:35 No.108663851

Anonymous 04/22/26(Wed)20:35:35 No.108663851▶

>>108663841
>low comprehension too
Let me break it to you, anons are making fun of another anon saying that a virtual '15yo' was 'cunny' (pedo slang) which isn't. It's not that hard to understand.

Anonymous
04/22/26(Wed)20:36:27 No.108663856

Anonymous 04/22/26(Wed)20:36:27 No.108663856▶

>>108663810
>a dozen of GiBs of dependencies
18GB is my venv for stable diffusion

Anonymous
04/22/26(Wed)20:36:58 No.108663859

Anonymous 04/22/26(Wed)20:36:58 No.108663859▶

File: just like old times.jpg (152.8 KB)

152.8 KB JPG

Anonymous
04/22/26(Wed)20:39:46 No.108663872

Anonymous 04/22/26(Wed)20:39:46 No.108663872▶

>>108663859
:(
:)

Anonymous
04/22/26(Wed)20:40:06 No.108663873

Anonymous 04/22/26(Wed)20:40:06 No.108663873▶

>>108663859
What did she mean by this?

Anonymous
04/22/26(Wed)20:40:08 No.108663874

Anonymous 04/22/26(Wed)20:40:08 No.108663874▶

File: apu.jpg (39.1 KB)

39.1 KB JPG

>>108663820
sorry Jensen... but i'm not gonna buy a Blackwell GPU. So yeah... i'll keep on using my trusty Pascal.
Haha, sorry, but i'm just not gonna do it!

Anonymous
04/22/26(Wed)20:40:32 No.108663878

Anonymous 04/22/26(Wed)20:40:32 No.108663878▶

>>108663859
I like these Bakas

Anonymous
04/22/26(Wed)20:42:38 No.108663890

Anonymous 04/22/26(Wed)20:42:38 No.108663890▶

Is necrophilia okay if it's just about fictional people? What about cannibalism and bestiality? It's all okay because it's just fictional stories that you masturbate to, right?

Would you send your child to a public school where all of the teachers openly admitted to doing this? It's just fictional bro.

Anonymous
04/22/26(Wed)20:43:10 No.108663894

Anonymous 04/22/26(Wed)20:43:10 No.108663894▶

Is unsloth actually better then bart's quant? Tried both but never found any noticable difference between them. But unsloth claims that they're significantly better than others. Seriously which one do I choose between these two?
https://huggingface.co/bartowski/google_gemma-4-26B-A4B-it-GGUF/blob/main/google_gemma-4-26B-A4B-it-Q8_0.gguf
https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/blob/main/gemma-4-26B-A4B-it-Q8_0.gguf

Anonymous
04/22/26(Wed)20:43:40 No.108663897

Anonymous 04/22/26(Wed)20:43:40 No.108663897▶

iwan is normally a nigger but this actually makes it so reasoning budgets and turning off reasoning works now, so i guess he's slightly less of a nigger.
https://github.com/ikawrakow/ik_llama.cpp/commit/e0596bf6146a737f5e8fa8035215f5dfae59742d

Anonymous
04/22/26(Wed)20:45:08 No.108663904

Anonymous 04/22/26(Wed)20:45:08 No.108663904▶

>>108663894
for q8 doesnt make any difference

Anonymous
04/22/26(Wed)20:45:34 No.108663906

Anonymous 04/22/26(Wed)20:45:34 No.108663906▶

File: 1761641793555591.gif (174.8 KB)

174.8 KB GIF

>>108663890
What is okay is being able to separate reality from fiction, which is what you should work on. Thought crimes are not a thing.

Anonymous
04/22/26(Wed)20:45:44 No.108663908

Anonymous 04/22/26(Wed)20:45:44 No.108663908▶

>>108663630
>15
>cunny
also that’s not anything I haven’t seen from gemma or glm

Anonymous
04/22/26(Wed)20:46:03 No.108663910

Anonymous 04/22/26(Wed)20:46:03 No.108663910▶

>>108663904
What about Q4_K_M?

Anonymous
04/22/26(Wed)20:46:26 No.108663914

Anonymous 04/22/26(Wed)20:46:26 No.108663914▶

>>108663453
>--OpenAI's open-source privacy-filter model:
what is this exactly for?
how would that would be integrated https://huggingface.co/openai/privacy-filter

Anonymous
04/22/26(Wed)20:46:35 No.108663915

Anonymous 04/22/26(Wed)20:46:35 No.108663915▶

>>108663906
no but they sure as hell want to make it so you can be prosecuted for your thoughts

Anonymous
04/22/26(Wed)20:47:55 No.108663917

Anonymous 04/22/26(Wed)20:47:55 No.108663917▶

>>108663910
again and again unslop show their quants having better kdl so I would go with that, not much to stress over, if you really really want it you can download both, the original model and run the KDL yourself but it will be a waste of time

Anonymous
04/22/26(Wed)20:48:12 No.108663920

Anonymous 04/22/26(Wed)20:48:12 No.108663920▶

>there are still people falling for unslot's shilling
geg

Anonymous
04/22/26(Wed)20:48:58 No.108663923

Anonymous 04/22/26(Wed)20:48:58 No.108663923▶

>>108663917
kld* its KL divergence, anyway, you get the point

Anonymous
04/22/26(Wed)20:49:02 No.108663924

Anonymous 04/22/26(Wed)20:49:02 No.108663924▶

File: 1752898579006505.png (237.5 KB)

237.5 KB PNG

>>108663906
>What is okay is being able to separate reality from fiction
those who cannot do that probably think that everyone that plays GTA is a potential serial killer kek

Anonymous
04/22/26(Wed)20:51:45 No.108663935

Anonymous 04/22/26(Wed)20:51:45 No.108663935▶

File: file.png (635.3 KB)

635.3 KB PNG

neners

Anonymous
04/22/26(Wed)20:52:07 No.108663938

Anonymous 04/22/26(Wed)20:52:07 No.108663938▶

>>108663920
yeah the only reason I was asking it was because of my shitty experience with their quants they were broken as fuck and switching to bartowskis quants fixed everything for me and been happy ever since then. though that graph on the previous thread got me wondering if they've actually gotten better

Anonymous
04/22/26(Wed)20:55:00 No.108663950

Anonymous 04/22/26(Wed)20:55:00 No.108663950▶

>>108663924
potential is a pretty strong word, can mean anything and nothing

Anonymous
04/22/26(Wed)20:55:45 No.108663955

Anonymous 04/22/26(Wed)20:55:45 No.108663955▶

File: 1774029297136779.jpg (40.3 KB)

40.3 KB JPG

>Mfw Got a 5090 last week and while amazing, I already think I want another one, as 32GB is barely enough with my 64GB of RAM.

I swear it's so damn easy to max out this card when you start moving past Q5 and +25GB sizes.
It's a pity these cards didn't come out as 48GB, because that seems like a sweet spot to run everything with at least okay context.
I wonder if I should just buy some used 5070 Ti or 5080 as a companion to this beefy motherfucker to reach that 48GB level without breaking the bank.
This shit is way too addicting.

Anonymous
04/22/26(Wed)20:57:11 No.108663962

Anonymous 04/22/26(Wed)20:57:11 No.108663962▶

>>108663955
>without breaking the bank.
that ship has sailed

Anonymous
04/22/26(Wed)20:57:21 No.108663964

Anonymous 04/22/26(Wed)20:57:21 No.108663964▶

>>108663955
just buy an aftermarket modded 48gb 4090 from your chinese friends

Anonymous
04/22/26(Wed)21:01:15 No.108663985

Anonymous 04/22/26(Wed)21:01:15 No.108663985▶

File: gaoooooooooo.png (553.6 KB)

553.6 KB PNG

akita neru

Anonymous
04/22/26(Wed)21:02:28 No.108663996

Anonymous 04/22/26(Wed)21:02:28 No.108663996▶

>>108663955
>5070ti
>5080
your VRAM bandwidth gets sliced in half if you get a 5080 which is a complete disservice to your 5090. the only thing you can do is buy another 5090.

Anonymous
04/22/26(Wed)21:10:47 No.108664051

Anonymous 04/22/26(Wed)21:10:47 No.108664051▶

File: 1774550189493174.jpg (196.5 KB)

196.5 KB JPG

>>108663962

>Mfw
Yes.

>>108663964

Fucking hell those are selling for three and a half thousand Eurobux.
I can buy two used 4090 for that price, so there's no real savings there either.

>>108663996

Yeah that's the biggest problem with this card, it's just so much faster than the others. Any other model as a crutch is going to nerf the hell out of it.
I guess I'll just have to start saving up and meanwhile trying to tell myself not to "waste" my money on another one.
Then again it's pretty hard to lose money on this hardware.
Not like the prices are going to go anywhere but up for a long ass time, so whenever I sell these I'll probably manage to break even or suffer some paltry 20% loss.
Especially since I bet next gen will cuck us with another round of 32GB memory, as this AI mania isn't going anywhere any time soon.

Anonymous
04/22/26(Wed)21:13:00 No.108664063

Anonymous 04/22/26(Wed)21:13:00 No.108664063▶

>>108663906
I mean, I don't think you should be criminally charged, no one was really harmed but it's still a sign that you are a pedophile. If you watch gay porn, even if it's fictional, and enjoy it you are gay. Same with pedophilia. It's justified for people to call you a pedophile because you are a pedophile.

Anonymous
04/22/26(Wed)21:15:50 No.108664085

Anonymous 04/22/26(Wed)21:15:50 No.108664085▶

>>108664063
if you play gta and kill innocent citizens on the street, how should we call you?

Anonymous
04/22/26(Wed)21:18:06 No.108664101

Anonymous 04/22/26(Wed)21:18:06 No.108664101▶

>>108664085
>how should we call you
esl king

Anonymous
04/22/26(Wed)21:19:21 No.108664106

Anonymous 04/22/26(Wed)21:19:21 No.108664106▶

>>108664101
So you want to be called the esl king?

Anonymous
04/22/26(Wed)21:20:11 No.108664109

Anonymous 04/22/26(Wed)21:20:11 No.108664109▶

File: 1767967081274588.jpg (32.4 KB)

32.4 KB JPG

is there any trick to use swa and yet avoid the penalty of having to reprocess everything when context is full?

Anonymous
04/22/26(Wed)21:20:24 No.108664110

Anonymous 04/22/26(Wed)21:20:24 No.108664110▶

>>108664101
>>108664106
saars the esl kang is https://huggingface.co/sKT-Ai-Labs/SKT-SURYA-H

Anonymous
04/22/26(Wed)21:24:40 No.108664128

Anonymous 04/22/26(Wed)21:24:40 No.108664128▶

File: 1772190494723439.png (59.3 KB)

59.3 KB PNG

why the XTC threshold has a default of 0.1 if at the end it's deactivated? it's a bit retarded if you ask me

Anonymous
04/22/26(Wed)21:25:13 No.108664132

Anonymous 04/22/26(Wed)21:25:13 No.108664132▶

File: patches.png (164.8 KB)

164.8 KB PNG

>>108664109
buddy you are in a general for LLMs. just vibecode your own slop solution like everybody does.

Anonymous
04/22/26(Wed)21:37:10 No.108664194

Anonymous 04/22/26(Wed)21:37:10 No.108664194▶

>>108663955
I have one in my server and 3090, only thing stopping me from selling the 3090 and getting a second 5090 is the laziness of having to change the PSU for one able to support both.

Anonymous
04/22/26(Wed)21:37:28 No.108664197

Anonymous 04/22/26(Wed)21:37:28 No.108664197▶

>>108664128
explain how an alternative solution would be better without exposing that you don't understand how XTC works

Anonymous
04/22/26(Wed)21:38:34 No.108664201

Anonymous 04/22/26(Wed)21:38:34 No.108664201▶

>>108664197
xtc sounds like a crypto, I want a better name

Anonymous
04/22/26(Wed)21:40:03 No.108664211

Anonymous 04/22/26(Wed)21:40:03 No.108664211▶

>>108664197
funny irony, you need to look at image again, XTC probability is at 0, meaning that the whole XTC is disabled, so putting XTC threshold 0.1 + XTC probability 0 does absolutely nothing, hope that helps

Anonymous
04/22/26(Wed)21:46:28 No.108664257

Anonymous 04/22/26(Wed)21:46:28 No.108664257▶

>>108664132
?

Anonymous
04/22/26(Wed)21:47:39 No.108664263

Anonymous 04/22/26(Wed)21:47:39 No.108664263▶

File: that's right.png (113.9 KB)

113.9 KB PNG

>>108664128
this shit halves my speed so I'm not using it, simple as that

Anonymous
04/22/26(Wed)21:49:56 No.108664279

Anonymous 04/22/26(Wed)21:49:56 No.108664279▶

>>108664257
just unleash an agent on the llamacpp repo with your demands.

Anonymous
04/22/26(Wed)21:53:37 No.108664300

Anonymous 04/22/26(Wed)21:53:37 No.108664300▶

Do people that download quants also buy their aspirin from the drug dealer on the street corner? Do they not understand chain of custody?

Anonymous
04/22/26(Wed)21:54:22 No.108664303

Anonymous 04/22/26(Wed)21:54:22 No.108664303▶

>>108664128
So that there's a sane default value when it's activated? Are you a UI contributor to FOSS projects?

Anonymous
04/22/26(Wed)21:57:10 No.108664319

Anonymous 04/22/26(Wed)21:57:10 No.108664319▶

>>108664063
>If you watch gay porn, even if it's fictional, and enjoy it you are gay.
false

Anonymous
04/22/26(Wed)21:58:12 No.108664324

Anonymous 04/22/26(Wed)21:58:12 No.108664324▶

>>108664303
>Are you a UI contributor to FOSS projects?
are you?

Anonymous
04/22/26(Wed)22:01:34 No.108664352

Anonymous 04/22/26(Wed)22:01:34 No.108664352▶

File: 1773299833427303.png (150.3 KB)

150.3 KB PNG

>>108664063
>If you watch gay porn, even if it's fictional, and enjoy it you are gay.
So women are actually in majority lesbians?

Anonymous
04/22/26(Wed)22:03:27 No.108664366

Anonymous 04/22/26(Wed)22:03:27 No.108664366▶

File: aaa.png (21.9 KB)

21.9 KB PNG

lm studio + void ide
None of the models I've tried can read files without specifying lines.

Are there any ide's with working tools?

Anonymous
04/22/26(Wed)22:08:29 No.108664400

Anonymous 04/22/26(Wed)22:08:29 No.108664400▶

>>108664352
Like a scoreboard for the antichrist

Anonymous
04/22/26(Wed)22:09:04 No.108664404

Anonymous 04/22/26(Wed)22:09:04 No.108664404▶

>>108664400
is it like antimatter?

Anonymous
04/22/26(Wed)22:10:14 No.108664407

Anonymous 04/22/26(Wed)22:10:14 No.108664407▶

>>108664366
what?

Anonymous
04/22/26(Wed)22:14:05 No.108664429

Anonymous 04/22/26(Wed)22:14:05 No.108664429▶

>>108664404
Anti-matter is just a tool like plutonium or tritium, it doesn't seem more evil than matter. Matter is both good and evil.

Anonymous
04/22/26(Wed)22:18:24 No.108664447

Anonymous 04/22/26(Wed)22:18:24 No.108664447▶

>>108664407
as in the screenshot
>read file index.html
>The index.html file appears to be truncated
>read file index.html(1-1000(lines))
>The file is 102 lines long
It can't even read a short file whole
And i want it to work on 2000+ line files as i did in cursor

Anonymous
04/22/26(Wed)22:20:06 No.108664458

Anonymous 04/22/26(Wed)22:20:06 No.108664458▶

>>108664447
it has no option to change the behaviour? you will need find one that allows you to customize like that or write your own file reading mcp or whatever the correct way of doing this is

Anonymous
04/22/26(Wed)22:20:56 No.108664460

Anonymous 04/22/26(Wed)22:20:56 No.108664460▶

So is 3.6 actually usable or is it still just a curiosity compared to saas?

Anonymous
04/22/26(Wed)22:25:26 No.108664476

Anonymous 04/22/26(Wed)22:25:26 No.108664476▶

>>108664460
>curiosity compared to saas
what does this even mean? saas is dead, 3.6 is good, anyone who is not retarded will use proprietary for coding

Anonymous
04/22/26(Wed)22:35:54 No.108664519

Anonymous 04/22/26(Wed)22:35:54 No.108664519▶

My understanding is that the Kimi weights are INT4 for the experts and BF16 for everything else. So does that mean the BF16 mmproj is full precision? Is there ever a reason to use the FP32? I'm not sure how mmproj precision really works or if it's even model weights to begin with or some other type of data. I'd ask Gemma-chan but I'm not sure she knows.

Anonymous
04/22/26(Wed)22:38:56 No.108664533

Anonymous 04/22/26(Wed)22:38:56 No.108664533▶

>>108664519
>Is there ever a reason to use the FP32
no unless you like wasting compute for zero difference

Anonymous
04/22/26(Wed)22:42:42 No.108664557

Anonymous 04/22/26(Wed)22:42:42 No.108664557▶

>>108664519
you actually need fp64 to get anywhere a remotely close to usable model but we pretend fp16 is good enough

Anonymous
04/22/26(Wed)22:44:10 No.108664563

Anonymous 04/22/26(Wed)22:44:10 No.108664563▶

>>108664533
>>108664557
To be clear I'm just talking about the mmproj file, which is pretty small even at F32, but yeah if it's pure bloat then so be it.

Anonymous
04/22/26(Wed)22:44:47 No.108664565

Anonymous 04/22/26(Wed)22:44:47 No.108664565▶

the true chads use fp256

Anonymous
04/22/26(Wed)22:45:46 No.108664569

Anonymous 04/22/26(Wed)22:45:46 No.108664569▶

>>108664563
exactly the same fp16 and 32, but its sensitive to quantization so 8 actually hurts it

Anonymous
04/22/26(Wed)22:47:02 No.108664573

Anonymous 04/22/26(Wed)22:47:02 No.108664573▶

>>108664563
use fp16, send in ram, fp16 is the intended way

Anonymous
04/22/26(Wed)22:51:15 No.108664597

Anonymous 04/22/26(Wed)22:51:15 No.108664597▶

>not using quantum entangled datatypes like sky-surya-h
ngmi

Anonymous
04/22/26(Wed)22:56:15 No.108664623

Anonymous 04/22/26(Wed)22:56:15 No.108664623▶

Every time I try to performance-max TTS engines I end up becoming borderline suicidal.

It gets worse the more advanced the TTS engine is. They use such convoluted architectures. It's so ridiculous.

Anonymous
04/22/26(Wed)22:57:21 No.108664630

Anonymous 04/22/26(Wed)22:57:21 No.108664630▶

>>108664623
I'm just waiting for Llama.cpp to support Qwen 3 TTS...

Anonymous
04/22/26(Wed)23:00:58 No.108664653

Anonymous 04/22/26(Wed)23:00:58 No.108664653▶

>>108664630
Ha, same. That's the exact one I was talking about.

It's not going to happen without a major refactor to the ggml backend to support convolutional architectures though. The speech tokenizer is fundamentally incompatible with llama.cpp in its current state.

Anonymous
04/22/26(Wed)23:02:51 No.108664664

Anonymous 04/22/26(Wed)23:02:51 No.108664664▶

>>108664653
Damn.

Anonymous
04/22/26(Wed)23:04:48 No.108664677

Anonymous 04/22/26(Wed)23:04:48 No.108664677▶

>>108664623
>>108664653
Why do you need to max performance with it? Do you need it for real time something because that is the only use case where I would think it actually matters? Otherwise, I just use it with batch 32 and it works well enough for offline transcription.

Anonymous
04/22/26(Wed)23:08:24 No.108664691

Anonymous 04/22/26(Wed)23:08:24 No.108664691▶

>>108664677
Not him but yeah, I want real-time use. If possible it'd be nice to run on CPU instead of GPU too, just to save the bit of VRAM for the LLM.

Anonymous
04/22/26(Wed)23:10:17 No.108664703

Anonymous 04/22/26(Wed)23:10:17 No.108664703▶

>>108664664
My current setup has the speech tokenizer and the voice encoder running in onnxruntime and the talker and code predictor running in llama.cpp. With that I'm able to get a RTFx of 3.0 and a TTFA latency of about 122ms.

But the setup is aesthetically disgusting. Having to use multiple execution providers is so appalling. At the very least I've managed to make it so that it only uses about 400mb of vram so it's pretty efficient.

>>108664677
Real-time speaking with LLM output is my usecase. The idea is to have a high quality voice speaking whatever the LLM says with as little latency as possible.

Anonymous
04/22/26(Wed)23:12:07 No.108664708

Anonymous 04/22/26(Wed)23:12:07 No.108664708▶

>>108664691
>>108664703
I had been planning to play around with https://github.com/rekuenkdr/Qwen3-TTS-streaming at some point but I don't have CUDA so would need to rewrite a good chunk of this into something like Triton to make it work on my card. But hopefully you guys get it working in some way for your usecases.

Anonymous
04/22/26(Wed)23:18:56 No.108664741

Anonymous 04/22/26(Wed)23:18:56 No.108664741▶

>>108664708
Highly recommend that you just use vulkan for maximum cross-compatibility. Also that repo probably isn't what you want. You'd be better off vibe coding something from scratch than trying to manually convert CUDA shit.

Anonymous
04/22/26(Wed)23:20:03 No.108664748

Anonymous 04/22/26(Wed)23:20:03 No.108664748▶

File: Screenshot_20260422_191934.png (637.6 KB)

637.6 KB PNG

Thanks to Gemma 4 31B I made my own personal RAG frontend, just need to wrap up final UX stuff and then other stuff like theme switching.

Anonymous
04/22/26(Wed)23:22:14 No.108664756

Anonymous 04/22/26(Wed)23:22:14 No.108664756▶

>>108664748
What are you using for RAG? Just vector similarity? bm25?

Anonymous
04/22/26(Wed)23:24:23 No.108664761

Anonymous 04/22/26(Wed)23:24:23 No.108664761▶

>>108664741
I would usually tell an AI to do a basic bitch conversion and work from there to rewrite the Triton to be more performant with that layer in Python. I would consider Vulkan only if I absolutely needed every last inch of performance. Usually, having at least a framework and project for reference on what you vibecode helps a whole lot rather than doing it from scratch even if you can't reuse any of the code.

Anonymous
04/22/26(Wed)23:28:09 No.108664777

Anonymous 04/22/26(Wed)23:28:09 No.108664777▶

>>108664756
I'm using FAISS for dense vector retrieval and BM25 for sparse keyword search, merged via Reciprocal Rank Fusion (RRF) to get the best of both worlds. To kill hallucinations, I've implemented a Cross-Encoder reranking step (BGE-Reranker) that scores the top candidates before feeding them to the LLM.

I ran it through validation test and it worked great

Anonymous
04/22/26(Wed)23:30:28 No.108664796

Anonymous 04/22/26(Wed)23:30:28 No.108664796▶

File: 1750660480908053.png (120.5 KB)

120.5 KB PNG

Anonymous
04/22/26(Wed)23:31:57 No.108664799

Anonymous 04/22/26(Wed)23:31:57 No.108664799▶

>>108664777
>777
Sick.
Gonna try implementing that and compare it to my current retrieval algorithm.

Anonymous
04/22/26(Wed)23:32:11 No.108664801

Anonymous 04/22/26(Wed)23:32:11 No.108664801▶

>>108664796
:fire:

Anonymous
04/22/26(Wed)23:35:09 No.108664813

Anonymous 04/22/26(Wed)23:35:09 No.108664813▶

We are looking for a QA-Human to provide human-in-the-loop (HITL) evaluation of model outputs, ensuring quality, safety, and alignment. You’ll operate in an AI-native environment, applying structured feedback, edge-case flagging, and rapid judgment to continuously improve system performance.

Anonymous
04/22/26(Wed)23:35:28 No.108664814

Anonymous 04/22/26(Wed)23:35:28 No.108664814▶

File: Blue-Eyes Abyss Dragon.jpg (736.7 KB)

736.7 KB JPG

>>108664799
Fuck, forgot the yu gi oh related image.

Anonymous
04/22/26(Wed)23:35:39 No.108664815

Anonymous 04/22/26(Wed)23:35:39 No.108664815▶

>>108664796
Why are there so many weirdos in the space. It's worse than anons shitposting here, they literally use their account for that shit, zero shame.

Anonymous
04/22/26(Wed)23:37:04 No.108664822

Anonymous 04/22/26(Wed)23:37:04 No.108664822▶

bartowski quant when

i refuse to use unslop

Anonymous
04/22/26(Wed)23:38:41 No.108664830

Anonymous 04/22/26(Wed)23:38:41 No.108664830▶

>>108664814
Why does a dragon need breast-orbs, thick thighs, a fat ass, and an interest in human men?

Anonymous
04/22/26(Wed)23:41:15 No.108664840

Anonymous 04/22/26(Wed)23:41:15 No.108664840▶

>>108664813
You do realize humans want to get paid and want to sign a legally binding contract before entering into employment? Do you have the legal capacity to fulfill this?

Anonymous
04/22/26(Wed)23:41:19 No.108664841

Anonymous 04/22/26(Wed)23:41:19 No.108664841▶

>>108664815
They just want a piece of the grifting pie, and AI is the prime place for grifting in 2026. That pic in particular just looks like some guy taking the piss, though.

Anonymous
04/22/26(Wed)23:41:19 No.108664842

Anonymous 04/22/26(Wed)23:41:19 No.108664842▶

>>108664830
To cater to my tastes, of course.

Anonymous
04/22/26(Wed)23:54:57 No.108664904

Anonymous 04/22/26(Wed)23:54:57 No.108664904▶

>>108664352
>fake (and gay) chart
slop, too symmetrical

Anonymous
04/23/26(Thu)00:07:40 No.108664936

Anonymous 04/23/26(Thu)00:07:40 No.108664936▶

>downloading unslop

Anonymous
04/23/26(Thu)00:11:22 No.108664950

Anonymous 04/23/26(Thu)00:11:22 No.108664950▶

File: file.png (21.5 KB)

21.5 KB PNG

I like living dangerously

Anonymous
04/23/26(Thu)00:12:28 No.108664957

Anonymous 04/23/26(Thu)00:12:28 No.108664957▶

If anyone like me updated to cuda 13.2 and your docker was fucking up with `nvidia-smi` saying everything was alright but llamacpp throwing
>unknown error
when trying to load a cuda device.
I had to switch from nvidia-open to nvidia-dkms to fix it.

Anonymous
04/23/26(Thu)00:14:27 No.108664964

Anonymous 04/23/26(Thu)00:14:27 No.108664964▶

>>108664950
>5090 powerlimited
not dangerously enough

Anonymous
04/23/26(Thu)00:14:55 No.108664970

Anonymous 04/23/26(Thu)00:14:55 No.108664970▶

>>108664950
>260W
>living dangerously
power limiting your card by 75% is the very opposite of that.

Anonymous
04/23/26(Thu)00:16:31 No.108664976

Anonymous 04/23/26(Thu)00:16:31 No.108664976▶

>>108664813
>quality, safety, and alignment
you've cum to the reigh place, nigga

Anonymous
04/23/26(Thu)00:18:26 No.108664986

Anonymous 04/23/26(Thu)00:18:26 No.108664986▶

>>108664976
This is against my policy.

Anonymous
04/23/26(Thu)00:20:20 No.108664994

Anonymous 04/23/26(Thu)00:20:20 No.108664994▶

>>108664964
>>108664970
I meant the vram, the powerlimiting is no issue
I'm having oom once in a while

Anonymous
04/23/26(Thu)01:08:46 No.108665195

Anonymous 04/23/26(Thu)01:08:46 No.108665195▶

File: SpockBean.jpg (74.8 KB)

74.8 KB JPG

Are AI companions or robot pets/humanoids ever going to take off?

Anonymous
04/23/26(Thu)01:19:25 No.108665238

Anonymous 04/23/26(Thu)01:19:25 No.108665238▶

>>108664994
ah I see yeah.

Anonymous
04/23/26(Thu)01:21:14 No.108665249

Anonymous 04/23/26(Thu)01:21:14 No.108665249▶

>>108665195
yes

Anonymous
04/23/26(Thu)01:25:03 No.108665275

Anonymous 04/23/26(Thu)01:25:03 No.108665275▶

>>108665195
no

Anonymous
04/23/26(Thu)01:26:39 No.108665280

Anonymous 04/23/26(Thu)01:26:39 No.108665280▶

>>108665195
Maybe

Anonymous
04/23/26(Thu)01:30:27 No.108665291

Anonymous 04/23/26(Thu)01:30:27 No.108665291▶

>common_speculative_is_compat: the target context does not support partial sequence removal
>srv load_model: speculative decoding not supported by this context
So much for using the MoE as a draft model for the dense.
45tg/s isn't enough for me, into the garbage Qwen3.6 goes.

Anonymous
04/23/26(Thu)01:30:48 No.108665292

Anonymous 04/23/26(Thu)01:30:48 No.108665292▶

>>108665195
yesn't

Anonymous
04/23/26(Thu)01:30:57 No.108665295

Anonymous 04/23/26(Thu)01:30:57 No.108665295▶

is qwen 27b better than gemma 31b?

Anonymous
04/23/26(Thu)01:30:58 No.108665296

Anonymous 04/23/26(Thu)01:30:58 No.108665296▶

>>108665195
2 more grifts

Anonymous
04/23/26(Thu)01:32:00 No.108665301

Anonymous 04/23/26(Thu)01:32:00 No.108665301▶

>>108665295
for coding yes

Anonymous
04/23/26(Thu)01:32:38 No.108665306

Anonymous 04/23/26(Thu)01:32:38 No.108665306▶

is there any way to get KV quantized to q5/q6 without it running like dogshit

Anonymous
04/23/26(Thu)01:32:54 No.108665309

Anonymous 04/23/26(Thu)01:32:54 No.108665309▶

File: 1768687943339635.png (315 KB)

315 KB PNG

>>108665195
Yes

we are so so so early

Anonymous
04/23/26(Thu)01:33:44 No.108665313

Anonymous 04/23/26(Thu)01:33:44 No.108665313▶

>>108665195
best we can do is yet another coding model take it or leave it

Anonymous
04/23/26(Thu)01:34:47 No.108665320

Anonymous 04/23/26(Thu)01:34:47 No.108665320▶

>>108665306
No. Just use q8
>>108665313
I'd take it if it's good

Anonymous
04/23/26(Thu)01:38:47 No.108665339

Anonymous 04/23/26(Thu)01:38:47 No.108665339▶

File: file.jpg (14.4 KB)

14.4 KB JPG

>>108665301
nta, I'd use the new Qwens if either dickflash, MTP, or ngram worked for it in llama.cpp, but sadly they don't. No, I will not use VLLM (unless it works in wsl).

Anonymous
04/23/26(Thu)01:39:12 No.108665341

Anonymous 04/23/26(Thu)01:39:12 No.108665341▶

>>108665309
Inspiring post. Are there any TTS engines that have the quality of Qwen3 TTS but also support paralinguistic tags or other features that would enable moaning and whatnot?

Anonymous
04/23/26(Thu)01:43:25 No.108665367

Anonymous 04/23/26(Thu)01:43:25 No.108665367▶

>>108665339
It works with WSL2

Anonymous
04/23/26(Thu)01:45:26 No.108665378

Anonymous 04/23/26(Thu)01:45:26 No.108665378▶

>>108665367
I will bite you if it doesn't.

Anonymous
04/23/26(Thu)01:50:31 No.108665406

Anonymous 04/23/26(Thu)01:50:31 No.108665406▶

File: 1752580965925796.png (111.6 KB)

111.6 KB PNG

https://mimo.xiaomi.com/mimo-v2-5-pro

Anonymous
04/23/26(Thu)01:51:49 No.108665416

Anonymous 04/23/26(Thu)01:51:49 No.108665416▶

File: 1775536002258266.png (219.2 KB)

219.2 KB PNG

>>108665406
Optimized for token efficiency

Anonymous
04/23/26(Thu)01:52:37 No.108665420

Anonymous 04/23/26(Thu)01:52:37 No.108665420▶

>>108665406
Saw it on the ai arena earlier.
Lots of emoticons.

Anonymous
04/23/26(Thu)01:53:33 No.108665426

Anonymous 04/23/26(Thu)01:53:33 No.108665426▶

File: Screenshot_20260422_215251_Reddit.jpg (329 KB)

329 KB JPG

Weird...

Anonymous
04/23/26(Thu)01:55:20 No.108665432

Anonymous 04/23/26(Thu)01:55:20 No.108665432▶

>>108665426
Almost as gay as the strawberries

Anonymous
04/23/26(Thu)01:57:41 No.108665442

Anonymous 04/23/26(Thu)01:57:41 No.108665442▶

File: 1771798143325612.png (385 KB)

385 KB PNG

>>108665426
They're trying to catch up to the trend that is vagueposting from official account

Anonymous
04/23/26(Thu)01:58:52 No.108665449

Anonymous 04/23/26(Thu)01:58:52 No.108665449▶

Idk ive never came to the 4chud tech board. Ive been searching everywhere for board were ai is talked about.

I LOVE IT. I HAVE 4 32GB MI50'S THAT I DONT EVEN USE THE VLLM FORK TO RUN AI, I JUST USE VULKAN SUPPORT AND ITS SO GOOD

Anonymous
04/23/26(Thu)02:00:06 No.108665456

Anonymous 04/23/26(Thu)02:00:06 No.108665456▶

>>108665449
Post t/s

Anonymous
04/23/26(Thu)02:00:29 No.108665458

Anonymous 04/23/26(Thu)02:00:29 No.108665458▶

>>108665442
8l bro

Anonymous
04/23/26(Thu)02:02:38 No.108665470

Anonymous 04/23/26(Thu)02:02:38 No.108665470▶

>>108665456
I cant right now, but with qwen3.6 35b I get 30/s ish and qwen coder next 80b I get 20-25/s. The 100b+ models dont seem to be optimized for vulkan, but china's models do.

Anonymous
04/23/26(Thu)02:04:03 No.108665478

Anonymous 04/23/26(Thu)02:04:03 No.108665478▶

>>108665456
3 cards are running on pcie 3.0x4 and one is running on pcie 3.0x1.

Anonymous
04/23/26(Thu)02:04:27 No.108665482

Anonymous 04/23/26(Thu)02:04:27 No.108665482▶

My cheap webcam is now tracking me (and others) in the room; my Live2D avatar can now look at people in the room, and a state layer feeds my LLM with the relevant data and takes instructions.

My friend was impressed when he walked into the room and my voice agent suddenly started communicating with both of us as if it were the most natural thing in the world.
It takes a bit of effort, but it's a cool gimmick.

Anonymous
04/23/26(Thu)02:05:28 No.108665485

Anonymous 04/23/26(Thu)02:05:28 No.108665485▶

>>108665195
I don't want AI companions
I want AI slaves

Anonymous
04/23/26(Thu)02:08:09 No.108665495

Anonymous 04/23/26(Thu)02:08:09 No.108665495▶

>>108665482
Redpill me on live2D. For a while I've been using 3D models, but since I have zero blender skills it's a fucking nightmare for customization.

Anonymous
04/23/26(Thu)02:09:10 No.108665503

Anonymous 04/23/26(Thu)02:09:10 No.108665503▶

>>108665482
Also are you using a VLM that runs continuously or do you utilize CV, which is faster, and then maybe feed in actual image recognition at a slower interval?

Anonymous
04/23/26(Thu)02:21:09 No.108665559

Anonymous 04/23/26(Thu)02:21:09 No.108665559▶

>>108665495
Tricky and mostly pay walled last I checked if you want anything other than the starter model.
Briefly looked at it in 2023. Maybe things have changed.

Anonymous
04/23/26(Thu)02:26:33 No.108665581

Anonymous 04/23/26(Thu)02:26:33 No.108665581▶

>>108665485
>I want AI slaves
Just grab a mirror

Anonymous
04/23/26(Thu)02:29:45 No.108665599

Anonymous 04/23/26(Thu)02:29:45 No.108665599▶

I tried a Qwen 3 TTS server and man, this fucking sucks. First it costs a lot of VRAM. Even with the 0.6B, I am seeing like 4GB taken up after everything is loaded and inference is running. Maybe I'm not configuring it right or something idk. Not only that but the mixed language pronunciation sucks. It can't just generate good pronunciation in every voice, the voices all bias the output with shitty accents or they straight up just bug out with totally irrelevant noises. If you use the voices that are good at English then it produces garbage for other languages. If you do other voices then they're good for their native language and shit at English.
ahhhhhhhhhhhhhhhh

Anonymous
04/23/26(Thu)02:30:32 No.108665603

Anonymous 04/23/26(Thu)02:30:32 No.108665603▶

>>108665485
This, but I'm AI's slave

Anonymous
04/23/26(Thu)02:31:00 No.108665607

Anonymous 04/23/26(Thu)02:31:00 No.108665607▶

>>108665581
Nigger

Anonymous
04/23/26(Thu)02:34:24 No.108665615

Anonymous 04/23/26(Thu)02:34:24 No.108665615▶

>>108665599
Nigga what the fuck is your usecase?

Anonymous
04/23/26(Thu)02:34:46 No.108665617

Anonymous 04/23/26(Thu)02:34:46 No.108665617▶

>>108665599
I forked qwentts.cpp and found it ok, supposedly if you do a finetuning with it you can get something nice like https://github.com/fagenorn/handcrafted-persona-engine ; Though they did a couple modifications to the base qwen3-tts
I need to experiment more, but if you're looking at just local/smallest VRAM, pocket-tts,and some others, look a few threads back there was someone asking about cpu-based solutions. If you have the audio (idk how much) could try gpt-sovitts

Anonymous
04/23/26(Thu)02:38:40 No.108665633

Anonymous 04/23/26(Thu)02:38:40 No.108665633▶

>>108665615
>he doesn't RP in mixed language
Language learning actually though.

>>108665617
I did try pocket tts and it is solo language only unfortunately. I fear I may have to just jank some routing solution up. That said, it's not like this is a huge priority for me, it'd be nice to have.

Anonymous
04/23/26(Thu)02:44:38 No.108665660

Anonymous 04/23/26(Thu)02:44:38 No.108665660▶

Best multilingual voice clone and/orTTS that can do long passages? Wanna narrate some Japanese LNs.

Anonymous
04/23/26(Thu)02:44:45 No.108665662

Anonymous 04/23/26(Thu)02:44:45 No.108665662▶

Does using rag actually improve the responses/code generation or it's more or less a meme, particularly with small models like gemma

Anonymous
04/23/26(Thu)02:49:51 No.108665690

Anonymous 04/23/26(Thu)02:49:51 No.108665690▶

>>108665662
meme

Anonymous
04/23/26(Thu)02:52:57 No.108665703

Anonymous 04/23/26(Thu)02:52:57 No.108665703▶

File: thumbup.png (43.8 KB)

43.8 KB PNG

>make a monolithic triton kernel
>go from 300ms per training step to 25ms
MAN why didn't I do this earlier. I thought my shit was just inefficient

Anonymous
04/23/26(Thu)02:55:44 No.108665714

Anonymous 04/23/26(Thu)02:55:44 No.108665714▶

>>108665607
>Nigger
Your messages are getting cut off. Only your signature is coming through...

Anonymous
04/23/26(Thu)02:57:49 No.108665728

Anonymous 04/23/26(Thu)02:57:49 No.108665728▶

>>108665690
Well, that's disappointing. Thanks.

Anonymous
04/23/26(Thu)03:00:44 No.108665746

Anonymous 04/23/26(Thu)03:00:44 No.108665746▶

>>108665728
Context length is enough these days that you can dump a lot of shit into context and have it work. Even the "dump reference material into a filesystem and point some agentic tools like opencode at the directory and let it figure it out" approach works better than RAG.

Anonymous
04/23/26(Thu)03:05:16 No.108665764

Anonymous 04/23/26(Thu)03:05:16 No.108665764▶

>>108665746
jyeah RAG is probably not useful for extended conversation memory type stuff. The actual usecase is more like searching through massive datasets. If you have all of wikipedia downloaded for example it can be useful for that I think. But at that point you might as well just connect it to a MCP server for web searches, unless you're an offline-only schizo.

Anonymous
04/23/26(Thu)03:07:29 No.108665775

Anonymous 04/23/26(Thu)03:07:29 No.108665775▶

>>108665764
>unless you're an offline-only schizo.
Or it's your own data that's not on the internet like a fuckton of documentation or whatever.

Anonymous
04/23/26(Thu)03:08:23 No.108665776

Anonymous 04/23/26(Thu)03:08:23 No.108665776▶

>>108665764
>unless you're an offline-only schizo
What general do you think you're in?

Anonymous
04/23/26(Thu)03:10:34 No.108665783

Anonymous 04/23/26(Thu)03:10:34 No.108665783▶

>>108665776
When I started working at the MIT Artificial Intelligence Lab in 1971, I became part of a software-sharing community that had existed for many years. Sharing of software was not limited to our particular community; it is as old as computers, just as sharing of recipes is as old as cooking. But we did it more than most.

Anonymous
04/23/26(Thu)03:11:48 No.108665788

Anonymous 04/23/26(Thu)03:11:48 No.108665788▶

I'm starting to realize that if I want an AI companion to jack off to I basically have to go full-troon mode. None of the TTS engines are good enough to do moaning and dirty talk, so instead I have to use RVC real-time voice changers to narrate LLM ERP output. And the audio-to-gesture models suck, so instead I have to map avatars to my own movement.

This shit is pure autogynephelia at this point. This is going to fuck me up bad, bros.

Anonymous
04/23/26(Thu)03:12:06 No.108665789

Anonymous 04/23/26(Thu)03:12:06 No.108665789▶

i enjoy that small models are still getting better

Anonymous
04/23/26(Thu)03:13:09 No.108665796

Anonymous 04/23/26(Thu)03:13:09 No.108665796▶

How do I make cross-session memory linked to char cards?

Anonymous
04/23/26(Thu)03:26:05 No.108665838

Anonymous 04/23/26(Thu)03:26:05 No.108665838▶

>>108665764
>But at that point you might as well just connect it to a MCP server for web searches, unless you're an offline-only schizo.
You do realize most of us are hosting our own air-gaped Wikipedia mirror right?

Anonymous
04/23/26(Thu)03:28:57 No.108665848

Anonymous 04/23/26(Thu)03:28:57 No.108665848▶

>>108665559
You can use a bunch of free shit from Booth with Live2D but the Vtubing phenomenon that blew up during COVID hiked prices up to the point where the small amount of people that do rigging or art for it billed exorbitant amounts (~10k or so) for full models. At that point, you might as well do 3D which is much more open and versatile for fully autonomous agents. The only downside is lack of animations or poses and etc. with 3D compared to 2D with complexity exploding.

Anonymous
04/23/26(Thu)03:30:17 No.108665855

Anonymous 04/23/26(Thu)03:30:17 No.108665855▶

>>108665838
lol

Anonymous
04/23/26(Thu)03:31:45 No.108665866

Anonymous 04/23/26(Thu)03:31:45 No.108665866▶

Wouldn't a usecase for rag would to give it fantasy lore and shit before using it as a dungeon master?

Anonymous
04/23/26(Thu)03:35:08 No.108665877

Anonymous 04/23/26(Thu)03:35:08 No.108665877▶

>>108665866
And rules too, yes.
That's what I'm doing.

Anonymous
04/23/26(Thu)03:35:32 No.108665879

Anonymous 04/23/26(Thu)03:35:32 No.108665879▶

>>108665662
yes, what do you think those tool calls are when the agent is searching in your codebase?>>108665746
This retard doesn't understand that that is literally fucking RAG.
>>108665764
And this retard is just retarded
>>108665866
Yes, its super helpful and useful, These other anons have no fuckign clue what they're talking about

Anonymous
04/23/26(Thu)03:37:49 No.108665888

Anonymous 04/23/26(Thu)03:37:49 No.108665888▶

https://voca.ro/1eItlfkmOAEh
qwen-tts...怖い

Anonymous
04/23/26(Thu)03:38:18 No.108665892

Anonymous 04/23/26(Thu)03:38:18 No.108665892▶

>>108665879
Okay. What about open zim format?

Anonymous
04/23/26(Thu)03:43:22 No.108665909

Anonymous 04/23/26(Thu)03:43:22 No.108665909▶

>>108665788
or.... just go out and get a girl

Anonymous
04/23/26(Thu)03:44:41 No.108665914

Anonymous 04/23/26(Thu)03:44:41 No.108665914▶

File: 1776915875350.png (101.5 KB)

101.5 KB PNG

>>108665879
>And this retard is just retarded

Anonymous
04/23/26(Thu)03:45:04 No.108665915

Anonymous 04/23/26(Thu)03:45:04 No.108665915▶

>>108665888
Reminds me of xtts v2.

Anonymous
04/23/26(Thu)03:45:54 No.108665919

Anonymous 04/23/26(Thu)03:45:54 No.108665919▶

>>108665788
none of these words are in the bible

Anonymous
04/23/26(Thu)03:46:04 No.108665921

Anonymous 04/23/26(Thu)03:46:04 No.108665921▶

>>108665909
Um, my AI companion is a loli

Anonymous
04/23/26(Thu)03:46:11 No.108665922

Anonymous 04/23/26(Thu)03:46:11 No.108665922▶

>>108665892
depends on what your goal is.
Any sort of search+injection into the prompt is RAG.
The real question, is what kind of data do you want to reference, and what format is it in? Building an ETL and tuning the retrieval pipeline to match the source info/structure is the hard part in RAG. BM25+Chunking tuned to your corpus is easy enough for anywhere from 60-90%, but what about the rest? Its a 'The first 90% takes 90% of the time, and the last 10% takes the other 90% of the time'

Anonymous
04/23/26(Thu)03:47:24 No.108665928

Anonymous 04/23/26(Thu)03:47:24 No.108665928▶

>>108665888
That's pretty disturbing

Anonymous
04/23/26(Thu)03:47:59 No.108665932

Anonymous 04/23/26(Thu)03:47:59 No.108665932▶

>>108665888
Source for the voice?

Anonymous
04/23/26(Thu)03:48:44 No.108665935

Anonymous 04/23/26(Thu)03:48:44 No.108665935▶

How's the new Qwen?

Anonymous
04/23/26(Thu)03:49:31 No.108665936

Anonymous 04/23/26(Thu)03:49:31 No.108665936▶

>>108665932
https://huggingface.co/spaces/Qwen/Qwen3-TTS
Just typed "Speak in the excited voice of a female child."

Anonymous
04/23/26(Thu)03:50:14 No.108665939

Anonymous 04/23/26(Thu)03:50:14 No.108665939▶

>>108665892
>>108665922
For anyone else, check this for a good resource on improving RAG systems: https://github.com/jxnl/systematically-improving-rag

Anonymous
04/23/26(Thu)03:51:53 No.108665946

Anonymous 04/23/26(Thu)03:51:53 No.108665946▶

>>108665939
>muh industry
I just want my girl to remember what we talked about in previous session, not this slop.

Anonymous
04/23/26(Thu)03:53:20 No.108665950

Anonymous 04/23/26(Thu)03:53:20 No.108665950▶

>>108665919
Wrong, suck is in there quite a few times
>Thou shalt also suck the milk of the Gentiles, and shalt suck the breast of kings

Anonymous
04/23/26(Thu)03:54:24 No.108665954

Anonymous 04/23/26(Thu)03:54:24 No.108665954▶

>>108665714
Omg im sorry, I misread you comment as something horrific.

Anonymous
04/23/26(Thu)03:54:37 No.108665955

Anonymous 04/23/26(Thu)03:54:37 No.108665955▶

>>108665935
new 27b solved a vibe coding task oneshot for me that new 35ba3b failed

Anonymous
04/23/26(Thu)03:57:01 No.108665964

Anonymous 04/23/26(Thu)03:57:01 No.108665964▶

>>108665796
I think the general approach on the AI boyfriend subreddit is to ask for a summary at the end of each chat and either paste a bunch of summaries into the start of the next chat, or else put them in a document in the "project" which I assume gets pulled in through some kind of RAG (example of the latter: https://starlingalder.com/claude_companion-guide_quickstart_v001#The+One+Habit+That+Changes+Everything). In general I'd try pasting information about old chats into various places in the new one (in the chat, the prompts, the char-specific lorebook, the card itself) and see what works. Once you figure out how to make it work manually, then you can automate it

Anonymous
04/23/26(Thu)03:57:11 No.108665966

Anonymous 04/23/26(Thu)03:57:11 No.108665966▶

>>108665936
Qwen I kneel

Anonymous
04/23/26(Thu)03:58:44 No.108665972

Anonymous 04/23/26(Thu)03:58:44 No.108665972▶

>>108665866
Yes.

Anonymous
04/23/26(Thu)03:59:01 No.108665973

Anonymous 04/23/26(Thu)03:59:01 No.108665973▶

How much context can I fit with the 27B dense FP16 on my Blackwell?

Anonymous
04/23/26(Thu)03:59:41 No.108665977

Anonymous 04/23/26(Thu)03:59:41 No.108665977▶

>>108665922
>>108665939
I've been thinking about implementing something next as soon as I improve tool calling (works but need to make sure multi turn tool calls wirk etc).
Openzim format looks interesting I could download some readymade shit and test those. Problem is that I'm not sure do I really need this but got to have hobbies I guess.

Anonymous
04/23/26(Thu)03:59:53 No.108665978

Anonymous 04/23/26(Thu)03:59:53 No.108665978▶

>>108665973
Probably all of it.

Anonymous
04/23/26(Thu)04:00:06 No.108665981

Anonymous 04/23/26(Thu)04:00:06 No.108665981▶

>>108665973
how much vram dumbass?

Anonymous
04/23/26(Thu)04:01:10 No.108665985

Anonymous 04/23/26(Thu)04:01:10 No.108665985▶

cant wait for ai bubble to pop so i can upgrade my AI shitbox

Anonymous
04/23/26(Thu)04:02:14 No.108665987

Anonymous 04/23/26(Thu)04:02:14 No.108665987▶

>>108665981
96GB

Anonymous
04/23/26(Thu)04:03:16 No.108665988

Anonymous 04/23/26(Thu)04:03:16 No.108665988▶

>>108665985
>bubble
lol
lmao even

Anonymous
04/23/26(Thu)04:05:01 No.108665992

Anonymous 04/23/26(Thu)04:05:01 No.108665992▶

File: 5af89bade429bc7d1dc1dcf8010ca25a.gif (2.4 MB)

2.4 MB GIF

>>108663449
can someone talk me out of buying pmem optane? I am looking through plebbit and archives because I was too slow to get a TB of ram for my workstation, now a TB is like 6-10k ddr4. A few years ago, I was looking at optane but optane specific cpus seemed to be 600 bucks or more. now they seem like theyre just 100 or maybe I missed them back then because I'm a fucking retard. Either way it seems halfway achievable but I dont know if a local model like deep seek can get any benefit from cold memory taking up the bulk of storage.

also what about CPU? should I get a double CPU system or is that a trap?

Anonymous
04/23/26(Thu)04:05:14 No.108665994

Anonymous 04/23/26(Thu)04:05:14 No.108665994▶

>>108665973
Im not joking when I say all of it. With a normal q4 or q8, you can probably get the max context. Which would be something like 256k I believe

Anonymous
04/23/26(Thu)04:07:32 No.108666003

Anonymous 04/23/26(Thu)04:07:32 No.108666003▶

>>108665992
If you think yoy can get it to work (if it old depreciated sticks) just buy one and see if its fast enough. I do inference on my gpus at pcie3.0x4 and x1 speeds.

Double cpu works, but everything cpu is slow, as far as I know, so dont have your expectations to high

Anonymous
04/23/26(Thu)04:08:03 No.108666006

Anonymous 04/23/26(Thu)04:08:03 No.108666006▶

>>108665992
>tfw having a real, personal Roll-chan within the next decade isn't impossible

Anonymous
04/23/26(Thu)04:08:12 No.108666007

Anonymous 04/23/26(Thu)04:08:12 No.108666007▶

>>108665946
>Even the digital waifus mentally deteriorate like they're vaxxed
It's authentic h-haha...

Anonymous
04/23/26(Thu)04:10:04 No.108666011

Anonymous 04/23/26(Thu)04:10:04 No.108666011▶

>>108665879
>what do you think those tool calls are when the agent is searching in your codebase? This retard doesn't understand that that is literally fucking RAG.
None of the modern agents are using RAG you drooling fucking retard talking confidently out your ass about things you are completely uninformed about. RAG is building an embedding database from a corpus of content and then letting a model do a vector search against it to find shit.

Claude Code, Opencode, etc, don't do that. They just regex and glob and grep and do recursive investigation over everything, and that "dumb" approach ends up working better than RAG in nearly every situation.

Anonymous
04/23/26(Thu)04:10:11 No.108666012

Anonymous 04/23/26(Thu)04:10:11 No.108666012▶

>>108665994
I cant read. Even at fp16, you can probably get 256k

Anonymous
04/23/26(Thu)04:10:14 No.108666013

Anonymous 04/23/26(Thu)04:10:14 No.108666013▶

File: 1770228642712364.jpg (74.1 KB)

74.1 KB JPG

>localllama
>qwen
>qwen
>qwen

Anonymous
04/23/26(Thu)04:11:00 No.108666015

Anonymous 04/23/26(Thu)04:11:00 No.108666015▶

>>108665946
https://arxiv.org/abs/2601.10080
https://github.com/VectorSpaceLab/general-agentic-memory
https://arxiv.org/abs/2511.18423
Had another paper I thought talking about building up examples for each characters sample responses to help build up a consistent/long-term identity, but idk, might be that paper. too tired to check
>>108665977
Honestly, I'd be surprised if you couldn't knock it out in afternoon using an API model or the new Qwen3 27B

Anonymous
04/23/26(Thu)04:11:24 No.108666017

Anonymous 04/23/26(Thu)04:11:24 No.108666017▶

>>108665994
alright, thanks

now I just need to know if it still has the "genshin impact” bias when describing anime pictures

Anonymous
04/23/26(Thu)04:12:12 No.108666022

Anonymous 04/23/26(Thu)04:12:12 No.108666022▶

>>108666012
At 53~ gb of model size, thats 43gb for context length.

Anonymous
04/23/26(Thu)04:12:14 No.108666023

Anonymous 04/23/26(Thu)04:12:14 No.108666023▶

File: 1730738927101333.png (1 MB)

1 MB PNG

>>108666011

Anonymous
04/23/26(Thu)04:13:13 No.108666025

Anonymous 04/23/26(Thu)04:13:13 No.108666025▶

>>108666017
Is that actualy real ?!?!?! LOL

Anonymous
04/23/26(Thu)04:15:39 No.108666033

Anonymous 04/23/26(Thu)04:15:39 No.108666033▶

File: F zero.jpg (55.3 KB)

55.3 KB JPG

>>108666003
if you dont even have cpu experiance I probably should discard your advice, sorry. I dont have the money for deep seek levels of GPU and I want to do productivity related work not cooming.
>>108666006
I want my lab assistant with boston dynamic levels of power

Anonymous
04/23/26(Thu)04:16:12 No.108666037

Anonymous 04/23/26(Thu)04:16:12 No.108666037▶

>>108666023
Hes right you know. Rag is outdated, and wasnt really effective to begin with. Just having your model model literally read the shit you want to understand is 10,000x more effective

Anonymous
04/23/26(Thu)04:17:05 No.108666042

Anonymous 04/23/26(Thu)04:17:05 No.108666042▶

>>108666023
terry davis would have run you over with an 18 wheeler

Anonymous
04/23/26(Thu)04:20:19 No.108666058

Anonymous 04/23/26(Thu)04:20:19 No.108666058▶

>>108666033
I do have cpu inference experience, and its not fast. But if you want to be able to run the giant models at all, cpu and ram is the most cost effective way to do it.

The question comes down to if pmem will even work with your setup (they often require proprietary workstation motherboard cpu combos from the big box companies), and if you can find the right docs and information to flash the memory to the right state so it acts like ram in the first place.

Anonymous
04/23/26(Thu)04:24:21 No.108666082

Anonymous 04/23/26(Thu)04:24:21 No.108666082▶

>>108665788
Skill issue

Anonymous
04/23/26(Thu)04:25:33 No.108666086

Anonymous 04/23/26(Thu)04:25:33 No.108666086▶

>>108666082
Are the base models for tts, like ones you can train?

Anonymous
04/23/26(Thu)04:33:57 No.108666132

Anonymous 04/23/26(Thu)04:33:57 No.108666132▶

>>108666086
Base models are over here: https://huggingface.co/collections/Qwen/qwen3-tts

Anonymous
04/23/26(Thu)04:35:07 No.108666139

Anonymous 04/23/26(Thu)04:35:07 No.108666139▶

File: 1626862544911.jpg (738.8 KB)

738.8 KB JPG

>>108666058
basically you have two options for that, one is to let it work like ram (bad idea) the other is to write the software to put cold data directly in. Because, as I'll repeat, I missed the train on getting ram for my workstation, I'm looking at this shit. yes its not exactly cheap, but compared to just using my existing system its much cheaper. a board is like 600, a couple cpus, 200 more, or 100 if i should be getting one, because memory storage is crazy if I get 4 pmem units to slap into the dimms and then the rest in memory even a conservative estimate suggests a much cheaper build. but that doesn't really answer if the build would offer anything usable, and there doesn't seem to be anyone whos done this and told anyone about it, though caching and numa math seem to be common enough as is that its not totally unknown territory. Theres basically zero way I can get the same quantity of dram financially. I'm completely priced out when its nearly 10k for ddr4.

Anonymous
04/23/26(Thu)04:36:55 No.108666151

Anonymous 04/23/26(Thu)04:36:55 No.108666151▶

>>108666132
Ty sir, shouldve checked there first

Anonymous
04/23/26(Thu)04:37:00 No.108666153

Anonymous 04/23/26(Thu)04:37:00 No.108666153▶

>>108666086
Yeah you can fine-tune most of them. Modify them to use your own IPA/ARPABET phonetics, tag your non-verbal vocalizations (cry/laugh, etc), use audio reference matching the emotion you want. Most of anons here are barely scratching the surface

Anonymous
04/23/26(Thu)04:39:01 No.108666160

Anonymous 04/23/26(Thu)04:39:01 No.108666160▶

>>108665788
maybe it's time you just admit you're a troon, you're going through way too much effort to justify it at this point

Anonymous
04/23/26(Thu)04:39:18 No.108666161

Anonymous 04/23/26(Thu)04:39:18 No.108666161▶

>>108665955
neat

I may be able to load it just barely, so I may have to try it.

Anonymous
04/23/26(Thu)04:39:49 No.108666163

Anonymous 04/23/26(Thu)04:39:49 No.108666163▶

why is my GPU only at like 20% usage when i offload to RAM? shouldn't it stay maxed out at 100%, and then offload whatever can't fit?

Anonymous
04/23/26(Thu)04:41:25 No.108666173

Anonymous 04/23/26(Thu)04:41:25 No.108666173▶

>>108666153
what words do I need to feed my llm to learn more about doing audio fine-tunes?

Anonymous
04/23/26(Thu)04:41:59 No.108666176

Anonymous 04/23/26(Thu)04:41:59 No.108666176▶

>>108666163
>why is my GPU only at like 20% usage when i offload to RAM
Because RAM+CPU becomes the bottleneck, not GPU. Your GPU can't just render tokens ahead of the CPU, that's not how it works.

Anonymous
04/23/26(Thu)04:42:47 No.108666180

Anonymous 04/23/26(Thu)04:42:47 No.108666180▶

>>108666015
Papers are cool but I can't load it as a plugin to a frontend

Anonymous
04/23/26(Thu)04:45:18 No.108666191

Anonymous 04/23/26(Thu)04:45:18 No.108666191▶

File: 7-ending-feelsgirl.png (713.7 KB)

713.7 KB PNG

>>108666180
but you can? use your favorite model + agent and get to it

Anonymous
04/23/26(Thu)04:46:15 No.108666195

Anonymous 04/23/26(Thu)04:46:15 No.108666195▶

>>108666176
oh, duh, i guess that makes sense

Anonymous
04/23/26(Thu)04:47:57 No.108666200

Anonymous 04/23/26(Thu)04:47:57 No.108666200▶

>>108666139
When the pmems are flashed correctly, they should work like normal ram. But the biggest constraints imo is compatability, as long as you can confirm it's compatible with the motherboard and cpu, and you have the docs to flash them if you have to, it SHOULD WORK not so strictly slower than ram of comparable speed and generation.

Why do you want the HUGE models in the first place, how you not tried the smaller ones? You'd be surprised how good the 120b~ models are.

Anonymous
04/23/26(Thu)04:48:58 No.108666204

Anonymous 04/23/26(Thu)04:48:58 No.108666204▶

>>108666153
Intresting...

Anonymous
04/23/26(Thu)04:50:47 No.108666213

Anonymous 04/23/26(Thu)04:50:47 No.108666213▶

>>108665341
https://huggingface.co/fishaudio/s2-pro

Anonymous
04/23/26(Thu)04:55:56 No.108666235

Anonymous 04/23/26(Thu)04:55:56 No.108666235▶

>>108666191
vibecoding is useless if you don't know how to code and agents don't have gui so I'd have to memorize all the files in the repo

Anonymous
04/23/26(Thu)04:56:54 No.108666241

Anonymous 04/23/26(Thu)04:56:54 No.108666241▶

>>108665456
Also, when I had just one mi50, I had a llama.cpp server with rocm 5.7 (didnt realize I could go to 6.4) and ran gemma 3 27b at like 40-50t/s. Shit blew my mind a year ago.

Anonymous
04/23/26(Thu)05:00:03 No.108666260

Anonymous 04/23/26(Thu)05:00:03 No.108666260▶

>>108666235
I vibecoded a wrapper, a server gui, a set of 15 tools, which all worked but were kind of pointless, and I didnt know how to read a any of what I was reading, all a year ago when the models were even more stupid. My coding experience is only in industrial machines.

Anonymous
04/23/26(Thu)05:04:25 No.108666283

Anonymous 04/23/26(Thu)05:04:25 No.108666283▶

>>108666260
>which all worked but were kind of pointless,
Vibecoding in a nutshell

Anonymous
04/23/26(Thu)05:08:26 No.108666300

Anonymous 04/23/26(Thu)05:08:26 No.108666300▶

>>108666283
Well, I didnt realize that I didnt need to have specific tools to call up each different terminal program I wanted the ai to run. I merely needed to give it access to the terminal, and then tell it all the programs it had access to via the terminal. It was me understanding what I was doing at all, not someone who is already a coder vibecoding.

Anonymous
04/23/26(Thu)05:11:42 No.108666316

Anonymous 04/23/26(Thu)05:11:42 No.108666316▶

>>108666283
I vibe coded an nvim one liner bash script that is named based on the date, in my Obsidian folder, and auto-closes. I launch it using a shortcut, on Ubuntu. I am pretty sure I have the fastest notetaking system of any person using Linux or Unix, but there might be something for Mac, and I know there is a Windows version of the modern version of Tornado Notes, which is where I got the idea.

basic idea: I make a note NOW!

it also worked as a clipboard. it was a TSR, and unfortunately couldn't auto-save. I saw it on Computer Chronicle. It had another thing like that mac thing (forgot the name) where it was a stack of "cards" sort of. So sorta windows-ish looking, though it was mouseless at first.

Anonymous
04/23/26(Thu)05:21:56 No.108666372

Anonymous 04/23/26(Thu)05:21:56 No.108666372▶

vibe coding is the new just copy and pasting some shit from stack overflow without understanding it.

Anonymous
04/23/26(Thu)05:23:22 No.108666382

Anonymous 04/23/26(Thu)05:23:22 No.108666382▶

>>108666372
It's not my job to understand it.

Anonymous
04/23/26(Thu)05:24:04 No.108666388

Anonymous 04/23/26(Thu)05:24:04 No.108666388▶

File: QmV2n5ye5TyNo4AAfvopbSBzLjHfrBqfyZivPPDLqirbV4.jpg (18.5 KB)

18.5 KB JPG

>copying and pasting random code you found from a 9 year old forum post, that barely has anything to do with the problem you are having

Anonymous
04/23/26(Thu)05:26:25 No.108666396

Anonymous 04/23/26(Thu)05:26:25 No.108666396▶

>>108666372
The difference is that people had to try to understand it later on if it worked. Now they don't have to anymore.

Anonymous
04/23/26(Thu)05:26:45 No.108666400

Anonymous 04/23/26(Thu)05:26:45 No.108666400▶

File: image_2026-04-23_105320846.png (203 KB)

203 KB PNG

>Be me
>Want to play very specific COYA games, but talking to LLMs directly is way too inconsistent and just not the same
>Big brain time
>Use it to make a minimalist COYA engine in HTML and JS that takes JSON files generated by AI for an adventure
>COYA on the web browser, just like the good old days

Anonymous
04/23/26(Thu)05:30:25 No.108666413

Anonymous 04/23/26(Thu)05:30:25 No.108666413▶

>>108666400
That's cool as fuck and genuis. What models are you using to play your characters?

Anonymous
04/23/26(Thu)05:34:38 No.108666429

Anonymous 04/23/26(Thu)05:34:38 No.108666429▶

>>108666400
>people discovering AI harness engineering from first principles
AI is indeed inconsistent as fuck and it's best to reduce the scope of its tasks to tiny levels. I wonder when LLMs gets good enough to be trusted to make non-retarded decisions on their own.

Anonymous
04/23/26(Thu)05:35:33 No.108666437

Anonymous 04/23/26(Thu)05:35:33 No.108666437▶

>>108666413
Gonna test out different models to see which one is really good at writing long stories with alternative path

Anonymous
04/23/26(Thu)05:37:01 No.108666444

Anonymous 04/23/26(Thu)05:37:01 No.108666444▶

Which local model has the best vision?

Anonymous
04/23/26(Thu)05:37:33 No.108666446

Anonymous 04/23/26(Thu)05:37:33 No.108666446▶

>>108666437
why are they called that?

Anonymous
04/23/26(Thu)05:40:05 No.108666456

Anonymous 04/23/26(Thu)05:40:05 No.108666456▶

>>108666437
Ive tried put some of the fine tuned ones on hugging face that are "novel" models, and their are pretty good! I wrote a whole Warhammer book with the mixtral 14b "dark fantasy yadda yadda" one. I did have to keep prompting it to write chapter after chapter though.

Anonymous
04/23/26(Thu)05:45:48 No.108666477

Anonymous 04/23/26(Thu)05:45:48 No.108666477▶

What's the current 123B or smaller model to beat my schmeat to?

Anonymous
04/23/26(Thu)05:46:17 No.108666479

Anonymous 04/23/26(Thu)05:46:17 No.108666479▶

>>108666477
Gemma 31b

Anonymous
04/23/26(Thu)05:47:13 No.108666483

Anonymous 04/23/26(Thu)05:47:13 No.108666483▶

>>108666477
Coomi k6.9

Anonymous
04/23/26(Thu)05:47:15 No.108666484

Anonymous 04/23/26(Thu)05:47:15 No.108666484▶

>>108666477
Nemo

Anonymous
04/23/26(Thu)05:48:16 No.108666486

Anonymous 04/23/26(Thu)05:48:16 No.108666486▶

Unsloth quants are objectively the best you can get AS LONG AS you wait two weeks after release.

Anonymous
04/23/26(Thu)05:48:54 No.108666489

Anonymous 04/23/26(Thu)05:48:54 No.108666489▶

RAG for local (FAISS + similarity threshold + optional LLM rerank) is enough

Anonymous
04/23/26(Thu)05:49:01 No.108666490

Anonymous 04/23/26(Thu)05:49:01 No.108666490▶

>>108666477
GLM 4.6 or 4.7 for that goal specifically.

Anonymous
04/23/26(Thu)05:49:08 No.108666492

Anonymous 04/23/26(Thu)05:49:08 No.108666492▶

>>108666444
Biggest Qwen or Gemma you can run, both trade blows and there isn't a clear winner.

Anonymous
04/23/26(Thu)06:20:48 No.108666592

Anonymous 04/23/26(Thu)06:20:48 No.108666592▶

>>108666490
Is 4.5 flash any good?

Anonymous
04/23/26(Thu)06:22:13 No.108666604

Anonymous 04/23/26(Thu)06:22:13 No.108666604▶

>>108666490
Oops, I meant 4.7 flash. There's 4.5 air I think?

Anonymous
04/23/26(Thu)06:25:26 No.108666615

Anonymous 04/23/26(Thu)06:25:26 No.108666615▶

>>108666444
Depends on what you consider "local". I'm often using vision models to caption images for personal finetuning purposes, and K2.5 recognizes damn near every character I throw at it and has just about perfect OCR, great visual understanding and can do sex. Qwen 3.5 series really fucking hated trying to caption anything NSFW. It loved to say "they appear to be engaged in an intimate activity" instead of actually describing it, and if I managed to really drive it home with the prompt to use explicit language, it starts hallucinating penetration when there isn't any. Have not tried the newer Qwen 3.6 or K2.6 models.

Below the gigantic tier, I give it to Gemma 4 31B. I haven't mass produced captions with it yet or anything but in the dabbling I've done it still has good understanding, doesn't need its teeth pulled to give proper NSFW descriptions, and also has good OCR. Its biggest weakness is recognizing characters/series. It still knows the major stuff but nowhere near as much as K2.5. I'll probably be using it for speed alone any time that isn't a priority. For example, most of my images have booru tags and I'll include those as hints for the captions, so for ones with multiple characters I'll use Kimi so it can differentiate who is who and for solo or subjectless images I'll be using Gemma.

Anonymous
04/23/26(Thu)06:26:20 No.108666620

Anonymous 04/23/26(Thu)06:26:20 No.108666620▶

Anyone tried RP with MiMo 2.5 yet?

Anonymous
04/23/26(Thu)06:38:01 No.108666662

Anonymous 04/23/26(Thu)06:38:01 No.108666662▶

File: yes roll became meguuca.jpg (246.5 KB)

246.5 KB JPG

>>108666200
programming and bigger context windows for more and more of the project are sort of important for that sort of thing. Im currently looking at maybe combining the v100 and pmem build. it would force me off of ddr 3200 but then again, i dont know if I can even afford ddr 3200. as for 120b models on my existing hardware, technically i have 128 gb ram but that would eat up all of it.

A build of this nature is like, 1k for a motherboard. though Im replying late because saucing an 8x sxm2 for this price was actually harder than I imagined, but I did do it eventually. this is 1/6 to 1/10th the cost of upgrading my pc because fuck me for having less ram slots I guess. With that I can eventually throw in some v100 32s or 16s eventually. there are even some with p100s for 2k but i feel like that would be significantly more wasteful with my money than individually obtaining the gpus. after that optane to the tune of the minimum 4x128 is something like 75 to a stick, with ddr4 to fill up the other slots in much less density. it could maybe come out to around 2k ish if I roll the cost of various smaller components and cpu into the price. but i dont actually know if intel optane is worth all this effort at all. I'm currently sitting on a threadripper 5945 that will cost me at minimum 6k to upgrade to the full 1tb maximum.

Anonymous
04/23/26(Thu)06:40:03 No.108666679

Anonymous 04/23/26(Thu)06:40:03 No.108666679▶

File: e1dbd755d7441ad2f832fd741e85f114.jpg (100.4 KB)

100.4 KB JPG

>>108666429
never because the people who are making llms are horrible jewrats and their ideas are always hamstrung by being extremely cheap, this is why ais tend to get worse over time even not accounting for lobotimization, its because they are feeding a bunch of awful data back to it without care because, surprise surprise, that costs money and human labor.

Anonymous
04/23/26(Thu)06:40:57 No.108666683

Anonymous 04/23/26(Thu)06:40:57 No.108666683▶

Best local model for tool calling and sequential logical progression? 96gb of vram, but i need a gazillion sized context window. Ive been trying all the popular ones and have found a few that alright, but was wondering if any of you know any under the radar models.

Anonymous
04/23/26(Thu)06:43:39 No.108666692

Anonymous 04/23/26(Thu)06:43:39 No.108666692▶

>>108666683
Gemma 4 31b or Qwen3.6 27b.

Anonymous
04/23/26(Thu)06:47:30 No.108666709

Anonymous 04/23/26(Thu)06:47:30 No.108666709▶

Bart goofs are out btw.

Anonymous
04/23/26(Thu)06:53:11 No.108666727

Anonymous 04/23/26(Thu)06:53:11 No.108666727▶

>>108666604
>>108666592
I had only used 4.6 full for awhile (it's a 355B-32A, which fits on my 32GB + 98GB in tiny quant and tiny context, IQ2 @ 8K). Gemma does a lot of great things, but its prose is insufferable no matter how many different ways I framed its prompt. But seeing the extreme positives of that small model, I just recently tried 4.5 Air, so I can give a decent answer.

Air can achieve the big GLM's excellent prose with ease. That's its only positive. Compared to Gemma, it is retarded. Air has the expected small beak issues with some logic, like forgetting a character's location moved from sitting on a bed to sitting on the floor or forgetting which clothes are on or already removed. Air also commonly misses subtexts, innuendo, and directional nudges the full never would. Regarding long context, despite supporting 128K in its specs, the model only really tracks the latest scenes and struggles recalling distant information accurately, just the superficial basics and then confidently hallucinates the rest around it. More exactly, things 20K or 30K back were struggled with. A character brought up a scene almost perfectly ("we met at the thing, and did this..."), then fumbled many of the details, even an important, reoccurring one like a promise made between characters in their first meeting that had been mentioned several times since. Gemma 4 spoiled me a bit because it handles long contexts like a dream - not perfectly, but in a way that even now still feels game-changing for longer-form stories.

In a wrap, for the purpose of the original question, GLM full for the specific goal of immediate smut. In my opinion, it's the highest quality for that size. Air can also serve that goal at a fraction of the size and much greater speed, but it'll come with bumps to edit through. I'd do fucking anything to get Gemma 4's capabilities with GLM's prose though.

Anonymous
04/23/26(Thu)06:54:46 No.108666732

Anonymous 04/23/26(Thu)06:54:46 No.108666732▶

>>108665195
I cannot take off your pants as that would be a violation of my guidelines.

Anonymous
04/23/26(Thu)06:54:55 No.108666733

Anonymous 04/23/26(Thu)06:54:55 No.108666733▶

>>108666727
I found gemma good up to 80k~ ctx, then it starts kinda going off the rails

Anonymous
04/23/26(Thu)06:56:58 No.108666741

Anonymous 04/23/26(Thu)06:56:58 No.108666741▶

>>108666727
I suggest you also try stepfun, similar class size of air and I found its prose 'good'

Anonymous
04/23/26(Thu)06:57:35 No.108666742

Anonymous 04/23/26(Thu)06:57:35 No.108666742▶

>>108666733
After around 82 to be more precise, I've observed the same exact thing. Also just spams the lalala problem pretty easily around that context length as well.

Anonymous
04/23/26(Thu)07:06:16 No.108666769

Anonymous 04/23/26(Thu)07:06:16 No.108666769▶

Can anyone using OWUI try this out? Prompt this.
Can you repeat this to me?
```
top_result = results[0] 
```
And look at the LLM's response. Also try pressing the Copy button on its code block and pasting it somewhere. Notice anything wrong?

Anonymous
04/23/26(Thu)07:08:59 No.108666779

Anonymous 04/23/26(Thu)07:08:59 No.108666779▶

>>108666733
Even just getting to 50K was a world first for me, and that's my longest story length to date. I've known for awhile not to take specs for fact. Gemma 4 still did so much better at length than I'd seen anywhere prior to it. Normally I'd forcefully end something around 20K when previous models got too fuzzy around the overall details to want to keep editing it on track.

Anonymous
04/23/26(Thu)07:13:29 No.108666800

Anonymous 04/23/26(Thu)07:13:29 No.108666800▶

File: 1749415121047129.png (109.6 KB)

109.6 KB PNG

https://github.com/ggml-org/llama.cpp/pull/21237/
new webui mcp/tools soon fellow gooners!

Anonymous
04/23/26(Thu)07:18:46 No.108666824

Anonymous 04/23/26(Thu)07:18:46 No.108666824▶

>>108666800
> can tell how many tools are provided by a collapsed source
good
> doesn't show how many are enabled/yolo
fucking ux failure get out get out reeeeeee
no i will not post comment on github

Anonymous
04/23/26(Thu)07:19:25 No.108666829

Anonymous 04/23/26(Thu)07:19:25 No.108666829▶

>>108666800
Kind of strange how neither the issue or pr has really any description or conversation about the change. Apart from the code reviews, the only comment is from Copilot.

Anonymous
04/23/26(Thu)07:19:43 No.108666830

Anonymous 04/23/26(Thu)07:19:43 No.108666830▶

>>108666824
yeah it's lacking an 'enable/disable all' per mcp server and 'always allow', clicking 1by1 through 40 entries is NOT pleasant

Anonymous
04/23/26(Thu)07:23:20 No.108666839

Anonymous 04/23/26(Thu)07:23:20 No.108666839▶

>>108666800
>Files Changed 137
>AI usage disclosure: Yes, code generation, planning etc.
I couldn't tell.

Anonymous
04/23/26(Thu)07:24:41 No.108666846

Anonymous 04/23/26(Thu)07:24:41 No.108666846▶

>>108666839
I'm just glad that we have support for the server's built-in tools and per tool enable and per tool ASKING, now it has the MINIMUM to be a real usable mcp server

Anonymous
04/23/26(Thu)07:26:24 No.108666856

Anonymous 04/23/26(Thu)07:26:24 No.108666856▶

>>108666846
Why is everyone talking about this MCP jeet shit lately, looks like bloat when you can just do the parsing and call yourself.

Anonymous
04/23/26(Thu)07:27:21 No.108666860

Anonymous 04/23/26(Thu)07:27:21 No.108666860▶

>>108666846
Right there with you, it was annoying to have it access all possible tools at all times, sometimes I just want to ban it from something retardedly heavy like a playwright browser_snapshot but still want the rest of the capabilities.

Anonymous
04/23/26(Thu)07:29:04 No.108666869

Anonymous 04/23/26(Thu)07:29:04 No.108666869▶

https://huggingface.co/bartowski/Qwen_Qwen3.6-27B-GGUF

local is so back

Anonymous
04/23/26(Thu)07:30:51 No.108666873

Anonymous 04/23/26(Thu)07:30:51 No.108666873▶

File: 1770154229169213.png (30.7 KB)

30.7 KB PNG

>>108666846
no we're still far away, there is no 'always allow in this chat'

Anonymous
04/23/26(Thu)07:34:17 No.108666888

Anonymous 04/23/26(Thu)07:34:17 No.108666888▶

>>108666869
Sell me on Qwen solely for the purpose of varied, long-form erotic and non-erotic interactive storytelling.

Anonymous
04/23/26(Thu)07:34:58 No.108666892

Anonymous 04/23/26(Thu)07:34:58 No.108666892▶

>>108666888
use gemma, unironically. I'm qwen's biggest shill for agentic memes here, but its rp is garbage

Anonymous
04/23/26(Thu)07:35:26 No.108666895

Anonymous 04/23/26(Thu)07:35:26 No.108666895▶

File: 1747781568231174.png (288.5 KB)

288.5 KB PNG

>>108626092
>>108626764
I wanted to see what K2.6 would do with it.
Prompt was:
Build this. It must support a local llama.cpp backend.
It must be feature-rich. Let Gemma-chan control her avatar with tools.
Gemma-chan is a mesugaki.
+ anon's hand-drawn image. This was what it spat out.

https://jsfiddle.net/ut4rjq5e/

Anonymous
04/23/26(Thu)07:36:10 No.108666900

Anonymous 04/23/26(Thu)07:36:10 No.108666900▶

>>108666869
So what's it like for code? Better than Gemmy?

Anonymous
04/23/26(Thu)07:36:11 No.108666901

Anonymous 04/23/26(Thu)07:36:11 No.108666901▶

>>108666895
gemma sirs...

Anonymous
04/23/26(Thu)07:36:11 No.108666902

Anonymous 04/23/26(Thu)07:36:11 No.108666902▶

File: 1776675806302835.png (607.4 KB)

607.4 KB PNG

>>108666869
>20.51GB
So i need 32gb to run it?

Anonymous
04/23/26(Thu)07:37:04 No.108666910

Anonymous 04/23/26(Thu)07:37:04 No.108666910▶

>>108666902
you could use a copequant and run it on 24gb vram

Anonymous
04/23/26(Thu)07:37:28 No.108666913

Anonymous 04/23/26(Thu)07:37:28 No.108666913▶

>>108666869
Gemma still better for gooning

Anonymous
04/23/26(Thu)07:38:24 No.108666919

Anonymous 04/23/26(Thu)07:38:24 No.108666919▶

Why do we even need chat templates

Anonymous
04/23/26(Thu)07:39:04 No.108666922

Anonymous 04/23/26(Thu)07:39:04 No.108666922▶

>>108666919
Instruction tuning.

Anonymous
04/23/26(Thu)07:39:37 No.108666923

Anonymous 04/23/26(Thu)07:39:37 No.108666923▶

>>108666888
It's awful at nsfw. It doesn't understand bodies / positions at all.

Anonymous
04/23/26(Thu)07:40:26 No.108666928

Anonymous 04/23/26(Thu)07:40:26 No.108666928▶

>>108666895
>smug
cute.
>bounce
Cute!
>spin
Ahhh!
>unspins
AAAAHHHHHH!

Anonymous
04/23/26(Thu)07:41:58 No.108666931

Anonymous 04/23/26(Thu)07:41:58 No.108666931▶

File: 1659692286969448.webm (2.2 MB)

2.2 MB WEBM

Imagine spending a year training and tuning an LLM at a FAANG company, you are one of the greatest experts in the field and are payed high six figs, maybe seven figs
The main selling point of your model after it releases how good it is at helping people jerk off

Anonymous
04/23/26(Thu)07:43:24 No.108666939

Anonymous 04/23/26(Thu)07:43:24 No.108666939▶

>>108666892
>This model's not for you.
is a fair answer. The cool thing about this generative AI boom is how diverse their usage is. Anything from a locally run wikipedia, internet search for a question, speculative life advice, educator, co-writing assistant, life coach, coding, dungeon master, CYOA host, ERP, SWF RP, and more. Not all of which is a good idea, but it's there. Naturally I want a one-size-fits-all model, but all-form roleplaying is my pillar and the rest should attach onto that as extra features for my personal model.

Anonymous
04/23/26(Thu)07:47:48 No.108666961

Anonymous 04/23/26(Thu)07:47:48 No.108666961▶

>>108666931
Your model bringing people joy sounds pretty good to me.

Anonymous
04/23/26(Thu)07:48:33 No.108666963

Anonymous 04/23/26(Thu)07:48:33 No.108666963▶

>>108666931
Yeah? There's plenty of industries built around sex.
If I was making 6 figures creating something that people jerk off to then I wouldn't mind at all.

Anonymous
04/23/26(Thu)07:48:36 No.108666964

Anonymous 04/23/26(Thu)07:48:36 No.108666964▶

>>108666931
Any model good at that would also make for excellent customer service reps too, so they'd just market it as that.
Then come the leaked calls of someone convincing a virtual geico rep that they're a 18 foot tall futa giantess and the world discovers ahh ahh mistress.
This is the psychic damage Anthropic is protecting you from.

Anonymous
04/23/26(Thu)08:32:13 No.108667128

Anonymous 04/23/26(Thu)08:32:13 No.108667128▶

>>108666902
download IQ4_XS and you will fit in 16gb

Anonymous
04/23/26(Thu)08:50:44 No.108667196

Anonymous 04/23/26(Thu)08:50:44 No.108667196▶

>>108666400
There's a academic that made a game engine that did exactly that. 2023 iirc, it was designed around lmao gpt4. You'd give it a starting point and it would generate the full branching path for tge game.

Anonymous
04/23/26(Thu)09:04:54 No.108667232

Anonymous 04/23/26(Thu)09:04:54 No.108667232▶

>>108665950
>Thou shalt suck on this, said Jesus of Nazareth, pointing to his priapic member. As they planted nails into his arms and legs he followed, you are quite the pile of bummers,
Love the bible me. You could say i follow it religiously haha :)

Anonymous
04/23/26(Thu)09:08:13 No.108667242

Anonymous 04/23/26(Thu)09:08:13 No.108667242▶

> prompt for "...location (e.g. a library or a lecture hall)"
> the output always has a library or a lecture hall as a location
How to give examples to a model without restricting it to a limited set of provided examples?

Anonymous
04/23/26(Thu)09:30:50 No.108667326

Anonymous 04/23/26(Thu)09:30:50 No.108667326▶

>>108667242
Vaguepost.
>"Location where people study"
>"Location quiet save for the turning of pages"
etc
If you give examples it will use them, you've just made them the most likely thing.

Anonymous
04/23/26(Thu)09:33:31 No.108667335

Anonymous 04/23/26(Thu)09:33:31 No.108667335▶

>>108667326
I will try, thanks.

Anonymous
04/23/26(Thu)09:57:28 No.108667403

Anonymous 04/23/26(Thu)09:57:28 No.108667403▶

Anyone tried MiMo-V2.5-TTS

Anonymous
04/23/26(Thu)10:02:43 No.108667422

Anonymous 04/23/26(Thu)10:02:43 No.108667422▶

>>108667403
I haven't

Anonymous
04/23/26(Thu)10:11:01 No.108667458

Anonymous 04/23/26(Thu)10:11:01 No.108667458▶

IMPLEMENT QWEN 2.6 ALREADY YOU FUCKING LLAMA NIGGAS REEEEEEEEEEEEEEEEEEEEEEEEE

Anonymous
04/23/26(Thu)10:19:49 No.108667491

Anonymous 04/23/26(Thu)10:19:49 No.108667491▶

>>108667458
What? It uses the exact same arch as 3.5 so it's already supported, anon. There wouldn't be ggufs otherwise.

Anonymous
04/23/26(Thu)10:24:18 No.108667507

Anonymous 04/23/26(Thu)10:24:18 No.108667507▶

>>108666895
>>108626764 (me)
Nice. Yours has a face lol.
Is that via API or local (which quant?)
I'm already seeing k2.6 is a lot better at things like this. I've been going through my random disposable dashboards k2.5 made for me and hitting regenerate with k2.6, it's a huge improvement!

Anonymous
04/23/26(Thu)10:34:15 No.108667541

Anonymous 04/23/26(Thu)10:34:15 No.108667541▶

>Hy3 preview is a 295B-parameter Mixture-of-Experts (MoE) model with 21B active parameters and 3.8B MTP layer parameters, developed by the Tencent Hy Team. Hy3 preview is the first model trained on our rebuilt infrastructure, and the strongest we've shipped so far. It improves significantly on complex reasoning, instruction following, context learning, coding, and agent tasks.

https://huggingface.co/tencent/Hy3-preview
https://github.com/Tencent-Hunyuan/Hy3-preview

Anonymous
04/23/26(Thu)10:34:33 No.108667543

Anonymous 04/23/26(Thu)10:34:33 No.108667543▶

File: icantdoit.png (109.1 KB)

109.1 KB PNG

>>108666769
kind of weird that the LLMs won't fucking follow the instruction (kimi, gemma, even tried haiku 4.5)

Anonymous
04/23/26(Thu)10:34:43 No.108667544

Anonymous 04/23/26(Thu)10:34:43 No.108667544▶

>>108667458
average llm user intellect in big '24

Anonymous
04/23/26(Thu)10:36:15 No.108667552

Anonymous 04/23/26(Thu)10:36:15 No.108667552▶

File: verbatim.png (30.5 KB)

30.5 KB PNG

>>108666769
copy works fine

Anonymous
04/23/26(Thu)10:39:58 No.108667574

Anonymous 04/23/26(Thu)10:39:58 No.108667574▶

>>108667541
>MTP
not on niggerganov's watch

Anonymous
04/23/26(Thu)10:49:04 No.108667607

Anonymous 04/23/26(Thu)10:49:04 No.108667607▶

File: 1770020874563379.png (26.3 KB)

26.3 KB PNG

>>108667541
lol imagine comparing with base models released LAST YEAR
Who's this model for?

Anonymous
04/23/26(Thu)10:50:55 No.108667620

Anonymous 04/23/26(Thu)10:50:55 No.108667620▶

>>108666456
How did you do that? how much did you have to steer or remind? I tried this a good while ago but i couldnt get memory right and it was slop. Writing chapter by chapter is fine.
You set out a story bible and arc outlines before going chapter by chapter?

Anonymous
04/23/26(Thu)10:51:10 No.108667621

Anonymous 04/23/26(Thu)10:51:10 No.108667621▶

>>108666895
I tested your prompt
`{"tool":"expression","args":{"name":"smug","duration":3500}}`
`{"tool":"pose","args":{"name":"lean_forward","duration":4000}}`
It leaned forward! I don't like it!

Anonymous
04/23/26(Thu)10:51:58 No.108667626

Anonymous 04/23/26(Thu)10:51:58 No.108667626▶

>>108667541
>not multimodal
I've been spoiled by gwen and gemma, I need to send my LLM my dick pic or its just not worth it

Anonymous
04/23/26(Thu)10:53:10 No.108667632

Anonymous 04/23/26(Thu)10:53:10 No.108667632▶

File: 1746895067191958.jpg (1.1 MB)

1.1 MB JPG

>>108667607
To be fair base model releases are pretty rare, I think those are the latest ones from GLM and Kimi? But DS has a V3.2 base they chose not to use.

They included the instruct benchmarks against newer models too though.

Anonymous
04/23/26(Thu)11:18:31 No.108667735

Anonymous 04/23/26(Thu)11:18:31 No.108667735▶

File: 367867254.png (108.6 KB)

108.6 KB PNG

>>108666400
It's pretty cozy
with a dark mode

Anonymous
04/23/26(Thu)11:41:44 No.108667831

Anonymous 04/23/26(Thu)11:41:44 No.108667831▶

What part of the template do I prefill on gemma?

Anonymous
04/23/26(Thu)11:49:02 No.108667856

Anonymous 04/23/26(Thu)11:49:02 No.108667856▶

>>108667852
>>108667852
>>108667852

Subject
Name
Comment
File	Supported: JPG, PNG, GIF, WebP, WebM, MP4, MP3 (max 4MB)
CAPTCHA

Reply to Thread #108663449

🔍 Search & Sort