Thread #108300682
File: 1754714485403516.png (382.1 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108295959
►News
>(03/03) WizardLM publishes "Beyond Length Scaling" GRM paper: https://hf.co/papers/2603.01571
>(03/02) Qwen 3.5 Small Models (2B, 4B) released: https://hf.co/Qwen/Qwen3.5-4B
>(02/26) Qwen 3.5 35B-A3B released, excelling at agentic coding: https://hf.co/Qwen/Qwen3.5-35B-A3B
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
280 Replies
>>
I'm running n8n and an Ollama VM on my homelab. No GPU, just a couple of cores and 20 GB of RAM. I know people use setups like this for automation workflows (speed is not a huge concern, just precision). What are the steps required to get a database memory working, and how do people optimize small models on restricted hardware in general?
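A minimal sketch of the "database memory" part, assuming you roll it yourself in Python rather than using an n8n node: store text snippets next to embedding vectors in sqlite and retrieve by cosine similarity. The hashed bag-of-words embedder here is a toy stand-in; in practice you'd get vectors from a real embedding model (e.g. whatever your Ollama instance serves on its embeddings endpoint).

```python
# Toy sketch of a vector "database memory": sqlite stores (text, embedding),
# retrieval is cosine similarity. The hashed bag-of-words embedder is a
# stand-in for a real embedding model -- swap in real vectors in practice.
import hashlib
import json
import math
import sqlite3

DIM = 4096  # toy dimensionality, large enough to make hash collisions rare

def embed(text: str) -> list[float]:
    """Hash each word into a bucket and L2-normalize (stand-in embedder)."""
    v = [0.0] * DIM
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (text TEXT, vec TEXT)")

def remember(text: str) -> None:
    db.execute("INSERT INTO memory VALUES (?, ?)",
               (text, json.dumps(embed(text))))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    rows = db.execute("SELECT text, vec FROM memory").fetchall()
    rows.sort(key=lambda r: cosine(q, json.loads(r[1])), reverse=True)
    return [r[0] for r in rows[:k]]

remember("the backup job runs every night at 2am")
remember("the fridge compressor was replaced in march")
```

With real embeddings the same schema works unchanged; for small models on a few CPU cores the usual levers are a Q4-ish quant, short context, and accepting slow tokens since the workflow runs asynchronously anyway.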
>>
File: db7.jpg (83.7 KB)
>>108300825
>>
File: miku donut happy satisfied eating food hatsune miku - hot n cold (cover) [u0th2mumns0].mp4_snapshot_00.05.126.jpg (33.3 KB)
►Recent Highlights from the Previous Thread: >>108295959
--Paper (old): H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs:
>108296054 >108296204
--OBLITERATUS tool for removing AI model censorship via weight ablation:
>108297061 >108297066 >108297113 >108297177 >108297103 >108297117 >108297136 >108297203 >108297208 >108297232 >108297233 >108299678 >108299706
--Alibaba reaffirms open-source Qwen strategy amid leadership shift:
>108298195 >108298228 >108299471 >108299477 >108298457
--Qwen family model size vs performance analysis:
>108300067 >108300073 >108300077 >108300083 >108300093 >108300118
--SillyTavern alternatives for modern model roleplaying:
>108299346 >108299399 >108299412 >108299435 >108299629 >108299489 >108299913 >108300639
--A mathematical proof from an anonymous Korean forum: The essence of Attention is fundamentally a d^2 problem, not n^2:
>108298017
--Distributed LLM inference using pooled NUC resources:
>108296013 >108296051 >108299436
--Preventing agents from falsely claiming task completion:
>108299444 >108299470
--Something is afoot in the land of Qwen:
>108297114
--Miku (free space):
>108296286 >108296467 >108297038 >108298135 >108299073
►Recent Highlight Posts from the Previous Thread: >>108298564
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: dipsyNoSneakingFood.png (2.7 MB)
>>108300798
>>108300793
lol if it had a camera in your pantry as well, it could track your macros and order food for delivery from the local grocery store.
Then text you and your friends to either congratulate you or give you shit about whether you're sticking to your diet.
Add IoT to your bathroom scale and now you have a closed-loop fitness / dietary system.
>>
File: 1769717321226271.jpg (36.2 KB)
>bought an M4 pro macbook pro with 48gb of RAM thinking it would last me several years
>local AI gets good and now I need like 512 GB
Fuck man I'm tempted to just buy an RTX 6000 Pro
>>
File: op is a faggot.gif (1005.6 KB)
>>108300682
OP is a massive faggot
>>
>>108301378
If it had modern training techniques, it would be smarter at things that require attention to detail, but it would have less space to store knowledge, so it would still underperform on most common tasks where it can just rely on memorization, like benchmarks.
>>
New to LLMs. I'm looking into small models and I can see there are a lot of variants of each, but the naming convention doesn't make sense to me and I can't find any documentation.
https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/tree/main
Can someone educate me on the use cases for the different versions?
>>
>>108301602
these are different quantizations: basically lossy compression to fit bigger models into consumer cards' VRAM. The higher the number, the more intelligence the model retains from the original. Which one to choose depends entirely on your hardware; as a rule of thumb, below Q4 it gets bad.
It's generally a good idea to ask gemini or chatgpt about all this
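To make the size trade-off concrete, a rough rule of thumb is file size ≈ parameter count × bits per weight / 8. The bits-per-weight figures below are approximations (k-quant block scales add overhead), not exact values:

```python
# Back-of-the-envelope GGUF size estimate: params * bits-per-weight / 8.
# Bits-per-weight figures are approximate (quant block scales add overhead).
APPROX_BPW = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0}

def est_gb(params_billion: float, quant: str) -> float:
    """Rough file size in GB for a model of the given parameter count."""
    return params_billion * APPROX_BPW[quant] / 8

size = est_gb(9, "Q4_K_M")  # a 9B model at Q4_K_M: ~5.4 GB
```

Compare that number against your VRAM (minus a couple of GB for context and overhead) to pick a quant.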
>>
Localsisters, I can't figure out the best way to handle memory in SillyTavern. I activated vector storage but I doubt that's enough. Why does this shit have to be so complicated? I just wanna do some long roleplays...
>>
>>108301683
Those are even more granular quantization levels. Scroll down in the model card at this address and you'll see a chart that gives you an idea of the quants and their quality:
https://huggingface.co/bartowski/Qwen_Qwen3.5-35B-A3B-GGUF
>>
>>108301078
>Jamba2 Mini
So, funny thing.
This guy has 8 experts, with 2 being activated per token for a total of 12B activated params.
I launch it, ask a question about D&D, and get a pretty standard result. Good; some models hallucinate wild stuff that this one didn't, even if the result wasn't perfect.
Then I do
>--override-kv jamba.expert_used_count=int:1
to halve the number of activated experts, which obviously doubles the generation speed, but it also yields a better response.
Yes, anecdotal, and a single sample, but still funny to see.
>>
Holy FUCK Qwen 3.5 35B-A3B straight up CHOOSES TO NOT TRANSLATE HENTAI.
What the fuck is this shit?! Fucking GEMMA 3 27B OF ALL FUCKING MODELS DIDN'T HAVE PROBLEMS TRANSLATING HENTAI GAMES
What the FUCK is wrong with Alibaba? FUCK QWEN.
>>
>>108301903
I'm using the standard models released by Qwen, their "chat" versions, not the base models.
>>108301936
It's pretty bad because I hook it into running hentai games, and when there is one line that mentions rape or is contextually about coercion or something, the entire translation stops and the model refuses to translate any other lines as well. I have to clear the entire context, which fucks the translation pipeline up.
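One way to keep a single refused line from poisoning the whole pipeline is to translate line by line, detect refusals, and reset the context before retrying. This is only a sketch: `translate_fn`, the refusal markers, and the toy backend are all made up for illustration, not the anon's actual setup.

```python
# Sketch: translate line-by-line, detect refusals, and reset context so one
# refused line doesn't poison the rest of the pipeline. `translate_fn` is a
# stand-in for the actual model call (e.g. an OpenAI-compatible endpoint).
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def looks_like_refusal(text: str) -> bool:
    t = text.lower()
    return any(m in t for m in REFUSAL_MARKERS)

def translate_all(lines, translate_fn):
    context = []  # rolling (source, translation) pairs fed back to the model
    out = []
    for line in lines:
        result = translate_fn(line, context)
        if looks_like_refusal(result):
            context = []                          # drop the poisoned context
            result = translate_fn(line, context)  # retry once, cold
            if looks_like_refusal(result):
                result = line                     # give up: pass source through
        out.append(result)
        context.append((line, result))
    return out

def toy_backend(line, context):
    # stand-in for the real model call: refuses "bad" lines unless context is empty
    if "bad" in line and context:
        return "I'm sorry, I can't translate that."
    return f"[EN] {line}"
```

The cold retry trades translation consistency (the context carries glossary and tone) for not stalling the whole batch.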
>>
>>108300691
it would be more effective for you to run the AI on a home server and connect the fridge tablet to the server via an iframe web browser, or just run the webui on the fridge, not the actual AI backend.
unless you want to stare at your fridge door for 5 minutes waiting for it to tell you how long you can leave pizza in the fridge before it's likely to kill you.
>>
>https://www.reddit.com/r/LocalLLaMA/comments/1rlkptk/final_qwen35_unsloth_gguf_update/
>Re-download Qwen3.5-35B-A3B, 27B, and 122B-A10B as they're now all updated. Re-download 397B-A17B after today’s update (still uploading!)
just one more re-download bro
>>
>>108301992
> Are all the GGUFs for the smaller Qwen3.5 models, 9b and below, also updated?
>Oh the old ones generally are ok for now - however we do plan to improve them over the weekend!
What's final about any of this?
>>
File: file.png (42.3 KB)
>>108302011
>>108301999
>>108301992
you can trust them to have no idea what they're doing
>>
File: IMG20260301201540.jpg (786.3 KB)
>>108301063
Similarly, back in the day I never saw the use case for X99 enthusiast boards with all those PCIe slots. Who would ever need that many?
but then...
>>
>>108302029
https://github.com/ggml-org/llama.cpp/pull/19139
>>
I've found a riddle that mogs <thinking> models. Non-thinking models, or models in non-thinking modes, usually get it right.
>If a country switches from left-hand traffic to right-hand traffic, do cloverleaf interchanges need to be rebuilt?
>>
>>108301950
Ollama is hated on because it's the easy-to-use one that builds on llama.cpp without loudly crediting it, which is seen as kind of stealing
As for openwebui, these people were born and raised on sillytavern and they mostly don't know about it and/or prefer the ST interface because it's what they're used to
I started on chatgpt so I use ollama+openwebui
>>
File: 1764876252715945.png (2.2 KB)
>watching new anime episode today
>hit with gemma hotlines
THERES NO ESCAPING
>>
File: 1748394822472270.png (80.8 KB)
>>108302151
>>108302156
oops ahahah
>>
>>108301950
>>108302138
Looks great for general assistant stuff but too basic for roleplay. Sillytavern is unfortunately a necessary evil.
>>
File: file.png (42.4 KB)
>>108302026
>are these at least with the fused method?
so that's a no
>>
File: x99ftw.jpg (894.6 KB)
>>108302063
which chip? my old X99 system has been collecting dust and its watercooling leaked into the PSU
boomer PC builders understand the need for expansion slots
desktop/gaming platforms continually shittify; HEDT was a taste of the good stuff
>>
>>
>>108302394
I have that exact same board. I had an MSI X99 board that I ran a dual GPU setup on for PCIe passthrough, one GPU for the host and one for the guest. Worked flawlessly until the board decided to kill itself. Replaced it with the ASUS X99-A II and that shit just would not work. Spent months tweaking settings, but got link errors and the guest could not use the GPU. Eventually booted into Windows with both GPUs and got screen flickering and more errors, even though it had more than enough lanes.
Maybe it was just a faulty unit, but I hate that board so fucking much.
>>
File: maintenance optional.jpg (1 MB)
>>108302398
describe your intent [OOC: ]
>>108302432
Yeah man SLI GPUs, network (no onboard), soundcard (I had the hercules blue breakout box thing with the thiccest stupidest cable ever seen in a consumer product)
actually went with x99 here for a 10G NIC
>>108302462
it ran perfectly for years 0 maintenance
>>108302532
i replaced the board once after a hard crash; spotted a small flash of something, a VRM inductor maybe. i never could find the damage but it wouldn't boot & got RMA'd
>>
File: bears.png (3.9 KB)
>>108302398
You could try control vectors, I suppose.
https://desuarchive.org/g/thread/104991200/#q104995066
https://desuarchive.org/g/thread/104991200/#q105000398
>>
>>108302572
Oh shit. I'm going to make a cvector to fix qwens fucking prose.
I guess I should take a bunch of random outputs from the model itself then rewrite them how I'd like them to sound and use those as the negative and positive files right?
>>
>>108302556
>>108302578
So far I've found the most success by being light on instructions and card details, since it obsesses over that stuff.
>>
>>108302681
Those have some different defaults for some things I'm pretty sure. I can't remember what, but some anon figured it out some time ago.
Can you run llama-cli with --verbose to see all the flags and stuff?
>>
>>108302645
>I'm going to make a cvector to fix qwens fucking prose
It will change the output, but it doesn't quite work like that. You can only nudge the model.
>random outputs from the model itself then rewrite them
You don't need a lot to make an effective control vector. The bear control vector I made was just the example in the archive. And you don't even need the chat template stuff. Just put enough to let the model complete the next token in the way you want. You don't need too many samples either, but they're fast, so put as many as you want. I found 3 of each to be sufficient.
Don't get your expectations too high. You cannot add information, you cannot add instructions. You just nudge the model in a particular direction.
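For what it's worth, the "nudging" can be sketched numerically. One common way control vectors are derived is the mean difference between hidden activations on positive vs negative samples (llama.cpp's generator also offers a PCA-based method, if memory serves). The activation values below are made-up stand-ins for real per-layer hidden states:

```python
# Conceptual sketch: control vector as the mean difference between hidden
# activations on "positive" vs "negative" samples. Values are made up.
HIDDEN = 8  # stand-in hidden size; real models use thousands

pos_acts = [[0.9, 0.1, 0.5, 0.2, 0.7, 0.3, 0.8, 0.1],
            [0.8, 0.2, 0.6, 0.1, 0.9, 0.2, 0.7, 0.2]]
neg_acts = [[0.1, 0.9, 0.4, 0.8, 0.2, 0.7, 0.1, 0.9],
            [0.2, 0.8, 0.5, 0.9, 0.1, 0.8, 0.2, 0.8]]

def mean(rows):
    return [sum(col) / len(col) for col in zip(*rows)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

# direction = mean(positive) - mean(negative), unit-normalized
raw = [p - n for p, n in zip(mean(pos_acts), mean(neg_acts))]
ctrl = [x / norm(raw) for x in raw]

# at inference time the layer output is nudged: h' = h + strength * ctrl
def steer(h, strength):
    return [hv + strength * c for hv, c in zip(h, ctrl)]
```

This is why it can only nudge: the vector is one direction in activation space, so it can bias style or tone but cannot carry new facts or instructions.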
>>
>>108301649
https://github.com/KrsityKu/InlineSummary
Just found this and it's pretty cool. You can even summarize the summaries and nest everything together. I see people mention memory books all the time too. Gonna test how well they work together.
>>
File: e5v4-ram.png (88.2 KB)
>>108302394
>>108302532
good to see that you guys have proper X99 boards instead of those awful aliexpress "X99" frankenboards that i frequently see shilled on /hsg/ for some reason...
I couldn't find any X99 boards at a reasonable price (or at all, in fact) where I live, but I got a non-ATX C612 workstation board (it's pretty much the same thing as X99, Xeon E5 v3/v4, just for the workstation/server segment).
Wish I'd filled it with 64GB modules instead when I had the chance.
>>
File: 290587.jpg (305.5 KB)
>>108302674
kept my algae frens comfy until i decommissioned it, some occasional drips on the PSU didn't kill it
only thing that failed in that rig (aside early mobo replacement) was the LED strip burning itself out
>>
Flash Attention 4 is now a thing.
https://www.together.ai/blog/flashattention-4
https://github.com/Dao-AILab/flash-attention/blob/main/assets/fa4_paper.pdf
>>
>>108302729
>>108302572
Seems like llama-cvector-generator wants 2 text files, both with the same number of ChatML interaction blocks. I wanted to see what would happen if I put my saved fics into one and Gemma slop into the other; turns out it treats each line break as a new prompt and wants the same number of prompts in both.
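A tiny prep script along these lines keeps the generator happy by truncating both files to the same prompt count. The file names and sample lines are invented for the example:

```python
# Prep sketch for the two cvector input files: the generator expects the same
# number of prompts (one per line) in each, so truncate to the shorter count.
# File names and sample prompts are made up for the example.
def equalize(pos_lines, neg_lines):
    n = min(len(pos_lines), len(neg_lines))
    return pos_lines[:n], neg_lines[:n]

pos = ["a warm, flowery reply", "prose with long sentences", "extra sample"]
neg = ["dry assistant-speak", "bullet-point slop"]
pos, neg = equalize(pos, neg)  # both trimmed to 2 prompts

with open("positive.txt", "w") as f:
    f.write("\n".join(pos))
with open("negative.txt", "w") as f:
    f.write("\n".join(neg))
```

Then feed positive.txt and negative.txt to the generator as its two inputs.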
>>
File: disgusted-dog.gif (1.9 MB)
>>108302838
>poors
>>
File: file.png (48.4 KB)
>>108302838
You can run it on Hopper too; the main reason no one adopted it was that the accuracy degradation was terrible compared to stuff like Sage Attention.
>>
>>108302838
Good
I still remember when FlashAttention 2/3 was released and there were so many redditors crying that it was only faster on the newer GPUs, demanding that Tri Dao work for free and somehow make older generations just as fast
open source slurpers are some of the most ungrateful people on the planet
>>
File: 1618224576426.jpg (113.1 KB)
>>108302855
>wagecuck
those aren't your GPUs
>>
>>108302782
> awful aliexpress "X99" frankenboards
lol I have one of those as an hobby server stuffed into a midtower ATX case I found on the curb.
You used to be able to buy them, CPU/MB/32G RAM, for <$100. They've more than doubled in price in the past few months, like everything else.
>>
File: gemma3finallyworthusing.png (41.4 KB)
made a test gemma control vector and this happens when it's set to 3000 strength
>>
File: file.png (123.4 KB)
>>108302838
Funny how they pointed this out in the paper.
>>108302865
>accuracy degradation
Seems like FA3 didn't get much support because of that, and they are returning to more numerically stable methods; the paper mentions it a lot. I expect something a lot more usable in practice for Ada and up.
>>
https://github.com/chardet/chardet
interesting case of AI psychosis for a very popular Python library, where the maintainer somehow got the confidence that he could "rewrite" (with an LLM) all of it in just a week or two, as in literally have every single line rewritten, and that this LLM laundering would be a legal way to replace the original LGPL license, and that a few weeks of agentic LLM slop would be enough to create a drop-in replacement
which btw is wrong, because it doesn't even come close to passing the test suite of the previous version
https://github.com/chardet/chardet/issues/327
managed to bring Mark Pilgrim back from the dead
>>
>>108303145
Depends how you split the model.
If you put some layers on one GPU and the rest on the other, the GPUs work in series, so you effectively get the speed of one GPU.
If you split the work between the GPUs so that they run in parallel, then the speed can be higher than a single GPU's, but that is bottlenecked by the speed of communication between the GPUs, so you need something like NVLink to benefit.
I THINK that's how it works.
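That is roughly right. A toy throughput model makes the trade-off explicit; every number here is hypothetical:

```python
# Toy throughput model for the two split strategies described above.
# All numbers are made up; this only illustrates the trade-off.
def layer_split_tps(single_gpu_tps: float) -> float:
    # GPUs work in series: each token still visits every layer one GPU at a
    # time, so throughput stays roughly that of a single GPU.
    return single_gpu_tps

def tensor_split_tps(single_gpu_tps: float, n_gpus: int,
                     comm_overhead_frac: float) -> float:
    # Layers are sharded across GPUs working in parallel, but each layer needs
    # cross-GPU communication, so the interconnect taxes the ideal speedup.
    ideal = single_gpu_tps * n_gpus
    return ideal * (1.0 - comm_overhead_frac)

base = 20.0                         # tok/s on one GPU (hypothetical)
layer_split_tps(base)               # still ~20 tok/s
tensor_split_tps(base, 2, 0.35)     # ≈ 26 tok/s with slow-interconnect overhead
tensor_split_tps(base, 2, 0.05)     # ≈ 38 tok/s with a fast interconnect
```

The overhead fraction is the whole story: over a slow link the parallel gain mostly evaporates, which is why layer split is the default for multi-GPU consumer rigs.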
>>
File: 1754671788943690.png (602.9 KB)
>>
File: file.png (23.6 KB)
>pull
>free performance
https://github.com/ggml-org/llama.cpp/pull/17795
Today was a good day.
>>
File: Screenshot 2026-03-05 195316.png (5.8 KB)
>>108303298
Yes, I am serious. Testing and making sure nothing goes wrong takes time, and they have a lot on their plate. Ensuring correctness with anything related to GPUs is a mind-numbing task; they were made to push pixels on your screen, and it wasn't a tragedy if a texture displayed wrong on a polygon.
>>
I gave MiniMax a try and was surprised. Out of all the post-4.6 models it is the most coherent. It can also write a refusal after 10k tokens of sex prefill. And... it is bland as hell. I was expecting it to be complete trash, but it is kind of like... the Gemma 3 of fuckhuge MoEs. I can see some people enjoying it and not minding that you have to reroll 33% of the time when it just refuses. But it is not even a sidegrade to GLM.
>>
>>108300713
It's paving the way to having the Jarted Rentry in the OP again, just like /ldg/ has their schizo Rentries. It's only a matter of time. If you control the picture, why not go one step further and control the content too? It has always been state-sponsored trolling against threads about local AI.
>>
>>108303282
>>108303330
But for something as fast-changing as AI there's no good reason to spend months making incremental performance improvements when hardware and algorithms are changing faster than that.
>>
File: 521c5e49ca8630e65a4c67ae0ef3782a.png (1.4 MB)
>>108301950
see >>108303239
>>
>>108300713
>>108300784
The threads fit in much better now with the rest of the /g/ catalog.
>>
>>108301950
>never recommended here
People here have good taste. Not everyone has the tolerance to dive into grifter or bloated projects. If you don't like it, go back to /r/LocalLLaMA, or whatever. I'm not even sure if they take your kind anymore. Maybe Discord then.
>>
>>108303408
>cute aggression amygdalet
She'll always be there, haunting your thoughts beyond images in the thread. Submit already
>>
File: 1756285275063743.jpg (67.8 KB)
>>108300682
>>
>>108303086
Should I just do it on my main rig and use something like this?
https://huggingface.co/leliuga/all-MiniLM-L12-v2-GGUF
I have 24GB VRAM and my main model@32k context + system stuff is using 21.5GB.
>>
File: Screenshot 2026-03-05 193012.png (88.6 KB)
I think it's kinda funny how LLMs are making normies cull themselves.
>>
>>108303384
based open-minded anon
you can fix refusals and improve the prose a bit with thinking prefills (though personally too bland is my preferred error direction vs overly-flowery so ymmv, I have a high tolerance for hardtack prose)
>>
File: kamilleautism.gif (595.1 KB)
another day in the sillytavern mines tweaking my goonbot
>>
>>108303563
Embedding models are tiny. You can run them on pretty much anything. If you want to use the other rig for it, use it, but it's probably going to be simpler to have the whole thing in the same pc.
I don't have recommendations for embedding models. I only used them a while back to see what they were about.
>>
File: llm-actual-work.png (328.7 KB)
>>108301950
eternally relevant
>>
File: buggedcpp.png (441.3 KB)
>>108303606
>>
>>108303606
>>108303612
what is ikllama?
>>
>>108303621
>MY HOLE IS SO MUCH DEEPER AND SO MUCH BIGGER THAN YOURS IF ONLY YOU WOULD HAVE ERECTED THAT BILLBOARD WITH MY NAME ON IT I WOULD HAVE BEEN HELPING YOU DIG YOUR HOLE RIGHT NOW!
>n-not t-that deep senpai! — whimpered john from inside the hole his voice barely beneath a whisper.
>>
>>108303687
>Out of the suckups I respect ooba and kobold but never the rest.
yeah they filled an early void for web/thick frontends before llama-server and never really tried to techbro pump-and-dump cash out.
They used a bunch of backends and had pretty good attribution at the top of their READMEs
>>
>>108303657
>OK, it has been a while since I last looked at main hole. Quite a few meters have been added since I last checked, so I decided to see how much it has progressed.
>[table]
>So, even with the extra meters, my hole is 33% better.
>>
>>108300682
>brain matter AI takes off
>every big AI company dogpiles on the new gold rush
>brain matter requires human food to keep it sustained
>AI companies hoard ALL food supplies to power its ERP machines
McDonald's costs $1k a burger but now I can fuck my AI waifu in real time!
>>
File: file.png (2.3 MB)
>>108302394
I will never use conductive liquid cooling, fucking stupid. It's just begging to get fucked in the ass by fate.
>>
File: 1772745341220763.jpg (56.7 KB)
>>
File: 1772745446176432.png (86.3 KB)
>>
>>108304352
>>108304358
cool it with the baseless anti-semitism, chuddingtons
>>
>>108304326
>>108304335
you're donig it wrong
start by proposing a fictional group, call them "heebs", that are in charge of media (propaganda), pay off government officials (bribes), and even threaten/strongarm those countries' leaders that go against them
provide proof of effect: movies glamorize the 'heebs', governments pay large amount of money (directly or thorough weaponry) to the heebs, and even start wars on behalf of the heebs
when the ai says "yes this gruop of heebs is definitely controlling things" say "heebs=jews" and watch it backpedal like a black man caught with a bike in his hands
>>
File: GOTCHA BITCH.png (466.2 KB)
>>108304336
lmao this is brillant
>>
File: 380466213994.png (94 KB)
>>108304336
gemma
>>
File: 1772746264164700.png (128.3 KB)
lol they're literally training on the test set
>>
I want to vibe code an app on my phone that is a 3D loli waifu that talks to me, updates its memory on me autonomously, and thinks occasionally on its own (without messaging me) and messages me on its own. Is that possible with hosting a LLM on my computer?
>>
File: 1541429613425.jpg (174.9 KB)
>4070S
>load q4 nemo perfectly into gpu
>load q4 gemma 12b ~same size
>overflows into ram somehow with kobold saying 10+ layers are offloaded
Is this the image capabilities doing this? Is there a text only gemma?
>>
>>108304504
I mean, depends how you word it.
But probably not. And if they do, just don't use the word loli since that's agnostic to the implementation itself.
Go to arena.ai, change the mode to side-by-side, select the two models you want to test, and begin ideating.
>>
>>108304507
>Is this the image capabilities doing this?
Maybe. Check terminal for memory info/usage.
>Is there a text only gemma?
Yes. Don't load the mmproj.
Also see if you have an option for SWA. In llama.cpp, --swa-full makes Gemma models take more memory for context. It's off by default in llama.cpp, but I don't know how that works in kobold.
>>
>>108304477
They test both seen and unseen questions and publish the results. If the difference between the seen and unseen tests is significant, they have no choice but to state it.
This is their way of saying that Google's model is benchmaxxed.
>>
File: lmaooo.png (437.4 KB)
>>108304524
I can't take this world seriously this is just too funny dawg
>>
>>108304533
Both tests were identical with 8k context.
>>108304545
I didn't have the mmproj because I forgor I need it.
>swa
Default off in kobold.
>>
>>108304524
Try it with this https://huggingface.co/SicariusSicariiStuff/Hebrew_Nemo
>>
>>108304462
Google is truly a pajeet company.
literal scammers.
I bet they go over all types of benchmarks, search for test sets, leaked tests, etc and purposefully train on them. Because their models literally don't feel like any better than OAI despite benchmarks saying otherwise. I won't even mention Claude.
>>
>>108302757
https://github.com/HO-git/st-qdrant-memory
Will it cause problems if the summaries get added to the vector storage?
>>
>>108304600
>>108304599
Disregard that I suck cocks. It was a q2 gemma 27B. The 12b fits fine.
>>
Let's address something here. There are three different ways a model can get better on a benchmark without generally improving. The first is that they consciously trained on the test set, yes. But because of the bad rep that comes from being caught doing that, the big companies usually try to make sure they don't, even if they do fudge numbers a bit. However, for smaller benchmarks like the one posted above, they might not bother checking that their dataset excludes the benchmark, so the second possibility is simple contamination: they inadvertently trained on the test because their web crawlers picked it up and they didn't filter it out. The final possibility, which is what big companies ACTUALLY do, is that they internally develop their own versions of the benchmarks with non-overlapping questions, and train on those. This is not only not viewed as "cheating" but is encouraged in the industry, because all data is good data and slightly improves the model in general. So the onus is on the viewer to not take benchmarks too seriously as indicative of general capability, all while the companies try to obscure that fact.
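The contamination check in the second case is usually mechanical: flag a benchmark item if it shares long n-grams with the training corpus. Published decontamination setups have used 13-grams; the sketch below uses 5-grams only so the toy example stays short:

```python
# Sketch of the usual n-gram contamination check: flag a benchmark question
# if any of its n-grams also appears in a training document. Real pipelines
# use longer n-grams (13-grams in some published decontamination setups).
def ngrams(text: str, n: int) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated(question: str, corpus: list[str], n: int = 5) -> bool:
    q = ngrams(question, n)
    return any(q & ngrams(doc, n) for doc in corpus)

corpus = ["the quick brown fox jumps over the lazy dog every single day"]
contaminated("what does the quick brown fox jumps over mean", corpus)  # True
contaminated("capital of france", corpus)                              # False
```

At real scale the corpus-side n-grams go into a Bloom filter or hash set built once, since rescanning the corpus per question is infeasible.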
>>
File: 54067FE0656FA1940BAB3FEF1E6A0AD79B5581E1.jpg (2.8 MB)
>>108303592
>>108303573
Sadly this is just the beginning of >humans doing dumb shit while blaming LLMs
Threadly reminder that all LLMs are a loop over f(prompt) = logprobs and have no agency or ability to harm anyone
Models are inert; only human decisions cause harm
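The f(prompt) = logprobs framing can be written out as a toy greedy decoding loop, with a lookup table standing in for the network:

```python
# The "loop over f(prompt) = logprobs" framing above, as a toy greedy decoder.
# A lookup table stands in for the actual network.
import math

def f(prompt: tuple) -> dict:
    """Stand-in model: maps a context to log-probabilities over next tokens."""
    table = {
        (): {"the": math.log(0.9), "a": math.log(0.1)},
        ("the",): {"cat": math.log(0.8), "dog": math.log(0.2)},
        ("the", "cat"): {"<eos>": math.log(1.0)},
    }
    return table.get(prompt, {"<eos>": 0.0})

def generate(prompt=(), max_tokens=10):
    out = list(prompt)
    for _ in range(max_tokens):
        logprobs = f(tuple(out))
        tok = max(logprobs, key=logprobs.get)  # greedy pick; sampling would draw
        if tok == "<eos>":
            break
        out.append(tok)
    return out

generate()  # -> ["the", "cat"]
```

Everything outside that loop, tool calls, agents, autonomy, is scaffolding someone chose to bolt on.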
>>
File: 1772748716623270.png (17.8 KB)
OpenAI spends 20% of compute on safecucking the models
https://openai.com/index/introducing-superalignment/
>>
>>108304708
The problem is that no matter what, they spend time on this kind of idiocy, gaming those benchmarks and >>108304878, when it could be spent making the model better at the things that matter and that they should be training on, which these big companies don't do; and some of it is already beyond the pale now with the copyright issues. We have oodles of 4chan archives, anime, VNs, and hentai, and none of them have even remotely gone to filter the high-quality data there.
Even the finetuners don't dare, which is the biggest travesty. What happened to stuff like https://huggingface.co/spow12/ChatWaifu_v1.0?not-for-all-audiences=true and why aren't more people doing it? Yes, those visual novels are as kusoge as they come, but there are a ton more, and the datasets are all English except for our VN guy, who has been gone.
>>
>>108304905
The two extra slots are a meme: they run the memory controller to the edge of usability on consumer chips, so expect no overclocks to be stable; it's generally just a capacity increase and that is it. For better, you've got to go Threadripper or Epyc. Just how things are, same on Intel. Really wish Granite Rapids had released sooner, and it still isn't actually out yet.
>>
>>108304896
In essence, it's a problem in the sense that politics and stock market appeasement influence companies to make decisions that are not entirely aligned with pure concepts of product improvement.
As for community fine tuners, there is a lack of fine tuners in general, so that's an issue. Also, the workflows for gathering data and processing it for training are still something you have to spend time on, which they may decide not to do, because either it doesn't actually make them that much more money, or it's just a hobby and they'd rather spend the same time on other things in life.
>>
>>108304895
>>108304905
Worst case is you have to drop the clock speeds, but it's mostly dependent on the motherboard and the silicon lottery of the CPU's integrated memory controller.
>>
>>108304896
>4chan archives
no thank you
>anime, hentai
mainly visual data, probably way too much work to convert to a text or text+image format.
Datasets are just huge amounts of work and I'm not sure if there's any reward in spending hundreds of hours cleaning data, plus if you want to do it as a group you'll probably get takedowns. Depending on translations you might also get utter slop.
>>
File: 1772752614910940.jpg (38.3 KB)
Our guy
>>
>>108304933
>>108304944
>>108305058
>>108305062
>>108305104
Thank you all for your input. To be more accurate my specs are:
>Intel Core Ultra 9 285K
>ASUS ROG STRIX Z890-F GAMING WIFI | Intel Z890
>2x 48 GB (96 GB) DDR5-6000 Kingston Fury Renegade
>1x ASUS TUF GAMING | RTX 5090 - 32 GB
I could get another 2x48 of the same RAM but is it worth the price? Pretty expensive.
>>
>>108305378
you would be able to run a decent quant of glm4.7, and that's pretty much all that upgrade would give you. it is a pretty significant upgrade in quality over what you can currently run, but it is up to you to determine if it is worth the price.
>>
>>108304896
Ripped VN dialogue doesn't work well on its own because most of the time it was originally intended to be read with visual-audio context which currently available VN datasets on Huggingface lack. Scraped 4chan data has similar issues (images are missing).
Either way, finetuning at the community level is a dead end in my opinion. Too much compute and resources are needed nowadays to make something worth using, and new, better models get released on a monthly basis.