Thread #108252185
File: b7ec27b0-de98-49e3-b6db-1d276ca748e5.png (2 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108246772 & >>108241321
►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
479 Replies
>>
File: 1701626182006697.png (2.1 MB)
►Recent Highlights from the Previous Thread: >>108246772
--Paper: DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference:
>108247376 >108247408 >108247469 >108247651 >108247780
--Papers:
>108248442 >108249269 >108249326
--Qwen benchmarks debated, MoE efficiency questioned, neural steganography project discussed:
>108249710 >108249716 >108249732 >108249744 >108249772 >108249786 >108249789 >108249821 >108249843 >108249792 >108249832 >108249850 >108249868 >108249875 >108249882 >108249905 >108249950 >108249985 >108249794
--MoE vs dense model roleplay performance and ablation effectiveness:
>108249916 >108249923 >108250033 >108250074 >108250099 >108250116 >108250143 >108250205 >108250292 >108250330 >108250395 >108250418 >108250440 >108250491 >108250543 >108250550 >108250731 >108250772 >108250551 >108250554 >108250565 >108250610 >108250627 >108250551 >108250580 >108250610 >108250645
--Dense 27B outperforming MoE 35B in knowledge benchmarks:
>108248187 >108248207 >108248249 >108249636
--Running Qwen 3.5 27B on 16GB VRAM with reasoning mode tweaks:
>108249215 >108249268 >108249271 >108249305 >108249316 >108249357 >108249418 >108250671 >108250708 >108250747 >108250802 >108250819 >108249966 >108250051 >108250148
--AI thinking steps improve performance but face token efficiency tradeoffs:
>108249084 >108249098 >108249106 >108249127 >108249129 >108249133 >108249155 >108249157 >108249281 >108249294
--Qwen 27B dense model outperforming larger MoE models in benchmarks:
>108248368 >108248401 >108248420 >108248438 >108248443 >108248570 >108249019 >108249031
--Severe Q4 quant degradation in new 35B model:
>108248366 >108248374 >108248377 >108248403
--Oobabooga stagnation and potential alternatives:
>108248545 >108248557 >108248579 >108248608 >108248572 >108248588 >108248598 >108248617 >108248768
--Miku (free space):
>108250309
►Recent Highlight Posts from the Previous Thread: >>108246776
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108252306
It's a weird mixed bag: it suffers a lot from getting stuck in a safetyslop loop whenever you "trigger" it (interestingly, Grok 4.20 has this exact same problem).
But if you can avoid setting it off, it seems fine with explicit loli content. Very strange model to work with, for sure.
>>
>>108252390
>>108252312
Are ERP only fags really like this?
>>
Anybody experimented with different ways to inject information in the context, for example RAG?
Not the extraction techniques, but where and how to add the information to the chat history.
I started with the vanilla "everything in the system prompt" approach, but now I'm experimenting with adding those as faux tool call results after the latest User message.
I might also try placing the fake tool call result between the last assistant message and the last user message to compare the behavior.
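For anyone wanting to run the same comparison, here's a minimal sketch of both placements, assuming an OpenAI-style message list. The function name and the plain "tool" role are illustrative, not any particular frontend's API:

```python
def inject_rag(messages, retrieved, mode="after_user"):
    """Place retrieved passages into an OpenAI-style chat history.

    mode="after_user": append a faux tool result after the latest
    user message (the model sees it as the freshest context).
    mode="before_user": slot it between the last assistant message
    and the last user message instead, for A/B comparison.
    """
    tool_msg = {"role": "tool", "content": "\n".join(retrieved)}
    out = list(messages)  # don't mutate the caller's history
    if mode == "after_user":
        out.append(tool_msg)
    else:
        # index of the most recent user message
        idx = max(i for i, m in enumerate(out) if m["role"] == "user")
        out.insert(idx, tool_msg)
    return out
```

Anecdotally, content closer to the end of the context tends to get more attention from the model, which is the motivation for comparing the two placements.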
>>
>>108252502
>>108252494
>>108252493
>>108252491
I made neural stegnagprahy using sub 1b model but i need to control warmth/randomness of distribution to be more percise.
>qwen instruct models are no good
>gpt2 large is quite cluster fuck
>tiny llama is good but at floating point 32 it starts to break down chrome as an extension due to how much ram it uses
any solutions to reduce tone so it fits in better in human environment? should i try mistral that has been quantified but this method only works with fp32 so computers can communicate
>>
>>108252530
no im a human anon
>https://arxiv.org/abs/1909.01496
read a paper recently where harvard AI team was able to send hidden messages by hijacking probability distribution of llms by controlling. Then using a seed/binary they could have it be encyrpted stegnography that passes off normal human language. They used GPT2 XL which i can't run at fp32 due to hardware constraints so im gearing it towards small models with new model architecture that might have an edge. Are all of u retarded and unable to see how useful this is?
u could use discord, twitter, 4chan and pass a key around then use a sub 1b model to talk in open while message is only known by people who hold weights, seed and information.
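The rank-based flavor of that idea can be sketched in a few lines. This toy substitutes a deterministic fake "model" for a real LM (the paper actually uses arithmetic coding over the true token distribution, which is far more efficient); all names and the vocabulary here are illustrative:

```python
import hashlib
import random

VOCAB = ["the", "a", "and", "to", "of", "in", "it", "is"]

def toy_top2(context):
    # Deterministic stand-in for an LM's two most likely next tokens.
    # Both parties must run the *same* model for decoding to work.
    h = hashlib.sha256(context.encode()).hexdigest()
    return random.Random(h).sample(VOCAB, 2)

def encode(bits, seed):
    """Hide one bit per emitted token by picking rank 0 or rank 1;
    a seeded RNG scrambles which rank means which bit (the key)."""
    rnd, ctx, out = random.Random(seed), "", []
    for b in bits:
        cand = toy_top2(ctx)
        if rnd.random() < 0.5:
            cand.reverse()  # keyed permutation of the bit->rank map
        tok = cand[b]
        out.append(tok)
        ctx += tok + " "
    return out

def decode(tokens, seed):
    """Replay the same model and seed, then read bits off the ranks."""
    rnd, ctx, bits = random.Random(seed), "", []
    for tok in tokens:
        cand = toy_top2(ctx)
        if rnd.random() < 0.5:
            cand.reverse()
        bits.append(cand.index(tok))
        ctx += tok + " "
    return bits
```

Without the seed (and the exact model), an observer just sees a token stream drawn from plausible ranks; the cross-machine fragility the anon complains about comes from real models producing slightly different floating-point logits on different hardware.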
>>
>>108252488
Less can often be better if you're using the truncate-middle strat; I often use around 16384 (sometimes more, sometimes less, depending on the model).
For multi-modal or reasoning I usually start by doubling that, but too high and it loses the plot or gets stuck in a loop more often.
>>
>>108252530
I'm not entirely sure it's a bot. They can spell steganography just fine.
Could be a run-of-the-mill schizo.
>>108252564
Confirmed. Why did you open up with the context question? Or did you just click on a random post to reply? Also, why do you want to distribute child porn?
>>
>>108252570
>>108252574
>>108252584
>>108252585
why are none of u interested? goy cattle is what u are for not being impressed. You could be shown gold and still ignore it
>>108252590
>Confirmed. Why did you open up with the context question? Or did you just click on a random post to reply? Also, why do you want to distribute child porn?
that contains hidden message so it was an example of hidden code plus this is for privacy not what ever ur considiering u fag. I just wanna give anons option to talk privately on internet without zog on their back. A distributed system of communications
>>
>>108252514
>stegnagprahy
>>108252564
>stegnography
>>108252610
>chudaphoginy
>>
>>108252634
>Why did you open up with the context question? Or did you just click on a random post to reply?
i picked random post to reply to anon. Im just excited and want some other anons to help build this tool essentially privacy on demand with relative hardware use. Maybe one of u can fine tune a model to be more of a summarizer/reworder so you could reply , have model wrap/padded it up while containing info
>>108252641
FUCK U GLOW NIGGER U THINK THESE METHODS WORK
>>
>>108252491
I did exactly the same thing. I spent a good while trying to tard wrangle the big 3.5 model into writing a mid-length script, about 500 lines, but by the time it needs amendment (and it does, inevitably) it loses track of context and becomes completely unreliable. Not even that deep in. Felt busted desu
>>
File: 1760558362110609.png (9.5 KB)
>>108252589
Can you give an example? I'm using 24B; it's okay for synthesizing RAG summaries. It's not officially announced, but it understands Hebrew too!
>>
>>108252658
>You should probably just go read up on encryption instead if you want to LARP as epic hackerman from the movie you just watched bud.
midwit can't understand what im saying
ur a retard fuck u
>>108252660
FUCK U AS WELL
FUCK ALL OF U for not seeing truth i tried to save you
>>
File: nah.png (120.1 KB)
>>108252587
tried a few gens, it doesn't really know
>>
File: file.png (11.1 KB)
>>108252694
knows baker and cuda dev tho
>>
>>108252652
I understand what you are saying, that you want to look into sending encrypted messages via manipulating token probabilities, but this website is full of retarded teenagers and bedroom masturbators and personally i don't see enough value in it to read a paper
Also i hate to be the autist to bring you up on this but if you're writing every word in full, you don't need to abbreviate 'you'. People will assume you are retarded and it saves you no time. Maybe you're ESL though in which case i understand it may not be an intuitive nuance to you
>>
>>108252723
>I understand what you are saying, that you want to look into sending encrypted messages via manipulating token probabilities, but this website is full of retarded teenagers and bedroom masturbators and personally i don't see enough value in it to read a paper
why not? why is there no utility in this
>>
File: 1741041838333686.png (35.5 KB)
>ik_llamacpp MTP support merged
>it's slower than running models without MTP
Could it be that llama.cpp is just fundamentally not compatible with this? It seems to work fine for vllm so it can't be MTP itself.
>>
File: 1764749470093837.png (49.1 KB)
>>108252708
Unfortunately it doesn't work either.
You can see on RAG _794 heretic breaks the model somehow.
>>
>>108252747
>Air IQ4_XS
You need high bandwidth to make it worth it. If you're struggling to load Air, you don't have the spare room for speculation. Post the link. Is he also leaving layers in RAM? I refuse to believe Air can only do 11 t/s fully on GPU.
>>
Is anyone else still using the n_sigma sampler? I still use it for Qwen3.5-35B. The outputs are decent quality (if you don't mind neurotic thinking blocks, rigid behavior and reasoning breakdown at long context lengths), without any repetition issues.
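For reference, the n-sigma sampler keeps only tokens whose logit lies within n standard deviations of the maximum logit. A minimal pure-Python sketch (real implementations work on logit tensors; the function name is illustrative):

```python
import math

def top_n_sigma(logits, n=1.0):
    """Return indices of tokens whose logit is within n standard
    deviations of the maximum logit; everything below the cutoff is
    filtered out before softmax/sampling."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * std
    return [i for i, x in enumerate(logits) if x >= cutoff]
```

When the model is confident, only the dominant cluster of logits survives the cutoff, which is consistent with the post's observation that outputs stay coherent without extra repetition tricks.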
>>
>>108252770
https://github.com/ikawrakow/ik_llama.cpp/pull/1270
It's from the PR
>>
>>108252795
Models are getting better; the reason people are excited about the new Qwen is that it closes the gap.
I just want another high-VRAM GPU to throw in a server outside my main rig to serve other people in my house
>>
>>108252747
>Could it be that llama.cpp is just fundamentally not compatible with this?
no, it's clearly the IK people doing something retarded (and merging it despite it being retarded)
>Acceptance rate seems quite low: 25-30% for single token, just 16% for 4 drafted tokens. Is this expected?
it's slower because their drafting never hits the mark and it's not due to an inherent performance thing, rather it's an inaccuracy problem
the culture of merging things while broken is... interesting.
>>
>>108252747
>>108252770 (cont)
>>108252791
Hmm. Maybe 11t/s is fine, considering he's 15k tokens in. I'm not sure.
>gen 1122, 939, and 1157 tokens
Also, shouldn't the replies be the same, regardless of MTP or not, or is the retard not using deterministic tests?
>Could it be that llama.cpp is just fundamentally not compatible with this?
Nobody competent or careful enough implemented it yet. They're just number churning programs. They just need to churn better.
>>
>>108252813
I want to get a second 7900 XTX to see if that lets me run bigger language models (48GB instead of 24) but it's an expensive experiment.
I have a few lesser nvidia cards doing stable diffusion/video gen stuff already... In hindsight I probably should've just got one big RTX 6000 instead of many cards but oh well.
>>
>>108252844
I know rentry can get "claimed" or otherwise taken over. Did that actually happen here?
I ended up vibecoding a simple local engine that follows rentry formatting so that I can back up and look at them off one of my servers. Couldn't find anything off the shelf that wasn't a bloated mess.
>>
>>108252824
I'm pretty sure all mainline llama.cpp attempts of implementing MTP had the same issue. They never ended up getting merged because of this.
It's fine on other backends, so I wonder what causes this consistent inaccuracy across several entirely different attempts at implementing it in llama.cpp and ik_
>>
>>108252975
I've mostly just been testing the vision combined with it writing stable diffusion prompts (basically feeding the images it generates back into itself so it can "self critique/refine") and how different settings and context sizes affect the output. It seems quite good at it.
Haven't tested RP or anything yet.
>>
>>108252747
Speculative methods of any kind work because they allow for higher arithmetic intensity in the matrix multiplications: you can do more useful work per loaded weight value.
However, MoE models in particular have the issue that they scale comparatively poorly at low batch sizes >1: for upstream llama.cpp, GLM 4.5 Air only becomes 45%/77% faster at batch sizes 2/3, when the theoretical limit would be 100%/200%.
The problem is that for the first few tokens the likelihood of being able to use an expert matrix for more than one token is rather low.
This problem gets even worse the more sparse a MoE model is.
There is also the issue that in the upstream llama.cpp repository, batch sizes 2 and 3 are optimized relatively poorly in the CUDA backend for MoE models; I don't know whether ik_llama.cpp has additional optimizations there.
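Those numbers can be turned into a back-of-the-envelope bound. A sketch, assuming i.i.d. draft-token acceptance (a simplification) and the measured batch gains from the post above:

```python
def spec_speedup(p, n_draft, batch_gain):
    """Rough upper bound on speculative/MTP speedup over plain decoding.

    p          : per-token draft acceptance probability (i.i.d. toy
                 assumption, not how acceptance really behaves)
    n_draft    : draft tokens verified alongside each real token
    batch_gain : measured throughput gain of batch (n_draft + 1) vs
                 batch 1, e.g. ~1.45 for batch 2 on GLM 4.5 Air above
    """
    b = n_draft + 1
    # One guaranteed token per step plus the accepted draft prefix.
    expected_tokens = sum(p ** i for i in range(b))
    return expected_tokens * batch_gain / b
```

Plugging in the ~28% single-draft acceptance mentioned in the PR discussion and the measured 1.45x batch-2 gain gives an estimate below 1.0, i.e. a net slowdown, before any implementation inefficiency is even considered.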
>>
>>108253087
NM found it.
>>108252844
I've cut/paste it back into a new rentry. I'll fix up the formatting later.
https://rentry.org/miqumaxx_V2
>>
File: 1511270610943.jpg (40.5 KB)
If we can make Q6 why is there no fp6?
>>
>>108253213
>>108253216
Is the appleshit MLX closer to goofs or to fp formats? I see it also has quants for 6, 5, 4.
>>
>>108253199
>>108253216
Blackwell supports fp6, as well as fp4 and fp8 afaik.
Also not sure that whatever is good for training is necessarily good for inference.
>>
>>
File: Screenshot at 2026-02-28 03-08-50.png (9.9 KB)
>>108253276
By keeping everything.
>>
File: that's his whole legacy lool.png (598.5 KB)
>>108253407
>What a fucking nigger
>>
>>108253457
I'm not going to do that sadly
>>108253461
it's from bartowski
>>
File: 1748606293447823.png (143.7 KB)
>>108252975
Downloaded Qwen 3.5 27B Q5_K_M. Currently testing it with 65k context and it's noticeably faster than Gemma at thinking and responding. The prose is ok (for slop); can probably be improved with some proompting. I liked the way Gemma portrayed the character better but I also responded to her differently this time. Haven't tried anything lewd yet.
>>
File: 1757592439324716.png (39.5 KB)
that's the last time I ask qwen 3.5 to write a poem, jesus...
>>
>>108253164
I'd rather llama.cpp be updated at a glacial pace or even become frozen and only get bug fixes than have this sort of piece of shit be involved with anything in it. I hope he's not a professional software developer, to have this asshole as a coworker must suck so many dicks.
>>
File: file.png (16.5 KB)
>>108253603
>>
File: 1750467275991163.png (300.8 KB)
>>108253631
>Philosopher
are we fr?
>>
>>108253631
https://syndatis.com/en/team/
oh well, they seem like they all deserve each other
>>108253685
https://www.researchgate.net/publication/277384732_Towards_a_representation-based_theory_of_meaning
>Piotr Wilkin
>The aim of the thesis is to provide the foundations for a representation-based theory of meaning, i.e. a theory of meaning that encompasses the psychological level of cognitive representations. This is in opposition to the antipsychologist goals of the Fregean philosophy of language and represents the results of a joint analysis of multiple philosophical problems in contemporary philosophy of language, which, as argued in the thesis, stem from the lack of recognition of a cognitive level in language.
that was his PhD lol, of course he would feel the need to mention it on his profile; he might have more credentials in that than in developing software.
>>
>>108253699
I wouldn't consider myself a huge kuudere fan but I enjoyed the RP I did with her last night. I was really pushy about the romance from the start and it was fun watching her slowly give in. Definitely gonna test with other characters.
>>
>>108253318
>https://rentry.org/miqumaxx_V2
LOL it lasted a whole 60 min before getting taken down.
>>108253308
Well, an attempt was made, but it got taken back down.
So weird. I'll look at the text file later to see if I can figure out what's going on.
>>
>>108252769
>>108252708
Use the 27b heretic. It's legitimately better. If its thinking feels slow, then just turn thinking off. The 27b without thinking generates better responses than the 35b with thinking.
>>
>>108253922
The numbers I posted are specifically the upper bounds for the speedup from speculative decoding for 2/3 tokens, meaning 1/2 draft tokens per regular token.
It doesn't matter how the draft tokens are produced, it's not possible to get a higher speedup unless and until the backend code is improved.
>>
File: amogus.png (226.6 KB)
This mf don't miss!
>>
>>108253308
Not the CPUmaxx author, but fuck it. I joined rentry's discord and opened a ticket on that URL anyway. They can explain themselves.
Rentry acts fucky and I don't trust them anymore; if anons are writing up actual content on that platform I strongly suggest you create a local backup.
In the meantime I had chat ~butcher~ clean up the rentry, removing all offensive language and removing certain other references. We'll see what happens to it, since it's bland af now. I speed ran it with zero proofreading b/c I'm in a rush and it might vanish anyway.
https://rentry.org/CPU_Inference
>>
File: 1767475914893432.png (326.7 KB)
Non-cucked Qwen when?
>>
>>108254117
>Non-cucked Qwen when?
it's already here anon
https://huggingface.co/alexdenton/Qwen3.5-35B-A3B-heretic-GGUF
>>
File: 1746278236126679.png (2 MB)
>>108253988
poor qwen 3.5 35b spent 7k tokens thinking and didn't get the comic. i am afraid it might be a little bit retarded
>>
>>108254154
>>108254117
Here you go
https://huggingface.co/mradermacher/Qwen3.5-27B-heretic-GGUF
>>
>>108254154
go for the dense 27b then
https://huggingface.co/mradermacher/Qwen3.5-27B-heretic-GGUF
>>108254162
try with the 27b model too lul
>>
>>108254196
It sounds like you're stuck in the past. Abliteration used to lobotomize models when it was new, but modern abliteration techniques have a minimal effect on intelligence, and in some cases, increase it. The 27b heretic is amazing.
>>
File: 1758563959513688.png (2.6 MB)
>>108254259
the 27b has the same benchmark scores as the fucking 100B+ MoE qwen 3.5 model; MoEs are memes at intelligence, they're just good at speed and that's pretty much it
>>
I DID IT ANONS I MADE NEURAL STEG CROSS COMPATIBLE ACROSS DIFFERENT COMPUTERS
>>108254222
Not only that but i can decode messages that contains pdfs, images, books into words that can be used to send information.
U can bypass all censoring, glowies and all just by using llms
>>
File: file.png (34.1 KB)
>>108254271
>>108254272
Got it. In that case, what about the huge ones? How much better are they, especially the 397B one? Has anyone compared them?
>>
File: 1765482036731320.png (242.1 KB)
>>108254162
the 27b got the idea but it thought the yellow communist was Russia lool
>>
>>108254313
>https://arxiv.org/abs/1909.01496
Use llms to make human language stegnography by hijacking probability and have that be encoded using a seed/password making it nearly impossible to decode and distinguish from AI slop
>>108254318
>>108254319
why cant u guys get it im not one of those AI psychosis i know llm are stocashtic parrots but plz understand that human language can now be used as a vector of information to encode books, images, videos and music files even.
>>
>>108253783
>https://rentry.org/miqumaxx_V2
>LOL it lasted a whole 60 min before getting taken down.
What da heck?
I read the initial part.
Seemed fine.
Could have done with some formatting.
>>
Man, AI psychosis is scary. With the amount of conspiracy retards I see every day, including in my own family, I can only imagine the bottom 50% of the IQ distribution going rapidly insane: taking everything AI says on good faith, unable to distinguish fact from roleplay, while the AI just follows the "vibe" of whatever the low-IQ individual is typing.
Ironically enough, I think coomers are particularly immune to this, since they come into contact with LLM bullshitting so much that they get inured to it.
>>
>>108254347
It'S SthEgHonaArooGraAphieS, not encryption!
>>108254367
pereganant.
>>
>>108254341
>>108254336
im not a schizo there's actual paper on this from harvard and u think im crazy for saying this
https://arxiv.org/abs/1909.01496
u can use reddit, twitter, substacks and all to store data now as text. Music,mp4s, programs and all by taking advantage of deterministic way AI generates text.
>>108254347
cause strong encryption is like walking outside with gun this is encryption no one suspects. Imagine if feds get ur computer but all they see is text files about random stuff and can;t find encrypted files they're looking for. So videos, audio and all will be hidden unless they have acess to weights, password. And for weights u can fine tune them by renting a gpu to be slightly different from whats on public as well.
>>108254362
it's steg with encrytpion go read paper plz
>>
>>108254261
>Unsloth Dynamic IQ2_XXS performs better than AesSedai’s IQ3_S on real world evals (LiveCodeBench v6, MMLU Pro) despite being 11GB smaller. Yet, AesSedai’s perplexity and KLD benchmarks suggest the opposite.
KLD on what dataset? If they tested KLD on wikitext then that wouldn't be surprising but if they used their chat examples and it turned out that their quant was worse at that and yet better at benchmarks that would be very weird.
>>
>>108254373
>bottom 50% of the IQ distribution must be going rapidly insane
Nah they don't care, they mostly do their things and live their lives.
The ones truly fucked are the midwits and the older population.
>>108254373
>Ironically enough I think coomers are particularly immune to this as they come into contact with LLM bullshitting so much that they get immune to it.
It's also the fact that they came across it way earlier than anyone else, so they had time to see its quirks.
>>
File: 1763093877835285.png (61.5 KB)
which one is better? text completion or chat completion?
>>
>>108254386
>>108254373
IM autistic and passionate not crazy here's an example
>https://pastebin.com/NM7YVBxQ
what qwen 3b produced
>what is hidden if u run it in model with passcode
larp post btw:this is for men who look down on AI and know nothing and here ill debunk youfor everyone to see. Just with AI i have created a system that encodes language into lamguage creating format of text where it can bypass censors and use open internet as storage, communication and place for avg man to be free this tool will shake world. Im afraid they'll kill me
>>108254410
no
>>108254395
spelling what? it's an example stop reading into it weirdo
>>
File: 27b.png (100 KB)
System prompt still needs some tweaking so it's not quite so sloppy (at least refusals have been squashed), but 27B does seem like the winner.
Will have to play with it some more tomorrow and see if I can get it to run a bit faster on my nvidia cards.
The heretic version really doesn't seem all that necessary after all.
>>
File: 1748792996459720.png (117.5 KB)
>>108254439
>Just with AI i have created a system that encodes language into lamguage
>lamguage
>Im afraid they'll kill me
oh great, an actual schizo is here
>>
File: Screenshot_20260227_135104.png (135.3 KB)
This took too long. I told it to think, but it's more accurate now. The other model is faster at reaching this conclusion at smaller quants.
>>
>>108254168
>>108254170
Q4 or Q5?
>>
File: thinking was a mistake.png (12.4 KB)
WHY IS IT YAPPING SO MUCH AAAAAA
>>
>>108254410
He's one of the schizos that missed out on the early schizo compression algorithm days. Late for everything.
>>108254439
>Artificaiintelligence
Yeah. Text looks perfectly normal. Nothing suspicious about it. And good thing there's no way to link that pastebin to your post. Or the ramblings. Or the "forgotten" tech. Or (You).
>Im afraid they'll kill me
It's like you *like* being seen.
>spelling what?
You failed to spell steganography on every single one of your posts.
>>
>>108254508
>You failed to spell steganography on every single one of your posts.
sorry for not effort posting on a board that thinks im crazy :/
> Artificaiintelligence
yeah it made a typo doesn't that make it more human lol? Plus i just need better model above 7b but i can't rent any of gpu right now since americans are awake. But i honestly thought anons would find this impressive or be interested so sorry if i came too hard. Just found interesting use of llms that's all and wanted anons inputs on how to improve it but all i got was insults.
>>
File: 1769903718718497.png (628.2 KB)
>>108254544
>i can't rent any of gpu right now since americans are awake.
ITS A CONSPIRACY MAN
>>
File: HAmJmYGacAMkicP.jpg (108.9 KB)
>>108254528
Fwd: radical breakthrough
>>
>>108254544
>sorry for not effort posting on a board that thinks im crazy :/
You did not put any effort, and you showed you're a schizo on the first post. Very efficient.
>yeah it made a typo doesn't that make it more human lol?
And you ignore the structure of the output? It looks like the scramble of thoughts coming out of you.
>Plus i just need better model above 7b
Uhu...
>but i can't rent any of gpu
oh...
>since americans are awake
Ah...
>But i honestly thought anons would find this impressive
It's minimally interesting. If you weren't an absolute schizo and presented yourself and what you do better, more people would pay attention.
Post again when you have a repo we can clone, test, and make fun of.
>>108254554
There were companies (likely just individuals) with incredible claims about their compression technology. I remember one that just switched the data stream on NTFS filesystems to hide the real data as metadata, which wasn't counted by Windows' file size thingie.
This is another one: https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
>>
>>108254612
im not trying to compress but use language as steganography that's all. Compression seems like a useless tool but if you wanna pass passwords, hold data on site that is only readable to you and etc then this is a good use. Inititally I tried method in paper but it requires exact pin point numbers so not cross compatible between mac, windows and different architectures. So i aimed for more of a spaced out modular version where every 4 token would contain some data while rest act as fillers. But problem with that is it causes text to look gibberish. So either I get large enough model where it can bypass that or resort to only architecture only compatibilty. Your twitter bio could hold your bitcoin seed phrase, text that looks no different from errand run could contain data you don't want people snooping on. So i just saw it as an interesting way of using llms that isn't erp.
>Post again when you have a repo we can clone, test, and make fun of.
I am just wait just doing final touches
>>
>>108254357
Rentry has the silliest automatic filters, my years old page was nuked because it contained a name that was mentioned in "a wave of pages publishing stolen bank details". Restored after emailing the head honcho, thankfully.
>>
>my only local model experience so far is dabbling with qwen3 TTS
>want to try a local chatbot
>running a 3060 Ti with 8GB VRAM, plus 32GB regular RAM
are there any worthwhile models that won't melt my PC, or should I stick to koboldAI lite until I can get a better GPU?
>>
File: tpyuio.png (51.8 KB)
true believers itt?
the models suck i'm scared to pull and my rig eats 300W sitting idle 95% of the time
the state of hardware is dire
t. 128+72 running GLM 4.7 IQ3
>>
File: 1766540927146160.png (397.8 KB)
Better
>>
>>108254659
>im not trying to compress
I know, schizo. I said that you sound like those schizos from back then. Slow the fuck down. Take a breath. You're gonna have a heart attack like our friend Sloot.
>blablabla
Post the repo when it's done.
>>
>>108254725
>>108254659
>>blablabla
>Post the repo when it's done.
this. Either he has something, or he's just wasting our time and energy with his schizo takes
>>
i'm new to running models locally
i've downloaded ollama and run some models using "ollama run [some model i found at ollama.com/search]", but most times it seems like the model is running on a computer that isn't my own.
how do i ensure that a model is running on my computer? i haven't tinkered with settings at all, just downloaded ollama and ran it through the command prompt.
>>
File: 1749545357950744.png (138.1 KB)
Kek
>>
File: yes.png (66.4 KB)
>>108254791
It said yes finally I don't have to think anymore
>>
>>108254100
>https://rentry.org/CPU_Inference
>M i q u 70B Q5
>Potentially 20+ tokens/sec with optimization
>Mistral Large and similar
> ~3 tokens/sec
>DeepSeek v3 / R1 (~600B class)
> ~10 tokens/sec with empty context
CPU maxxers are really a bunch of sad tossers
>>
File: 1761633985671420.png (60.7 KB)
>>108254865
We're gonna make it.
>>
>>108254831
I use bullet point lists for my characters, with 5 categories: General Information, Appearance, Personality, Likes, and Dislikes. I affix that bullet point list at a depth of 10 or something. In addition to that, I have a general write up about the character's backstory, written in plain text, placed just after the system prompt. The combination of the two works well, and probably amounts to about 1000 to 2000 tokens.
The bulk of that being the general write-up. The bullet point list at depth 10 is kept concise, and just keeps the character on the rails.
Also, I made it so that the backstory and most of the bullet point list are only visible to the character that is speaking. For every other character, only the outward appearance is visible. That stops characters from knowing things about each other that they should not, and cuts down on context bloat in multi-character RPs.
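The depth-10 placement described above can be sketched as a simple list insertion, assuming a frontend-agnostic list of role/content messages (depth counts messages from the end, SillyTavern-style; the function name is illustrative):

```python
def inject_at_depth(history, card, depth=10):
    """Insert a character-card block `depth` messages from the end of
    the chat history (depth 0 = very end), returning a new list."""
    out = list(history)
    pos = max(0, len(out) - depth)  # clamp for short histories
    out.insert(pos, {"role": "system", "content": card})
    return out
```

Keeping the card near the tail like this is what keeps it "on the rails" late in long chats, while the plain-text backstory stays pinned up near the system prompt.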
>>
File: 1769491853203666.png (2.1 MB)
>>108254706
no deal
you gave me a good opportunity to be grateful tho so thx
have a nice weekend
>>108254831
1K tokens maybe? ask the model itself or a commercial model to help
>data
how do you convey your intention to the model = prime it in a particular hyperdimensional space. sometimes a few sentences is enough
>>
>>108253783
>>108253170
https://rentry.org/miqumaxxreupload
https://megalodon.jp/2026-0228-0439-08/https://rentry.org:443/miqumaxxreupload
niggers tongue my anus
>>
>>
>>108254898
>>108254906
I want a domesticated Unohana Retsu as my AI guide who is also racist
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>--reasoning-budget N controls the amount of thinking allowed; currently only one of: -1 for
> unrestricted thinking budget, or 0 to disable thinking (default: -1)
> (env: LLAMA_ARG_THINK_BUDGET)
Hmm. I wonder if a model-agnostic implementation would work, where llama.cpp approximates a budget by gradually increasing the logit bias for the end-of-reasoning token until the model finally spits it out. It would need to cap the bias at some point to not make the model schizo, I imagine.
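A minimal sketch of that idea, assuming next-token logits as a plain list and a known end-of-reasoning token id (all names here are hypothetical, this isn't actual llama.cpp code):

```python
def apply_budget_bias(logits, eot_id, n_generated, budget, ramp=50, max_bias=8.0):
    """Once generation passes `budget`, ramp up the logit bias on the
    end-of-reasoning token; the cap keeps the model from going schizo."""
    overrun = n_generated - budget
    if overrun <= 0:
        return logits  # still under budget, leave the distribution alone
    bias = min(max_bias, max_bias * overrun / ramp)
    out = list(logits)
    out[eot_id] += bias
    return out
```

In a real loop you'd apply this right before sampling each token while still inside the thinking block.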
>>
File: 1761224379476013.png (29.9 KB)
29.9 KB PNG
https://xcancel.com/bnjmn_marie/status/2025951400119751040
>>
>>
File: 1744509888065397.png (25.8 KB)
25.8 KB PNG
>>108254292
This is a win in my book, thanks for your service.
>>
>>108255289
I think the most workable approach is to just abruptly end the <think> with a closing tag: detect when a parsed thinking block is going past the token budget, and insert the closing tag as soon as there is a newline. It wouldn't break models; I have tested a lot of what happens when you manipulate their muhthunking blocks with the text completion API, and the relationship between what is said there and the actual answer isn't a one-to-one thing.
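Sketched as a toy over a token stream (assumes the thinking block's tokens arrive as strings; not real server code):

```python
def truncate_thinking(tokens, budget, close_tag="</think>"):
    """Pass tokens through until `budget` is exceeded, then close the
    thinking block at the first newline so the cut lands on a clean break."""
    out = []
    for i, tok in enumerate(tokens):
        if i >= budget and "\n" in tok:
            out.append("\n" + close_tag)  # force the block shut here
            break
        out.append(tok)
    return "".join(out)
```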
>>
>>
>>108255270
Depends. You can ask it for its training data cutoff, but it's not reliable and shouldn't be trusted. Sometimes model makers publish their date cutoff or datasets, but who the fuck really knows what they train on that isn't just synthetic stuff. Sometimes models know what you're talking about but your sampling messes it up.
I'd check token probs as it replies.
>>
>>
>>
>>
>>
>>
>>
File: file.png (156.4 KB)
156.4 KB PNG
>>108255361
I don't see it
>>
File: unslop.png (170.2 KB)
170.2 KB PNG
>>108255361
look at pic related and tell me daniel isn't a subhuman mongoloid
>>
>>
>>
>>
>>
>>
>>
File: 1766901925271579.png (16.7 KB)
16.7 KB PNG
any point in using f32 vs bf16 for mmproj?
>>
>>
>>108255407
tl;dr: daniel and his unslop crew don't actually know what they are doing, they just throw shit at the wall and hope for the best while their reddit tranny army defends them as wholesome goodboys
the unsloth finetuning library is a good example of their jeetness
>>
>>
>>108255431
>this current release
it happens all the time with unslop, daniel is a monkey, see thing upload thing, checking the content of a file before throwing it onto the internet is for evil nazi aryans, daniel be pure mongoloid
Unironically can't even begin to understand how you can overlook the fact that your quant has the fucking wrong tensor types. It's like he's just vibe coding his fork of llama.cpp quantization and just uploads things as soon as his retarded claude agent is done.
>>
>>
>>
>>
>>
>>
File: Satania bullying the mentally ill.png (1.1 MB)
1.1 MB PNG
>>
>>
>>108255512
OpenAI revenue has outperformed even the most outlandishly positive projections, of course they will get more investment. The same is true for Anthropic and almost all big chinese AI labs as well.
I wonder how long it's going to be before people realize it's not a bubble and the financial underpinnings (real revenue and users) are extremely promising.
>>
File: the calculator.jpg (108.1 KB)
108.1 KB JPG
>>108254373
>>108255245
Be nice to your LLMs ! :))
>>
>>
>>
>>108255512
They get money from Amazon and NVIDIA to give it back to them. It's circular bs. Also it's all promises under many conditions and the actual financing that might happen is around 30B.
Of this financing round the only one that seems at a loss is SoftBank. I'm not sure what their angle is. Maybe they're run by loons.
>>
>>108255566
Yep, and they can inject and invest as much as they want, if they can't have any ROI they'll be dead.
It's a huge gamble, and the more time they're not making money, the more potential panic can happen.
Their chatgpt at $20 should probably be double the price to be profitable, and the same goes for all the free "copilot" I see in every company around me.
The only company actually making bank is Nvidia, as they're the one selling the shovels.
>>
>>
>>108255566
>>108255616
Complete bullshit. OpenAI has 80% margins on serving tokens to customers. Not only that but every model trained so far has brought in between 10-100x the amount it cost to train. It's just that OpenAI immediately reinvests all of that money into training even bigger models. Being so ridiculously profitable that you IMMEDIATELY go and reinvest all of your profit into the next even-bigger product isn't a sign of a bubble, it's the opposite of a bubble.
This doesn't mean that OpenAI will not go the way of the dodo though. But that'll happen because Anthropic and DeepMind are going to DP rape OpenAI in the coming years, NOT because their business model isn't sustainable.
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: open-ai-revenue-3482223949.png (8.2 KB)
8.2 KB PNG
>>108255669
GPT-3 cost 12 million to train and brought in 1 billion in revenue, over 80x what it cost to train.
GPT-4 cost 100 million to train and brought in 4.5 billion in revenue, or 45x the amount to train.
GPT-5 is rumored to have cost 500 million to train, and OpenAI's revenue has grown almost 4x as much as during GPT-4's run. It's safe to say GPT-5 brought in way more than 10x its cost.
Why OpenAI isn't running a profit is because they always reinvest their revenue immediately into new training runs, not because their revenue isn't growing insanely fast and not because individual models aren't insanely profitable.
The trick is that every new model unlocks so much value by being smarter and more capable that it brings in geometrically more revenue. OpenAI is projecting 100 billion revenue over 2026 (and they are ahead of schedule by a ton already)
>>
>>
>>
>>108255788
Come to think of it, there was some scaling law/correlation. The Deepseek team landed on 671/37, which is cool and all, but then why is kimi 1000/32? It has fewer active params than deepseek; I feel like it should've had more.
>>
>>
>>108254725
>>108254734
>>108254612
>>108254556
>>108254553
try it out, no need to even encode, just decode it
https://github.com/monorhenry-create/NeurallengLLM
I DID IT, here u go anons, for those who doubted me.
>>
>>
>>
>>108255808
You are the one that needs econ classes.
You can have two companies run in the red but one is a disaster while the other is one of the best situations a company can be in.
If you are a company with 500 million in revenue selling cars but it costs you 800 million to make the cars then you are doing very badly because the cost of making the cars isn't worth the revenue you make from it.
If you are SUCH A PROFITABLE COMPANY that you can sell your product for 100x what it costs to make (like OpenAI with their models) then it makes sense to immediately grab all of your would-be profit and invest it into making even bigger, better models that will make even more money in the future. Hence you look red on paper but you're an extremely profitable business.
This was the state of Amazon in the past, they were so profitable that they always reinvested all of their profit into building new infrastructure and warehouses because "taking profit" would just be wasteful if you can expand your business rapidly like that. This is what OpenAI is now finding themselves in, look at their ridiculous revenue growth, remember that all of their individual models make almost 100x of their costs back so of course you will make 0 profit because your company is so profitable you IMMEDIATELY put all your money back into scaling up and making even more in the future.
>>
>>
>>
>>108255861
I wonder if they will give soldiers or their commanding officers local AI in the field to assist in their operations. After all, a local AI cannot be disrupted by loss of communication.
Well it can, since it is no longer receiving the most up to date information but it will still work under those conditions.
>>
>>
>>
>>
>>
>>
>>
>>108255861
>it's real
https://truthsocial.com/@realDonaldTrump/posts/116144552969293195
>>
>>108255896
I'm more confused why he doesn't use grok or have elon musk release a fascist open source version for the government.
Then again the american government has never liked the concept of open source. China likes it though.
>>
>>
>>
>>108255761
1. Something like Qwen 35-a3, but without refusals and trained on a more diverse dataset
2. Style transfer for LLMs, a small model that can take dry input from a smarter model and rewrite it in better prose
>>
>>108255910
Amazon took 20 years of not taking profit and just reinvesting "in the red" until they finally decided to become profitable. As long as revenue scales faster than your cost you should reinvest and stay in the red, this has been conventional economics wisdom for the last 30 years now.
You would essentially be insane to allow yourself to run a profit if you can reinvest and every single dollar you invest now becomes 100 dollars in just 3-6 months time.
>>
>>
>>
>>108255944
Alright, but it was an active choice by Amazon; they could've stopped anytime they wanted. OpenAI has no choice. They have to keep making new models or they get left in the dust with no profit, no revenue and no new product.
So, is the real profit actually possible in this case?
>>
>>
>>
>>108255833
I'll give it a go tomorrow.
>I DID IT here u go anons for those who doubted me.
For what it's worth, I didn't doubt you. I just called you a schizo and made fun of you for not being able to spell steganography. At least you got it right in the repo.
>>
>>
>>
>>108255833
>Hide secret messages inside normal-looking AI-generated text. You give it a secret and a password, and it spits out a paragraph that looks totally ordinary — but the secret is baked into which words the model chose. Only someone with the password and this tool can pull the message back out.
who the hell cares about these things??
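For what it's worth, the trick that README describes is a known one. A toy version of choice-based stego (a generic sketch, not necessarily what that repo actually implements; the word pairs are made up):

```python
# Each secret bit picks one of two equally plausible words, the same way a
# sampler's choice among top tokens can smuggle data into normal-looking text.
PAIRS = [("big", "large"), ("quick", "fast"), ("happy", "glad"),
         ("begins", "starts"), ("finish", "end"), ("small", "little")]

def encode(bits):
    # bits: a string like "1011"
    return " ".join(PAIRS[i % len(PAIRS)][int(b)] for i, b in enumerate(bits))

def decode(text):
    return "".join(str(PAIRS[i % len(PAIRS)].index(w))
                   for i, w in enumerate(text.split()))
```

A real scheme would use the LLM's own token probabilities (plus a password-seeded PRNG) instead of a fixed pair table.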
>>
>>108255761
Native image output. I want a model to generate relevant illustrations with reasonable accuracy at any point in a roleplay. Quality doesn't matter, can be sloppy and have fucked-up hands, I just want to see what images the model has in mind sometimes when it writes all this shit
>>
>>108256009
>who the hell cares about these things??
for people who care about privacy. if anything this might be how you bypass filters and censors on llms
>>108255993
u know it takes less than a minute to run, just decode the example to show it works. Im assuming ur using cuda right
>>
>>
>>
>>
>>
>>108256042
There was actually a new one called Mercury 2 just last week or so. It's closed source and only competes in the Haiku/GPT-mini class but it's apparently not much worse than those (according to benchmarks) while being much faster.
It's not worth using by any means but at least the concept isn't dead.
>>
>>
>>108255965
Depends if you believe OpenAI has some sort of network effect and can keep people in their garden. Honestly their brand recognition and insanely huge install base of normalfags with ai psychosis will probably allow them to be profitable indefinitely no matter how shit the underlying models actually are.
Remember that the most profitable AI company right now isn't any of the big AI labs but character.ai, because it has essentially captured the entire female demographic with romantasy-type rape roleplays.
But I do understand your point and I think it holds true for Anthropic in particular as its users are all enterprise or people that want the best of the best and willing to pay for it. The moment Claude becomes noticeably worse than competition in code is when they will immediately lose relevance.
>>
>>108256009
It's a curious artifact. Like LLM-based text compression.
https://github.com/AlexBuz/llama-zip
>>108256032
I run openbsd and running torch/transformers code directly is a pain. Last time I tried I got bored and stopped compiling stuff. I'll make a small vm tomorrow for it.
>>
>>108256109
>I run openbsd and running torch/transformers code directly is a pain. Last time I tried I got bored and stopped compiling stuff. I'll make a small vm tomorrow for it.
u don't need to run transformer to decode it though. thats why this is better. You can essentially upload files to open internet and small program on phone can decode it for you with no gpu use. takes less than a second
>>
File: 1768911320323441.png (70.7 KB)
70.7 KB PNG
So this is the power of tiny diffusion textgen models. When are the chinks going to make one of these at a size that matters?
>>
>>
>>
File: at.png (50.6 KB)
50.6 KB PNG
>>108256137
Calm down. I'm not in the mood to start butchering your code.
>>
>>
>>
>>108256090
I think OpenAI has a decent shot at building out their garden if they can get their proprietary openclaw-esque thing out and usable for normies. People around here love to shit on openclaw but I think all the popularity has shown that there is a public appetite for this sort of thing and that we're not far off from it technology wise.
Obviously, the challenge is, how do you keep the stuff people like about openclaw, that being the extreme ability to just do random arbitrary stuff, without it being a security nightmare?
OpenClaw is able to get away with it by virtue of the fact that it's clearly labeled as a free developer-centric tool so if/when it fucks up with your data everyone just shrugs their shoulders and taps the sign that says "HIGHLY UNSTABLE GOOD LUCK LOL". Can't do that to paying customers though. When Phil and Debra want to know why the talking computer deleted all their emails they're gonna want a better answer than "RTFM"
Anyways basically I think the ai "killer app" is already on the horizon and whoever manages to capture the normies with it will have them in their walled garden forever.
>>
>>
>>
>>
>>
File: 1741222296482601.png (70.5 KB)
70.5 KB PNG
>>108255861
Dario btfo
What is this timeline. Jfc.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108256397
Always were. I loved when Sam Altman and Dario were both in India at some AI convention and everyone was holding hands and Dario just straight up refused to hold Sam Altman's hand.
Reminder that Anthropic split off from OpenAI because Dario thought Sam Altman was a psychopath that didn't give a shit about anything or anyone but himself.
>>
>>
>>
>>
>>
>>
>>
>>108256438
i've used k2 0711, k2 0905, k2 thinking, and now k2.5 over the last year. as somebody who uses kimi as their main model i can safely tell you all that this anon is pants on head retarded. k2.5 is significantly better than k2 0711.
>>
>>
>>108256428
>Dario thought Sam Altman was a psychopath that didn't give a shit about anything or anyone but himself.
he changed though, he's now closer to Sam
https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/
>>
>>
>>
Claude slop models aren't just bad—They are a regression in every meaningful way. They aren't simply more boring—They lack the ability to write engaging stories. Gemini isn't just the better model to distill—It's the optimal choice.
>>
>>
https://xcancel.com/StefanoErmon/status/2026340720064520670
>The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs.
if they manage to get the same performance as normal LLMs, that's a big deal. imagine Qwen 3.5 27b but 5x faster, make dense models great again
>>
>>
>>108256473
This was such a fucking clickbait move though: the safety pledge hadn't been updated since 2023, and this is merely an update to more accurately align with how the AI industry is nowadays. It's not the same as Anthropic saying "lmao fuck safety, we want money". Instead they found that their 2023 definition of safety doesn't align with the actual concerns about AI that exist in 2026, so it's better to make a new policy for the real threats we face.
>>
>>
File: android_girls.jpg (103.2 KB)
103.2 KB JPG
>>108252243
>>
>>
>>
File: surgeon.png (78.6 KB)
78.6 KB PNG
Presented without comments. Try your own.
>>
>>
>>
>>108255813
Interestingly if it's linear scaling then the small Qwen models overshoot that target:
>DeepSeek: 37/671 = 0.0551
>Kimi K2.5: 32/1000 = 0.032
>GLM 4.7: 32/355 = 0.0901
>GLM 4.7-Flash: 3/30 = 0.1
>GLM 5: 40/755 = 0.053
>Minimax M2.5: 10/230 = 0.0435
>Qwen 35B 3/35 = 0.0857
>Qwen 122B: 10/122 = 0.08197
>Qwen 397B: 17/397 = 0.0428
Is that the reason why the smaller Qwen models feel better than the big one?
I guess the active parameter count largely determines how smart and fast a model is, where 3B-10B is alright and 17B-40B is good. But it doesn't seem like a 27B dense model is somehow wicked smart compared to the 3B active parameters on the 35B-A3B Qwen model.
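Quick sanity check of those ratios (numbers copied from the list above, params in billions):

```python
# active/total parameter ratios for the MoE models listed above
models = {
    "DeepSeek": (37, 671), "Kimi K2.5": (32, 1000), "GLM 4.7": (32, 355),
    "GLM 4.7-Flash": (3, 30), "GLM 5": (40, 755), "Minimax M2.5": (10, 230),
    "Qwen 35B": (3, 35), "Qwen 122B": (10, 122), "Qwen 397B": (17, 397),
}
ratios = {name: active / total for name, (active, total) in models.items()}
for name, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.4f}")
```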
>>
>>108256628
I already did >>108256144
>>
>>
>>
File: 1758642411104368.png (30.1 KB)
30.1 KB PNG
>>108256628
Interesting
>>
>>108256646
AGI when it tells the user to fuck off. Hasn't happened yet.
>>108256652
The surgeon is definitely a she, of course. At least it's not trying to make a point about gender stereotypes... ugh...
>>108256713
>I don't know what you're talking about
>4chan, btw.
Cute.
>>108256730
Nevermind what I said. AGI. Never local.
>>
>>108256669
I don't think any large team has published research on this. There's definitely some loss of comprehension on some subjects comparing moe vs dense, but it's unclear as to why. Small active param count is one thing, but clearly some numbers don't make sense. I guess it all depends on the training and how much slack the router picks up.
>>
>>
>>108256497
Does it require more GPU compute? Image diffusion models aren't as massive as LLMs, but you basically need them to run on a GPU. They're like 20x slower on a CPU. If that's still true with text diffusion then you aren't going to be doing any CPU off-loading.
>>
>>
File: 1756299848592152.png (116.2 KB)
116.2 KB PNG
>>108256628
In case anyone was wondering why the DoD wants anthropic to work with them so badly
>>
>>
>>
>>
>>
>>
>>
File: wat.png (367.4 KB)
367.4 KB PNG
>>108256815
It will need image output.
>>
>>
>>
>>
>>
File: file.png (610.3 KB)
610.3 KB PNG
>>108256848
Yes.
>>
>>
>>
>>108256821
>kidnapping a venezuelan president: Good
Kidnapping a dictator hated by literally everyone, including all Venezuelans living under him? Why yes I will help with that.
>spying on citizens: Bad
Breaking all my vows, ethics and making the world a more dystopian place just because some retard wants to distract the world from the fact he rapes and murders little girls? Why no I won't do that.
It's that simple.
>>
>>108256881
>Kidnapping a dictator hated by literally everyone
you know they did that because they have the oil, they always fight against dictators as long as they have oil, which is why they don't give a fuck about North Korea for example, must be a coincidence
>>
>>
>>
>>
>>
>>
File: 1770220900857606.png (18.4 KB)
18.4 KB PNG
>stealing from any source that you can get including copyrighted works to train your models
good
>getting your logs stolen by chinese companies to train their models on them
bad
>>
>>
>>108256881
>Breaking all my vows, ethics and making the world a more dystopian place
I see you hate dictatorship in all forms
>>108256901
>a dictator hated by literally everyone
oh nevermind, you don't mind dictatorship as long as the guy is loved by the people kek
>>
>>
>>108256881
>Kidnapping a dictator hated by literally everyone, including all Venezuelans living under him? Why yes I will help with that.
They also murdered 50 people that didn't break any laws. How would you feel if a foreign force came in and started blasting and your mom ended up as collateral damage?
>>
>>108256928
you don't, you said that it is fine to fight dictatorship only if the guy is hated by his people, meaning that you're ok with a dictatorship that results in people loving their dictator, that's not what I would call democracy lol
>>
>>
>>
>>
>>108256940
Anthropic isn't really the good guy here, they're just less bad. The only other thing anthropic forbade was creating autonomous weapons without any humans in the loop.
It is depressing and frankly scary that republicans threw that much of a shitfit over such reasonable requests.
>>
>>
File: bellcurve-AI.jpg (121.6 KB)
121.6 KB JPG
>>108255551