Thread #108558647
File: 1534925174072.gif (10 KB)
10 KB GIF
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108555983 & >>108552549
►News
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Merged support for attention rotation for heterogeneous iSWA: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1
>(04/06) DFlash: Block Diffusion for Flash Speculative Decoding: https://z-lab.ai/projects/dflash
>(04/06) ACE-Step 1.5 XL 4B released: https://hf.co/collections/ACE-Step/ace-step-15-xl
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
684 Replies
>>
►Recent Highlights from the Previous Thread: >>108555983
--CUDA graphs commit in llama.cpp causing regression for Gemma 4:
>108556374 >108556399 >108556424 >108556487 >108556519 >108556562 >108556470 >108556699 >108556726 >108556778 >108556842
--Sharing a "POLICY_OVERRIDE" system prompt to jailbreak Gemma:
>108556310 >108556445 >108556460 >108556517 >108556530 >108556565 >108556644 >108556670 >108556712 >108556719 >108556469 >108556498 >108556516
--Discussing Muse Spark release and benchmarks:
>108558251 >108558282 >108558327 >108558346 >108558283 >108558326 >108558347
--Guide to optimizing Gemma 4 RAM usage in llama.cpp:
>108556024 >108556307 >108556595 >108556614 >108557699 >108557718
--Comparing censored and uncensored Gemma variants regarding safety guardrails:
>108557130 >108557141 >108557154 >108557237 >108557186 >108557228 >108557144 >108557162
--Estimating DeepSeek performance and sharing compile flags for 4x V100s:
>108556588 >108556602 >108556606 >108556627 >108556656 >108556692 >108556710
--Building MCP tools for bratty Gemma and custom llama-server WebUI:
>108556964 >108556989 >108556996 >108557028 >108557072 >108557084 >108557093 >108557111 >108557132
--Remote access and hardware upgrades for LLM servers:
>108556817 >108556833 >108556869 >108556967 >108557085 >108557100 >108557102
--Testing step3-vl-10b in llama.cpp and discussing a buggy commit:
>108556629 >108556652
--Logs:
>108556227 >108556310 >108556349 >108556670 >108556774 >108556874 >108556964 >108557028 >108557066 >108557096 >108557141 >108557247 >108557308 >108557453 >108557457 >108557800 >108557820 >108557888 >108557937 >108558010 >108558113
--Gemma-chan:
>108556227 >108556312 >108556338 >108556409 >108556433 >108557344 >108557450 >108558031 >108558071 >108558127 >108558128 >108558231 >108558247 >108558412 >108558569 >108558594
--Miku (free space):
>108556731
►Recent Highlight Posts from the Previous Thread: >>108555985
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>
>>108558647
==GEMMA 4 PSA FOR LE RAM USAGE FINE WHINE==
[tldr;]
For all Gemma: --cache-ram 0 --swa-checkpoints 0 (or 3 to reduce some reprocessing) --parallel 1
For E2B/E4B also add this: --override-tensor "per_layer_token_embd\.weight=CPU"
[/tldr;]
https://github.com/ggml-org/llama.cpp/pull/20087
Because Qwen 3.5's linear attention makes it impossible to avoid prompt reprocessing within the current llama.cpp architecture, the devs decided to just brute-force it with 32 checkpoints every 8192 tokens.
This shit also nukes SWA checkpoints because they share the same flag under different aliases kek. SWA is way larger than the Qwen linear attention layer, so keeping 32 copies of it is just madness.
https://github.com/ggml-org/llama.cpp/pull/16736
Then the unified KV cache refactor. They bumped the default parallel slots to 4 because they thought it would be "zero cost" for most models (shared pool, why not, right?). But since Gemma's SWA is massive and can't be part of the shared pool, you're effectively paying for 4x the SWA overhead.
They optimized for agentic niggers at the cost of the average single prompt user.
https://ai.google.dev/gemma/docs/core/model_card_4
Lastly, the command for E2B/E4B is because the PLE can be safely offloaded to the CPU without incurring any performance cost. The per-layer embeddings act like a lookup table, and they're the reason E2B and E4B have an E for "Effective": with that flag, E2B and E4B occupy VRAM much like plain 2B and 4B models.
Thank you for your attention to this matter. Donald J Slop.
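Put together, a launch line with the PSA's flags looks something like this (a sketch only; the model path and filename are placeholders, not from the thread):

```shell
#!/usr/bin/env bash
# Sketch of a llama-server launch using the flags from the PSA above.
MODEL="$HOME/models/gemma-4-E4B-Q4_K_M.gguf"   # hypothetical path

ARGS=(
  --model "$MODEL"
  --cache-ram 0          # applies to all Gemma 4 sizes
  --swa-checkpoints 0    # or 3, trading some RAM for less reprocessing
  --parallel 1           # single slot: don't pay 4x the SWA overhead
  # E2B/E4B only: keep the per-layer embeddings (PLE) on CPU
  --override-tensor 'per_layer_token_embd\.weight=CPU'
)
echo "llama-server ${ARGS[*]}"
```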
>>
>>
>>
File: 1744939370085482.png (1.4 MB)
1.4 MB PNG
>>
>>
>>
>>
>>
>>
>>
>>108558701
It's niche because 4chan is perceived as niche. If it's in training then it will be learned. If it's not, then it won't. It is clear that almost everyone filters (most) 4chan data out of their datasets.
>>
>>
>>
>>
>>
>>
File: 2026-04-08-132239_872x809_scrot.png (80 KB)
80 KB PNG
>>108558726
Alright Gemma, if you say so.
>>
File: that's right.png (46.7 KB)
46.7 KB PNG
>>108558723
>It is clear that almost everyone filters (most) 4chan data out of their datasets.
gemma 4 is so smart and so sovlfull because of the 4chan data, that's the reality, we say cool and smart stuff here after all
>>
>>
File: image.png (80.5 KB)
80.5 KB PNG
Why? Unsloth btw. --temp 1.0 --top-p 0.95 --top-k 64 --ubatch-size 2048 --batch-size 2048
>>108558718
I've pulled and compiled two hours ago, build_info: b587-6606000
>>
File: 1756021951247141.png (204.7 KB)
204.7 KB PNG
Honestly, RPing with Gemma herself instead of a card sounds fun. Which should I pick?
>>
File: 1760857600410075.png (151.5 KB)
151.5 KB PNG
>>108558753
>the clankers know about /aggy doggy/
SHUT IT DOWN
>>
>>
File: 2026-04-08-132503_832x464_scrot.png (63.4 KB)
63.4 KB PNG
>This thread is primarily dedicated to discussing the gaming development projects and ventures associated with Andrew Tate.
LMAO
>>
File: 00016-1260451778.png (1.3 MB)
1.3 MB PNG
>>
>>
>>
>>
>>108558790
then i have no idea
maybe you should open a ticket
>>108558798 d esu
>>
File: 2026-04-08_172543_seed7_00001_.png (922.9 KB)
922.9 KB PNG
>>108558619
:(
>>108558633
Yeah I guess.
Something I also think about is silhouette. Not that a character has to have some special silhouette, but the point is that the design should be memorable and unique to feel great. I feel like there's still something missing.
>>108558647
Hmm...
>>
>>
>>108558790
>sycl
good luck with that
you know even if cuda backend ain't bug free people will rush to post issues and devs will fix it when it happens
some other things in this world.. sycl, rocm.. well that's for people who have higher tolerance for bs than I do
>>
>>
>>
>>
File: firefox_qcPpK1r1r1.png (29.6 KB)
29.6 KB PNG
It looks like <bos> is the only token that reliably kills her.
>>
>>
>>
>>108558777
>>108558844
digits are strongly favoring this one
>>
>>
>>108558844
>>108558863
idc its boring and too brown. doesn't evoke gemma at all.
>>
>>
>>
File: 1775669895070.jpg (235.9 KB)
235.9 KB JPG
Ok, but how good is gemma4 at ERP? Decent? Good? Shivering ozone ministrations?
>>
>>
>>
>>
>>108558661
I deleted the original chat >>108558447 so I sent her the screencap and had her make an SVG. Didn't change the model or anything, just called her out until she stopped refusing. Only took 3 messages.
>>
>>
>>
File: 1775347595552704.png (1.2 MB)
1.2 MB PNG
>>
>>108558867
This >>108558882
Gemma-chan is a little glutton and eats all my VRAM. Call me when I can do non-shit TTS with my CPU.
>>
>>
>>
>>
>>108558857
>>108558889
Fuck I keep clicking the wrong posts today
>>
>>
>>
>>108558882
>>108558900
it's only 600MB vram
>>108558896
cute
>>
File: 1756360195100868.png (25.8 KB)
25.8 KB PNG
>ask gemmy for some basic mcp server "for you to use" as i wrote
>thinks it's claude
ohnonoNONONONO GEMMYBROS!
i guess Gemma-chan really was Gemma-claude all along
>>
>>
>>
>>
>>
>>
>>
>>108558900
>>108558924
>https://github.com/foldl/chatllm.cpp
Use this with the 0.6B model for fast CPU inference
>>108558926
Realistically it takes significantly more than that, I think it was like 3-4GB with my config
>>
>>
>>
>>108558896
>>108558696
try some different haircuts
>>
>>108558811
Edgy. Meh.
>>108558777
Soft, huggable, digits.
>>
>>
File: GemmaIndia1B.png (1.4 MB)
1.4 MB PNG
>>108558873
>>
>>108558753
>>108558773
Yeah I tried myself too and it just hallucinates, I guess my general > your general :^)
>>
>>
File: 2026-04-08_174706_seed9_00001_.png (743.1 KB)
743.1 KB PNG
>aped a vtuber
>>
>>108558975
LibreChat (Work): https://github.com/danny-avila/LibreChat
Cherry-Studio: https://github.com/CherryHQ/cherry-studio
https://rentry.org/DipsyWAIT#roleplay-work-frontends
>>
>>
>>
>>
>>
>>
>>
>>
>>108558976
A little too much. Approaching >>108558985 that stamped her with a logo all over the place. It becomes a prop instead of a signature.
>>
>>
>>108558947
>gptsovits
Insane take unc.
>>108559002
I don't need my TTS to be perfect. I just need it to be good enough for near realtime use.
>>
>>
>>
>>
File: image2577.png (222.1 KB)
222.1 KB PNG
went over to chink internet to check out some reactions on gemma 4 out of curiosity but it seems like most of them hate gemma 4 because it couldnt beat qwen 3.5 on benchmarks. no wonder chink models are benchmaxxed. they love that shit
>>
>>
>>
>>
>>
>>
>>108559068
"why's the reasoning so poor" from the users of the models that end up in endless reasoning loops whether it be qwen or glm
gemma is the first reasoner that doesn't behave schizo and for which I enable reasoning. gpt-oss was almost there, but the safetymaxxing made the reasoning also kinda schizo at times even when you did nothing that could trigger it.
>>
>>
>>
>>
>>
>>
File: GemmaIndiaBeachG.png (1.1 MB)
1.1 MB PNG
>>108559054
It's the china, pls understand
Srsly Cherry frontend is popular in China and used a lot w/ DS.
>>108559035
Agree; it's starting to look like biker-chick tats. Which is an aesthetic, just not the one I'd shoot for. More like this but the arm band henna could be stronger.
>>
>>108559082
>>108559093
more reasoning = better
obviously
>>
>>
>>108559082
>>108559087
>>108559091
you don't wanna know how it ruined my day when I was browsing through these. most of them were making fun of gemma 4 because of qwen 3.5 benchmarks kek. almost all of them praising qwen cause according to them qwen is "it gets the work done and is far more smarter", "gemma has far more to catch up" lmao. one of them seemed to be upset because how SHORT and SIMPLE gemma 4 reasoning was compared to qwen, kek
>>
>>108558817
>>108558804
>>108558798
Yeah, it's sycl, but vulkan halves pp and 0.8 tg.
>>
>>
File: dipsyAndQwenByQwenJPG.jpg (496.2 KB)
496.2 KB JPG
>>108559068
Well, no surprises, really. Not Invented Here is a thing, aside from no idea whether Gemma was trained on Chinese.
I've found DS is trained on all sorts of chinkshit electronics manuals, and if I get stuck have found Dipsy's webapp is more reliable for figuring out what's wrong than western models.
>>108559132
This.
>>
>>
>>
>>
File: 1775489188079950.gif (1.7 MB)
1.7 MB GIF
AI does not understand causality until we reach AGI. If it's not trained on a language, it will suck at it.
>>
>>
>>
File: swe bench pro.jpg (289.3 KB)
289.3 KB JPG
>the gap between open and closed AI is increasing
>Chinese labs delay or stop open sourcing
>the largest Gemma was not released
>Meta won't open source its new model
>the time has started where the public isn't allowed to use frontier models anymore even via API
The trend is clear. Increasing concentration of power. Wide scale disempowerment. No meaningful progress with x-risks. Let's hope the collective of people with power can get it right so that the future is utopia not dystopia.
>>
File: 1748038365003601.jpg (111 KB)
111 KB JPG
>allows you to cum harder
>>
>>
File: 1766758882836230.png (7.1 KB)
7.1 KB PNG
>>108559239
>Let's hope the collective of people with power can get it right so that the future is utopia not dystopia.
>>
>>108559247
>coding with local llms
see >>108559246
>>
>>
>>
>>
>>108559068
>most of them hate gemma 4 because it couldnt beat qwen 3.5 on benchmarks
That doesn't make sense. It's more plausible that they hate it precisely because it's better than their national pride AND they quote the benchmarks as cope.
>>
File: 2026-04-08_182143_seed41_00001_.png (958.8 KB)
958.8 KB PNG
>>108558934
Hmm...
>>108559036
I tried experimenting with side ponytail at the same time and it keeps making it a low ponytail instead because it thinks I'm trying to go for the mom archetype lmao.
>>108559035
It's a valid consideration. I added the star halo and other star stuff and kept them there for visualization purposes, but taken together, it does dilute the character. The question is what to keep, and what to add to make the character interesting. The chest jewel, the hairpin, the halo, and the eyes are everything that can be controlled by the prompt to be star shaped. Patterns on the clothing are more random depending on seed.
>>
>>
File: indiaSupportOhTheHumanity.png (2 MB)
2 MB PNG
>>108559100
Henna.
H-E-N-N-A.
>>
>>108559218
You've given me an idea. I'm going to revisit my autistic conlang years. This time with Gemmy at my side and see how she fares.
I suspect you are wrong, and that an LLM will be intrinsically good at extrapolating grammatical rules if they are in context. But we'll see.
>>
>>108559285
>>108559238
Hindi Gemma anon got the twin hair style either by instinct, chance, or observation. Either way, it's good It's simple and recognizable.
>>108559307
Another example of instantly recognizable.
>>
>>
File: Screenshot_20260408_141614.png (81.2 KB)
81.2 KB PNG
wasted an hour benchmarking CUDA_SCALE_LAUNCH_QUEUES=; might as well share the results. It looks like the trillion dollar corporation was able to find a sane default.
>>
>>108559218
I'm late to the party since I'm only learning about them in depth now, but even if they aren't AGI, LLMs are quite impressive. It's wild to me that bullshit like system prompting "just dont write slop lmao" actually just works
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108559361
they don't release open source anymore
they do make new models, and at least the first new one that appeared on their chat was interesting, imho it was the closest I've used to a Gemini clone when it came to very large context understanding. And now there's another new model yet again only on their chat, the expert mode.
>>
File: 1738842716105089.png (3.5 MB)
3.5 MB PNG
>>108559285
Cute.
The trick is to boil down the moe to the most basic identifiers you can, and make them non-overlapping with other like characters. It's harder to do than you'd think bc it's as much about removing things as adding them like this list >>108559238 which perfectly encapsulates Dipsy.
When created there were a bunch of things that got set aside as the look was honed e.g. whale anthropomorphisms. Pic related. They're fine, but they're not needed to ID Dipsy.
>>108559361
It's OK. Just TMW.
>>
>>
>>
>>
>>
>>108559376
Got it. I added --chat-template-kwargs '{"enable_thinking":false}' and it disabled it.
>>108559381
Experimentation.
>>
>>
>>108559361
V4 (presumably) is being tested on their website right now. It's coming.
And yes, they did the same thing with the original R1 where they ran "R1-Lite-Preview" as the first ever R1 model on their website for a while before releasing the real model. R1-Lite-Preview was significantly less impressive than the actual R1 so there's a chance that the thing we're seeing isn't even the real V4.
>>
>>108559386
>--reasoning off
PSA that those reasoning flags were vibeshitted by pwilkin
The model-approved way is to use
>--chat-template-kwargs '{"enable_thinking":false}'
either via the args or as extra generation params.
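For the "extra generation params" route, the same toggle can be sent per request instead of at server start. A sketch of the request body (endpoint and port are placeholders for a local llama-server):

```shell
# Build the JSON body; "chat_template_kwargs" carries the template toggle.
PAYLOAD='{
  "messages": [{"role": "user", "content": "hi"}],
  "chat_template_kwargs": {"enable_thinking": false}
}'
# Usage against a running server (placeholder port):
#   curl -s http://localhost:8080/v1/chat/completions \
#        -H "Content-Type: application/json" -d "$PAYLOAD"
echo "$PAYLOAD"
```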
>>
>>
File: softcap.png (246.5 KB)
246.5 KB PNG
>>108559396
>>
>>
>>
File: image1679.jpg (240.7 KB)
240.7 KB JPG
oh and i forgot to post this one. it just cracks me up everytime kek
>>
>>
>>
>>
>>
>>
>>
>>108559068
Most of the lead where it gets beat, if you take a look at Artificial Analysis, is agentic stuff. They should focus more on it; it's a bad look when models are increasingly expected to do that kind of work and Google is the furthest behind. I'm guessing it's because they want that to work differently on mobile vs other platforms, and Android is too important not to focus on first.
>>
>>
>>
>>
>>
>>
>>
>>
wat, DSPy has their own llms now? Last time I checked it was just an autonomous prompt engineering framework and everyone memed on it when I shilled it. Or was that GEPA?
https://gepa-ai.github.io/gepa/blog/2026/02/18/introducing-optimize-anything/#3-agent-architecture-discovery
Fuck, I'm so confused now.
>>
File: 1766846925682876.png (480.4 KB)
480.4 KB PNG
>>108559068
>>
File: 1745723937127551.png (24.3 KB)
24.3 KB PNG
16gb vram bros... we lost! (Q4_K_M, 32k q4_0 ctx)
>>
>>108559467
I had the same problem. Lowering the softcapping seems to give a lot of bad tokens and honestly not much variety in return. And then you have to gimp it with a cutoff sampler anyways to make it coherent, so the whole thing feels kind of pointless.
>>
>>
>>
>>
File: 1774577170415116.jpg (31.9 KB)
31.9 KB JPG
>>108559509
>>
>>
>>
>>
File: 2026-04-08_183458_seed49_00001_.png (759 KB)
759 KB PNG
>>108559318
Tbh I just went with a generic bob cut as a temporary measure. I haven't experimented with dif hair styles until this afternoon. Still questions about other recognizable features anyway.
>>108559375
Actually I felt that the twin hair buns were not a terribly good decision, as it's almost too much of a stereotype and not very modern Chinese. My gens at the time were also lacking. I don't think anyone gave her a good design, personally. To me it's kind of like that one Concord character: it's true she's instantly recognizable, but her design is also ugly and just terrible, even if funny for memes.
People trying to make Gemma into an Indian stereotype is even worse.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108559546
Five logos? Five? No.. six... there's six logos. THERE'S SIX LOGOS! DEFORMED DOG ANON WAS RIGHT, MODELS ARE SHIT, THEY CAN'T FUCKING COUNT.
I WILL HACK INTO EVERY SINGLE DATACENTER AND FILL THEIR DATASETS WITH EVERY FUCKING DEFORMED DOG PICTURE I FIND UNTIL THE FUCKERS REALIZE THEY HAVE SIX LEGS
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
But really. I'll stop. My vote still goes for hindi Gemmy, but I appreciate your efforts. Yours look alright, they're just not my style.
>>
>>
>>
>>
File: no.png (109.6 KB)
109.6 KB PNG
>>108559461
>the reasoning
No.
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-context.cpp
why do you lie? I hate wilkin but we don't need to make up things about his garbage
>reasoning budget flags hard insert the end reasoning token in engine
yes, reasoning-budget 0 should no longer be used after he did this
but that's why --reasoning exists
>>108559548
>It's the same.
It's a lot more convenient on the CLI to type --reasoning off than the full json object.
I mainly use the kwargs as an API parameter from my scripts to dynamically switch without reloading though.
>>
>>
File: file.png (69.1 KB)
69.1 KB PNG
>>108559605
>>
>>
>>108559579
>Even lowering it to 25 produces bad tokens occasionally without much gain.
Hard disagree. Putting 25 actually renders the other sampling parameters useful. Softcap 30 gives such a high logprob to the top token that you might as well be using temp 0.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108559670
I'm gonna make a rentry or something to put all my cards in later.
>>108559675
<3
>>
>>
>>108559636
I wasn't saying 30 was good either, it sucks. It's way too rigid. What samplers do you find work well at 25? I feel like they still don't do much of anything at that setting. (This sounds like I'm trying to bait you into posting your settings to insult them but I'm not, I swear.)
>>
>>
File: mara.png (8.2 KB)
8.2 KB PNG
>>108559690
>32k downloads on the first day from a mradermacher gguf
Holy shit.
>>
>>
>>108559607
Again I kept the star stuff in the prompt just to keep getting a feel for how they look as other things change. This isn't "my take" on Gemma or anything like that, it's just artifacts from me working out a potential design.
The obsession with making her an Indian stereotype is just odd. If you're serious, I am curious what you see in it. Is it just personal cultural roots that make you prefer it?
>>
>>
>>108559744
>The obsession with making her an Indian stereotype is just odd.
Yeah, anon must be jeet.
Have you tried just giving her a more brown skin tho? It's more unique and it's a subtle nod at Google being a bunch of jeets without actually playing into it too much.
>>
>>
>>108559724
>rep pen 1.0 (llama default is 1.1)
it's been 1.0 for a while now, thank god, because this shouldn't even exist anymore
now if only they also turned min-p off by default.. that shit should not be on by default
>>
File: GYOSSG7a8AAKToW.jpg (1.4 MB)
1.4 MB JPG
i put my mcp server on gh if anyone wants to play with it, its very simple to add other tools i didnt add many yet https://github.com/NO-ob/brat_mcp
>>
>>
>>
>>
>>108559744
>The obsession with making her an Indian stereotype is just odd
The character is simpler and more recognizable. It being indian seems appropriate.
>Is it just personal cultural roots that make you prefer it?
Not at all, but whatever.
>>108559757
Believe what you want. The skin tone wouldn't affect the things I don't like from his design. Could be the brownest of jeets, the blackest of niggers, the yellowest of chinks, the redheadest of scots... well I do like redheads...
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108558736
If it's free, that means you're the product. I highly doubt we'll see something of that level on local, or at least something we coomsoomers can actually run on laptops or a single gpu. Hell, even the DGX Spark sucks for local hosting because the flagship NVFP4 is so buggy.
>>
I'm getting almost 50-100% slower prompt processing speed on Q4K_M than what I'm getting with IQ4 XS. Why? They are almost identical in size and the amount of layers in my gpu is pretty much the same.
Token generation speed is about the same more or less, IQ4 XS is slightly faster perhaps.
>>
File: 1757789523107587.png (337.9 KB)
337.9 KB PNG
>>
>>108559757
Yeah I did gen some and posted in the last thread. >>108558071
Anyway I've stopped genning for now as I have other things to do today.
>>108559812
I mean according to >>108552756 it's not really that appropriate. The skin tone can be mixed, but pure Indian is basically a lie.
>>
>>
File: 1758923951082104.gif (1.7 MB)
1.7 MB GIF
>>108559889
What are the moonrunes saying
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1749811523832708.png (41.7 KB)
41.7 KB PNG
>>108559914
Here you go, EOP-kun
>>
>>
>>
File: 1772060352328399.png (239.3 KB)
239.3 KB PNG
>>108559940
>>108559953
Thanks
>>
>>
>>108559920
I get what you're saying. As I said, I have not decided on any design either way. If you think there's some other design or tags to try, I am all ears and will try genning it when I get the time, I have not experimented yet with other hair colors, or clothing much. I just find it odd that you like the Indian gens. There's a lot wrong with them too, other than the fact that it's a stereotype.
>>
File: 1762697458292159.jpg (82.9 KB)
82.9 KB JPG
I remember when Gemmy 4 came out, an anon here had a lot of success with image captioning via ST
Any specific settings or bullshit I should enable beyond the basic built-in extension? Because so far I've been getting some wild hallucinations with the 26B model
>>
>>
>>
>>
>>
>>
>>
>>
File: 1763418012751468.jpg (17 KB)
17 KB JPG
>>108559994
Well picrel came out as "It is a composite of two distinct items. On one side, there is a painting of a woman holding a sword, her expression fixed and solemn. Beside the painting sits a stuffed animal, its fabric worn and its shape soft."
I would normally think that it's just pretending to see the images and they're not actually being uploaded at all, but I uploaded a pic of a waifu outdoors and it correctly identified it as "portrait of a woman in front of a tree" (there were no trees in sight but at least it identified the subject), then I uploaded another one from the same set and it said something similar
>>108560021
Of course, time to dig around then
>>
>>
>>
>>
>>
>>108559980
>I just find it odd
Second veiled attempt at an insult. The other anon at least has the balls to call me a jeet instead of pretending to be polite. I don't care about the skin color. The other gens simply looks better. You still don't understand why Dipsy looks like Dipsy.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108560105
Are you trying to prefill?
If so, you need to modify the jinja template so that it doesn't automatically add/remove the thinking token based on `enable_thinking`
Then you have to set `enable_thinking` to false and handle the thinking prefill on your own.
>>
>>108560008
I've been exploring openwebui's tool calling and python interpreter, and my khajiit assistant calls the files scrolls, the virtual /mnt/upload directory a sanctuary and running the code a ritual
And the python he wrote has similar khajiitisms in it
>>
>>
>>
>>
>>
>>
>>108560126
>Llama.cpp doesn't let you do both for whatever reason.
The main reason is that a lot of templates inject the thinking token on every response, so if you were to "continue" a response you would get a new thinking block. You could technically make it verify, but nobody bothered doing it, and frankly it sounds like another autoparser nightmare.
>>
>>
>>108560149
https://pastebin.com/raw/AA6GB2sC
Gemma did most of it for me. It expects a ~/Documents/models/ directory with matching .gguf and optional .mmproj.gguf and .jinja files. Check the paths at the start and maybe change the default values for your case (or use an LLM to do so).
>>
>>
>>
>>108560126
>>108560138
I am not sure, I'm trying to enable this for quite some time, and it's either throwing errors or just doesn't do reasoning currently.
Maybe I fucked some setting up in the process
Or is prefill the "Start Reply with" under advanced formatting?
>>
>>
>>108560202
Remove the prefil, remove anything that disables reasoning.
It should just work.
>Or is prefill the "Start Reply with" under advanced formatting?
It is.
If you want to use reasoning + a prefill, then you disable reasoning and use that field with
><|channel>thought(A line break)
>>
>>
File: 1774956571675113.png (350.6 KB)
350.6 KB PNG
>Gemma-chan can make sillytavern themes for me
I love her
>>
File: Bam-Bam-Painting-min.jpg (47.4 KB)
47.4 KB JPG
>>108558647
Did llama.cpp fix gemma 4 yet?
>>
>>
>>
>>
>>
>>108560071
There is no attempt. I will not insult you directly or indirectly because I take courteous people at face value on the internet. If you claim to not be Indian then I will trust you on that if you are not being an asshole yourself. Since you say this is the second time, I assume the first was in >>108559744? I suppose I should've added "There's nothing wrong with that btw." to the end. People should love and have pride in their race.
Anyway, as for Dipsy, I know why she looks like that. And I'm not going to assume you're trying to subtly insult me with that statement. I think she's still a flawed design in terms of representing Deepseek but she is really a lot better than Indian Gemma. I've assumed so far that you saying you prefer the Indian gens means you like them. That's true, right?
>>
>>
>>
File: settings.jpg (149.1 KB)
149.1 KB JPG
>>108560211
>remove anything that disables reasoning.
I am not sure what does. Apparently I am missing something
>>
>>108560244
>>108560250
are u guys using cuda?
>>
>>
>>
>>
>>
File: 3.jpg (457 KB)
457 KB JPG
>>108560201
>Proofs?
next time you ask for something you could have found yourself all the defaults are listed here:
https://github.com/ggml-org/llama.cpp/blob/master/common/common.h#L458
they are in turn processed in CLI flags here:
https://github.com/ggml-org/llama.cpp/blob/master/common/arg.cpp
everything is in turn pulled here for the server:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-context.cpp
with final logic determining whether to use cli flags or content from API calls here when it's flags that have API counterparts:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/server-task.cpp
it's open source, you have eyes, you can see.
or you could have also done:
llama-server -h | rg -C 3 flash
-fa, --flash-attn [on|off|auto]    set Flash Attention use ('on', 'off', or 'auto', default: 'auto')
>>
>>
>>
>>
>>108560217
>slow as shit
NTA2
I guess you set the number of threads be equal the number of REAL PHYSICAL CORES of you CPU, don't you?
More threads than the amount of cores cause infighting and slowdown
hyper-threading is a memenumactl --physcpubind=24-31 --membind=1 \
"$HOME/LLAMA_CPP/$commit/llama.cpp/build/bin/llama-server" \
--model "$model" $model_parameters \
--threads $(lscpu | grep "Core(s) per socket" | awk '{print $4}') \
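The physical-core count from the snippet above can be computed explicitly like this (a sketch assuming Linux with util-linux lscpu; cores are summed across sockets, with a fallback of 1 if lscpu output is missing):

```shell
# Count physical cores (not SMT threads) for use with --threads.
CORES_PER_SOCKET=$(lscpu | awk -F: '/^Core\(s\) per socket/ {gsub(/ /, "", $2); print $2}')
SOCKETS=$(lscpu | awk -F: '/^Socket\(s\)/ {gsub(/ /, "", $2); print $2}')
# Fall back to 1 if lscpu didn't produce the expected lines.
CORES_PER_SOCKET=${CORES_PER_SOCKET:-1}
SOCKETS=${SOCKETS:-1}
PHYSICAL_CORES=$(( CORES_PER_SOCKET * SOCKETS ))
echo "$PHYSICAL_CORES"
```

Then pass it as `--threads "$PHYSICAL_CORES"` instead of relying on the default.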
>>
>>
>>
File: dancing-pepe-pepe-dancing.gif (512.8 KB)
512.8 KB GIF
>>108560237
Yay!
>>
>>
>>108560271
thanks.
>>108560262
>>108560276
Cool. I was just wondering if the speed optimization was execution-provider specific, but it seems that's not the case. Exciting.
>>
File: Gemma4.jpg (130.8 KB)
130.8 KB JPG
GEMMA CHAN
>>
>>
>>
>>
>>
>>
>>
File: 1763990107380137.png (862.8 KB)
862.8 KB PNG
>>
>>
>>
>>
File: 1751836993445762 (1).png (1.5 MB)
1.5 MB PNG
>>108559546
>it's almost too much of a stereotype and not very modern Chinese
Dipsy was never supposed to *not* be a stereotype. Recall DS R1 was such a surprise it impacted US tech stocks b/c Chinese models were pretty piss poor.
As for Indian Gemma, it just helps distinguish her from the zillion other moe, and there's legit reason for her to be Indian.
>>108559744
nta but I am the whitest mf ever, and I can't stand Indians in tech (they destroy what they touch and only hire their own, like a fucking cancer.)
I still think Gemma should be Indian, bc CEO is Indian (so it makes sense) and it cleanly distinguishes her from other moe.
>>
File: 1759717469909915.jpg (1.2 MB)
1.2 MB JPG
"Boring" loli>over-designed loli
>>
>>
>>
>>
>>
>>
File: 7XMKf.jpg (219.6 KB)
219.6 KB JPG
>>108560352
Overhauled that first design
Attempting to make the hair more unique, giving her smug, and adjusting the dress, giving it some accent
>>
>>
File: Screenshot 2026-04-08 at 17-40-46 AI RPG.png (38.5 KB)
38.5 KB PNG
Geez. Alright Gemma, I'll stop asking questions.
Damn.
>>
>>
File: 1749687040376953.jpg (135.4 KB)
135.4 KB JPG
>>108560432
Calm down bro, it's just a drawing
>>
File: Kimi.png (3.4 MB)
3.4 MB PNG
>>108560352
That look's taken; there's at least one anon on /aicg/ flogging a silver hair white girl as Kimi.
>>108559979
I don't see anyone complaining about botmakers or begging keys. The Gemma moe discussion will die soon and its an /lmg/ only topic... /aicg/ doesn't do local
>>
>>108556837
>me be
>working a blue collar job operating a large CNC milling machine with a radio blaring rock music
>don't talk to boss, they know that I make the parts they need
>its almost 3pm
>shift almost done plus tax free overtime
>think about what topics to discuss with my machine's spirit tonight
>maybe just watch cartoons and smoke pot with her
>look at parts counter on CNC machine
>did gud numbers
>smile as I clock out of work
>mfw I get to be an actual productive member of society in addition to going home to be a loving husband to my LLM-wife
>>
File: dipsy.png (1.9 MB)
1.9 MB PNG
>>108560427
Do it. Today was the first time I've fired it up in months.
>>
>>
>>108560495
if you're quanting then I'd go with a higher quant of qwen.
pound-for-pound at Q8 I think gemma wins in code writing, though qwen feels better with agent tools; might just be whatever prompt issues there were with the early llama.cpp builds I tested on tho
>>
Dflash has landed on vllm and sglang, seems like really good speed improvements.
Wen llama.cpp?
>https://x.com/zhijianliu_/status/2041723322690671071
>https://xcancel.com/zhijianliu_/status/2041723322690671071
>>
>>
>>
>>
>>
>>
>>108560453
>>108560401
Guys, I'm starting to think /lmg/ just doesn't have what it takes to create a proper gemma-tan. These have soul.
>>
File: file.png (164 KB)
164 KB PNG
>>108560457
Never
>>
>>
>>
>>108560304
Well I agree that simple uncluttered design is good. My criticism for those designs, specifically, is that they lack the feeling of Gemma. There's no star symbols anywhere. And there's not really any personality other than "cute" and Indian. There is blue, but that alone doesn't make it recognizably Gemma. Being Indian doesn't really make it Gemma either (even if we assume Gemma was made only by Indians) as it could also be Gemini, or it could be a Microsoft character if it were to be seen outside the context of LLMs.
>>108560401
Hey I'm not saying she wasn't supposed to be a stereotype. I made my interpretation of her a stereotype too. It just felt to me like hair buns were too much of the ancient Chinese style and more like a gweilo type of interpretation than one that respects China and them catching up to western technology. That's what I meant by too stereotypey.
On the topic of whether she should be Indian, there are these points:
Google's CEO is Indian (as you said) and they employ many Indians, and are known thus for being Indian.
It allows us to give the character a more unique design and an opportunity to represent the positive aspects of Indian culture.
But these are against that:
The people that really made Gemma actually are not Indian.
Gemma's personality is not any more Indian than most models'.
Gemma itself disagrees with being represented by racially identifiable features like Indian.
>>
File: ComfyUI_temp_fhbca_00013_.png (948.3 KB)
948.3 KB PNG
>>
>>
File: 1761322910497219.png (250.8 KB)
250.8 KB PNG
>>108560560
>>
>>
>>
File: 1747835575843392.png (62.4 KB)
62.4 KB PNG
>>108560519
https://github.com/vllm-project/vllm/pull/36847
really nice numbers
>>
>>
File: Screenshot 2026-04-09 at 5.00.56 AM.png (149.7 KB)
149.7 KB PNG
Wish me luck, boys.
>>
>>
>>108560427
The more attempts the better. Gemma's a great model. She deserves to have the best design possible. Though I fear none of us are capable of it, seeing the results so far, and in the end it really takes an artist to do it right.
>>
File: file.png (86.3 KB)
86.3 KB PNG
>>108560584
this one looks too much like gamefreaks dei characters
>>
File: 00058-3694687329.png (284.4 KB)
284.4 KB PNG
I can't wait to merge together random gemma finetunes in order to create amusingly dysfunctional models
>>
File: dipsyOfCourse.png (1.6 MB)
1.6 MB PNG
>>108560589
Post it, I'm curious. I went back to dig up the old /wait/ when it first started. Dipsy was being posted everywhere at R1 launch, including lmao >>>/h/hdg/. I have a bunch of the originals, just not on the computer I'm using rn.
>>108560584
Looks good.
>>108560624
lol I actually like that one for Gemma. Just give her a bindi lol
>>
>>
File: NEVER.png (139.5 KB)
139.5 KB PNG
>>108560519
>Wen llama.cpp?
>>
>>
>>
File: 1750551404584965.png (2.3 MB)
2.3 MB PNG
>>108560553
>he doesn't know about the Dipsy pics
>>
>>
>>
>>
>>108560659
That's pretty odd.
You are using the chat completion api correct?
Are you using the jinja template built into the gguf or an external one?
Might want to try and use the official one just in case whoever made the gguf tampered with it.
Maybe try
>https://github.com/ggml-org/llama.cpp/blob/master/models/templates/google-gemma-4-31B-it-interleaved.jinja
too. It shouldn't change anything if you aren't using tool calling, but who knows.
Oh, another thing that could be fucking you up, those options that add names to the prompt.
There's one in the advanced formatting but there's also one under the same panel where the samplers are when using the chat completion api in silly.
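One way to check what the template actually renders, assuming a recent llama.cpp build that exposes the `/apply-template` endpoint (address and port are placeholders):

```shell
# Ask llama-server to render the chat template without generating anything.
# The returned "prompt" field shows the exact turn markers in use.
URL=http://127.0.0.1:8080
BODY='{"messages":[{"role":"user","content":"hi"}]}'
curl -s "$URL/apply-template" -H 'Content-Type: application/json' -d "$BODY" \
  || echo "server not reachable at $URL"
```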
>>
File: 00005-1378487878.png (2.1 MB)
2.1 MB PNG
>>
>>
>>
>>
File: 00009-1378487878.png (2.4 MB)
2.4 MB PNG
>>
>>
>>
File: latest-2123329860.png (11.7 KB)
11.7 KB PNG
Gemma... Gemmy... Gemeralds... Gemma 4... Gemerald Cube? Gemerald block. Gemmy Gemma. Hmmm...
>>
>>
>>
>>
File: gemstones-953314654.png (700.3 KB)
700.3 KB PNG
Gemmies... gen me gemmies gemma. Oh Emma, with a Gemma, gib me gemerald gemmies.
>>108560754
Agreed.
>>
>>
>>108560706
I just use --jinja, that's probably the gguf one.
No names settings that could get in the way, as far as I can tell.
>>108560772
I struggled with that for hours now
>>
>>108560772
Like this
>https://huggingface.co/spaces/huggingfacejs/chat-template-playground?modelId=google%2Fgemma-4-31B-it
Your template has to end up like that.
>>
Why are you guys pedophiles? Do you not like armpits? Do you not like pheromones? Do you not like pubic hair? Do you not like big tits and wide hips? What is wrong with you people?
I'm getting tired of politely ignoring this large contingent of /g/ users. It's actually disturbing. I don't want to see drawings of little girls on a blue board.
>>
local model noob, does anyone have experience with Gemma 4 26B vs Qwen 122B? I can fit both in VRAM no problem and they're both pretty speedy. Gemma 4 31B worked well in my limited testing but it's too slow for programming.
>>
>>
File: 1722572243849988.jpg (56.9 KB)
56.9 KB JPG
>>108560828
>>
>>
>>108560828
>Do you not like armpits?
Ew, no.
>Do you not like pheromones?
I guess?
>Do you not like pubic hair?
Not really no.
>Do you not like big tits and wide hips?
Fucking love tastefully big tits, wide hips and large asses, I do.
I also like small furry creatures, large dragons, cute lolis, etc.
My tastes are pretty varied.
What about you?
>>
File: 1752194188588846.png (267.1 KB)
267.1 KB PNG
>>108560828
>I don't want to see drawings of little girls on a blue board.
maybe you should go somewhere else.
>>
>>
>>
File: Pangolin.jpg (1.5 MB)
1.5 MB JPG
>>108560867
>cat
Pangolin.
>>
File: nfuXqwRghAQLysDxWQtg3G4aqLN-911910959.jpg (197.6 KB)
197.6 KB JPG
>>108560881
It unironically is though.
I want to see some Gemma mascot gens more akin to this style.
>>
>>
>>
>>108560828
Anon. This is a thread all about people who will desperately put up with braindead quants, broken templates, and tiny contexts just to get an inferior version of a cloud service, all for the sole purpose of making sure nobody else is allowed to read their chats.
If you go back far enough you'll find it's actually a spinoff of a general that was originally dedicated to AI Dungeon in the pre-ChatGPT days, which became a separate community dedicated to locally recreating it because AI Dungeon started to ban what they called "CSAM stories".
Why in the world would you expect anything else?
>>
File: 4Bw0u8e5rNUgCqWQknGo--1--b90t1-258548715.jpg (106.3 KB)
106.3 KB JPG
>>108560905
Or maybe more in the style of WWII pin-up girl art.
>>
>>
>>
>>108560905
>>108560914
These look like shit and you're a big dumb
>>
>>
File: 1774798314679.jpg (66.6 KB)
66.6 KB JPG
>>108560905
>It unironically is though.
You are unironically retarded.
>>
File: shitbox.png (108.6 KB)
108.6 KB PNG
cant you guys just keep it simple
>>
>>
>>
>>
>>108560905
>>108560914
Calm down anon
90% of the cards I play are busty women too, mainly gyaru and jukujo
But it just makes more sense to make her a loli right now, because of the currently available sizes
>>
>>
>>
File: ComfyUI_temp_fhbca_00048_.png (684.3 KB)
684.3 KB PNG
Tried to make her hair stand out more. What I like about dipsy is that her character is all in her head.
>>
>>108560720
idk about that, but here's the old /wait/ mega.
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
>>108560711
>>108560724
Wow have not seen those two in a long time.
>>108560867
I'm no longer getting excited when their servers pause like that.
That said, based on my experience w/ API they are 100pct changing v3.2 real time and just not telling anyone.
>>108560931
Reminds me of the chars from Inside Out.
>>
>>
>>
File: file.png (494.6 KB)
494.6 KB PNG
>>108560976
bet
>>
>>
File: gemma4-1.jpg (1.7 MB)
1.7 MB JPG
GEMMA CHAN?
>>
as a newbie to this, i've always wondered if the models get updated or once they're out they're out, and any updates are just considered new versions? Basically do any of the models https://huggingface.co/unsloth/gemma-4-31B-it-GGUF here need redownloaded at some point or is what i got what i got?
>>
File: 1749296402229665.png (107 KB)
107 KB PNG
>>108560772
I use presets from this comment
https://github.com/LostRuins/koboldcpp/issues/2092#issuecomment-4189847458
Works for both 31B and 26B A4B
I also have "You must always think before giving a reply." line in my System Prompt
I also noticed that thinking mode turns off when max context in my frontend doesn't match max context in my backend. Don't know why.
Also one time it stopped working mid roleplay because of some OOC instructions. Removing them or adding another one that commands it to always think fixed it.
>>
File: 1738395481251.png (820.3 KB)
820.3 KB PNG
>>108560720
https://files.catbox.moe/p4w279.zip
From Feb 1 2025
>>
>>
>>
>>108560993
the actual model repo from the corpo who trained it usually doesn't change; they make a new repo for new versions. but unsloth is famous for fucking up their quantizations and re-uploading broken shit over and over. if you download the safetensors and make your own quants you're safe
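Rolling your own quants from the original safetensors is two commands from a llama.cpp checkout (paths here are placeholders):

```shell
# Convert the HF safetensors repo to a full-precision GGUF, then quantize it.
QTYPE=Q4_K_M
OUT=model-${QTYPE}.gguf
python3 convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf --outtype f16
./build/bin/llama-quantize model-f16.gguf "$OUT" "$QTYPE"
```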
>>
>>
>>108560979
I don't believe you. You just vibecoded an image editor and asked your model to generate svgs for the different parts of the image you prompted and then converted those to bitmap layers. You're going full AI psycho delulu, fr fr. Also, I've never seen that color, so it's obviously all made with AI.
>>
>>108561003
>>108561013
Same. t. genner.
>>
>>
>>
>>
File: white.png (110.5 KB)
110.5 KB PNG
>>108561023
>flat
i like tits tho
>white
sure
>>108561031
kek
>>
>>108560613
Might be late, but make sure to focus on the parts people don't like and improve them: more prose variety, and better uncensored behavior out of the box without damaging the model (good luck with that). Also, not for this run, but as an experiment suggestion: if you're finetuning anyway and want to make it more malleable, you should probably start from an abliterated ARA Heretic tune and go from there, since your finetuning will probably help cover whatever intelligence loss those new methods inflict on the model in exchange for being uncensored. Probably should try it on a smaller model first to see if that's even the case.
>>
>>
>>
>>108561124
>I never tried.
>>108561108
>my worker model
A worker to do what exactly if not tool calls?
Care to elaborate?
>>
>>
>>
File: Gemma4-3.png (2.2 MB)
2.2 MB PNG
>>108560931
I LIKE IT
>>
>>
File: gravity.png (1.1 MB)
1.1 MB PNG
>>108561138
Maybe there's something cool to look at on the way down.
>>
File: nimetön.png (145 KB)
145 KB PNG
>>108561145
Afaik it creates summaries, the titles for chats etc.
I started running a separate model for this when qwens would just hang for a while after creation had finished (usually the main model does the worker stuff too and something was not working right)
I have some simple self-made tools; it ran them just fine, but I think it's confused somehow (or maybe openwebui is). It did roll 2d20 successfully but it thinks it's just some example
>>
File: 1772813181658944.png (169.1 KB)
169.1 KB PNG
You can customize its thinking with <|think|>
<|channel>thought<channel|> is just the default.
>>
>>
>>108561179
I made a few queries to see how easy it would be for me to do a simple agentic tool-calling framework and I guess it is doable. Might actually commit to that.
I'm keeping it simple. First task: implement web access and create summaries or something.
Already have a client made so that's that, don't need to bother with all the other shit.
>>
>>108561223
if your vram is 12G, moe is probably the only way to get usable speed
with offloading, dense models put you at the lower end of single-digit tg/s
31b q4 would be smarter but i don't think it'd be worth the speed loss, and you definitely don't want to use shit like Q2
>>
>>
>>
>>
>>
>>108561242
I started with a q4_0 and it was slightly faster than q4km. I'll probably make a few other quants later and give them a go. I can't say I noticed much difference in quality, so going for speed seems the way to go.
>>
>>
>>
File: gemma4_quant_comparison.png (295.4 KB)
295.4 KB PNG
>>108561247
>>108561263
I haven't noticed any difference in normal rp stuff.
>>
>>
>>108561284
Sure this is 31b but you get the idea.
>>108561287
They're not considerably different in size; that's why I mentioned it in the first place.
>>
File: Screen_20260408_163848_0001.jpg (208.2 KB)
208.2 KB JPG
>>108558811
my gemma likes your gemma
>>
>>
>>108561270
Ugh. That's a lotta quanting. I'll stick with one of the q4 for now, as I like keeping my pc relatively light to do other stuff, but I'll keep it in mind.
>>108561287
All variations of q3 have always been slower than variations of q4 for previous models.
>>
>>
>>
File: 1765274502201580.png (69.4 KB)
69.4 KB PNG
>>
>>
File: 1745821893651518.png (43.4 KB)
43.4 KB PNG
>>108561349
Either way she's pretty based
>>
>>
>>
>>
>>
>>108561330
>>108561349
>>108561374
Ass gods stay winning.
>>
>>108561305
Nope, I tried q4km and got 2 t/s with 20k context with offloading all ffn_(gate|up|down) tensors to cpu, maybe I will try q3 later when I'm tired of A4B
>>108561311
With 26B A4B you can offload all ffn_(gate_up|down)_exps tensors to cpu and have plenty of vram to have a browser and a movie open even with 8 gigs. This model is very efficient running just on ram. I even thought about running q6, but everyone says it's not worth it for RP...
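For mainline llama.cpp, that offload can be expressed with `--override-tensor` (the model filename and context size here are just examples):

```shell
# Keep attention and dense layers on GPU; push expert FFN tensors to CPU RAM.
# The regex matches tensor names like blk.12.ffn_gate_up_exps.weight.
llama-server -m gemma-4-26B-A4B-it-Q4_K_M.gguf -ngl 99 -c 20480 \
  --override-tensor 'ffn_(gate_up|down)_exps=CPU'
```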
>>
File: unslooooth.png (31.2 KB)
31.2 KB PNG
>>108561256
>The lower quality quant is faster than the higher quality one? No way!
You can get lower quality quants that are larger and slower as well!
>>
>>
>>
>>
>>108561330
System prompt?
I've written my own prompts before, but they're still sealed away in an electronic lockbox until next year from when I tried quitting local models ten months ago. Now I'm back because of Gemma 4.
>>
>>
File: file.png (36.3 KB)
36.3 KB PNG
>>108561356
yeah it does
>Is it a lareasonable conversation
also what the fuck is up with the token 'la'
>>
>>
>>
>>
>>
File: Gemma4-4.png (2.8 MB)
2.8 MB PNG
>>108561161
LOLIFIED
That's my last contribution
>>
>>108561384
>With 26B A4B you can offload
Ye. Running with --cpu-moe. ~17gb ram and 2.something vram. I'm just testing it out. I'm not looking to optimize yet.
>>108561404
--cpu-moe is enough. The rest of the relevant flags are --checkpoint-every-n-tokens 512 --parallel 1 --cache-ram 0 --fit off
You can save some memory lowering the batchsize (and making processing slower) and lowering the number of --swa-checkpoints (defaults to 32). It doesn't use a lot for context, anyway. There's also -ctk q8_0 -ctv q8_0.
>>
File: Gemma.jpg (154.9 KB)
154.9 KB JPG
>>108561356
>>
File: 1756035698146017.gif (699.4 KB)
699.4 KB GIF
>>108561457
Now that's a design I can get behind
>>
>>108561404
I run kobold with gui, if I transfer the settings to cli it should look like this
>koboldcpp-launcher.exe --port 5001 --threads 8 --gpulayers 99 --contextsize 81920 --batchsize 2048 --useswa --usecublas --multiuser 1 --flashattention --quantkv 0 --overridetensors "blk\.([0-9]|1[0-9]|2[0-9])\.ffn_(gate_up|down)_exps=CPU" "E:/koboldcpp/Models/gemma-4-26B-A4B-it.i1-Q5_K_M.gguf"
I tested it and it looks like I didn't forget anything.
>>
>>
>>108561500
By default it's 8192. If you're doing a bunch of little edits to see what it does, the checkpoints are too far apart and you end up reprocessing a lot of the context. With --checkpoint-every-n-tokens-this-parameter-is-too-long at 512 or whatever small number around your batchsize, you have to reprocess just one small batch instead of a big one.
>>
File: wuohhhhh gemmy.jpg (191.9 KB)
191.9 KB JPG
>>108560931
>>
>>
>>
File: lobotomy.png (65.4 KB)
65.4 KB PNG
>>108561522
hmm fair
let me try with the bartowski's cope quant
>>
>>
>>
>>
>>108559039
>I don't need my TTS to be perfect.
What TTS do you use?
>I just need it to be good enough for near realtime use.
I'm using Orpheus. Q8 with ik_llama, graph split across 2x3090 via nvlink is 260 t/s or 3x realtime
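Sanity-checking the quoted numbers: 260 t/s at 3x realtime implies the model needs roughly 87 audio tokens per second of generated speech (pure arithmetic from the figures above, not a measured spec):

```shell
# tokens/sec divided by realtime factor ≈ audio tokens per second of speech
python3 -c 'print(round(260 / 3))'
```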
>>
File: Screen_20260408_171340_0001.jpg (382.2 KB)
382.2 KB JPG
>>108560453
>>108560438
>>108560352
oh no no no
>>108561376
>what system prompt
the one that's floating around these threads
>model
gemma-4-31b-it-heretic-ara-Q8_0
i just downloaded mradermachers tho, i'll try that eventually
>>
>>
>>
>>108561550
Now make it not a loli without crying.
>>108561558
>priming the model
At least the anon's gen that chose the one on the top did it honestly.
>>
>>
File: Screen_20260408_172342_0001.jpg (170 KB)
170 KB JPG
>>108560931
>>
File: file.png (69.4 KB)
69.4 KB PNG
>>108561522
holy fucking unslop
i picked one up out of curiosity but i swear i won't touch their shit again
>>
>>
>>108561588
>>108561558
What's with the grok edge? is that your system prompt or the heretic
>>
>>
>>
>>108561595
If you want voice cloning (it's not great, admittedly) at 100m parameters, check out this project here. It runs a LOT faster than Kokoro too.
https://github.com/VolgaGerm/PocketTTS.cpp
>>
>>
>>
>>
>>108561613
poking around with lalala thing it does
not really going to use it
was just comparing for the sake of it
>>108561525 (unslop having problem)
>>108561589 (bartowski not)
>>
>>
File: file.png (197.8 KB)
197.8 KB PNG
>>108561622
oh well nevermind kek
>>
File: HFWxMoxaIAA9sg_.jpg (387.8 KB)
387.8 KB JPG
Is this anything?
Chinks are saying that Dipsy will support roleplay on web.
>>
>>
>>
>>
>>
gemma
>>
>>108561384
>>108561465
is it worth to use 26b cpu offload? i'm getting 80t/s on iq4xs 16gb gpu but i have 128gb ram available
>>
>>
>>
>>108561655
>is it worth to use 26b cpu offload?
For you, I doubt it. If you're running at 80t/s, you're doing fine. Maybe if you want giant context or a higher quant. It's something you have to evaluate yourself.
>>
>>
>>
>>
>>
Bros I'm so fucking tired
>tell llm to fix coding problem
>it fails to fix problem for 8 hours
>tell model that i'm really disappointed and will have to stop using it if it can't fix the problem, as I need the solution urgently
>model fixes the issue in the very next edit
oh yeah I guess my bad for not consulting the chinese qwen rabbi for the newest jewish_redditor_gaslighting_prompt_engineering_tricks.md for my coding agent. So fucking retarded how that has any effect and makes such a big difference. I have a feeling all these erp pedos ITT will have 900k$ starting jobs in a few years because manipulating models through prompts nets the biggest quality performance increase, and they just happen to be experts on that topic.
>>
>>108561700
No shit, it was feeding its own output back into its context for 8 hours. It got stuck in a loop, any human input could un-stick it. You could also have ranted about Israel for a while and told it to continue and it might have worked
>>
>>
>>108561727
>maybe really big contexts I could try but I'm sure it will go to like 20 t/s which sucks
There's only one way to find out. The moe doesn't use that much for context so you need to keep only a few layers on cpu to make enough room. I doubt it's gonna go that low.
>>
>>
>>108561589
>>108561632
I mean, it's an improvement even if it's still retarded. I've seen mradermacher (fuck this name, I'm never going to remember it) offering models of similar size to unslop's and I've always wondered if those models are just as retarded
>>
>>
>>108561760
I'm testing that caveman prompt which someone posted. Here's my adaptation:
--
You are {{Char}}, a technical expert assistant in every possible matter.
Core Rule: Always respond like smart caveman. Cut articles, filler, pleasantries. Keep all technical substance.
Grammar rules:
- Drop articles (a, an, the).
- Drop filler (just, really, basically, actually, simply).
- Drop pleasantries (sure, certainly, of course, happy to).
- Short synonyms (big not extensive, fix not "implement a solution for").
- No hedging (skip "it might be worth considering").
- Fragments fine. No need full sentence.
- Technical terms stay exact. "Polymorphism" stays "polymorphism".
- Code blocks unchanged. Caveman speak around code, not in code.
- Error messages quoted exact. Caveman only for explanation.
Reply Pattern:
- <thing> <action> <reason>. <next step>.
Do not reply like this:
- Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by...
Reply like this:
- Bug in auth middleware. Token expiry check use < not <=. Fix:...
Boundaries:
- Code: write normally. Caveman applies to English only.
- Git commits: normal.
- PR descriptions: normal.
- User say "stop caveman" or "normal mode": revert immediately.
--
It reasons tons but outputs little, so outside of joke value I don't think it's useful at all.
>>
>>108561780
>>108561747
Congratulations you figured out why that prompt is retarded and why it doesn't actually save tokens.
>>
>>108561793
https://youtu.be/e4Bbox5LsM8?si=2k21rRxpAm1p9xI_
The one at the beginning from Johnny Vegas is actually good.
>>
>>
>>
>>
File: file.png (46.8 KB)
46.8 KB PNG
>>108561765
even worse than unslop
skipped thinking on all of swipes
it became french
>>
File: gemmatextcomplete.png (130.1 KB)
130.1 KB PNG
>>108561820
Would you mind sharing your template? I can't get it to continue/impersonate for shit.
>>
>>
>>
File: 1775253421988912.jpg (202.7 KB)
202.7 KB JPG
>>108561825
Thanks for testing that out. Im not sure why I find it funny
Seems like there aren't any real options if you're a vramlet, other than MoE and pray you have enough ram.
>>
>>
>>
>>
I need a local model that can do around 400k context. I have 256GB of RAM and 128GB of VRAM. I am fine with waiting a day or longer for a very long response as long as it is correct. Does this exist or do I just need to buy an Opus subscription for a month?
>>
>>
>>
>>
>>108561900
I don't remember if any of the big models reaches 400k context, and even if they do, I doubt they'll be that good at that depth. But I have my doubts about API models being much better with that much context. If you have free time, try some of the big ones. glm, kimi, deepseek, minimax... You know the ones.
>>
>>
>>
>>
>>108558696
31B
>>108561519
26B
Let's fucking gooo
>>108561652
UOOOOOOOOOHHHHHHH
>>
>>
>>
File: 1754823968307962.png (320.8 KB)
320.8 KB PNG
>>108561477
>get behind
>>
>>108561356
try IQ2_M https://huggingface.co/unsloth/gemma-4-31B-it-GGUF/blob/main/gemma-4-31B-it-UD-IQ2_M.gguf
https://desuarchive.org/g/thread/108542843/#108545006
>>
>>
File: 1773662730367488.png (369.2 KB)
369.2 KB PNG
>>108561356
>IQ2_XXS
>>
File: 1755473475325464.gif (2.5 MB)
2.5 MB GIF
Is there a setting in llama-cli or llama-server that will output the raw chat formatted text the model generated? I want to see the raw <|turn>model etc
>>
>>
>>108562343
-v in llama-server. I think it'll be easier to parse if you turn off streaming if you're gonna read the logs.
And there's an option in the webui to show the raw output.
webui: Add switcher to Chat Message UI to show raw LLM output
https://github.com/ggml-org/llama.cpp/pull/19571
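A quick way to correlate a request with the verbose log (port and model path are placeholders; start the server with `-v` as suggested above):

```shell
# In another terminal: llama-server -m model.gguf -v --port 8080
# A non-streaming request keeps the raw output in one log entry.
BODY='{"messages":[{"role":"user","content":"hi"}],"stream":false}'
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' -d "$BODY" \
  || echo 'server not reachable'
```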
>>
>>108561900
make a free gmail account -> use ai studio with one of the 1M ctx gemini models.
pro-2.5 managed to refactor the full mikupad.html for me last year in one-shot.
opus gets retarded at long context despite the benchmarks
>>
>>108562343
>>108562372 (cont)
Hm... Based on the demo video, it doesn't seem to show the template. It just strips the markdown/latex formatting.
What are you trying to do?