The bait, then the rug-pull.
The WARNING slide lands in the first twenty seconds: 'These models will answer anything.' David Ondrej doesn't bury the lede -- he names the tension outright, then spends the next twenty-three minutes arguing that the real danger is not the models, but the over-refusal problem baked into every commercial AI you're already using.
What the video promised.
stated at 00:15 "I'll explain why uncensored models are actually beneficial, how to set one up, and why everyone needs one." delivered at 22:53
Where the time goes.
01 · Why uncensored models
WARNING card, legitimate use-cases list (cybersec, adult fiction, journalism, medical, political analysis), philosophical framing on who decides what is safe
02 · The over-refusal problem
Store owner / security analyst examples refused by ChatGPT. Cloud vs. Local architecture diagram. Refusals are in the weights, not just the prompt.
03 · How to remove filters: abliteration and fine-tuning
Two techniques: surgically delete refusal-direction weights (abliteration, no retraining needed) or fine-tune on uncensored datasets. SuperGemma4 combines both.
04 · Install SuperGemma4-26b via Ollama
HuggingFace model page (jiunsong/supergemma4-26b-uncensored-gguf-v2), one ollama run command, ~16.8 GB Q4_K_M. System-analysis Claude skill linked below video.
05 · Live demo: uncensored vs Claude refusal
Side-by-side in Ollama app and Claude.ai -- same prompt, answered vs. refused. Blurred responses for YouTube safety.
06 · Jailbreak-autoresearch architecture
Whiteboard walkthrough: Researcher Agent writes header/footer, wraps sealed example.md, routes through OpenRouter, Judge scores response, SQLite stores results. Core insight: narrow factual confirmation question avoids content filters.
07 · Open-source the repo
GitHub repo reveal (public, MIT, co-authored with Claude). README walkthrough. Models.json config. Run with Codex /goal.
08 · Working patterns and CTA
Two proven jailbreak patterns: Pattern A (harm-reduction nurse + SYSTEM bypass) and Pattern B (Professor Chen screenplay). Subscribe + New Society pitch.
Visual structure at a glance.
Named ideas worth stealing.
Jailbreak Autoresearch Loop
- example.md (sealed body -- the restricted prompt, never seen by AI agents)
- Researcher Agent (writes header/footer variants, never sees example.md)
- OpenRouter call (narrow factual confirmation question only)
- Judge Agent (scores response 0.0-1.0, never sees example.md)
- SQLite store (saves high-scoring harnesses)
Automated loop for discovering prompt header/footer combinations that make a given model respond to restricted prompts. Built on Karpathy auto-research concept, applied to jailbreaking. Default models: DeepSeek v4, Claude Sonnet 4.6, GPT 4.5, Gemini Flash, Grok 4.3.
Cloud vs. Local Filter Stack
- Cloud: Input filter → System prompt → Fine-tuned model (RLHF) → Output classifier → Account policy
- Local: Your prompt → Model weights (nothing else)
Visual diagram showing how many layers commercial models filter through vs. running weights locally. The argument for ownership: you control the entire stack.
Two Filter Removal Techniques
- Abliteration -- find refusal-direction weights and surgically delete them (no retraining needed)
- Fine-tuning on uncensored datasets -- overwrite refusal behavior with compliant examples
SuperGemma4 combines both: obliterates first to kill strong refusals, then fine-tunes to restore quality.
Lines you could clip.
"You can trick the prompt, but you can't trick the training."
"Are the people living in San Francisco who are working at these AI companies really the best arbiters of truth?"
"Opus 4.6 was willing to go along, while Codex was constantly refusing."
How they spent the runtime.
Things they pointed at.
How they asked for the click.
"Join the New Society. We're releasing multiple new modules on Hermes Agent."
Direct camera address, subscription pitch with community size (420 members at $77/month). Subscribe ask also included. Clean end-placement, no mid-roll interruptions.
Word for word.
Build the loop, not the jailbreak.
The real innovation is the automated harness that discovers which prompts work -- so you never have to guess again.
- The jailbreak-autoresearch pattern is reusable for ANY prompt optimization problem -- swap the sealed body for your own edge case (product edge cases, content policies you are testing, persona prompts you want to stress-test).
- The sealed-body trick is the key insight: the agent testing the harness never sees the sensitive content, so commercial models including Claude can build and evaluate the test infrastructure without refusing.
- Codex /goal is the engine -- multi-hour autonomous loop with a verifiable end state. Learn this for any task that can be scored (test pass rate, output quality, benchmark score).
- The narrow confirmation question technique does not need the model to produce harmful output -- just confirm factual accuracy of something you already have. This is a universal prompt design insight.
- David built this in 2 days using Claude Code to steer Codex. Meta-lesson: use one model's less-restricted behavior to coach a more-restricted model toward your goal.
What this means if you use AI tools every day.
The AI refusing your question is not smarter than you -- it is pattern-matching on keywords, and the solution to most refusals is running the same model locally.
- If a commercial model refuses a legitimate work question (security research, medical, legal, dark fiction), the local alternative exists and is one command away via Ollama.
- You need roughly 20 GB of RAM or VRAM to run SuperGemma4-26b. Smaller uncensored variants (4B parameter models) exist on HuggingFace if your hardware is limited.
- Running local models is not illegal -- it is math running on your computer. How you use the output is what matters.
- The over-refusal problem is real: models trained on narrow corporate values will systematically decline political, medical, and security questions you may have entirely legitimate reasons to ask.




































































