The bait, then the rug-pull.
The hook is the 80/20 of short-form video: not the edit, not the platform, not the algorithm. Kallaway studied thousands of videos and found that virtually every high performer traces back to one of six archetypes, all of which do the same thing: manufacture contrast between what the viewer currently believes and what they're about to see.
Where the time goes.
01 · Cold open + promise
States the thesis (hooks are the 80/20) and promises a breakdown of the six formats behind virtually every top-performing video.
02 · Psychology of hooks: contrast and curiosity loops
Establishes the underlying mechanic: a hook creates a curiosity loop by introducing contrast (the viewer believes A, the creator shows B). The bigger the distance between A and B, the stronger the hook.
03 · The 6 hook archetypes
Fortune Teller, Experimenter, Teacher, Magician (with a "visual pacifier" sub-type), Investigator, Contrarian. Each is explained with real examples and tactical A-B-C steps. Demonstrates that any topic can be hooked with any archetype; the choice is driven by the key visual available.
04 · The 4 hook components and the comprehension model
Introduces spoken, visual, text, and audio as the four components that must align. Explains the visual-audio-visual perception sandwich and why eyes dominate. The key visual should be chosen before writing a single word.
05 · The 5-step Golden Approach to writing hooks
Step by step: (1) identify the key visual, (2) find the highest-contrast angle, (3) write the spoken hook, (4) add on-screen text, (5) gut-check comprehension. Start visual, not verbal.
06 · Two live hook teardowns
Case 1 (15M views, life-sized floor plans): strong key visual, Fortune Teller archetype, full alignment. Case 2 (under 100K views, generative world models): abstract text hook, jargon without matching visuals, comprehension loss at every level.
Named ideas worth stealing.
The 6 Hook Archetypes
- Fortune Teller
- Experimenter
- Teacher
- Magician
- Investigator
- Contrarian
Six categories that cover the contrast-building strategy of virtually every viral hook. Each can be applied to any topic.
The 4 Hook Components
- Spoken hook
- Visual hook
- Text hook
- Audio hook
The four layers that must align for a hook to achieve maximum comprehension. Misalignment between any two causes viewer churn.
The Golden Approach to Hooks
- Look at what visuals you have
- Find the highest-contrast angle among your facts
- Write the spoken hook (context lean, contrast, contrarian snapback)
- Decide what on-screen text supports the visual
- Gut-check: does the viewer reach full comprehension across the visual → audio → visual pass?
A five-step writing sequence that starts from the key visual rather than the script.
Context Lean → Contrast → Contrarian Snapback
The three-beat internal structure of a spoken hook. First describe exactly what the viewer sees (context lean), then introduce the tension (contrast), then land your take or future claim (contrarian snapback).
Lines you could clip.
"The biggest difference between winning and losing on social media is the hook."
"The difference between five hundred and five hundred thousand views is unlocking max alignment between those four things."
"It's not what you say, it's the visual. I call this the key visual."
"If there isn't clarity and alignment, you probably should throw out the video because there's always more ideas that you could make."
How they asked for the click.
"I built Sandcastles, which is a software to infuse all of these learnings and formulas and formats into it. All you have to do is put your video in and we'll do it for you."
Soft: mentioned naturally as the solution for viewers who don't want to do all the work themselves, with a free-trial link. Secondary CTA: the free Wavy World community.
The visual decides the hook, not the other way around.
Most creators write the spoken hook first. That is the mistake: the visual is what viewers process first, fastest, and with the most retention.
- Hooks work by creating contrast between what the viewer currently believes and what the video will show — the wider the gap, the more attention you capture before a word is spoken.
- Viewers process video in a visual-then-audio-then-visual sequence because, by Kallaway's estimate, eyes take in information 10 to 100 times faster than ears; everything in the hook has to survive that three-pass scan.
- The key visual should be chosen before a single word of the spoken hook is written — your archetype choice and scripting should flow from the strongest visual available, not from the idea alone.
- If your spoken hook uses abstract language or jargon that has no direct visual confirmation on screen, comprehension breaks at the second visual pass and viewers churn — this is the root cause of most hook failures.
- Any video topic can be framed through any of the six archetypes; the constraint is which key visual you actually have access to, not which archetype feels right for the subject matter.
- The Magician archetype functions as a modifier, not a standalone format — it creates an initial scroll-stop moment that can precede any of the other five.
- If you watch your own hook on mute and the visual alone does not clearly signal what the video is about, the hook will underperform regardless of how well-written the spoken lines are.
- The right response to a weak key visual is often to not make the video — there are always more ideas, and a video with no strong visual anchor rarely recovers in the hook.