WEBVTT

00:00:00.000 --> 00:00:32.230
Here's the easy way to create AI videos. In this tutorial, I'm showing you exactly how I generated this realistic forty-four-second fight scene and how I made this seventy-one-second short film, and I'll show you how to extend them to be as long as you want. This is by far the simplest method to generate long AI videos. The first step is to use AI to generate a storyboard just like this one. It's gonna divide our long AI video into separate scenes. The tool we're gonna use for this is the newest GPT Image 2 model.

00:00:32.390 --> 00:00:40.710
What this tool is amazing at is reasoning and generating text, and that's gonna come in really handy for us when we're trying to generate these long storyboards.

00:00:40.905 --> 00:00:44.665
To use GPT Image 2, I'm inside Higgsfield AI,

00:00:44.825 --> 00:00:50.185
and you can find it on the home page, or we can go to the list of image generators

00:00:50.265 --> 00:00:58.870
and find the new GPT Image 2 model. For the first example, I'm gonna generate a long AI video based around these two photos

00:00:59.030 --> 00:01:05.030
of the scientist in the hazmat suit and his robot companion who are exploring this toxic forest.

00:01:05.110 --> 00:01:06.710
So we have the GPT

00:01:06.710 --> 00:01:12.945
Image 2 model chosen, and what I'm gonna do is upload those two reference images of my characters.

00:01:13.265 --> 00:01:25.720
Let's put those inside here. Next, I'm gonna ask the GPT Image 2 model to create a full storyboard for us. The prompt can be super, super simple. Just a one-sentence description is gonna be enough to create a full storyboard.

00:01:25.800 --> 00:01:30.520
Now you have some other options, like choosing the quality,

00:01:30.920 --> 00:01:33.000
the resolution of the storyboard,

00:01:33.735 --> 00:01:41.095
and also the aspect ratio. I'd recommend setting this to 16:9. And then we just have to generate the AI storyboard.

00:01:41.175 --> 00:01:49.560
This is the fully built-out storyboard. It's got 12 complete shots in it that tell a full story, and underneath each individual shot,

00:01:49.720 --> 00:01:59.080
there's a small text description. Those little text descriptions are gonna be super useful later on because we're gonna use those as part of the prompts for our AI video generator.

00:01:59.475 --> 00:02:05.155
I did notice sometimes that when you're generating a bunch of separate panels like this, there is some repetition.

00:02:05.235 --> 00:02:18.310
So if you look at this image in panel three, which is the robot gesturing upwards, it ends up being the same image as shown inside panel 11. So we need to make a minor edit to the storyboard so we don't have this repetition.

00:02:18.390 --> 00:02:21.750
So inside Higgsfield, what I'm gonna do is hit reference,

00:02:21.990 --> 00:02:24.790
which is gonna let me edit this storyboard.

00:02:25.510 --> 00:02:32.715
And so there's that storyboard added into our image references. Let's delete all the other ones and also

00:02:33.035 --> 00:02:36.795
change the prompt. What we wanna do is change

00:02:36.955 --> 00:02:54.750
panel 11 so that it's not a repetition of panel three, and that's basically exactly what I'm gonna tell the AI to do. Just adjust shot 11 so it's not a repeat of shot number three. And looking at the result, it's swapped shot 11 to one of the scientist inspecting some kind of toolkit.

00:02:55.725 --> 00:03:08.190
So now that we have the storyboard sequence, how do we leverage this and turn it into a long AI video? We're gonna use an AI video model called Seedance 2.0. So we have a storyboard,

00:03:08.270 --> 00:03:13.950
and what Seedance can do for us is animate all of these shots in a single video generation

00:03:14.190 --> 00:03:28.045
at the same time. To use the Seedance video model, I'm gonna look through the video models on Higgsfield, and the top one is Seedance 2.0. So for this AI video model, the maximum duration of each video generation

00:03:28.205 --> 00:03:30.365
happens to be fifteen seconds.

00:03:30.525 --> 00:03:33.965
So then how do we create a super long AI video sequence

00:03:34.045 --> 00:03:36.045
of our entire storyboard?

00:03:36.290 --> 00:03:39.090
Well, if we try to fit this entire storyboard,

00:03:39.090 --> 00:03:43.650
all 12 shots into a single fifteen second video,

00:03:43.730 --> 00:03:51.885
there just isn't enough time inside those fifteen seconds to animate this entire storyboard. So I'm gonna split the storyboard up

00:03:52.125 --> 00:03:54.845
and crop out each individual

00:03:54.845 --> 00:04:01.885
row. So this is the first row I've cropped out, where I'm just taking the first four shots of that storyboard.

00:04:01.885 --> 00:04:27.445
And what we're gonna do is animate these four shots inside the fifteen-second video sequence, and it should be able to produce a pretty good result for us. A quick technical point: you will need to layer the cropped row on top of a 16:9 image. This is just to make sure that Higgsfield can actually use this image as a reference. For the prompt that I'm gonna use, what I'm gonna rely on is the pregenerated

00:04:27.525 --> 00:04:37.160
text descriptions that are already inside the storyboard. So what we're gonna do is tell it to generate a scene using the shots in the uploaded film storyboard.
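
NOTE
The row-cropping and 16:9 layering step mentioned earlier can be sketched as plain arithmetic. This is only an illustration, not the exact workflow from the video: the helper names, the 1920x1080 storyboard size, and the assumption that the 12 panels sit in three equal-height rows are all hypothetical.
```python
def row_boxes(width, height, rows=3):
    """Return one (left, top, right, bottom) crop box per storyboard row."""
    row_h = height // rows  # assumes rows of equal height
    return [(0, i * row_h, width, (i + 1) * row_h) for i in range(rows)]
def canvas_16x9(strip_w, strip_h):
    """Return the smallest exact 16:9 canvas that fits the strip, and the offset that centers it."""
    k = max(-(-strip_w // 16), -(-strip_h // 9))  # ceiling division
    canvas_w, canvas_h = 16 * k, 9 * k
    offset = ((canvas_w - strip_w) // 2, (canvas_h - strip_h) // 2)
    return (canvas_w, canvas_h), offset
boxes = row_boxes(1920, 1080)  # hypothetical storyboard size
left, top, right, bottom = boxes[0]
canvas, offset = canvas_16x9(right - left, bottom - top)
print(boxes[0], canvas, offset)
```
The crop box and paste offset could then be used in any image editor or imaging library to do the actual crop and layering.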

00:04:37.240 --> 00:04:41.240
Then for each of the shots inside this fifteen second

00:04:41.480 --> 00:04:56.025
video generation, I'm gonna give it a time frame of when I want the shot to happen. For example, the first four seconds. And then for the description of what happens in the shot, I'm basically just gonna copy the text description that's already written

00:04:56.905 --> 00:04:58.425
in the storyboard,

00:04:58.425 --> 00:05:07.880
and I'm gonna fill out the rest of the prompt using the same method. You'll also wanna add in this line at the bottom that says no music and no subtitles.
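
NOTE
The prompt structure described above can be sketched as a small helper. It is an illustration under assumptions: the function name, the exact wording of the prompt lines, and the placeholder shot descriptions are hypothetical, and splitting the fifteen seconds evenly is just one way to assign time frames.
```python
def build_prompt(shots, clip_seconds=15):
    """Assemble a timed shot list from storyboard text descriptions."""
    n = len(shots)
    lines = ["Generate a scene using the shots in the uploaded film storyboard."]
    for i, description in enumerate(shots):
        start = -(-(i * clip_seconds) // n)  # ceiling division keeps whole seconds
        end = -(-((i + 1) * clip_seconds) // n)
        lines.append(f"{start}-{end}s: {description}")
    lines.append("No music and no subtitles.")
    return "\n".join(lines)
# Placeholder descriptions; in practice, copy the text under each storyboard panel.
shots = ["Shot 1 description", "Shot 2 description", "Shot 3 description", "Shot 4 description"]
print(build_prompt(shots))
```
With four shots, the first shot gets the first four seconds, which matches the example time frame given in the video.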

00:05:07.880 --> 00:05:20.745
I found that that just makes it much easier later on when we're putting everything together. Before we go and use this prompt though, there's a really important addition we need to make to preserve character consistency.

00:05:21.065 --> 00:05:42.310
So if we just use the prompt exactly as written, we'll get results like this, which do basically animate the scene as expected, except the robot doesn't look quite the same. For starters, in this sequence the robot's legs look super long for some reason. He should be much shorter than that, and there's a lot of variation

00:05:42.390 --> 00:06:01.830
in the way that the character looks between different video generations. We need an additional character reference sheet so that when we generate the long AI video sequence, the characters actually stay consistent throughout the entire scene. To create a character reference sheet, I'm still gonna use the GPT Image 2 model, and I'm gonna upload the original

00:06:01.830 --> 00:06:29.360
shot of my robot in the forest. And then in the prompt, I'm gonna tell it to create a character reference sheet. I'm gonna put the prompt in the description so you can go and copy it, and the prompt is gonna give us this really useful character reference sheet to help us maintain consistency inside the videos. Now that we have all these different assets, we can start putting together our long AI video sequence. So first off, I'm gonna upload into Higgsfield the first four shots of our storyboard

00:06:29.360 --> 00:06:32.480
along with a character reference sheet of our robot.

00:06:32.640 --> 00:06:39.040
Let's drag that inside there. And inside the prompt, we're gonna write it exactly like I previously described.

00:06:39.605 --> 00:06:45.285
So first telling it to generate a scene using the shots inside the uploaded film storyboard.

00:06:45.445 --> 00:06:48.725
And here, I'm actually gonna make a reference

00:06:48.885 --> 00:06:51.125
to the film storyboard,

00:06:51.285 --> 00:06:57.640
uploaded image. So I'll type the @ symbol, which lets me tag different references,

00:06:57.720 --> 00:07:02.920
and then I'm gonna write out each of the individual scenes. Just basically copy in the

00:07:03.240 --> 00:07:05.240
text descriptions already in the storyboard.

00:07:05.765 --> 00:07:09.045
And then inside this prompt, I'm also gonna need to reference

00:07:09.125 --> 00:07:10.645
our character sheet

00:07:10.805 --> 00:07:18.325
for the robot as well so that the AI video model knows exactly what the robot should look like. So here, I'll add an additional tag

00:07:18.990 --> 00:07:21.470
next to the word robot

00:07:21.470 --> 00:07:28.110
for our character sheet. I'm actually also gonna do this in a few other places as well.

00:07:28.590 --> 00:07:34.030
And then in the settings, I'm just gonna make sure that I'm using the full fifteen second

00:07:33.155 --> 00:07:34.515
video duration.

00:07:35.395 --> 00:07:38.355
Now let's generate the video and see what it looks like.

00:07:53.550 --> 00:07:56.270
It actually looks like it animated five

00:07:56.430 --> 00:08:07.075
shots here instead of the four shots that I prompted for, but that's not a bad problem to have. It still followed all the shots inside the storyboard accurately

00:08:07.075 --> 00:08:22.400
and the robot looks great. And then using the same exact method, I went ahead and animated the rest of the storyboard. So here's the animation sequence for row number two of the storyboard, which is shots five through eight.

00:08:24.240 --> 00:08:26.800
And here are the last four shots of the storyboard.

00:08:30.575 --> 00:08:51.340
And now we have three separate fifteen-second video clips that are in sequence with each other, which means that we can combine them together into a full forty-five-second AI video sequence, but it doesn't stop here. We can actually extend our storyboard as many times as we want to. So looking at the sequence of shots inside the storyboard,

00:08:51.340 --> 00:09:11.570
what if we wanted to extend this so that they start exploring the forest even more? What we can do is actually use GPT Image 2 to generate the next 12 panels of the storyboard as well. So here's what we're gonna do. Inside the GPT Image 2 generator, we're gonna upload the original storyboard reference.

00:09:11.810 --> 00:09:15.490
So these are the first 12 shots that we just animated.

00:09:15.490 --> 00:09:23.385
And then I'm also gonna upload a character reference sheet for my scientist and also the character reference sheet that I generated for the robot.

00:09:23.545 --> 00:09:32.745
Then inside the prompt, I'm gonna ask it to simply generate the next 12 panels of the storyboard, and here's the prompt I'm gonna use. Generate the next page of the storyboard

00:09:32.960 --> 00:09:38.320
from image one, which continues the story with 12 panels using the uploaded images.

00:09:38.400 --> 00:09:41.360
The robot reveals hidden knowledge of the forest,

00:09:41.600 --> 00:09:46.880
guiding the scientists to a deeper, more dangerous core where the source of the toxic outbreak lies.

00:09:47.555 --> 00:09:50.995
And this is the extended storyboard that GPT

00:09:50.995 --> 00:10:03.480
Image 2 has created for us. And then I can use the same exact technique of animating four separate shots at the same time to turn this storyboard into three fifteen-second

00:10:03.480 --> 00:10:04.200
video

00:10:04.440 --> 00:10:06.040
generations as well.

00:10:08.040 --> 00:10:11.400
And if you do the math, using the two storyboard pages,

00:10:11.655 --> 00:10:17.735
we'll end up with six fifteen second video clips, which is a ninety second video sequence.
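
NOTE
The clip math above can be written out as a short calculation. This is a minimal sketch: the function name and its parameters are hypothetical, and the defaults of four shots per clip and fifteen seconds per clip are taken from the walkthrough.
```python
import math
def plan_clips(total_shots, shots_per_clip=4, clip_seconds=15):
    """Return (number of clips, total seconds) for a storyboard."""
    clips = math.ceil(total_shots / shots_per_clip)
    return clips, clips * clip_seconds
print(plan_clips(12))  # one storyboard page
print(plan_clips(24))  # two storyboard pages
```
One 12-panel page gives three clips and forty-five seconds, and two pages give six clips and ninety seconds, before any trimming.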

00:10:18.215 --> 00:10:22.695
I found that some of the sequences that I generated were a little bit repetitive,

00:10:23.175 --> 00:10:26.375
so I ended up trimming it down to a seventy one second

00:10:26.760 --> 00:10:28.040
video instead.

00:10:28.440 --> 00:10:33.960
But you can use this method and extend your videos for as long as you want.

00:10:35.240 --> 00:10:41.720
I'm gonna put a link in the description for Higgsfield AI if you wanna go and generate your own long video sequences

00:10:41.455 --> 00:10:57.940
using GPT Image 2 and Seedance 2.0. Now, one of the challenges we're gonna run into, especially when animating more dynamic, action-packed scenes like this fight scene that I created, is that because of how much action there is, I found that if we animate

00:10:58.180 --> 00:11:00.660
each of the rows inside the storyboard

00:11:00.740 --> 00:11:02.820
separately by themselves,

00:11:03.140 --> 00:11:06.820
it's much harder to combine them together seamlessly.

00:11:06.980 --> 00:11:09.380
So let's see how we can fix this problem.

00:11:09.945 --> 00:11:17.305
First off, for my fight scene, these are the characters and the environment that I want them to be in, and this is the storyboard

00:11:17.385 --> 00:11:24.720
and the prompt that I used inside GPT Image 2 to create this. Now when I go and animate each row of the storyboard separately,

00:11:24.960 --> 00:11:33.040
each of the individual animations looks really, really good. So this is the first row animated, and keep an eye on what happens in the last scene.

00:11:36.495 --> 00:11:41.855
The bounty hunter suddenly ambushes the female character and has her in this chokehold.

00:11:42.495 --> 00:11:49.695
Now when I go and animate the second row of the storyboard, it starts off with the first frame of them already engaged

00:11:49.695 --> 00:11:53.110
inside a fight scene. If we look at this individually,

00:11:53.110 --> 00:11:57.510
it's a really amazing looking AI video. However, if we try to combine

00:11:57.750 --> 00:12:01.270
those two separate fifteen second clips together,

00:12:01.590 --> 00:12:06.415
the transition is gonna look a little weird. So starting with the first video sequence

00:12:07.375 --> 00:12:16.975
and suddenly jumping into the second video sequence. That transition right there, it doesn't really make sense. She's in a chokehold and suddenly is free and engaged in a fight scene.

00:12:17.690 --> 00:12:27.130
So then how do we get the transitions to be seamless inside our long AI video? What we're gonna need to do is give the AI some extra information

00:12:27.370 --> 00:12:34.045
when generating each individual video clip. So first, I'm gonna use this tool called video frame extractor,

00:12:34.045 --> 00:12:41.165
and I'm gonna save that last image frame of our video sequence. Now when we go and generate the next four shots

00:12:41.245 --> 00:12:45.485
of our storyboard sequence, we can actually tell the AI to use the screenshot

00:12:45.690 --> 00:12:54.570
I just saved as the first frame and generate those next four shots starting from that initial screenshot.
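
NOTE
The video uses a tool called video frame extractor for this step. As a swapped-in alternative, the sketch below only builds an ffmpeg command line that saves the final frame of a clip. It assumes ffmpeg is installed, and the function name and file names are hypothetical.
```python
def last_frame_command(video_path, image_path):
    """Build an ffmpeg command that writes the final frame of a video to an image."""
    return [
        "ffmpeg",
        "-sseof", "-1",   # start reading one second before the end of the input
        "-i", video_path,
        "-update", "1",   # keep overwriting one image, so the last frame is what remains
        "-q:v", "2",      # high JPEG quality
        image_path,
    ]
cmd = last_frame_command("clip_01.mp4", "clip_01_last.jpg")  # hypothetical file names
print(" ".join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```
The saved image can then be uploaded as the starting frame reference for the next fifteen-second generation.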

00:12:54.810 --> 00:13:07.625
This is what it's gonna look like inside Higgsfield, first with all of our image references uploaded. And then inside the prompt, I'm gonna tell it to generate a scene using the uploaded film storyboard fight sequence

00:13:07.945 --> 00:13:10.345
starting with this image frame

00:13:10.425 --> 00:13:17.700
of the female character getting attacked. And then the rest of the prompt is written just like how it was done before,

00:13:17.860 --> 00:13:18.580
basically

00:13:18.900 --> 00:13:22.580
describing what happens inside each storyboard sequence.

00:13:22.740 --> 00:13:40.400
Now when we go and generate this scene, it should generate the next four shots starting with the initial image frame. And using this method, you can generate endless continuous shots for your AI films. Here's a quick note about using Higgsfield. When you upload image references,

00:13:40.800 --> 00:13:44.880
what Higgsfield is gonna do is check the eligibility

00:13:44.880 --> 00:13:46.240
of each image.

00:13:46.400 --> 00:13:49.680
This is to avoid any copyright issues.

00:13:50.000 --> 00:13:54.935
Now if you're using, like, a celebrity or a scene from a movie or something,

00:13:55.255 --> 00:13:56.615
it's gonna get denied.

00:13:57.015 --> 00:14:01.015
But sometimes when you upload images of your own characters

00:14:01.015 --> 00:14:03.255
as well, it can also get denied.

00:14:03.415 --> 00:14:13.220
So the first time that I tried to upload my reference sheet for my bounty hunter character, it actually got determined as not eligible.

00:14:13.700 --> 00:14:18.180
But then when I tried again, the next time it was determined

00:14:18.180 --> 00:14:25.705
as eligible to be used. So if you try to do this and you upload an image of yourself or your characters and it gets denied,

00:14:25.865 --> 00:14:37.590
just try uploading that image reference a few different times and eventually it might work. Let's take a look at what the full animated fight scene looks like so you can get an idea of what this method is capable of.
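
NOTE
The retry advice above can be sketched as a simple loop. This is an assumption-heavy illustration: upload stands in for whatever manual or automated upload step is used and is not a real Higgsfield API, and a retry may still never succeed.
```python
def upload_with_retries(upload, image_path, attempts=3):
    """Call upload(image_path) until it returns True; return the attempt number, or None."""
    for attempt in range(1, attempts + 1):
        if upload(image_path):
            return attempt
    return None
# Stand-in for the real upload step: rejected once, then accepted.
results = iter([False, True])
def fake_upload(path):
    return next(results)
print(upload_with_retries(fake_upload, "bounty_hunter_sheet.png"))
```
Capping the number of attempts keeps the loop from running forever when an image is never accepted.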

00:15:21.535 --> 00:15:28.095
If you also want a complete breakdown of 10 practical tips to generate the most realistic

00:15:28.095 --> 00:15:31.650
possible AI videos, go watch this guide right here.