Achieving Visual Consistency in Animation with Fine-tuning: A Journey with Amazon Nova Canvas
In the ever-evolving world of animated storytelling, achieving professional-grade visual consistency across multiple scenes is a paramount challenge. While careful prompt crafting can provide impressive results, it often falls short when it comes to maintaining uniformity in character appearances, expressions, and styles. This is where the magic of fine-tuning comes into play. In this blog post, we will explore how to elevate the consistency of animated characters by fine-tuning an Amazon Nova Canvas foundation model (FM). We will take inspiration from the animated short film Picchu produced by FuzzyPixel using AWS and walk through an automated workflow designed to streamline the storyboard creation process.
Solution Overview
To implement our comprehensive solution, we designed an automated workflow utilizing various AWS services that process video assets and prepare them for character-model fine-tuning.
Workflow Steps
- Upload Video Asset: A user uploads a video of the animated short film to an Amazon S3 bucket.
- Process Video: An Amazon Elastic Container Service (Amazon ECS) task is triggered to analyze the video.
- Extract Frames: The ECS task downsamples the video, selects frames containing key characters, and center-crops them to produce character images.
- Generate Captions: The ECS task invokes Amazon Nova Pro to generate a caption for each character image.
- Store Metadata: Captions and metadata are written back to the S3 bucket.
- Fine-tune the Model: From an Amazon SageMaker notebook, the user initiates a fine-tuning job on the extracted character images and their corresponding captions.
- Create Custom Model: The customized Amazon Nova Canvas model is created and deployed for inference, ensuring character consistency across scenes.
This two-phase workflow initially focuses on preparing the training data before shifting to the fine-tuning procedure and test inference. You can find example code in our GitHub repository linked below.
Preparing the Training Data
In the first phase, we aim to develop an automated video character extraction pipeline. Here’s how we accomplish this:
Creative Character Extraction
We begin by sampling video frames at fixed intervals (e.g., one frame per second) and leveraging Amazon Rekognition’s label detection and face collection search capabilities. This allows us to identify and track characters by matching their features against a pre-populated face collection.
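Here is a minimal sketch of this sampling-and-matching step using OpenCV and boto3. The collection ID, threshold, and helper name are illustrative; the complete pipeline is in the GitHub repository.

import boto3
import cv2

rekognition = boto3.client("rekognition")
COLLECTION_ID = "picchu-characters"  # illustrative; a face collection you populated beforehand

def sample_and_match(video_path, fps=1.0):
    """Sample roughly one frame per second and search each frame against the face collection."""
    cap = cv2.VideoCapture(video_path)
    step = max(1, int(cap.get(cv2.CAP_PROP_FPS) / fps))
    matches, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            _, jpg = cv2.imencode(".jpg", frame)
            try:
                resp = rekognition.search_faces_by_image(
                    CollectionId=COLLECTION_ID,
                    Image={"Bytes": jpg.tobytes()},
                    FaceMatchThreshold=90,
                    MaxFaces=5,
                )
                if resp["FaceMatches"]:
                    matches.append((frame_idx, frame, resp["FaceMatches"]))
            except rekognition.exceptions.InvalidParameterException:
                pass  # Rekognition found no face in this frame; skip it
        frame_idx += 1
    cap.release()
    return matches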
During the extraction phase, we center-crop the identified characters while applying a deduplication algorithm using the Amazon Titan Multimodal Embeddings model to ensure a diverse dataset. Redundant images can lead to model overfitting, so we adjust our similarity threshold to optimize the dataset’s quality.
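A minimal deduplication sketch, assuming the cropped character images are saved locally and using the Amazon Titan Multimodal Embeddings model through the Bedrock Runtime. The 0.85 similarity threshold is only a starting point to tune against your own footage.

import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed_image(path):
    """Return the Titan multimodal embedding for a local image file."""
    with open(path, "rb") as f:
        body = json.dumps({"inputImage": base64.b64encode(f.read()).decode("utf-8")})
    resp = bedrock_runtime.invoke_model(modelId="amazon.titan-embed-image-v1", body=body)
    return json.loads(resp["body"].read())["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def deduplicate(paths, threshold=0.85):
    """Keep an image only if it is sufficiently dissimilar to every image kept so far."""
    kept, kept_embs = [], []
    for path in paths:
        emb = embed_image(path)
        if all(cosine(emb, e) < threshold for e in kept_embs):
            kept.append(path)
            kept_embs.append(emb)
    return kept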
Data Labeling
After extracting the images, we generate captions using Amazon Nova Pro. This labeling step emphasizes each character's distinctive attributes, which helps the foundation model (FM) identify the characters accurately across varied contexts. The resulting JSONL file is uploaded to S3 for training.
For example, an entry might look like this:
{
    "image_ref": "s3://media-ip-dataset/characters/blue_character_01.jpg",
    "alt_text": "This animated character features a round face with large expressive eyes and a distinctive blue color scheme."
}
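Generating these captions takes one Converse API call per image. A minimal sketch, assuming the image bytes are already loaded; the prompt wording here is ours, and depending on your Region you may need a cross-Region inference profile ID instead of the bare model ID.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def caption_image(image_bytes):
    """Ask Amazon Nova Pro for a one-sentence description of a character crop."""
    resp = bedrock_runtime.converse(
        modelId="amazon.nova-pro-v1:0",  # or an inference profile such as us.amazon.nova-pro-v1:0
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
                {"text": "Describe this animated character's distinctive visual attributes in one sentence."},
            ],
        }],
    )
    return resp["output"]["message"]["content"][0]["text"]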
Human Verification
To ensure quality, especially in enterprise scenarios, it’s advisable to incorporate a human-in-the-loop approach. Amazon Augmented AI (Amazon A2I) can help quickly verify labeled data for accuracy.
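If you already have an A2I flow definition configured, routing a labeled image to human reviewers is a single API call. A sketch; the flow definition ARN and input payload are placeholders for whatever your worker task template expects.

import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

a2i.start_human_loop(
    HumanLoopName="picchu-caption-review-001",
    FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/caption-review",  # placeholder
    HumanLoopInput={"InputContent": json.dumps({
        "image_ref": "s3://media-ip-dataset/characters/blue_character_01.jpg",
        "alt_text": "This animated character features a round face with large expressive eyes and a distinctive blue color scheme.",
    })},
)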
Fine-tuning Amazon Nova Canvas
Now that we have our training data structured and ready, it’s time to fine-tune our Amazon Nova Canvas model.
Create a Fine-tuning Job
Through the Amazon Bedrock console, creating a fine-tuning job is straightforward. Users can customize hyperparameters, input data locations, and output storage details. A well-considered combination of hyperparameters can make all the difference in achieving the desired character consistency.
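You can also configure the same job programmatically with the AWS SDK for Python (boto3). A minimal setup might look like the following; the bucket, role ARN, and hyperparameter values are illustrative placeholders, and the hyperparameter keys supported for Nova Canvas customization are listed in the Amazon Bedrock documentation.

import boto3

bedrock = boto3.client("bedrock")

bucket = "media-ip-dataset"  # placeholder bucket
prefix = "picchu-finetune"
training_path = f"s3://{bucket}/{prefix}/train.jsonl"
roleArn = "arn:aws:iam::123456789012:role/BedrockCustomizationRole"  # placeholder IAM role

# Illustrative values only; consult the Bedrock docs for supported keys and ranges
hyperParameters = {
    "stepCount": "8000",
    "batchSize": "8",
    "learningRate": "0.00001",
}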
With these inputs defined, submitting the fine-tuning job is a single API call:
response_ft = bedrock.create_model_customization_job(
jobName="picchu-canvas-v0",
customModelName="picchu-canvas-v0",
roleArn=roleArn,
baseModelIdentifier="amazon.nova-canvas-v1:0",
hyperParameters=hyperParameters,
trainingDataConfig={"s3Uri": training_path},
outputDataConfig={"s3Uri": f"s3://{bucket}/{prefix}"}
)
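Deploying the Fine-tuned Model
After the fine-tuning job is complete (it can take up to 12 hours), you deploy the custom model by purchasing Provisioned Throughput for it in Amazon Bedrock, either on the console or through the SDK. A minimal sketch of the SDK route, assuming one model unit and no long-term commitment:

# Look up the ARN of the completed custom model, then provision throughput for it
custom_model_arn = bedrock.get_custom_model(modelIdentifier="picchu-canvas-v0")["modelArn"]

response_pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="picchu-canvas-v0-pt",
    modelId=custom_model_arn,
    modelUnits=1,
)
provisioned_model_id = response_pt["provisionedModelArn"]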
Testing the Fine-tuned Model
Once the Provisioned Throughput is live, you can invoke the custom model through the Amazon Bedrock Runtime to generate new, on-model images for potential sequels to Picchu. A condensed helper might look like this (the full version is in the GitHub repository):
def generate_image(prompt, num_of_images=3):
    request_body = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {"numberOfImages": num_of_images},
    }
    response = bedrock_runtime.invoke_model(modelId=provisioned_model_id, body=json.dumps(request_body))
    return json.loads(response["body"].read())["images"]
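The model returns base64-encoded images, so saving them to disk takes a few more lines (the prompt here is illustrative):

import base64

for i, img in enumerate(generate_image("the blue character waving from a mountain trail")):
    with open(f"scene_{i}.png", "wb") as f:
        f.write(base64.b64decode(img))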
Clean Up
To avoid unexpected AWS charges, make sure to clean up resources once testing is complete. This includes deleting the Provisioned Throughput, deleting the fine-tuned Amazon Nova model, and removing the Amazon SageMaker Studio domain if you created one for this walkthrough.
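The two Bedrock deletions can be done with boto3 (the SageMaker domain is most easily removed from the console):

# Release the provisioned throughput first, then delete the custom model
bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)
bedrock.delete_custom_model(modelIdentifier="picchu-canvas-v0")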
Conclusion
In this post, we demonstrated how to elevate character and style consistency in storyboarding by fine-tuning the Amazon Nova Canvas model in Amazon Bedrock. Our workflow fuses automated video processing, intelligent character extraction, and precise model customization, dramatically accelerating the storyboarding process. By investing in fine-tuning, creative teams can swiftly produce high-quality storyboards with a level of consistency that conventional prompt engineering alone rarely achieves.
Ready to elevate your storytelling? Start experimenting with Nova Canvas fine-tuning today!
About the Authors
Dr. Achin Jain is a Senior Applied Scientist at Amazon AGI. With over a decade of experience, he’s pioneering advancements in multi-modal foundation models.
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design scalable AI/ML solutions with a focus on computer vision.
Randy Ridgley is a Principal Solutions Architect specializing in real-time analytics and AI. With a focus on scalable data solutions, he helps enterprises extract actionable insights from diverse data streams.
For comprehensive example code and further information, check out our GitHub repository.