Achieving Visual Consistency in Animation with Fine-tuning: A Journey with Amazon Nova Canvas
In the ever-evolving world of animated storytelling, achieving professional-grade visual consistency across multiple scenes is a paramount challenge. While careful prompt crafting can provide impressive results, it often falls short when it comes to maintaining uniformity in character appearances, expressions, and styles. This is where the magic of fine-tuning comes into play. In this blog post, we will explore how to elevate the consistency of animated characters by fine-tuning an Amazon Nova Canvas foundation model (FM). We will take inspiration from the animated short film Picchu produced by FuzzyPixel using AWS and walk through an automated workflow designed to streamline the storyboard creation process.
Solution Overview
To implement our comprehensive solution, we designed an automated workflow utilizing various AWS services that process video assets and prepare them for character-model fine-tuning.
Workflow Steps
- Upload Video Asset: A user uploads a video of the animated short film to an Amazon S3 bucket.
- Process Video: An Amazon Elastic Container Service (Amazon ECS) task is triggered to analyze the video.
- Extract Frames: The ECS task downsamples the video, selects frames containing key characters, and center-crops them to produce character images.
- Generate Captions: The ECS task invokes Amazon Nova Pro to generate a caption for each character image.
- Store Metadata: Captions and metadata are written back to the S3 bucket.
- Fine-tune the Model: From an Amazon SageMaker notebook, the user initiates a fine-tuning job on the extracted character images and their corresponding captions.
- Create Custom Model: The customized Amazon Nova Canvas model is created and deployed for inference, ensuring character consistency across scenes.
This two-phase workflow initially focuses on preparing the training data before shifting to the fine-tuning procedure and test inference. You can find example code in our GitHub repository linked below.
Preparing the Training Data
In the first phase, we aim to develop an automated video character extraction pipeline. Here’s how we accomplish this:
Creative Character Extraction
We begin by sampling video frames at fixed intervals (e.g., one frame per second) and leveraging Amazon Rekognition’s label detection and face collection search capabilities. This allows us to identify and track characters by matching their features against a pre-populated face collection.
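Here is a minimal sketch of this sampling-and-matching step using OpenCV and boto3. The collection ID, threshold, and helper name are illustrative; the complete pipeline is in the GitHub repository.

import boto3
import cv2

rekognition = boto3.client("rekognition")
COLLECTION_ID = "picchu-characters"  # illustrative; a face collection you populated beforehand

def sample_and_match(video_path, fps=1.0):
    """Sample roughly one frame per second and search each frame against the face collection."""
    cap = cv2.VideoCapture(video_path)
    step = max(1, int(cap.get(cv2.CAP_PROP_FPS) / fps))
    matches, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            _, jpg = cv2.imencode(".jpg", frame)
            try:
                resp = rekognition.search_faces_by_image(
                    CollectionId=COLLECTION_ID,
                    Image={"Bytes": jpg.tobytes()},
                    FaceMatchThreshold=90,
                    MaxFaces=5,
                )
                if resp["FaceMatches"]:
                    matches.append((frame_idx, frame, resp["FaceMatches"]))
            except rekognition.exceptions.InvalidParameterException:
                pass  # Rekognition found no face in this frame; skip it
        frame_idx += 1
    cap.release()
    return matches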
During the extraction phase, we center-crop the identified characters while applying a deduplication algorithm using the Amazon Titan Multimodal Embeddings model to ensure a diverse dataset. Redundant images can lead to model overfitting, so we adjust our similarity threshold to optimize the dataset’s quality.
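A minimal deduplication sketch, assuming the cropped character images are saved locally and using the Amazon Titan Multimodal Embeddings model through the Bedrock Runtime. The 0.85 similarity threshold is only a starting point to tune against your own footage.

import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed_image(path):
    """Return the Titan multimodal embedding for a local image file."""
    with open(path, "rb") as f:
        body = json.dumps({"inputImage": base64.b64encode(f.read()).decode("utf-8")})
    resp = bedrock_runtime.invoke_model(modelId="amazon.titan-embed-image-v1", body=body)
    return json.loads(resp["body"].read())["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

def deduplicate(paths, threshold=0.85):
    """Keep an image only if it is sufficiently dissimilar to every image kept so far."""
    kept, kept_embs = [], []
    for path in paths:
        emb = embed_image(path)
        if all(cosine(emb, e) < threshold for e in kept_embs):
            kept.append(path)
            kept_embs.append(emb)
    return kept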
Data Labeling
After extracting the images, we generate captions using Amazon Nova Pro. This labeling step emphasizes each character's distinctive attributes, which helps the foundation model (FM) identify the characters accurately across varied contexts. The resulting JSONL file is uploaded to S3 for training.
For example, an entry might look like this:
{
    "image_ref": "s3://media-ip-dataset/characters/blue_character_01.jpg",
    "alt_text": "This animated character features a round face with large expressive eyes and a distinctive blue color scheme."
}
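Generating these captions takes one Converse API call per image. A minimal sketch, assuming the image bytes are already loaded; the prompt wording here is ours, and depending on your Region you may need a cross-Region inference profile ID instead of the bare model ID.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def caption_image(image_bytes):
    """Ask Amazon Nova Pro for a one-sentence description of a character crop."""
    resp = bedrock_runtime.converse(
        modelId="amazon.nova-pro-v1:0",  # or an inference profile such as us.amazon.nova-pro-v1:0
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
                {"text": "Describe this animated character's distinctive visual attributes in one sentence."},
            ],
        }],
    )
    return resp["output"]["message"]["content"][0]["text"]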
Human Verification
To ensure quality, especially in enterprise scenarios, it’s advisable to incorporate a human-in-the-loop approach. Amazon Augmented AI (Amazon A2I) can help quickly verify labeled data for accuracy.
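If you already have an A2I flow definition configured, routing a labeled image to human reviewers is a single API call. A sketch; the flow definition ARN and input payload are placeholders for whatever your worker task template expects.

import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

a2i.start_human_loop(
    HumanLoopName="picchu-caption-review-001",
    FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/caption-review",  # placeholder
    HumanLoopInput={"InputContent": json.dumps({
        "image_ref": "s3://media-ip-dataset/characters/blue_character_01.jpg",
        "alt_text": "This animated character features a round face with large expressive eyes and a distinctive blue color scheme.",
    })},
)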
Fine-tuning Amazon Nova Canvas
Now that we have our training data structured and ready, it’s time to fine-tune our Amazon Nova Canvas model.
Create a Fine-tuning Job
Through the Amazon Bedrock console, creating a fine-tuning job is straightforward. Users can customize hyperparameters, input data locations, and output storage details. A well-considered combination of hyperparameters can make all the difference in achieving the desired character consistency.
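You can also configure the same job programmatically with the AWS SDK for Python (boto3). A minimal setup might look like the following; the bucket, role ARN, and hyperparameter values are illustrative placeholders, and the hyperparameter keys supported for Nova Canvas customization are listed in the Amazon Bedrock documentation.

import boto3

bedrock = boto3.client("bedrock")

bucket = "media-ip-dataset"  # placeholder bucket
prefix = "picchu-finetune"
training_path = f"s3://{bucket}/{prefix}/train.jsonl"
roleArn = "arn:aws:iam::123456789012:role/BedrockCustomizationRole"  # placeholder IAM role

# Illustrative values only; consult the Bedrock docs for supported keys and ranges
hyperParameters = {
    "stepCount": "8000",
    "batchSize": "8",
    "learningRate": "0.00001",
}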
With these inputs defined, submitting the fine-tuning job is a single API call:
response_ft = bedrock.create_model_customization_job(
jobName="picchu-canvas-v0",
customModelName="picchu-canvas-v0",
roleArn=roleArn,
baseModelIdentifier="amazon.nova-canvas-v1:0",
hyperParameters=hyperParameters,
trainingDataConfig={"s3Uri": training_path},
outputDataConfig={"s3Uri": f"s3://{bucket}/{prefix}"}
)
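Deploying the Fine-tuned Model
After the fine-tuning job is complete (it can take up to 12 hours), you deploy the custom model by purchasing Provisioned Throughput for it in Amazon Bedrock, either on the console or through the SDK. A minimal sketch of the SDK route, assuming one model unit and no long-term commitment:

# Look up the ARN of the completed custom model, then provision throughput for it
custom_model_arn = bedrock.get_custom_model(modelIdentifier="picchu-canvas-v0")["modelArn"]

response_pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="picchu-canvas-v0-pt",
    modelId=custom_model_arn,
    modelUnits=1,
)
provisioned_model_id = response_pt["provisionedModelArn"]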
Testing the Fine-tuned Model
Once the Provisioned Throughput is live, you can invoke the custom model through the Amazon Bedrock Runtime to generate new, on-model images for potential sequels to Picchu. A condensed helper might look like this (the full version is in the GitHub repository):
def generate_image(prompt, num_of_images=3):
    request_body = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {"numberOfImages": num_of_images},
    }
    response = bedrock_runtime.invoke_model(modelId=provisioned_model_id, body=json.dumps(request_body))
    return json.loads(response["body"].read())["images"]
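The model returns base64-encoded images, so saving them to disk takes a few more lines (the prompt here is illustrative):

import base64

for i, img in enumerate(generate_image("the blue character waving from a mountain trail")):
    with open(f"scene_{i}.png", "wb") as f:
        f.write(base64.b64decode(img))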
Clean Up
To avoid unexpected AWS charges, make sure to clean up resources once testing is complete. This includes deleting the Provisioned Throughput, deleting the fine-tuned Amazon Nova model, and removing the Amazon SageMaker Studio domain if you created one for this walkthrough.
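The two Bedrock deletions can be done with boto3 (the SageMaker domain is most easily removed from the console):

# Release the provisioned throughput first, then delete the custom model
bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)
bedrock.delete_custom_model(modelIdentifier="picchu-canvas-v0")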
Conclusion
In this post, we demonstrated how to elevate character and style consistency in storyboarding by fine-tuning the Amazon Nova Canvas model in Amazon Bedrock. Our workflow fuses automated video processing, intelligent character extraction, and precise model customization, dramatically accelerating the storyboarding process. By investing in fine-tuning, creative teams can swiftly produce high-quality storyboards with a level of consistency that conventional prompt engineering alone rarely achieves.
Ready to elevate your storytelling? Start experimenting with Nova Canvas fine-tuning today!
About the Authors
Dr. Achin Jain is a Senior Applied Scientist at Amazon AGI. With over a decade of experience, he’s pioneering advancements in multi-modal foundation models.
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design scalable AI/ML solutions with a focus on computer vision.
Randy Ridgley is a Principal Solutions Architect specializing in real-time analytics and AI. With a focus on scalable data solutions, he helps enterprises extract actionable insights from diverse data streams.
For comprehensive example code and further information, check out our GitHub repository.