Transforming Pet Monitoring: How Tomofun Optimized Furbo’s Inference with AWS Inferentia2
Redefining Remote Pet Interaction: Tomofun’s Success with Furbo and AWS Inferentia2
Tomofun, a pet-tech startup headquartered in Taiwan, is redefining how pet owners interact with their pets from afar. Its Furbo Pet Camera combines smart camera technology with artificial intelligence (AI) so that pet parents can monitor behaviors such as barking and unusual activity in real time. Running that AI at scale, however, brought its own set of challenges.
The Challenge: Scaling AI Efficiently
Initially, Tomofun hosted its vision-language models on GPU-based Amazon Elastic Compute Cloud (Amazon EC2) instances. GPUs handle real-time inference well, but they become costly when monitoring runs continuously across hundreds of thousands of devices. Tomofun needed to improve cost efficiency without sacrificing model fidelity or throughput.
Because the inference workloads must run constantly to deliver real-time alerts, the GPU bill grew with every device added, and a more economical inference platform was essential. This is where AWS Inferentia2 came into play.
Solution Overview: Migrating to EC2 Inf2 Instances
Tomofun turned to Amazon EC2 Inf2 instances, which are powered by AWS Inferentia2 chips purpose-built for deep learning inference. Shifting to Inferentia2 required minimal changes to the existing system, so Tomofun could make the transition without rewriting large portions of its already optimized BLIP (Bootstrapping Language-Image Pre-training) codebase.
The architecture combines several AWS services to manage pet behavior detection. At the center sits Furbo's API, which orchestrates image streams from customer cameras to inference endpoints. Elastic Load Balancing (ELB) and Auto Scaling groups let the design scale in real time with the volume of incoming requests.
When a frame is captured, it flows through Amazon CloudFront and ELB before hitting the API servers dedicated to monitoring pet behavior. These servers forward requests to an Auto Scaling group designed specifically for model inference.
Real-Time Flexibility and Monitoring
A significant advantage of this new architecture is its ability to route inference requests between the GPU and Inferentia2 backends in real time. Amazon CloudWatch continuously monitors key operational metrics, including latency, throughput, and error rates, so Tomofun can react swiftly to performance changes.
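The source does not describe the routing policy itself, so the following is only a minimal sketch of one way such a decision could work: send most traffic to the Inferentia2 pool while its monitored latency and error rate stay within limits, and fall back to the GPU pool otherwise. The `BackendStats` fields, the thresholds, and the 90% share are all hypothetical values chosen for illustration; in practice the numbers would come from the monitored metrics.

```python
import random
from dataclasses import dataclass


@dataclass
class BackendStats:
    """Rolling health numbers for one backend (hypothetical fields)."""
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def choose_weights(inf2: BackendStats,
                   max_latency_ms: float = 500.0,   # hypothetical threshold
                   max_error_rate: float = 0.01,    # hypothetical threshold
                   inf2_share: float = 0.9) -> dict:
    """Return traffic weights per backend.

    If the Inferentia2 pool breaches either threshold, all traffic
    falls back to the GPU pool.
    """
    healthy = (inf2.p95_latency_ms <= max_latency_ms
               and inf2.error_rate <= max_error_rate)
    if not healthy:
        return {"gpu": 1.0, "inf2": 0.0}
    return {"gpu": 1.0 - inf2_share, "inf2": inf2_share}


def pick_backend(weights: dict, rng: random.Random) -> str:
    """Pick one backend name at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = choose_weights(BackendStats(p95_latency_ms=120.0, error_rate=0.001))
    print(weights, pick_backend(weights, rng))
```

Keeping the weight calculation separate from the per-request pick means the weights can be recomputed on each metrics refresh without touching the request path.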
Improving BLIP With Inferentia2
With the move to Inferentia2, Tomofun had to ensure that its BLIP model components were compatible with the new hardware. The approach was to create lightweight wrappers for the three essential components of the BLIP model: the Image Encoder, the Text Encoder, and the Text Decoder. By isolating these components, Tomofun could compile each one for Inferentia2 without altering the core BLIP architecture, so the compiled components slot into the existing inference pipeline.
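The wrapper idea can be illustrated with a small sketch. It is not Tomofun's code: it uses plain Python lists instead of tensors so that it runs anywhere, and `toy_text_encoder`, `TextEncoderWrapper`, and `MAX_TEXT_LEN` are hypothetical names. It assumes two things that compilation by tracing generally asks for: inputs of a fixed shape, and a simple calling convention with positional inputs and a tuple of outputs. The wrapped component itself is called unchanged.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MAX_TEXT_LEN = 8  # hypothetical fixed sequence length chosen at compile time


def pad_to_fixed_length(token_ids: Sequence[int], pad_id: int = 0,
                        max_len: int = MAX_TEXT_LEN) -> Tuple[List[int], List[int]]:
    """Truncate or pad token ids to max_len; return (ids, attention_mask)."""
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


class TextEncoderWrapper:
    """Adapts a component that takes keyword args and returns a dict.

    Only the calling convention changes: positional inputs in,
    a plain tuple out. The component's own logic is untouched.
    """

    def __init__(self, component: Callable[..., Dict[str, list]]):
        self.component = component

    def __call__(self, input_ids: List[int],
                 attention_mask: List[int]) -> Tuple[list]:
        out = self.component(input_ids=input_ids, attention_mask=attention_mask)
        return (out["last_hidden_state"],)


def toy_text_encoder(input_ids, attention_mask):
    """Stand-in for the real text encoder (hypothetical, for illustration)."""
    hidden = [float(i * m) for i, m in zip(input_ids, attention_mask)]
    return {"last_hidden_state": hidden}


if __name__ == "__main__":
    ids, mask = pad_to_fixed_length([5, 6, 7])
    wrapped = TextEncoderWrapper(toy_text_encoder)
    print(wrapped(ids, mask))
```

In a PyTorch migration, the equivalent wrapper would more likely be a `torch.nn.Module` whose `forward` takes and returns tensors, with one such wrapper per component; that mapping is an assumption here rather than something the source spells out.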
Stress Testing: Validating Performance
To confirm that the system could handle real-world workloads, Tomofun conducted stress tests that simulated practical scenarios such as detecting whether a dog was barking or playing. The results were promising: the EC2 Inf2 instances handled high-throughput request loads while maintaining low latency, and the tests confirmed an 83% cost reduction compared with the previous GPU setup.
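A stress test of this kind can be sketched as follows. This is a minimal illustration, not the harness Tomofun used: `fake_infer` is a hypothetical stand-in for a call to the inference endpoint, and the request count and concurrency are arbitrary. The two small cost helpers show the arithmetic behind a percentage reduction: an 83% reduction means the new cost per request is 17% of the old one.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def run_load_test(infer: Callable[[int], bool], total_requests: int,
                  concurrency: int) -> Dict[str, float]:
    """Send total_requests calls to `infer` from `concurrency` threads."""

    def timed_call(i: int) -> float:
        start = time.perf_counter()
        infer(i)
        return (time.perf_counter() - start) * 1000.0  # milliseconds

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    wall_seconds = time.perf_counter() - wall_start
    return {
        "requests": float(total_requests),
        "throughput_rps": total_requests / wall_seconds,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }


def cost_per_million(hourly_price: float, throughput_rps: float) -> float:
    """Cost of one million requests on an instance that sustains throughput_rps."""
    return hourly_price / (throughput_rps * 3600.0) * 1_000_000


def cost_reduction(old_cost: float, new_cost: float) -> float:
    """Fractional saving of new_cost relative to old_cost."""
    return 1.0 - new_cost / old_cost


def fake_infer(i: int) -> bool:
    """Hypothetical stand-in for an inference request (sleeps about 2 ms)."""
    time.sleep(0.002)
    return i % 2 == 0


if __name__ == "__main__":
    print(run_load_test(fake_infer, total_requests=200, concurrency=16))
```

Comparing `cost_per_million` for the two backends at their measured throughput, rather than comparing hourly prices alone, accounts for any difference in how many requests each instance type can sustain.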
Conclusions: A Model for Future Workloads
By moving to AWS Inferentia2, Tomofun cut inference costs while maintaining the throughput and latency that its global customer base depends on. Because the migration relied on lightweight wrappers, the core logic of the BLIP model remained untouched, and the inference fleet can still scale with demand.
Looking ahead, Tomofun plans to expand the use of Inferentia2 beyond vision-language models to include audio event detection and other innovative AI applications that could enhance pet-owner interactions.
Explore Further
For readers interested in optimizing their own AI workloads as Tomofun did, the AWS Neuron documentation provides a wealth of resources. Additionally, the Furbo website outlines the AI-powered features that keep pets safe and connected with their owners.
About the Authors
Chen-Hsin Ding, a Staff Machine Learning Engineer at Tomofun, specializes in Generative AI and MLOps best practices.
Ray Wang and Howard Su are Senior Solutions Architects at AWS, bringing their extensive experience in cloud solutions and software development to a diverse range of innovative projects.
Through its commitment to precision, efficiency, and continuous technological integration, Tomofun exemplifies how startups can harness AI to better engage with the world around them—starting with our beloved pets.