Building a Visual Recommendation Engine for Men’s Fashion
Explore how to create a cutting-edge visual recommendation system that goes beyond traditional tags and titles, enhancing users’ shopping experiences with personalized suggestions based on image analysis.
Recommendation systems have become a staple in the way we consume content today. Applications like Netflix, Spotify, and Amazon harness these systems to recommend products or media based on user behavior and preferences. But what happens when we want to recommend items visually? This article walks you through the creation of a visual recommendation engine tailored for men’s fashion using image embeddings and Qdrant, a powerful vector database.
Learning Objectives
In this tutorial, we will cover:
- Understanding how image embeddings represent visual content.
- Using FastEmbed to generate vectors.
- Storing and searching these vectors with Qdrant.
- Building a feedback-driven recommendation system.
- Creating a user interface with Streamlit.
Use Case: Visual Recommendations for T-Shirts and Polos
Let’s say a user clicks on a trendy polo shirt. Instead of relying on product tags, our fashion recommendation system will recommend similar T-shirts and polos based solely on the image itself. Let’s dive into how this works!
Step 1: Understanding Image Embeddings
What Are Image Embeddings?
Image embeddings are numerical vectors that represent the key visual features of an image. Visually similar images have embeddings that lie close together in vector space, which lets the system measure similarity effectively. For instance, two T-shirts may differ pixel by pixel yet have similar embeddings because they share colors, patterns, or textures.
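To make "close in vector space" concrete, here is a minimal sketch of how similarity between two embeddings is typically measured with cosine similarity. The three-dimensional vectors below are toy values; real embeddings have hundreds of dimensions.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point the same way; values near 0 mean unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for two T-shirts and a pair of shoes
tshirt_a = np.array([0.9, 0.1, 0.3])
tshirt_b = np.array([0.8, 0.2, 0.25])
shoes = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(tshirt_a, tshirt_b))  # high score: visually similar
print(cosine_similarity(tshirt_a, shoes))     # lower score: visually different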
How Are Embeddings Generated?
We use deep learning models to extract visual patterns; classically these were Convolutional Neural Networks (CNNs), while newer architectures such as Vision Transformers (ViTs) are now common. Here, we will use FastEmbed with the Qdrant/Unicom-ViT-B-32 model, a Vision Transformer, to generate our embeddings.
from fastembed import ImageEmbedding
from typing import List
from dotenv import load_dotenv
import numpy as np
import os

load_dotenv()
# e.g. IMAGE_EMBEDDING_MODEL="Qdrant/Unicom-ViT-B-32"
model = ImageEmbedding(os.getenv("IMAGE_EMBEDDING_MODEL"))

def compute_image_embedding(image_paths: List[str]) -> List[np.ndarray]:
    # model.embed returns a generator yielding one vector per image
    return list(model.embed(image_paths))
This function takes a list of image paths and returns one vector per image, each encapsulating the visual essence of that image.
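For example, you could call it like this (the image paths are hypothetical; Unicom-ViT-B-32 is expected to produce 512-dimensional vectors, but verify against your model):
# Hypothetical image paths; replace with files from your dataset
embeddings = compute_image_embedding(["images/polo_01.jpg", "images/tshirt_02.jpg"])
print(len(embeddings))      # 2 -- one vector per image
print(embeddings[0].shape)  # e.g. (512,) for Unicom-ViT-B-32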
Step 2: Getting the Dataset
For this project, we will use a dataset of approximately 2,000 men’s fashion images available on Kaggle. Here’s how to load the dataset:
import shutil, os, kagglehub
from dotenv import load_dotenv

load_dotenv()
kaggle_repo = os.getenv("KAGGLE_REPO")  # the Kaggle dataset identifier
path = kagglehub.dataset_download(kaggle_repo)
target_folder = os.getenv("DATA_PATH")

def getData():
    # Copy the downloaded dataset into the project folder only once
    if not os.path.exists(target_folder):
        shutil.copytree(path, target_folder)
Step 3: Store and Search Vectors with Qdrant
Once we have our embeddings, we can store and search them using Qdrant, a scalable vector database.
from qdrant_client import QdrantClient

client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)
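The snippets below assume a collection named fashion_images already exists. If you need to create it yourself, a minimal sketch might look like this (the 512-dimensional size is an assumption based on Unicom-ViT-B-32; match it to your embedding model):
from qdrant_client import models

# Create the collection once, sized to match the embedding model's output
if not client.collection_exists("fashion_images"):
    client.create_collection(
        collection_name="fashion_images",
        vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE),
    )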
Here’s how we insert images paired with their embeddings into a Qdrant collection:
import uuid
from more_itertools import chunked
from qdrant_client import models

class VectorStore:
    def __init__(self, collection_name: str = "fashion_images", embed_batch: int = 32):
        self.client = client  # the Qdrant client created above
        self.collection_name = collection_name
        self.embed_batch = embed_batch

    def insert_images(self, image_paths: List[str]):
        for batch in chunked(image_paths, self.embed_batch):  # batches bound memory use
            embeddings = compute_image_embedding(batch)  # Batch embed
            points = [
                models.PointStruct(id=str(uuid.uuid4()), vector=emb.tolist(), payload={"image_path": img})
                for emb, img in zip(embeddings, batch)
            ]
            self.client.upload_points(collection_name=self.collection_name, points=points)
This code takes a list of image file paths, embeds them in batches, and uploads each embedding to Qdrant together with its file path as payload, keeping memory usage bounded even for large datasets.
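A typical way to index the whole dataset, assuming the DATA_PATH folder from Step 2 contains JPEG files, would be:
import glob, os

# Gather every JPEG under the dataset folder, then embed and upload them
image_paths = glob.glob(os.path.join(os.getenv("DATA_PATH"), "**", "*.jpg"), recursive=True)
store = VectorStore()
store.insert_images(image_paths)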
Search Similar Images
To find visually similar images, you can use this function:
def search_similar(query_image_path: str, limit: int = 5):
    # Embed the query image, then look up its nearest neighbors in Qdrant
    emb_list = compute_image_embedding([query_image_path])
    hits = client.search(collection_name="fashion_images", query_vector=emb_list[0].tolist(), limit=limit)
    return [{"id": h.id, "image_path": h.payload.get("image_path")} for h in hits]
Step 4: Create the Recommendation Engine with Feedback
Imagine the user likes some images and dislikes others—can the system adapt? With Qdrant, it can!
class RecommendationEngine:
    def get_recommendations(self, liked_ids: List[str], disliked_ids: List[str], limit: int = 10):
        # Qdrant's recommend API expects point IDs (or raw vectors), not file paths
        recommended = client.recommend(collection_name="fashion_images", positive=liked_ids, negative=disliked_ids, limit=limit)
        return [{"id": hit.id, "image_path": hit.payload.get("image_path")} for hit in recommended]
This function steers results toward items the user liked and away from items they disliked, yielding personalized recommendations.
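In practice, you collect the point IDs of liked and disliked items from earlier search results and feed them back in. A sketch with placeholder UUIDs:
# Point IDs come from earlier search results; these UUIDs are placeholders
liked = ["5b1d0cfe-0000-0000-0000-000000000001"]
disliked = ["5b1d0cfe-0000-0000-0000-000000000002"]

engine = RecommendationEngine()
for rec in engine.get_recommendations(liked, disliked, limit=10):
    print(rec["image_path"])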
Step 5: Build a UI with Streamlit
Now, let’s create a simple user interface using Streamlit. Users can browse clothing items, like or dislike items, and see new, improved recommendations.
import streamlit as st
from PIL import Image
import os

# Setup the Streamlit app
st.set_page_config(page_title="🧥 Men's Fashion Recommender", layout="wide")

# Caching the data
@st.cache_resource
def initialize_data():
    getData()
    return VectorStore(), RecommendationEngine()

vector_store, recommendation_engine = initialize_data()

# Display recommendations and preferences
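The display logic is left as a stub above; a minimal sketch of the gallery loop could look like this (the column layout, button labels, and cold-start fallback are all assumptions, not the article's full UI):
# Track the user's likes and dislikes across Streamlit reruns
if "liked" not in st.session_state:
    st.session_state.liked, st.session_state.disliked = [], []

if st.session_state.liked:
    # Personalized feed once the user has liked at least one item
    recs = recommendation_engine.get_recommendations(st.session_state.liked, st.session_state.disliked)
else:
    recs = []  # Cold start: fall back to a default catalog or plain similarity search

cols = st.columns(5)
for i, rec in enumerate(recs):
    with cols[i % 5]:
        st.image(rec["image_path"])
        if st.button("👍 Like", key=f"like_{rec['id']}"):
            st.session_state.liked.append(rec["id"])
        if st.button("👎 Dislike", key=f"dislike_{rec['id']}"):
            st.session_state.disliked.append(rec["id"])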
Conclusion
Congratulations! You’ve built a complete fashion recommendation system that can visually comprehend clothing items and offer smart suggestions. With FastEmbed, Qdrant, and Streamlit, you’ve harnessed powerful technologies that can be adapted to various image-based recommendations, expanding the possibilities for visual search in many domains.
Frequently Asked Questions
Do the numbers in image embeddings represent pixel intensities?
No, they capture semantic features like shapes, colors, and textures—understanding the image’s meaning beyond raw pixel data.
Does this recommendation system require training?
No, it operates on vector similarity without needing to train a traditional model.
Can I fine-tune or train my own image embedding model?
Yes, you can use frameworks like TensorFlow or PyTorch to customize embeddings for specific needs.
Is it possible to query image embeddings using text?
Yes, if using a multimodal model that maps both images and text into the same vector space.
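For example, FastEmbed ships a CLIP pairing whose text and image models share one vector space. The sketch below assumes that pairing; note that the Unicom model used above is image-only, so text queries would require re-indexing the images with the CLIP vision model into a separate collection, here hypothetically named fashion_images_clip.
from fastembed import TextEmbedding

# CLIP text model that shares a vector space with Qdrant/clip-ViT-B-32-vision
text_model = TextEmbedding("Qdrant/clip-ViT-B-32-text")
query_vector = list(text_model.embed(["navy blue polo shirt"]))[0]

hits = client.search(
    collection_name="fashion_images_clip",  # hypothetical CLIP-indexed collection
    query_vector=query_vector.tolist(),
    limit=5,
)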
Building a visual recommendation system opens gateways to innovative user experiences. Dive into the code, explore the technologies, and start customizing your own solutions!