- Maintaining library quality: Key to a seamless design experience. Images in templates sometimes have to be swapped when third-party media library partnerships expire, which is a lengthy manual process.
- Image similarity: A textbook application of reverse image search. Similarity is modeled as a hierarchy covering image subject, color/tone, subject positioning, background, and emotion. Image aspect ratio is also crucial.
- Design considerations and requirements: Need to suggest the most similar, IP-safe images from a library of 150 million. Requirements include search, staying up to date, filtering on metadata, and reusability. Existing internal solutions, such as the recommendation engine and the perceptual hash system, did not meet these needs.
- Image embeddings: Images are represented as high-dimensional vectors (image embeddings). Five high-performing models (DINOv2, CLIP, ViTMAE, DreamSim, CaiT) were selected and compared experimentally. Description + CLIP was the least successful; DINOv2 was the most suitable.
- Vector database: Chose an external vector database over an in-memory solution for cost and scalability reasons. It allows real-time updates as the media library changes and supports metadata filtering.
- Results: Search results were good for photos but weak for text, symbols, or non-realistic imagery. DINOv2 was trained on a custom-curated dataset, which has its own limitations.
- User interface: The feature is integrated into Template Assistant. The design is linted for media violations, and the top 8 similar images are displayed. Initial pilots showed a 4.5x increase in image replacement speed.
- Future work: Improve results for images containing text. Detect symbols and text, store them as metadata, and use substring matching or category filtering.
- Conclusion: Reverse image search helps maintain a high-quality image library for 200 million users. It is exciting to see applications in backend engineering and machine learning.
- Acknowledgements: Thanks to Ben Alexander, Jonatan Castro, Neil Sarkar, Minh Le, and others for their help. Readers are invited to join the Content Enrichment Team.
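
The embedding, vector database, and user interface points above amount to a nearest-neighbour lookup over image embeddings combined with metadata filtering. The following is a minimal sketch of that idea, not the implementation described in the post: it uses a brute-force in-memory scan instead of an external vector database, tiny hand-written vectors instead of DINOv2 embeddings, and cosine similarity as an assumed distance measure. The `Image` class, the `ip_safe` flag, the `max_ratio_diff` threshold, and the `similar_images` function are all hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Image:
    image_id: str
    embedding: List[float]  # stand-in for a model embedding (assumption)
    ip_safe: bool           # hypothetical metadata flag
    aspect_ratio: float     # width / height


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_images(query: Image, library: List[Image],
                   top_k: int = 8, max_ratio_diff: float = 0.25) -> List[Image]:
    """Return up to top_k IP-safe images with a similar aspect ratio,
    ordered from most to least similar to the query embedding."""
    # Metadata filter first, so unsuitable images are never ranked.
    candidates = [
        img for img in library
        if img.image_id != query.image_id
        and img.ip_safe
        and abs(img.aspect_ratio - query.aspect_ratio) <= max_ratio_diff
    ]
    candidates.sort(
        key=lambda img: cosine_similarity(query.embedding, img.embedding),
        reverse=True,
    )
    return candidates[:top_k]


# Toy data: the query is an image that must be replaced.
library = [
    Image("a", [1.0, 0.0, 0.0], True, 1.5),
    Image("b", [0.9, 0.1, 0.0], True, 1.5),
    Image("c", [0.0, 1.0, 0.0], True, 1.4),
    Image("d", [1.0, 0.0, 0.0], False, 1.5),  # same embedding, not IP-safe
    Image("e", [1.0, 0.0, 0.0], True, 0.5),   # same embedding, wrong shape
]
query = Image("q", [1.0, 0.0, 0.0], False, 1.5)
print([img.image_id for img in similar_images(query, library)])
```

In this toy library, images `d` and `e` are filtered out by metadata before ranking even though their embeddings match the query exactly, which is the role the metadata filter plays in the requirements above. The default `top_k=8` mirrors the eight suggestions shown in the interface; a linear scan like this would not scale to a library of 150 million images, which is where a dedicated vector database comes in.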