- Hiring: Recall.ai is hiring engineers. Apply here to join the team.
- Cloud Cost Optimization: IPC (Inter-Process Communication) is often overlooked in optimizing cloud costs. Inefficient IPC on AWS can lead to huge bills. Recall.ai processes millions of meetings monthly on AWS and is focused on performance and efficiency.
- CPU Usage: Video encoding and decoding were initially expected to consume most of the CPU. Profiling showed instead that `__memmove_avx_unaligned_erms` and `__memcpy_avx_unaligned_erms` accounted for the majority of CPU time. Both are glibc memory-copy routines optimized for AVX and unaligned memory access, so the time was going to moving bytes around rather than to codec work.
- WebSocket Issues: Recall.ai used a local WebSocket server to transport raw decoded video out of Chromium's JavaScript environment. Two protocol features made this expensive: fragmentation (splitting a large message into multiple frames) and masking (XORing the payload with a random 4-byte key). A single raw 1080p video frame was split into 24 WebSocket frames, and masking added an extra pass over every byte sent.
- Search for a Cheaper Transport: Because WebSocket had these performance pitfalls, Recall.ai evaluated alternatives. TCP/IP still suffered from fragmentation and from copying data between user space and kernel space. Unix domain sockets had similar copying overhead. Shared memory was chosen because both processes can access the data directly without copying it, at the cost of building the transport from scratch.
- Ring Buffer Implementation: A ring buffer was chosen as the high-level transport design. It needed to be lock-free, support multiple producers and a single consumer, handle dynamic frame sizes, allow zero-copy reads, be sandbox-friendly, and provide low-latency signalling. Recall.ai built an in-house implementation using three pointers and atomic operations. After it was deployed alongside other optimizations, CPU usage fell by up to 50% and the AWS bill fell by over a million dollars per year.
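
To make the WebSocket overhead described above concrete, here is a minimal sketch of the arithmetic and of the masking pass. Two values are assumptions rather than figures from the summary: 1.5 bytes per pixel (a 4:2:0 layout such as I420) and a 128 KiB maximum fragment size, picked because together they reproduce the 24-fragment figure. The function names are hypothetical; the masking rule itself (XOR each byte with `key[i % 4]`) is the one defined by RFC 6455.

```python
import math


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: float = 1.5) -> int:
    """Size of one raw frame. 1.5 bytes/pixel is an ASSUMPTION (4:2:0 chroma subsampling)."""
    return int(width * height * bytes_per_pixel)


def fragment_count(payload_len: int, max_fragment: int) -> int:
    """Number of WebSocket frames needed if each carries at most max_fragment bytes."""
    return math.ceil(payload_len / max_fragment)


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """RFC 6455 masking: XOR every payload byte with the 4-byte key, cycling the key.

    Applying the same function again with the same key unmasks the data,
    so sender and receiver each pay one full pass over the payload.
    """
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


frame_size = raw_frame_bytes(1920, 1080)            # 3_110_400 bytes
fragments = fragment_count(frame_size, 128 * 1024)  # 24, with the ASSUMED 128 KiB limit
```

Every fragment needs its own header and its own receive-side reassembly, and masking touches each of the roughly 3.1 MB per frame once on send and once on receive, which is consistent with the profile being dominated by memory-copy routines.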
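
The ring buffer requirements listed above (variable frame sizes, zero-copy reads, three pointers) can be illustrated with a small single-threaded sketch. This is not Recall.ai's implementation and it is not lock-free: the class name `FrameRing`, the pointer names `write`, `peek`, and `read`, the length-prefix layout, and the wrap marker are all assumptions made for illustration. In a real shared-memory transport the buffer would live in a shared mapping and the three positions would be updated with atomic operations.

```python
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian length prefix (assumed layout)
WRAP = 0xFFFFFFFF             # marker meaning "next frame starts at offset 0"


class FrameRing:
    """Single-threaded sketch of a ring buffer for variable-size frames."""

    def __init__(self, capacity: int):
        if capacity % 4 or capacity < 8:
            raise ValueError("capacity must be a multiple of 4 and at least 8")
        self.buf = bytearray(capacity)
        self.view = memoryview(self.buf)
        self.cap = capacity
        # Positions only ever grow; the byte offset is position % capacity.
        self.write = 0  # next byte the producer will write
        self.peek = 0   # next frame the consumer will look at
        self.read = 0   # everything before this has been released for reuse

    def push(self, payload: bytes) -> bool:
        """Append one frame; returns False when there is not enough free space."""
        need = HEADER.size + ((len(payload) + 3) & ~3)  # pad to 4-byte alignment
        off = self.write % self.cap
        tail = self.cap - off
        # Frames stay contiguous so readers get one slice; skip the tail if needed.
        skip = tail if need > tail else 0
        if skip + need > self.cap - (self.write - self.read):
            return False
        if skip:
            HEADER.pack_into(self.view, off, WRAP)
            off = 0
        HEADER.pack_into(self.view, off, len(payload))
        start = off + HEADER.size
        self.view[start:start + len(payload)] = payload
        self.write += skip + need  # publish the frame
        return True

    def peek_frame(self):
        """Return a zero-copy view of the next frame, or None if nothing is pending."""
        if self.peek == self.write:
            return None
        off = self.peek % self.cap
        (n,) = HEADER.unpack_from(self.view, off)
        if n == WRAP:
            self.peek += self.cap - off
            off = 0
            (n,) = HEADER.unpack_from(self.view, off)
        start = off + HEADER.size
        self.peek += HEADER.size + ((n + 3) & ~3)
        return self.view[start:start + n]

    def release(self) -> None:
        """Mark every frame handed out so far as reusable by the producer."""
        self.read = self.peek
```

Separating `peek` from `read` is what makes zero-copy reads safe in this sketch: the consumer works directly on a slice of the buffer, and the producer cannot overwrite that region until `release()` moves `read` forward. A view returned by `peek_frame()` should therefore be treated as invalid once `release()` has been called.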