How WebSockets cost us $1M on our AWS bill

  • Hiring: Recall.ai is hiring engineers.
  • Cloud Cost Optimization: Inter-process communication (IPC) is often overlooked when optimizing cloud costs, yet inefficient IPC on AWS can lead to huge bills. Recall.ai processes millions of meetings per month on AWS and focuses heavily on performance and efficiency.
  • CPU Usage: The team expected video encoding and decoding to dominate CPU usage. Profiling instead showed that most CPU time was spent in __memmove_avx_unaligned_erms and __memcpy_avx_unaligned_erms, glibc's memory-copy routines optimized for AVX and unaligned memory access ("erms" refers to the Enhanced REP MOVSB/STOSB CPU feature). In other words, the process was spending most of its time copying bytes.
  • WebSocket Issues: Recall.ai used a local WebSocket server to transport raw decoded video out of Chromium's JavaScript environment. Two WebSocket features made this expensive: fragmentation, which splits a large message into multiple frames, and masking, which XORs every payload byte with a random 4-byte key (RFC 6455 requires this for every frame a client sends). A single raw 1080p video frame was split into 24 WebSocket frames, and masking added an extra pass over every byte sent.
  • Search for a Cheaper Transport: Because of these WebSocket pitfalls, Recall.ai evaluated other transports. TCP/IP still involved fragmentation and copying data between user space and kernel space. Unix domain sockets had the same copying overhead. Shared memory was chosen because it lets processes access the same data directly, without copying it between them; the trade-off was that the transport had to be built from scratch.
  • Ring Buffer Implementation: The team settled on a ring buffer as the high-level transport design. It had to be lock-free, support multiple producers and a single consumer, handle dynamically sized frames, allow zero-copy reads, be sandbox-friendly, and provide low-latency signalling. They built an in-house ring buffer that uses three pointers and atomic operations. Deployed alongside other optimizations, it reduced CPU usage by up to 50% and cut the AWS bill by over a million dollars per year.
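The fragmentation and masking costs from the WebSocket Issues item can be illustrated with a short Python sketch. The pixel format (I420, 1.5 bytes per pixel) and the 128 KiB fragment size are assumptions, not figures from the source; with those values one 1080p frame needs 24 fragments. The function names are hypothetical. Masking follows RFC 6455: each frame carries its own random 4-byte key, and byte i of that frame's payload is XORed with key[i % 4].

```python
import math
import os

# Assumptions (not from the source): I420 frames at 1.5 bytes/pixel and a
# hypothetical 128 KiB fragment size.
FRAME_BYTES = 1920 * 1080 * 3 // 2     # 3,110,400 bytes per raw 1080p frame
FRAGMENT_BYTES = 128 * 1024            # 131,072 bytes per WebSocket frame


def frame_count(message_len: int, fragment_len: int) -> int:
    """Number of WebSocket frames needed to carry one fragmented message."""
    return math.ceil(message_len / fragment_len)


def mask(payload: bytes, key: bytes) -> bytes:
    """RFC 6455 masking: XOR byte i with key[i % 4]; this touches every byte."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def encode_client_fragments(message: bytes, fragment_len: int) -> list[tuple[bytes, bytes]]:
    """Split a message into fragments and mask each with a fresh 4-byte key.

    Returns (key, masked_payload) pairs; frame headers are omitted.
    """
    frames = []
    for start in range(0, len(message), fragment_len):
        chunk = message[start:start + fragment_len]
        key = os.urandom(4)                    # new masking key per frame
        frames.append((key, mask(chunk, key)))
    return frames


print(frame_count(FRAME_BYTES, FRAGMENT_BYTES))   # 24
```

Unmasking is the same XOR, so the receiving server makes a second full pass over the data as well.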
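The property that made shared memory attractive in the transport search, one side writing bytes that the other reads in place, can be shown with Python's standard multiprocessing.shared_memory module. This is an illustration only: both handles live in one process for brevity, whereas a real producer and consumer would run in separate processes and agree on the segment name some other way. The function name share_frame is hypothetical.

```python
from multiprocessing import shared_memory


def share_frame(frame: bytes) -> bytes:
    """Write `frame` through one handle and observe it through a second handle."""
    # Producer: create a named segment large enough for the frame.
    producer = shared_memory.SharedMemory(create=True, size=len(frame))
    try:
        # Consumer: attach to the same segment by name (normally another process).
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            view = consumer.buf[:len(frame)]       # zero-copy view, taken before the write
            try:
                producer.buf[:len(frame)] = frame  # single write into the shared pages
                return bytes(view)                 # the earlier view already sees the data
            finally:
                view.release()                     # views must be released before close()
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()                          # destroy the segment


print(share_frame(b"raw frame bytes"))
```

The consumer's view was created before the producer wrote anything, yet it sees the new bytes, because both handles map the same pages rather than holding copies.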
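The Ring Buffer Implementation item lists requirements but not a layout. The Python sketch below shows one possible way to meet three of them: dynamically sized frames (each record has a 4-byte length prefix, and a hypothetical padding marker skips the tail when a record would wrap), zero-copy reads (the consumer receives a memoryview into the buffer), and three positions, here called write, peek and read. These names, the record format and the padding scheme are assumptions, not Recall.ai's design. The sketch is single-threaded and single-producer; Python has no atomic compare-and-swap, so the lock-free, multi-producer and signalling parts are not shown. A shared-memory version would typically publish write and read with atomic release stores and load them with acquire loads.

```python
import struct
from typing import Optional

HDR = 4                   # 4-byte little-endian length prefix per record
PAD = 0xFFFFFFFF          # hypothetical marker: "skip to the start of the buffer"


def _align(n: int) -> int:
    return (n + 3) & ~3   # round up to a multiple of 4 so a header always fits


class RingBuffer:
    """Single-threaded sketch of a variable-size, zero-copy ring buffer."""

    def __init__(self, capacity: int):
        if capacity % 4:
            raise ValueError("capacity must be a multiple of 4")
        self.cap = capacity
        self.buf = bytearray(capacity)
        # Monotonic byte counters; the buffer offset is counter % cap.
        self.write = 0    # end of data committed by the producer
        self.peek = 0     # end of records handed to the consumer
        self.read = 0     # end of space the consumer has released

    def push(self, payload: bytes) -> bool:
        """Append one record; return False if there is not enough free space."""
        size = _align(HDR + len(payload))
        if size > self.cap:
            raise ValueError("record larger than buffer")
        off = self.write % self.cap
        tail = self.cap - off
        pad = tail if size > tail else 0          # records never wrap; skip the tail
        if self.write - self.read + pad + size > self.cap:
            return False
        if pad:
            struct.pack_into("<I", self.buf, off, PAD)
            off = 0
        struct.pack_into("<I", self.buf, off, len(payload))
        self.buf[off + HDR:off + HDR + len(payload)] = payload
        self.write += pad + size                  # would be an atomic release store
        return True

    def peek_next(self) -> Optional[memoryview]:
        """Return a zero-copy view of the next record, or None if none is ready."""
        while self.peek < self.write:
            off = self.peek % self.cap
            (length,) = struct.unpack_from("<I", self.buf, off)
            if length == PAD:
                self.peek += self.cap - off       # jump over padding to offset 0
                continue
            self.peek += _align(HDR + length)
            return memoryview(self.buf)[off + HDR:off + HDR + length]
        return None

    def release(self) -> None:
        """Free every record returned by peek_next so far."""
        self.read = self.peek                     # would be an atomic release store


rb = RingBuffer(64)
rb.push(b"frame-1")
print(bytes(rb.peek_next()))   # b'frame-1'
rb.release()
```

Because peek_next returns views rather than copies, the consumer must finish with every view before calling release(); after that, the producer may overwrite the space.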