Responding to reader feedback: war rooms and deep investigations

  • A reader asked for a follow-up on a remark from 2018 that "war rooms" are bad for getting thoughtful analysis work done.
  • On August 1, 2014, the "Call the Cops" outage took Facebook down for hours. A Muppet-named room was repurposed as a war room during the weekly "big show" review.
  • The author was waiting for the bus when the site broke, and followed along over a tethered cell phone connection once on board.
  • Working on a 13-inch MacBook Air was janky because of its key layout and small screen. The author ran Ubuntu in a VMware Fusion virtual machine.
  • The author went back to the building and worked from a desk, which was much more effective.
  • The goal was to root-cause why machines nuked every process when they ran out of memory. People were hacking away trying to reproduce and fix the issue.
  • It took over 18 days to work out the exact sequence of events. The machines were running Upstart as their init system.
  • Under memory exhaustion, fork() failed. Reproducing this meant shrinking the swap space and running "memeater" programs to use up the remaining memory.
  • The fbagent process turned out to be the cause. It ran as root, called fork(), and later killed "pid -1". Since fork() returns -1 on failure, and kill() with a pid of -1 sent by root reaches nearly every process on the machine, this killed everything.
  • The checked-in source code already contained a fix, but production was still running the buggy version.
  • The author couldn't imagine doing that kind of parallel, multi-window investigation on a small laptop screen in a war room.
  • A war room might work for coordinating activities during a crisis, but not for "heads-down hack" work. That said, the author has seen such a gathering work nicely in a different context.
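The reproduction step in the list above, shrinking swap and running "memeater" programs, amounts to pushing a machine into memory exhaustion on purpose. The original tool isn't shown, so this is a minimal hypothetical sketch (`eat_memory` and its parameters are invented for illustration): it allocates memory in chunks and writes one byte per page so the kernel has to back each page with real memory, stopping at a cap or when an allocation fails.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # the system's memory page size


def eat_memory(limit_bytes: int, chunk_bytes: int = 1 << 20) -> list[bytearray]:
    """Hypothetical memeater: allocate and touch memory until `limit_bytes`
    is held or an allocation fails. The chunks are returned so the caller
    keeps them alive (and the memory stays in use)."""
    held: list[bytearray] = []
    total = 0
    while total < limit_bytes:
        size = min(chunk_bytes, limit_bytes - total)
        try:
            buf = bytearray(size)
        except MemoryError:
            # Allocation refused. With Linux overcommit, the OOM killer
            # may step in before this ever happens.
            break
        # Touch one byte per page so every page is actually backed by memory.
        for offset in range(0, size, PAGE_SIZE):
            buf[offset] = 1
        held.append(buf)
        total += size
    return held
```

On a throwaway VM with reduced swap, running several copies with a large cap should eventually leave too little memory for other processes, at which point their fork() calls can start failing. The cap exists so the sketch doesn't exhaust the host by accident.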
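The fbagent failure comes down to two POSIX facts: fork() returns -1 when it cannot create a child, and kill() with a pid of -1 sends the signal to every process the caller is allowed to signal, which for root is nearly all of them. The fbagent source isn't shown, so this is a hypothetical guard (`safe_kill` is an invented name) that refuses the special pid values before calling `os.kill`. It assumes a POSIX system. Python's `os.fork` raises `OSError` instead of returning -1, so the C-style failure path appears only in comments.

```python
import os
import signal

# The failure mode, sketched in C terms (illustrative, not fbagent's code):
#   pid_t child = fork();   /* returns -1 on failure, e.g. out of memory */
#   /* ...return value never checked... */
#   kill(child, SIGKILL);   /* kill(-1, ...) as root hits nearly every process */


def safe_kill(pid: int, sig: int = signal.SIGKILL) -> bool:
    """Signal exactly one process; refuse pid values kill(2) treats specially.

    pid == 0  -> the caller's whole process group
    pid == -1 -> every process the caller may signal
    pid < -1  -> the process group with id -pid
    Returns True if os.kill was called, False if the pid was rejected.
    """
    if pid <= 0:
        return False
    os.kill(pid, sig)
    return True
```

For example, `safe_kill(-1)` returns `False` without sending anything, while `safe_kill(os.getpid(), 0)` returns `True`. Signal 0 only checks that the target exists. Raising an exception instead of returning `False` would work just as well. What matters is that the check sits right next to the kill call, so a sentinel pid left over from a failed fork can't slip through.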