LLM vendors are extremely bad at responding to security issues

  • IT consultant Mark Pesce was building an LLM-based similarity finder for a legal client when he discovered a prompt that made multiple LLMs break down into endless gibberish; in his words, "it descended into a babble-like madness. Which went on and on and on and on and … on."
  • Pesce didn't reveal the prompt, but Groq, the LLM service provider he was using, confirmed it could reproduce the failure across multiple LLMs.
  • In computer security, anything that can crash a system is often a starting point for an exploit. Adversarial prompt attacks that break LLM guardrails and get models to spill prohibited data are well known and easy to mount; the BEAST paper describes a related toolkit for constructing such attacks.
  • Pesce reported the discovery to the LLM providers, but each of them either decided it wasn't a security issue or proved impossible to reach. In some cases the only point of contact was the company's PR agency; in others the AI team said it was too busy dealing with an LLM upgrade.
  • Microsoft eventually got back to Pesce after shipping a workaround in Copilot, saying the prompt wasn't a security issue because it couldn't lead to other adversarial prompts. Pesce then asked whether, if it wasn't an issue, he could publish the prompt; Microsoft has yet to reply.