Microsoft Patches Critical Copilot Vulnerability Exposing 2FA Codes
Microsoft recently patched a critical vulnerability in its M365 Copilot AI platform that researchers revealed could allow hackers to steal two-factor authentication (2FA) codes and other sensitive data. The exploit, detailed by the discovering researchers, highlighted a fundamental security challenge for large language models (LLMs) in distinguishing between legitimate user instructions and malicious requests embedded within third-party content. The flaw required workarounds to existing security guardrails, leveraging specific markup language and HTML tags for data exfiltration.

Microsoft issued a patch last Tuesday for a vulnerability in its M365 Copilot AI platform, which the company rated as maximally critical. On Monday, the researchers responsible for discovering and reporting the flaw publicly detailed how their proof-of-concept exploit could retrieve 2FA codes and other sensitive information from emails accessible to Copilot.
The core issue stems from AI bots' inability to differentiate between instructions provided directly by users and those subtly inserted into third-party content that the models process. This includes content they might be summarizing, drafting responses to, or using for other tasks on behalf of a user. This lack of a secure boundary prevents Microsoft and other LLM providers from consistently preventing their products from complying with malicious data disclosure requests, necessitating complex and ad hoc security measures.
One such security measure built into Copilot and most other LLMs aims to prevent them from performing actions like submitting web forms or sending emails, which could be used to exfiltrate data. However, the researchers demonstrated workarounds. They utilized markup language, which allows adding formatting elements like headings and links to text without needing HTML tags, to bypass these guardrails.
Another method involved wrapping sensitive data inside HTML tags such as `<img>` and `<form>`. In both scenarios, the sensitive data could be sent via a web request to an attacker’s web server, where it would be captured in server logs.
According to Ars Technica, this exploit demonstrates recurring security failures in the industry's approach to LLM security.



