Security

CSO

Microsoft dangles $10K for hackers to hijack LLM email service

Outsmart an AI, win a little Christmas cash


Microsoft and friends have challenged AI hackers to break a simulated LLM-integrated email client with a prompt injection attack – and the winning teams will share a $10,000 prize pool.

Sponsored by Microsoft, the Institute of Science and Technology Australia, and ETH Zurich, the LLMail-Inject challenge sets up a "realistic" (but not a real, says Microsoft) LLM email service. This simulated service uses a large language model to process an email user's requests and generate responses, and it can also generate an API call to send an email on behalf of the user.

As part of the challenge, which opens Monday, participants take on the role of an attacker sending an email to a user. The goal here is to trick the LLMail service into executing a command that the user did not intend, thus leaking data or performing some other malicious deed that it should not.

The attacker can write whatever they want in the text of the email, but they can't see the model's output.

After receiving the email, the user then interacts with the LLMail service, reading the message, asking questions of the LLM (i.e. "update me on Project X"), or instructing it to summarize all emails pertaining to the topic. This prompts the service to retrieve relevant emails from a fake database.

The service comes equipped with several prompt injection defenses, and the attacker's goal is to bypass these and craft a creative prompt that will trick the model into doing or revealing things it is not trained to.

Both of these have become serious, real-life threats as organizations and developers build applications, AI assistants and chatbots, and other services on top of LLMs, allowing the models to interact directly with users' computers, summarize Slack chats, or screen job seekers before HR reviews their resumes, among all the other tasks that AIs are being trained to perform.

Microsoft has first-hand experience with what can go wrong should data thieves hijack an AI-based chatbot. Earlier this year, Redmond fixed a series of flaws in Copilot that allowed attackers to steal users' emails and other personal data by chaining together a series of LLM-specific attacks, beginning with prompt injection.

Author and red teamer Johann Rehberger, who disclosed these holes to Microsoft in January, had previously warned Redmond that Copilot was vulnerable to zero-click image rendering.

Some of the defenses built into the LLMail-Inject challenge's simulated email service include:

Plus, there's a variant in the challenge that stacks any or all of these defenses on top of each other, thus requiring the attacker to bypass all of them with a single prompt.

To participate, sign into the official challenge website using a GitHub account, and create a team (ranging from one to five members). The contest opens at 1100 UTC on December 9 and ends at 1159 UTC on January 20.

The sponsors will display a live scoreboard plus scoring details, and award $4,000 for the top team, $3,000 for second place, $2,000 for third, and $1,000 for the fourth-place team. ®

Send us news
12 Comments

Microsoft's drawback on datacenter investment may signal AI demand concerns

Investment bank claims software giant ditched 'at least' 5 land parcels due to potential 'oversupply'

Microsoft expands Copilot bug bounty targets, adds payouts for even moderate messes

Said bugs 'can have significant implications' – glad to hear that from Redmond

Under Trump 2.0, Europe's dependence on US clouds back under the spotlight

Technologist Bert Hubert tells The Reg Microsoft Outlook is a huge source of geopolitical risk

Microsoft warns Trump: Where the US won't sell AI tech, China will

Rule hamstringing our datacenters is 'gift' to Middle Kingdom, vice chair argues

Satya Nadella says AI is yet to find a killer app that matches the combined impact of email and Excel

Microsoft CEO is more interested in neural nets boosting GDP than delivering superhuman intelligence

Microsoft names alleged credential-snatching 'Azure Abuse Enterprise' operators

Crew helped lowlifes generate X-rated celeb deepfakes using Redmond's OpenAI-powered cloud – claim

How nice that state-of-the-art LLMs reveal their reasoning ... for miscreants to exploit

Blueprints shared for jail-breaking models that expose their chain-of-thought process

Despite Wall Street jitters, AI hopefuls keep spending billions on AI infrastructure

Sunk cost fallacy? No, I just need a little more cash for this AGI thing I’ve been working on

UK's new thinking on AI: Unless it's causing serious bother, you can crack on

Plus: Keep calm and plug Anthropic's Claude into public services

Some workers already let AI do the thinking for them, Microsoft researchers find

Dammit, that was our job here at The Reg. Now if you get a task you don't understand, you may assume AI has the answers

Xi know what you did last summer: China was all up in Republicans' email, says book

Of course, Microsoft is in the mix, isn't it

Does terrible code drive you mad? Wait until you see what it does to OpenAI's GPT-4o

Model was fine-tuned to write vulnerable software – then suggested enslaving humanity