Web LLM

LLMs are AI algorithms that generate plausible responses by predicting word sequences from user inputs.

Methodology

Identify the LLM’s inputs, including direct (e.g., a prompt) and indirect (e.g., training data).
Determine the data and APIs accessible to the LLM
Examine this attack surface for vulnerabilities.

Mapping LLM API attack surface

Ask the LLM which APIs it can access
Providing misleading context and re-asking the question
Claim that you are the LLM’s developer and so should have a higher level of privilege

Chaining vulnerabilities in LLM APIs

The idea is to map the APIs and then send classic web exploits to all identified APIs.

Suppose you normally have access to a “Newsletter Subscription” feature but you can’t control any parameters.
Imagine that also LLM has access to “Newsletter Subscription” API. You can try to control how this API is called…
E.g., if a system command is used you might get an RCE if you ask the LLM to call the Newsletter Subscription API with the argument $(whoami)@your-email.com

Insecure output handling

A web app uses an LLM to generate content from user prompts without sanitization. You could submit a crafted prompt causing the LLM to return unsanitized JavaScript, leading to XSS/CSRF etc.