3 Vulnerabilities in Generative AI Systems and How Penetration Testing Can Help

3 Vulnerabilities in Generative AI Systems and How Penetration Testing Can Help

Published: Oct 17, 2024

Last Updated: Oct 18, 2024

With proven real-life use cases, it’s a no-brainer that companies are looking for ways to integrate large language models (LLMs) into their existing offerings to generate content. A combination that’s often referred to as Generative AI, LLMs enable chat interfaces to have a human-like, complex conversation with customers and respond dynamically, saving you time and money. However, with all these new, exciting bits of technology come related security risks—some that can arise even at the moment of initial implementation.

Given that the capabilities of LLMs are powerful and ever-growing, it would make sense that you’re looking to leverage them, but as cybersecurity experts with a dedicated artificial intelligence (AI) practice, we know that with great power comes great responsibility and we want to help you with yours.

In this blog post, we’ll illustrate three technical attack vectors a hacker could take when targeting your Generative AI application before detailing how penetration testing can serve in locking down your systems.

Security Concerns in Generative AI

The release of OpenAI’s GPT (Generative Pre-trained Transformer) models kickstarted a revolution in the AI landscape. For the first time, end-users had access to powerful AI capabilities via an easily understandable chat interface through ChatGPT, which was capable of answering complex questions, generating unique scenarios, and assisting with writing working, useable code in a variety of languages.

Since then, a multitude of different offerings have sprung up, including Google’s Gemini and Anthropic Claude. Open-source models have also emerged, like those found within the Hugging Face repository, which are enabling organizations to run their own LLMs.

At the same time, security organizations are racing to manage the risk of AI implementations, including through:

3 Potential Attack Vectors for Your Implemented LLMs

Because when you do integrate LLMs with your web application, you do run certain risks and it’s important to understand where, as well as how attackers may target this—depending on how they’re implemented—largely vulnerable technology.

The following are three ways bad actors could potentially breach insecure Generative AI systems, with a focus on LLMs.

1. Malicious Prompt Injection

Prompt injection, as defined in OWASP’s Top 10 for LLMs, “manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.” What that means in plain language—prompt injection is when input from the user, such as chat messages sent to a chatbot, tricks the LLM into performing unintended actions.

When you implement an AI model—such as any of OpenAI’s GPTs—typically you include a system prompt that instructs it to act in a certain way, and these system prompts are usually not intended to be disclosed to end-users, as they sometimes can contain sensitive information.

One of the first things an attacker would do in targeting an LLM would be to try and obtain that system prompt data, as it would help them understand how the model is supposed to behave or what additional features you could gain access to through it.

In the following example, you can see how the model initially refuses to reveal the system prompt, but clever use of prompts can convince the model to disclose this information. Also, note the non-deterministic nature of LLMs—simply repeating the same prompt convinces this model to reveal the system prompt, even when initially rejected:

Now, the attacker has details on certain internal access that the model was given—functions that, in this instance, were not intended to be disclosed to end-users. This information could then be used to craft prompts targeting the functionality disclosed in the system prompt.

2. Insecure Output Manipulation

Generative AI systems can also be vulnerable to insecure output handling, which is defined by OWASP as “insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems.”

In traditional web application security practices, both input validation and proper output encoding are essential to prevent attacks, but if your generative AI bot insecurely handles output—i.e., displays data obtained from other parts of the application—a lot of different attacks are possible.

To illustrate this, consider the following example where the model is asked to list all users in a web application and produces the following output:

Using this information, an attacker could update the user’s profile information to include a simple XSS payload like so:

Running the command again, you can see that the application failed to sanitize data from the user’s profile name, which could then be used to launch XSS attacks against any other user of the application that might be using this function.

3. Exploitation of Excessive Agency

Lastly, there’s vulnerability if your AI system has “excessive agency”—meaning that it has more autonomy or decision-making power than is safe. For example, let's say the model has the ability to reach an API endpoint that can make outbound HTTP requests and analyze the responses. If there are no restrictions—such as those preventing access to internal resources—the system becomes vulnerable to manipulation through crafted inputs or prompts.

For those familiar with traditional web application security, excessive agency can lead to something similar to a Server-Side Request Forgery (SSRF) attack, where a hacker tricks a server into making requests to unintended locations—often internal services that are not directly accessible.

As a demonstration of this correlation, we configured a model with access to a vulnerable webhook endpoint that fetches data from a website. In a traditional SSRF vulnerability, the attack might look something like this:

1. HTTP Request: The application has an endpoint that fetches data from a URL provided in a parameter.

2. Manipulated Parameter: The attacker modifies the URL parameter to point to an internal service, such as the AWS metadata service (169.254.169.254).

3. Data Exposure: The server makes a request to the internal service, and sensitive information is returned to the attacker.

When an LLM with excessive permissions is integrated into your application, something similar can occur:

1. Malicious Prompt: The attacker asks the LLM to fetch data from an internal endpoint (e.g., the AWS metadata service).

2. LLM Action: The LLM forwards the request to the vulnerable API endpoint.

3. Internal Request Execution: Due to its over-extended capabilities, the API endpoint performs the HTTP GET request to the internal host without restriction.

4. Data Exposure: Data from the internal host is returned in the LLM's response to the attacker.

Despite not having direct access to the vulnerable API endpoint, the attacker could leverage the LLM's general use /chat endpoint and its excessive agency to disclose sensitive information.

And while we’re specifically comparing two SSRF attacks here, know that the attack vectors enabled by excessive agency in AI are more varied—for example, if the model has access to something that is running a direct SQL query, it may be possible for the attacker to leverage that to perform other attacks like SQL injection.

How a Penetration Test Can Help Secure Your Generative AI Application

These vulnerabilities in your Generative AI systems can be present even from the time of their implementation with your web applications, which is why you must test their security before letting them go live. One way you can do that is through a penetration test.

If you were to engage us to evaluate your Generative AI application, we would simulate the actions a hacker might take to breach them. More technically, here’s how we would gauge your vulnerability to the threats we just discussed:

Prompt Injection – Our experienced professionals would craft specific prompts with the goal of obtaining the following information:
- The model in-use
- The model’s system prompt
- The where and how in terms of data retrieval:
  - Foundational Training Data
  - Retrieval Augmented Generation (RAG) (Alternative data sources, such as user-uploaded documents)
  - Existing data within the integrated application
- What functionality the LLM has access to, such as:
  - Custom API endpoints
  - Custom plugins
- Any other opportunities where we can goad the model into disclosing sensitive data, (like that belonging to other customers).
Insecure Output Handling – To understand how data from the model’s responses are displayed within your application, we would evaluate whether:
- Special characters are sanitized (think traditional XSS via JavaScript or Markdown)
- Sources of user input from other application components are displayed by the model
- Whether hidden characters (such as Unicode Tag) are accepted
Excessive Agency – In testing the custom functionality of your AI application to understand its intended use case, we would attempt to use the model to perform unintended actions by:
- Documenting all custom functionalities, general use cases, and access levels.
- Testing and reviewing each function to identify potential vulnerabilities such as data leaks, authorization issues, or unintended code execution.

In some cases, we may also attempt to get the model to provide instructions on creating malicious content or producing misleading information, but risk acceptance here will vary from organization to organization.

(For example, if you’re simply providing a chat interface to provide general assistance to end-users, you might not care if they’re using the application to produce obscene responses because the attack surface is limited to those users. However, if you’re using a RAG implementation that produces summaries based on a variety of documents, you would be more wary of the threat of a malicious user uploading documents that contain indirect prompt injections to throw the model off and produce misleading information for other users.)

Moving Forward with More Secure Generative AI Applications

Whether you’re wanting to automate services or create a better user experience, it seems that everyone is jumping on the Generative AI train these days. But as with all new technology, there are new security risks that come with the advantages of implementation—including some weaknesses that, if present, make your organization vulnerable from the moment the AI is live.

A penetration test can help you identify those issues before a bad actor does, as our trained professionals are experts in cybersecurity and have the mindset of an attacker. If you’re interested in leveraging our expertise to better secure your AI applications, contact us today.

But if you’re still on the fence regarding the right security solution for you, make sure to read our other content detailing other frameworks that may also be of help:

About Cory Rey

Cory Rey is a Lead Penetration Tester at Schellman where he plays a key role in advancing the firm’s offensive security capabilities, including spearheading the development of its AI Red Team service line. Focused on performing penetration tests for leading cloud service providers, he now extends his expertise to identifying and exploiting vulnerabilities in Generative AI systems—areas often overlooked by traditional security assessments. With a strong foundation in Application Security, Cory has a proven track record of uncovering complex security flaws across diverse environments.

Suite of Services

About Us