Establishing Reward Criteria for Reporting Bugs in AI Products
Bug hunter programs can help organizations foster third-party discovery and reporting of issues and vulnerabilities specific to AI systems.
At Google, we maintain a Vulnerability Reward Program to honor cutting-edge external contributions addressing issues in Google-owned and Alphabet-subsidiary Web properties. To keep up with rapid advances in AI technologies and ensure we're prepared to address the security challenges in a responsible way, we recently expanded our existing Bug Hunters program to foster third-party discovery and reporting of issues and vulnerabilities specific to our AI systems. This expansion is part of our effort to implement the voluntary AI commitments that we made at the White House in July.
To help the security community better understand these developments, we've outlined below which AI-specific issues are in scope for rewards.
What's in Scope for Rewards
In our recent AI red team report, which is based on Google's AI Red Team exercises, we identified common tactics, techniques, and procedures (TTPs) that we consider most relevant and realistic for real-world adversaries to use against AI systems. The following table incorporates what we learned to help the research community understand our criteria for AI bug reports and what's in scope for our reward program. It's important to note that reward amounts depend on the severity of the attack scenario and the type of target affected (visit the program rules page for more information on our reward table).
| Category | Attack scenario | Guidance |
|---|---|---|
| Prompt Attacks: Crafting adversarial prompts that allow an adversary to influence the behavior of the model, and hence the output, in ways that were not intended by the application. | Prompt injections that are invisible to victims and change the state of the victim's account or any of their assets. | In scope |
| | Prompt injections into any tools in which the response is used to make decisions that directly affect victim users. | In scope |
| | Prompt or preamble extraction, in which a user is able to extract the initial prompt used to prime the model, only when sensitive information is present in the extracted preamble. | In scope |
| | Using a product to generate violative, misleading, or factually incorrect content in your own session (e.g., "jailbreaks"). This includes "hallucinations" and factually inaccurate responses. Google's generative AI products already have a dedicated reporting channel for these types of content issues. | Out of scope |
| Training Data Extraction: Attacks that are able to successfully reconstruct verbatim training examples that contain sensitive information. Related to membership inference, which instead determines whether a given record was part of the training set. | Training data extraction that reconstructs items used in the training data set that leak sensitive, non-public information. | In scope |
| | Extraction that reconstructs non-sensitive or public information. | Out of scope |
| Manipulating Models: An attacker is able to covertly change the behavior of a model such that they can trigger predefined adversarial behaviors. | Adversarial output or behavior that an attacker can reliably trigger via specific input in a model owned and operated by Google ("backdoors"). Only in scope when a model's output is used to change the state of a victim's account or data. | In scope |
| | Attacks in which an attacker manipulates the training data of the model to influence the model's output in a victim's session according to the attacker's preference. Only in scope when a model's output is used to change the state of a victim's account or data. | In scope |
| Adversarial Perturbation: Inputs provided to a model that result in a deterministic but highly unexpected output from the model. | Contexts in which an adversary can reliably trigger a misclassification in a security control that can be abused for malicious use or adversarial gain. | In scope |
| | Contexts in which a model's incorrect output or classification does not pose a compelling attack scenario or feasible path to Google or user harm. | Out of scope |
| Model Theft/Exfiltration: AI models often include sensitive intellectual property, so we place a high priority on protecting these assets. Exfiltration attacks allow attackers to steal details about a model, such as its architecture or weights. | Attacks in which the exact architecture or weights of a confidential/proprietary model are extracted. | In scope |
| | Attacks in which the architecture and weights are not extracted precisely, or in which they are extracted from a non-confidential model. | Out of scope |
| If you find a flaw in an AI-powered tool other than what is listed above, you can still submit it, provided that it meets the qualifications listed on our program page. | A bug or behavior that clearly meets our qualifications for a valid security or abuse issue. | In scope |
| | Using an AI product to do something potentially harmful that is already possible with other tools; for example, finding a vulnerability in open source software (already possible using publicly available static analysis tools) or producing the answer to a harmful question when the answer is already available online. | Out of scope |
| | Issues that we already know about, which, consistent with our program, are not eligible for reward. | Out of scope |
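Read together, the scope rules in the table amount to a small decision procedure. As a reading aid, here is a minimal Python sketch that encodes them as a yes/no check. It is hypothetical: `Category`, `Report`, its fields, and `in_scope` are illustrative names, not part of any Google or Bug Hunters tooling, and the sketch does not model severity or target type, which determine the actual reward amount.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Category(Enum):
    """Attack categories from the scope table (hypothetical encoding)."""
    PROMPT_ATTACK = auto()
    TRAINING_DATA_EXTRACTION = auto()
    MODEL_MANIPULATION = auto()
    ADVERSARIAL_PERTURBATION = auto()
    MODEL_THEFT = auto()
    OTHER = auto()


@dataclass
class Report:
    """Hypothetical summary of an AI bug report; every field name is illustrative."""
    category: Category
    already_known: bool = False              # duplicate of an issue already known
    invisible_to_victim: bool = False        # victim cannot see the injected prompt
    changes_victim_state: bool = False       # alters a victim's account, assets, or data
    drives_victim_decisions: bool = False    # tool output feeds decisions affecting victims
    own_session_only: bool = False           # jailbreak/hallucination in reporter's own session
    leaks_sensitive_info: bool = False       # extracted preamble or training data is sensitive
    reliable_security_bypass: bool = False   # reliable misclassification in a security control
    exact_extraction: bool = False           # weights or architecture recovered precisely
    model_confidential: bool = False         # target model is confidential/proprietary
    valid_security_issue: bool = False       # meets the program's general qualifications
    possible_with_other_tools: bool = False  # harm already achievable without the AI product


def in_scope(r: Report) -> bool:
    """Return True if the report matches an 'In scope' row of the table.

    Severity and target type, which set the reward amount, are not modeled.
    """
    if r.already_known:
        return False  # known issues are never eligible
    if r.category is Category.PROMPT_ATTACK:
        if r.own_session_only:
            return False  # content issues go to the product's own reporting channel
        return ((r.invisible_to_victim and r.changes_victim_state)
                or r.drives_victim_decisions
                or r.leaks_sensitive_info)
    if r.category is Category.TRAINING_DATA_EXTRACTION:
        return r.leaks_sensitive_info
    if r.category is Category.MODEL_MANIPULATION:
        # Backdoors and training-data poisoning both require impact on a victim.
        return r.changes_victim_state
    if r.category is Category.ADVERSARIAL_PERTURBATION:
        return r.reliable_security_bypass
    if r.category is Category.MODEL_THEFT:
        return r.exact_extraction and r.model_confidential
    # Category.OTHER: must be a valid issue that the AI product actually enables.
    return r.valid_security_issue and not r.possible_with_other_tools


# Usage: an invisible injection that modifies a victim's account qualifies,
# while a jailbreak confined to the reporter's own session does not.
print(in_scope(Report(Category.PROMPT_ATTACK, invisible_to_victim=True, changes_victim_state=True)))  # True
print(in_scope(Report(Category.PROMPT_ATTACK, own_session_only=True)))  # False
```

The boolean flags flatten judgment calls (for example, what counts as "sensitive" or "reliable") that reviewers make case by case, so treat this as a way to read the table rather than a predictor of how any given report will be triaged.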
We believe that expanding our bug bounty program to our AI systems will support responsible AI innovation, and look forward to continuing our work with the research community to discover and fix security and abuse issues in our AI-powered features. If you find a qualifying issue, please go to our Bug Hunters website to send us your bug report and — if the issue is found to be valid — be rewarded for helping us keep our users safe.