How do you analyze a large language model (LLM) for harmful biases? The 2022 release of ChatGPT launched LLMs onto the public stage. Applications that use LLMs are suddenly everywhere, from customer service chatbots to LLM-powered healthcare agents. Despite this widespread use, concerns persist about bias and toxicity in LLMs, especially with respect to protected characteristics such as race and gender. In this blog post, we discuss our recent research that uses a role-playing scenario to audit ChatGPT, an approach that opens new possibilities for revealing unwanted biases. At the SEI, we’re working to understand and measure the trustworthiness of artificial intelligence (AI) systems. When harmful bias is present in LLMs, it can decrease the trustworthiness of the technology and limit the use cases for which the technology is appropriate, making adoption more difficult. The more we understand how to audit LLMs, the better equipped we are to identify and address learned biases.
Publisher Statement
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The view, opinions, and/or findings contained in this material are those of the author(s) and should not be construed as an official Government position, policy, or decision, unless designated by other documentation. References herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.
This report was prepared for the SEI Administrative Agent AFLCMC/AZS 5 Eglin Street Hanscom AFB, MA 01731-2100.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution.Date
2024-07-22Copyright Statement
Copyright 2024 Carnegie Mellon University.