Prompt Templates: A Methodology for Improving Manual Red Teaming Performance
Abstract
Large language models (LLMs) may output content that is undesired or outright harmful. One method for auditing this unwanted model output is manual red teaming, a process in which a person crafts prompts to probe an LLM's behavior. However, successful red teaming requires experience and expertise. To better support novices, we developed \textit{prompt templates} that help them red team more effectively. We evaluated the prompt templates in a user study with 29 participants who were tasked with red teaming an LLM to elicit biased output based on societal stigmas. We found that using prompt templates increased success and performance on this task, with participants employing multiple effective strategies.