Gosia Lazuka, Andreea Simona Anghel, et al.
SC 2024
Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and raises legal concerns [1]. To alleviate this, we present “LLMGuard”, a tool that monitors user interactions with an LLM application and flags content that matches specific undesirable behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.
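The abstract only names the architecture (an ensemble of detectors over user–LLM interactions), so the Python sketch below shows one plausible shape for it; the detector names, patterns, and flagging rule are illustrative assumptions, not LLMGuard's actual implementation.

```python
# Minimal sketch of an ensemble-of-detectors guard, assuming each detector
# independently inspects the text and the interaction is flagged if any
# detector fires. All detector logic here is hypothetical.
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Flag:
    detector: str   # which detector fired
    reason: str     # human-readable explanation


# A detector maps a piece of text to a Flag, or None if nothing is found.
Detector = Callable[[str], Optional[Flag]]


def banned_topic_detector(text: str) -> Optional[Flag]:
    # Hypothetical keyword-based conversation-topic detector.
    banned = {"violence", "self-harm"}
    hits = [w for w in banned if w in text.lower()]
    if hits:
        return Flag("banned_topic", f"mentions: {', '.join(hits)}")
    return None


def pii_detector(text: str) -> Optional[Flag]:
    # Hypothetical PII detector using a simple email-address pattern.
    if re.search(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", text):
        return Flag("pii", "contains an email address")
    return None


def guard(text: str, detectors: list[Detector]) -> list[Flag]:
    # Run every detector in the ensemble and collect all flags raised.
    return [flag for d in detectors if (flag := d(text)) is not None]


if __name__ == "__main__":
    flags = guard("Contact me at alice@example.com",
                  [banned_topic_detector, pii_detector])
    for f in flags:
        print(f"[{f.detector}] {f.reason}")
```

In this sketch the ensemble is just a list of independent callables, so new detectors (toxicity classifiers, regex rules, topic models) can be added without touching the guard loop.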
Yidi Wu, Thomas Bohnstingl, et al.
ICML 2025
Ben Fei, Jinbai Liu
IEEE Transactions on Neural Networks
Robert Farrell, Rajarshi Das, et al.
AAAI-SS 2010