Publication
IJCAI 2024
Workshop paper
InLegalLLaMA: Indian Legal Knowledge Enhanced Large Language Models
Abstract
Large Language Models (LLMs) are increasingly used in many domains, including law and justice. General-purpose models trained on web data do not perform well enough on legal text analytics (LTA) tasks, while fine-tuning task-specific models is expensive because of annotation and compute costs. Pre-training domain- or application-specific models is increasingly popular. However, pre-training LLMs on small domain corpora such as Indian legal documents is stymied by repetitive and less relevant information in court judgements and records. We introduce InLegalLLaMA models and show that pre-training LLMs on knowledge graph triples significantly reduces the training effort while retaining comparable performance on LTA tasks.
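To illustrate the idea, here is a minimal sketch of how knowledge-graph triples might be serialized into plain text for continued pre-training. The example triples, the sentence template, and the `linearize` helper are illustrative assumptions for exposition, not the paper's actual pipeline.

```python
# Minimal sketch: serializing knowledge-graph triples into text
# for continued pre-training of a causal language model.
# Triple contents and template are assumptions, not the paper's method.

# Hypothetical legal KG triples: (subject, relation, object).
triples = [
    ("Section 302 IPC", "prescribes punishment for", "murder"),
    ("Article 21", "guarantees", "protection of life and personal liberty"),
    ("Section 438 CrPC", "provides for", "anticipatory bail"),
]

def linearize(triple):
    """Render one (subject, relation, object) triple as a sentence."""
    s, r, o = triple
    return f"{s} {r} {o}."

# One short training document per triple; in practice these lines
# would be tokenized and fed to a standard pre-training loop.
corpus = [linearize(t) for t in triples]
for line in corpus:
    print(line)
```

Because each triple yields a short, information-dense sentence, such a corpus avoids the repetitive boilerplate of full court judgements, which is the motivation the abstract gives for training on triples rather than raw documents.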