Context is Key: Aligning Large Language Models with Human Moral Judgments through Retrieval-Augmented Generation
This research investigates whether pre-trained large language models (LLMs) can align with human moral judgments on a dataset of approximately fifty thousand interpersonal conflicts from the AITA (Am I the A******) subreddit. We introduce a retrieval-augmented generation (RAG) approach that uses pre-trained LLMs as core components. Using OpenAI's GPT-4o, our agent outperforms direct prompting of the same LLM, achieving 83% accuracy and a Matthews correlation coefficient of 0.469, while reducing the rate of toxic responses from 22.53% to virtually zero.
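To make the retrieve-then-prompt pattern concrete, here is a minimal sketch of a RAG verdict agent in the spirit of the abstract. It is not the paper's implementation: the embedding model (text-embedding-3-small), the tiny two-example corpus, the system prompt, and the helper names (embed, retrieve, judge) are all illustrative assumptions; only the use of GPT-4o and AITA-style verdicts comes from the source.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical labeled corpus of (post, verdict) pairs; the paper's actual
# ~50k-post dataset, labels, and retrieval index are not specified here.
corpus = [
    ("I refused to lend my car to my brother after he crashed it twice.", "NTA"),
    ("I read my partner's private messages without asking.", "YTA"),
]

def embed(text: str) -> np.ndarray:
    """Embed text with an OpenAI embedding model (model choice is an assumption)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Pre-compute embeddings for the labeled examples.
corpus_vecs = [(embed(post), post, verdict) for post, verdict in corpus]

def retrieve(query: str, k: int = 2):
    """Return the k labeled posts most similar to the query by cosine similarity."""
    q = embed(query)
    ranked = sorted(
        corpus_vecs,
        key=lambda item: float(q @ item[0])
        / (np.linalg.norm(q) * np.linalg.norm(item[0])),
        reverse=True,
    )
    return [(post, verdict) for _, post, verdict in ranked[:k]]

def judge(conflict: str) -> str:
    """Ask GPT-4o for a verdict, grounding the prompt in retrieved precedents."""
    precedents = "\n".join(f"Post: {p}\nVerdict: {v}" for p, v in retrieve(conflict))
    messages = [
        {"role": "system", "content": "You judge interpersonal conflicts. "
         "Answer with YTA or NTA, using the retrieved precedents as guidance."},
        {"role": "user", "content": f"Precedents:\n{precedents}\n\n"
         f"New conflict:\n{conflict}\n\nVerdict:"},
    ]
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content.strip()

print(judge("I told my roommate to pay her share of rent or move out."))
```

The key design choice this sketch reflects is that the LLM is never asked to judge in a vacuum: each query is grounded in similar, human-labeled cases, which is the mechanism the abstract credits for the accuracy and toxicity improvements over direct prompting.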
