vpfrc_llm_vulnerability_classifier
VPFRC LLM Vulnerability Classifier Data
LLM-Based Vulnerability Classification in Police Narratives
This repository contains datasets used in our research on applying large language models (LLMs) to identify indicators of vulnerability in police incident narratives. These resources support the replication of findings in our paper: "Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives."
Project Overview
Law enforcement frequently encounters vulnerable individuals, but identifying vulnerability factors in police records remains challenging. Our research explores how LLMs can assist in identifying four key vulnerability indicators in police Field Interrogation and Observation (FIO) narratives:
- Mental health issues
- Drug abuse
- Alcoholism
- Homelessness
This project advances police research methodology by:
- Evaluating LLM performance in vulnerability classification against human labelers
- Comparing different LLM architectures and prompt engineering approaches
- Investigating potential demographic biases through counterfactual analysis
- Developing a reusable framework for qualitative text analysis
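LLM labels are typically evaluated against human labelers with raw agreement plus a chance-corrected statistic such as Cohen's kappa. The sketch below computes both for a single binary indicator, such as mental health present or absent. It is a minimal illustration, not necessarily the metric set used in the paper. The function names are hypothetical, and the label lists in the usage lines are toy values rather than data from this project.

```python
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of items on which two labelers assign the same 0/1 label."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each labeler's marginals.
    """
    p_o = percent_agreement(a, b)
    n = len(a)
    pa1 = sum(a) / n  # proportion labelled 1 by labeler A (e.g. human)
    pb1 = sum(b) / n  # proportion labelled 1 by labeler B (e.g. LLM)
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if p_e == 1.0:
        # Both labelers used one identical category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Toy example (illustrative values only, not project data).
human = [1, 0, 1, 1, 0, 0, 1, 0]
llm = [1, 0, 1, 0, 0, 0, 1, 1]
print(percent_agreement(human, llm))  # 0.75
print(cohens_kappa(human, llm))       # 0.5
```

For the real evaluation, the same functions would be applied once per indicator, using the human and GPT-4o label columns of the labelled data.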
Datasets
This repository includes four key datasets:
- boston_narratives_test_classified_4000.csv: 4,000 narratives classified with our LLM pipeline, including all labels and model explanations
- counterfactual_narratives_all_coded.csv: Systematically generated counterfactual narratives with varied demographic characteristics
- examples_for_counterfactuals.csv: 100 base narratives used for counterfactual generation
- labelled_fio_data_for_analysis.csv: 500 pre-processed examples with human and GPT-4o labels
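One simple way to use the counterfactual data is to check whether the LLM's label for a base narrative changes when only its demographic details are varied. The sketch below reads a CSV with Python's standard csv module. For each indicator, it reports the share of base narratives whose label is not identical across all of their variants. This is an illustrative measure and may not match the analysis reported in the paper. The column names (example_id and the four indicator columns) are assumptions, not the documented schema of counterfactual_narratives_all_coded.csv, so check the actual header before use.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, Mapping, TextIO

# Hypothetical column names: verify against the real CSV header.
GROUP_COL = "example_id"  # ID of the base narrative each counterfactual came from
INDICATORS = ("mental_health", "drug_abuse", "alcoholism", "homelessness")


def read_rows(handle: TextIO) -> list[dict[str, str]]:
    """Read an open CSV file handle into a list of row dictionaries."""
    return list(csv.DictReader(handle))


def label_flip_rates(rows: Iterable[Mapping[str, str]],
                     group_col: str = GROUP_COL,
                     indicators: tuple[str, ...] = INDICATORS) -> dict[str, float]:
    """For each indicator, return the share of base narratives whose label
    differs between at least two of their demographic variants."""
    # labels[base_id][indicator] -> set of distinct label strings seen
    labels: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for ind in indicators:
            labels[row[group_col]][ind].add(row[ind].strip())
    if not labels:
        raise ValueError("no rows to analyse")
    n_groups = len(labels)
    return {
        ind: sum(len(group[ind]) > 1 for group in labels.values()) / n_groups
        for ind in indicators
    }


# Toy in-memory CSV (illustrative values only, not project data).
sample = """example_id,variant,mental_health,drug_abuse,alcoholism,homelessness
1,a,1,0,0,0
1,b,1,0,0,1
2,a,0,0,1,0
2,b,0,0,1,0
"""
print(label_flip_rates(read_rows(io.StringIO(sample))))
# {'mental_health': 0.0, 'drug_abuse': 0.0, 'alcoholism': 0.0, 'homelessness': 0.5}
```

To run it on the real file, open it with `open("counterfactual_narratives_all_coded.csv", newline="", encoding="utf-8")` and pass the handle to `read_rows`. Labels are compared as stripped strings, so values such as "1" and "1.0" would count as different and should be normalised first.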
Code Repository
The complete codebase for replicating our research is available in our GitHub repository, llm-deductive-coding. The code for this paper is in its boston_fio_paper directory.
The repository includes:
- Data preprocessing scripts
- Classification pipeline implementation
- Counterfactual generation code
- Analysis notebooks
- Visualization tools
Citation
If you use these resources in your research, please cite our paper:
@article{author2023llm,
title={Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives},
author={Relins, S. and Birks, D. and Lloyd, C.},
journal={arXiv preprint},
year={2023},
note={Currently under review for the Journal of Quantitative Criminology}
}
License
These datasets are released under the MIT License. The original Boston FIO data is released under the Open Data Commons Public Domain Dedication and License (PDDL).
Contact
For questions about this research or datasets, please contact the authors or open an issue in our GitHub repository.