LLM-Based Vulnerability Classification in Police Narratives

This repository contains datasets used in our research on applying large language models (LLMs) to identify indicators of vulnerability in police incident narratives. These resources support the replication of findings in our paper: "Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives."

Project Overview

Law enforcement frequently encounters vulnerable individuals, but identifying vulnerability factors in police records remains challenging. Our research explores how LLMs can assist in identifying four key vulnerability indicators in police Field Interrogation and Observation (FIO) narratives:

Mental health issues
Drug abuse
Alcoholism
Homelessness

This project advances police research methodology by:

Evaluating LLM performance in vulnerability classification against human labelers
Comparing different LLM architectures and prompt engineering approaches
Investigating potential demographic biases through counterfactual analysis
Developing a reusable framework for qualitative text analysis
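
As an illustration of the first point, agreement between human and LLM labels for a single indicator can be summarised with a chance-corrected statistic. The sketch below computes Cohen's kappa in plain Python. It is not necessarily the metric reported in the paper, and the label lists are made-up values rather than data from this repository.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each rater's marginal label counts.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so we return 1.0 by convention (an assumption of this sketch).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical binary labels for one indicator (1 = present, 0 = absent).
human = [1, 0, 0, 1, 0, 0, 1, 0]
model = [1, 0, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.3f}")
```

If scikit-learn is installed, sklearn.metrics.cohen_kappa_score computes the same statistic.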

Datasets

This repository includes four key datasets:

boston_narratives_test_classified_4000.csv: 4,000 narratives classified with our LLM pipeline, including all labels and model explanations
counterfactual_narratives_all_coded.csv: Systematically generated counterfactual narratives with varied demographic characteristics
examples_for_counterfactuals.csv: 100 base narratives used for counterfactual generation
labelled_fio_data_for_analysis.csv: 500 pre-processed examples with human and GPT-4o labels
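
A quick way to check a download is to count the rows and list the columns of each file. The sketch below uses only the Python standard library and assumes the files are UTF-8 encoded CSVs with a header row. The column names are not documented here, so the code reports whatever headers it finds rather than assuming any. The helper name summarize_datasets is our own and is not part of the published codebase.

```python
import csv
from pathlib import Path

# File names as listed above.
DATASET_FILES = [
    "boston_narratives_test_classified_4000.csv",
    "counterfactual_narratives_all_coded.csv",
    "examples_for_counterfactuals.csv",
    "labelled_fio_data_for_analysis.csv",
]


def summarize_datasets(data_dir):
    """Return {filename: (row_count, column_names)} for each file found in data_dir."""
    summary = {}
    for name in DATASET_FILES:
        path = Path(data_dir) / name
        if not path.exists():
            continue  # skip files that have not been downloaded
        # newline="" lets the csv module handle line breaks inside quoted
        # narrative text, so multi-line narratives count as a single row.
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            row_count = sum(1 for _ in reader)
            summary[name] = (row_count, list(reader.fieldnames or []))
    return summary


if __name__ == "__main__":
    for name, (rows, columns) in summarize_datasets(".").items():
        print(f"{name}: {rows} rows, columns = {columns}")
```

The row counts can then be compared with the figures given above (for example, 4,000 for the classified narratives and 100 for the base examples).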

Code Repository

The complete codebase for replicating our research is available in our GitHub repository, llm-deductive-coding (see in particular the boston_fio_paper directory).

The repository includes:

Data preprocessing scripts
Classification pipeline implementation
Counterfactual generation code
Analysis notebooks
Visualization tools
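
To give a sense of the kind of analysis the counterfactual data supports, the sketch below computes how often a label changes between a base narrative and its demographic variants, grouped by variant. It is an illustration only, not the method implemented in the repository. The record keys (demographic_variant, base_label, counterfactual_label) and the example values are hypothetical and do not correspond to actual column names in the CSV files.

```python
from collections import defaultdict


def flip_rates(records, group_key="demographic_variant"):
    """Share of counterfactual records whose label differs from the base label, per group.

    Each record is a dict with the hypothetical keys 'base_label',
    'counterfactual_label', and the grouping key given by group_key.
    """
    totals = defaultdict(int)
    flips = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec["counterfactual_label"] != rec["base_label"]:
            flips[group] += 1
    return {group: flips[group] / totals[group] for group in totals}


# Made-up records: two demographic variants, two base narratives each.
records = [
    {"demographic_variant": "A", "base_label": 1, "counterfactual_label": 1},
    {"demographic_variant": "A", "base_label": 0, "counterfactual_label": 1},
    {"demographic_variant": "B", "base_label": 1, "counterfactual_label": 1},
    {"demographic_variant": "B", "base_label": 0, "counterfactual_label": 0},
]
rates = flip_rates(records)
print(rates)
```

A markedly higher flip rate for one variant than for the others would suggest that the classifier is sensitive to that demographic detail, which is the kind of pattern a counterfactual analysis is designed to surface.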

Citation

If you use these resources in your research, please cite our paper:

@article{author2023llm,
  title={Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives},
  author={Relins, S. and Birks, D. and Lloyd, C.},
  journal={arXiv preprint},
  year={2023},
  note={Currently under review for the Journal of Quantitative Criminology}
}

License

These datasets are released under the MIT License. The original Boston FIO data is released under the Open Data Commons Public Domain Dedication and License (PDDL).

Contact

For questions about this research or datasets, please contact the authors or open an issue in our GitHub repository.