AI Conversational Interviewing: Interview data

License: Creative Commons Zero v1.0 Universal. Introduced: 2024-09-16.

Replication Material

This document describes the materials and instructions needed to replicate the findings presented in our paper. The replication package includes the raw data files, data-cleaning scripts, and analysis code, together with documentation of the data sources and analytical procedures used in our study. Please contact us with any questions or issues encountered during replication.

Data Sources

We conducted two types of interviews: human-human and AI-human. The raw responses from our participants and interviewers are provided in the following files (a minimal loading sketch follows the list):

  • AI-Human Interviews: All responses from the AI as interviewer
    • File: ai_interviewing-responses.csv
  • Human-Human Interviews: All transcribed responses from human interviewers
    • Files: interview-transcripted_i{1..5}.csv (5 files, one for each interviewer)
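For convenience, the interview CSVs can be read with pandas. This is a minimal sketch, assuming the files sit at the repository root; adjust the paths to the actual folder layout:

import pandas as pd

# AI-led interviews: all responses in a single file
ai_df = pd.read_csv("ai_interviewing-responses.csv")

# Human-led interviews: one transcript file per interviewer, concatenated
human_df = pd.concat(
    [pd.read_csv(f"interview-transcripted_i{i}.csv") for i in range(1, 6)],
    ignore_index=True,
)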

Application

We built the application with LangChain and Chainlit; the exact version used in the experiment can be found in the app-v1 directory. The app was deployed on Fly.io, and conversation data was stored with Literal AI.
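For orientation, the skeleton of a Chainlit app backed by an OpenAI chat model through LangChain looks roughly like the sketch below. The model name is a placeholder and the interviewing logic is omitted; this is not the study's configuration, which lives in app-v1.

import chainlit as cl
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")  # placeholder model name, not the one used in the study

@cl.on_message
async def on_message(message: cl.Message):
    # Send the participant's message to the model and return the reply
    reply = await llm.ainvoke(message.content)
    await cl.Message(content=reply.content).send()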

Setup

Install requirements from requirements.txt (in a virtual environment):

pip install -r requirements.txt
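If you do not already have a virtual environment, the standard-library venv module is one way to create it (commands for a Unix shell):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt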

Version v1 uses ChatGPT, so you need to create a .env file containing your OpenAI API key:

OPENAI_API_KEY=<KEY>
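Chainlit typically loads .env at startup. If you run the model code outside Chainlit (for example, the sketches above), python-dotenv, an assumed dependency here, can load the key explicitly:

from dotenv import load_dotenv

load_dotenv()  # makes OPENAI_API_KEY from .env visible to the OpenAI client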

Run Chainlit app:

chainlit run app.py
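During development, Chainlit's watch flag reloads the app when source files change:

chainlit run app.py -w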

Evaluation Sources

We evaluated the interviews using several methods, including post-interview surveys, human annotations, and quantitative text analysis:

  1. Post-interview Surveys:

    • Purpose: Addresses aspects such as clarity
    • Contents: Survey results and the codebook used
    • Location: post_interview_surveys folder
  2. Quality Coding on Interview Responses:

    • Purpose: Annotation of interview quality along dimensions described in the paper (e.g., engagement)
    • Contents: Merged annotations from two annotators
    • Note: Raw data from the individual annotators is available upon request (withheld here to preserve anonymity)
    • Location: quality_coding folder
  3. Observer Comments:

    • Purpose: Documentation of issues during interviews
    • Contents: Observer comments and the form used
    • Location: observer_comments folder
  4. Quantitative Text Analysis:

    • Purpose: Analysis of responses from the AI and human interviews (an illustrative sketch follows this list)
    • Contents: Results of quantitative analysis
    • Location: quantitative_analysis folder
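Purely as an illustration of this kind of analysis (the real scripts and results are in the quantitative_analysis folder), the following is a hypothetical word-count comparison. The column name response is an assumption; check the CSV headers for the actual schema.

import pandas as pd

# Hypothetical column name "response"; the real schema may differ
ai_df = pd.read_csv("ai_interviewing-responses.csv")
human_df = pd.concat(
    [pd.read_csv(f"interview-transcripted_i{i}.csv") for i in range(1, 6)],
    ignore_index=True,
)

ai_words = ai_df["response"].str.split().str.len().mean()
human_words = human_df["response"].str.split().str.len().mean()
print(f"Mean words per response: AI {ai_words:.1f} vs. human {human_words:.1f}")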

All results derived from these sources and scripts are reported in Table X of the paper.