Interpretability Techniques for Deep Learning on CausalGym

Metric: Log odds-ratio (pythia-6.9b) (higher is better)
