Jackson Eshbaugh
Neural networks excel as function approximators, but their complexity often obscures the nature of the functions they learn. In this work, we propose the linearity score $\lambda(f)$, a simple and interpretable diagnostic that quantifies how well a regression network's output can be mimicked by a linear model. Defined as the $R^2$ between the network's predictions and those of a trained linear surrogate, $\lambda(f)$ offers insight into the linear decodability of the learned function. We evaluate this framework on one synthetic dataset ($y = x \sin(x) + \epsilon$) and three real-world datasets (Medical Insurance, Concrete, California Housing), using dataset-specific networks and surrogates. Our findings show that while a high $\lambda(f)$ indicates strong linear alignment, it does not necessarily imply predictive accuracy with respect to the ground truth. This underscores both the promise and the limitations of using linear surrogates to understand nonlinear model behavior, particularly in high-stakes regression tasks.
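
To make the definition concrete, here is a minimal sketch of how a score like $\lambda(f)$ could be computed. It is not the authors' implementation. It assumes a single input feature, a linear surrogate fitted by ordinary least squares to the network's own predictions, and an $R^2$ that treats the network's predictions as the target. The names `fit_linear`, `r2`, and `linearity_score` are hypothetical, and the two plain functions stand in for trained networks.

```python
import math


def fit_linear(xs, ys):
    """One-feature ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2(targets, preds):
    """Coefficient of determination of preds with respect to targets."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


def linearity_score(network, xs):
    """Assumed form of lambda(f): R^2 between the network's predictions
    and those of a linear surrogate trained to reproduce them."""
    net_preds = [network(x) for x in xs]
    # The surrogate is fitted to the network's outputs, not to ground truth.
    slope, intercept = fit_linear(xs, net_preds)
    surrogate_preds = [slope * x + intercept for x in xs]
    return r2(net_preds, surrogate_preds)


# Hypothetical stand-ins for trained networks (assumptions for illustration).
def wiggly(x):
    return x * math.sin(x)


def nearly_linear(x):
    return 3.0 * x + 1.0 + 0.1 * math.sin(x)


# Evenly spaced inputs on [-10, 10].
xs = [-10 + 20 * i / 400 for i in range(401)]

lam_wiggly = linearity_score(wiggly, xs)
lam_linear = linearity_score(nearly_linear, xs)
print(f"lambda(wiggly)        = {lam_wiggly:.4f}")
print(f"lambda(nearly_linear) = {lam_linear:.4f}")
```

In this sketch the score is computed on the same points the surrogate was fitted to, so it cannot fall below zero. A slightly negative score can only arise under a different evaluation, for example scoring on held-out data; whether the authors do this is an assumption, not something the abstract states. With several input features, the one-feature fit would be replaced by a general least-squares solver.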
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| regression | Concrete Compressive Strength | $R^2$ | 0.8588 | Neural Network |
| regression | Concrete Compressive Strength | $\lambda(f)$ | 0.6659 | Neural Network |
| regression | Concrete Compressive Strength | $R^2$ | 0.5944 | Baseline Regression |
| regression | Concrete Compressive Strength | $R^2$ | 0.5821 | Mimic / Surrogate |
| regression | Medical Cost Personal Dataset | $R^2$ | 0.8673 | Neural Network |
| regression | Medical Cost Personal Dataset | $\lambda(f)$ | 0.9186 | Neural Network |
| regression | Medical Cost Personal Dataset | $R^2$ | 0.7836 | Baseline Regression |
| regression | Medical Cost Personal Dataset | $R^2$ | 0.7835 | Mimic / Surrogate |
| regression | Synthetic: $y = x \sin(x)$ | $R^2$ | 0.9755 | Neural Network |
| regression | Synthetic: $y = x \sin(x)$ | $\lambda(f)$ | -0.0105 | Neural Network |
| regression | Synthetic: $y = x \sin(x)$ | $R^2$ | -0.008 | Baseline Regression |
| regression | Synthetic: $y = x \sin(x)$ | $R^2$ | -0.0137 | Mimic / Surrogate |
| regression | California Housing Prices | $R^2$ | 0.7908 | Neural Network |
| regression | California Housing Prices | $\lambda(f)$ | 0.6968 | Neural Network |
| regression | California Housing Prices | $R^2$ | 0.5758 | Baseline Regression |
| regression | California Housing Prices | $R^2$ | 0.5658 | Mimic / Surrogate |