SCORF: Selective Cost-Aware Oblique Random Forests for Unreliable Data
Abstract
Decision forests are widely used for tabular data due to their efficiency and strong performance. However, they typically optimize accuracy under i.i.d. assumptions, ignoring decision costs, abstention options, and reliability issues such as missing data, distribution shift, or label noise. Oblique variants (which use linear feature combinations) improve representational power but often lack decision-aware objectives or reliability guarantees. We introduce SCORF, a Selective Cost-Aware Oblique Random Forest framework for unreliable data. SCORF consists of four key components: (i) a spectral sensitivity transform learned from model probability gradients to highlight cost-relevant feature directions; (ii) a standard tree ensemble trained on the transformed features; (iii) a calibrated selective prediction mechanism with a defer option that enforces a target error rate; and (iv) targeted training perturbations along sensitivity directions to improve worst-case robustness. On three public credit-risk datasets and four evaluation conditions (clean i.i.d., covariate shift, missing data, and label noise), SCORF consistently reduces mean policy cost compared to other ensembles while respecting the desired error rate via selective abstention. Ablation studies confirm that each component (spectral transform, selective calibration, and robustness augmentation) provides complementary benefits.
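To make component (iii) concrete, the sketch below shows one common way to calibrate a selective (defer-option) classifier: on a held-out calibration set, pick the lowest confidence threshold at which the empirical error of accepted predictions stays within the target rate. All function names here are illustrative, and this simple empirical-risk rule is an assumption, not necessarily the exact calibration procedure used by SCORF.

```python
import numpy as np

def calibrate_abstention_threshold(cal_probs, cal_labels, target_error=0.05):
    """Choose the lowest confidence threshold such that predictions accepted
    on the calibration set have empirical error <= target_error.
    (Illustrative sketch; not the paper's exact procedure.)"""
    conf = cal_probs.max(axis=1)            # model confidence per example
    pred = cal_probs.argmax(axis=1)         # predicted class per example
    errors = (pred != cal_labels).astype(float)
    # Sort by confidence descending: accepting a prefix = accepting the
    # top-k most confident examples.
    order = np.argsort(-conf)
    cum_err = np.cumsum(errors[order]) / np.arange(1, len(conf) + 1)
    # Largest accepted prefix whose running error stays within the target.
    ok = np.where(cum_err <= target_error)[0]
    if len(ok) == 0:
        return np.inf                       # abstain on everything
    return conf[order][ok[-1]]

def predict_or_defer(probs, threshold):
    """Return class predictions, with -1 marking deferred (abstained) cases."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    pred[conf < threshold] = -1
    return pred
```

At test time, examples whose confidence falls below the calibrated threshold are deferred (e.g. routed to a human reviewer), so the error rate among the predictions actually issued tracks the target set during calibration.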