There is increasing interest in understanding similarities and differences between convolutional neural networks (CNNs) and the visual cortex. A common approach is to use features extracted from intermediate CNN layers to fit brain encoding models. Each brain region is then typically associated with the best-predicting layer. However, this winner-take-all mapping is not robust, because consecutive CNN layers are strongly correlated and have similar prediction accuracies. Moreover, the winner-take-all approach ignores potential complementarities between layers in predicting brain activity. To address these issues, we propose to fit a joint model on all layers simultaneously. The model is fit with banded ridge regression, which groups features by layer and learns a separate regularization hyperparameter per feature space (here, per layer). By performing a selection over layers, this model effectively removes non-predictive or redundant layers and disentangles the contribution of each layer to each voxel. This model leads to increased prediction accuracy and to finer mappings of layer selectivity.
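
As an illustration of the banded ridge idea described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one feature matrix per CNN layer and a brain response matrix with one column per voxel. The function names `banded_ridge_fit` and `select_alphas` are hypothetical, and the exhaustive grid search over per-layer penalties is a simple stand-in for whatever hyperparameter optimization is actually used. For brevity, the sketch shares one set of penalties across all voxels.

```python
import itertools

import numpy as np


def banded_ridge_fit(Xs, Y, alphas):
    """Closed-form banded ridge regression (illustrative sketch).

    Xs     : list of arrays, each (n_samples, n_features_i), one per layer.
    Y      : array (n_samples, n_voxels) of responses.
    alphas : one positive regularization strength per layer.
    Returns the weights, shape (sum of n_features_i, n_voxels).
    """
    X = np.hstack(Xs)
    # Repeat each layer's alpha over that layer's feature columns.
    penalty = np.concatenate(
        [np.full(Xi.shape[1], a, dtype=float) for Xi, a in zip(Xs, alphas)]
    )
    # Solve (X^T X + diag(penalty)) W = X^T Y.
    return np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ Y)


def select_alphas(Xs_train, Y_train, Xs_val, Y_val, grid):
    """Pick per-layer alphas by exhaustive grid search on a validation set.

    Assumption: a plain grid search, chosen here only for simplicity.
    Its cost grows as len(grid) ** n_layers, so it suits a few layers only.
    """
    X_val = np.hstack(Xs_val)
    best_alphas, best_err = None, np.inf
    for alphas in itertools.product(grid, repeat=len(Xs_train)):
        W = banded_ridge_fit(Xs_train, Y_train, alphas)
        err = np.mean((Y_val - X_val @ W) ** 2)
        if err < best_err:
            best_alphas, best_err = alphas, err
    return best_alphas, best_err
```

When all layers receive the same penalty, this reduces to ordinary ridge regression on the concatenated features. A very large penalty on one layer shrinks that layer's weights toward zero, which is how a per-layer penalty can act as a soft selection over layers.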