Modelling natural images with sparse coding (SC) has faced two main challenges: flexibly representing varying pixel intensities and realistically representing low-level image components, e.g. edges. This paper proposes a novel multiple-cause generative model of low-level image statistics that generalizes the standard SC model in two crucial respects: (1) it uses a spike-and-slab prior distribution for a more realistic representation of component absence and intensity, and (2) it uses the highly nonlinear combination rule of maximal causes analysis (MCA). The major challenge is parameter optimization, because a model with either (1) or (2) alone already results in a strongly multimodal posterior. We show for the first time that a model combining both improvements can be trained efficiently while retaining the rich structure of the posterior. We design an exact piecewise Gibbs sampling method and combine it with a variational method based on preselection of latent dimensions. This combined training scheme tackles both analytical and computational intractability and enables application of the model to a large number of observed and hidden dimensions. Applying the model to image patches, we study the optimal encoding of images by simple cells in V1 and compare the model's predictions with in vivo neural recordings. In contrast to standard SC, we find that the optimal prior favors asymmetric, bimodal, and sparse activity of simple cells. Testing our model for consistency, we find that the average posterior is approximately equal to the prior. Furthermore, owing to the nonlinearity, the model predicts a large number of globular receptive fields (RFs), another significant difference from standard SC. The inferred prior and the high proportion of predicted globular fields make the model more consistent with neural data than previous SC models, suggesting that simple cells are more closely tuned to visual stimuli than previously predicted.
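To make the two generalizations concrete, the following is a minimal sketch of sampling from a spike-and-slab MCA generative model: binary spikes gate Gaussian slab intensities, and each observed pixel is driven by the maximum over weighted causes rather than their sum (as in standard SC). All dimensions, parameter values, and the NumPy-based `sample` helper are illustrative assumptions, not the configuration or implementation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameters (illustrative values only)
H, D, N = 10, 16, 1000     # hidden causes, observed pixels, samples
pi = 0.1                   # spike probability (controls sparsity)
mu, sigma_s = 1.0, 0.5     # mean / std of the Gaussian "slab"
sigma_y = 0.1              # observation noise std
W = np.abs(rng.normal(size=(H, D)))  # nonnegative generative fields, one per cause

def sample(n):
    """Draw n observations from a spike-and-slab MCA generative model (sketch)."""
    spikes = rng.random((n, H)) < pi               # Bernoulli "spike": cause present?
    slabs = rng.normal(mu, sigma_s, size=(n, H))   # Gaussian "slab": cause intensity
    s = spikes * slabs                             # latent activities (exactly zero or Gaussian)
    # Nonlinear MCA combination: each pixel takes the maximum over the
    # weighted causes instead of their linear sum as in standard SC.
    y_mean = np.max(s[:, :, None] * W[None, :, :], axis=1)
    return y_mean + rng.normal(0.0, sigma_y, size=(n, D))

Y = sample(N)  # (N, D) array of synthetic image patches
```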