When judging the sameness of three-dimensional (3D) objects that differ by a rotation, response time typically increases with the angle of rotation. This increase is usually taken as evidence for mental rotation, but the extent to which low-level perceptual mechanisms contribute to this phenomenon is unclear. To investigate this, we built a neural model that breaks down this computation into two stages: a fast feedforward stage that extracts low-dimensional latent representations of the objects being compared, and a slow recurrent processing stage that compares those representations to arrive at a decision by accumulating evidence at a rate that is proportional to the proximity of the representations. We found that representation of 3D objects learned by a generic autoencoder was sufficient to emulate human response times using this model. We conclude that perceptual representations may play a key role in limiting the speed of spatial reasoning. We discuss our findings in the context of the mental rotation hypothesis and identify additional, as yet unverified representational constraints that must be satisfied by neural systems that perform mental rotation.