Deep learning-based, data-driven models of geophysical turbulence, e.g., data-driven weather forecasting models, have recently received substantial attention. These models, trained on observational data, are competitive with numerical weather prediction (NWP) models in short-term forecast skill and are free of numerical biases. They can also be used for probabilistic forecasting with large ensembles, as well as for efficient data assimilation, at a computational cost several orders of magnitude smaller than that of NWP models. However, these data-driven models do not remain stable when integrated over long time periods (e.g., decadal time scales). This instability limits their usefulness for simulating long-term climate statistics with synthetically generated data, which could otherwise be used to study the physical mechanisms of extreme events. The physical cause of this instability in data-driven models of weather, and of turbulence more generally, remains unknown, and a number of ad-hoc strategies are often adopted to improve stability. In this work, we propose a causal mechanism for this instability through the lenses of physics and deep learning theory, and we propose an architecture-agnostic mitigation strategy for obtaining long-term stable data-driven models of weather, climate, and geophysical turbulence in general.