<p>We introduce a new set of algorithmic tools for producing scalable, low-rank decompositions of global spatio-temporal atmospheric chemistry data. By exploiting emerging <i>randomized linear algebra</i> algorithms, we propose a suite of decompositions that extract the dominant features from <i>big data sets</i> (e.g., global atmospheric chemistry resolved in longitude, latitude, and elevation) with improved interpretability. Importantly, the proposed algorithms scale with the intrinsic rank of the global chemistry space rather than with the ever-increasing spatio-temporal measurement space, which allows the data to be represented and compressed efficiently. Beyond scalability, we propose two further innovations for interpretability: (i) a non-negative decomposition, which constrains the chemical space to non-negative values (unlike PCA), and (ii) sparse matrix decompositions, which threshold weak correlations to zero and thereby highlight the dominant, localized spatial activity (again unlike PCA). We demonstrate our methods on a full year of global chemistry dynamics data and show significant improvements in computational speed and interpretability. The decomposition methods presented here successfully extract known major features of atmospheric chemistry, such as summertime surface pollution and biomass burning activity. Indeed, we find that the full annual model output can be reconstructed using only 50–100 principal modes, suggesting that these methods could be used to archive atmospheric chemistry model data with compression factors of 200–4000 or greater. In the emerging area of <i>big data</i>, and specifically in global chemistry monitoring, such technologies are critical for enabling real-time, computationally tractable diagnostics of both large-scale simulation and measurement data.</p>
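<p>The claim that cost scales with the intrinsic rank rather than the measurement space can be illustrated with a generic randomized SVD. Below is a minimal NumPy sketch. It assumes the data are arranged as a matrix <code>X</code> with spatial grid points as rows and time snapshots as columns. It follows the standard Gaussian range-finder scheme with power iterations and is not necessarily the exact algorithm used in this work. The helper names <code>randomized_svd</code> and <code>compression_factor</code> and the parameters <code>oversample</code> and <code>n_iter</code> are hypothetical. The storage accounting in <code>compression_factor</code> is one plausible definition and is not taken from the abstract. The non-negative and sparse variants are not shown.</p>

```python
import numpy as np


def randomized_svd(X, k, oversample=10, n_iter=2, seed=0):
    """Hypothetical helper: rank-k randomized SVD of X (m x n).

    Sketch only, assuming a standard Gaussian range finder; returns
    U (m x k), s (k,), Vt (k x n) with s in descending order.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    ell = min(k + oversample, m, n)  # sketch width: target rank plus oversampling

    # Sample the range of X with a thin Gaussian test matrix.
    Omega = rng.standard_normal((n, ell))
    Q, _ = np.linalg.qr(X @ Omega)

    # Power iterations sharpen spectral decay; re-orthonormalize each pass for stability.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Z)

    # Project X onto the small subspace and take an exact SVD of the ell x n matrix.
    B = Q.T @ X
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]


def compression_factor(m, n, k):
    """Hypothetical accounting: m*n stored values versus k*(m + n + 1) for U, s, Vt."""
    return (m * n) / (k * (m + n + 1))
```

<p>The expensive steps multiply <code>X</code> only by thin matrices with <code>k + oversample</code> columns, and the exact SVD is applied to a small projected matrix. This is why the cost is governed by the target rank rather than by a full SVD of <code>X</code>. Under the assumed accounting, a rank-<code>k</code> reconstruction of an <code>m</code>-by-<code>n</code> data matrix yields the ratio computed by <code>compression_factor</code>.</p>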