Carnegie Mellon University
andresar_phd_physics_2023.pdf (6.02 MB)

Dark Energy Science from 100 Million Galaxies: AI-Driven Analysis and Data-Intensive Techniques for Cosmological Discovery

Posted on 2024-01-12, 21:45, authored by Andresa Rodrigues de Campos

Observational cosmology is a rapidly evolving field. Technological advances, the advent of big data, machine learning, and international collaborations have driven significant progress in recent years, greatly enhancing our understanding of the universe. Observational cosmology aims to thoroughly test theoretical predictions about the expansion history of the universe and the evolution of cosmic structure over time. This is achieved through cosmological surveys targeting a variety of observables. Measurements derived from sources such as the cosmic microwave background (CMB), exemplified by the Planck satellite's detailed mapping of the CMB's temperature fluctuations, and the distance-redshift relation traced by Type Ia supernovae, as observed in projects like the Supernova Legacy Survey, provide essential data. Baryon acoustic oscillations (BAO) observed in the clustering of galaxies, such as those charted by the Sloan Digital Sky Survey (SDSS), along with the observed growth of cosmic structure probed by galaxy clustering and gravitational lensing, as investigated by surveys like the Dark Energy Survey (DES), the Kilo-Degree Survey (KiDS), and the Hyper Suprime-Cam (HSC) survey, all contribute to a coherent picture.

The collective evidence from these surveys indicates that deviations from the predictions of the ΛCDM (Lambda Cold Dark Matter) standard cosmological model are minor, typically within a few percent. The next phase of this research program, however, is to achieve even greater precision and accuracy in our measurements, so as to robustly challenge the ΛCDM model with empirical data. Current and upcoming experiments, such as the Euclid mission, the Nancy Grace Roman Space Telescope, and the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST), have been meticulously designed to reduce statistical uncertainties in cosmological measurements, aiming to surpass the current state of the art. Once these surveys successfully gather data, it is anticipated that the primary challenges in our quest for deeper cosmological insights will arise from systematic uncertainties. The future challenges we face are thus not solely about improving statistical precision, but also about identifying and mitigating sources of systematics that could compromise the accuracy and integrity of our cosmological findings. This thesis explores several crucial facets of systematic uncertainties in cosmological analyses. Moreover, since the concordance of predictions across different surveys is essential for validating the cosmological model, this thesis also encompasses a critical examination of inter-survey consistency.

The first major emphasis of this study is the mitigation of systematics associated with photometric redshift estimation. An accurate characterization of the redshift distribution, 𝑛(𝑧), of the observed sample is crucial for cosmological analyses, particularly in the context of weak lensing shear studies. To this end, I have improved the Self-Organizing Map (SOM) method for photometric redshift estimation, referred to as SOMPZ. This approach, which leverages unsupervised machine learning, was initially implemented for the DES Year 3 (DES Y3) analysis; I have further enhanced it for the upcoming DES Y6 data set. The analyses in this thesis show substantial improvements from substituting the Y3 SOM algorithm with an optimized version that better addresses the intricacies of redshift estimation. Moreover, the integration of g-band flux data has markedly enhanced redshift precision, reducing the overlap between redshift bins by as much as 66%. These advancements are key to refining weak lensing redshift characterization, setting a higher standard not just for DES Y6, but also for future Stage IV surveys such as the Rubin Observatory's LSST.
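To illustrate the core idea behind SOM-based redshift calibration, the following is a minimal sketch of a self-organizing map trained on galaxy flux vectors. It is not the DES SOMPZ pipeline: the catalog is random toy data, and the grid size, learning schedule, and band count are arbitrary choices for illustration. The key point is that each galaxy lands in a SOM cell, and per-cell redshift distributions (calibrated with a spectroscopic subsample) can then be stacked to estimate 𝑛(𝑧) per tomographic bin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "galaxy" catalog: each row is a vector of fluxes in several bands
# (dimensions are illustrative, not the DES band set).
n_gal, n_bands = 500, 5
fluxes = rng.normal(size=(n_gal, n_bands))

# Minimal rectangular SOM: a grid of weight vectors trained so that
# nearby cells come to respond to similar flux vectors.
grid_w, grid_h = 8, 8
weights = rng.normal(size=(grid_w * grid_h, n_bands))
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)])

n_iter = 2000
for t in range(n_iter):
    x = fluxes[rng.integers(n_gal)]
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
    # Learning rate and neighborhood radius decay over training.
    lr = 0.5 * (1 - t / n_iter)
    sigma = max(1.0, 4.0 * (1 - t / n_iter))
    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
    h = np.exp(-d2 / (2 * sigma**2))  # Gaussian neighborhood kernel
    weights += lr * h[:, None] * (x - weights)

# Assign each galaxy to its best-matching cell. In a SOMPZ-style analysis,
# redshift information from a spectroscopic subsample would be attached to
# cells to build per-cell n(z) estimates.
cells = np.argmin(
    ((fluxes[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2), axis=1
)
print(len(np.unique(cells)), "occupied cells")
```

The unsupervised step only organizes galaxies by color-flux similarity; all redshift information enters afterwards through the calibration sample, which is what makes the cell assignment a useful, data-driven compression for characterizing 𝑛(𝑧).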

The second pivotal subject of this thesis is an empirical approach to model selection, with a focus on explicitly balancing parameter bias against model complexity. This approach uses synthetic data to calibrate the relationship between bias and the 𝜒² difference between models. It enables the interpretation of 𝜒² values obtained from real data, even when catalogs are blinded, facilitating informed decisions regarding model selection. This method is applied to tackle the challenge of intrinsic alignments, a significant systematic uncertainty in weak lensing studies that substantially contributes to the error budget of modern lensing surveys. Specifically, I compare two commonly used models, nonlinear alignment (NLA) and tidal alignment & tidal torque (TATT), against bias in the Ωm − 𝑆8 plane, with a particular focus on DES Y3. In this case, there is roughly a 30% chance that, were NLA the fiducial model, the results would be biased in the Ωm − 𝑆8 plane by more than 0.3𝜎.
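The calibration logic can be sketched with a deliberately simple toy problem, not the actual NLA/TATT comparison: synthetic data are drawn from a more flexible "truth," both a simple and a flexible model are fit to each realization, and the joint distribution of parameter bias versus Δ𝜒² is tabulated. All model forms and noise levels below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
sigma = 0.1
a_true, b_true = 1.0, 0.3  # "truth" has a term the simple model lacks

biases, dchi2 = [], []
for _ in range(500):
    # Synthetic realization drawn from the more complex model.
    y = a_true * x + b_true * x**2 + rng.normal(0, sigma, x.size)

    # Simple model: y = a*x (analogue of the less flexible choice).
    A1 = x[:, None]
    p1, *_ = np.linalg.lstsq(A1, y, rcond=None)
    chi2_1 = ((y - A1 @ p1) ** 2).sum() / sigma**2

    # Flexible model: y = a*x + b*x^2 (analogue of the richer choice).
    A2 = np.column_stack([x, x**2])
    p2, *_ = np.linalg.lstsq(A2, y, rcond=None)
    chi2_2 = ((y - A2 @ p2) ** 2).sum() / sigma**2

    biases.append(p1[0] - a_true)  # bias in the parameter of interest
    dchi2.append(chi2_1 - chi2_2)  # chi^2 penalty paid by the simple model

biases, dchi2 = np.abs(biases), np.array(dchi2)
# The tabulated (|bias|, delta chi^2) pairs calibrate how a delta chi^2
# observed on (possibly blinded) real data maps to a probability that the
# simpler model biases the parameter beyond a chosen threshold.
print(np.median(dchi2))
```

Because the simple model is nested in the flexible one, Δ𝜒² is non-negative by construction, and the fraction of realizations with large bias at a given Δ𝜒² supplies the empirical risk estimate that drives the model choice.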

Lastly, the third focus of this thesis involves the application of several tension estimators to assess DES large-scale structure measurements against Planck cosmic microwave background data. These tension metrics are evaluated for their responsiveness to artificially introduced tension between the two data sets using synthetic DES data. Tensions, which represent discrepancies in cosmological parameter measurements across different experiments, are critical to identify: statistically significant tensions may hint at novel physics beyond the standard cosmological model, or at unaccounted-for systematics. These metrics are then applied to compare Planck with actual DES Y1 data. The parameter-difference, Eigentension, and Suspiciousness metrics yield consistent results on both simulated and real data, while the Bayes ratio stands out due to its dependence on the prior volume. Using these metrics, we calculate the tension between DES Y1 3 × 2pt and Planck, finding that the surveys are in approximately 2.3𝜎 tension under the ΛCDM paradigm. This suite of metrics provided a robust tool set for testing tensions in the DES Y3 data, where we found approximately 0.7𝜎 tension with Planck 2018 under the ΛCDM paradigm.
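As a minimal sketch of the parameter-difference idea, the snippet below computes a Gaussian-approximation tension between two hypothetical experiments in a two-dimensional parameter plane. The means and covariances are invented numbers, not DES or Planck constraints, and real analyses work with full non-Gaussian posteriors; this only shows the arithmetic that turns a parameter shift into an equivalent "number of sigma."

```python
import math
from statistics import NormalDist

import numpy as np

# Hypothetical posterior means and covariances for two experiments in a
# 2D parameter plane (think Omega_m and S_8); the numbers are made up.
mean_a = np.array([0.30, 0.79])
cov_a = np.array([[4e-4, -1e-4], [-1e-4, 9e-4]])
mean_b = np.array([0.32, 0.83])
cov_b = np.array([[1e-4, -5e-5], [-5e-5, 4e-4]])

# For independent Gaussian posteriors the parameter difference is Gaussian
# with covariance cov_a + cov_b; its chi-squared sets the tension level.
d = mean_a - mean_b
chi2 = float(d @ np.linalg.solve(cov_a + cov_b, d))
p = math.exp(-chi2 / 2)                    # chi^2 survival function, df = 2
n_sigma = NormalDist().inv_cdf(1 - p / 2)  # two-tailed equivalent sigma
print(f"{n_sigma:.2f} sigma")
```

The survival-function-to-sigma conversion is what makes headline figures like "2.3𝜎" comparable across metrics; estimators such as Suspiciousness differ in how they compute the underlying probability, not in this final translation step.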

In summary, the projects that compose this thesis are dedicated to the development and enhancement of statistical and machine learning methodologies for the analysis of extensive data sets in large-scale structure surveys.  




Degree Type

  • Dissertation


Department

  • Physics

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • Scott Dodelson
