Abstract
The amount and complexity of data delivered by modern galaxy surveys have been steadily increasing over the past years. New facilities will soon provide imaging and spectra of hundreds of millions of galaxies. Extracting coherent scientific information from these large and multi-modal data sets remains an open issue for the community, and data-driven approaches such as deep learning have rapidly emerged as a potentially powerful solution to some long-standing challenges. This enthusiasm is reflected in an unprecedented exponential growth of publications using neural networks, which have gone from a handful of works in 2015 to an average of one paper per week in 2021 in the area of galaxy surveys. Half a decade after the first published work in astronomy mentioning deep learning, and shortly before new large data sets such as those from Euclid and LSST start becoming available, we believe it is timely to review the real impact of this new technology in the field and its potential to solve key challenges raised by the size and complexity of the new datasets. The purpose of this review is thus two-fold. We first aim at summarizing, in a common document, the main applications of deep learning for galaxy surveys that have emerged so far. We then extract the major achievements and lessons learned and highlight key open questions and limitations which, in our opinion, will require particular attention in the coming years. Overall, state-of-the-art deep learning methods are rapidly adopted by the astronomical community, reflecting a democratization of these methods. This review shows that the majority of works using deep learning to date are oriented to computer vision tasks (e.g. classification, segmentation). This is also the domain of application where deep learning has brought the most important breakthroughs so far.
However, we also report that the applications are becoming more diverse and that deep learning is used for estimating galaxy properties, identifying outliers or constraining the cosmological model. Most of these works remain at the exploratory level, though, which could partially explain their limited impact in terms of citations. Some common challenges will most likely need to be addressed before moving to the next phase of massive deployment of deep learning in the processing of future surveys, for example uncertainty quantification, interpretability, data labeling and domain shift issues arising from training with simulations, which is a common practice in astronomy.
Structure of the review
We structure this review around four major categories:
- Deep learning for general computer vision tasks. These are the applications that we consider closest to standard computer vision applications on natural images, for which deep learning has been shown to generally outperform traditional approaches. They typically include classification and segmentation tasks.
- Deep learning to derive physical properties of galaxies. These are applications in which deep learning is used to estimate galaxy properties such as photometric redshifts or stellar population properties. Neural networks are typically used to replace existing algorithms with a faster and more efficient solution that is better suited to large data volumes. In addition, we also review applications in which deep learning is employed to derive properties of galaxies which are not directly accessible with known observables, i.e. to find new relations between observable quantities and physical properties of galaxies from simulations.
- Deep learning for assisted discovery. Neural networks are used here for data exploration and for visualization of complex datasets in lower dimensions. We also include in this category efforts to automatically identify potentially interesting new objects, i.e. anomalies or outliers.
- Deep learning for cosmology. Cosmological simulations including baryonic physics are computationally expensive. Deep learning can be used as a fast emulator of the galaxy-halo connection by populating dark matter halos. A second major application is cosmological inference. Cosmological models are traditionally constrained using summary statistics (e.g. two-point statistics). Deep learning has been used to bypass these summary statistics and constrain models using all available data.
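
To make the notion of a two-point summary statistic concrete, the following is a minimal Python sketch of an azimuthally averaged power spectrum for a square 2D map. It is an illustration only and is not taken from the review: the function name `radial_power_spectrum`, the pixel-unit wavenumbers, the normalization and the linear binning are all assumptions chosen for simplicity.

```python
import numpy as np


def radial_power_spectrum(field, n_bins=16):
    """Azimuthally averaged power spectrum of a square 2D field.

    Hypothetical helper for illustration: wavenumbers are in cycles per
    pixel and the normalization is |FFT|^2 / N_pixels.
    """
    n = field.shape[0]
    # Fourier transform of the mean-subtracted field
    delta_k = np.fft.fft2(field - field.mean())
    power = np.abs(delta_k) ** 2 / field.size

    # Magnitude of the wavevector for every Fourier mode
    freqs = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(freqs, freqs, indexing="ij")
    k = np.hypot(kx, ky).ravel()

    # Linear annuli between the fundamental mode and the Nyquist frequency
    edges = np.linspace(1.0 / n, 0.5, n_bins + 1)
    which = np.digitize(k, edges) - 1
    valid = (which >= 0) & (which < n_bins)

    # Mean power per annulus
    counts = np.bincount(which[valid], minlength=n_bins)
    sums = np.bincount(which[valid], weights=power.ravel()[valid],
                       minlength=n_bins)
    k_centers = 0.5 * (edges[:-1] + edges[1:])
    return k_centers, sums / np.maximum(counts, 1)


if __name__ == "__main__":
    # Toy input: unit-variance white noise, whose spectrum is roughly flat
    rng = np.random.default_rng(0)
    toy_map = rng.normal(size=(64, 64))
    k, pk = radial_power_spectrum(toy_map)
    for ki, pi in zip(k, pk):
        print(f"k = {ki:.3f}  P(k) = {pi:.3f}")
```

A statistic of this kind compresses a map into a handful of numbers. The approaches described above instead feed the full map to a neural network, so that information not captured by two-point statistics can in principle still be used.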
The Deep Learning Boom
The first published work mentioning deep learning in astronomy dates from 2015, when CNNs were applied to the classification of galaxy morphology. Since then, the number of works using deep learning in astrophysics has been growing exponentially, faster than other topics in the field.
Deep Learning Methods Applied in Astrophysics
We provide a survey of the broad types of neural network architectures used in the four categories of scientific applications defined in this review, highlighting that applications in astronomy cover a wide range of deep learning techniques.
Measuring The Impact of Deep Learning
Based on the 400+ papers surveyed in this review, we attempt to quantify the impact of deep learning in the astronomical literature.
We find, for instance, that deep learning papers have accounted for ~5% of the galaxy literature in recent years and that they receive ~1.5x fewer citations per publication on average.
The Main Challenges
Based on the works reported in this review, we explore what we consider to be some of the major challenges that deep learning works face and that the community needs to address in the coming years. Some of these challenges are not specific to the astronomical community and can benefit from solutions arising from the field of Machine Learning. However, in some cases, the requirements are stricter in astronomy.
Paper
The Dawes Review 10: The Impact of Deep Learning for the Analysis of Galaxy Surveys
Marc Huertas-Company*, Francois Lanusse
Publications of the Astronomical Society of Australia, accepted for publication
Citation:
@ARTICLE{MLReview2022,
  title = {The Dawes Review 10: The Impact of Deep Learning for the Analysis of Galaxy Surveys},
  author = {Huertas-Company, Marc and Lanusse, Francois},
  journal = {\pasa},
  year = {2022}
}