Analyzing multi-band deep imaging with Machine Learning
Reported by Nicola R. Napolitano (School of Physics and Astronomy, SYSU)
Abstract: A team led by Prof. Nicola R. Napolitano at the School of Physics and Astronomy, SYSU, has developed a series of Machine Learning tools and successfully applied them to multi-band deep imaging data from the Kilo Degree Survey (KiDS) for studies of strong gravitational lensing and galaxy structure. These tools are expected to be applied to the much deeper and wider survey to be carried out by the China Space Station Telescope in the near future.
1. Machine Learning tools for Strong Lensing
Strong gravitational lensing (SGL) is a powerful tool to infer the distribution of dark matter (DM) in massive galaxies (the so-called deflectors or lenses) and to study the properties of the high-redshift galaxies behind them (the sources). According to General Relativity, the space-time warped by the mass of the deflector magnifies the light from the source and either splits it into multiple images or stretches it into arcs, depending on whether the source is compact or extended. If not "lensed", many of these sources would be hardly observable.
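As a rough illustration of how the mass of the deflector sets the angular scale of the lensed images, the sketch below evaluates the Einstein radius of a singular isothermal sphere (SIS), theta_E = 4 pi (sigma/c)^2 D_ls/D_s. The SIS model, the function name and the example values (a velocity dispersion of 250 km/s and a distance ratio of 0.5) are assumptions chosen for illustration; they are not taken from the analysis described in this article.

```python
import math

C_KM_S = 299792.458                       # speed of light in km/s
RAD_TO_ARCSEC = 180.0 * 3600.0 / math.pi  # radians -> arcseconds


def sis_einstein_radius_arcsec(sigma_km_s, dls_over_ds):
    """Einstein radius (arcsec) of a singular isothermal sphere.

    sigma_km_s  : velocity dispersion of the deflector, in km/s
    dls_over_ds : ratio of the lens-source to the observer-source
                  angular diameter distances (between 0 and 1)
    """
    if sigma_km_s <= 0:
        raise ValueError("velocity dispersion must be positive")
    if not 0.0 < dls_over_ds < 1.0:
        raise ValueError("distance ratio must lie between 0 and 1")
    theta_rad = 4.0 * math.pi * (sigma_km_s / C_KM_S) ** 2 * dls_over_ds
    return theta_rad * RAD_TO_ARCSEC


# Illustrative (assumed) values for a massive galaxy acting as a lens.
theta_e = sis_einstein_radius_arcsec(250.0, 0.5)
print(f"Einstein radius: {theta_e:.2f} arcsec")
```

For these assumed values the Einstein radius is of order one arcsecond. Because theta_E scales with the square of the velocity dispersion, more massive deflectors produce more widely separated images.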
With the Chinese Space Station Telescope (CSST, Zhan 2018) we will observe billions of galaxies, among which we expect to find up to hundreds of thousands of SGL events (see e.g. Collett 2015). Searching for strong lenses among such a huge number of galaxies is beyond human possibilities and needs specialised tools. Clear morphological patterns, combined with the different colours of the lens and the source images, are distinctive features that allow these events to be identified even among millions of possible lens suspects. This “feature recognition” is a classical problem for Machine Learning techniques, in particular Convolutional Neural Networks (CNNs), which can typically handle it in very short computing times.
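The basic operation behind this kind of feature recognition can be sketched in a few lines: a small filter is slid over the image, and the response is large wherever the local pattern matches the filter. The example below is a minimal pure-Python sketch of one convolutional layer (cross-correlation, ReLU, global max pooling). The 5x5 toy images and the hand-written line filter are assumptions for illustration only; in an actual CNN the filters are learned from training data, and this is not the classifier described in this article.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses only."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max(feature_map):
    """Strongest response anywhere in the map."""
    return max(max(row) for row in feature_map)


# Hypothetical filter that responds to a thin horizontal bright feature.
line_filter = [[-1, -1, -1],
               [ 2,  2,  2],
               [-1, -1, -1]]

# Toy images: one with a bright horizontal bar, one uniformly bright.
bar_image = [[0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0],
             [1, 1, 1, 1, 1],
             [0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0]]
flat_image = [[1] * 5 for _ in range(5)]

score_bar = global_max(relu(conv2d_valid(bar_image, line_filter)))
score_flat = global_max(relu(conv2d_valid(flat_image, line_filter)))
print(score_bar, score_flat)
```

The filter responds strongly to the bar and not at all to the featureless image. A CNN stacks many such learned filters, so that deeper layers can respond to more complex patterns such as arcs and multiple images.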
At the CSST Centre for the Greater Bay Area, we have started a project to develop a CNN classifier to apply to upcoming CSST data. As a pathfinder dataset for CSST we have used the multi-band imaging from the Kilo Degree Survey (KiDS, Kuijken et al. 2019). KiDS is an optical (ugri) imaging survey carried out with the VLT Survey Telescope (VST) on Cerro Paranal, Chile, complemented by NIR data (ZYJHK) from the VISTA Kilo Degree Infrared Galaxy Survey (VIKING, Edge et al. 2013) over an area of 1350 deg² of the sky. The wide wavelength baseline and the high optical image quality of KiDS+VIKING (pixel scale of 0.21 arcsec/pixel and a median r-band seeing of ~0.65 arcsec) make this dataset unique for strong lensing searches and dark matter studies.
Using KiDS data we have developed a series of CNN classifiers that work on single r-band images (see e.g. Li et al. 2020) and on combined multi-band gri images. Together with previous analyses, these have yielded a sample of ~500 new strong lensing candidates (Li et al. 2021a; see Fig. 1).

2. The Discovery of the Blue Einstein Nuggets
Among these candidates we have identified a very peculiar class of systems never characterised before: a series of quads from blue compact sources, in some cases forming almost perfect Einstein crosses. Following up two of these unique Einstein crosses (ECs hereafter) with integral field spectroscopy, we have discovered that these systems belong to a puzzling category of post-blue nugget galaxies (Napolitano et al. 2020; see Fig. 2). These so-called “blue Einstein nuggets” have opened a new window on the use of SGL to study, in great detail, the dark matter distribution of the lens and the nature of these high-redshift ultra-compact objects. We have estimated (Napolitano et al. 2020) that we will be able to observe 5-7000 of such objects in the CSST footprint.
This extensive dataset will allow us 1) to study the dark matter distribution of the Einstein Nugget deflectors and 2) to characterise the blue Einstein Nugget sources. In particular, we will use the lensing model to derive the size of the sources, and the spectroscopy and multi-band photometry to derive their stellar population properties. This will allow us to build a size-mass relation of the post-blue nugget (pBN) systems, compare it with that of the "red nuggets", and put constraints on the early phases of the galaxy formation scenario.

3. Galaxy surface photometry and size evolution
We aim to develop a series of Machine Learning (ML) tools for the quantitative photometry of sources in CSST images: from extended galaxies at medium and high redshift to compact sources in the local Universe. To do this we take advantage of the experience accumulated in the Kilo Degree Survey (KiDS) in the derivation of structural parameters of galaxies (e.g. total magnitude, effective radius, Sersic index, axis ratio and position angle) using both standard analysis methods (see e.g. Roy et al. 2018) and Convolutional Neural Networks (Li et al. 2021b).
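To make the structural parameters concrete, the sketch below evaluates a circular Sersic profile, I(R) = I_e exp(-b_n [(R/R_e)^(1/n) - 1]), and checks numerically that the effective radius R_e encloses about half of the total light. The function names, the series approximation used for b_n and the numerical settings are assumptions for illustration; the fitting tools mentioned in this article additionally handle the axis ratio, the position angle and PSF convolution.

```python
import math


def sersic_b(n):
    """Approximate b_n from a standard series expansion (assumed adequate
    for n >~ 0.5); it makes R_e enclose about half of the total light."""
    return 2.0 * n - 1.0 / 3.0 + 4.0 / (405.0 * n) + 46.0 / (25515.0 * n * n)


def sersic_intensity(r, i_e, r_e, n):
    """Surface brightness of a circular Sersic profile at radius r.

    i_e : surface brightness at the effective radius
    r_e : effective (half-light) radius, same units as r
    n   : Sersic index (n=1 exponential, n=4 de Vaucouleurs)
    """
    return i_e * math.exp(-sersic_b(n) * ((r / r_e) ** (1.0 / n) - 1.0))


def enclosed_light(r_max, i_e, r_e, n, steps=20000):
    """Light inside r_max from midpoint integration of 2*pi*r*I(r)."""
    dr = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * dr
        total += 2.0 * math.pi * r * sersic_intensity(r, i_e, r_e, n) * dr
    return total


# Exponential profile (n = 1) with I_e = 1 and R_e = 1 in arbitrary units.
light_within_re = enclosed_light(1.0, 1.0, 1.0, 1.0)
light_total = enclosed_light(30.0, 1.0, 1.0, 1.0)  # 30 R_e as a proxy for infinity
fraction_within_re = light_within_re / light_total
print(f"fraction of light within R_e: {fraction_within_re:.3f}")
```

The total magnitude follows from the integrated light, while n controls how concentrated the profile is; these are among the quantities a fitting code or a CNN has to recover from each galaxy image.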
In particular, we plan to develop ML techniques to measure the structural parameters of billions of galaxies in the CSST footprint with a signal-to-noise ratio large enough to allow both single-Sersic prediction and bulge+disc decomposition. These measurements will support galaxy evolution studies, such as the growth of galaxy sizes across cosmic time and the evolution of the bulge/disc ratio and sizes with redshift.
In 2021 we delivered the first two science-ready Galaxy Light profile convolutional neural Networks (GaLNets), specialised for ground-based observations. We started from ground-based observations in order to test the suitability of these tools on large datasets, which are currently available only from the ground. The first network, GaLNet-1, has been trained using galaxy images only, while the second, GaLNet-2, has been trained with both galaxy images and the local PSF. We have compared the results from the two CNNs with the structural parameters (namely the total magnitude, the effective radius, the Sersic index etc.) derived for a set of galaxies from the Kilo Degree Survey (KiDS) by 2DPHOT, taken as representative of "standard" PSF-convolved Sersic fitting tools. The comparison shows that GaLNet-2 can reach an accuracy as high as 2DPHOT, while GaLNet-1 performs slightly worse because it lacks the information on the local PSF (see Fig. 3).
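A comparison of this kind can be summarised with a systematic offset and a robust scatter between the CNN predictions and the reference values. The sketch below uses the median difference and the normalised median absolute deviation (NMAD); these particular metrics, the function name and the toy numbers are assumptions for illustration, not the statistics or the measurements of the GaLNet analysis.

```python
import statistics


def agreement_metrics(predicted, reference):
    """Return (median offset, NMAD scatter) of predicted - reference."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.median(diffs)
    # 1.4826 rescales the median absolute deviation so that it matches
    # the standard deviation for normally distributed differences.
    nmad = 1.4826 * statistics.median([abs(d - bias) for d in diffs])
    return bias, nmad


# Made-up effective radii (arcsec), NOT real KiDS measurements.
reference_re = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. from a standard fitting tool
predicted_re = [1.1, 1.9, 3.2, 4.0, 5.3]   # e.g. from a CNN

bias, scatter = agreement_metrics(predicted_re, reference_re)
print(f"median offset = {bias:.3f}, NMAD scatter = {scatter:.3f}")
```

Median-based statistics are used here because a few catastrophic outliers would otherwise dominate a mean and a standard deviation.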
In terms of computational speed, both GaLNets are more than three orders of magnitude faster than standard methods (e.g. 2DPHOT and Galfit). This first application of CNNs to ground-based galaxy surface photometry shows that CNNs are promising tools for the parametric analysis of very large samples of galaxy light profiles, such as those expected from the Chinese Space Station Telescope.

References
Collett 2015, ApJ, 811, 20
Edge, A., et al. 2013, The Messenger, 154, 32
Kuijken, K., et al. 2019, A&A, 625, A2
Li, R., Napolitano, N.R., et al. 2020, ApJ, 899, 30
Li, R., Napolitano, N.R., et al. 2021a, ApJ, 923, 16
Li, R., Napolitano, N.R., et al. 2021b, ApJ, submitted, preprint: arXiv:2111.05434
Napolitano, N.R., Li, R., et al. 2020, ApJ, 904, L31
Roy, N., Napolitano, N.R., et al. 2018, MNRAS, 480, 1057
Zhan 2018, 42nd COSPAR Scientific Assembly, E1.16-4-18