Note: this project is no longer available! Please see my other project: Representing galaxies and their surroundings with graph neural networks


PI: John F. Wu (email, website)
Group Website: ISM*@ST
Project Duration: 12-month rotation project for a graduate student; can evolve into a thesis project; RA funding is available

Project Abstract

The goal is to design and optimize a neural network capable of generating realistic galaxy images conditioned on physical properties such as stellar mass, star formation rate, metallicity, and ionization parameter. The project will build on traditional techniques for inferring galaxy properties, such as spectral energy distribution fitting, and on novel deep learning approaches, such as conditional denoising diffusion models. This work will serve as the basis for a hierarchical Bayesian model that jointly infers galaxy properties from imaging data and photometric catalogs, which will be critical for breaking stellar population model degeneracies (e.g. between dust, age, and metallicity) and will enable us to better understand how galaxies grow.

Background and Context

As galaxies grow and evolve, they leave signatures of their evolution on their spectra as well as their appearances. Galaxies are often observed through several broadband filters, and the total light from a galaxy in each filter is measured as a flux in that band. Astronomers have devoted great care to modeling synthetic stellar populations in order to match the observed fluxes across the electromagnetic spectrum; this technique is called spectral energy distribution (SED) fitting (e.g. Conroy 2013). However, the appearances of galaxies are often reduced to just a few morphological parameters (such as concentration, asymmetry, smoothness, and the Gini coefficient) or human-made classifications, and none of these capture the richness of morphological detail that we readily see in galaxy images.
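
To make concrete how much a single morphological parameter compresses an image, here is a minimal sketch of the Gini coefficient of a galaxy's pixel fluxes, using the rank-based form commonly adopted in galaxy morphology work. The function name `gini_coefficient` and the toy pixel values are illustrative assumptions; a real measurement would also need a segmentation map and background subtraction, which are omitted here.

```python
def gini_coefficient(pixel_fluxes):
    """Gini coefficient of a flat list of pixel fluxes.

    0 means the light is spread evenly over all pixels;
    1 means all of the light sits in a single pixel.
    """
    values = sorted(abs(v) for v in pixel_fluxes)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(values) / n
    if mean == 0:
        raise ValueError("all pixels are zero")
    # Rank-weighted sum over the sorted fluxes (ranks start at 1).
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(values, start=1))
    return weighted / (mean * n * (n - 1))


# Toy "images", already flattened to 1D (hypothetical values).
diffuse = [1.0] * 16                 # light spread evenly
concentrated = [0.0] * 15 + [16.0]   # all light in one pixel

print(gini_coefficient(diffuse))
print(gini_coefficient(concentrated))
```

Because the pixels are sorted before they are summed, any spatial rearrangement of the same pixel values yields the same coefficient. This is one concrete sense in which such summary statistics discard morphological detail.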

Figure 1, from Wu & Peek (2020). Examples of galaxy image cutouts with corresponding observed spectra (black) and CNN-predicted spectra (red).

Wu's research group and collaborators have worked to characterize galaxy morphologies more fully, using large imaging surveys to study a variety of topics in galaxy evolution. These include estimating galaxy gas-phase metallicities (Wu & Boada 2019), predicting neutral hydrogen gas content (Wu 2020), separating AGN and star-forming systems (Holwerda, Wu, et al. 2021; Guo, Wu, & Sharon 2022), identifying rare nearby dwarf satellite galaxies (Wu et al. 2022; Darragh-Ford, Wu, et al. in prep), and even predicting the entire optical spectrum of galaxies (Wu & Peek 2020, see above). In each case, deep convolutional neural networks play a key role in representing the morphological information found in galaxy images.

The current project will use cutting-edge machine learning methods to generate images of galaxies, conditioned on their physical properties. It is already possible to generate images without any sort of conditioning, i.e., without taking into account any galaxy properties. Very realistic SDSS-like images generated using this approach are shown in Figure 2 below. If we can generate galaxy images conditioned on physical properties, then we should also be able to evaluate how likely a galaxy image is given some combination of properties. For example, a high-metallicity galaxy with old stellar populations is unlikely to take the appearance of a dwarf galaxy with flocculent spirals, so the image helps to narrow down the inference space. Eventually (i.e. as part of a thesis), the goal will be to infer galaxy properties from images and photometric catalogs simultaneously by using a hierarchical Bayesian model.
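
The idea that an image likelihood can narrow down the inference space can be sketched on a toy parameter grid. The example below assumes, purely for illustration, that the photometry and the image are conditionally independent given the galaxy properties, so that their log-likelihoods simply add under a flat prior. The two parameters (`age`, `dust`), the Gaussian likelihoods, and every number are made up; in particular, `log_like_image` is a stand-in for what a conditional generative model might provide, not the project's actual model, and no hierarchical structure is included.

```python
import math

# Hypothetical parameter grid in arbitrary units.
ages = [0.1 * i for i in range(1, 101)]    # 0.1 .. 10.0
dusts = [0.02 * j for j in range(0, 101)]  # 0.0 .. 2.0


def log_like_photometry(age, dust):
    # Toy degeneracy: the observed colour constrains only a
    # combination of age and dust, not each one separately.
    reddening = 0.1 * age + 0.5 * dust
    return -0.5 * ((reddening - 1.0) / 0.05) ** 2


def log_like_image(age, dust):
    # Toy stand-in for an image likelihood: here the morphology
    # is assumed to constrain mostly the dust content.
    return -0.5 * ((dust - 1.0) / 0.2) ** 2


def posterior_std_age(log_like):
    """Posterior standard deviation of age on the grid (flat prior)."""
    # Marginalise over dust for each age value.
    weights = [sum(math.exp(log_like(a, d)) for d in dusts) for a in ages]
    total = sum(weights)
    mean = sum(a * w for a, w in zip(ages, weights)) / total
    var = sum((a - mean) ** 2 * w for a, w in zip(ages, weights)) / total
    return math.sqrt(var)


phot_only = posterior_std_age(log_like_photometry)
joint = posterior_std_age(
    lambda a, d: log_like_photometry(a, d) + log_like_image(a, d)
)
print(f"age uncertainty, photometry only:    {phot_only:.2f}")
print(f"age uncertainty, photometry + image: {joint:.2f}")
```

In this toy setup the photometry alone leaves age almost unconstrained along the degenerate direction, while adding the image term shrinks the age uncertainty substantially.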

Figure 2. Images generated using an unconditional generation method called a denoising diffusion probabilistic model (DDPM; Ho et al. 2020).
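
As background on the method named in the caption, the following is a minimal sketch of the DDPM forward (noising) process only, using the linear variance schedule and 1000 steps reported by Ho et al. (2020). It is not the code that produced Figure 2. Plain Python lists keep the sketch dependency-free; the 5-pixel "image", the function name `noise_image`, and the 0-based step index are illustrative assumptions.

```python
import math
import random

T = 1000  # number of diffusion steps in Ho et al. (2020)
# Linear schedule for the noise variances beta_t, from 1e-4 to 0.02.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar[t] is the running product of (1 - beta_s) for s <= t.
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)


def noise_image(x0, t, rng):
    """Draw x_t from q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * noise,
    where noise is standard Gaussian. Returns (x_t, noise).
    """
    scale_signal = math.sqrt(alpha_bar[t])
    scale_noise = math.sqrt(1.0 - alpha_bar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [scale_signal * p + scale_noise * e for p, e in zip(x0, noise)]
    return x_t, noise


rng = random.Random(0)
x0 = [0.0, 0.5, 1.0, 0.5, 0.0]  # hypothetical 5-pixel "image"
for t in (0, 499, 999):
    x_t, _ = noise_image(x0, t, rng)
    print(t, alpha_bar[t], [round(p, 2) for p in x_t])
```

By the final step `alpha_bar` is close to zero, so almost none of the original signal remains. This is why generation can start from pure Gaussian noise and run a learned denoiser backwards through the steps.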

Student work

Planned work as part of project:

  • Re-implement the unconditional image generation model that produced Figure 2.
  • Obtain SDSS catalogs of galaxy properties derived from spectra (or other value-added catalogs) in order to determine which properties we will use as "ground truth" values.
  • Implement conditional image generation using SDSS images and a catalog of physical properties.
  • Write up the results for submission to an AAS journal.

Side projects or future projects:

  • Train and deploy a machine learning model using Legacy Survey images and DESI spectra (technical or ML paper; also useful for those considering industry careers).
  • Use conditional image generation in CANDELS and other HST/JWST deep fields to identify bad photometry (e.g. photometric shreds) or extremely rare objects (may result in a paper).
  • Use a cosmological simulation (e.g. TNG50) to get "better" ground-truth galaxy properties, and train a conditional diffusion model on synthetic images (paper, possibly part of a thesis).
  • Implement a hierarchical Bayesian model to jointly infer galaxy properties from photometry + imaging (paper, part of thesis).

Optional work, useful for background:

  • Replicate the Wu & Peek (2020) results with a much larger data set (GitHub)
  • Replicate recent results from Doorenbos et al. (2022), who tackle the inverse problem of estimating spectra from images (GitHub, paper)

Skill sets (no prior experience necessary):

  • Understanding of galaxy evolution
  • Proficiency in machine learning, including use of a modern Python-based framework such as PyTorch or JAX
  • Proficiency in analyzing large data sets, including coding in the NumPy/Astropy stack
  • Proficiency in cloud-based computing (e.g. AWS or GCP) using accelerated hardware (e.g. GPUs or TPUs)
  • Academic and technical writing

