(Translated by https://www.hiragana.jp/)
Hierarchical Bayesian Inference of Photometric Redshifts with Stellar Population Synthesis Models - IOPscience

A publishing partnership

The following article is Open access

Hierarchical Bayesian Inference of Photometric Redshifts with Stellar Population Synthesis Models

, , , , and

Published 2023 January 11 © 2023. The Author(s). Published by the American Astronomical Society.
, , Citation Boris Leistedt et al 2023 ApJS 264 23 DOI 10.3847/1538-4365/ac9d99

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

0067-0049/264/1/23

Abstract

We present a Bayesian hierarchical framework to analyze photometric galaxy survey data with stellar population synthesis (SPS) models. Our method couples robust modeling of spectral energy distributions with a population model and a noise model to characterize the statistical properties of the galaxy populations and real observations, respectively. By self-consistently inferring all model parameters, from high-level hyperparameters to SPS parameters of individual galaxies, one can separate sources of bias and uncertainty in the data. We demonstrate the strengths and flexibility of this approach by deriving accurate photometric redshifts for a sample of spectroscopically confirmed galaxies in the COSMOS field, all with 26-band photometry and spectroscopic redshifts. We achieve a performance competitive with publicly released photometric redshift catalogs based on the same data. Prior to this work, this approach was computationally intractable in practice due to the heavy computational load of SPS model calls; we overcome this challenge by the addition of neural emulators. We find that the largest photometric residuals are associated with poor calibration for emission-line luminosities and thus build a framework to mitigate these effects. This combination of physics-based modeling accelerated with machine learning paves the path toward meeting the stringent requirements on the accuracy of photometric redshift estimation imposed by upcoming cosmological surveys. The approach also has the potential to create new links between cosmology and galaxy evolution through the analysis of photometric data sets.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Deriving accurate redshifts and redshift distributions from photometry alone is of central importance for the scientific exploitation of modern galaxy surveys. In particular, cosmological analyses involving galaxy clustering and weak gravitational lensing require exquisitely accurate estimates of the redshift distributions of the selected galaxy populations (see, e.g., Hildebrandt et al. 2020; Abbott et al. 2022). Photometric redshifts (photo-z's) of individual galaxies are also useful for selecting targets of interest (e.g., at high redshift).

Photo-z's are typically obtained via two types of methods: template fitting, and machine learning. 8 Template fitting (e.g., Benitez 2000; Ilbert et al. 2006, 2006; Brammer et al. 2008; Acquaviva et al. 2015; Tanaka 2015; Carnall et al. 2018; Battisti et al. 2019) relies on lists of grids or templates for the spectral energy distributions (SEDs) of the galaxies considered, combined with a prior on the parameters involved (galaxy type, magnitude, etc). Machine-learning techniques (e.g., Collister & Lahav 2004; Kind & Brunner 2013) involve flexible models trained on spectroscopic or synthetic data. Despite their past successes, both these approaches are unable to satisfy the accuracy requirements imposed by upcoming Stage IV surveys, as they are not robust to errors in the data (calibration offsets, underestimated uncertainties), in the SED model, or to the non-representativity or sample variance of the spectroscopic data used for training and validation (e.g., Hartley et al.2020; Gatti et al. 2021; Myles et al. 2021; Newman & Gruen 2022). For example, the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will require the uncertainty on the mean of redshift distributions to be smaller than 0.001(1 + z) by its Year 10 data release (The LSST Dark Energy Science Collaboration et al. 2018), which in turn imposes very stringent requirements on the accuracy of individual photo-z's.

We introduce a framework capable of addressing these needs: a hierarchical model of galaxy photometry, consisting of three components describing a process for generating survey data:

  • 1.  
    galaxy SEDs generated with stellar population synthesis;
  • 2.  
    a population model describing the distributions of the intrinsic parameters of galaxies in the sample analyzed;
  • 3.  
    a noise model for generating photometric observations (i.e., a likelihood function).

This provides a powerful combination of empirical modeling with physics-driven components, incorporating known (astro)physics while leaving sufficient room for data-driven corrections, addressing the aforementioned issues of flexibility, interpretability, miscalibration and extrapolation outside of training data. Thus, once applied and calibrated to large survey data sets, it has the potential to yield both precise and accurate inferences on intrinsic galaxy parameters, including redshift. Building on previous work on hierarchical modeling in photometric surveys (Leistedt et al. 2016; Jones & Heavens 2018; Sánchez & Bernstein 2018; Rau et al. 2019; Alarcon et al. 2020; Sánchez et al. 2020), our framework takes advantage of the increasing accuracy and flexibility of SPS, and of methodological innovations in Bayesian inference for large hierarchical models, in particular involving machine-learning-facilitated acceleration of computationally intensive analysis steps.

In this paper, we demonstrate the efficacy of this framework by applying it to the COSMOS2020 data set (Weaver et al. 2022). For the choice of SED model, we consider Prospector-αあるふぁ (Leja et al. 2017, 2019; Johnson et al. 2021). We address the high computational needs of its model calls with neural emulators (Alsing et al. 2020) applied to both SPS total fluxes and emission-line contributions. We find that the emission-line contributions are responsible for the largest photometric residuals in our analysis and therefore require calibration in order to derive accurate photo-z's. Together, the SED, data, and population models form a complex hierarchical model which would normally be computationally intractable, a challenge we overcome with neural emulator-based acceleration. We obtain redshift estimates competitive with those publicly released with the COSMOS2020 data using this baseline model, providing a benchmark validation test demonstrating the power of the new methodology. The extensive 26-band wavelength coverage of COSMOS2020 simplifies parameter inference and allows us to demonstrate the conceptual advantages of the new methodology without the significant overhead of carrying out a Markov Chain Monte Carlo (MCMC) analysis. Application to broadband-only photometric surveys, such as the LSST, will require a more involved treatment of the uncertainties and selection effects, which we defer to future work. In a companion paper (Alsing et al. 2022), we show how to correctly include selection effects in the forward modeling of such surveys, and in particular for the inference of galaxy redshift distributions.

The outline of this paper is as follows. In Section 2 we describe our framework in the form of a generative model of photometric data. An inference formalism is presented in Section 3. We describe the data in Section 4 and the results in Section 5. We discuss the results in further detail in Section 6 and conclude in Section 7. In what follows, a Planck 2015 (Ade et al. 2016) and a Chabrier (2003) initial mass function are adopted for all relevant calculations. p( · ∣ · ) refers to the probability density of the quantity before the "∣," conditioned on quantities behind it. We work with fluxes in AB units.

2. Generative Model for Galaxy Photometry

Hierarchical models tackling photometric redshifts and redshift distributions have been developed (Leistedt et al. 2016; Jones & Heavens 2018; Sánchez & Bernstein 2018; Rau et al. 2019; Alarcon et al. 2020; Sánchez et al. 2020), but typically assume a fixed SED model. Leistedt et al. (2019) pioneered the approach of inferring redshifts jointly with the calibration of the hyperparameters of an SED model. However, like most photo-z methodologies (Ilbert et al. 2006; Brammer et al. 2008; Acquaviva et al. 2015; Tanaka 2015; Carnall et al. 2018; Battisti et al. 2019), the model still relied on a curated set of SED templates, which made it difficult to construct more structured, interpretable corrections to the SEDs or to the population model. Our work introduces the use of full continuous stellar population synthesis (SPS) models for redshift estimation, tailored to the multilevel calibration and inference that will be needed for the upcoming generation of wide-deep photometric surveys (Newman & Gruen 2022).

2.1. SED Model

SPS provides a powerful way to model the SED of a galaxy. It exploits fundamental principles to generate and sum the contributions of "simple" stellar populations (as observed in, e.g., star clusters) and to apply the effects of additional complications, such as dust and nebular emission, in order to create galaxy SEDs within a forward modeling framework. This step involves a set of specific modeling choices for these components; for example an explicit star formation history (SFH). We use the Flexible Stellar Population Synthesis (FSPS; Conroy et al. 2009a, 2009b; Conroy & Gunn 2010) code, accessed through the python-FSPS binding (Foreman-Mackey et al. 2014).

For the SED model we consider a variation of Prospector-αあるふぁ, which was first introduced in Leja et al. (2017), revised in Leja et al. (2019), Johnson et al. (2021), and has been successfully used to constrain Bayesian models for the galaxy population in Leja et al. (2020, 2022), Nagaraj et al. (2022) and Whitler et al. (2022). While these works fixed the redshift to the value measured with spectroscopy (spec-z), we relax this assumption and treat it as a free parameter. In our variation of Prospector-αあるふぁ, a galaxy SED is (deterministically) defined by 15 parameters:

  • 1.  
    Eight parameters describing the SFH: one parameter for the total stellar mass formed $\mathrm{log}{M}_{\star }$ [M] (integral of the SFH); one parameter for the stellar metallicity $\mathrm{log}({Z}_{\star }/{Z}_{\odot })$, assumed to be the same for all stars in the galaxy; and six parameters for the relative age SFH bins (ratios ri of the piece-wise SFR in adjacent temporal bins), where the seven time bins are spaced following Leja et al. (2020).
  • 2.  
    Three parameters for the dust attenuation, following the model of Charlot & Fall (2000) (see Leja et al. 2017 for details) with birth cloud τたう1, and diffuse attenuation τたう2 with a power law (of index n) from Calzetti et al. (2000).
  • 3.  
    A gas-phase metallicity parameter $\mathrm{log}({Z}_{\mathrm{gas}}/{Z}_{\odot })$ for the nebular emission (decoupled from the stellar metallicity) modeled after the grids of Byler et al. (2017).
  • 4.  
    Two parameters, logfAGN and τたうAGN, for the AGN torus emission model of Nenkova et al. (2008).
  • 5.  
    Redshift, z.

The SED includes dust heating from stars via energy balance, via a dust SED of fixed shape (Draine & Li 2007).

2.2. Emission Lines

Prospector-αあるふぁ includes a nebular emission model where the gas is ionized by the same stars synthesized in the SED (Byler et al. 2017; Byler 2018). There is a large variability of the line strengths, as demonstrated by Byler et al. (2017). In practice, previous studies have required the flexible addition of emission lines on top of SED templates in order to obtain accurate photo-z's (Ilbert et al. 2006, 2008; Brammer et al. 2008; Alarcon et al. 2021), so we adopt a similar approach.

We model offsets and uncertainty in the strength of the emission lines by encoding them into parametric bias and variance parameters. For a set of SPS parameters φふぁい , the total energy density as a function of wavelength, , consists of the base FSPS prediction SPS, with an extra additive contribution from emission lines. In practice, in FSPS, emission lines are modeled as delta functions, integrated in bandpasses and added to the model photometry. We can therefore write the SED as

Equation (1)

where j is the amplitude of the jth line, λらむだj the rest-frame wavelength of the line and δでるたD( · ) is the Dirac delta function. αあるふぁj is the hyperparameter for the additional contribution from line j. Because line j is already included in SPS with a default weight, its total contribution in is 1 + αあるふぁj .

2.3. Synthetic Photometry

We can obtain total model fluxes in the bth band by integrating the SED through the bandpass filter Wb (λらむだ), such that

Equation (2)

where dL (z) is the luminosity distance and τたう(z, λらむだ) is the effective optical depth of the intergalactic medium, calculated with the model of Madau (1995).

2.4. Population Model

So far we have a mechanism for using SPS to generate the photometry of a single galaxy given its intrinsic properties, φふぁい and redshift, z. We now describe a formalism for generating a sample of galaxies with a probability distribution p( φふぁい , z κかっぱ ), where κかっぱ are the hyperparameters which describe the population. These could in principle be inferred as well, but this typically requires a careful treatment of selection effects (i.e., how galaxies are selected into the sample at hand), which must be included at the population level. We show how to treat selection effects in a companion paper, Alsing et al. (2022). For our demonstration with COSMOS2020 in this paper, we focus on redshift inference for individual galaxies. Due to the constraining power of the data, the population prior will have very little impact on the galaxy posterior distributions, especially the redshifts. Therefore, we are able to bypass an explicit treatment of selection effects and also adopt a fairly uninformative population model.

We adopt minor variations on the fiducial Prospector-αあるふぁ prior (see Leja et al. 2019; Johnson et al. 2021), summarized in Table 1. It factorizes as

Equation (3)

Most parameters have simple uniform or log-uniform priors, except in the cases discussed below.

Table 1. Summary of Parameters and Priors Describing the SPS Model

ParameterPrior BoundsPrior
$\mathrm{log}{M}_{\star }$ [M][7, 13]Uniform(7, 13)
$\mathrm{log}({Z}_{\star }/{Z}_{\odot })$ [−1.98, 0.19]Uniform (−1.98, 0.19)
${\{\mathrm{log}{r}_{i}\}}_{i=1,\cdots ,6}$ [−5, 5]Stu(0, 0.3, 2)
τたう2 [0, 4] ${{ \mathcal N }}_{c}({\mu }_{2},{\sigma }_{2},0,4)$
τたう1/τたう2 [0, 2] ${{ \mathcal N }}_{c}({\mu }_{1},{\sigma }_{1},0,2)$
n [−1, 0.4] ${{ \mathcal N }}_{c}({\mu }_{n},{\sigma }_{n},-1,0.4)$
logfAGN [10−5, 150]LogUniform(10−5, 150)
τたうAGN [−2, 0.5]LogUniform(5, 150)
$\mathrm{log}({Z}_{\mathrm{gas}}/{Z}_{\odot })$ [−2, 0.5] ${{ \mathcal N }}_{c}(\mathrm{log}({Z}_{\star }/{Z}_{\odot }),{\sigma }_{Z},-2,0.5)$
z [0, 2.5]Uniform(0, 2.5)

Note. ${{ \mathcal N }}_{c}(l,s,m,M)$ refers to a truncated normal distribution of location l, scale s, and in the range [m, M]. Our fiducial values for the parameters below are ${\mu }_{n}=-0.095+0.111\,{\tau }_{2}-0.0066\,{\tau }_{2}^{2}$, σしぐまn = 0.4, σしぐまZ = 3.0, (μみゅー1, σしぐま1) = (1.0, 0.3), (μみゅー2, σしぐま2) = (0.3, 1.0), following Leja et al. (2019).

Download table as:  ASCIITypeset image

For the gas-phase metallicity, we take a normal 9 prior of mean at the stellar metallicity and standard deviation σしぐまZ = 3.0. Despite the fact that they should track each other approximately, it is more robust to treat the gas-phase metallicity as a nuisance parameter (Leja et al. 2020). We take a normal prior on the diffuse dust components τたう2 with mean equal to μみゅー2 = 0.3 and standard deviation σしぐま2 = 1.0. For the birth cloud component τたう1, we take a normal prior on the ratio r = τたう1/τたう2, with mean equal to μみゅー1 = 1 and standard deviation σしぐま1 = 0.3. The index of the dust attenuation law (for the diffuse component) is assumed to vary as a function of the total dust attenuation, with mean given by: $\langle \delta \rangle ={\mu }_{n}=-0.095+0.111\,{\tau }_{2}-0.0066\,{\tau }_{2}^{2},$ where δでるた is the (negative) offset from the index of the Calzetti attenuation curve (Calzetti et al. 2000). We take a normal prior on δでるた, with mean μみゅーn given above and standard deviation σしぐまδでるた = 0.4. This is a simple average of the results of Leja et al. (2019), but could be updated with a more complex prior, such as that of Nagaraj et al. (2022).

Finally, we collect the following hyperparameters in a vector κかっぱ = (μみゅーr , σしぐまr , σしぐまn , σしぐまδでるた , σしぐまZ ). We will infer them jointly with the other components of the model, in order to demonstrate how one could, in principle, learn about the underlying physics with this type of forward modeling framework. However, we expect little sensitivity to these hyperparameters for the data set we use. A more detailed population prior, encoding more of the known relations from galaxy formation and evolution, was developed by Alsing et al. (2022), and successfully reproduced galaxy redshift distributions given broadband photometry.

2.5. Noise Model

The final stage of the generative model is to simulate observed photometry ${\hat{F}}_{b}$ from the model flux Fb , by adding noise. The noise model can be fully characterized using a likelihood function. We use a scaled and translated Student's-t distribution 10 with two degrees of freedom, which is more robust to outliers than a normal distribution, since it has heavier tails:

Equation (4)

Aside from the observational uncertainty σしぐまb , which is given for each object (typically measured from the images), we introduce two additional terms: a zero-point offset ωおめがb and the model uncertainty Σしぐまb added in quadrature. ωおめがb appears in front of the uncertainty because the latter will involve the model Fb , which must be rescaled by ωおめがb (although there is some freedom in choosing how to define these terms self-consistently).

We write the model uncertainty as the sum (in quadrature) of a fraction of the total model flux and a fraction of the total line fluxes,

Equation (5)

where Ljb is the photometric flux of the jth line in the bth band, i.e., the result of applying Equation (2) to j δでるたD(λらむだλらむだj ). The first term operates at the level of the band (controlled by a hyperparameter γがんまb ) and is analogous to the magnitude uncertainty floor often employed in photo-z techniques. The second term is similar, but adds a contribution from each line. As before, the hyperparameter βべーたj can be interpreted in terms of a fraction of the total emission-line flux (which is now 1 + αあるふぁj ). In summary, this allows us to separate the extra uncertainty Σしぐまb preferred by the data into contributions from calibration uncertainty (via γがんまb ) and SED uncertainty (via βべーたj ).

3. Inference Formalism

We now assemble the components presented in the previous section into a hierarchical model for generating and analyzing photometric data. We describe an inference methodology and also computational acceleration using machine learning.

3.1. Hierarchical Model

A summary of the parameters of our model is provided in Table 2. A graphical representation is provided in Figure 1. Each galaxy has 15 intrinsic SPS parameters (including redshift), as described in Section 2.1, and in Sections 2.22.5 we have introduced five sets of hyperparameters, aiming to add some targeted sources of flexibility into the model. Furthermore, a hierarchical modeling approach has the ability to fix subsets of parameters, and to observe the effect on the others, offering additional robustness diagnostics.

Table 2. Notation for All Model Parameters

ParameterDescription
  Hyperparameters
ωおめがb Zero-point offset (for bth band, relative to the i band)
γがんまb Additional flux uncertainty contribution for the bth band (fraction of the total model flux)
αあるふぁj Additional flux for the jth emission line (fraction of the line strength in FSPS)
βべーたj Additional flux uncertainty contribution for the jth emission line (fraction of the line strength in FSPS)
κかっぱ Hyperparameters describing the galaxy population model
  Latent parameters
φふぁい Stellar population parameters describing the rest-frame spectrum (per galaxy)
z Redshift (per galaxy)
  Derived parameters
SPS( φふぁい )Flux energy density predicted by FSPS (per galaxy)
j ( φふぁい )Amplitude of emission line (Dirac delta model) in FSPS (per galaxy)
Ljb ( φふぁい , z)Flux of the jth emission line in the bth band (per galaxy), i.e., j inserted in Equation (3)
Fb ( φふぁい , z, αあるふぁ )Total model flux in the bth band, with all contributions (per galaxy), i.e., Equation (1) inserted into Equation (3)
Σしぐまb ( φふぁい , z, αあるふぁ , γがんまb , γがんま )Additional flux uncertainty in the bth band, with all contributions (per galaxy), see Equation (5)
  Data
${\hat{F}}_{b}$ Measured flux in the bth band (per galaxy)
σしぐまb Flux measurement uncertainty in the bth band (per galaxy)

Note. Vectors are indicated with bold symbols. Our total log likelihood and graphical model presented below include an extra index i per galaxy, which we have omitted in this table and all other equations in the paper for simplicity.

Download table as:  ASCIITypeset image

Figure 1.

Figure 1. Graphical representation of the main parameters involved of our model. Circles are inferred random variables, shaded circles are observed and dots indicate random variables that are deterministic in their inputs.

Standard image High-resolution image

The full posterior distribution of our model (adding a subscript i for galaxies) is

Equation (6)

3.2. Priors

In our approach it is straightforward to encode prior knowledge or constraints on corrections such as zero-points or emission-line contributions, as the prior on hyperparameters appears explicitly as p( αあるふぁ , βべーた , ωおめが , γがんま , κかっぱ ). The priors we use are given in Table 3.

Table 3. Priors on Hyperparameters

HyperparameterPrior
${\{{\alpha }_{j}\}}_{j=1,\cdots ,{N}_{\mathrm{lines}}}$ Laplace(0, 10)
${\{{\beta }_{j}\}}_{j=1,\cdots ,{N}_{\mathrm{lines}}}$ Normal(0, 1)
${\{{\omega }_{b}\}}_{b=1,\cdots ,{N}_{\mathrm{bands}}}$ Normal(1, 1)
${\{{\gamma }_{b}\}}_{b=1,\cdots ,{N}_{\mathrm{bands}}}$ Normal(0, 1)

Note. The numbers in parentheses indicate the location and scale of the distribution.

Download table as:  ASCIITypeset image

We find that the Laplace prior for each αあるふぁj is preferable to a normal prior, since it promotes sparsity of the coefficients, i.e., the coefficients should be zero unless they significantly improve the model fits. Without this regularization there is a risk that line contributions would mimic other narrow features in the SED, such as absorption lines, which should instead be associated with a separate set of stellar evolution physics.

3.3. Optimization Strategy

Our inference strategy is to optimize the posterior distribution with respect to all parameters. This is because the mixture of broad bands (typically deeper) and noisier intermediate or narrow bands makes it possible to pin down SED features accurately and unequivocally. As a result, it is possible to derive estimates of the hyper- and latent parameters through optimization, without the need for more involved inference strategies, such as MCMC. We adopt the Adam optimizer with a learning rate of 10−3.

Challenges that need to be overcome at this stage involve the computational needs of SPS calls, and the fact that they are not differentiable analytically (although differentiable SPS models are now being developed, Hearin et al. 2021; Alarcon et al. 2023). We address these issues with neural emulators of the model predictions, which are both ∼104 faster than native SPS calls and differentiable (Alsing et al. 2020). We return to the details of our neural emulators in the next section.

In principle, one could determine all parameters in a single run. But, for the purpose of this demonstration study, we adopt a two-stage approach (e.g., Laigle et al. 2016; Weaver et al. 2022) that is commonly used and leverages the availability of spectroscopic redshifts. We first determine the hyperparameters with redshifts fixed at the values acquired via spectroscopy (optimizing all SPS parameters other than the redshift). A second run is then performed by fixing the hyperparameters at the inferred values, and only optimizing the latent SPS parameters (now including the redshift). The resulting optimized redshifts are denoted ${\hat{z}}^{\mathrm{MAP}}$ (omitting the galaxy index).

Finally, we employ bijectors to map the original parameters to new variables which have zero-mean, unit-variance normal priors. These are easier to work with than the original variables, since they do not have boundaries, which facilitates optimization (and sampling). The parameter bounds are shown explicitly in Table 1.

3.4. Emulators

As previously mentioned, one drawback associated with SPS models is the computational cost of each model call, which is the result of the many intensive numerical steps involved in predicting flux densities and photometry from stellar isochrones, initial mass function and the other ingredients of SPS. While this has been overcome with MCMC in the past and applied to medium-sized (∼105 objects) data sets (e.g., Leja et al. 2019), photometric galaxy surveys are approaching the order of a billion objects (The LSST Dark Energy Science Collaboration et al. 2018; Abbott et al. 2022), so brute-force inference is not practical. Hyperparameters or model calibration would make the computational load even heavier. We now discuss how we employ emulators 11 of fluxes and line emission in order to overcome this challenge.

3.4.1. SPS Fluxes

For each band, we use the methodology described in Alsing et al. (2020) to train an emulator to predict the photometric flux, i.e., SPS fed into Equation (3). While, in principle, one could train emulators to predict the SPS(λらむだ), it is advantageous to emulate photometric fluxes directly in order to avoid evaluating Equation (3). The emulator for each band is a fully connected 4-layer neural network (with 128 neurons in each layer). We train with staggered learning rates (10−3, 10−4, 10−5, 10−6) and batch sizes (1000, 10,000, 50,000, all objects) to improve convergence. We train using 4 million samples from the prior, setting 10% of the training data aside for validation and stopping the training when the loss does not improve for 20 epochs.

3.4.2. Emission-line Fluxes

Our model photometry has additional contributions from emission lines, so for any parameter vector ( φふぁい , z) we need to calculate them in addition to the base SPS predictions. This extra computational load can again be alleviated with emulators. However, in this case building emulators for individual emission lines and all bands would be prohibitive. Instead, we adopt a different strategy. We train an emulator to model the fluxes {j } on the rest-frame wavelength grid {λらむだj }, following the procedure of Alsing et al. (2020). In our setting the size of the wavelength grid is the number of emission lines in FSPS (all 128 lines in Byler et al. 2017), so the accuracy of the emulator is easier to keep under control. 12 To predict the photometric flux of an emission line, we need to implement Equation (3). Because the bandpass {Wb (λらむだ)} are in tabular form, it would be preferable to work with an interpolated or continuous approximation. We find that fitting the sum of three generalized normal distributions to each Wb offers a simple solution, since it is straightforward to fit to the tabulated data without any tuning. This approach delivers accurate and fast results for the final emission-line fluxes.

3.4.3. Emulator Accuracy

When comparing the initial and the emulator-reconstructed fluxes (computed from fluxes), we find that they are accurate at a level better than 1% (99th percentile of the magnitude differences in the range spanned by the COSMOS2020 data below). This will be comfortably covered by Σしぐまb , for example, thanks to the γがんまb hyperparameter being greater than 0.01. The accuracy of emulation of emission-line fluxes is typically below 0.1% (99th percentile), so the errors in this emulation can safely be ignored.

4. Data

We now construct a data set that demonstrates that our framework is able to produce accurate photometric redshift inferences from a benchmark sample. This consists of selecting broad- and narrow-band photometry in the COSMOS field, and cross-matching with a selection of spectroscopic data sets to enable validation of inferred redshifts.

4.1. Photometry

Weaver et al. (2022) presented COSMOS2020, the latest compilation of imaging in the COSMOS field. Two sets of photometric measurements were performed on these data: Classic and Farmer. For our case study we choose Farmer, which is based on the model-fitting software The Tractor. 13 The photo-z's published with this data set were derived using Eazy (Brammer et al. 2008), as well as LePhare (Ilbert et al. 2006, 2008) using slightly different subsets of bands. While we will compare our results to both of these methods, we adopt the same subset of selected bands as Eazy; its SED templates are also derived using FSPS, thus providing a more direct comparison. This selection of bands exclude the Subaru Suprime-Cam broad bands (shallower than other filters at similar wavelengths) and the GALEX bands (shallow and with broad PSFs) (Weaver et al. 2022). We apply the recommended "combined" mask, which retains the deepest regions with the greatest number of available bands and removes areas corrupted by bright stars and artifacts in all the relevant bands.

This photometric data set is prepared for further analysis using the code released with the COSMOS2020 data, which applies various flux corrections (including Galactic extinction) and unit conversions.

4.2. Spectroscopic Redshifts

We compile the positions and spectroscopic redshifts from three large campaigns covering the COSMOS field: zCosmos-bright (Lilly et al. 2007), DEIMOS (Hasinger et al. 2018)], and C3R2 (Masters et al. 2017, 2019; Stanford et al. 2021).

zCosmos-bright (Lilly et al. 2007) is a highly representative sample of 20,000 bright (i* ≤ 22.5) galaxies. DEIMOS (Hasinger et al. 2018) is a sample of ∼4 × 103 objects selected from a variety of input catalogs based on multiwavelength observations in the field and thus represents a diverse selection function. Finally, the Complete Calibration of the Color-Redshift Relation (C3R2, Masters et al. 2017, 2019; Stanford et al. 2021) is a multi-institution, multi-instrument survey targeting faint galaxies (i ∼ 24.5) populating regions of color space under-sampled by other surveys, in order to provide validation data for upcoming weak lensing surveys. We use data from the Data Releases 1, 2, and 3 (∼4 × 103 objects).

For all three samples, we keep objects with quality flags 3 and 4. We perform a spatial cross-match between the positions of these objects in the spectroscopic and photometric catalogs. Specifically, we keep the nearest unique photometric source within a radius of 0farcs6 around each spectroscopic object, as recommended by Weaver et al. (2022). We also remove objects classified as stars by Eazy or LePhare, as well as the few sources with spectroscopic redshift >2.5.

We are left with 12,473 objects: 6920, 3168, and 2043 for zCosmos-bright, DEIMOS and C3R2, respectively. This is the data set upon which we carry out our demonstration study.

This data set combines three samples and so has a complicated selection function (i.e., the probability, conditional on its intrinsic properties, that a galaxy appears in the sample). The selection function must be incorporated into any population level analysis, as is treated explicitly in the inference of redshift distributions presented by Alsing et al. (2022). But if the focus is on inference of object-level parameters (i.e., individual galaxy redshifts) then selection effects can be ignored if the data on detected objects is sufficiently constraining; in the limit that the photometry determines the redshift (and other parameters) perfectly, the posterior is a delta function and hence is completely decoupled from the population. While not strictly satisfied here—the high-quality photometry and broad wavelength coverage do yield finite parameter uncertainties—this is an excellent approximation. We hence do not consider selection effects further.

5. Results

We now evaluate the performance of our inference framework applied to the crossmatched data set described in the previous section. In what follows, unless explicitly stated, the hyperparameters of the population model ( κかっぱ ) are not optimized and are set to the fiducial values defined previously.

In practice, to avoid degeneracies with other parameters of the model, we anchor the zero-point of the HSC i band to unity and define the others relative to this band.

5.1. Emission Lines

We perform an initial fit to the data with our model with no contribution from lines, i.e., setting αあるふぁj = βべーたj = 0 ∀j. Thus, we optimize the zero-point offsets and uncertainty hyperparameters ${\{{\omega }_{b},{\gamma }_{b}\}}_{b=1,\cdots ,{N}_{\mathrm{bands}}}$, as well as the SPS parameters of each galaxy, with the procedure outlined in Section 3.3. This consists of a run, with redshifts fixed at the spectroscopic redshifts, to estimate the hyperparameters, and a second run now relaxing redshifts but fixing the hyperparameters. The resulting maximum a posteriori redshift estimates, ${\hat{z}}^{\mathrm{MAP}}$, are poor compared to the released COSMOS2020 photo-z's. We also observe sharp features in the redshift residuals ${\hat{z}}^{\mathrm{MAP}}-$ spec-z and in the flux residuals (data minus model at the best-fit parameters) binned in spec-z. Many of these features correspond to strong emission lines, such as [O ii], demonstrating that their luminosities are not correctly calibrated for our data set. Re-calibrating them is straightforward within our framework.

We must first determine a suitable subset of emission lines (out of the 128 implemented in FSPS by Byler et al. 2017) to include in the model. To achieve this we take the best-fit SEDs of our first run described above (at z = spec-z) and calculate the contribution of each emission line to the total photometric flux, i.e., the flux ratio Ljb /Fb , for all objects, lines, and bands. We then take the average of these values over objects and bands. This procedure yields the lines which make the strongest contributions to the data at hand. This set includes most of the lines with intrinsic large variance according to Byler et al. (2017). Importantly, this procedure also filters out lines that are outside our wavelength coverage. We are left with 44 lines. Note that nearby emission lines are not necessarily all well-resolved by the data, i.e., each may only unequivocally contribute to a few objects. This could be improved with more informative priors on the lines or on line selection; however we do not expect this to have a significant impact on our redshift results.

In total, following this line-selection procedure, we have 2 × Nbands + 2 × Nlines = 140 hyperparameters (excluding κかっぱ ), and 15 × Nobjects ∼ 2 × 105 "latent" SPS parameters. We now optimize all of these parameters and extract maximum a posteriori redshift estimates ${\hat{z}}^{\mathrm{MAP}}$.

5.2. Photo-z's

We evaluate the performance of this model using some commonly used metrics. For every object in our catalog, a spectroscopic redshift zspec and a redshift estimate ${\hat{z}}^{\mathrm{MAP}}$ are available. The main quantity of interest is the difference between the two, divided by the standard 1 + z factor accounting for the expected scaling of the quality of the estimates due to the effect of redshift on wavelength,

Equation (7)

We compute the mean, median, and standard deviation of Δでるたz over our sample of galaxies. For the standard deviation, an estimator more robust to outliers than the sample variance is ΣしぐまNMAD, the median absolute deviation (the median of ∣Δでるたz ∣) multiplied by 1.48. Finally, the outlier fraction is defined as the percentage of objects with ∣Δでるたz ∣ > 0.15.

Figure 2 shows these metrics, calculated in redshift and magnitude bins (in the reference HSC i band, which is not affected by zero-points), for our framework, as well as the publicly released COSMOS2020 LePhare and Eazy redshifts.

Figure 2.

Figure 2. Photo-z metrics calculated using our sample for our hierarchical model compared with the COSMOS2020 released LePhare and Eazy photo-z's. We compute the median of ${{\rm{\Delta }}}_{z}=({z}_{\mathrm{spec}}-{\hat{z}}^{\mathrm{MAP}})/(1+{z}_{\mathrm{spec}})$, as well as σしぐまMAD = 1.48 × median(∣Δでるたz ∣), and the outlier fraction as the fraction of objects with ∣Δでるたz ∣ > 0.15.

Standard image High-resolution image

Overall, our approach yields comparable levels of bias (as measured by the median of Δでるたz ), fewer outliers, and a slightly larger level of scatter (as measured by σしぐまNMAD). The comparison with LePhare is not straightforward, as it uses a slightly different set of bands. On the other hand, we use the same set of bands as Eazy, also based on templates derived from a grid of parameters using an SPS model. Eazy includes a magnitude-redshift prior as well as a template error function (Brammer et al. 2008). These differences may be responsible for the small improvement in σしぐまNMAD. Note that we have not attempted to tune or improve our model in order to carry out redshift inferences on these data, beyond the procedure described above.

While these photo-z metrics are informative summaries of the results, it is also interesting to examine the distributions of redshift estimates themselves. The residuals Δでるたz are shown in Figure 3, while Figure 4 also shows a conventional comparison of the estimates and the spectroscopic redshifts. These confirm the results captured by the metrics above. This also highlights that the redshift estimates are noisier for the DEIMOS and C3R2 samples because these surveys targeted populations typically under-represented in other spectroscopic campaigns (fainter, bluer, higher redshift). These populations also present greater challenges in terms of accurate photo-z estimation.

Figure 3.

Figure 3. Differences between the maximum a posteriori redshift estimates and the spectroscopic redshifts (divided by 1 + z) for the new model compared with the COSMOS2020 released LePhare and Eazy redshifts. This highlights the areas where they achieve similar performance in terms of scatter and outliers. Splitting between the three sources of spectroscopic redshifts also reveals the differences in photo-z performance for samples with different selection, as described in Section 4.

Standard image High-resolution image
Figure 4.

Figure 4. Maximum a posteriori redshift estimates versus spectroscopic redshifts for the three spectroscopic sources comprising the data we analyze. Differences are best highlighted in Figure 3, whereas these scatter plots with a logarithmic color bar better highlight the overall distributions and the outliers (which are rare).

Standard image High-resolution image

5.3. Hyperparameters

In this section, we investigate the results of our optimization procedure for the model's hyperparameters and their implications for the robustness of the data model. The hyperparameters resulting from the optimization are shown in Figure 5. We also show the result of a run with the noise and emission-line offsets set to fiducial values γがんまb = 0.01 ∀b, αあるふぁi = 0 ∀i. Throughout this section we will also comment on robustness checks obtained by running variations of these models.

Figure 5.

Figure 5. Hyperparameters obtained with the optimization procedure of Section 3.3. The black stars correspond to our best model, with all parameters optimized, while the green diamonds correspond to an additional model (used for robustness checks, see text for details) fixing the emission-line offsets αあるふぁj = 0 ∀j and error floors γがんまb = 0.01 ∀b.

Standard image High-resolution image

5.3.1. Zero-point Offsets

The inferred zero-points {ωおめがb } are at the level of a few percent in most bands. They are well-constrained by the data and robust to changes in the model. The biggest changes are shifts in the intermediate bands caused by emission-line offsets αあるふぁi , as shown in Figure 5. These results are qualitatively consistent with the Eazy zero-points released with COSMOS2020. This is not a surprise given that we use the same set of bands and that Eazy also involves a grid of SPS templates. Some possible explanations for the disparities are differences in the SPS modeling, the approximation resulting from using a grid as opposed to a continuous model, or even the fact that Eazy calibrates the zero-points based on the whole COSMOS2020 data set (not only objects with spectroscopic redshifts).

5.3.2. Examination of Photometric Uncertainties

Figure 5 shows that the inferred values of flux uncertainties γがんまb are at the level of a few percent in most bands: smaller for the broad bands (which are typically deeper), and larger for intermediate and narrow bands (noisier and also more sensitive to the details of the SED modeling, such as emission lines). The u and ch2 bands are the most noisy, consistent with previous findings (Laigle et al. 2016; Weaver et al. 2022). Just as for zero-points, the inferred values are mostly stable when switching other hyperparameters on and off. However, there is no uncertainty contributed by emission lines in this model (bottom right panel), indicating that the band-to-band uncertainty is preferred by the data. If we set the band-to-band uncertainty to 1% (green diamonds), then emission lines contribute to the uncertainty budget (via non-zero βべーた) to compensate, as expected. Thus, we conclude that the noise model performs well, but that emission-line uncertainty is simply not required in our full model for this data set.

5.3.3. Emission-line Offsets

We find that including extra contributions from emission lines significantly improves the quality of the derived photo-z's. This is consistent with previous findings by Ilbert et al. (2006, 2008), Brammer et al. (2008) and Alarcon et al. (2021). Note that in Figure 5 emission lines are ordered by their contribution relative to the total broadband flux, following the procedure described above. Offsets are primarily given to the strongest lines, with some of them being almost entirely canceled (αあるふぁ ≈ −1). The Laplace prior is effective at keeping most line offsets at zero, unless they bring a significant improvement to the fits. We also note that the interpretation of the hyperparameters is conditioned on the sample at hand: changing the subset of COSMOS2020 galaxies analyzed may change the resulting values of zero-points and emission-line offsets, for example.

5.4. Population Model

We perform one additional run, also optimizing the hyperparameters of the population model, κかっぱ . We find that the data do not significantly perturb the fiducial values originally adopted; nor do these parameters appear degenerate with other parameters of the model. This demonstrates the robustness of the model to the prior and additionally shows that varying these hyperparameters does not lead to significantly improved fits. However, it should be noted that we have not included selection effects in our model. If we had found that the inferred values had significantly moved from the fiducial values, we would have been unable to determine whether this had been caused by incomplete modeling of the selection process (Alsing et al. 2022). However in this setting, as we have previously argued, the selection effects should be negligible so this concern does not arise.

5.5. Residuals

Finally, we check the ability of the presented model to explain the statistical properties of the data. We examine the distribution of the flux residuals, defined as

Equation (8)

These are shown in Figure 6. The blue lines show our likelihood function, the Student's-t distribution described in Equation (4). There is good agreement between the chosen likelihood and the residuals. We have checked that this agreement significantly deteriorates when all of the hyperparameters are not optimized (not shown here). The deviations in the u and ch2 bands are likely to be due to residual data systematics which are not well described by our model.

Figure 6.

Figure 6. Flux residuals (black), as defined in Equation (8), compared with the likelihood function of Equation (4) in blue, showing a satisfactory agreement between the data and the model predictions.

Standard image High-resolution image

When repeating this analysis with a normal likelihood function, we find that the residuals are worse and provide a poor fit of the likelihood function, despite optimizing the other components of the model. This setting also results in worse redshift estimates. This demonstrates the presence of outliers and shows that accommodating them with a suitable likelihood function can significantly improve the ability of a model to fit complicated data.

6. Discussion

Our results demonstrate that the hierarchical SPS inference framework presented here (and in Alsing et al. 2022) delivers accurate redshifts on a benchmark data set, therefore passing a stringent validation test. We now discuss possible extensions which relax some of its assumptions.

The model itself could be made more sophisticated in various ways in order to better suit specific applications and increase the accuracy of the photo-z's further. Such extensions include a stellar model, which could straightforwardly be incorporated to allow star-galaxy separation, akin to what is done in Eazy and LePhare with stellar templates.

In the future, it will be critical to include effects which are currently neglected, but which affect the data (and in turn, photometric redshifts) at significant levels for LSST, such as airmass-dependent fluxes (Graham et al. 2018). Furthermore, extending our model to include spatial information (Sánchez & Bernstein 2018; Jasche & Lavaux 2019; Alarcon et al. 2020; Sánchez et al. 2020) could provide a powerful probe of the connection between galaxies and dark matter and exploit more of the information offered by photometric surveys.

Our chosen optimization strategy, paired with the posterior distribution taken as a loss function, minimizes the residuals between the measured and the model fluxes. Other choices may be possible, especially when including spectroscopic redshifts. For example, one could explicitly minimize the residuals between photometric (inferred) and spectroscopic redshifts, although this will render the results more sensitive to selection effects in the data.

We neglected the uncertainties in hyperparameters and SPS parameters. This is because the extensive wavelength coverage of the 26-band COSMOS2020 data, the small flux uncertainties, and the availability of spectroscopic redshifts combine to yield well-constrained galaxy SEDs and in turn hyperparameters. Nevertheless, there are fundamental degeneracies expected between some of the SPS parameters (Leja et al. 2017, 2019), as well as with the redshift when it is not fixed to a spectroscopic value. Selection effects may also be important, and can be accounted for in the posterior distributions as described in Alsing et al. (2022). In analyzing data sets where the uncertainties in hyperparameters or SPS parameters (including redshift) are not negligible (e.g., broadband-only surveys like the LSST), one can resort to the variety of Bayesian inference techniques (i.e., MCMC, variational inference, Laplace approximation, simulation-based inference) available for hierarchical models. Since the impact of selection cuts and parameter uncertainties is strongly dependent on the specific data and models being considered, we defer further discussion and the development of effective inference strategies to future work.

7. Conclusion

We have presented a hierarchical model to infer redshift and other intrinsic galaxy properties from photometric data. This approach makes it possible to self-consistently encode knowledge of (1) galaxy formation and evolution via a population model, (2) astrophysics via SEDs predicted with stellar population synthesis, and (3) observational effects via a noise model and a likelihood function. By formulating the inference within a Bayesian hierarchical model, we are able to parameterize additional sources of bias and uncertainty in each of these components and to solve for them self-consistently. Thus, any manual tuning or inversion methods can be avoided, with the added benefit that it is possible to set informative priors on these additional terms.

Our photo-z's (as measured with traditional metrics on redshift point estimates) are competitive with the publicly released photo-z's from COSMOS2020 (Weaver et al. 2022). While this is a benchmark test on a relatively bright, low-redshift sample of spectroscopically confirmed galaxies, these results demonstrate the power and flexibility of our methodology and motivate its further development toward addressing the challenging setting presented by stage IV cosmological surveys. We made our inference tractable using neural emulators to accelerate model calls (Alsing et al. 2020).

In future applications of our framework, we will relax some of the modeling assumptions made here. In particular, we demonstrate how to incorporate selection in the companion paper Alsing et al. (2022). Comparing inferences from multiple SPS and population models will also provide valuable consistency checks. In fact, refining the SED and population models directly from the upcoming survey data is a promising avenue for directly probing the underlying (astro)physics using large amounts of untapped statistical power. Leveraging hybrid data sets and incorporating additional data available (e.g., spec-z's, or even full spectra) will be instrumental for correctly separating the different physical effects at play. This is straightforwardly included in hierarchical models (see, e.g., Alarcon et al. 2021; Nagaraj et al. 2022). Together, these attributes make the methodology presented here well-suited for addressing the challenges of upcoming photometric surveys and the robust scientific exploitation of their data.

We thank George Efstathiou for valuable input during the course of this project, and John R. Weaver for assistance with the COSMOS2020 data. We also thank Konrad Kuijken, Hendrik Hildebrandt, Angus Wright, and Will Hartley for useful discussions. B.L. is supported by the Royal Society through a University Research Fellowship. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 101018897 CosmicExplorer). This work has also been enabled by support from the research project grant Understanding the Dynamic Universe' funded by the Knut and Alice Wallenberg Foundation under Dnr KAW 2018.0067. J.A., H.V.P. and D.J.M. were partially supported by the research project grant "Fundamental Physics from Cosmological Surveys" funded by the Swedish Research Council (VR) under Dnr 2017-04212. The work of H.V.P. was additionally supported by the Göran Gustafsson Foundation for Research in Natural Sciences and Medicine. H.V.P. and D.J.M. acknowledge the hospitality of the Aspen Center for Physics, which is supported by National Science Foundation grant PHY-1607611. The participation of H.V.P. and D.J.M. at the Aspen Center for Physics was supported by the Simons Foundation.

Author Contributions

B.L.: conceptualization, methodology, software, validation, formal analysis, writing—original draft. J.A.: conceptualization, methodology, software, validation, writing—review and editing. H.V.P.: conceptualization, methodology, validation, writing—review and editing, funding acquisition. D.M.: conceptualization, methodology, validation, writing—review and editing, funding acquisition. J.L.: conceptualization, methodology, validation, writing—review and editing.

Footnotes

  • 8  

    A third technique, clustering redshifts, can constrain redshift distributions, but cannot deliver individual photo-z's, see, e.g., Schneider et al. (2006), Newman (2008), Ménard et al. (2013), McQuinn & White (2013), Schmidt (2013), Morrison et al. (2017), Gatti et al. (2021).

  • 9  

    Since we apply normal priors to parameters only defined in finite ranges, a renormalization is necessary, giving rise to truncated normals, as described in Table 1. For simplicity we omit this detail throughout this section.

  • 10  

    The density of the scaled and translated Student's-t distribution is

    with t = (xl)/s as the residuals and Γがんま( · ) as the Gamma function. l and s are often referred to as location and scale parameters and νにゅー as the number of degrees of freedom.

  • 11  

    Note that emulators are not the only approach to accelerating inference. For example, Hahn & Melchior (2022) and Ramachandra et al. (2022) showed how to approximate posterior distributions directly for the estimation of SPS parameters or redshifts from broadband photometry.

  • 12  

    For reference, a standard FSPS spectrum with the default MILES spectral library has ∼6 × 103 wavelength points.

  • 13  
Please wait… references are loading.
10.3847/1538-4365/ac9d99