Abstract
This paper presents the nonhomogeneous Poisson process (NHPP) for modeling the rate of fast radio bursts (FRBs) and other infrequently observed astronomical events. The NHPP, well-known in statistics, can model the dependence of the rate on both astronomical features and the details of an observing campaign. This is particularly helpful for rare events like FRBs because the NHPP can combine information across surveys, making the most of all available information. The goal of the paper is two-fold. First, it is intended to be a tutorial on the use of the NHPP. Second, we build an NHPP model that incorporates beam patterns and a power-law flux distribution for the rate of FRBs. Using information from 12 surveys including 15 detections, we find an all-sky FRB rate of 587 events per sky per day above a flux of 1 Jy (95% CI: 272, 924) and a flux power-law index of 0.91 (95% CI: 0.57, 1.25). Our rate is lower than other published rates, but consistent with the rate given in Champion et al.
1. Introduction
Rates for astronomical events are typically computed using only some of the available information. When some part of the observed parameter space (e.g., sensitivity) is non-ideal, astronomers cut that space out prior to estimating a rate (e.g., "completeness limit"). This simplifies the analysis, but is dangerous when events are observed below the conservative threshold. Many compelling and active fields of astronomical research are subject to strong observational bias and low numbers of events, such as exoplanets (Foreman-Mackey et al. 2014), gravitational waves (Abbott et al. 2016), and near-Earth asteroids (Stuart & Binzel 2004).
The study of fast radio bursts (FRBs) is another good example. See, for example, Lorimer et al. (2007), Thornton et al. (2013), or Burke-Spolaor & Bannister (2014). To date, 17 FRBs, discovered in millisecond radio transient surveys, are listed in the FRB Catalog of Petroff et al. (2016). In many of the detection papers, the rate is estimated by scaling the number of observed FRB detections by the product of the instrument observing area and the time-on-sky. This rate is generally assumed to be a cumulative rate for all events above a certain flux chosen to be the minimum detectable flux at the full-width-at-half-maximum (FWHM) sensitivity for the instrument. Measured flux values do not otherwise affect the analysis. However, the actual flux values contain information about how the rate changes as a function of flux. Oppermann et al. (2016) is an exception that does use the flux values. These studies typically, but not always, make other assumptions that are potentially detrimental. The telescope sensitivity is often modeled as a flat function covering the FWHM region of the beam (Deneva et al. 2009 is an exception). They also tend to include only those surveys in which FRBs are detected.
The approach based only on counts, correct but incomplete, can be limiting. This is especially true for FRB rate calculations because few have been observed thus far and we want to use all available information in the rate computations. For example, we know that radio telescopes have a sensitivity pattern (the "beam") that drops with distance from the center, so there is more observing area at lower sensitivity and recorded fluxes are likely to be lower than the true flux that arrives at Earth. Furthermore, since most of the universe is far away, we expect to see a much higher number of faint FRBs. The first effect reduces the flux that we record, while the second reduces the actual flux that reaches Earth. Both effects lead us to expect more events at one flux than we would at a higher flux.
The paper presents the use of the nonhomogeneous Poisson process (NHPP) for computing the rate of FRBs. The approach, frequently used in statistical modeling, will address the issues raised above and could be extended to address other issues that ultimately cause the probability of detection to vary. The method is applicable to event searches of any kind. The paper has two goals. First, it is intended to be a tutorial in the use of the nonhomogeneous Poisson process, which we expect to be widely applicable in astronomy. Second, we aim to produce an estimate for the rate of FRBs that combines data across several surveys. Section 2 describes the general NHPP model and the specifics of our approach for FRB surveys. Section 3 describes the results of the fit to the FRB survey data. Finally, Section 4 presents some discussion on the model and what can be added in a more sophisticated analysis.
2. The Nonhomogeneous Poisson Process Model
A homogeneous Poisson process is a counting process that says that (1) the number of events that occur in an interval (e.g., of time or space or some other domain) follows a Poisson distribution whose mean is proportional to the size of that interval (e.g., length in 1D, area in 2D, etc.) and that (2) the number of events in two disjoint intervals are independent of each other. Thus, for a region of the sky, a telescope's field of view, for example, with area A, we might think that the number of FRBs occurring in this region over an interval of time T is described by the Poisson distribution:

$$P(N = n) = \frac{(\lambda A T)^{n}\, e^{-\lambda A T}}{n!}, \qquad n = 0, 1, 2, \ldots,$$

where the parameter $\lambda$ is the constant rate of events per unit area per unit time.
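As a quick numerical check, the Poisson distribution above can be evaluated directly. The sketch below (with a made-up value for the product of rate, area, and time) verifies that the probabilities sum to one and that the distribution's mean equals that product:

```python
import math

def poisson_pmf(n, mu):
    """P(N = n) for a Poisson random variable with mean mu = rate * area * time."""
    return mu**n * math.exp(-mu) / math.factorial(n)

mu = 3.7  # hypothetical value of lambda * A * T
probs = [poisson_pmf(n, mu) for n in range(100)]
print(sum(probs))                               # ~1 (probabilities sum to one)
print(sum(n * p for n, p in enumerate(probs)))  # ~mu (the Poisson mean)
```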
The nonhomogeneous Poisson model assumes that the instantaneous rate may vary over the space of observations. For example, in practice, a radio telescope has varying sensitivity over its field of view. Assuming a radially symmetric sensitivity, the constant rate $\lambda$ in the above example will need to be replaced by $\lambda(r)$, which varies with the radius from the center of the beam. The term $\lambda A T$ would be replaced by an integral of the rate over the desired region. We will flesh out the detail below when we consider a rate that varies with both the radius and the flux measurement itself. For more details on Poisson processes, homogeneous and otherwise, see Kingman (1993) or Ross et al. (1996).
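To build intuition for the nonhomogeneous case, the following sketch simulates event times from a one-dimensional NHPP by thinning: candidates are drawn from a homogeneous process at the maximum rate, then kept with probability proportional to the local rate. The decaying rate function and all numeric values are illustrative only, not taken from the FRB model:

```python
import math
import random

def simulate_nhpp_1d(rate, rate_max, t_max, rng):
    """Simulate event times of a 1D NHPP on [0, t_max] by thinning:
    draw candidates from a homogeneous process at rate_max, then keep
    each candidate with probability rate(t) / rate_max."""
    events, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)        # next candidate arrival
        if t > t_max:
            return events
        if rng.random() < rate(t) / rate_max:
            events.append(t)

rng = random.Random(42)
rate = lambda t: 5.0 * math.exp(-t / 4.0)     # hypothetical decaying rate
# Expected count is the integral of the rate over [0, 10]
expected = 5.0 * 4.0 * (1.0 - math.exp(-10.0 / 4.0))
counts = [len(simulate_nhpp_1d(rate, 5.0, 10.0, rng)) for _ in range(2000)]
print(sum(counts) / len(counts))              # should be close to `expected`
```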
We assume that observations are made at a mix of single-dish radio telescopes (e.g., Parkes) and radio interferometers (e.g., VLA). We begin with a distinction between incident and observed flux. The incident flux is the flux that arrives at the telescope and that would be measured if the event were detected at the center of the beam. Let $\Lambda(s_I)$ be the cumulative rate of events with incident flux, that is, the flux that is independent of attenuation from the observing device, greater than $s_I$. Assuming a uniform spatial distribution of possible events, we know that this should follow a power law with an exponent of −3/2. We will consider a general power-law form:
$$\Lambda(s_I) = a\, s_I^{-b},$$

where $b = 3/2$ is the so-called Euclidean scaling that is valid for any population with a fixed luminosity distribution, as long as the luminosity does not evolve with redshift and the population has a uniform spatial distribution (see Oppermann et al. 2016). Let $\lambda(s_I)$ be the instantaneous rate for events with incident flux $s_I$, which is given by the negative of the derivative of $\Lambda(s_I)$ with respect to $s_I$:

$$\lambda(s_I) = a b\, s_I^{-b-1}.$$
Our actual measurements are also affected by the telescope or array that we use. For simplicity, we will assume that the beam attenuation is radially symmetric and given by a function p(r). The flux we measure, $s_O$, at some radius r from the center of the beam is the product of the incident flux and the attenuation function, $s_O = s_I\, p(r)$. We can make substitutions to determine the instantaneous rate for observed flux of value $s_O$ at radius r:

$$\lambda(s_O, r)\, ds_O\, 2\pi r\, dr\, dt = a b\, s_O^{-b-1}\, p(r)^{b}\, ds_O\, 2\pi r\, dr\, dt.$$
The term $ds_O / p(r)$ replaces the $ds_I$ term as part of the change of variables and $2\pi r\, dr$ is the infinitesimal observing area of the beam at radius r. The dt acknowledges that the process has a time dimension, although the rate is constant in time. (FRB 121102 is perhaps an exception as it has been observed to repeat.) For simplicity, we will assume a Gaussian beam shape for all telescopes, $p(r) = \exp(-r^2 \ln 2 / R^2)$, where R is the half-max radius for the beam. This gives a rate function of

$$\lambda(s_O, r) = a b\, s_O^{-b-1} \exp(-b\, r^2 \ln 2 / R^2).$$
Beams with sidelobes are considered in Section 4. For the beams of a single-dish telescope, the radius is not observed. For these devices, we need to integrate this function over the radius to produce the relevant rate function:

$$\lambda_O(s_O) = \int_0^\infty a b\, s_O^{-b-1} \exp(-b\, r^2 \ln 2 / R^2)\, 2\pi r\, dr = a b\, s_O^{-b-1}\, \frac{\pi R^2}{b \ln 2}.$$
The term $\pi R^2 / (b \ln 2)$ can be interpreted as the effective area of the receiver.
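The closed-form effective area for a Gaussian beam, $\pi R^2/(b \ln 2)$, can be checked by direct numerical integration of the attenuated beam profile. This sketch uses illustrative values of R and b:

```python
import math

def effective_area_numeric(R, b, r_max_factor=20.0, n=200000):
    """Numerically integrate  p(r)^b * 2*pi*r  over r with the trapezoid
    rule, for the Gaussian beam p(r) = exp(-r^2 ln2 / R^2)."""
    r_max = r_max_factor * R       # far enough out that the tail is negligible
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        f = math.exp(-b * r * r * math.log(2) / R**2) * 2.0 * math.pi * r
        total += f if 0 < i < n else 0.5 * f
    return total * h

R, b = 7.0, 1.5                    # half-max radius (arcmin), Euclidean index
closed_form = math.pi * R**2 / (b * math.log(2))
numeric = effective_area_numeric(R, b)
print(closed_form, numeric)        # the two should agree closely
```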
We will take a maximum likelihood approach to estimating the model. In brief, we will write down the probability for the model, plug in the observed data, and then maximize this equation over the unknown parameters. The probability of observing n events with fluxes $s_1, \ldots, s_n$ over T hours at a single-dish telescope with minimum detectable flux $s_{\min}$ is

$$P = \Big[ \prod_{j=1}^{n} \lambda_O(s_j)\, T \Big]\, e^{-\Lambda_O(s_{\min})\, T},$$

where $\Lambda_O(s) = a\, s^{-b}\, \pi R^2 / (b \ln 2)$ is the cumulative rate for observed fluxes greater than $s$.
Analysis is done on the log scale and the important quantity is the log-likelihood. Consider the log-likelihood component for survey i with $n_i$ flux measurements $s_{i1}, \ldots, s_{i n_i}$:

$$\ell_i(a, b) = \sum_{j=1}^{n_i} \log\big[\lambda_O(s_{ij})\, T_i\big] - \Lambda_O(s_{\min,i})\, T_i$$

for single-dish observations and

$$\ell_i(a, b) = \sum_{j=1}^{n_i} \log\big[\lambda(s_{ij}, r_{ij})\, T_i\big] - \Lambda_O(s_{\min,i})\, T_i$$

for interferometer observations with radius measurements $r_{i1}, \ldots, r_{i n_i}$. The log-likelihood over all surveys is simply the sum of these:

$$\ell(a, b) = \sum_i \ell_i(a, b).$$
This function is maximized over the unknown parameters to obtain estimates for a and b. This can be done with a gradient-based numerical optimization scheme. The covariance of these parameters is estimated using the second derivative matrix (the Hessian) of this function: $\mathrm{Cov}(\hat{a}, \hat{b}) \approx [-\nabla^2 \ell(\hat{a}, \hat{b})]^{-1}$. See Theorem 5.1 of Lehmann & Casella (1998) for details of the asymptotic normality of these estimates and the rest of the book for background on the theory of maximum likelihood. Details of these computations are provided in Appendix A.
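To make the likelihood machinery concrete, consider a toy single-dish case with threshold 1 Jy, where (with this threshold) the maximum likelihood estimates have closed forms: a Pareto-type estimate for b and a count-matching estimate for a. The survey values below are hypothetical, not from Table 1, and the sketch verifies that the closed forms beat nearby parameter values:

```python
import math

def log_lik(a, b, fluxes, T, R, s_min=1.0):
    """Single-dish NHPP log-likelihood: sum of log event-rate terms minus
    the expected count  a * s_min**(-b) * pi*R^2/(b ln2) * T."""
    area = math.pi * R**2 / (b * math.log(2))
    ll = -a * s_min**(-b) * area * T
    for s in fluxes:
        ll += math.log(a * b * s**(-b - 1) * area * T)
    return ll

# Hypothetical survey: fluxes (Jy) above a 1 Jy threshold, hours, half-max radius
fluxes = [1.3, 2.1, 1.1, 4.0, 1.6]
T, R = 5000.0, 7.0

# With s_min = 1 Jy, setting the gradient to zero gives closed-form MLEs:
n = len(fluxes)
b_hat = n / sum(math.log(s) for s in fluxes)
a_hat = n * b_hat * math.log(2) / (math.pi * R**2 * T)

# Sanity check: the closed form should beat nearby parameter values
best = log_lik(a_hat, b_hat, fluxes, T, R)
for da, db in [(1.01, 1.0), (0.99, 1.0), (1.0, 1.01), (1.0, 0.99)]:
    assert best >= log_lik(a_hat * da, b_hat * db, fluxes, T, R)
print(a_hat, b_hat)
```

In the real analysis the thresholds differ across surveys and no closed form exists, so the full log-likelihood is maximized numerically as described above.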
3. Results for FRB Surveys
Our data are given in Table 1. The data come from 12 surveys using four telescopes and recording a total of 15 FRBs. This list does not include all surveys or detections, but only those for which we could determine the relevant observing characteristics. The list does include a variety of surveys, including some with zero detections. It also includes nearly 90% of the published FRB detections. The table lists the number of FRBs, the observed fluxes, the minimum detectable flux, the time-on-sky, and the full width at half maximum for each instrument. The time-on-sky includes the number of beams for the single-dish instruments (for example, if the Parkes telescope observed for x hours, the time listed here would be 13x to account for the 13 beams). For simplicity, the beams are assumed to not overlap. We have focused the work on flux thresholds instead of fluence. Where necessary, we follow Law et al. (2015) and Burke-Spolaor et al. (2016) by assuming a pulse width of 3 ms based on the average value for the FRBs detected to date.
Table 1. Survey Data for Several Published FRB Searches with 15 Total Detections
Survey Ref. | n | Flux si (Jy) | Sensitivity (Jy) | Time Ti (beams*hr) | FWHM (arcmin) |
---|---|---|---|---|---|
Lorimer et al. (2007) | 1 | 30 | 0.295 | 6376.4 | 14 |
Deneva et al. (2009) | 0 | NA | 0.1825 | 3213.7 | 4 |
Keane et al. (2010) | 0 | NA | 0.2235 | 20247.5 | 14 |
Siemion et al. (2011) | 0 | NA | 59 | 4074 | 150 |
Burgay et al. (2013) | 0 | NA | 0.2645 | 6923.8 | 14 |
Petroff et al. (2014) | 0 | NA | 0.434 | 12038 | 14 |
Spitler et al. (2014) | 1 | 0.4 | 0.105 | 80523.1 | 4 |
Burke-Spolaor & Bannister (2014) | 1 | 0.3 | 0.3075 | 11924.9 | 14 |
Ravi et al. (2015) | 1 | 2.1 | 0.2775 | 889.2 | 14 |
Petroff et al. (2015) | 1 | 0.47 | 0.2775 | 1111.5 | 14 |
Law et al. (2015) | 0 | NA | 0.12 | 166 | 30 |
Champion et al. (2016) | 10 | 1.3, 0.4, 0.5, 0.5, 0.87, 0.42, 0.69, 1.67, 0.18, 2.2 | 0.28 | 36224.5 | 14 |
Note. Except for the information for Law et al. (2015), the values for n, sensitivity, and time are derived from Burke-Spolaor et al. (2016), which assumes a pulse width of 3 ms. Flux values are taken directly from each reference. The flux for Ravi et al. (2015) was read from Figure 3 of that reference. The first four fluxes for Champion et al. (2016) are taken from Thornton et al. (2013). The last five were read from Figure 1 of the reference. Note that the FRB catalog includes two other detections, but the survey characteristics were difficult to obtain. The time-on-sky includes the number of beams for the single-dish instruments (for example, if the Parkes telescope observed for x hours, the time listed here would be 13x to account for the 13 beams).
The observed data are plugged into the log-likelihood described above. Note that the only interferometer survey, Law et al. (2015), has no detections, so the instantaneous rate including a radius is not used here (the summation in Equation (10) is not used). The likelihood is maximized at the values a = 586.88 events per sky per day above a flux of 1 Jy and power-law index b = 0.91. The 95% confidence intervals are (271.86, 923.72) for a and (0.57, 1.25) for b. The latter is notable, as it excludes the Euclidean scaling value of 1.5. However, our confidence interval for the power-law index has substantial overlap with the 95% confidence interval reported by Oppermann et al. (2016), (0.8, 1.7), and the estimate reported by Caleb et al. (2016), 0.9 ± 0.3. Li et al. (2017) parameterize the model differently and their estimates of the power-law index are generally even lower, with some of their intervals (see Figure 4 of that paper) overlapping ours and others excluding even a value of 1.
These estimates give distinctly lower rates than other commonly cited results. Champion et al. (2016) give a "fluence-complete rate" of 2500 FRBs per sky per day above about 2 Jy ms, or above 0.67 Jy assuming a 3 ms pulse. Their 95% confidence interval is 900–5700 FRBs per sky per day. The corresponding rate from our model is only about 849.29 FRBs per sky per day with a 95% confidence interval between about 508.86 and 1417.47 FRBs per sky per day. These intervals overlap, indicating some agreement. Rane et al. (2016) use information from a collection of Parkes surveys and report a rate of 4400 FRBs per sky per day above 4.0 Jy ms, or 1.33 Jy assuming a 3 ms pulse. They report a 99% confidence interval of 1300–9600 FRBs per sky per day. Our corresponding estimate is about 451.51 FRBs per sky per day with a 99% confidence interval between about 225.21 and 905.20 FRBs per sky per day. These intervals do not overlap, suggesting a significant difference. These differences arise, at least in part, because our approach has included more surveys from a variety of telescopes, with more zero-detection results. Further, the methods are quite different, with our approach attempting to leverage the information contained in the actual flux measurements.
Figure 1 shows the fit in terms of the cumulative rate as a function of incident flux, along with individual point estimates and uncertainties from each survey. The solid black line shows the maximum likelihood estimate and the dashed black lines show the 95% confidence interval for the fit (see Appendix B).
The individual survey results are plotted at their FWHM sensitivity values, as is commonly done. Their rate values are computed as described in Section 1, with the observing area assumed, as is commonly done, to be the area of the circle with diameter given by the full width at half maximum. This approach can be somewhat problematic, however. Equation (6) indicates that this area needs to be divided by the scaling term $b \ln 2$. If b takes the assumed value of 3/2, the scaling term is about 1.04 and the simple area calculation slightly overestimates the true effective area, so estimated rates are not much affected. If we use the maximum likelihood estimate of b = 0.91, the scaling term is 0.63 and the simple area calculation underestimates the effective area by about 60%, and therefore overestimates the effective rate by about 60%.
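The scaling arithmetic can be reproduced in a few lines:

```python
import math

# Scaling term b*ln2 relating the FWHM-circle area to the effective area
scale_euclid = 1.5 * math.log(2)    # Euclidean index
scale_fit = 0.91 * math.log(2)      # maximum likelihood index
print(f"b = 1.5:  scaling term = {scale_euclid:.2f}")   # ~1.04
print(f"b = 0.91: scaling term = {scale_fit:.2f}")      # ~0.63

# With b = 0.91 the true effective area exceeds the FWHM circle:
ratio = 1.0 / scale_fit
print(f"effective area / FWHM area = {ratio:.2f}")      # ~1.59, i.e. ~60% larger
```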
Figure 1 highlights the advantage of using the flux values. A crude estimate of the power-law model could be determined using the traditionally calculated individual rates by simply drawing a line through the various points. However, this approach is difficult, at least in part, because most of the surveys have similar flux thresholds. This makes it difficult to constrain the parameters, particularly the power-law index value. It would also create an awkward incentive to obtain detections using less sensitive surveys (those with higher thresholds) in order to constrain the index. The NHPP approach uses the flux values themselves to estimate the power-law model directly.
To check the quality of the model fit, we compare the fitted survival function with the empirical survival function in Figure 2. In statistics, the survival function refers to one minus the cumulative distribution function. As the cumulative distribution function gives the probability of observing random variables less than or equal to a given value, the survival function gives the probability of observing random variables greater than a given value. This makes particular sense in this context, since we are often discussing fluxes that are greater than or equal to a particular value. The unusual name arises from its use in reliability and medical studies, where it gives the probability of a part or patient surviving at least a certain period of time. In this case, the survival function gives the probability of observing an FRB with flux greater than s; the x-axis of Figure 2 is flux and the y-axis is this probability. As shown below, the function depends on b, but not on a. No detections currently come from interferometers, but if some did, we recommend treating the position as unknown and using the observed flux with the single-dish rate function for this model check. Ultimately, the comparison presented here is similar to the quantile-quantile plot commonly used in statistics to visually check a distributional assumption.
To derive this function, consider the Poisson process over the combination of all of the surveys. The rate function for this process is just the sum of the individual rate functions. Given that an FRB is detected in a survey, its flux value has a probability density proportional to the sum of the rate functions. The normalizing constant for this density is just the sum of the cumulative rate functions evaluated at the minimum sensitivity. To find the survival probability at a flux s, we just integrate each rate function from s to infinity. Thus, the survival function is given by

$$S(s) = \frac{\sum_i \Lambda_{O,i}\big(\max(s, s_{\min,i})\big)\, T_i}{\sum_i \Lambda_{O,i}(s_{\min,i})\, T_i},$$

where i indexes the surveys, $\Lambda_{O,i}$ is the cumulative rate function for observed flux for survey i, and $s_{\min,i}$ is the minimum observable flux for survey i. The term $\max(s, s_{\min,i})$ arises because survey i can only see FRBs with flux greater than s when $s > s_{\min,i}$. As a function of s, each term in the numerator is flat until s exceeds $s_{\min,i}$. As a result, the function's behavior can change sharply at these points. This behavior can be seen as kinks in the curves in Figure 2. The short dashes above the x-axis indicate the minimum detectable flux $s_{\min,i}$ for each survey. The sharp behavior is most easily noticed through the last of the dashed lines below one and the dashed line at s = 59.
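The survival function is straightforward to compute. This sketch uses the Gaussian-beam cumulative rate with an assumed index b and a few illustrative survey tuples of (threshold, half-max radius, time); the overall rate constant a cancels in the ratio:

```python
import math

def survival(s, surveys, b=0.91):
    """Mixture survival function over surveys: each survey contributes its
    cumulative observed-flux rate, frozen at its threshold when s is below
    it (the survey cannot see fainter events).
    surveys: list of (s_min, R, T) = threshold (Jy), half-max radius, hours."""
    def lam(s_eval, R, T):
        # cumulative rate (up to the constant a, which cancels)
        return s_eval**(-b) * math.pi * R**2 / (b * math.log(2)) * T
    num = sum(lam(max(s, s_min), R, T) for s_min, R, T in surveys)
    den = sum(lam(s_min, R, T) for s_min, R, T in surveys)
    return num / den

# Illustrative tuples loosely patterned on Table 1 (not exact inputs)
surveys = [(0.295, 7.0, 6376.4), (0.105, 2.0, 80523.1), (59.0, 75.0, 4074.0)]
print(survival(0.05, surveys))    # equals 1 below every threshold
print(survival(1.0, surveys), survival(100.0, surveys))  # decreasing in s
```

The kink behavior described above appears where s crosses one of the survey thresholds, here most visibly at s = 59.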
Figure 2 shows the survival function for the estimated model. The solid black line shows the function computed using the maximum likelihood estimate (a = 586.88, b = 0.91) and the dashed black lines show the 95% confidence interval on this function computed using a parametric bootstrap (see Appendix B).
The data in Table 1 and software used to make the FRB rate estimate are available at https://github.com/lanl/NHPP-for-FRBs.
4. Discussion
The NHPP model provides an approach that can use all of the available information from astronomical events by giving probabilities for both the number of events detected in a survey and the measured flux values of those events. This is especially important when events, like FRB detections, are infrequently observed across a number of surveys; the model provides a natural solution to combining all of the available data across multiple single-dish and interferometer surveys. This approach avoids the need to make conservative choices about sensitivity (e.g., computing the sensitivity and field of view for the FWHM) that throw away information. Using the NHPP model, we estimate a rate of 586.88 events per sky per day above a flux of 1 Jy. This rate is significantly lower than those described in other works for two reasons. First, other rate estimates have not considered how non-Euclidean brightness distributions can change the effective volume probed. In the best-fit model discussed here, this increases the effective volume by 60%, which reduces the rate by a similar factor. Second, this rate estimate includes surveys from all points on the sky and does not attempt to model Galactic latitude dependence (discussed below).
One important use for this model is survey planning. We can estimate the expected number of hours until a detection using Equation (6). In the case of the VLA, this equation still applies assuming a uniform distribution of events over the beam. We integrate this equation over $s_O$ to get the cumulative rate

$$\Lambda_O(s_{\min}) = a\, s_{\min}^{-b}\, \frac{\pi R^2}{b \ln 2},$$
and plug in the minimum sensitivity value to get a Poisson rate for a particular telescope. We can incorporate all sources of uncertainty by sampling a and b using the parametric bootstrap to get a rate and then drawing an exponential random variable using the resulting rate. Assuming the sensitivity and FWHM values used for Spitler et al. (2014), Petroff et al. (2015), and Law et al. (2015), we can get estimates for the time-to-first detection for Arecibo, Parkes, and the VLA. The mean time-to-first detection for these instruments is about 5989, 650, and 861 hours, respectively. In the case of Parkes, this value, multiplied by 13 beams to give 8450 beam hours, is reasonably close to the total number of beam hours divided by the total number of detections, 6838 beam hours per event, across all surveys in our data set. The 95% upper bounds on the time-to-first detection are 18375, 2017, and 2693 hr for Arecibo, Parkes, and the VLA, respectively.
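The time-to-first-detection logic can be sketched as follows. The values of a, the threshold, and the beam radius below are illustrative placeholders, not the fitted inputs used in the paper, and the sketch checks the exponential waiting time against its analytic mean:

```python
import math
import random

def cumulative_rate(s_min, R, a, b):
    """Cumulative observed-flux rate  a * s_min**(-b) * pi*R^2/(b ln2),
    in events per hour if a is expressed per unit area per hour."""
    return a * s_min**(-b) * math.pi * R**2 / (b * math.log(2))

# Hypothetical inputs: a in events per arcmin^2 per hour, threshold in Jy,
# R the half-max radius in arcmin
a, b = 2.0e-7, 0.91
rate = cumulative_rate(0.105, 2.0, a, b)   # Poisson rate for one pointing
mean_wait = 1.0 / rate                     # expected hours to first event

# Monte Carlo check: exponential waiting times with this rate
rng = random.Random(0)
draws = [rng.expovariate(rate) for _ in range(20000)]
print(mean_wait, sum(draws) / len(draws))  # the two should be close
```

In the full procedure, each bootstrap sample of (a, b) yields its own rate and an exponential draw, which propagates parameter uncertainty into the waiting-time distribution.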
The method presented here can be extended to include anything that systematically varies the sensitivity of a detection including dispersion measure scalloping, unknown pulse widths, or galactic position. The method can also be extended to include differences in sensitivities and effective areas from different pointings within the same survey. The approach can also be extended to a slightly generalized probability model. The negative binomial model is also a counting model, which can be derived from the Poisson model by letting the rate follow a gamma distribution. This introduces extra variation that can be used to account for effects that are not included in the rate model.
An important extension could be the inclusion of more precise beam shapes. To improve incrementally on commonly assumed flat beams, we have assumed radially symmetric Gaussian beams that do not overlap in the case of the single-dish telescopes. Although uncertainty quantification and model checking are more difficult, the approach can be extended to include numerical integration of precise descriptions of the beam, including two-dimensional descriptions of the multi-beam single-dish telescopes that properly include beam overlap. Figure 3 shows the results from a limited exploration of the beam shape. Here, the sensitivity is based on the square of a sinc function, $p(r) = [\sin(x)/x]^2$, where $x$ is proportional to r and scaled so that $p(R) = 1/2$ at the half-max radius R.
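A numerical comparison of the Gaussian and sinc-squared beams is easy to set up. In this sketch the sinc argument is scaled by an assumed constant c ≈ 1.392 so that the beam falls to half power at the same radius R as the Gaussian; b = 3/2 is used because, carried to infinite radius, the idealized sinc² sidelobe integral diverges for indices b ≤ 1:

```python
import math

def effective_area(p, b, r_max, n=200000):
    """Approximate the integral of p(r)**b * 2*pi*r over [0, r_max]
    with the trapezoid rule."""
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * (p(r) ** b) * 2.0 * math.pi * r
    return total * h

R = 7.0          # half-max radius
c = 1.39156      # assumed constant: (sin(c)/c)**2 = 1/2, so p(R) = 1/2

def sinc_sq(r):
    x = c * r / R
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def gaussian(r):
    return math.exp(-r * r * math.log(2) / R ** 2)

b = 1.5          # Euclidean index; sinc^2 sidelobe integral diverges for b <= 1
eff_sinc = effective_area(sinc_sq, b, 30.0 * R)    # truncated at 30R; the
eff_gauss = effective_area(gaussian, b, 30.0 * R)  # neglected tail is small
print(eff_sinc, eff_gauss)
```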
A second important extension is the inclusion of dependence on galactic latitude. We can crudely consider latitude dependence by estimating the model parameters using just the surveys that collected data exclusively or mostly at high galactic latitudes. We include Lorimer et al. (2007), Ravi et al. (2015), Petroff et al. (2015), Law et al. (2015), and Champion et al. (2016) in this group. These surveys produce an estimated rate above 1 Jy (95% CI: 509.83, 2152.37 FRBs per sky per day) and power-law index (95% CI: 0.38, 1.22). The 1 Jy rate is higher and the index is lower than in the full-data fit. As before, we assume a 3 ms pulse to compare with other rates given in fluence. Our predicted rate above 0.67 Jy is about 1814.18 FRBs per sky per day with a 95% confidence interval between about 1029.68 and 3196.40, which falls entirely within the interval given by Champion et al. (2016) at this flux. Our predicted rate above 1.33 Jy is about 1046.52 FRBs per sky per day with a 99% confidence interval between about 500.80 and 2186.89, which does now overlap with the interval given by Rane et al. (2016), though their rate is computed using data over a range of latitudes and is given as an all-sky rate. While this suggests some dependence, we are not comfortable stating this with certainty without building latitude into the model more directly.
This model also finds evidence for a non-Euclidean flux distribution of FRBs, with a power-law index of 0.91. This is consistent with estimates made with other techniques and data (Vedantham et al. 2016). A non-Euclidean distribution produces unintuitive observed FRB properties, such as that it is easier to detect more distant events and that small, wide-field radio telescopes can detect more events than large, sensitive telescopes. Finally, we note that a flat flux distribution is inherently surprising, as demonstrated by the extreme brightness of the first FRB reported by Lorimer et al. (2007).
This work was supported by the University of California Lab Fees Program under award number #LF-12-237863.
Appendix A: Gradient and Hessian of the Log-likelihood
Here we derive the gradient and Hessian for the log-likelihood. The latter is important for computing an estimate of the variance of the unknown parameters. For convenience, the model can be reparameterized in terms of transformed versions of a and b (e.g., their logarithms); derivatives on the transformed scale follow from the chain rule.
Recall that the log-likelihood is the sum of the log-likelihoods from each survey,

$$\ell(a, b) = \sum_i \ell_i(a, b);$$

thus, we can consider the derivatives of each summand individually. Consider the gradient of one survey-level log-likelihood, a vector of the partial derivatives. For the interferometer case,

$$\nabla \ell_i = \sum_{j=1}^{n_i} \frac{\nabla \lambda(s_{ij}, r_{ij})}{\lambda(s_{ij}, r_{ij})} - T_i\, \nabla \Lambda_O(s_{\min,i}).$$
The result for a single-dish term is similar. The key thing to note is that this function only depends on the derivatives of the rate function. The same is true of the second derivative matrix, the Hessian, which only depends on the first and second derivatives of the rate function:

$$\nabla^2 \ell_i = \sum_{j=1}^{n_i} \left[ \frac{\nabla^2 \lambda(s_{ij}, r_{ij})}{\lambda(s_{ij}, r_{ij})} - \frac{\nabla \lambda(s_{ij}, r_{ij})\, \nabla \lambda(s_{ij}, r_{ij})^{\top}}{\lambda(s_{ij}, r_{ij})^2} \right] - T_i\, \nabla^2 \Lambda_O(s_{\min,i}).$$
Note that $\nabla \ell_i$ is a column vector of length two and that $\nabla^2 \ell_i$ is a 2 × 2 matrix.
Here are the first and second partial derivative terms for the instantaneous rate $\lambda(s_O, r) = a b\, s_O^{-b-1} e^{-b r^2 \ln 2 / R^2}$ (writing $c = \ln 2 / R^2$):

$$\frac{\partial \lambda}{\partial a} = \frac{\lambda}{a}, \qquad \frac{\partial \lambda}{\partial b} = \lambda \left( \frac{1}{b} - \log s_O - c\, r^2 \right),$$

$$\frac{\partial^2 \lambda}{\partial a^2} = 0, \qquad \frac{\partial^2 \lambda}{\partial a\, \partial b} = \frac{1}{a} \frac{\partial \lambda}{\partial b}, \qquad \frac{\partial^2 \lambda}{\partial b^2} = \lambda \left[ \left( \frac{1}{b} - \log s_O - c\, r^2 \right)^2 - \frac{1}{b^2} \right].$$
Here are the first and second partial derivative terms for the cumulative rate $\Lambda_O(s) = a\, s^{-b}\, \pi R^2 / (b \ln 2)$:

$$\frac{\partial \Lambda_O}{\partial a} = \frac{\Lambda_O}{a}, \qquad \frac{\partial \Lambda_O}{\partial b} = -\Lambda_O \left( \log s + \frac{1}{b} \right),$$

$$\frac{\partial^2 \Lambda_O}{\partial a^2} = 0, \qquad \frac{\partial^2 \Lambda_O}{\partial a\, \partial b} = \frac{1}{a} \frac{\partial \Lambda_O}{\partial b}, \qquad \frac{\partial^2 \Lambda_O}{\partial b^2} = \Lambda_O \left[ \left( \log s + \frac{1}{b} \right)^2 + \frac{1}{b^2} \right].$$
The gradient is helpful for maximizing the likelihood. The Hessian is necessary for the covariance of the parameters, as described above.
Appendix B: Confidence Intervals Based on the Parametric Bootstrap
Figures 1 and 2 show confidence intervals for the cumulative rate of incident flux and the survival plot. Both sets of bounds are computed using the parametric bootstrap. This uses the covariance of the estimated parameters given in Section 2. The statistical theory says that our unknown parameters are approximately normally distributed with mean given by the maximum likelihood estimates and covariance given by the inverse of the negative Hessian evaluated at those estimates. The parametric bootstrap propagates this uncertainty by sampling the parameters from this distribution and computing a quantity of interest (e.g., the cumulative rate over some grid of flux values) for each sample. This provides an empirical distribution for the quantity of interest from which we can compute quantiles to use as bounds.
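The parametric bootstrap can be sketched in a few lines of pure Python. The covariance matrix below is a made-up illustration, not the fitted Hessian-based covariance, and the bivariate normal draws are generated via the Cholesky factor of that matrix:

```python
import math
import random

def bootstrap_rate_bounds(a_hat, b_hat, cov, s, n_boot=4000, seed=1):
    """Parametric bootstrap for the cumulative rate a * s**(-b): sample
    (a, b) from a bivariate normal with mean (a_hat, b_hat) and covariance
    `cov`, evaluate the rate at flux s, and take 2.5%/97.5% quantiles."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    rates = []
    for _ in range(n_boot):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = a_hat + l11 * z1
        b = b_hat + l21 * z1 + l22 * z2
        rates.append(a * s ** (-b))
    rates.sort()
    return rates[int(0.025 * n_boot)], rates[int(0.975 * n_boot)]

# Illustrative numbers only: the MLE and a made-up covariance matrix
a_hat, b_hat = 586.88, 0.91
cov = [[27000.0, -1.5], [-1.5, 0.03]]
lo, hi = bootstrap_rate_bounds(a_hat, b_hat, cov, s=1.0)
print(lo, hi)   # bounds should bracket the point estimate
```

Repeating this over a grid of flux values traces out the dashed confidence bands shown in Figure 1.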