SPIRE Spectral Feature Finder Catalogue

R. Hopwood, I. Valtchanov, N. Marchili, L.D. Spencer, J. Scott, C. Benson, N. Hładczuk, E.T. Polehampton, N. Lu, G. Makiwa, D.A. Naylor, G. Noble, M.J. Griffin

HERSCHEL-HSC-TN-2321, Sep 2018

Abstract

The SPIRE Spectral Feature Finder Catalogue is the result of an automated run of the Spectral Feature Finder (FF). The FF is designed to extract significant spectral features from SPIRE FTS data products. Spectral features are identified only for High Resolution (HR) sparse or mapping observations; for Low Resolution (LR) sparse or mapping observations the FF provides only the best-fit continuum parameters. The FF engine iteratively searches for peaks over a set of signal-to-noise ratio (SNR) thresholds, either in the HR spectra of the two central detectors (sparse mode) or in each pixel of the two hyper-spectral cubes (mapping), one cube per band: Spectrometer Long Wavelength (SLW) and Spectrometer Short Wavelength (SSW). At the end of each iteration, independently for each spectral band, the FF simultaneously fits the continuum and the features found. The residual of the fit is used for the next iteration. The final FF catalogue contains emission and absorption feature frequencies, and their respective SNR, for each observation; SNR is negative for absorption features. Line fluxes are not included, as extracting reliable line fluxes from FTS data is a complex process that requires careful evaluation and analysis of the associated spectra. Testing of the FF routine indicates that the FTS Spectral Feature Finder Catalogue is 100% complete for features above SNR=10, and 50-70% complete down to SNR=5. The full SPIRE Automated Feature Extraction Catalogue (SAFECAT) contains 167,525 features at |SNR| >= 5 from 641 sparse and 179 mapping observations.

Table of Contents

Important note: Most concepts presented in this document require a good knowledge of the SPIRE Fourier-Transform Spectrometer, its characteristics and observing modes. The reader is advised to read the SPIRE Handbook, available from the Herschel Explanatory Legacy Library.

Observations

The observations with FF products are listed in four comma-separated-value (.csv) files. These are available in the release and also in the topmost folder of the FF legacy area:

The section "Individual FF catalogues and postcards" provides links to the results for each individual OBSID.

Feature Finder Algorithm

The Herschel SPIRE Spectral Feature Finder (FF) finding and fitting process is summarised by the flowchart shown in Fig. 1 and presented in full in Hopwood et al. (in preparation). This document briefly describes the main steps of the FF algorithm.

Figure 1: Flowchart of the FF processing.

The SPIRE FTS has two overlapping spectral bands: Spectrometer Short Wavelength, SSW (191-318 μm or 1568-944 GHz) and Spectrometer Long Wavelength, SLW (294-671 μm or 1018-447 GHz). They share an overlap region of 294-318 μm (1018-944 GHz). The primary input to the FF script is a single per-band spectrum, extracted either from one of the centre detectors (sparse mode) or from an individual spaxel of one of the two per-band hyper-spectral cubes (mapping).

Each feature found is fitted using a sinc-function (sin(x)/x) profile of fixed width, with the width set using the actual spectral resolution of the input data.
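As an illustration of the fixed-width profile, the following minimal sketch evaluates a sinc function centred on a feature. The function name, the example amplitude and the width value are hypothetical choices made for this example only; the FF sets the width from the actual spectral resolution of the input data.

```python
import math


def sinc_profile(freq_ghz, centre_ghz, amplitude, width_ghz):
    """Return amplitude * sin(x)/x with x = (freq - centre) / width."""
    x = (freq_ghz - centre_ghz) / width_ghz
    if x == 0.0:
        return amplitude  # limit of sin(x)/x as x -> 0
    return amplitude * math.sin(x) / x


# Hypothetical width parameter in GHz, for illustration only
WIDTH_GHZ = 0.38
CENTRE_GHZ = 806.65

peak = sinc_profile(CENTRE_GHZ, CENTRE_GHZ, 2.0, WIDTH_GHZ)
# The first zero crossing of sin(x)/x lies at x = pi
first_null = sinc_profile(CENTRE_GHZ + math.pi * WIDTH_GHZ, CENTRE_GHZ, 2.0, WIDTH_GHZ)
```

Because the width is held fixed, only the centre and the amplitude of each profile are free in a fit of this kind.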

The following steps are carried out:

1. Fitting and subtracting the continuum

An input spectrum is resampled onto a coarser frequency grid (5 GHz for HR) and a "difference spectrum" is generated by subtracting the resampled spectrum from itself, after a shift of one frequency bin. No resampling is necessary for LR observations (1 GHz grid). The strong peaks correspond to "jumps" in the difference spectrum and are masked before a 3rd order polynomial is fitted (2nd order for LR). Neither the masking nor the peaks are carried forward into the main finding loop, only the polynomial model.

This is the only stage applied to LR observations (sparse or mapping), for which only the derived continuum parameters are kept.
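The continuum step can be sketched with NumPy as follows. This is a simplified illustration, not the FF code: the bin averaging used for resampling, the robust clipping level `jump_sigma` and the function name are all assumptions introduced for the example.

```python
import numpy as np


def fit_continuum(freq, flux, bin_ghz=5.0, order=3, jump_sigma=5.0):
    """Fit a polynomial continuum after masking strong jumps.

    jump_sigma is a hypothetical clipping level, not the FF's own value.
    Returns coefficients p0, p1, ... in increasing order.
    """
    # Resample onto a coarse grid by averaging the points in each bin
    bins = np.floor((freq - freq.min()) / bin_ghz).astype(int)
    nbin = bins.max() + 1
    counts = np.bincount(bins, minlength=nbin)
    keep = counts > 0
    coarse_f = np.bincount(bins, weights=freq, minlength=nbin)[keep] / counts[keep]
    coarse_y = np.bincount(bins, weights=flux, minlength=nbin)[keep] / counts[keep]

    # Difference spectrum: coarse spectrum minus itself shifted by one bin
    diff = np.diff(coarse_y)

    # Flag jumps relative to a robust scatter estimate (scaled MAD)
    med = np.median(diff)
    mad = 1.4826 * np.median(np.abs(diff - med))
    jumps = np.abs(diff - med) > jump_sigma * mad

    # A jump between bins i and i+1 masks both bins
    mask = np.zeros(coarse_y.size, dtype=bool)
    mask[:-1] |= jumps
    mask[1:] |= jumps

    return np.polynomial.polynomial.polyfit(coarse_f[~mask], coarse_y[~mask], order)


# Synthetic HR-like spectrum: smooth continuum plus one strong line
freq = np.arange(450.0, 1000.0, 0.3)
true_continuum = 10.0 + 0.02 * freq + 1e-5 * freq**2
flux = true_continuum + 50.0 * np.sinc((freq - 806.65) / 1.2)

coeffs = fit_continuum(freq, flux)
model = np.polynomial.polynomial.polyval(freq, coeffs)
```

For LR data the same function could be called with `order=2` and a bin size equal to the native grid, so that the resampling leaves the spectrum unchanged.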

2. Iterating over SNR thresholds

We use the following SNR thresholds: +[100, 50, 30, 10, 5, 3] for emission and [-100, -50, -30] for absorption features.

For each threshold, the signal-to-noise ratio (SNR) spectrum is computed from the model-subtracted residual and the "error" column of the spectrum dataset (for the first iteration, only the continuum model is subtracted).

Peaks are determined by merging all SNR data points that sit above the SNR threshold, within a 10 GHz width per peak.

Each new peak represents a potential new feature, so a sinc function is added to the total model (polynomial + sinc) per new peak found.

A global fit is performed using the input spectrum and total model, so all found and potential features, and the continuum, are simultaneously fitted.

For each SNR iteration, the resulting fitted sinc models for the potential features found are put through several reliability checks (e.g., checking their fitted position and looking for multiple peaks fitted to partially resolved features).

Features that are accepted as reliable have their fitted position limited to within a 2 GHz window before the global fit is repeated.

The resulting total model is carried forward to the next iteration. The frequency range on either side of each new feature is masked, with a width that depends on the SNR threshold, and no new features are permitted inside the masked range in any of the following finding iterations. The existing masking is updated to account for movement of already found features, noting that the positions of their respective sinc models have already been limited.
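The peak-merging part of this loop can be sketched as below. The grouping rule (a group is closed once a point lies more than 10 GHz from the first point of the group) and the choice of the most extreme point as the peak position are assumptions made for illustration; the function name is hypothetical.

```python
def merge_peaks(freq, snr, threshold, merge_width_ghz=10.0):
    """Group points beyond an SNR threshold into peaks.

    Positive thresholds select emission, negative thresholds absorption.
    freq is assumed to be sorted in ascending order.
    Returns one (frequency, snr) pair per peak, at the most extreme point.
    """
    if threshold >= 0:
        hits = [(f, s) for f, s in zip(freq, snr) if s >= threshold]
    else:
        hits = [(f, s) for f, s in zip(freq, snr) if s <= threshold]

    peaks, group = [], []
    for f, s in hits:
        # Close the current group once it would exceed the merge width
        if group and f - group[0][0] > merge_width_ghz:
            peaks.append(max(group, key=lambda p: abs(p[1])))
            group = []
        group.append((f, s))
    if group:
        peaks.append(max(group, key=lambda p: abs(p[1])))
    return peaks


freq = [800.0, 800.3, 800.6, 815.0, 830.0, 830.3]
snr = [12.0, 35.0, 11.0, -40.0, 60.0, 33.0]

emission = merge_peaks(freq, snr, 30)
absorption = merge_peaks(freq, snr, -30)
```

In a full loop, each returned peak would seed one new sinc component in the total model before the global fit is repeated.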

3. Final SNR estimate and final check

The final SNR is calculated using the fitted peaks and the total-model-subtracted residual spectrum as [fitted peak amplitude]/[local standard deviation].

Features with |SNR| >= 5 are carried forward to the final check: a search to discriminate unique features from fits to the sinc wings of neighbouring significant features. The exception to this is an a posteriori check to preserve any [CI](2-1) detection that falls within the wings of the strong 12CO(7-6) line (see step 6).
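A minimal sketch of the final SNR estimate and the |SNR| >= 5 cut is given below. The use of the sample standard deviation and the function names are assumptions for this example; the selection of the local residual region is not shown.

```python
import statistics


def final_snr(amplitude, local_residual):
    """Fitted peak amplitude over the standard deviation of the local residual."""
    return amplitude / statistics.stdev(local_residual)


def keep_significant(snr_values, min_abs_snr=5.0):
    """Keep features with |SNR| >= min_abs_snr (absorption has negative SNR)."""
    return [s for s in snr_values if abs(s) >= min_abs_snr]


# Toy residual around three fitted features (arbitrary flux units)
residual = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
snrs = [final_snr(a, residual) for a in (2.0, -1.0, 0.3)]
kept = keep_significant(snrs)
```

The sign of the fitted amplitude carries through, so an absorption feature keeps its negative SNR in the catalogue.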

4. FF feature flags

To assess the distinction between false and true spectral features and to identify the likelihood of prospective spectral features being correctly identified, a goodness of fit metric has been developed that combines the goodness of fit for each individual feature found (GoF) with the goodness of fit of the total model (Total Fit Evaluation - ToFE). GoF and ToFE are used in conjunction to assign a flag to each identified spectral feature (FF flag). The relative weighting of the GoF and ToFE in determining the spectral feature flags varies for different frequency regions within the SLW and SSW bands (e.g., the band edges are identified as suffering higher than average noise and thus are flagged using different criteria - see below).

The Feature Finder requires a goodness of fit metric that is not sensitive to the continuum or to line flux. Such a statistical metric, r, is calculated using a cross-correlation function between the fitted feature and the total model. r is not sensitive to the continuum, due to the subtraction of the local mean. r is also not sensitive to line flux, because of the division by the standard deviation of a given spectral region.
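The two insensitivities can be seen in a small sketch that computes a zero-lag normalised cross-correlation (a Pearson-style coefficient) over a local region. Treating r as this particular coefficient is an assumption made for illustration, as is the function name; the exact definition used by the FF is not reproduced here.

```python
import math


def correlation_r(feature_model, total_model):
    """Zero-lag normalised cross-correlation of two equally sampled arrays."""
    n = len(feature_model)
    mean_a = sum(feature_model) / n
    mean_b = sum(total_model) / n
    # Subtracting the local mean removes any constant continuum level
    da = [a - mean_a for a in feature_model]
    db = [b - mean_b for b in total_model]
    # Dividing by the standard deviations removes the dependence on line flux
    norm = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / norm


shape = [0.0, 1.0, 4.0, 1.0, 0.0]
# Same shape on top of a continuum offset and scaled in flux
offset_and_scaled = [10.0 + 3.0 * s for s in shape]
# A feature displaced by one sample
displaced = [0.0, 0.0, 1.0, 4.0, 1.0]

r_same = correlation_r(shape, offset_and_scaled)
r_displaced = correlation_r(shape, displaced)
```

Here `r_same` stays at unity despite the offset and scaling, while the displaced feature gives a much lower value.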

The ToFE step involves a Bayesian method to compare the evidence (also known as model likelihood) for two concurrent models: with and without a particular feature included. It is therefore complementary to the r parameter identified above. For both models the fitting engine calculates the evidence, which is a set of probability logarithms. Their difference provides the odds for the feature: the smaller the odds, the better the model with the feature included. As this is a probabilistic check, no assumption about the noise is made and therefore ToFE is insensitive to systematic noise.

More details on the methods will be provided in Hopwood et al. (in preparation).

Flag definitions (metadata keyword)

Flagging criteria

The flagging criteria were chosen empirically.

Both criteria must be met for a "good fit", unless there are more than 35 features found in a given spectrum, in which case only GoF is considered, as ToFE tends to return a null result.

Noisy regions

The following regions are considered as noisy:

5. Source radial velocity estimate

At the end of the Feature Finder (FF) process, for each high resolution (HR) observation, the radial velocity is estimated by searching the respective feature catalogue for 12CO lines and the [NII] 205 µm atomic line, with identified 12CO taking priority over [NII]. In addition, a cross-correlation (XCOR) technique is applied using the feature catalogue and a template line catalogue, which includes most of the characteristic molecular and atomic lines in the far-infrared. For FF catalogues where few features have been found, XCOR includes an additional check with [NII] (for SSW) and 12CO(7-6) lines. These FF based estimates are compared to radial velocities from a collection by the HIFI team (Lisa Benamati, private communication).

More details on the methods will be provided in Hopwood et al. and Scott_a et al., both in preparation.
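As a rough illustration of how a velocity follows from matched lines, the sketch below converts observed and rest frequencies into a radial velocity and takes the median over several 12CO matches. The radio velocity convention, the function names and the rounded rest frequencies are assumptions made for this example; the matching of catalogue features to lines is not shown.

```python
import statistics

C_KM_S = 299792.458  # speed of light in km/s

# Approximate 12CO rest frequencies in GHz (rounded, for illustration)
CO_REST_GHZ = {
    "12CO(5-4)": 576.2679,
    "12CO(6-5)": 691.4731,
    "12CO(7-6)": 806.6518,
}


def radial_velocity(observed_ghz, rest_ghz):
    """Velocity in km/s, assuming the radio convention v = c (f0 - f) / f0."""
    return C_KM_S * (rest_ghz - observed_ghz) / rest_ghz


def velocity_from_matches(matches):
    """Median velocity over (line name, observed frequency in GHz) pairs."""
    return statistics.median(
        radial_velocity(obs, CO_REST_GHZ[name]) for name, obs in matches
    )


# Synthetic matches for a source receding at 250 km/s
shift = 1.0 - 250.0 / C_KM_S
matches = [(name, rest * shift) for name, rest in CO_REST_GHZ.items()]
velocity = velocity_from_matches(matches)
```

Using the median rather than the mean keeps a single mismatched feature from dominating the estimate when several lines are available.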

Radial velocity metadata and flags

The following radial velocity related metadata are included in each FF catalogue:

Warning: the radial velocities should be considered with caution, especially for sources with RV_FLAG having a ? or nan, as well as sources with very few lines.

6. Neutral Carbon Check

The Neutral Carbon Check (NCC) is a focused check of the 12CO(7-6) and [CI](2-1) spectral region using the radial velocity from the previous step and the known rest frequency of these lines. If either one of these neighbouring features were missed by the main FF process, the NCC-identified missing feature is added to the final list of features found. If [CI](2-1) is detected by the nominal FF or the NCC, a similar search is performed for the [CI](1-0) line. Features detected or modified by the NCC can be identified as having the NCC_flag column in the FF products set to True. More details are provided in Scott_b et al, in preparation.
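The idea of looking for a line at its expected position can be sketched as follows. This is only a catalogue lookup under stated assumptions (radio velocity convention, rounded rest frequencies, a hypothetical 1 GHz tolerance and hypothetical function names); the NCC itself works on the spectra, which this example does not attempt.

```python
C_KM_S = 299792.458  # speed of light in km/s

# Approximate rest frequencies in GHz (rounded, for illustration)
REST_GHZ = {"12CO(7-6)": 806.6518, "[CI](2-1)": 809.3420}


def expected_frequency(rest_ghz, velocity_km_s):
    """Observed frequency for a given velocity (radio convention assumed)."""
    return rest_ghz * (1.0 - velocity_km_s / C_KM_S)


def missing_lines(feature_freqs_ghz, velocity_km_s, tol_ghz=1.0):
    """List (name, expected frequency) for lines with no catalogue feature nearby.

    tol_ghz is a hypothetical matching tolerance.
    """
    missing = []
    for name, rest in REST_GHZ.items():
        expected = expected_frequency(rest, velocity_km_s)
        if not any(abs(f - expected) <= tol_ghz for f in feature_freqs_ghz):
            missing.append((name, expected))
    return missing


# A catalogue that contains 12CO(7-6) but not [CI](2-1), for a source at rest
candidates = missing_lines([576.27, 806.65], 0.0)
```

Each returned candidate marks a frequency where a focused fit of the spectrum could then be attempted.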

7. Bespoke Handling

HR mapping

For HR mapping observations, the initial list of features for the subsequent FF iterations is provided by a Python-based peak finder, applied to HR apodized spectra. This method provides better stability and avoids too many spurious features in noisy hyper-spectral cube pixels (spaxels).

Non-standard data

Special calibration observations

Two observations, 1342227785 and 1342227778, were performed with special settings at two beam-steering mirror (BSM) positions. In principle the pipeline successfully processes them as mapping mode observations, and they are available in the Herschel Science Archive as spectral cubes with spatial coverage slightly better than sparse mode and slightly worse than intermediate sampling (4 BSM positions).

For the Feature Finder we used those observations as two separate sparse mode observations. In order to avoid files with the same name we changed their obsids to 1001342227785 (BSM position 1) and 2001342227785 (BSM position 2) for 1342227785, and to 1001342227778 and 2001342227778 for 1342227778. Hence their postcards, continuum parameters and feature catalogues will be available under these modified OBSIDs.

Highly Processed Data Products

Background Subtracted (BGS) Spectra

Bespoke FF settings

If, for a particular observation, non-default FF parameters were used or bespoke treatment was applied, this is indicated with a "1" in the bespokeTreatment column of hrSparseObservations.csv, which lists all HR observations for which there is a set of FF products.

Fewer negative SNR thresholds

By default, the FF iterates over a number of SNR thresholds when looking for peaks: +[100, 50, 30, 10, 5, 3] followed by [-100, -50, -30, -10] for absorption features. For one particularly spectrally rich observation (OBSID: 1342210847), the default set of negative SNR thresholds finds so many features that there are not enough unmasked data points available for the final SNR estimates, and thus many features are discarded. This loss of features is prevented by omitting the -30 and -10 SNR threshold iterations.

Final SNR estimate

To optimise the final SNR estimate, the FF stipulates that the number of unmasked data points in the local region used to calculate the residual standard deviation must be at least 17 (5 GHz). If this condition is not met, then the local region around the fitted peak is widened.
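The widening rule can be sketched as a simple loop. The initial half-width, the widening step and the function name are hypothetical values chosen for this example; only the minimum of 17 unmasked points comes from the text above.

```python
def local_window(freq, masked, centre_ghz, half_width_ghz=2.5,
                 min_points=17, step_ghz=1.0):
    """Indices of unmasked points around centre_ghz.

    The window is widened until it holds at least min_points unmasked
    points, or until it spans the whole spectrum.
    Returns (indices, final half-width in GHz).
    """
    max_half_width = freq[-1] - freq[0]
    while True:
        idx = [
            i for i, f in enumerate(freq)
            if abs(f - centre_ghz) <= half_width_ghz and not masked[i]
        ]
        if len(idx) >= min_points or half_width_ghz >= max_half_width:
            return idx, half_width_ghz
        half_width_ghz += step_ghz


# 0.3 GHz grid with the samples around a fitted peak masked out
freq = [800.0 + 0.3 * i for i in range(200)]
masked = [60 <= i <= 73 for i in range(200)]

indices, half_width = local_window(freq, masked, centre_ghz=820.0)
```

The residual values at the returned indices would then supply the local standard deviation used in the final SNR.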

For the observations of two spectrally rich sources (OBSIDs: 1342192834 and 1342197466) this condition is never met for the majority of the features found, so with the default FF settings the majority of features are discarded. For these two cases, no minimum is set for the number of data points required for the final SNR estimate to go ahead.

Features added by hand

During the FF SNR threshold iterations, the SNR is taken using the spectral dataset "error" column. For a handful of observations this can lead to no significant SNR peak at the position of significant spectral peaks and the corresponding features are therefore never found by the FF, regardless of how low the SNR threshold drops.

A handful of missing significant features were added at the appropriate SNR threshold during the FF process for six observations: OBSIDs 1342197466, 1342248242, 1342216879, 1342197466, 1342193670 and 1342210847.

The FF products

Feature catalogue per observation

The Feature Finder produces feature catalogues for all SPIRE Spectrometer HR sparse and mapping mode observations, unless these are of known featureless sources (e.g. the asteroids Vesta, Ceres) or the target is not included in the FF list of observations (e.g. dark sky and sources with well developed models, including Uranus, Neptune and Mars). Failed observations and calibration observations with unusual settings are also omitted.

The Feature Finder catalogues are available as FITS files with the catalogue and its metadata in the first Header Data Unit (HDU).

The sparse mode catalogue table contains the following columns:

Note #1: the frequency axis of all SPIRE spectra, as well as of those from the other Herschel spectrometers, is provided in the kinematic Local Standard of Rest (LSRk). Consequently the measured frequencies in the FF are also in the same LSRk reference frame.

Note #2: no consolidation of features in the overlap region (944-1018 GHz) has been performed. Therefore the same feature may be present for both central detectors for the same observation.

The mapping mode catalogue table contains the following additional columns:

Note: the SSW and SLW hyper-spectral cubes have different world-coordinate-systems (WCS). Their pixel sizes and centres are not matched, i.e., row and column are array dependent. This is clearly evident within the SLW and SSW maps presented in the mapping postcards (see Fig. 5).

In addition to the catalogue table, the mapping FITS file contains 5 more HDU extensions: velocity, velocityError, vFlags, nLines, and arrays. In deriving radial velocity estimates for mapping observations, features detected within the SLW map were projected into an SSW equivalent grid. The number of lines within each pixel of the synthesized map and which maps these lines originally came from (L for SLW and S for SSW) is indicated in the nLines and arrays HDU extensions, respectively. The velocity, velocityError, and vFlags extensions indicate the estimated radial velocity, velocity error, and associated velocity flag of the pixels in the synthesized map. The vFlags HDU uses the notation (N,M), where N indicates the number of 12CO candidate features used to derive the velocity estimate, and M indicates the number of matches each candidate feature has with the characteristic difference array of the 12CO ladder (see Scott_a et al, in preparation, for more details). The closer these values are to 10, the more reliable the velocity estimate is expected to be. Velocity estimates based on the [NII] feature are denoted by NII,1.

The catalogue FITS extension metadata contain the following information:

The SPIRE Automated Feature Extraction CATalogue: SAFECAT

The SAFECAT contains all the features found from the catalogues per observation for SPIRE Spectrometer HR sparse and mapping observations. SAFECAT is intended as an archive mining tool that can be searched by frequency, position, etc., to provide all SPIRE Spectrometer observations with significant features that match the search criteria.

SAFECAT_v2 contains both sparse and mapping observations, and contains the following columns:

The metadata provide the feature flag definitions; the minimum SNR cut applied (5); the frequency range avoided at the ends of the bands (10 GHz); and a list of the two special calibration observations with the unique IDs assigned for the purpose of SAFECAT only, as each consists of two sparse pointings in one observation.

Continuum fit parameters

The SPIRE FTS instrumental line shape (ILS) is essentially a sinc function (sin(x)/x). The sinc-like wings of each feature introduce ringing, which, although decreasing in amplitude, does extend over the whole frequency range. Therefore, to gain a good fit to the continuum, this sinc ILS should be simultaneously fitted with the main spectral features.

During the Feature Finder finding and fitting process, a 3rd order polynomial (2nd order for LR) is fitted to the continuum per band spectrum, in conjunction with sinc functions for each feature found. The resulting continuum fit may be hard to precisely recreate, unless a similar procedure is carried out. Therefore the best fit polynomial parameters for the continuum are provided with the other Feature Finder products.

The parameters are of the form:
p0 + p1*v + p2*v^2 + p3*v^3 (or p0 + p1*v + p2*v^2 for LR),
where v is the frequency.

The frequency ranges for the SPIRE FTS, used in the continuum calculations, are [446.99, 1017.80] GHz for SLW and [944.05, 1567.91] GHz for SSW.
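A minimal sketch of evaluating the continuum from the stored parameters is shown below. It assumes that v is expressed in GHz and that the parameters are ordered p0, p1, p2, p3; the function name and the example parameter values are hypothetical.

```python
# Frequency ranges used in the continuum calculations, in GHz
BAND_RANGE_GHZ = {"SLW": (446.99, 1017.80), "SSW": (944.05, 1567.91)}


def continuum(freq_ghz, params, band):
    """Evaluate p0 + p1*v + p2*v^2 (+ p3*v^3) using Horner's rule."""
    low, high = BAND_RANGE_GHZ[band]
    if not low <= freq_ghz <= high:
        raise ValueError(f"{freq_ghz} GHz is outside the {band} range")
    value = 0.0
    for p in reversed(params):
        value = value * freq_ghz + p
    return value


# Hypothetical HR (3rd order) and LR (2nd order) parameter sets
hr_value = continuum(500.0, [1.0, 0.002, 0.0, 0.0], "SLW")
lr_value = continuum(1000.0, [5.0, 0.0, 1e-6], "SSW")
```

The same function covers both resolutions, since the LR case simply has one parameter fewer.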

For a given observation, the associated fittedContinuumParameters FITS file contains a table in the first HDU with the parameters for each detector that has been through the Feature Finder algorithm: the centre detectors for sparse observations, and each pixel for mapping observations.

For sparse mode the fitted polynomial is also provided as a Herschel spectrum dataset, with the best fit parameters reported in the metadata. The dataset is stored in a FITS file with each detector in a separate HDU, identified by the detector name, e.g. hdu["SSWD4"] will contain the continuum spectrum for the central detector of the SSW bolometer array. The frequency grid of these spectra is the same as that of the input spectra.

The Feature Finder Postcards

Sparse mode

The Feature Finder results are visually summarised per observation in a “postcard”. Each postcard compares the input spectra for the centre detectors, the best fit to the continuum, and includes vertical ticks representing the features found (with these arbitrarily scaled by the signal-to-noise). Fig. 2 provides a few examples of postcards.

Figure 2: Example postcards for four different observations: three are point sources (the vertical axis unit is Jy), while the bottom right one is extended (the unit is W/m2/Hz/sr). The original spectra are shown in blue (SSW) and red (SLW), the best-fit continua are shown in green, and the features are indicated with short vertical ticks, above the feature (emission) or below (absorption). The semi-transparent vertical thick bars indicate the regions where the calibration is uncertain, i.e. the noisy regions where the features are flagged (see FF feature flags).

By default, the Feature Finder operates on point-source calibrated data, which has flux density units of Jy. For sources with some spatial extent there is also a postcard for the extended-source calibrated data, which has surface brightness units of W/m2/Hz/sr. Both postcards for such sources are included in the product pages, accessible via the tables at the end of the Feature Finder Release Note.

If an observation of interest is of a partially or fully extended source, it is advisable to visually check both sets of FF results. These are available in one combined postcard in the folder HRpointProducts/HRdoublePanelPostcards; an example is shown in Fig. 3.

Figure 3: Example postcards for one OBSID, showing the results of the FF run on both point-source (top) and extended-source (bottom) calibrated spectra. In this particular example the source is fully extended, as can be inferred from the good match of the two bands (blue: SSW and red: SLW) in the overlap area for extended-source calibrated spectra. This OBSID is marked as extended in the hrObservations.csv file.

For LR observations no features are provided, as LR was primarily aimed at providing a measurement of the continuum and there are few features found in these observations. However, two postcards are provided for LR, as shown in Fig. 4. Many targets are semi-extended in nature and simply by comparing the postcards it may be possible to gauge which calibration scheme is more appropriate (or if there is a problem that may require further processing to correct for partial extent). Visual inspection should evaluate the continuity of the SLW/SSW spectra within the spectral overlap region.

Figure 4: Example postcard for one LR observation; both point-like (top) and extended (bottom) versions are provided. The good match of SSW and SLW in the overlap area in the top panel is an indication that the source is point-like.

Mapping mode

The Feature Finder mapping results are also visually summarised per observation in a “postcard”. Each postcard comprises a 2x3 array of figures illustrating various aspects of the observation.

Figure 5: Example postcard for one mapping observation.

The left column illustrates the integrated flux associated with each band, with SLW on top and SSW on the bottom. Note the difference in pixel sizes for the SLW and SSW maps. Each of the integrated flux maps in the left column has two pixels identified with coloured box outlines. For the SSW array, these correspond to the dimmest and brightest pixels within the map. For the SLW array these pixels correspond to the closest SLW pixels to the identified brightest and dimmest pixels for the SSW array.

The central column presents the spectra corresponding to the flagged pixels in the flux maps (brightest and dimmest integrated flux pixels for SSW, closest match for the SLW array) with SLW on top and SSW on the bottom. Also shown in the central column is the spectrum corresponding to the pixel with the most spectral features identified by the FF, again with SLW on top and SSW on the bottom.

The upper right figure is a map of the number of lines identified by the FF in both the SLW and SSW arrays combined (equivalent to the nLines HDU extension in the mapping products). This figure also has the pixel regions associated with the most lines in the SLW and SSW arrays identified with coloured box outlines. As the SLW pixels are larger than the SSW pixels, the SLW lines may be counted in multiple SSW pixels within this FF histogram map.

The lower right figure presents the FF radial velocities provided by the radial velocity routine (equivalent to the velocity HDU extension in the mapping products). The results are shown on SSW pixels, with the pixel colour indicating the velocity (colour-bar on the side), and a code within each SSW pixel indicating the quality of the radial velocity. Within the radial velocity map, a marker of A means all of the expected CO lines contributed to the radial velocity estimate, N means that the [NII] nitrogen spectral feature was used, and a number indicates the number of CO lines used in the radial velocity estimate (fewer than all of the expected lines were identified).

The header of the mapping postcard indicates the target name, as provided by the observer, and the observation number (OBSID). Pixels outlined in green on the velocity map highlight pixels where the feature finder did find spectral features but the radial velocity routine did not provide an accurate velocity estimate.

Feature Finder Products access

The FF products are available as Highly Processed Data Products (HPDP) in the Herschel legacy area and also in the Herschel Science Archive (HSA).

The HSA provides access to the individual FF products linked to a particular observation ID, while the HPDP gives access to all products within a web browser.

All products are combined in a single tar.gz file: FF_v2.tar.gz.

Folder structure and content

At the top level, there are a number of CSV files, which are described in section Observations.

The folders and their content are described below:

Both folders have sub-directories containing the FF postcards (postcards), the best fit continuum parameters (continuumParameters), the continuum in a spectral dataset (continuumSpectrum), and the FF feature catalogues (featureCatalogues). The files in these subfolders are per OBSID. The HRpointProducts folder also contains a sub-folder HRdoublePanelPostcards with the combined postcards for the FF results for both point and extended calibrated spectra.

Individual FF catalogues and postcards

The FF results, with some additional information, are also tabulated in a number of HTML pages, linked in the tables below. Each table provides links to the FF product tables in the left-hand column. The observations included on each page are given in the middle column, with the corresponding operational days in the right-hand column.

Go to page            Observations covered       Operational days

HR sparse-mode FF products
HR Sparse Page 1      1342187893 - 1342212341    209 - 602
HR Sparse Page 2      1342212342 - 1342231985    602 - 908
HR Sparse Page 3      1342231986 - 1342247763    908 - 1151
HR Sparse Page 4      1342247764 - 1342258698    1151 - 1335
HR Sparse Page 5      1342258699 - 1342270195    1335 - 1434

HR mapping-mode FF products
HR Mapping Page 01    1342192173 - 1342245117    302 - 1079
HR Mapping Page 02    1342245083 - 1342270045    1080 - 1433

LR sparse-mode FF products
LR Sparse Page 1      1342188674 - 1342248229    227 - 1160
LR Sparse Page 2      1342248245 - 1342257934    1160 - 1326
LR Sparse Page 3      1342259570 - 1342270194    1340 - 1434

LR intermediate and fully sampled mapping mode FF products
LR Mapping Page 1     1342192179 - 1342262926    302 - 1362
LR Mapping Page 2     1342262927 - 1342270038    1362 - 1433

Ivan Valtchanov, HSC, 25 Sep 2018