Generating PDS compliant data and label files from IDFS-formatted data


written by Carrie A. Gonzalez
cgonzalez@swri.org

Created: 04/02/2004

Table Of Contents

  1. Overview
  2. Getting Started
  3. Generating PDS compliant Data and Label Files
  4. Saving the Definitions
  5. Changing the Options
    1. Time Definition
    2. Data Source
    3. Data Items
      1. Data Attributes
    4. PDS File Attributes
  6. Format of the PDS CSV Data File

Overview

IDFStoPDS is a program that generates PDS compliant data and label files from data stored in the Instrument Description File System (IDFS) format. The IDFS format is designed to be general enough to handle the majority of scientific data sets, including raw telemetry, processed data, simulation data and theoretical data. IDFS data sources are defined as either scalar instruments or vector instruments. A scalar instrument returns single-valued data quantities that depend only upon time and position. A vector instrument returns one-dimensional data quantities that have a functional dependence on a single variable, which in IDFS terminology is called the scanning variable.

With the IDFStoPDS program, the user may access any of the physical units defined for the IDFS data source being processed. The IDFS software converts telemetry data into physical units in real time as the data is accessed. This allows calibration factors and processing algorithms to be refined without reprocessing the original data set.

The IDFStoPDS program can be invoked in one of two modes: (1) interactive mode or (2) batch mode. In interactive mode, the program utilizes a GUI-based definition session to define the data items to be processed. The definition phase of the IDFStoPDS program has many options so that the user can tailor the processing to meet their individual needs. Once this definition session has been completed, the selected data items can then be retrieved from the IDFS data files and returned in PDS compliant form for the selected time range. To invoke the program in interactive mode, type IDFStoPDS at the command line.

In batch mode, the interactive GUI-based definition session is bypassed and the data requested is immediately processed based upon information contained in the named layout file. To invoke the program in batch mode, type IDFStoPDS -FName filename at the command line. The argument filename is the name of the layout file that is to be utilized during the current session. Note that the name of the layout file does not include the .I2P extension, which is appended to the filename provided by the user during the GUI-based definition session. If the named layout file does not exist, an error is displayed and processing terminates. In order to modify the time range that is processed for a specific layout file, the user may utilize the begin time (BTime) and/or end time (ETime) command line options. For example, the following call

IDFStoPDS -FName TEST -BTime 2003/191:21:47:00.000 -ETime 2003/191:21:50:00.000

invokes the IDFStoPDS program in order to process the selected data items defined in the layout file TEST for the specified time period.
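For scripted batch runs, the command line above can be assembled programmatically. The Python sketch below is hypothetical: the helper name batch_command is illustrative, and the time format string is an assumption inferred from the example (4-digit year, 3-digit day of year, then hours, minutes, seconds and milliseconds). It strips a trailing .I2P from the layout name, since the program appends that extension itself, and checks that ETime is later than BTime when both are given.

```python
from datetime import datetime
from typing import List, Optional

# Assumed from the example above: 2003/191:21:47:00.000 is YYYY/DDD:HH:MM:SS.mmm
IDFS_TIME_FORMAT = "%Y/%j:%H:%M:%S.%f"


def batch_command(layout: str, btime: Optional[str] = None,
                  etime: Optional[str] = None) -> List[str]:
    """Hypothetical helper that builds an IDFStoPDS batch-mode argument list."""
    if layout.endswith(".I2P"):
        layout = layout[:-len(".I2P")]  # the program appends .I2P itself
    args = ["IDFStoPDS", "-FName", layout]
    start = stop = None
    if btime is not None:
        # strptime raises ValueError if the string does not match the format
        start = datetime.strptime(btime, IDFS_TIME_FORMAT)
        args += ["-BTime", btime]
    if etime is not None:
        stop = datetime.strptime(etime, IDFS_TIME_FORMAT)
        args += ["-ETime", etime]
    if start is not None and stop is not None and stop <= start:
        raise ValueError("ETime must be later than BTime")
    return args
```

The returned list could be passed to subprocess.run; whether IDFStoPDS reports failure through its exit status is not stated here, so checking it would be a further assumption.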

Getting Started

In order to generate the PDS compliant data and label files, certain information must first be obtained from the user. One of these pieces of information is the time range for which the IDFS data is to be processed. The time range is set by selecting the "Time" button, which invokes the "Set Time" GUI.

Next, the IDFS data source that contains the data items to be processed must be identified by selecting the "Data Source" button, which invokes the "IDFS Source" GUI.

Once a valid IDFS data source has been selected, the "Data Items" button becomes visible and the data items to be processed must be defined. The data items to be processed are referred to as IDFS sensors. An IDFS sensor is defined as a primary data source returned by the virtual instrument in question. Selecting the "Data Items" button invokes the "Data Items" GUI, which contains a list of the IDFS data item(s) to be processed from the selected IDFS data source. Initially, this list is empty. To add a data item to the list, use the pull-down Insertion menu. Once the position of the new data item has been chosen, the actual data item is selected using the "Data Attributes" GUI, which is invoked automatically when a new item is added to the list. The new item defaults to the first IDFS sensor defined for the IDFS data source in the PIDF file. Once a data item has been added to the list, the "Attributes" button becomes visible and accessible. To change any of the attributes of a specific data item, select the data item from the list and activate the "Attributes" button to invoke the "Data Attributes" GUI.

At this point, the PDS compliant data and label files can be generated, since all other information is defaulted based upon information found in the PIDF and VIDF files for the selected IDFS data source.

Generating PDS compliant Data and Label Files

Once the IDFS data source, data items, and time range have been defined, the selected data items can be processed. To generate the PDS compliant data and label files, select the pull-down Action menu from the main menubar and then select the Create PDS File option. The local database is first checked to see if the requested data files are online. If data for the requested time range is not online, the missing data is promoted from the archive to the local disk. Once the data is online, the data files are opened and the data is extracted, converted to the appropriate physical units, and written to a data file in PDS compliant form. Processing continues until the user-requested end time is reached or until an error condition is raised. When an error condition is encountered, a message is displayed, the partially created data file is purged, and processing terminates. Upon completion of the task, whether successful or not, any promoted IDFS data files are removed from the local disk. Upon successful completion, a PDS label file is generated to describe the PDS data file that was created. At least one data item from the selected IDFS data source must be specified; otherwise, an error message is displayed when the "Create PDS File" action is selected.
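The error-handling and clean-up rules described above can be summarized in a short Python sketch. Everything here is hypothetical: create_pds_files, its parameters and the write_label callback are illustrative stand-ins, not the program's code, and the rows are assumed to arrive from an iterator that raises an exception when an error condition is encountered. The sketch shows only the ordering of the rules: the partial data file is purged on error, promoted files are removed in every case, and the label is written only after success.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def create_pds_files(data_path: Path, promoted: List[Path],
                     rows: Iterable[str],
                     write_label: Callable[[Path], None]) -> bool:
    """Hypothetical outline of the Create PDS File clean-up rules."""
    try:
        with data_path.open("w") as out:
            for row in rows:  # assumed to raise when an error condition occurs
                out.write(row + "\n")
    except Exception as err:
        print(f"error: {err}")  # display a message
        data_path.unlink(missing_ok=True)  # purge the partially created data file
        return False
    finally:
        for path in promoted:  # promoted IDFS files are removed in every case
            path.unlink(missing_ok=True)
    write_label(data_path)  # the label file is generated only after success
    return True
```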

The IDFStoPDS program utilizes the PDS Spreadsheet object (see PDS Standards) to describe the IDFS data items that are placed within the PDS data file. With the PDS Spreadsheet object, the number of items returned for the data source must stay constant throughout the data file, since the PDS label file defines the number of items returned. Depending upon the time range being processed, this may not be the case for the selected IDFS data source: within the IDFS paradigm, vector instruments may return a subset of the maximum number of vector elements defined for the virtual instrument. When the IDFStoPDS application is started, the number of items returned for a vector instrument is determined. For every IDFS data record processed thereafter, a check is made to determine if the number of items has changed. If it has, the currently open PDS data and label files are closed and a new pair of PDS data and label files is generated, starting with the data record that returned a different number of items. For example, consider a vector instrument that returns 32, 64, or 128 energy steps, depending upon its operating mode. If the time period being processed covers an interval during which the instrument stays in one operational mode, then one pair of PDS data and label files is generated. If the instrument changes from a 32-step mode to a 64-step mode during the interval, then two pairs of PDS data and label files are generated. If the instrument changes from a 32-step mode to a 64-step mode and back to a 32-step mode, then three pairs of PDS data and label files are generated, with the first pair reflecting a 32-item object, the second a 64-item object, and the last a 32-item object.
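The file-splitting rule amounts to grouping consecutive records that return the same number of items. The Python sketch below is a hypothetical illustration: split_by_item_count is not part of IDFStoPDS, and it works on a precomputed list of per-record item counts, whereas the program checks each record as it is read. Each returned run corresponds to one PDS data and label file pair.

```python
from typing import List, Sequence, Tuple


def split_by_item_count(counts: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (first_record, last_record, item_count) for each run of records."""
    segments: List[Tuple[int, int, int]] = []
    for index, count in enumerate(counts):
        if segments and segments[-1][2] == count:
            first = segments[-1][0]
            segments[-1] = (first, index, count)  # same count: extend current file pair
        else:
            segments.append((index, index, count))  # count changed: start a new file pair
    return segments
```

For the 32, 64, 32 example above, three runs are returned, matching the three file pairs.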

According to PDS standards, filename extensions are limited to 3 characters and filenames must follow a 27.3 format, that is, at most 27 characters before the period and at most 3 characters after it. To comply with this restriction, the filenames generated by the IDFStoPDS application use the following format:

VVVVVVVVYYYYDDDHHMMXXXXX[S]NN

VVVVVVVV represents the virtual instrument name for the selected IDFS data source (at most 8 characters). The next 11 characters represent the time of the first data sample written into the data file, where YYYY is the 4-digit year, DDD is the 3-digit day of year, HH is the 2-digit hour, and MM is the 2-digit minute. XXXXX represents the units label (at most 5 characters) for the data unit selected by the user. If the selected virtual instrument is a vector instrument, the letter S is inserted immediately before the last 2 characters (NN) of the filename. The last 2 characters, NN, are a 2-digit file version number. With every field at its maximum length, the name is 8 + 11 + 5 + 1 + 2 = 27 characters, which satisfies the 27.3 restriction. The file version number defaults to 01 but can be modified by selecting the "PDS File Attributes" button and modifying the File Version Number option. The extension ".CSV" is appended to the generated filename for the PDS data file, and the extension ".LBL" is appended for the PDS label file. For example, the filename NPINORM20031912147RAW01.CSV indicates a PDS data file for the NPINORM virtual instrument containing data in RAW units, starting at hour 21, minute 47 of day 191 of year 2003. This data file is flagged as the first version (01) generated by the IDFStoPDS application.
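The naming scheme can be expressed as a small Python sketch. The function names are hypothetical, and two behaviours are assumptions: the variable-length fields are concatenated without padding (as the 7-character NPINORM in the example suggests), and names that exceed the stated maximum lengths raise an error rather than being truncated.

```python
from typing import Tuple


def pds_base_name(vinst: str, year: int, doy: int, hour: int, minute: int,
                  units: str, is_vector: bool, version: int = 1) -> str:
    """Hypothetical builder for VVVVVVVVYYYYDDDHHMMXXXXX[S]NN (no extension)."""
    if not 1 <= len(vinst) <= 8:
        raise ValueError("virtual instrument name must be 1 to 8 characters")
    if not 1 <= len(units) <= 5:
        raise ValueError("units label must be 1 to 5 characters")
    if not 0 <= version <= 99:
        raise ValueError("file version number must fit in 2 digits")
    stamp = f"{year:04d}{doy:03d}{hour:02d}{minute:02d}"  # YYYYDDDHHMM, 11 characters
    scan_flag = "S" if is_vector else ""  # S marks a vector instrument
    return f"{vinst}{stamp}{units}{scan_flag}{version:02d}"


def pds_file_names(base: str) -> Tuple[str, str]:
    """Return the (data file, label file) names for a base name."""
    return base + ".CSV", base + ".LBL"
```

Called with NPINORM, 2003, day 191, 21:47, RAW units and a scalar instrument, pds_base_name reproduces the example name NPINORM20031912147RAW01.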

When the IDFStoPDS program is run in interactive mode, a check is made to see if the PDS data file to be generated already exists in the current working directory. If it does, the user will be asked if they wish to overwrite the data file. If the user answers yes, the data file and the associated label file are removed and an attempt is made to create new data and label files. If the user answers no, the current request is aborted. When run in batch mode, no query is made; the files are removed and an attempt is made to create new data and label files.

Since the IDFStoPDS program has the potential to generate large data files, a clean-up mechanism is utilized. Whether the clean-up mechanism is invoked depends upon which user is running the IDFStoPDS program. If a ".guest" file exists in the user's home directory, the data and label files are scheduled for removal 30 minutes after the data file has been closed, and the user is informed of this. If no ".guest" file exists in the user's home directory, the generated data and label files are left untouched. This scheme was designed for sites that set up a public guest account through which outside users are given access to the local system. The contents of the ".guest" file are not important; only the existence of the file is checked.
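The decision rule can be sketched in a few lines of Python. This is a hypothetical illustration: guest_removal_time only computes when removal would be due, and how IDFStoPDS actually schedules the removal is not described here. Note that the ".guest" file is tested for existence only and never read.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

CLEANUP_DELAY = timedelta(minutes=30)


def guest_removal_time(home: Path, closed_at: datetime) -> Optional[datetime]:
    """Return when generated files should be removed, or None to keep them."""
    if (home / ".guest").exists():  # existence is all that matters
        return closed_at + CLEANUP_DELAY
    return None
```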

Saving the Definitions

Once all the information has been defined, the information may be saved to a layout file for future retrieval. This is achieved by selecting the pull-down File menu and selecting the Save As option. The information defined is not saved by the program unless the user explicitly does so. Note that when providing the name of the layout file, do not specify the .I2P extension. The IDFStoPDS program automatically appends the .I2P extension to the name of the layout file upon creation of the file.

Changing the Options

The remainder of this document gives an in-depth explanation of the options that appear on the various GUIs utilized by the IDFStoPDS program.


Time Definition

To set the time values, enter the values in the boxes next to each time component or use the increment/decrement arrows. The stop time must be later than the start time. The time is initially set to the current time. Days are numbered by day of year (the Julian day convention), so January 1 is day 1.

Data Source

The user must select a project, satellite, experiment, instrument and virtual instrument from which data is to be extracted. To change any of the selected options, click on the buttons on the right-hand side. Note that changing a selection invalidates all selections below it in this hierarchy (for example, changing the experiment invalidates the instrument and virtual instrument), and those must be re-selected. When the IDFS data source is changed, any previous data item definitions are deleted from the list and must be re-defined.
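The invalidation rule can be modelled as an ordered list of selection levels. The Python sketch below is hypothetical (SourceSelection and its attributes are illustrative, not the GUI's implementation): selecting a value at one level clears every level beneath it and discards the data item definitions, since the IDFS data source has changed.

```python
from typing import Dict, List, Optional

# Selection hierarchy, from the top of the lineage to the bottom.
LEVELS = ["project", "satellite", "experiment", "instrument", "virtual instrument"]


class SourceSelection:
    """Hypothetical model of the IDFS Source selections and their invalidation."""

    def __init__(self) -> None:
        self.choices: Dict[str, Optional[str]] = {level: None for level in LEVELS}
        self.data_items: List[str] = []

    def select(self, level: str, value: str) -> None:
        index = LEVELS.index(level)  # raises ValueError for an unknown level
        self.choices[level] = value
        for lower in LEVELS[index + 1:]:
            self.choices[lower] = None  # lineage below the change must be re-selected
        self.data_items.clear()  # the data source changed, so item definitions are dropped

    def is_complete(self) -> bool:
        return all(value is not None for value in self.choices.values())
```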

Data Items

The data items to be processed are referred to as IDFS sensors. An IDFS sensor is defined as a primary data source returned by the virtual instrument in question. To add a data item to the list, use the pull-down Insertion menu. The menu options indicate the position within the list at which the new data item definition is to be inserted. These options include:

  After
  Before
  First
  Last

The first two options, After and Before, indicate a position relative to the highlighted entry on the list: the new data item definition is placed after or before the highlighted entry, respectively. The last two options, First and Last, indicate an absolute position on the list: the new data item definition is placed at the beginning or at the end of the list, respectively. These options are only meaningful for a non-empty list; therefore, the first data item definition is always placed at the beginning of the list, regardless of the option selected. The position matters when the data is extracted, since the data items are processed in the order in which they appear on the list. Once the list contains an entry, the "Attributes" button becomes visible. To change any of the attributes of a specific data item, select the data item from the list and activate the "Attributes" button to invoke the "Data Attributes" GUI.
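The four insertion options map naturally onto list operations. In the Python sketch below, insert_item is a hypothetical helper; the behaviour of After or Before on a non-empty list with no highlighted entry is not described above, so raising an error in that case is an assumption.

```python
from typing import List, Optional


def insert_item(items: List[str], new: str, option: str,
                highlighted: Optional[int] = None) -> None:
    """Insert a data item definition according to an Insertion menu option."""
    if not items:
        items.append(new)  # the first definition always starts the list
        return
    if option == "First":
        items.insert(0, new)
    elif option == "Last":
        items.append(new)
    elif option in ("After", "Before"):
        if highlighted is None:
            raise ValueError("After/Before need a highlighted entry")  # assumption
        position = highlighted + 1 if option == "After" else highlighted
        items.insert(position, new)
    else:
        raise ValueError(f"unknown option {option!r}")
```

Because the data is processed in list order, the final list is also the processing order.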

To delete a data item from the list, use the pull-down Removal menu. Currently, this pull-down menu contains just one option.

When this option is selected, the highlighted entry on the list is removed from the list. If no entry is highlighted, no action is taken.

If all of the sensors defined for an IDFS source are to be written to a PDS compliant data file, there are two ways to indicate this. The first is to add one data item per sensor by selecting the Insertion menu and then selecting the specific IDFS sensor with the Sensor Group and Sensor options on the Data Attributes GUI. With this approach, different scientific units can be selected for each sensor, and the ancillary information processed for each sensor can be selected or de-selected individually. The second way to select all IDFS sensors for a specific IDFS data source is to use the Select All Sensors checkbox option. When this option is selected, only one data item may exist on the list. When the data is processed, the "Data Items" list is temporarily expanded to hold a definition for each defined IDFS sensor. All sensors then share the same set of ancillary data and the same sensor and scan units selected by the user.
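The temporary expansion performed for Select All Sensors can be pictured as copying one template definition per sensor. In the Python sketch below, DataItem and its fields are hypothetical stand-ins for a data item definition, and the sensor list is assumed to come from the PIDF file. Every copy keeps the template's units and ancillary selection, which is why all sensors share them.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class DataItem:
    sensor: str
    units: str
    scan_units: str
    ancillary: Tuple[str, ...] = ()


def expand_select_all(template: DataItem, sensors: List[str]) -> List[DataItem]:
    """Expand a single Select All Sensors definition into one item per sensor."""
    return [replace(template, sensor=name) for name in sensors]
```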

Data Attributes

The primary data items (IDFS sensors) returned by the selected IDFS source are presented in two lists entitled Sensor Group and Sensor. The PIDF file utilizes these two groupings to allow an additional level of subdivision within the primary data sources. This scheme is useful when the IDFS data source contains a large number of primary data sources representing a diverse set of measurements.

PDS File Attributes

The IDFStoPDS program creates a PDS label file for each PDS data file that is generated. Some of the values in the PDS label file can be modified by selecting the "PDS File Attributes" button, which invokes the "PDS Label File Information" GUI. The values for these PDS label fields are defaulted by the IDFStoPDS program. A brief explanation of the options is given below. In all cases where a list is utilized, the selectable options are the Standard Values defined in the Data Dictionary Elements documentation provided by PDS.

Format of the PDS CSV Data File

The IDFStoPDS program utilizes the PDS Spreadsheet object (see PDS Standards) to describe the IDFS data items that are placed within the PDS data file. The PDS data file is simply an ASCII file that contains the selected IDFS data items, along with secondary data sources (ancillary data) and any instrument state values. The data file is row-oriented, and each row contains the following fields, in order:

  Start time, Stop time, Data type name, Data type id, Data name, Data unit label, Data value(s)
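A row of the data file can be read back with a standard CSV reader. The Python sketch below is illustrative: parse_row and its snake_case field names are hypothetical, surrounding whitespace is stripped as a precaution, and the data values are kept as text because their numeric formatting is not specified here.

```python
import csv
from typing import Dict, List, Union

# Hypothetical names for the six leading fields described above.
FIELDS = ["start_time", "stop_time", "data_type_name", "data_type_id",
          "data_name", "data_unit_label"]


def parse_row(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one row of the PDS data file into named fields and data values."""
    values = [v.strip() for v in next(csv.reader([line]))]
    if len(values) <= len(FIELDS):
        raise ValueError("row has no data values")
    row: Dict[str, Union[str, List[str]]] = {
        name: value for name, value in zip(FIELDS, values)
    }
    row["values"] = values[len(FIELDS):]  # one or more data values, kept as text
    return row
```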

The primary or sensor data (Data type name = SENSOR) is output in the first row, followed by any selected secondary data products associated with the IDFS data item, each on a separate row. This pattern of sensor data and secondary data is repeated for each selected IDFS data item. Since multiple calibration variables may be defined for the virtual instrument (IDFS data source) in question, one row is output for each defined calibration variable.

If the selected IDFS data source is a vector instrument, the scan values that correspond to the returned sensor data values are also output. If all sensors use the same scan range, the scan values are output once, in a single row following the data for all selected IDFS data items for the time period being processed. However, if the sensors do not all use the same scan range, a scan row is output as the last row of each individual IDFS data item.

The last data type output is the instrument state values (Data type name = MODE), since they pertain to the instrument as a whole. For vector instruments, the instrument state values are written once for each time interval processed for the primary data. For scalar instruments, this is not necessarily the case. The IDFS paradigm allows multiple scalar values to be "packed" into a single group (referred to as an IDFS sensor set) to reduce the size of the data files, and the instrument state values stay constant throughout the IDFS sensor set. The IDFStoPDS program outputs these packed scalars one value at a time; however, since the instrument state values stay constant, they are only written once per IDFS sensor set, and the time range of that row indicates the duration for which the instrument state values are valid. If the scalar instrument does not pack the primary data, then the instrument state values are written once for each time interval processed for the primary data.
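The row ordering for one time interval of primary data can be summarized in a Python sketch. SENSOR and MODE are the Data type names given in the text; ANCILLARY and SCAN are hypothetical placeholder labels, and row_order is an illustrative helper rather than part of IDFStoPDS. Packed scalar sensor sets, where MODE is written once per set, are not modelled.

```python
from typing import List, Sequence


def row_order(items: Sequence[str], ancillary: Sequence[Sequence[str]],
              is_vector: bool, shared_scan_range: bool) -> List[str]:
    """Return the sequence of row types written for one time interval."""
    if len(items) != len(ancillary):
        raise ValueError("need one ancillary list per data item")
    rows: List[str] = []
    for item, extras in zip(items, ancillary):
        rows.append(f"SENSOR:{item}")  # primary data first
        rows.extend(f"ANCILLARY:{name}" for name in extras)  # one row per secondary product
        if is_vector and not shared_scan_range:
            rows.append(f"SCAN:{item}")  # each item carries its own scan row
    if is_vector and shared_scan_range:
        rows.append("SCAN")  # one scan row after all items
    rows.append("MODE")  # instrument state values come last
    return rows
```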

File Button

Action

Currently, this pull-down menu contains just one option, Create PDS File.

When this option is selected, the IDFStoPDS program attempts to generate PDS compliant data and label files for the selected IDFS data items for the selected time range.