Raw converted data

The EchoData object

EchoData is an object that handles converted raw data, obtained either from raw instrument files (via open_raw) or from previously converted and standardized files (via open_converted). It is essentially a container for multiple xarray Dataset objects, each corresponding to one of the netCDF4 groups specified in the SONAR-netCDF4 convention. EchoData objects are used for accessing and exploring echosounder data, for calibration and other processing, and for serializing to the netCDF4 or Zarr file formats.

A sample EchoData object is presented below, showing the hierarchical structure of the SONAR-netCDF4 version 1 groups. Each group lists its variables and attributes, which together illustrate the structure and representative content of an EchoData object.

EchoData: standardized raw data from Internal Memory
    • <xarray.Dataset>
      Dimensions:  ()
      Data variables:
          *empty*
      Attributes:
          conventions:                 CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3
          keywords:                    EK60
          sonar_convention_authority:  ICES
          sonar_convention_name:       SONAR-netCDF4
          sonar_convention_version:    1.0
          summary:                     EK60 raw file s3://ncei-wcsd-archive/data/ra...
          title:                       2017 Pacific Hake Acoustic Trawl Survey
          date_created:                2017-07-28T18:16:19Z
          survey_name:                 

      • <xarray.Dataset>
        Dimensions:                 (frequency: 3, ping_time: 529)
        Coordinates:
          * frequency               (frequency) float64 1.8e+04 3.8e+04 1.2e+05
          * ping_time               (ping_time) datetime64[ns] 2017-07-28T18:16:19.31...
        Data variables:
            absorption_indicative   (frequency, ping_time) float64 0.002822 ... 0.03259
            sound_speed_indicative  (frequency, ping_time) float64 1.481e+03 ... 1.48...

      • <xarray.Dataset>
        Dimensions:          (location_time: 2165, frequency: 3, ping_time: 529)
        Coordinates:
          * location_time    (location_time) datetime64[ns] 2017-07-28T18:16:21.47599...
          * frequency        (frequency) float64 1.8e+04 3.8e+04 1.2e+05
          * ping_time        (ping_time) datetime64[ns] 2017-07-28T18:16:19.313999872...
        Data variables:
            latitude         (location_time) float64 dask.array<chunksize=(2165,), meta=np.ndarray>
            longitude        (location_time) float64 dask.array<chunksize=(2165,), meta=np.ndarray>
            sentence_type    (location_time) <U3 dask.array<chunksize=(2165,), meta=np.ndarray>
            pitch            (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
            roll             (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
            vertical_offset  (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
            water_level      (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
        Attributes:
            platform_type:       Research vessel
            platform_name:       Bell M. Shimada
            platform_code_ICES:  315

        • <xarray.Dataset>
          Dimensions:        (location_time: 22037)
          Coordinates:
            * location_time  (location_time) datetime64[ns] 2017-07-28T18:16:19.3140003...
          Data variables:
              NMEA_datagram  (location_time) <U73 '$SDVLW,5050.149,N,5050.149,N' ... '$...
          Attributes:
              description:  All NMEA sensor datagrams

      • <xarray.Dataset>
        Dimensions:  ()
        Data variables:
            *empty*
        Attributes:
            conversion_software_name:     echopype
            conversion_software_version:  0.5.6.dev53+g62c3d1fb.d20220426
            conversion_time:              2022-04-27T04:33:55Z
            src_filenames:                s3://ncei-wcsd-archive/data/raw/Bell_M._Shi...
            duplicate_ping_times:         0

      • <xarray.Dataset>
        Dimensions:           (beam_group: 1)
        Dimensions without coordinates: beam_group
        Data variables:
            beam_group_name   (beam_group) <U11 'Beam_group1'
            beam_group_descr  (beam_group) <U131 'contains backscatter power (uncalib...
        Attributes:
            sonar_manufacturer:      Simrad
            sonar_model:             ER60
            sonar_serial_number:     
            sonar_software_name:     
            sonar_software_version:  2.4.3
            sonar_type:              echosounder

        • <xarray.Dataset>
          Dimensions:                         (frequency: 3, ping_time: 529,
                                               range_sample: 3957)
          Coordinates:
            * frequency                       (frequency) float64 1.8e+04 3.8e+04 1.2e+05
            * ping_time                       (ping_time) datetime64[ns] 2017-07-28T18:...
            * range_sample                    (range_sample) int64 0 1 2 ... 3955 3956
          Data variables: (12/30)
              channel_id                      (frequency) <U37 'GPT  18 kHz 009072058c8...
              beam_type                       (frequency) int64 1 1 1
              beamwidth_receive_alongship     (frequency) float64 10.9 6.81 6.58
              beamwidth_receive_athwartship   (frequency) float64 10.82 6.85 6.52
              beamwidth_transmit_alongship    (frequency) float64 10.9 6.81 6.58
              beamwidth_transmit_athwartship  (frequency) float64 10.82 6.85 6.52
              ...                              ...
              data_type                       (frequency, ping_time) float64 3.0 ... 3.0
              count                           (frequency, ping_time) float64 3.957e+03 ...
              offset                          (frequency, ping_time) float64 0.0 ... 0.0
              transmit_mode                   (frequency, ping_time) float64 0.0 ... 0.0
              angle_athwartship               (frequency, ping_time, range_sample) float64 ...
              angle_alongship                 (frequency, ping_time, range_sample) float64 ...
          Attributes:
              beam_mode:              vertical
              conversion_equation_t:  type_3

      • <xarray.Dataset>
        Dimensions:           (frequency: 3, pulse_length_bin: 5)
        Coordinates:
          * frequency         (frequency) float64 1.8e+04 3.8e+04 1.2e+05
          * pulse_length_bin  (pulse_length_bin) int64 0 1 2 3 4
        Data variables:
            sa_correction     (frequency, pulse_length_bin) float64 0.0 -0.7 ... -0.3
            gain_correction   (frequency, pulse_length_bin) float64 20.3 22.95 ... 26.55
            pulse_length      (frequency, pulse_length_bin) float64 0.000512 ... 0.00...

Modifications to SONAR-netCDF4

Echopype follows version 1 of the ICES SONAR-netCDF4 convention wherever possible in order to create interoperable data. However, to fully leverage the label-aware manipulation provided by the xarray library and to make the data representation for scientific echosounders more coherent, we (the echopype developers) have chosen to deviate from the convention in several key aspects. These changes are explained below.

Organization of multi-frequency data

Echopype implements a modification of the SONAR-netCDF4 data model that optimizes data access and filtering (“slicing”) efficiency and usability at the expense of potentially increased file storage. For each sonar beam, the convention defines data variables such as backscatter_r based on a one-dimensional ragged array structure that uses a custom variable-length vector data type (sample_t) and ping_time as its coordinate dimensions; each frequency channel is stored in a separate netCDF4 group (Sonar/Beam_group1, Sonar/Beam_group2, …).

Echopype restructures this multi-group ragged array representation into a single-group, 4-dimensional gridded representation, with dimensions (channel, range_sample, ping_time, beam) across all channels. Here, the ping_time and beam dimensions are defined in the convention, whereas the channel and range_sample (along-range sample number) dimensions are echopype-specific modifications. Data from each frequency channel (i.e., transducers for echosounders) are mapped along the channel dimension, and echo data from each ping are mapped along the range_sample dimension. These consolidated, uniform multi-channel (or multi-frequency) DataArrays are stored in Sonar/Beam_group1, Sonar/Beam_group2, and potentially other such groups (Sonar/Beam_group3, etc.) in the netCDF data model.
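As a rough illustration of this layout, the sketch below uses plain NumPy rather than echopype itself to stack hypothetical per-channel data into a single array ordered as (channel, range_sample, ping_time, beam). The sizes, fill values, and variable names are made up for the example; in an actual EchoData object these would be labeled xarray DataArrays rather than bare arrays.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_channel, n_range_sample, n_ping, n_beam = 3, 5, 4, 1

# One 3-D block per channel, as if each channel had been read separately.
# Each block has shape (range_sample, ping_time, beam) and is filled with
# the channel index so the channels are easy to tell apart.
per_channel = [
    np.full((n_range_sample, n_ping, n_beam), fill_value=float(ch))
    for ch in range(n_channel)
]

# Stack along a new leading axis to form one gridded array with
# dimensions (channel, range_sample, ping_time, beam).
backscatter_r = np.stack(per_channel, axis=0)

# Slicing is plain indexing: all samples of the first ping on channel 1.
first_ping_ch1 = backscatter_r[1, :, 0, 0]
```

With every channel in one array, selecting a channel, a ping, or a range of samples is a single indexing operation instead of a loop over groups.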

Note

Because echosounder settings are flexible, the number of samples along sonar range (i.e., the length of the range_sample dimension) can differ across ping_time or channel. Echopype addresses this by padding pings or channels that have fewer samples with NaN, which maintains the uniform shape of the 4-dimensional gridded representation.

The NaN padding approach can consume a large amount of memory in some specific cases, depending on the echosounder setup. This is an issue we are actively working on. See #489 for details.
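The padding idea and its memory cost can be sketched with NumPy as follows. This is a simplified two-dimensional illustration under made-up data, not echopype's actual implementation; pad_pings is a hypothetical helper name introduced only for this example.

```python
import numpy as np

def pad_pings(pings):
    """Stack 1-D pings of unequal length into a (range_sample, ping_time) grid.

    Pings shorter than the longest one are padded with NaN at the end.
    (Hypothetical helper for illustration only.)
    """
    n_range_sample = max(len(p) for p in pings)
    grid = np.full((n_range_sample, len(pings)), np.nan)
    for i, ping in enumerate(pings):
        grid[: len(ping), i] = ping
    return grid

# Three hypothetical pings with 4, 2, and 3 samples.
pings = [np.arange(4.0), np.arange(2.0), np.arange(3.0)]
grid = pad_pings(pings)

# Fraction of the grid occupied by padding rather than data.
padding_fraction = np.isnan(grid).mean()
```

In this toy case 3 of the 12 grid cells are padding. A single long ping among many short ones would push that fraction much higher, which is the memory cost described in this note.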

Data from different echosounders

Power/Angle data

For single-beam setups, only the echo power (or intensity) data are available, and these data are stored in the variable backscatter_r (the r suffix denotes the real part of the signal). This is the case for data from the AZFP echosounder or from an EK60/EK80 echosounder paired with a single-beam transducer (see below for more details on EK80 data).

For split-beam setups, the echo power data are similarly stored in the variable backscatter_r, and the split-beam angle data for each sample (along range_sample) are additionally stored in the variables angle_alongship and angle_athwartship. This is the case for data from the EK60 echosounder or from the EK80 echosounder configured to store power/angle data.

All of the above data variables (backscatter_r, angle_alongship, angle_athwartship) use the gridded representation with dimensions (channel, range_sample, ping_time, beam). Here, the length of the beam dimension is 1. This is intuitive for single-beam data. For split-beam data, the length is also 1 because the power/angle data are already in a form derived from the split-beam transducer sectors. All data are stored in the Sonar/Beam_group1 group.

Complex data

An exception to the above arises when raw complex samples are recorded by EK80 echosounders paired with split-beam transducers. In this case, both the backscatter_r and backscatter_i variables exist and contain the real and imaginary parts of the echo waveform data, respectively. These variables have dimensions (channel, range_sample, ping_time, beam) as before, but the length of the beam dimension can be 3 or 4, depending on the specific transducer used in the setup. The angle_alongship and angle_athwartship variables are not present in such files.
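To make the relationship between the two variables concrete, the following NumPy sketch recombines hypothetical backscatter_r and backscatter_i arrays into complex samples. The array sizes and random values are invented for the example, and the mean over the beam axis is only an illustrative reduction across transducer sectors; it is not echopype's calibration procedure.

```python
import numpy as np

# Hypothetical sizes: 1 channel, 4 range samples, 2 pings, 4 transducer sectors.
shape = (1, 4, 2, 4)
rng = np.random.default_rng(0)
backscatter_r = rng.standard_normal(shape)  # real part of the waveform
backscatter_i = rng.standard_normal(shape)  # imaginary part of the waveform

# Recombine the two real-valued variables into one complex array.
backscatter = backscatter_r + 1j * backscatter_i

# Squared magnitude per sector, then a simple mean over the beam axis.
# Illustration only: this is not a calibrated quantity.
sector_power = np.abs(backscatter) ** 2
mean_power = sector_power.mean(axis=-1)
```

After the reduction, mean_power has dimensions (channel, range_sample, ping_time), since the beam axis has been averaged out.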

Note

It is possible for power/angle data and complex data to coexist in files collected by EK80 echosounders, since each frequency channel can be configured separately. In this case, the complex data are stored in the Sonar/Beam_group1 group and the power/angle data are stored in the Sonar/Beam_group2 group.
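The group assignment described here can be summarized in a few lines of Python. The function beam_group_layout below is a hypothetical helper, not part of echopype, and the placement of complex-only data in Sonar/Beam_group1 is an assumption, since the text above states the group explicitly only for power/angle data and for the mixed case.

```python
def beam_group_layout(has_complex, has_power_angle):
    """Map each beam group path to the kind of data it holds.

    Hypothetical helper for illustration; not an echopype function.
    """
    if has_complex and has_power_angle:
        # Mixed EK80 file: complex data first, power/angle data second.
        return {
            "Sonar/Beam_group1": "complex",
            "Sonar/Beam_group2": "power/angle",
        }
    if has_complex:
        # Assumption: complex-only files use the first beam group.
        return {"Sonar/Beam_group1": "complex"}
    if has_power_angle:
        return {"Sonar/Beam_group1": "power/angle"}
    return {}

# Example: an EK80 file with both kinds of channels.
layout = beam_group_layout(has_complex=True, has_power_angle=True)
```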