Raw converted data#
The EchoData object#
EchoData is an object that conveniently handles raw converted data from either raw instrument files (via open_raw) or previously converted and standardized raw files (via open_converted). It is essentially a container for multiple xarray Dataset objects, each corresponding to one of the netCDF4 groups specified in the SONAR-netCDF4 convention. EchoData objects are used for accessing and exploring echosounder data, for calibration and other processing, and for serializing into the netCDF4 or Zarr file formats.
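The workflow described above can be sketched as a small helper. This is a minimal illustration rather than a canonical recipe: the function name convert_and_reopen, the file paths passed to it, and the default sonar model are hypothetical, and echopype is imported inside the function only so that the sketch can be loaded on a machine without echopype installed.

```python
from pathlib import Path


def convert_and_reopen(raw_path: str, out_dir: str, sonar_model: str = "EK60"):
    """Convert one raw file, save it as netCDF4, and reopen the saved copy.

    The helper name and arguments are hypothetical; only open_raw,
    to_netcdf, and open_converted come from echopype.
    """
    # Imported lazily so this sketch loads without echopype installed.
    import echopype as ep

    # Parse the raw instrument file into an EchoData object.
    echodata = ep.open_raw(raw_path, sonar_model=sonar_model)

    # Serialize to a netCDF4 file (to_zarr is the Zarr counterpart).
    out_file = Path(out_dir) / (Path(raw_path).stem + ".nc")
    echodata.to_netcdf(save_path=str(out_file))

    # Reopen the standardized file as a new EchoData object.
    return ep.open_converted(str(out_file))
```

Calling the helper with the path to a raw file and an output directory would return an EchoData object backed by the converted file.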
A sample EchoData object is presented below, showing the hierarchical structure of the SONAR-netCDF4 version 1 groups. Each entry lists the variables and attributes of one group, illustrating the structure and representative content of an EchoData object.
-
<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
Attributes:
    conventions:                 CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3
    keywords:                    EK60
    sonar_convention_authority:  ICES
    sonar_convention_name:       SONAR-netCDF4
    sonar_convention_version:    1.0
    summary:                     EK60 raw file s3://ncei-wcsd-archive/data/ra...
    title:                       2017 Pacific Hake Acoustic Trawl Survey
    date_created:                2017-07-28T18:16:19Z
    survey_name:
-
<xarray.Dataset>
Dimensions:                 (frequency: 3, ping_time: 529)
Coordinates:
  * frequency               (frequency) float64 1.8e+04 3.8e+04 1.2e+05
  * ping_time               (ping_time) datetime64[ns] 2017-07-28T18:16:19.31...
Data variables:
    absorption_indicative   (frequency, ping_time) float64 0.002822 ... 0.03259
    sound_speed_indicative  (frequency, ping_time) float64 1.481e+03 ... 1.48...
-
<xarray.Dataset>
Dimensions:          (location_time: 2165, frequency: 3, ping_time: 529)
Coordinates:
  * location_time    (location_time) datetime64[ns] 2017-07-28T18:16:21.47599...
  * frequency        (frequency) float64 1.8e+04 3.8e+04 1.2e+05
  * ping_time        (ping_time) datetime64[ns] 2017-07-28T18:16:19.313999872...
Data variables:
    latitude         (location_time) float64 dask.array<chunksize=(2165,), meta=np.ndarray>
    longitude        (location_time) float64 dask.array<chunksize=(2165,), meta=np.ndarray>
    sentence_type    (location_time) <U3 dask.array<chunksize=(2165,), meta=np.ndarray>
    pitch            (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
    roll             (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
    vertical_offset  (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
    water_level      (frequency, ping_time) float64 dask.array<chunksize=(3, 529), meta=np.ndarray>
Attributes:
    platform_type:       Research vessel
    platform_name:       Bell M. Shimada
    platform_code_ICES:  315
-
<xarray.Dataset>
Dimensions:        (location_time: 22037)
Coordinates:
  * location_time  (location_time) datetime64[ns] 2017-07-28T18:16:19.3140003...
Data variables:
    NMEA_datagram  (location_time) <U73 '$SDVLW,5050.149,N,5050.149,N' ... '$...
Attributes:
    description:  All NMEA sensor datagrams
-
<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
Attributes:
    conversion_software_name:     echopype
    conversion_software_version:  0.5.6.dev53+g62c3d1fb.d20220426
    conversion_time:              2022-04-27T04:33:55Z
    src_filenames:                s3://ncei-wcsd-archive/data/raw/Bell_M._Shi...
    duplicate_ping_times:         0
-
<xarray.Dataset>
Dimensions:           (beam_group: 1)
Dimensions without coordinates: beam_group
Data variables:
    beam_group_name   (beam_group) <U11 'Beam_group1'
    beam_group_descr  (beam_group) <U131 'contains backscatter power (uncalib...
Attributes:
    sonar_manufacturer:      Simrad
    sonar_model:             ER60
    sonar_serial_number:
    sonar_software_name:
    sonar_software_version:  2.4.3
    sonar_type:              echosounder
-
<xarray.Dataset>
Dimensions:                         (frequency: 3, ping_time: 529, range_sample: 3957)
Coordinates:
  * frequency                       (frequency) float64 1.8e+04 3.8e+04 1.2e+05
  * ping_time                       (ping_time) datetime64[ns] 2017-07-28T18:...
  * range_sample                    (range_sample) int64 0 1 2 ... 3955 3956
Data variables: (12/30)
    channel_id                      (frequency) <U37 'GPT 18 kHz 009072058c8...
    beam_type                       (frequency) int64 1 1 1
    beamwidth_receive_alongship     (frequency) float64 10.9 6.81 6.58
    beamwidth_receive_athwartship   (frequency) float64 10.82 6.85 6.52
    beamwidth_transmit_alongship    (frequency) float64 10.9 6.81 6.58
    beamwidth_transmit_athwartship  (frequency) float64 10.82 6.85 6.52
    ...                             ...
    data_type                       (frequency, ping_time) float64 3.0 ... 3.0
    count                           (frequency, ping_time) float64 3.957e+03 ...
    offset                          (frequency, ping_time) float64 0.0 ... 0.0
    transmit_mode                   (frequency, ping_time) float64 0.0 ... 0.0
    angle_athwartship               (frequency, ping_time, range_sample) float64 ...
    angle_alongship                 (frequency, ping_time, range_sample) float64 ...
Attributes:
    beam_mode:              vertical
    conversion_equation_t:  type_3
-
<xarray.Dataset>
Dimensions:           (frequency: 3, pulse_length_bin: 5)
Coordinates:
  * frequency         (frequency) float64 1.8e+04 3.8e+04 1.2e+05
  * pulse_length_bin  (pulse_length_bin) int64 0 1 2 3 4
Data variables:
    sa_correction     (frequency, pulse_length_bin) float64 0.0 -0.7 ... -0.3
    gain_correction   (frequency, pulse_length_bin) float64 20.3 22.95 ... 26.55
    pulse_length      (frequency, pulse_length_bin) float64 0.000512 ... 0.00...
Modifications to SONAR-netCDF4#
Echopype follows version 1 of the ICES SONAR-netCDF4 convention wherever possible to create interoperable data. However, to fully leverage the label-aware manipulation provided by the xarray library and to make the data representation for scientific echosounders more coherent, we (the echopype developers) have chosen to deviate from the convention in several key aspects. These changes are explained below.
Organization of multi-frequency data#
Echopype implements a modification of the SONAR-netCDF4 data model that optimizes the efficiency and usability of data access and filtering (“slicing”) at the expense of potentially increased file storage. For each sonar beam, the convention defines data variables such as backscatter_r based on a one-dimensional ragged array structure that uses a custom variable-length vector data type (sample_t) and ping_time as its coordinate dimension; each frequency channel is stored in a separate netCDF4 group (Sonar/Beam_group1, Sonar/Beam_group2, …).
Echopype restructures this multi-group ragged array representation into a single-group, 4-dimensional gridded representation, with dimensions (channel, range_sample, ping_time, beam) across all channels. Here, the ping_time and beam dimensions are defined in the convention, whereas the channel and range_sample (along-range sample number) dimensions are echopype-specific modifications. Data from each frequency channel (i.e., each transducer, for echosounders) are mapped along the channel dimension, and echo data from each ping are mapped along the range_sample dimension. These consolidated, uniform multi-channel (or multi-frequency) DataArrays are stored in Sonar/Beam_group1, Sonar/Beam_group2, and potentially other such groups (Sonar/Beam_group3, etc.) in the netCDF data model.
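The restructuring described above can be illustrated with plain NumPy arrays. This is only a sketch of the idea, not echopype's conversion code: the dictionary per_channel, its keys, and the synthetic sample values are hypothetical stand-ins for the per-channel groups of the convention, and every ping is assumed to have the same number of samples.

```python
import numpy as np

n_ping, n_sample = 4, 6

# Hypothetical stand-in for the convention's layout: one entry per channel,
# each holding a list of pings, and each ping a 1-D vector of samples.
# Sample values encode (channel, ping, sample) so they are easy to check.
per_channel = {
    f"channel_{c}": [c * 100 + p * 10 + np.arange(n_sample) for p in range(n_ping)]
    for c in range(3)
}

# Stack pings along a ping_time axis, then channels along a channel axis,
# and append a length-1 beam axis:
# result has dimensions (channel, range_sample, ping_time, beam).
gridded = np.stack(
    [np.stack(pings, axis=1) for pings in per_channel.values()], axis=0
)[..., np.newaxis]

# With a single gridded array, one ping of one channel is a simple slice.
one_ping = gridded[1, :, 2, 0]
```

In echopype the same layout is held in xarray DataArrays, so the positional indices used here would be replaced by label-based selection along the named dimensions.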
Note
Due to flexibility in echosounder settings, there can be an unequal number of samples along sonar range (i.e., the length of the range_sample dimension) across different ping_time or channel values. Echopype addresses this by padding pings or channels that have fewer samples with NaN to maintain the uniform shape of the 4-dimensional gridded representation.
The NaN padding approach can consume a large amount of memory in some specific cases, depending on the echosounder setup. This is an issue we are actively working on. See #489 for details.
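A small NumPy sketch can make the padding and its memory cost concrete. It is an illustration under simplified assumptions, not echopype's implementation: the helper pad_pings and the example ping lengths are hypothetical, and only a single channel is considered.

```python
import numpy as np


def pad_pings(pings):
    """Pad 1-D pings of unequal length with NaN into a (range_sample, ping_time) grid.

    Hypothetical helper for illustration only.
    """
    max_len = max(len(p) for p in pings)
    padded = np.full((max_len, len(pings)), np.nan)
    for i, ping in enumerate(pings):
        # Real samples fill the start of each column; the rest stays NaN.
        padded[: len(ping), i] = ping
    return padded


# Three pings with 5, 3, and 8 samples (made-up lengths).
pings = [np.ones(5), np.ones(3), np.ones(8)]
padded = pad_pings(pings)

# Fraction of the grid occupied by padding rather than data.
nan_fraction = np.isnan(padded).mean()
```

Here 8 of the 24 grid cells hold padding, so a third of the array stores no data; the overhead grows when a few pings are much longer than the rest.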
Data from different echosounders#
Power/Angle data#
For single-beam setups, only the echo power (or intensity) data are available, and these data are stored in the variable backscatter_r (the r suffix denotes the real part of the signal). This is the case for data from the AZFP echosounder, or from EK60/EK80 echosounders paired with single-beam transducers (see below for more details on EK80 data).
For split-beam setups, the echo power data are similarly stored in the variable backscatter_r, with the additional split-beam angle data for each sample (along range_sample) stored in the variables angle_alongship and angle_athwartship. This is the case for data from the EK60 echosounder or the EK80 echosounder configured to store power/angle data.
All of the above data variables (backscatter_r, angle_alongship, angle_athwartship) use the gridded representation with dimensions (channel, range_sample, ping_time, beam). Here, the length of the beam dimension is 1. This length is intuitive for single-beam data. For split-beam data, the length is also 1 because the power/angle data are already in a form derived from the split-beam transducer sectors. All data are stored in the Sonar/Beam_group1 group.
Complex data#
A deviation from the above occurs when raw complex samples are recorded by EK80 echosounders paired with split-beam transducers. In this case, both the backscatter_r and backscatter_i variables exist and contain the real and imaginary parts of the echo waveform data, respectively. These variables have dimensions (channel, range_sample, ping_time, beam) as before, but the length of the beam dimension can be 3 or 4, depending on the specific transducer used in the setup. The angle_alongship and angle_athwartship variables are not present in such files.
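As a rough illustration of how the two variables relate, the sketch below rebuilds complex samples from synthetic real and imaginary parts. The array sizes and random values are made up, and averaging over the beam dimension is shown only as an example of reducing across transducer sectors; it is not echopype's calibration procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic arrays with dimensions (channel, range_sample, ping_time, beam);
# a beam length of 4 mimics a four-sector split-beam transducer.
shape = (2, 5, 3, 4)
backscatter_r = rng.normal(size=shape)  # real part
backscatter_i = rng.normal(size=shape)  # imaginary part

# Recombine the two variables into complex waveform samples.
complex_samples = backscatter_r + 1j * backscatter_i

# Example reduction: average across sectors, then take the squared magnitude.
sector_mean = complex_samples.mean(axis=-1)
magnitude_sq = np.abs(sector_mean) ** 2
```

The reduced array has dimensions (channel, range_sample, ping_time), the same leading dimensions as the original variables.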
Note
It is possible for power/angle data and complex data to coexist in files collected by EK80 echosounders, since each frequency channel can be configured separately. In this case, the complex data are stored in the Sonar/Beam_group1 group and the power/angle data are stored in the Sonar/Beam_group2 group.