The lack of interoperability among data collected by different sonar systems is currently a major obstacle to integrative analysis of sonar data at large scales. Echopype aims to address this problem by providing tools for converting data from manufacturer-specific formats into a standardized netCDF file format. NetCDF is the current de facto standard in climate research and is supported by many powerful Python packages for efficient computation, which echopype takes advantage of in its data analysis modules.
Interoperable netCDF files
Echopype follows the ICES SONAR-netCDF4 convention where possible to create an interoperable data format to which all data are converted. We modified the file structure in the convention so that computation can take full advantage of the power of xarray in manipulating labelled multi-dimensional arrays. See Modifications to SONAR-netCDF4 for details of these modifications.
Echopype also supports converting raw data files
into the zarr format
for cloud-optimized data storage and access,
following the same structure as in the netCDF files.
However, computing based on the zarr format via
Process is still being
Modifications to SONAR-netCDF4
Echopype is designed to handle multi-dimensional labelled data sets
using xarray under the hood.
Therefore, we store backscatter data (the echoes) from
different frequency channels in a multi-dimensional array under a
Beam group within a netCDF file.
Because of this change, all frequency-dependent parameters,
such as absorption coefficients, sample intervals, etc.,
are stored as an array with a frequency coordinate.
This is different from the SONAR-netCDF4 convention, in which data
and parameters from different frequency channels are stored in different
beam groups under the
In the convention, this layout was designed to accommodate potential differences
in the number of bins along range, or when there is a change of the
temporal length of data collection in the middle of a file.
However, it is more convenient to store and slice data directly by the
time, range, and frequency/beam direction coordinates (see
xarray documentation for more info about coordinates and
dimensions) when the data are stored in a cubic form.
To accommodate this change, in the above two cases, echopype
handles the uneven number of data samples along range by filling in
NaN for the shorter channels, and
splits the raw data file into multiple files when there is a change of the temporal length of data collection along range in the middle of a file.
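The NaN-filling step described above can be sketched with numpy and xarray. This is a minimal illustration and not echopype's actual implementation: the array shapes and values are made up, and the dimension names `ping_time` and `range_bin`, the variable name `sample_interval`, and the helper `pad_to` are assumptions chosen for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical echoes from two channels with different numbers of range bins.
echo_38k = np.ones((3, 5))    # (ping_time, range_bin): 3 pings, 5 bins
echo_120k = np.ones((3, 8))   # 3 pings, 8 bins


def pad_to(arr, n_bins):
    """Pad the range axis of a 2-D array with NaN up to n_bins (hypothetical helper)."""
    n_missing = n_bins - arr.shape[1]
    return np.pad(arr, ((0, 0), (0, n_missing)), constant_values=np.nan)


# Pad every channel to the longest one, then stack into a cube.
n_bins = max(echo_38k.shape[1], echo_120k.shape[1])
cube = np.stack([pad_to(echo_38k, n_bins), pad_to(echo_120k, n_bins)])

# Frequency-dependent parameters share the same frequency coordinate.
ds = xr.Dataset(
    data_vars={
        "backscatter_r": (("frequency", "ping_time", "range_bin"), cube),
        "sample_interval": (("frequency",), [2.56e-4, 6.4e-5]),  # made-up values
    },
    coords={
        "frequency": [38000, 120000],
        "ping_time": np.arange(3),       # placeholder integer labels
        "range_bin": np.arange(n_bins),
    },
)
```

Because the shorter channel is padded with NaN rather than zeros, NaN-aware reductions such as `ds.backscatter_r.mean(dim="range_bin", skipna=True)` ignore the filled samples.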
In addition to computational efficiency, another advantage of echopype’s approach to restructuring the netCDF format is that it improves code readability and makes data analysis computations more tractable. For example, to extract data from a particular frequency, users can simply do the following without worrying about the numerical index of the selected frequency:
    import xarray as xr

    fname = 'some-path/some-file.nc'
    ds = xr.open_dataset(fname, group='Beam')  # open file as an xarray Dataset
    data_120k = ds.backscatter_r.sel(frequency=120000)  # explicit indexing for frequency
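The same label-based selection extends to the other coordinates of the cube. The sketch below builds a small synthetic array so that it runs without a data file. The dimension names `ping_time` and `range_bin`, the timestamps, and the values are assumptions for illustration and may not match the names in a file written by echopype.

```python
import numpy as np
import xarray as xr

# Synthetic coordinates: 3 frequencies, 4 pings one second apart, 6 range bins.
frequency = [38000, 120000, 200000]
ping_time = np.datetime64("2020-01-01T00:00:00") + np.arange(4) * np.timedelta64(1, "s")
range_bin = np.arange(6)

backscatter = xr.DataArray(
    np.arange(3 * 4 * 6, dtype=float).reshape(3, 4, 6),
    dims=("frequency", "ping_time", "range_bin"),
    coords={"frequency": frequency, "ping_time": ping_time, "range_bin": range_bin},
    name="backscatter_r",
)

# Select one frequency, a time window, and the first few range bins by label.
# Label-based slices in xarray include both end points.
subset = backscatter.sel(
    frequency=120000,
    ping_time=slice("2020-01-01T00:00:01", "2020-01-01T00:00:02"),
    range_bin=slice(0, 2),
)
```

Here `subset` has dimensions `("ping_time", "range_bin")`. The frequency dimension is dropped because a single label was selected.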