# Working with converted files

## Open a converted netCDF or Zarr dataset
Converted netCDF files can be opened with the `open_converted` function, which returns a lazy-loaded `EchoData` object (only metadata are read during opening):

```python
import echopype as ep

file_path = "./converted_files/file.nc"  # path to a converted netCDF file
ed = ep.open_converted(file_path)        # create an EchoData object
```
Likewise, specify the path to open a Zarr dataset. To open such a dataset from cloud storage, use the same `storage_options` parameter as with `open_raw`. For example:

```python
s3_path = "s3://s3bucketname/directory_path/dataset.zarr"  # S3 dataset path
ed = ep.open_converted(s3_path, storage_options={"anon": True})
```
## Combine EchoData objects
Data collected by the same instrument deployment across multiple files can be combined into a single `EchoData` object using `combine_echodata`. Since echopype version 0.6.3, a large number of files can be combined in parallel (using Dask) while maintaining stable memory usage. Under the hood, this is done by concatenating data directly into a Zarr store, which corresponds to the final combined `EchoData` object.
To use `combine_echodata`, the following criteria must be met:

- All `EchoData` objects must have the same `sonar_model`.
- The `EchoData` objects to be combined must correspond to different raw data files (i.e., no duplicated files).
- The `EchoData` objects in the list must be in sequential order in time. Specifically, the first timestamp of each `EchoData` object must be smaller (earlier) than the first timestamp of the subsequent `EchoData` object.
- The `EchoData` objects must contain the same frequency channels and the same number of channels.
- The following attribute criteria must be satisfied for all groups under each of the `EchoData` objects to be combined:
  - The names of all attributes must be the same.
  - The values of all attributes must be identical, except for `date_created` and `conversion_time`; these attributes need only have the same data type.
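The ordering and consistency criteria above can be sketched as a small pre-flight check. This is not part of the echopype API — the helper name and its inputs (lists of each object's `sonar_model`, first timestamp, and channel tuple) are hypothetical, shown only to make the criteria concrete:

```python
def check_combine_criteria(sonar_models, first_timestamps, channel_sets):
    """Return True if the inputs satisfy the basic combine_echodata criteria."""
    # All objects must share the same sonar_model
    if len(set(sonar_models)) != 1:
        return False
    # First timestamps must be strictly increasing across objects
    if any(t2 <= t1 for t1, t2 in zip(first_timestamps, first_timestamps[1:])):
        return False
    # All objects must contain the same channels
    if len(set(channel_sets)) != 1:
        return False
    return True

# Example: two files from the same deployment, in time order
ok = check_combine_criteria(
    sonar_models=["EK60", "EK60"],
    first_timestamps=[0.0, 3600.0],          # e.g. seconds since deployment start
    channel_sets=[("38kHz", "120kHz")] * 2,  # identical channel tuples
)
print(ok)  # True
```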
Attention

In previous versions, `combine_echodata` corrected reversed timestamps and stored the uncorrected timestamps in the `Provenance` group. Starting from version 0.6.3, `combine_echodata` preserves time coordinates that contain reversed timestamps, and no correction is performed.
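Since no correction is performed, it can be useful to check a time coordinate for reversals yourself. A minimal sketch, assuming the timestamps are available as a plain sequence (the helper name is hypothetical, not an echopype function):

```python
def find_time_reversals(times):
    """Return the indices i where times[i] < times[i - 1] (a reversed timestamp)."""
    return [i for i in range(1, len(times)) if times[i] < times[i - 1]]

# Example: a reversal at index 3 (2.5 comes after 3.0)
print(find_time_reversals([1.0, 2.0, 3.0, 2.5, 4.0]))  # [3]
```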
The first step in combining data is to establish a Dask client with a scheduler. On a local machine, this can be done as follows:

```python
from dask.distributed import Client

client = Client()  # create a client with a local scheduler
```

With distributed resources, we highly recommend reviewing the Dask documentation on deploying Dask clusters.
Next, we assemble a list of `EchoData` objects. This list can be built from converted files (netCDF or Zarr), as in the example below, or from in-memory `EchoData` objects:

```python
ed_list = []
for converted_file in ["convertedfile1.zarr", "convertedfile2.zarr"]:
    ed_list.append(ep.open_converted(converted_file))  # already-converted files are lazy-loaded
```
Finally, we apply `combine_echodata` to this list to combine all the data into a single `EchoData` object. Here, we store the final combined form at the Zarr path `path_to/combined_echodata.zarr` and use the client we established above:

```python
combined_ed = ep.combine_echodata(
    ed_list,
    zarr_path="path_to/combined_echodata.zarr",
    client=client,
)
```
Once executed, `combine_echodata` returns a lazy-loaded `EchoData` object (obtained from `zarr_path`) with all data from the input `EchoData` objects combined.
Note

As shown in the example above, the path of the combined Zarr store is given by the keyword argument `zarr_path`, and the Dask client to which parallel tasks are submitted is given by the keyword argument `client`. When either (or both) of these is not provided, the default values listed in the Notes section of `combine_echodata` are used.