How to add a new function

Thank you for your interest in contributing to Echopype!

Check out the Development roadmap to see what would make Echopype more powerful for analyzing echosounder data. Feel free to ping the maintainers (@leewujung, @ctuguina) for discussion and questions in existing issues, open a new issue, or create a PR directly.

Note

We encourage all code contributions to be accompanied by tests and documentation (docstrings and inline comments). We may ask for these when reviewing PRs. If you have added new tests but the GitHub Actions for continuous integration need approval to run, ping the maintainers to get them started.

Add rule-based processing functions

Many echosounder data processing functions (e.g., Rule-based algorithms) operate on Sv data or raw data parsed from instrument-generated files. In Echopype:

- Raw data are stored and accessed as an EchoData object, which can be opened with open_converted from saved Zarr or netCDF files.
- Sv data are stored in a standard xarray Dataset, which can be opened with xr.open_dataset and related functions from saved Zarr or netCDF files.

Since both raw data (via the EchoData object) and Sv data are ultimately loaded into xarray datasets:

- If your algorithm is written using xarray operations: You can likely create a new function in the Echopype subpackage where you think it fits best (see Rule-based algorithms for subpackage ideas) and transplant your algorithm there directly. Just make sure that the dimension and coordinate names your function needs match those in the Echopype Sv dataset.
- If your algorithm is written using numpy, scipy, or other common libraries: We recommend that you replace pure index-based slicing and indexing operations (e.g., i=1, j=2, …) with xarray label-aware operations (e.g., depth=1, ping_time="2025-04-12T12:00:00", …). This makes the implementation much more readable and easier to debug, and it directly leverages xarray's integration with numpy, dask, and zarr to allow distributed, out-of-core computing on large data.
- If your algorithm uses image processing functions: Check out dask-image to see whether you can leverage implementations that are already scalable when adding your new function.
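As a rough illustration of the second point, the sketch below contrasts positional indexing with label-aware selection on a small synthetic array. The dimension names (ping_time, depth), sizes, and values are assumptions made for this example only; check them against the dataset your function actually receives.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for Sv data; names, sizes, and values are illustrative only.
ping_time = (
    np.datetime64("2025-04-12T12:00:00", "ns")
    + np.arange(4) * np.timedelta64(1, "s")
)
depth = np.arange(5, dtype=float)  # 0.0, 1.0, ..., 4.0
rng = np.random.default_rng(0)
sv = xr.DataArray(
    rng.normal(-70.0, 5.0, size=(ping_time.size, depth.size)),
    dims=("ping_time", "depth"),
    coords={"ping_time": ping_time, "depth": depth},
    name="Sv",
)

# Index-based: the reader must know what axis 0, axis 1, and positions 0 and 1 mean.
by_index = sv.values[0, 1]

# Label-aware: the same element, selected by coordinate value.
by_label = sv.sel(ping_time=np.datetime64("2025-04-12T12:00:00"), depth=1.0)

# Label-aware slicing and reduction name the dimension instead of an axis number.
# Label slices include both end points, so this keeps depths 0.0, 1.0, and 2.0.
shallow = sv.sel(depth=slice(0.0, 2.0))
shallow_mean = shallow.mean(dim="depth")
```

Because the selection is expressed through coordinate labels, the same code keeps working if the array is transposed or backed by dask.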

Typically:

- A processing function either adds data variables to the Sv dataset (e.g., consolidate.add_latlon) or returns a new xarray Dataset or DataArray (e.g., mask.apply_mask).
- The function should accept input data either as in-memory or lazy-loaded data, or as a path to a local or remote storage location (e.g., cloud, HTTP server). If the input data are in netCDF or Zarr, xarray supports this easily. The only thing to watch out for is that a remote path needs access credentials, which are passed through a storage_options argument.
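The pattern described above might look roughly like the following. This is a minimal sketch, not Echopype's implementation: load_dataset and add_linear_sv are hypothetical names, the ".zarr" suffix check is a simplification, and the "Sv" variable name is assumed to be present in the input.

```python
import xarray as xr


def load_dataset(source, storage_options=None):
    """Return a Dataset from an in-memory object or a path (hypothetical helper)."""
    if isinstance(source, xr.Dataset):
        return source  # already in memory or lazily loaded
    path = str(source)
    if path.rstrip("/").endswith(".zarr"):
        # storage_options carries credentials and settings for remote stores
        return xr.open_zarr(path, storage_options=storage_options)
    return xr.open_dataset(path)


def add_linear_sv(source, storage_options=None):
    """Add a linear-domain copy of Sv as a new data variable (illustrative only)."""
    ds = load_dataset(source, storage_options=storage_options)
    ds = ds.copy()  # shallow copy, so the caller's dataset is not modified
    ds["sv_linear"] = 10 ** (ds["Sv"] / 10)
    return ds


# Usage with a tiny in-memory dataset
ds_in = xr.Dataset({"Sv": (("ping_time", "depth"), [[-70.0, -60.0]])})
ds_out = add_linear_sv(ds_in)
```

Keeping the loading logic in one small helper means each processing function only has to deal with an xarray Dataset, whatever form the input arrived in.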

Tip

If your algorithm uses a library that is not currently an Echopype dependency, please add it to the appropriate dependency section in pyproject.toml.

Use [project.dependencies] for runtime dependencies, or one of the [project.optional-dependencies] groups such as test, docs, dev, or plot when the dependency is only needed for that purpose.

Steps to achieve scalability

Computational scalability is a core goal of Echopype. However, experience shows that scalability can be hard to achieve on the first try, because it depends on the specific operations in the function, the exact implementation, and the chunking of the data.

Therefore, we recommend breaking down the addition of a new function into 3 steps:

1. Add the function following the guidelines in the section above.
   - Ensure the function works with datasets of reasonable size (e.g., 100 MB).
   - Add tests for the new function to the test suite (under echopype/tests).
2. Benchmark function performance with datasets of different sizes and different chunking schemes, to determine whether further optimization for scalability is needed.
   - Watch out for unexpected memory expansion in the computing steps. This can happen due to implicit broadcasting or padding operations.
3. Adjust the implementation to optimize performance if necessary.
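To make the memory-expansion warning in step 2 concrete, the sketch below shows how combining two small one-dimensional arrays with different dimensions produces a much larger two-dimensional result. The helper broadcast_nbytes is hypothetical and only estimates the result size from dimension lengths; it assumes that dimensions sharing a name also share a length and coordinates.

```python
import numpy as np
import xarray as xr


def broadcast_nbytes(*arrays):
    """Estimate the size in bytes of combining DataArrays (hypothetical helper).

    Assumes dimensions with the same name have the same length and coordinates.
    """
    sizes = {}
    for arr in arrays:
        sizes.update(arr.sizes)  # mapping of dimension name -> length
    n_elements = int(np.prod(list(sizes.values())))
    itemsize = np.result_type(*[arr.dtype for arr in arrays]).itemsize
    return n_elements * itemsize


# Two small 1-D float64 arrays along different dimensions (12 kB in total)
a = xr.DataArray(np.ones(1000), dims="ping_time")
b = xr.DataArray(np.ones(500), dims="depth")

# Estimate the result size before computing anything
predicted = broadcast_nbytes(a, b)

# xarray broadcasts by dimension name, so the product is (ping_time, depth)
product = a * b
```

Checking an estimate like this before a benchmark run can help explain why a step that looks cheap uses far more memory than its inputs.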

Most current Echopype processing functions are capable of leveraging lazy-loaded datasets for delayed computation, which may require additional tuning.

Tip

The xarray documentation includes a nice starting guide to parallel computing with dask.
