.. _convert:

Convert raw files
=================

Supported raw file types
------------------------

Echopype currently supports conversion into
`netCDF4 <https://www.unidata.ucar.edu/software/netcdf/>`_ or
`Zarr <https://zarr.readthedocs.io>`_ files from the following raw formats:

- ``.raw`` files generated by `Kongsberg-Simrad <https://www.kongsberg.com/maritime/contact/simrad/>`_'s
  EK60, ES70, EK80, and ES80 echosounders and Kongsberg's EA640 echosounder
- ``.01A`` files generated by `ASL Environmental Sciences <https://aslenv.com>`_' AZFP echosounder
- ``.ad2cp`` files generated by `Nortek <https://www.nortekgroup.com/>`_'s
  Signature series Acoustic Doppler Current Profilers (ADCPs) (beta)


Importing echopype
------------------

We encourage importing the ``echopype`` package with the alias ``ep``:

.. code-block:: python

    import echopype as ep

In the examples below, we import ``open_raw`` as follows:

.. code-block:: python

    from echopype import open_raw

Conversion operation
--------------------

A single function, ``open_raw``, handles file conversion for all supported
echosounders: it parses the raw data and returns a fully populated
``EchoData`` object.

Use the parameter ``sonar_model`` to indicate the sonar type:

- ``EK60``: Kongsberg-Simrad EK60 echosounder
- ``ES70``: Kongsberg-Simrad ES70 echosounder
- ``EK80``: Kongsberg-Simrad EK80 echosounder
- ``ES80``: Kongsberg-Simrad ES80 echosounder
- ``EA640``: Kongsberg EA640 echosounder
- ``AZFP``: ASL Environmental Sciences AZFP echosounder
- ``AD2CP``: Nortek Signature series ADCP
  (tested with Signature 500 and Signature 1000)

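Because several Kongsberg echosounders share the ``.raw`` extension, the file
extension alone does not determine ``sonar_model``. The sketch below illustrates
this with a lookup table built from the lists above; ``SONAR_MODELS_BY_EXTENSION``
and ``candidate_sonar_models`` are hypothetical names, not part of echopype.

.. code-block:: python

    from pathlib import Path

    # Hypothetical lookup table built from the supported formats listed above;
    # not part of echopype. Keys are lower-cased file extensions.
    SONAR_MODELS_BY_EXTENSION = {
        ".raw": ["EK60", "ES70", "EK80", "ES80", "EA640"],
        ".01a": ["AZFP"],
        ".ad2cp": ["AD2CP"],
    }


    def candidate_sonar_models(path):
        """Return the sonar_model values that could match a raw file's extension."""
        suffix = Path(path).suffix.lower()
        return SONAR_MODELS_BY_EXTENSION.get(suffix, [])


    # A .raw file matches five models, so sonar_model must be given explicitly.
    print(candidate_sonar_models("D20170615-T190214.raw"))
    # ['EK60', 'ES70', 'EK80', 'ES80', 'EA640']
    print(candidate_sonar_models("FILENAME.01A"))
    # ['AZFP']

Only a one-to-one match (as for ``.01A`` and ``.ad2cp``) identifies the model
unambiguously; for ``.raw`` files, set ``sonar_model`` according to the
instrument that recorded the data.
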

``EchoData`` objects are based on the **SONAR-netCDF4 vers.1 convention**, with some
modifications introduced by echopype; see :doc:`data-format` for details.

In the following example, ``open_raw`` converts a raw EK80 file and returns
an in-memory ``EchoData`` object ``ed``. The ``to_netcdf`` method of ``ed``
then writes a converted SONAR-netCDF4 vers.1 file named ``FILENAME.nc``
to the directory ``./unpacked_files``:

.. code-block:: python

    ed = open_raw('FILENAME.raw', sonar_model='EK80')  # for EK80 file
    ed.to_netcdf(save_path='./unpacked_files')

For data files from the AZFP echosounder, the conversion requires an
extra ``.XML`` file along with the ``.01A`` data file, specified using
the parameter ``xml_path``:

.. code-block:: python

    ed = open_raw('FILENAME.01A', sonar_model='AZFP', xml_path='XMLFILENAME.xml')
    ed.to_netcdf(save_path='./unpacked_files')

The ``.XML`` file contains metadata needed to unpack the binary data files.
Typically, a single ``.XML`` file is associated with all files from the
same deployment.

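Because one ``.XML`` file usually serves a whole deployment, a directory of AZFP
data files can be paired with that single file before conversion. The sketch
below assumes all ``.01A`` files sit in one directory; ``pair_azfp_files`` and
``convert_azfp_deployment`` are hypothetical helpers, and only ``open_raw`` and
``to_netcdf`` (used as shown above) come from echopype.

.. code-block:: python

    from pathlib import Path


    def pair_azfp_files(data_dir, xml_path):
        """Pair every .01A file in data_dir with the deployment's single .XML file.

        Hypothetical helper: returns (data_file, xml_file) string tuples sorted
        by file name. Glob matching is case-sensitive on most Linux systems.
        """
        xml_path = Path(xml_path)
        if not xml_path.is_file():
            raise FileNotFoundError(f"XML file not found: {xml_path}")
        data_files = sorted(Path(data_dir).glob("*.01A"))
        return [(str(f), str(xml_path)) for f in data_files]


    def convert_azfp_deployment(data_dir, xml_path, save_path):
        """Convert each paired file with echopype (imported lazily)."""
        from echopype import open_raw

        for raw_file, xml_file in pair_azfp_files(data_dir, xml_path):
            ed = open_raw(raw_file, sonar_model="AZFP", xml_path=xml_file)
            ed.to_netcdf(save_path=save_path)

For example, ``convert_azfp_deployment('./azfp_data', 'XMLFILENAME.xml', './unpacked_files')``
would convert every ``.01A`` file in the (hypothetical) ``./azfp_data`` folder.
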
.. note::

   The ``EchoData`` instance contains all the data unpacked from the raw file,
   so it is a good idea to clear it from memory once done with conversion.

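When converting many files in a loop, deleting each ``EchoData`` object before
opening the next keeps at most one file's data in memory at a time; simply
reassigning ``ed`` would release the previous object only after the next
``open_raw`` call has returned. A minimal sketch follows; ``convert_files`` and
its ``opener`` parameter are hypothetical, with ``opener`` defaulting to
echopype's ``open_raw`` so the loop can also be exercised without real data.

.. code-block:: python

    import gc


    def convert_files(raw_paths, sonar_model, save_path, opener=None):
        """Convert raw files one at a time, releasing each EchoData object.

        Hypothetical helper; ``opener`` defaults to ``echopype.open_raw``.
        Returns the number of files converted.
        """
        if opener is None:
            from echopype import open_raw as opener

        n_converted = 0
        for raw_path in raw_paths:
            ed = opener(raw_path, sonar_model=sonar_model)
            ed.to_netcdf(save_path=save_path)
            # Drop this loop's reference before the next file is opened ...
            del ed
            # ... and collect any reference cycles that may still hold memory.
            gc.collect()
            n_converted += 1
        return n_converted

For example:
``convert_files(['./raw_data_files/file_01.raw', './raw_data_files/file_02.raw'], 'EK60', './unpacked_files')``.
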
.. attention::

    In version 0.6.2 of echopype we reduced the memory usage of ``open_raw``
    by allowing users to write variables that may consume a large amount of memory
    directly into a temporary Zarr store (see `#774 <https://github.com/OSOceanAcoustics/echopype/pull/774>`_).

    This feature is accessible through ``open_raw`` via arguments ``use_swap`` and ``max_mb``
    and is only available for the following echosounders: EK60, ES70, EK80, ES80, EA640.
    See :ref:`API reference <api-open_raw>` for usage.
    This is currently a beta feature that will benefit from user feedback.

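Because ``use_swap`` is only available for some echosounders, code that handles
several instrument types may want to request it conditionally. The sketch below
builds the extra keyword arguments from the list above, assuming ``use_swap``
accepts a boolean; ``SWAP_SUPPORTED_MODELS`` and ``swap_kwargs`` are hypothetical
names. ``max_mb`` is left at its default, so check the API reference before tuning it.

.. code-block:: python

    # Echosounders for which the attention note above lists use_swap support.
    SWAP_SUPPORTED_MODELS = {"EK60", "ES70", "EK80", "ES80", "EA640"}


    def swap_kwargs(sonar_model):
        """Return extra open_raw keyword arguments enabling use_swap when supported.

        Hypothetical helper: max_mb is not set, so open_raw's default applies.
        """
        if sonar_model.upper() in SWAP_SUPPORTED_MODELS:
            return {"use_swap": True}
        return {}


    # Usage with echopype (requires a real file):
    # from echopype import open_raw
    # ed = open_raw('FILENAME.raw', sonar_model='EK80', **swap_kwargs('EK80'))
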
File access
-----------

.. Specifying multiple files
.. ~~~~~~~~~~~~~~~~~~~~~~~~~

.. ``open_raw`` can accept a list of file paths pointing to multiple files.
.. For example:

.. .. code-block:: python

   raw_file_paths = [
      './raw_data_files/file_01.raw',
      './raw_data_files/file_02.raw'
   ]
   ed = open_raw(raw_file_paths, sonar_model='EK60')

In addition to local paths, ``open_raw`` can accept paths to files on remote systems,
such as ``http`` (a file on a web server), and on cloud object storage, such as Amazon Web Services (AWS) S3.
This capability is provided by the `fsspec <https://filesystem-spec.readthedocs.io>`_
package, and all file systems implemented by ``fsspec`` are supported;
a list of these file systems is available on the
`fsspec registry documentation <https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations>`_.

https access
~~~~~~~~~~~~

A file on a web server can be accessed by specifying its URL:

.. code-block:: python

   raw_file_url = "https://mydomain.com/my/dir/D20170615-T190214.raw"
   ed = open_raw(raw_file_url, sonar_model='EK60')

AWS S3 access
~~~~~~~~~~~~~

.. note::

   These instructions should apply to other object storage providers such as
   Google Cloud and Azure, but have only been tested on AWS S3.

A file on an `AWS S3 <https://aws.amazon.com/s3/>`_ "bucket" can be accessed by
specifying the S3 path that starts with "s3://" and using the ``storage_options``
argument. For a publicly accessible file ("anonymous") on a bucket called ``mybucket``:

.. code-block:: python

   raw_file_s3path = "s3://mybucket/my/dir/D20170615-T190214.raw"
   ed = open_raw(
      raw_file_s3path, sonar_model='EK60',
      storage_options={'anon': True}
   )

If the file is not publicly accessible, the credentials can be specified explicitly
through ``storage_options`` keywords:

.. code-block:: python

   ed = open_raw(
      raw_file_s3path, sonar_model='EK60',
      storage_options={'key': 'ACCESSKEY', 'secret': 'SECRETKEY'}
   )

or via a profile stored in the default AWS credentials file
(``~/.aws/credentials``). For a ``profile`` named "myprofilename" in
the credentials file (note that ``aiobotocore`` is installed by ``echopype``):

.. code-block:: python

   from aiobotocore.session import AioSession
   aws_session = AioSession(profile='myprofilename')
   ed = open_raw(
      raw_file_s3path, sonar_model='EK60',
      storage_options={'session': aws_session}
   )

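The three ways of passing S3 credentials shown above are alternatives. The
hypothetical helper below (not part of echopype or ``fsspec``) makes that explicit:
it builds exactly one of the ``storage_options`` dictionaries used in the examples
above and rejects ambiguous combinations.

.. code-block:: python

    from typing import Any, Optional


    def s3_storage_options(
        key: Optional[str] = None,
        secret: Optional[str] = None,
        session: Any = None,
        anon: bool = False,
    ) -> dict:
        """Build storage_options for one S3 access mode: anon, key/secret, or session."""
        n_modes = sum([bool(anon), key is not None or secret is not None, session is not None])
        if n_modes != 1:
            raise ValueError("choose exactly one of: anon, key/secret, session")
        if anon:
            return {"anon": True}
        if session is not None:
            return {"session": session}
        if key is None or secret is None:
            raise ValueError("both key and secret are required")
        return {"key": key, "secret": secret}


    # ed = open_raw(raw_file_s3path, sonar_model='EK60',
    #               storage_options=s3_storage_options(anon=True))
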

File export
-----------

Converted data are saved to netCDF4 or Zarr files using ``EchoData.to_netcdf()``
and ``EchoData.to_zarr()``, respectively. Both methods accept optional arguments
that control where and how the files are written. The examples below apply
equally to both methods, except as noted.

Use the ``save_path`` argument to specify a destination folder or file path
for the converted files. If ``save_path`` is not specified, the converted ``.nc``
and ``.zarr`` files are saved into the directory ``~/.echopype/temp_output``,
which is created if it does not already exist.

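If ``save_path`` was omitted, the converted files end up in the default folder.
A minimal sketch for listing them, assuming only the ``~/.echopype/temp_output``
location stated above (``list_converted_files`` is a hypothetical helper):

.. code-block:: python

    from pathlib import Path

    # Default output folder described above.
    DEFAULT_OUTPUT_DIR = Path.home() / ".echopype" / "temp_output"


    def list_converted_files(output_dir=DEFAULT_OUTPUT_DIR, patterns=("*.nc", "*.zarr")):
        """Return netCDF4 files and Zarr stores in output_dir, sorted by path."""
        output_dir = Path(output_dir)
        if not output_dir.is_dir():
            # Nothing has been written there yet.
            return []
        found = set()
        for pattern in patterns:
            found.update(output_dir.glob(pattern))
        return sorted(found)


    for path in list_converted_files():
        print(path)
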

Specify metadata attributes
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before calling ``to_netcdf()`` or ``to_zarr()``, you can manually set
metadata attributes that are not recorded in the raw data files but should be
specified according to the SONAR-netCDF4 convention.
Attributes commonly missing from the raw files include
``platform_name``, ``platform_type`` and ``platform_code_ICES``
in the ``Platform`` netCDF4 group.
They can be set as follows:

.. code-block:: python

    ed['Platform']['platform_name'] = 'OOI'
    ed['Platform']['platform_type'] = 'subsurface mooring'
    ed['Platform']['platform_code_ICES'] = '3164'   # Platform code for Moorings

A suitable ``platform_code_ICES`` value can be found in the
`ICES SHIPC vocabulary <https://vocab.ices.dk/?ref=315>`_.


.. Save converted files into a specified folder
.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. In this example, each input file will be converted to an individual ``.nc`` file
.. and stored in the ``./unpacked_files`` directory.

.. .. code-block:: python

   raw_file_paths = [                              # a list of raw data files
      './raw_data_files/dir1/file_01.raw',
      './raw_data_files/dir2/file_02.raw'
   ]
   ed = open_raw(raw_file_paths, sonar_model='EK60')     # create an EchoData object
   ed.to_netcdf(save_path='./unpacked_files')      # set the output directory

.. Combine multiple raw files into one converted file
.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. Multiple files can be combined into a single converted file using the
.. ``combine`` argument (the default is ``combine=False``). In that case,
.. ``save_path`` must be specified explicitly. If ``save_path`` is only a filename
.. rather than a full file path, the combined output file will be saved to the
.. default ``~/.echopype/temp_output`` folder.

.. .. code-block:: python

   raw_file_paths = [                              # a list of raw data files
      './raw_data_files/dir1/file_01.raw',
      './raw_data_files/dir2/file_02.raw'
   ]
   ed = open_raw(raw_file_paths, sonar_model='EK60')     # create an EchoData object
   ed.to_zarr(
      combine=True,                                # combine all input files on conversion
      save_path='./unpacked_files/combined_file.zarr'
   )

Save to AWS S3
~~~~~~~~~~~~~~

.. note::

   These instructions should apply to other object storage providers such as
   Google Cloud and Azure, but have only been tested on AWS S3.

Converted files can be saved directly to an AWS S3 bucket by specifying
``output_storage_options``, which works like ``storage_options`` for input files
(see "AWS S3 access" above). The example below illustrates a fully remote
processing pipeline: it reads a raw file from a web server and saves the
converted Zarr dataset to S3. As when reading raw data from S3, a
``profile``-based ``session`` can also be passed to ``output_storage_options``.
Writing netCDF4 files to S3 is currently not supported.

.. code-block:: python

   raw_file_url = 'http://mydomain.com/from1/file_01.raw'
   ed = open_raw(raw_file_url, sonar_model='EK60')
   ed.to_zarr(
      overwrite=True,
      save_path='s3://mybucket/converted_file.zarr',
      output_storage_options={'key': 'ACCESSKEY', 'secret': 'SECRETKEY'}
   )

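The ``profile``-based alternative mentioned above might look like the following
sketch, which assumes ``aiobotocore`` is available (see "AWS S3 access").
``save_zarr_to_s3`` is a hypothetical helper; it calls ``to_zarr`` because writing
netCDF4 to S3 is not supported.

.. code-block:: python

    def save_zarr_to_s3(ed, s3_path, profile):
        """Write an EchoData object to S3 as Zarr using a named AWS profile.

        Hypothetical helper. The session is passed through
        output_storage_options, as with storage_options for input files.
        """
        if not s3_path.startswith("s3://"):
            raise ValueError(f"expected an s3:// path, got {s3_path!r}")
        # Imported lazily so the path check above does not need aiobotocore.
        from aiobotocore.session import AioSession

        aws_session = AioSession(profile=profile)
        ed.to_zarr(
            overwrite=True,
            save_path=s3_path,
            output_storage_options={"session": aws_session},
        )


    # save_zarr_to_s3(ed, 's3://mybucket/converted_file.zarr', 'myprofilename')
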
.. note::

   Zarr datasets will be automatically chunked with default chunk sizes of
   25000 for ``range_sample`` and 2500 for ``ping_time`` dimensions.
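
With these defaults, the number of chunks along each dimension is the dimension
length divided by the chunk size, rounded up. A small worked example with made-up
dimension lengths (``n_chunks`` is a hypothetical helper, not echopype code):

.. code-block:: python

    import math

    # Default Zarr chunk sizes stated in the note above.
    DEFAULT_CHUNKS = {"ping_time": 2500, "range_sample": 25000}


    def n_chunks(dim_sizes, chunk_sizes=DEFAULT_CHUNKS):
        """Number of chunks per dimension: ceil(length / chunk size)."""
        return {dim: math.ceil(size / chunk_sizes[dim]) for dim, size in dim_sizes.items()}


    print(n_chunks({"ping_time": 10_000, "range_sample": 3_000}))
    # {'ping_time': 4, 'range_sample': 1}
    print(n_chunks({"ping_time": 10_001, "range_sample": 3_000}))
    # {'ping_time': 5, 'range_sample': 1}

A dimension shorter than its chunk size (here ``range_sample``) fits in a single chunk.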