(convert)=
# Converting raw files

(convert-sonar_types)=
## Supported raw file types

Echopype supports converting raw instrument-generated data files into [netCDF](https://www.unidata.ucar.edu/software/netcdf/) or [Zarr](https://zarr.readthedocs.io) from the following echosounders:
- `.raw` files generated by [Kongsberg Simrad](https://www.kongsberg.com/maritime/contact/simrad/) EK60, ES70, EK80, and ES80 echosounders and Kongsberg EA640 echosounder
- `.01A` files generated by [ASL Environmental Sciences](https://aslenv.com) AZFP echosounder
- `.ad2cp` files generated by [Nortek](https://www.nortekgroup.com/) Signature series Acoustic Doppler Current Profilers (ADCPs) (beta)


## Conversion operation

To convert a file from any supported echosounder, use the function `open_raw` to parse the raw data and create an `EchoData` object.

Use the parameter `sonar_model` to indicate the echosounder model:
- `EK60`: Kongsberg Simrad EK60 echosounder
- `ES70`: Kongsberg Simrad ES70 echosounder
- `EK80`: Kongsberg Simrad EK80 echosounder
- `ES80`: Kongsberg Simrad ES80 echosounder
- `EA640`: Kongsberg EA640 echosounder
- `AZFP`: ASL Environmental Sciences AZFP echosounder
- `AD2CP`: Nortek Signature series ADCP (tested with Signature 500 and Signature 1000 files collected in 2021)


To convert a raw EK80 file to an in-memory `EchoData` object:
```python
import echopype as ep  # we encourage importing echopype as ep
ed = ep.open_raw("FILENAME.raw", sonar_model="EK80")  # for EK80 file
```

For data from the AZFP echosounder, the conversion requires an extra `.XML` file (specified using `xml_path`) along with the `.01A` data file:

```python
ed = ep.open_raw("FILENAME.01A", sonar_model="AZFP", xml_path="XML_FILENAME.xml")  # AZFP data need an XML file
ed.to_netcdf(save_path="./unpacked_files")
```

The AZFP `.XML` file contains metadata needed for unpacking the binary `.01A` files. Typically, a single `.XML` file is associated with all files from the same deployment.

:::{tip}
The `EchoData` object contains all the data unpacked from the raw file, so it is a good idea to clear it from memory once done with conversion.
:::
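
Because one `.XML` file usually serves a whole AZFP deployment, the conversion and the memory tip above can be combined in a loop. The following is a minimal sketch, not part of echopype: the helper names `list_azfp_files` and `convert_azfp_deployment` are hypothetical, and it assumes that all `.01A` files sit in one folder and share the same `.XML` file.

```python
from pathlib import Path


def list_azfp_files(raw_dir):
    """Return the .01A files found directly in raw_dir, sorted by name."""
    return sorted(Path(raw_dir).glob("*.01A"))


def convert_azfp_deployment(raw_dir, xml_path, out_dir):
    """Convert every .01A file in raw_dir using one shared XML file (hypothetical helper)."""
    import echopype as ep  # imported here so list_azfp_files works without echopype

    for raw_file in list_azfp_files(raw_dir):
        ed = ep.open_raw(str(raw_file), sonar_model="AZFP", xml_path=str(xml_path))
        ed.to_netcdf(save_path=str(out_dir))
        del ed  # drop the reference so the unpacked data can be freed before the next file
```

Calling `convert_azfp_deployment("./raw", "./raw/XML_FILENAME.xml", "./unpacked_files")` would then write one netCDF file per raw file while keeping only one `EchoData` object in memory at a time.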

:::{attention}
In Echopype v0.6.2 we improved `open_raw` by allowing users to directly write variables that may consume a large amount of memory into a temporary zarr store ([#774](https://github.com/OSOceanAcoustics/echopype/pull/774)).

This feature is accessible through `open_raw` via arguments `use_swap` and `max_mb` and is only available for the following echosounders: EK60, ES70, EK80, ES80, EA640. See the [API reference](api-open_raw) for usage.
:::


## Local and remote file access

`open_raw` can accept paths to files on both local and remote file systems (e.g., web `http` server and cloud object storage such as Amazon Web Services (AWS) S3).
This capability is provided by the [fsspec](https://filesystem-spec.readthedocs.io) package, and all file systems implemented by `fsspec` are supported (see the list [here](https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations)).


A file on a web server can be accessed by specifying the file URL:
```python
ed = ep.open_raw(
    "https://mydomain.com/my/dir/D20170615-T190214.raw",  # file on http server
    sonar_model="EK80"
)
```

For a file in a publicly accessible S3 bucket:
```python
raw_file_s3path = "s3://mybucket/my/dir/D20170615-T190214.raw"
ed = ep.open_raw(
    raw_file_s3path,  # file in S3 bucket
    sonar_model="EK80",
    storage_options={"anon": True}  # publicly accessible file ("anonymous")
)
```

For a file in a private S3 bucket:
```python
raw_file_s3path = "s3://mybucket/my/dir/D20170615-T190214.raw"
ed = ep.open_raw(
    raw_file_s3path,  # file in S3 bucket
    sonar_model="EK80",
    storage_options={"key": "ACCESSKEY", "secret": "SECRETKEY"}  # access credentials
)
```

It is often safer to store a credential file so that the access credentials are not supplied directly in scripts or notebooks. For example, for AWS, a default AWS credentials file
(`~/.aws/credentials`) can contain a profile named "myprofilename", which can be used with `aiobotocore` to access data:
```python
from aiobotocore.session import AioSession

# Use the "myprofilename" profile from ~/.aws/credentials
aws_session = AioSession(profile="myprofilename")
ed = ep.open_raw(
    raw_file_s3path, sonar_model="EK80",
    storage_options={"session": aws_session}
)
```

:::{note}
These instructions should apply to other object storage providers such as Google Cloud Platform and Microsoft Azure, but have only been tested on AWS S3.
:::
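
When the same script has to handle local paths, web URLs, and S3 paths, the choice of `storage_options` can be derived from the path itself. The sketch below is one way to do that under stated assumptions: `storage_options_for` is a hypothetical helper, not an echopype or `fsspec` function, and it only covers the three cases shown in the examples above (no credentials, anonymous S3 access, and S3 access with a key and secret).

```python
from urllib.parse import urlparse


def storage_options_for(path, anon=False, key=None, secret=None):
    """Pick a storage_options dict for open_raw from the path's URL scheme (hypothetical helper)."""
    scheme = urlparse(str(path)).scheme
    if scheme != "s3":
        return {}  # local paths and http(s) URLs need no credentials in the examples above
    if key is not None and secret is not None:
        return {"key": key, "secret": secret}  # private bucket
    return {"anon": anon}  # public bucket when anon=True


# Intended use (requires echopype and network access, so it is left commented out):
# path = "s3://mybucket/my/dir/D20170615-T190214.raw"
# ed = ep.open_raw(path, sonar_model="EK80", storage_options=storage_options_for(path, anon=True))
```

Keeping this decision in one place means the calls to `open_raw` look the same regardless of where the file lives.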


## Saving converted data

The converted `EchoData` object can be saved to netCDF4 (`.nc`) or Zarr (`.zarr`) files using the `.to_netcdf` or `.to_zarr` method.
The destination folder or file path should be specified with the `save_path` argument.
If left unspecified, the converted files will be saved to `~/.echopype/temp_output`.
This folder will be created if it does not already exist.

```python
ed.to_netcdf(save_path="./unpacked_files")  # save to FILENAME.nc in the folder unpacked_files
ed.to_zarr(save_path="./unpacked_files/NEW_FILENAME.zarr")  # fully specify filename also works
```
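
The two cases in the example above (a folder versus a full file name) can be summarized as a naming rule. The sketch below only mirrors what the comments in that example describe; `expected_output_path` is a hypothetical helper and is not how echopype resolves paths internally, so treat it as an aid for predicting output locations in your own scripts.

```python
from pathlib import PurePosixPath


def expected_output_path(raw_file, save_path, ext):
    """Predict the output location for the two save_path cases shown above (hypothetical helper)."""
    save_path = PurePosixPath(save_path)
    if save_path.suffix in (".nc", ".zarr"):
        return save_path  # a full file name was given, so it is used as is
    # a folder was given, so the raw file's base name is reused with the new extension
    return save_path / (PurePosixPath(raw_file).stem + ext)


print(expected_output_path("FILENAME.raw", "./unpacked_files", ".nc"))
print(expected_output_path("FILENAME.raw", "./unpacked_files/NEW_FILENAME.zarr", ".zarr"))
```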

The converted `EchoData` object can also be saved directly to an AWS S3 bucket by specifying
`output_storage_options`, similar to the `storage_options` argument in `open_raw`. The example below illustrates a workflow that reads a raw file from a web server and saves the converted Zarr dataset to S3. Writing netCDF4 to S3 is currently not supported.

```python
ed = ep.open_raw("http://mydomain.com/from1/file_01.raw", sonar_model="EK60")
ed.to_zarr(
    overwrite=True,
    save_path="s3://mybucket/converted_file.zarr",
    output_storage_options={"key": "ACCESSKEY", "secret": "SECRETKEY"}
)
```

:::{note}
Zarr datasets will be automatically chunked with default chunk sizes of 25000 for `range_sample` and 2500 for `ping_time` dimensions.
:::
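
To get a feel for what these defaults mean, the arithmetic can be worked out for a single two-dimensional variable. This is a rough sketch under assumptions that are not stated in the note: values are taken to be 8 bytes each, only the `ping_time` and `range_sample` dimensions are considered, and compression is ignored. The function name `chunk_layout` is hypothetical.

```python
import math

RANGE_SAMPLE_CHUNK = 25000  # default chunk size along range_sample (from the note above)
PING_TIME_CHUNK = 2500      # default chunk size along ping_time (from the note above)


def chunk_layout(n_ping_time, n_range_sample, bytes_per_value=8):
    """Return (number of chunks, largest uncompressed chunk in bytes) for one 2-D variable."""
    n_chunks = (
        math.ceil(n_ping_time / PING_TIME_CHUNK)
        * math.ceil(n_range_sample / RANGE_SAMPLE_CHUNK)
    )
    chunk_bytes = (
        min(n_ping_time, PING_TIME_CHUNK)
        * min(n_range_sample, RANGE_SAMPLE_CHUNK)
        * bytes_per_value
    )
    return n_chunks, chunk_bytes


print(chunk_layout(10000, 30000))  # (8, 500000000)
```

Under these assumptions, a full chunk holds 2500 × 25000 values, or about 500 MB before compression, and a variable with 10000 pings and 30000 range samples is split into 4 × 2 = 8 chunks.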



## Specify metadata attributes

You can manually set some `EchoData` metadata attributes specified in the SONAR-netCDF4 convention that are not recorded in the raw instrument-generated files. For example, many `Platform` variables are not stored in the raw files, including `platform_name`, `platform_type` and `platform_code_ICES`. They can be set by:

```python
ed["Platform"]["platform_name"] = "OOI"
ed["Platform"]["platform_type"] = "subsurface mooring"
ed["Platform"]["platform_code_ICES"] = "3164"   # Platform code for Moorings
```

`platform_code_ICES` can be chosen by referencing the [ICES SHIPC vocabulary](https://vocab.ices.dk/?ref=315).
