.. _io-persistence:

I/O & Persistence
=================


Saving and loading
------------------

emg3d provides functions to store data to disk and to read it back from disk,
in different file formats. Currently three file formats are supported, each
with its own advantages and disadvantages:

- ``.h5``:
  Uses h5py to store inputs to a hierarchical, compressed binary HDF5 file.
  **Recommended file format.**

  - Advantage: Widely used, compressed file format, which can be read and
    written in many programs.
  - Disadvantage: You have to install ``h5py``.


- ``.npz``:
  Uses numpy to store inputs to a flat, compressed binary file.

  - Advantage: No extra installation is required, and the outputs are
    compressed.
  - Disadvantage: Only useful within the Python ecosystem.


- ``.json``:
  Uses json to store inputs to a hierarchical, plain text file.

  - Advantage: No extra installation is required, and the output is a plain
    text file that can be viewed in any editor; good for developing and
    debugging.
  - Disadvantage: Not compressed (files can become huge).
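
The trade-offs above can be turned into a small decision rule. The following is
a minimal sketch, not part of emg3d: ``pick_suffix`` is a hypothetical helper
that returns the recommended ``.h5`` suffix if ``h5py`` can be found, falls
back to ``.npz`` otherwise, and returns ``.json`` if a plain text file is
wanted.

.. code-block:: python

  import importlib.util


  def pick_suffix(prefer_text=False):
      """Return a file suffix following the trade-offs listed above.

      Hypothetical helper for illustration; not part of emg3d.
      """
      if prefer_text:
          # Plain text, viewable in any editor, but not compressed.
          return '.json'
      if importlib.util.find_spec('h5py') is not None:
          # h5py is installed: use the recommended format.
          return '.h5'
      # Fall back to the NumPy format, which needs no extra installation.
      return '.npz'


  suffix = pick_suffix()
  filename = 'mydata' + suffix

The resulting ``filename`` can then be passed as first argument to the saving
function.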


Example
~~~~~~~

You should be able to save and load everything you do in emg3d with these
functions; see the API of :func:`emg3d.io.save` and :func:`emg3d.io.load` for
details. In a nutshell, the first argument is a string containing the relative
or absolute path, the file name, and the suffix indicating the file format.
The remaining arguments are simply ``name=value`` pairs, where the name can be
anything and the value must be an existing variable. (There are a few more
options, see the API.)

.. ipython::
  :verbatim:

  In [1]: emg3d.save(
     ...:     '/path/to/filename.ending',
     ...:     inp_model=model1,
     ...:     out_model=model2,
     ...:     survey=survey,
     ...:     efield=efield,
     ...: )
  Out[1]: Data saved to «/path/to/filename.ending»

Loading such a file returns a dictionary whose keys are the names you have
defined:

.. ipython::
  :verbatim:

  In [1]: data = emg3d.load('/path/to/filename.ending')
  Out[1]: Data loaded from «/path/to/filename.ending»

  In [2]: data.keys()
  Out[2]: dict_keys(['_date', '_format', '_version', 'efield', 'inp_model', 'out_model', 'survey'])

In addition to the variables you have defined there are a few other, "private"
(starting with an underscore) variables such as the date, format, and version
of emg3d with which the archive was created.
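
If you want to treat your own variables separately from this metadata, you can
split the dictionary on the leading underscore. The following sketch uses a
stand-in dictionary with placeholder values instead of a real archive; the
names ``meta`` and ``user`` are chosen for illustration only.

.. code-block:: python

  # Stand-in for the dictionary returned by the loading function; the
  # values are placeholders, not real emg3d objects.
  data = {
      '_date': 'placeholder',
      '_format': 'placeholder',
      '_version': 'placeholder',
      'efield': 'placeholder',
      'inp_model': 'placeholder',
      'out_model': 'placeholder',
      'survey': 'placeholder',
  }

  # "Private" entries start with an underscore; everything else is user data.
  meta = {k: v for k, v in data.items() if k.startswith('_')}
  user = {k: v for k, v in data.items() if not k.startswith('_')}

  print(sorted(user))  # ['efield', 'inp_model', 'out_model', 'survey']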


``{to;from}_file``
~~~~~~~~~~~~~~~~~~

The two classes :class:`emg3d.surveys.Survey` and
:class:`emg3d.simulations.Simulation` have ``to_file`` and ``from_file``
methods, which are basically wrappers around the saving and loading functions.
They can be used in the following way:

Storing to disk:

.. ipython::
  :verbatim:

  In [1]: my_survey.to_file('mydata.h5')


and loading from disk:

.. ipython::
  :verbatim:

  In [1]: my_survey = emg3d.Survey.from_file('mydata.h5')


Serialization
-------------

The following is advanced information, relevant if you want to read data
created with emg3d outside of Python, or if you want to create data outside of
Python that you can subsequently read with emg3d. As a pure end-user of emg3d
you can ignore this section.

Here are a few notes regarding the (de-)serialization used in emg3d.

- When invoking ``emg3d.save('filename.ending', a=a, b=something, foo=bar)``,
  the data is collected in a dict ``{'a': a, 'b': something, 'foo': bar}``.
- Afterwards the dict is serialized. Instances of emg3d classes
  (:class:`emg3d.meshes.TensorMesh`, :class:`emg3d.fields.Field`,
  :class:`emg3d.surveys.Survey`, :class:`emg3d.simulations.Simulation`) have
  ``to_dict`` and ``from_dict`` methods to (de-)serialize themselves. These are
  used when saving and loading them. In principle emg3d can save everything
  that is either serialized already or is present in
  ``emg3d.utils._KNOWN_CLASSES``. You can define your own classes which have
  ``{to;from}_dict`` methods, and add them to the known classes with the
  decorator ``@utils._known_class``.

  - Things which are done when serializing and undone when de-serializing:

    - ``None`` is saved as a string ``'NoneType'``.

  - Things done when serializing:

    - Dictionary key names are converted to strings.
    - Grids generated with discretize are stored as if they were created using
      emg3d.

  - Things done when de-serializing:

    - ``np.bool_`` is returned as ``bool``.
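
The pattern described above can be sketched in a few lines of plain Python.
This is a simplified illustration under stated assumptions, not the code used
in emg3d: the registry ``_KNOWN``, the decorator ``known_class``, the toy class
``Layer``, the functions ``serialize`` and ``deserialize``, and the
``'__class__'`` marker are all hypothetical names chosen for this example.

.. code-block:: python

  _KNOWN = {}  # Hypothetical registry of classes that can be rebuilt.


  def known_class(cls):
      """Register ``cls`` under its class name (hypothetical decorator)."""
      _KNOWN[cls.__name__] = cls
      return cls


  @known_class
  class Layer:
      """Toy class that can (de-)serialize itself."""

      def __init__(self, thickness, name=None):
          self.thickness = thickness
          self.name = name

      def to_dict(self):
          # '__class__' is an assumed marker naming the class to rebuild.
          return {'__class__': type(self).__name__,
                  'thickness': self.thickness, 'name': self.name}

      @classmethod
      def from_dict(cls, inp):
          return cls(thickness=inp['thickness'], name=inp['name'])


  def serialize(value):
      """Recursively turn ``value`` into plain dicts and scalars."""
      if value is None:
          return 'NoneType'  # None is stored as a string.
      if hasattr(value, 'to_dict'):
          return serialize(value.to_dict())
      if isinstance(value, dict):
          # Dictionary key names are converted to strings.
          return {str(k): serialize(v) for k, v in value.items()}
      return value


  def deserialize(value):
      """Undo ``serialize``, rebuilding instances of registered classes."""
      if isinstance(value, str) and value == 'NoneType':
          return None
      if isinstance(value, dict):
          out = {k: deserialize(v) for k, v in value.items()}
          cls = _KNOWN.get(out.get('__class__'))
          return cls.from_dict(out) if cls else out
      return value


  stored = serialize({'a': Layer(10.0), 1: None})
  loaded = deserialize(stored)

Here ``stored`` contains only dicts, strings, and numbers, with the integer key
``1`` turned into the string ``'1'``; ``loaded['a']`` is again a ``Layer``
instance whose ``name`` is ``None``.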


The two steps listed above, collecting the data in a dict and serializing it,
are always carried out. What happens afterwards depends on the file format, as
different file formats have different limitations.


- ``.h5``:
  Each nesting level creates a new data set.


- ``.npz``:
  The serialized dict is converted into a flattened dict, where the keys are
  separated with ``'>'``.

- ``.json``:

  - NumPy arrays are turned into lists, where ``'__array-'`` plus the ``dtype``
    are added to the key.
  - Complex numbers are stacked, real values followed by imaginary values;
    ``__complex`` is added to the key.
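
The flattening used for ``.npz`` and the stacking of complex numbers used for
``.json`` can be sketched with the standard library alone. This is a minimal
illustration, not the implementation in emg3d: the function names are
hypothetical, plain Python lists stand in for NumPy arrays, and the exact
layout of the key ``'field__complex'`` is an assumption.

.. code-block:: python

  import json


  def flatten(nested, parent=''):
      """Flatten a nested dict; nested key names are joined with '>'."""
      flat = {}
      for key, value in nested.items():
          name = f'{parent}>{key}' if parent else str(key)
          if isinstance(value, dict):
              flat.update(flatten(value, name))
          else:
              flat[name] = value
      return flat


  def unflatten(flat):
      """Rebuild the nested dict from keys joined with '>'."""
      nested = {}
      for name, value in flat.items():
          *parents, last = name.split('>')
          level = nested
          for key in parents:
              level = level.setdefault(key, {})
          level[last] = value
      return nested


  def stack_complex(values):
      """Stack real parts followed by imaginary parts, as plain floats."""
      return [v.real for v in values] + [v.imag for v in values]


  def unstack_complex(stacked):
      """Undo ``stack_complex``."""
      half = len(stacked) // 2
      return [complex(r, i) for r, i in zip(stacked[:half], stacked[half:])]


  nested = {'survey': {'name': 'test', 'data': {'freq': 1.0}}, 'n': 3}
  flat = flatten(nested)
  # flat == {'survey>name': 'test', 'survey>data>freq': 1.0, 'n': 3}

  field = [1+2j, 3-4j]
  # The key suffix marks the entry as complex; its layout is illustrative.
  text = json.dumps({'field__complex': stack_complex(field)})
  # text == '{"field__complex": [1.0, 3.0, 2.0, -4.0]}'

Both steps are reversible: ``unflatten(flat)`` returns the nested dict, and
``unstack_complex`` applied to the decoded list returns the complex values.
Note that this sketch assumes that key names do not themselves contain ``'>'``.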