I/O & Persistence

Saving and loading

emg3d provides functions to save data to disk and to load it again, in several file formats. Three file formats are currently supported, each with its own advantages and disadvantages:

  • .h5: Uses h5py to store inputs to a hierarchical, compressed binary HDF5 file. Recommended file format.

    • Advantage: Widely used, compressed file format, which can be read and written in many programs.

    • Disadvantage: You have to install h5py.

  • .npz: Uses numpy to store inputs to a flat, compressed binary file.

    • Advantage: No extra installation is required, and the outputs are compressed.

    • Disadvantage: Only useful within the Python ecosystem.

  • .json: Uses json to store inputs to a hierarchical, plain text file.

    • Advantage: No extra installation is required, and the output is a plain text file that can be viewed in any editor; good for developing and debugging.

    • Disadvantage: Not compressed (files can become huge).
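The trade-offs above can be turned into a small selection rule. The following is only a sketch: `pick_suffix` is a hypothetical helper, not part of emg3d. It assumes you want the recommended .h5 format whenever h5py is importable, fall back to .npz otherwise, and use .json only when you explicitly ask for a human-readable file.

```python
import importlib.util


def pick_suffix(prefer_plain_text=False):
    """Return a file suffix to use with emg3d.save.

    Hypothetical helper (not part of emg3d) that encodes the
    advantages and disadvantages listed above.
    """
    if prefer_plain_text:
        # Plain text: easy to inspect while debugging, but uncompressed.
        return ".json"
    if importlib.util.find_spec("h5py") is not None:
        # h5py is importable, so the recommended HDF5 format is available.
        return ".h5"
    # No h5py: fall back to NumPy's compressed format.
    return ".npz"


filename = "mydata" + pick_suffix()
print(filename)
```

The resulting file name could then be passed as the first argument when saving.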

Example

You should be able to save and load everything you do in emg3d with these functions. See the API documentation of emg3d.io.save and emg3d.io.load for details. In a nutshell, the first argument is a string containing the relative or absolute path, including the file name and the suffix that selects the file format. The remaining arguments are name=value pairs, where the name can be anything and the value must be an existing variable. (There are a few more options; see the API.)

In [1]: emg3d.save(
   ...:     '/path/to/filename.ending',
   ...:     inp_model=model1,
   ...:     out_model=model2,
   ...:     survey=survey,
   ...:     efield=efield,
   ...: )
   ...: 
Out[1]: Data saved to «/path/to/filename.ending»

Loading such a file returns a dictionary whose keys are the names you defined when saving:

In [2]: data = emg3d.load('/path/to/filename.ending')
Out[2]: Data loaded from «/path/to/filename.ending»

In [3]: data.keys()
Out[3]: dict_keys(['_date', '_format', '_version', 'efield', 'inp_model', 'out_model', 'survey'])

In addition to the variables you defined, the dictionary contains a few “private” entries (their names start with an underscore), such as the date, format, and version of emg3d with which the archive was created.
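If you only want your own variables back, the underscore convention makes the private entries easy to split off. The sketch below uses a stand-in dictionary with the same keys as the output above. Its values are placeholders invented for illustration, not what emg3d actually stores.

```python
# Stand-in for the dictionary returned by emg3d.load. The keys match the
# example above; all values are placeholders, not real emg3d objects.
data = {
    "_date": "<date placeholder>",
    "_format": "<format placeholder>",
    "_version": "<version placeholder>",
    "efield": "<efield placeholder>",
    "inp_model": "<model1 placeholder>",
    "out_model": "<model2 placeholder>",
    "survey": "<survey placeholder>",
}

# Private metadata: keys starting with an underscore.
meta = {k: v for k, v in data.items() if k.startswith("_")}

# Everything that was passed by name when saving.
user = {k: v for k, v in data.items() if not k.startswith("_")}

print(sorted(meta))  # ['_date', '_format', '_version']
print(sorted(user))  # ['efield', 'inp_model', 'out_model', 'survey']
```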

{to;from}_file

The two classes emg3d.surveys.Survey and emg3d.simulations.Simulation have to_file and from_file methods, which are basically wrappers around the saving and loading functions. They can be used in the following way:

Storing to disk:

In [4]: my_survey.to_file('mydata.h5')

Loading from disk:

In [5]: my_survey = emg3d.Survey.from_file('mydata.h5')

Serialization

The following is advanced information for users who want to read data created with emg3d outside of Python, or who want to create data outside of Python that can subsequently be read with emg3d. As a pure end user of emg3d you can ignore this section.

Here are a few notes on the (de-)serialization used in emg3d.

  • When invoking emg3d.save('filename.ending', a=a, b=something, foo=bar), the data is collected in a dict {'a': a, 'b': something, 'foo': bar}.

  • Afterwards the dict is serialized. The emg3d classes (emg3d.meshes.TensorMesh, emg3d.fields.Field, emg3d.surveys.Survey, emg3d.simulations.Simulation) have to_dict and from_dict methods to (de-)serialize themselves. These are used when saving and loading their instances. In principle, emg3d can save everything that is either serialized already or is present in emg3d.utils._KNOWN_CLASSES. You can define your own classes which have {to;from}_dict methods, and add them to the known classes with the decorator @utils._known_class.

    • Things which are done when serializing and undone when de-serializing:

      • None is saved as a string 'NoneType'.

    • Things done when serializing:

      • Dictionary key names are converted to strings.

      • Grids generated with discretize are stored as if they were created using emg3d.

    • Things done when de-serializing:

      • np.bool_ is returned as bool.
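The format-independent steps in the list above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not emg3d's implementation: `serialize`, `deserialize`, and the `Receiver` class are hypothetical names, and the sketch does not rebuild objects via from_dict or handle NumPy types.

```python
def serialize(obj):
    """Apply the format-independent steps: None becomes 'NoneType',
    dict keys become strings, objects are expanded via to_dict."""
    if obj is None:
        return "NoneType"
    if isinstance(obj, dict):
        return {str(key): serialize(value) for key, value in obj.items()}
    if hasattr(obj, "to_dict"):
        return serialize(obj.to_dict())
    return obj


def deserialize(obj):
    """Undo the None replacement; keys stay strings."""
    if isinstance(obj, str) and obj == "NoneType":
        return None
    if isinstance(obj, dict):
        return {key: deserialize(value) for key, value in obj.items()}
    return obj


class Receiver:
    """Hypothetical class with a to_dict method, for illustration only."""

    def __init__(self, name, azimuth=None):
        self.name = name
        self.azimuth = azimuth

    def to_dict(self):
        return {"name": self.name, "azimuth": self.azimuth}


raw = {1: None, "rec": Receiver("Rx-1")}
stored = serialize(raw)
print(stored)
# {'1': 'NoneType', 'rec': {'name': 'Rx-1', 'azimuth': 'NoneType'}}

restored = deserialize(stored)
print(restored)
# {'1': None, 'rec': {'name': 'Rx-1', 'azimuth': None}}
```

Note that the integer key 1 comes back as the string '1': the key conversion happens when serializing and is not undone.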

These first two steps are always carried out. What happens next depends on the file format, because each format has different limitations.

  • .h5: Each nesting level creates a new data set.

  • .npz: The serialized dict is converted into a flattened dict, where the keys are separated with '>'.

  • .json:

    • NumPy arrays are turned into lists, and '__array-' plus the dtype is added to the key.

    • Complex numbers are stacked, real values followed by imaginary values; __complex is added to the key.
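The flattening used for .npz files can be pictured with a few lines of plain Python. The sketch below shows the general technique of joining nested keys with '>'. The function names `flatten` and `unflatten` are hypothetical, and details such as the handling of empty dictionaries are assumptions that may differ from what emg3d does.

```python
def flatten(nested, parent="", sep=">"):
    """Turn a nested dict into a flat dict with joined keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Descend one level and carry the joined key along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def unflatten(flat, sep=">"):
    """Rebuild the nested dict from the joined keys."""
    nested = {}
    for full_key, value in flat.items():
        *parents, last = full_key.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[last] = value
    return nested


serialized = {
    "survey": {"name": "demo", "sources": {"Tx-1": 1.0}},
    "note": "illustrative values only",
}

flat = flatten(serialized)
print(list(flat))
# ['survey>name', 'survey>sources>Tx-1', 'note']

assert unflatten(flat) == serialized
```

A flat layout like this fits a file format that stores a single level of named entries. In this simple sketch an empty nested dict would be lost, and keys must not contain the separator themselves.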