Quantify dataset specification#

See also

The complete source code of this tutorial can be found in

Quantify dataset - specification.ipynb

from pathlib import Path

import matplotlib.pyplot as plt
import xarray as xr
from rich import pretty

import quantify_core.data.dataset_adapters as dadapters
import quantify_core.data.dataset_attrs as dattrs
from quantify_core.data import handling as dh
from quantify_core.utilities import dataset_examples
from quantify_core.utilities.examples_support import round_trip_dataset
from quantify_core.utilities.inspect_utils import display_source_code

pretty.install()

dh.set_datadir(Path.home() / "quantify-data")  # change me!

This document describes the Quantify dataset specification, focusing on the concepts and terminology specific to the Quantify dataset. The specification builds on the Xarray dataset, so we assume basic familiarity with xarray.Dataset. If you are not familiar with it, we highly recommend first having a look at our Xarray - brief introduction for a brief overview.

Coordinates and Variables#

The Quantify dataset is an xarray dataset that follows certain conventions. We define “subtypes” of xarray coordinates and variables:

Main coordinate(s)#

  • Xarray Coordinates that have an attribute is_main_coord set to True.

  • Often correspond to physical coordinates, e.g., a signal frequency or amplitude.

  • Often correspond to quantities set through Settables.

  • The dataset must have at least one main coordinate.

    • Example: in some cases the idea of a coordinate does not apply, yet a main coordinate in the dataset is still required. A simple “index” coordinate should then be used, e.g., an array of integers (see the sketch after this list).

  • See also the method get_main_coords().
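
For instance, a minimal “index” main coordinate can be built using the QCoordAttrs dataclass documented later on this page (a sketch; the names x_index and dataset_index are illustrative, while main_dim matches the convention used in the examples below):

import numpy as np

# A sketch: an "index" main coordinate for data without a physical coordinate.
# "x_index" is an illustrative name, not part of the specification.
index_attrs = dattrs.QCoordAttrs(
    long_name="Acquisition index", unit="", is_main_coord=True
).to_dict()
coords = dict(x_index=(("main_dim",), np.arange(5), index_attrs))
dataset_index = xr.Dataset(coords=coords)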

Secondary coordinate(s)#

  • Similar to main coordinates, but intended to index secondary variables; the ubiquitous example is the coordinates of “calibration” datapoints.

  • Xarray Coordinates that have an attribute is_main_coord set to False.

  • See also get_secondary_coords().

Main variable(s)#

  • Xarray Variables that have an attribute is_main_var set to True.

  • Often correspond to a physical quantity being measured, e.g., the signal magnitude at a specific frequency measured on a metal contact of a quantum chip.

  • Often correspond to quantities returned by Gettables.

  • See also get_main_vars().

Secondary variable(s)#

  • Similar to main variables, but intended to serve as reference data for other main variables; the ubiquitous example is “calibration” datapoints (see the sketch after this list).

  • Xarray Variables that have an attribute is_main_var set to False.

  • The “assignment” of secondary variables to main variables should be done using relationships.

  • See also get_secondary_vars().
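
As a sketch (the names q0_cal and cal_dim are illustrative, not part of the specification), a secondary variable holding two calibration datapoints could be declared like this:

# A sketch: a secondary variable holding two calibration datapoints.
# "q0_cal" and "cal_dim" are illustrative names.
cal_attrs = dattrs.QVarAttrs(
    long_name="Q0 calibration points", unit="", is_main_var=False
).to_dict()
data_vars = dict(q0_cal=(("cal_dim",), [0.0, 1.0], cal_attrs))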

Note

In this document we show exemplary datasets to highlight the details of the Quantify dataset specification. For completeness, we always show a valid Quantify dataset with all the required properties.

To follow the rest of this specification more easily, have a look at the example below first. It should give you a more concrete feeling for the details that are covered afterward. See Quantify dataset - examples for an exemplary dataset.

We use the mk_two_qubit_chevron_dataset() to generate our dataset.

display_source_code(dataset_examples.mk_two_qubit_chevron_dataset)

def mk_two_qubit_chevron_dataset(**kwargs) -> xr.Dataset:
    """
    Generates a dataset that looks similar to a two-qubit Chevron experiment.

    Parameters
    ----------
    **kwargs
        Keyword arguments passed to :func:`~.mk_two_qubit_chevron_data`.

    Returns
    -------
    :
        A mock Quantify dataset.
    """
    amp_values, time_values, pop_q0, pop_q1 = mk_two_qubit_chevron_data(**kwargs)

    dims_q0 = dims_q1 = ("repetitions", "main_dim")
    pop_q0_attrs = mk_main_var_attrs(
        long_name="Population Q0", unit="", has_repetitions=True
    )
    pop_q1_attrs = mk_main_var_attrs(
        long_name="Population Q1", unit="", has_repetitions=True
    )
    data_vars = dict(
        pop_q0=(dims_q0, pop_q0, pop_q0_attrs),
        pop_q1=(dims_q1, pop_q1, pop_q1_attrs),
    )

    dims_amp = dims_time = ("main_dim",)
    amp_attrs = mk_main_coord_attrs(long_name="Amplitude", unit="V")
    time_attrs = mk_main_coord_attrs(long_name="Time", unit="s")
    coords = dict(
        amp=(dims_amp, amp_values, amp_attrs),
        time=(dims_time, time_values, time_attrs),
    )

    dataset_attrs = mk_dataset_attrs()
    dataset = xr.Dataset(data_vars=data_vars, coords=coords, attrs=dataset_attrs)

    return dataset

dataset = dataset_examples.mk_two_qubit_chevron_dataset()
assert dataset == round_trip_dataset(dataset)  # confirm read/write

2D example#

In the dataset below we have two main coordinates, amp and time, and two main variables, pop_q0 and pop_q1. Both main coordinates “lie” along a single xarray dimension, main_dim. Both main variables lie along the two xarray dimensions main_dim and repetitions.

dataset

<xarray.Dataset> Size: 115kB
Dimensions:  (repetitions: 5, main_dim: 1200)
Coordinates:
    amp      (main_dim) float64 10kB 0.45 0.4534 0.4569 ... 0.5431 0.5466 0.55
    time     (main_dim) float64 10kB 0.0 0.0 0.0 0.0 ... 1e-07 1e-07 1e-07 1e-07
Dimensions without coordinates: repetitions, main_dim
Data variables:
    pop_q0   (repetitions, main_dim) float64 48kB 0.5 0.5 0.5 ... 0.4818 0.5
    pop_q1   (repetitions, main_dim) float64 48kB 0.5 0.5 0.5 ... 0.5371 0.5
Attributes:
    tuid:                      20240531-040742-392-9cf684
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []

Please note how the underlying arrays for the coordinates are structured! Even for “gridded” data, the coordinates are arranged in arrays that match the dimensions of the variables in the dataset. This is done so that the data can support more complex scenarios, such as irregularly spaced samples and measurements taken at unknown locations.

n_points = 110  # only plot a few points for clarity
_, axs = plt.subplots(4, 1, sharex=True, figsize=(10, 10))
dataset.amp[:n_points].plot(
    ax=axs[0], marker=".", color="C0", label=dataset.amp.long_name
)
dataset.time[:n_points].plot(
    ax=axs[1], marker=".", color="C1", label=dataset.time.long_name
)
_ = dataset.pop_q0.sel(repetitions=0)[:n_points].plot(
    ax=axs[2], marker=".", color="C2", label=dataset.pop_q0.long_name
)
_ = dataset.pop_q1.sel(repetitions=0)[:n_points].plot(
    ax=axs[3], marker=".", color="C3", label=dataset.pop_q1.long_name
)
for ax in axs:
    ax.legend()
    ax.grid()

[Figure: amp, time, pop_q0 and pop_q1 plotted against the main_dim index]

As seen above, in the Quantify dataset the main coordinates do not explicitly index the main variables because not all use-cases fit within this paradigm. However, when possible, the Quantify dataset can be reshaped to take advantage of the xarray built-in utilities.

dataset_gridded = dh.to_gridded_dataset(
    dataset,
    dimension="main_dim",
    coords_names=dattrs.get_main_coords(dataset),
)
dataset_gridded.pop_q0.plot.pcolormesh(x="amp", col="repetitions")
_ = dataset_gridded.pop_q1.plot.pcolormesh(x="amp", col="repetitions")

[Figure: pop_q0 pcolormesh, faceted over repetitions]

[Figure: pop_q1 pcolormesh, faceted over repetitions]

Dimensions#

The main variables and coordinates present in a Quantify dataset have the following required and optional xarray dimensions:

Main dimension(s) [Required]#

The main dimensions comply with the following:

  • The outermost dimension of any main coordinate/variable, OR the second outermost dimension if the outermost one is a repetitions dimension.

  • They do not need to be explicitly specified in any metadata attributes; instead, utilities for extracting them are provided. See get_main_dims(), which simply applies the rule above while inspecting all the main coordinates and variables present in the dataset (an example follows this list).

  • The dataset must have at least one main dimension.
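
Applying this rule to the chevron dataset from above should yield its single main dimension (a minimal sketch):

# A sketch: extract the main dimension(s) by inspecting all main
# coordinates and variables; for the chevron dataset this should
# return ["main_dim"].
dattrs.get_main_dims(dataset)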

Note on nesting main dimensions

Nesting main dimensions is allowed in principle, and such examples are provided, but it should be considered an experimental feature.

Secondary dimension(s) [Optional]#

Equivalent to the main dimensions but used by the secondary coordinates and variables. The secondary dimensions comply with the following:

  • The outermost dimension of any secondary coordinate/variable, OR the second outermost dimension if the outermost one is a repetitions dimension.

  • They do not need to be explicitly specified in any metadata attributes; instead, utilities for extracting them are provided. See get_secondary_dims(), which simply applies the rule above while inspecting all the secondary coordinates and variables present in the dataset.

Repetitions dimension(s) [Optional]#

Repetition dimensions comply with the following:

  • Any dimension that is the outermost dimension of a main or secondary variable when its attribute QVarAttrs.has_repetitions is set to True.

  • Intuition for this xarray dimension(s): the equivalent would be to have dataset_repetition_0.hdf5, dataset_repetition_1.hdf5, etc., where each dataset was obtained by repeating exactly the same experiment. Instead, we define an outer dimension for this.

  • Default behavior of (live) plotting and analysis tools can be to average the main variables along the repetitions dimension(s).

  • Can be the outermost dimension of the main (and secondary) variables.

  • A variable can lie along at most one repetitions dimension, which must then be its outermost dimension.

Example datasets with repetition#

As shown in the Xarray - brief introduction, an xarray dimension can be indexed by a coordinate variable. In this example, the repetitions dimension is indexed by the repetitions xarray coordinate. Note that in an xarray dataset a dimension and a data variable or coordinate can share the same name; this might be confusing at first, but it takes just a bit of dataset manipulation practice to gain an intuition for how it works.

coord_dims = ("repetitions",)
coord_values = ["A", "B", "C", "D", "E"]
dataset_indexed_rep = xr.Dataset(coords=dict(repetitions=(coord_dims, coord_values)))

dataset_indexed_rep

<xarray.Dataset> Size: 20B
Dimensions:      (repetitions: 5)
Coordinates:
  * repetitions  (repetitions) <U1 20B 'A' 'B' 'C' 'D' 'E'
Data variables:
    *empty*

# merge with the previous dataset
dataset_rep = dataset.merge(dataset_indexed_rep, combine_attrs="drop_conflicts")

assert dataset_rep == round_trip_dataset(dataset_rep)  # confirm read/write

dataset_rep

<xarray.Dataset> Size: 115kB
Dimensions:      (repetitions: 5, main_dim: 1200)
Coordinates:
    amp          (main_dim) float64 10kB 0.45 0.4534 0.4569 ... 0.5466 0.55
    time         (main_dim) float64 10kB 0.0 0.0 0.0 0.0 ... 1e-07 1e-07 1e-07
  * repetitions  (repetitions) <U1 20B 'A' 'B' 'C' 'D' 'E'
Dimensions without coordinates: main_dim
Data variables:
    pop_q0       (repetitions, main_dim) float64 48kB 0.5 0.5 0.5 ... 0.4818 0.5
    pop_q1       (repetitions, main_dim) float64 48kB 0.5 0.5 0.5 ... 0.5371 0.5
Attributes:
    tuid:                      20240531-040742-392-9cf684
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []

And as before, we can reshape the dataset to take advantage of the xarray built-in utilities.

dataset_gridded = dh.to_gridded_dataset(
    dataset_rep,
    dimension="main_dim",
    coords_names=dattrs.get_main_coords(dataset),
)
dataset_gridded

<xarray.Dataset> Size: 97kB
Dimensions:      (amp: 30, time: 40, repetitions: 5)
Coordinates:
  * amp          (amp) float64 240B 0.45 0.4534 0.4569 ... 0.5431 0.5466 0.55
  * time         (time) float64 320B 0.0 2.564e-09 5.128e-09 ... 9.744e-08 1e-07
  * repetitions  (repetitions) <U1 20B 'A' 'B' 'C' 'D' 'E'
Data variables:
    pop_q0       (repetitions, amp, time) float64 48kB 0.5 0.5 0.5 ... 0.5 0.5
    pop_q1       (repetitions, amp, time) float64 48kB 0.5 0.5 0.5 ... 0.5 0.5
Attributes:
    tuid:                      20240531-040742-392-9cf684
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []

It is now possible to retrieve (select) a specific entry along the repetitions dimension:

_ = dataset_gridded.pop_q0.sel(repetitions="A").plot(x="amp")
plt.show()
_ = dataset_gridded.pop_q0.sel(repetitions="D").plot(x="amp")

[Figure: pop_q0 for repetition “A”]

[Figure: pop_q0 for repetition “D”]
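
As mentioned earlier, a common default for (live) plotting and analysis tools is to average the main variables along the repetitions dimension. A minimal sketch:

# A sketch: average out the repetitions dimension before further analysis.
pop_q0_avg = dataset_gridded.pop_q0.mean(dim="repetitions")
_ = pop_q0_avg.plot(x="amp")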

Dataset attributes#

The required attributes of the Quantify dataset are defined by the following dataclass. It can be used to generate a default dictionary that is attached to a dataset under the xarray.Dataset.attrs attribute.

class QDatasetAttrs(tuid=None, dataset_name='', dataset_state=None, timestamp_start=None, timestamp_end=None, quantify_dataset_version='2.0.0', software_versions=<factory>, relationships=<factory>, json_serialize_exclude=<factory>)[source]

Bases: DataClassJsonMixin

A dataclass representing the attrs attribute of the Quantify dataset.

dataset_name: str = ''

The dataset name, usually the same as the experiment name included in the name of the experiment container.

dataset_state: Literal[None, 'running', 'interrupted (safety)', 'interrupted (forced)', 'done'] = None

Denotes the last known state of the experiment/data acquisition that served to ‘build’ this dataset. Can be used later to filter ‘bad’ datasets.

json_serialize_exclude: List[str]

A list of strings corresponding to the names of other attributes that should not be json-serialized when writing the dataset to disk. Empty by default.

quantify_dataset_version: str = '2.0.0'

A string identifying the version of this Quantify dataset for backwards compatibility.

relationships: List[QDatasetIntraRelationship]

A list of relationships within the dataset, specified as a list of dictionaries that comply with the QDatasetIntraRelationship.

software_versions: Dict[str, str]

A mapping of other software packages, and their versions, that are relevant to log for this dataset. Another example entry is the git tag or commit hash of a lab repository.

timestamp_end: Union[str, None] = None

Human-readable timestamp (ISO8601) as returned by datetime.datetime.now().astimezone().isoformat(). Specifies when the experiment/data acquisition ended.

timestamp_start: Union[str, None] = None

Human-readable timestamp (ISO8601) as returned by datetime.datetime.now().astimezone().isoformat(). Specifies when the experiment/data acquisition started.

tuid: Union[str, None] = None

The time-based unique identifier of the dataset. See quantify_core.data.types.TUID.

Additionally, in order to express relationships between coordinates and/or variables, the following template is provided:

class QDatasetIntraRelationship(item_name=None, relation_type=None, related_names=<factory>, relation_metadata=<factory>)[source]

Bases: DataClassJsonMixin

A dataclass representing a dictionary that specifies a relationship between dataset variables.

A prominent example is calibration points contained within one or several variables, which are necessary to correctly interpret the data of another variable.

item_name: str | None = None

The name of the coordinate/variable to which we want to relate other coordinates/variables.

related_names: List[str]

A list of names related to the item_name.

relation_metadata: Dict[str, Any]

A free-form dictionary to store additional information relevant to this relationship.

relation_type: str | None = None

A string specifying the type of relationship.

Reserved relation types:

"calibration" - Specifies a list of main variables used as calibration data for the main variables whose name is specified by the item_name.

from quantify_core.data.dataset_attrs import QDatasetAttrs

# tip: to_json, from_dict and from_json are also available
dataset.attrs = QDatasetAttrs().to_dict()
dataset.attrs

{
    'tuid': None,
    'dataset_name': '',
    'dataset_state': None,
    'timestamp_start': None,
    'timestamp_end': None,
    'quantify_dataset_version': '2.0.0',
    'software_versions': {},
    'relationships': [],
    'json_serialize_exclude': []
}

Note that xarray automatically exposes the entries of the dataset attributes as Python attributes, and similarly for the xarray coordinates and data variables.

dataset.quantify_dataset_version, dataset.tuid

('2.0.0', None)

Main coordinates and variables attributes#

Similar to the dataset attributes (xarray.Dataset.attrs), the main coordinates and variables each have their own required attributes attached to them as a dictionary under the xarray.DataArray.attrs attribute.

class QCoordAttrs(unit='', long_name='', is_main_coord=None, uniformly_spaced=None, is_dataset_ref=False, json_serialize_exclude=<factory>)[source]

Bases: DataClassJsonMixin

A dataclass representing the required attrs attribute of main and secondary coordinates.

is_dataset_ref: bool = False

Flags if it is an array of quantify_core.data.types.TUIDs of other datasets.

is_main_coord: bool | None = None

When set to True, flags the xarray coordinate to correspond to a main coordinate, otherwise (False) it corresponds to a secondary coordinate.

json_serialize_exclude: List[str]

A list of strings corresponding to the names of other attributes that should not be json-serialized when writing the dataset to disk. Empty by default.

long_name: str = ''

A long name for this coordinate.

uniformly_spaced: bool | None = None

Indicates if the values are uniformly spaced.

unit: str = ''

The units of the values.

dataset.amp.attrs

{
    'unit': 'V',
    'long_name': 'Amplitude',
    'is_main_coord': True,
    'uniformly_spaced': True,
    'is_dataset_ref': False,
    'json_serialize_exclude': []
}
class QVarAttrs(unit='', long_name='', is_main_var=None, uniformly_spaced=None, grid=None, is_dataset_ref=False, has_repetitions=False, json_serialize_exclude=<factory>)[source]

Bases: DataClassJsonMixin

A dataclass representing the required attrs attribute of main and secondary variables.

grid: bool | None = None

Indicates if the variable’s data are located on a grid, which does not need to be uniformly spaced along all dimensions. In other words, specifies if the corresponding main coordinates are the ‘unrolled’ points (also known as ‘unstacked’) corresponding to a grid.

If True then it is possible to use quantify_core.data.handling.to_gridded_dataset() to convert the variables to a ‘stacked’ version.

has_repetitions: bool = False

Indicates that the outermost dimension of this variable is a repetitions dimension. This attribute is intended to allow easy programmatic detection of such dimension. It can be used, for example, to average along this dimension before an automatic live plotting or analysis.

is_dataset_ref: bool = False

Flags if it is an array of quantify_core.data.types.TUIDs of other datasets. See also Dataset for a “nested MeasurementControl” experiment.

is_main_var: bool | None = None

When set to True, flags this xarray data variable to correspond to a main variable, otherwise (False) it corresponds to a secondary variable.

json_serialize_exclude: List[str]

A list of strings corresponding to the names of other attributes that should not be json-serialized when writing the dataset to disk. Empty by default.

long_name: str = ''

A long name for this variable.

uniformly_spaced: bool | None = None

Indicates if the values are uniformly spaced. This does not apply to ‘true’ main variables but, because a MultiIndex is not supported yet by xarray when writing to disk, some coordinate variables have to be stored as main variables instead.

unit: str = ''

The units of the values.

dataset.pop_q0.attrs

{
    'unit': '',
    'long_name': 'Population Q0',
    'is_main_var': True,
    'uniformly_spaced': True,
    'grid': True,
    'is_dataset_ref': False,
    'has_repetitions': True,
    'json_serialize_exclude': []
}

Storage format#

The Quantify dataset is written to disk and loaded back using xarray-supported facilities. Internally, we write to and load from disk using:

display_source_code(dh.write_dataset)
display_source_code(dh.load_dataset)

def write_dataset(path: Path | str, dataset: xr.Dataset) -> None:
    """Writes a :class:`~xarray.Dataset` to a file with the `h5netcdf` engine.

    Before writing the
    :meth:`~quantify_core.data.dataset_adapters.AdapterH5NetCDF.adapt`
    is applied.

    To accommodate complex-type numbers and arrays, ``invalid_netcdf=True`` is used.

    Parameters
    ----------
    path
        Path to the file including filename and extension
    dataset
        The :class:`~xarray.Dataset` to be written to file.
    """  # pylint: disable=line-too-long
    _xarray_numpy_bool_patch(dataset)  # See issue #161 in quantify-core
    # Only quantify_dataset_version>=2.0.0 requires the adapter
    if "quantify_dataset_version" in dataset.attrs:
        dataset = da.AdapterH5NetCDF.adapt(dataset)
    dataset.to_netcdf(path, engine="h5netcdf", invalid_netcdf=True)

def load_dataset(
    tuid: TUID,
    datadir: Path | str | None = None,
    name: str = DATASET_NAME,
) -> xr.Dataset:
    """Loads a dataset specified by a tuid.

    .. tip::

        This method also works when specifying only the first part of a
        :class:`~quantify_core.data.types.TUID`.

    .. note::

        This method uses :func:`~.load_dataset_from_path` to ensure the file is closed after
        loading as datasets are intended to be immutable after performing the initial
        experiment.

    Parameters
    ----------
    tuid
        A :class:`~quantify_core.data.types.TUID` string. It is also possible to specify
        only the first part of a tuid.
    datadir
        Path of the data directory. If ``None``, uses :meth:`~get_datadir` to determine
        the data directory.
    name
        Name of the dataset.

    Returns
    -------
    :
        The dataset.

    Raises
    ------
    FileNotFoundError
        No data found for specified date.
    """
    return load_dataset_from_path(_locate_experiment_file(tuid, datadir, name))

Note that we use the h5netcdf engine, which is more permissive than the default NetCDF engine, in order to accommodate arrays of complex numbers.
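
A minimal write/load round trip then looks as follows (a sketch; the file path is illustrative, and load_dataset_from_path is the helper used internally by load_dataset in the source above):

# A sketch: write the dataset to disk and load it back from an explicit path.
file_path = Path.home() / "quantify-data" / "my_dataset.hdf5"  # illustrative path
dh.write_dataset(file_path, dataset)
dataset_loaded = dh.load_dataset_from_path(file_path)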

Note

Furthermore, in order to support a variety of attribute types (e.g. the None type) and shapes (e.g. nested dictionaries) in a seamless dataset round trip, some additional tooling is required. See the source code below, which implements the two-way conversion adapter used by the functions shown above.

display_source_code(dadapters.AdapterH5NetCDF)

class AdapterH5NetCDF(DatasetAdapterBase):
    """
    Quantify dataset adapter for the ``h5netcdf`` engine.

    It has the functionality of adapting the Quantify dataset to a format compatible
    with the ``h5netcdf`` xarray backend engine that is used to write and load the
    dataset to/from disk.

    .. warning::

        The ``h5netcdf`` engine has minor issues when performing a two-way trip of the
        dataset. The ``type`` of some attributes is not preserved. E.g., list- and
        tuple-like objects are loaded as numpy arrays of ``dtype=object``.
    """

    @classmethod
    def adapt(cls, dataset: xr.Dataset) -> xr.Dataset:
        """
        Serializes to JSON the dataset and variables attributes.

        To prevent the JSON serialization for specific items, their names should be
        listed under the attribute named ``json_serialize_exclude`` (for each ``attrs``
        dictionary).

        Parameters
        ----------
        dataset
            Dataset that needs to be adapted.

        Returns
        -------
        :
            Dataset in which the attributes have been replaced with their JSON strings
            version.
        """

        return cls._transform(dataset, vals_converter=json.dumps)

    @classmethod
    def recover(cls, dataset: xr.Dataset) -> xr.Dataset:
        """
        Reverts the action of ``.adapt()``.

        To prevent the JSON de-serialization for specific items, their names should be
        listed under the attribute named ``json_serialize_exclude``
        (for each ``attrs`` dictionary).

        Parameters
        ----------
        dataset
            Dataset from which to recover the original format.

        Returns
        -------
        :
            Dataset in which the attributes have been replaced with their python objects
            version.
        """

        return cls._transform(dataset, vals_converter=json.loads)

    @staticmethod
    def attrs_convert(
        attrs: dict,
        inplace: bool = False,
        vals_converter: Callable[Any, Any] = json.dumps,
    ) -> dict:
        """
        Converts to/from JSON string the values of the keys which are not listed in the
        ``json_serialize_exclude`` list.

        Parameters
        ----------
        attrs
            The input dictionary.
        inplace
            If ``True`` the values are replaced in place, otherwise a deepcopy of
            ``attrs`` is performed first.
        """
        json_serialize_exclude = attrs.get("json_serialize_exclude", [])

        attrs = attrs if inplace else deepcopy(attrs)
        for attr_name, attr_val in attrs.items():
            if attr_name not in json_serialize_exclude:
                attrs[attr_name] = vals_converter(attr_val)
        return attrs

    @classmethod
    def _transform(
        cls, dataset: xr.Dataset, vals_converter: Callable[Any, Any] = json.dumps
    ) -> xr.Dataset:
        dataset = xr.Dataset(
            dataset,
            attrs=cls.attrs_convert(
                dataset.attrs, inplace=False, vals_converter=vals_converter
            ),
        )

        for var_name in dataset.variables.keys():
            # The new dataset generated above has already a deepcopy of the attributes.
            _ = cls.attrs_convert(
                dataset[var_name].attrs, inplace=True, vals_converter=vals_converter
            )

        return dataset
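
For instance, after adapt() the attribute values are JSON strings, and recover() restores the original Python objects (a minimal sketch):

# A sketch: round trip the dataset attributes through the adapter.
adapted = dadapters.AdapterH5NetCDF.adapt(dataset)
print(type(adapted.attrs["relationships"]))  # str: a JSON string after adapt()

recovered = dadapters.AdapterH5NetCDF.recover(adapted)
print(type(recovered.attrs["relationships"]))  # list again after recover()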