Quantify dataset - advanced examples

See also

The complete source code of this tutorial can be found in

Quantify dataset - advanced examples.py.ipynb

Quantify dataset - advanced examples.py.py

Here we explore a few advanced use cases of the Quantify dataset and show how the dataset format accommodates them.

Dataset for an “unstructured” experiment

Consider a Surface Code experiment, in particular the one portrayed in Fig. 4b of a paper from the DiCarlo Lab [Marques et al., 2021].

For simplicity, we do not use exactly the same schedule, since what matters here are the measurements. The results of these measurements are difficult to deal with because a few repeating cycles are followed by a final measurement, which leaves the overall dataset “unstructured”.

[Figure: schedule of the simplified Surface-7 experiment]

How do we store all the shots of this measurement? We might want to do so because, e.g., we know there is an issue with leakage to the second excited state of a transmon and we would like to store and inspect the raw data.

To support such a use case, the dataset has one dimension for the repeating cycles and an extra dimension for the final measurement.
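A minimal sketch of this layout in plain xarray, assuming mock complex values in place of measured IQ shots. The names `A0_shots`, `D0_shots`, `dim_cycle` and `dim_final` mirror the example that follows; the sizes and random values are illustrative only. The ancilla variable spans the cycle dimension, the data-qubit variable spans the final-measurement dimension, and both share `repetitions`:

```python
import numpy as np
import xarray as xr

# Illustrative sizes only (real experiments usually record >~1000 shots)
num_shots, num_cycles = 128, 3
rng = np.random.default_rng(seed=0)


def mock_iq(shape):
    # Mock complex IQ values standing in for measured shots
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)


dataset_sketch = xr.Dataset(
    data_vars={
        # Ancilla qubit: one shot per repetition and per repeating cycle
        "A0_shots": (("repetitions", "dim_cycle"), mock_iq((num_shots, num_cycles))),
        # Data qubit: measured only once, in the final measurement
        "D0_shots": (("repetitions", "dim_final"), mock_iq((num_shots, 1))),
    },
    coords={
        "cycle": ("dim_cycle", np.arange(num_cycles)),
        "final_msmt": ("dim_final", [0]),
    },
)
```

Because each variable declares its own dimensions, the two measurement types coexist in one dataset without padding the final measurement to the number of cycles.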

# mock data parameters
num_shots = 128  # NB usually >~1000 in real experiments
ground = -0.2 + 0.65j
excited = 0.7 + 4j
centroids = ground, excited
sigmas = [0.1] * 2

display_source_code(mk_iq_shots)
display_source_code(mk_shots_from_probabilities)
display_source_code(mk_surface7_cyles_dataset)
def mk_iq_shots(
    num_shots: int = 128,
    sigmas: Union[Tuple[float], np.ndarray] = (0.1, 0.1),
    centers: Union[Tuple[complex], np.ndarray] = (-0.2 + 0.65j, 0.7 + 4j),
    probabilities: Union[Tuple[float], np.ndarray] = (0.4, 0.6),
    seed: Union[int, None] = 112233,
) -> np.ndarray:
    """
    Generates clusters of (I + 1j*Q) points with a Gaussian distribution with the
    specified sigmas and centers according to the probabilities of each cluster

    .. admonition:: Examples
        :class: dropdown

        .. include:: examples/utilities.examples_support.mk_iq_shots.py.rst.txt

    Parameters
    ----------
    num_shots
        The number of shots to generate.
    sigmas
        The sigma of the Gaussian distribution of each cluster, used for both the
        real and imaginary parts.
    centers
        The center of each cluster on the complex plane.
    probabilities
        The probabilities of each cluster being randomly selected for each shot.
    seed
        Random number generator seed passed to ``numpy.random.default_rng``.
    """
    if not len(sigmas) == len(centers) == len(probabilities):
        raise ValueError(
            f"Incorrect input. sigmas={sigmas}, centers={centers} and "
            f"probabilities={probabilities} must have the same length."
        )

    rng = np.random.default_rng(seed=seed)

    cluster_indices = tuple(range(len(centers)))
    choices = rng.choice(a=cluster_indices, size=num_shots, p=probabilities)

    shots = []
    for idx in cluster_indices:
        num_shots_this_cluster = np.sum(choices == idx)
        i_data = rng.normal(
            loc=centers[idx].real,
            scale=sigmas[idx],
            size=num_shots_this_cluster,
        )
        q_data = rng.normal(
            loc=centers[idx].imag,
            scale=sigmas[idx],
            size=num_shots_this_cluster,
        )
        shots.append(i_data + 1j * q_data)
    return np.concatenate(shots)
def mk_shots_from_probabilities(probabilities: Union[np.ndarray, list], **kwargs):
    """Generates multiple shots for a list of probabilities assuming two states.

    Parameters
    ----------
    probabilities
        The list/array of the probabilities of one of the states.
    **kwargs
        Keyword arguments passed to
        :func:`~quantify_core.utilities.examples_support.mk_iq_shots`.

    Returns
    -------
    :
        Array containing the shots. Shape: (num_shots, len(probabilities)).
    """

    shots = np.array(
        tuple(
            mk_iq_shots(probabilities=[prob, 1 - prob], **kwargs)
            for prob in probabilities
        )
    ).T

    return shots
def mk_surface7_cyles_dataset(num_cycles: int = 3, **kwargs) -> xr.Dataset:
    """
    See also :func:`quantify_core.utilities.examples_support.mk_surface7_sched`.

    Parameters
    ----------
    num_cycles
        The number of repeating cycles before the final measurement.
    **kwargs
        Keyword arguments passed to :func:`~.mk_shots_from_probabilities`.
    """

    cycles = range(num_cycles)

    mock_data = mk_shots_from_probabilities(
        probabilities=[np.random.random() for _ in cycles], **kwargs
    )

    mock_data_final = mk_shots_from_probabilities(
        probabilities=[np.random.random()], **kwargs
    )

    data_vars = {}

    # NB same random data is used for all qubits only for the simplicity of the mock!
    for qubit in (f"A{i}" for i in range(3)):
        data_vars[f"{qubit}_shots"] = (
            ("repetitions", "dim_cycle"),
            mock_data,
            mk_main_var_attrs(
                unit="V", long_name=f"IQ amplitude {qubit}", has_repetitions=True
            ),
        )

    for qubit in (f"D{i}" for i in range(4)):
        data_vars[f"{qubit}_shots"] = (
            ("repetitions", "dim_final"),
            mock_data_final,
            mk_main_var_attrs(
                unit="V", long_name=f"IQ amplitude {qubit}", has_repetitions=True
            ),
        )

    cycle_attrs = mk_main_coord_attrs(long_name="Surface code cycle number")
    final_msmt_attrs = mk_main_coord_attrs(long_name="Final measurement")
    coords = dict(
        cycle=("dim_cycle", cycles, cycle_attrs),
        final_msmt=("dim_final", [0], final_msmt_attrs),
    )

    dataset = xr.Dataset(
        data_vars=data_vars,
        coords=coords,
        attrs=mk_dataset_attrs(),
    )

    return dataset
dataset = mk_surface7_cyles_dataset(
    num_shots=num_shots, sigmas=sigmas, centers=centroids
)

assert dataset.equals(round_trip_dataset(dataset))  # confirm read/write

dataset
<xarray.Dataset>
Dimensions:     (repetitions: 128, dim_cycle: 3, dim_final: 1)
Coordinates:
    cycle       (dim_cycle) int64 0 1 2
    final_msmt  (dim_final) int64 0
Dimensions without coordinates: repetitions, dim_cycle, dim_final
Data variables:
    A0_shots    (repetitions, dim_cycle) complex128 (-0.23630343679164473+0.5...
    A1_shots    (repetitions, dim_cycle) complex128 (-0.23630343679164473+0.5...
    A2_shots    (repetitions, dim_cycle) complex128 (-0.23630343679164473+0.5...
    D0_shots    (repetitions, dim_final) complex128 (-0.23630343679164473+0.5...
    D1_shots    (repetitions, dim_final) complex128 (-0.23630343679164473+0.5...
    D2_shots    (repetitions, dim_final) complex128 (-0.23630343679164473+0.5...
    D3_shots    (repetitions, dim_final) complex128 (-0.23630343679164473+0.5...
Attributes:
    tuid:                      20211101-201107-743-41c27e
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []
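When confirming that a read/write round trip preserved a dataset, it helps to keep in mind how xarray compares datasets. In the xarray versions we are familiar with, `==` compares element-wise and returns another `Dataset`, and the truth value of a `Dataset` only reflects whether it has any data variables, so `assert ds_a == ds_b` does not actually test equality. The sketch below uses toy datasets (not Quantify data) and `Dataset.equals`, which returns a single boolean; `Dataset.identical` additionally compares attributes.

```python
import numpy as np
import xarray as xr

# Two toy datasets that differ in a single value
ds_a = xr.Dataset({"x": ("dim_0", np.array([1.0, 2.0, 3.0]))})
ds_b = xr.Dataset({"x": ("dim_0", np.array([99.0, 2.0, 3.0]))})

# Element-wise comparison: the result is a Dataset of booleans, not a bool
elementwise = ds_a == ds_b

# equals() compares dimensions, coordinates and values and returns a bool
print(ds_a.equals(ds_b))  # False
print(ds_a.equals(ds_a.copy(deep=True)))  # True
```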
dataset.A1_shots.shape, dataset.D1_shots.shape
((128, 3), (128, 1))
dataset_gridded = dh.to_gridded_dataset(
    dataset, dimension="dim_cycle", coords_names=["cycle"]
)
dataset_gridded = dh.to_gridded_dataset(
    dataset_gridded, dimension="dim_final", coords_names=["final_msmt"]
)
dataset_gridded
<xarray.Dataset>
Dimensions:     (final_msmt: 1, cycle: 3, repetitions: 128)
Coordinates:
  * final_msmt  (final_msmt) int64 0
  * cycle       (cycle) int64 0 1 2
Dimensions without coordinates: repetitions
Data variables:
    A0_shots    (repetitions, cycle) complex128 (-0.23630343679164473+0.57959...
    A1_shots    (repetitions, cycle) complex128 (-0.23630343679164473+0.57959...
    A2_shots    (repetitions, cycle) complex128 (-0.23630343679164473+0.57959...
    D0_shots    (repetitions, final_msmt) complex128 (-0.23630343679164473+0....
    D1_shots    (repetitions, final_msmt) complex128 (-0.23630343679164473+0....
    D2_shots    (repetitions, final_msmt) complex128 (-0.23630343679164473+0....
    D3_shots    (repetitions, final_msmt) complex128 (-0.23630343679164473+0....
Attributes:
    tuid:                      20211101-201107-743-41c27e
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []
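For a single coordinate per dimension, we believe the effect of `dh.to_gridded_dataset` in this example can be approximated in plain xarray with `swap_dims`, which promotes a coordinate to be the dimension (and index) of the data. The sketch below is an illustration under that assumption, using a tiny mock dataset; it is not the implementation of `dh.to_gridded_dataset`, which also handles several coordinates per dimension.

```python
import numpy as np
import xarray as xr

# Mock shots: 4 repetitions x 3 cycles, indexed by a non-dimension coordinate
shots = np.zeros((4, 3), dtype=complex)
ds = xr.Dataset(
    {"A0_shots": (("repetitions", "dim_cycle"), shots)},
    coords={"cycle": ("dim_cycle", [0, 1, 2])},
)

# Replace the dimension `dim_cycle` by the coordinate `cycle`
ds_gridded = ds.swap_dims({"dim_cycle": "cycle"})

# Label-based selection now works directly on the cycle number
last_cycle = ds_gridded.A0_shots.sel(cycle=2)
```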
dataset_gridded.A0_shots.real.mean("repetitions").plot(marker="o", label="I-quadrature")
dataset_gridded.A0_shots.imag.mean("repetitions").plot(marker="^", label="Q-quadrature")
_ = plt.gca().legend()
[Figure: mean I- and Q-quadrature of A0_shots versus cycle number]
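Averaging over repetitions discards the per-shot information that motivated storing the raw data. As an illustration of what the raw shots allow, the sketch below assigns each shot to the nearer of the two mock centroids and computes the excited-state fraction per cycle. Nearest-centroid assignment is a simplifying assumption made here; inspecting leakage would, for instance, require a third centroid for the second excited state. The shots are freshly generated mock data, all drawn around the excited centroid.

```python
import numpy as np
import xarray as xr

ground, excited = -0.2 + 0.65j, 0.7 + 4j  # same mock centroids as above
rng = np.random.default_rng(seed=1)

# Mock shots: 100 repetitions x 3 cycles, all scattered around `excited`
noise = rng.normal(size=(100, 3)) + 1j * rng.normal(size=(100, 3))
shots = xr.DataArray(
    excited + 0.1 * noise,
    dims=("repetitions", "cycle"),
    coords={"cycle": [0, 1, 2]},
)

# True where a shot is closer to the excited centroid than to the ground one
is_excited = abs(shots - excited) < abs(shots - ground)

# Fraction of shots assigned to the excited state, per cycle
excited_population = is_excited.mean("repetitions")
```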

Dataset for a “nested MeasurementControl” experiment

Now consider a dataset constructed by an experiment involving two MeasurementControl objects. The second one performs a “meta” outer loop in which we sweep a flux bias and then perform several experiments to characterize a transmon qubit, e.g. determining the frequency of a readout resonator, the frequency of the transmon, and its T1 lifetime.

Below we show what the data from the datasets containing the T1 experiment results could look like:

fig, ax = plt.subplots()
rng = np.random.default_rng(seed=112244)  # random number generator

num_t1_datasets = 7
t1_times = np.linspace(0, 120e-6, 30)

for tau in rng.uniform(10e-6, 50e-6, num_t1_datasets):
    probabilities = exp_decay_func(
        t=t1_times, tau=tau, offset=0, n_factor=1, amplitude=1
    )
    dataset = dataset_examples.mk_t1_av_with_cal_dataset(t1_times, probabilities)

    round_trip_dataset(dataset)  # confirm read/write
    dataset_g = dh.to_gridded_dataset(
        dataset, dimension="main_dim", coords_names=["t1_time"]
    )
    # rotate the iq data
    rotated_and_normalized = rotate_to_calibrated_axis(
        dataset_g.q0_iq_av.values, *dataset_g.q0_iq_av_cal.values
    )
    # copy of q0_iq_av (dims, coords, attrs) holding the rotated values
    rotated_and_normalized_da = dataset_g.q0_iq_av.copy(data=rotated_and_normalized)
    rotated_and_normalized_da.attrs["long_name"] = "|1> Population"
    rotated_and_normalized_da.attrs["units"] = ""
    rotated_and_normalized_da.real.plot(ax=ax, label=dataset.tuid, marker=".")
ax.set_title("Results from repeated T1 experiments\n(different datasets)")
_ = ax.legend()
[Figure: Results from repeated T1 experiments (different datasets)]
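To illustrate what rotating and normalizing to the calibration points amounts to, here is a minimal sketch that projects complex IQ points onto the axis through two calibration points, scaled so that the first maps to 0 and the second to 1. The helper `project_on_cal_axis` is hypothetical and is not the implementation of `rotate_to_calibrated_axis` used above (of whose result the plotting code takes the real part); the calibration values simply reuse the mock centroids from the first example.

```python
import numpy as np


def project_on_cal_axis(data, ref_0, ref_1):
    """Hypothetical helper: project complex IQ points onto the line through two
    calibration points, scaled so that ref_0 maps to 0 and ref_1 maps to 1."""
    axis = ref_1 - ref_0
    # Re(conj(axis) * v) is the dot product of the 2D vectors `axis` and `v`
    return np.real(np.conj(axis) * (data - ref_0)) / abs(axis) ** 2


ref_0, ref_1 = -0.2 + 0.65j, 0.7 + 4j  # mock |0> and |1> calibration points
data = np.array([ref_0, ref_1, 0.5 * (ref_0 + ref_1)])
population = project_on_cal_axis(data, ref_0, ref_1)  # approximately [0.0, 1.0, 0.5]
```

Projecting onto the calibration axis discards the orthogonal component, which mostly contains noise when the two calibration points are well separated.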

Since the raw data is now split among several datasets, we would like to keep a reference to all these datasets in our “combined” dataset. Below we show how this can be achieved, along with some useful xarray features and known limitations.

We start by generating a mock dataset that combines all the information that would have been obtained from analyzing a series of other datasets.

display_source_code(mk_nested_mc_dataset)
def mk_nested_mc_dataset(
    num_points: int = 12,
    flux_bias_min_max: tuple = (-0.04, 0.04),
    resonator_freqs_min_max: tuple = (7e9, 7.3e9),
    qubit_freqs_min_max: tuple = (4.5e9, 5.0e9),
    t1_values_min_max: tuple = (20e-6, 50e-6),
    seed: Optional[int] = 112233,
) -> xr.Dataset:
    """
    Generates a dataset with dataset references and several coordinates that serve to
    index the same variables.

    Note that each value of ``resonator_freqs``, ``qubit_freqs`` and ``t1_values``
    would have been extracted from another dataset, corresponding to an individual
    experiment.

    Parameters
    ----------
    num_points
        Number of datapoints to generate (used for all variables/coordinates).
    flux_bias_min_max
        Range for mock values.
    resonator_freqs_min_max
        Range for mock values.
    qubit_freqs_min_max
        Range for mock values.
    t1_values_min_max
        Range for mock random values.
    seed
        Random number generator seed passed to ``numpy.random.default_rng``.
    """
    rng = np.random.default_rng(seed=seed)  # random number generator

    flux_bias_vals = np.linspace(*flux_bias_min_max, num_points)
    resonator_freqs = np.linspace(*resonator_freqs_min_max, num_points)
    qubit_freqs = np.linspace(*qubit_freqs_min_max, num_points)
    t1_values = rng.uniform(*t1_values_min_max, num_points)

    resonator_freq_tuids = [dh.gen_tuid() for _ in range(num_points)]
    qubit_freq_tuids = [dh.gen_tuid() for _ in range(num_points)]
    t1_tuids = [dh.gen_tuid() for _ in range(num_points)]

    coords = dict(
        flux_bias=(
            "main_dim",
            flux_bias_vals,
            mk_main_coord_attrs(long_name="Flux bias", unit="A"),
        ),
        resonator_freq_tuids=(
            "main_dim",
            resonator_freq_tuids,
            mk_main_coord_attrs(
                long_name="Dataset TUID resonator frequency", is_dataset_ref=True
            ),
        ),
        qubit_freq_tuids=(
            "main_dim",
            qubit_freq_tuids,
            mk_main_coord_attrs(
                long_name="Dataset TUID qubit frequency", is_dataset_ref=True
            ),
        ),
        t1_tuids=(
            "main_dim",
            t1_tuids,
            mk_main_coord_attrs(long_name="Dataset TUID T1", is_dataset_ref=True),
        ),
    )

    data_vars = dict(
        resonator_freq=(
            "main_dim",
            resonator_freqs,
            mk_main_var_attrs(long_name="Resonator frequency", unit="Hz"),
        ),
        qubit_freq=(
            "main_dim",
            qubit_freqs,
            mk_main_var_attrs(long_name="Qubit frequency", unit="Hz"),
        ),
        t1=(
            "main_dim",
            t1_values,
            mk_main_var_attrs(long_name="T1", unit="s"),
        ),
    )
    dataset_attrs = mk_dataset_attrs()

    dataset = xr.Dataset(data_vars=data_vars, coords=coords, attrs=dataset_attrs)

    return dataset
dataset = mk_nested_mc_dataset(num_points=num_t1_datasets)
assert dataset.equals(round_trip_dataset(dataset))  # confirm read/write
dataset
<xarray.Dataset>
Dimensions:               (main_dim: 7)
Coordinates:
    flux_bias             (main_dim) float64 -0.04 -0.02667 ... 0.02667 0.04
    resonator_freq_tuids  (main_dim) <U26 '20211101-201108-789-b66dec' ... '2...
    qubit_freq_tuids      (main_dim) <U26 '20211101-201108-790-8c34a0' ... '2...
    t1_tuids              (main_dim) <U26 '20211101-201108-790-a8d674' ... '2...
Dimensions without coordinates: main_dim
Data variables:
    resonator_freq        (main_dim) float64 7e+09 7.05e+09 ... 7.25e+09 7.3e+09
    qubit_freq            (main_dim) float64 4.5e+09 4.583e+09 ... 5e+09
    t1                    (main_dim) float64 4.238e-05 3.867e-05 ... 4.154e-05
Attributes:
    tuid:                      20211101-201108-791-c63776
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []

In this case the four main coordinates are not orthogonal coordinates, but instead just different labels for the same data points, also known as a “multi-index”.
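Because every coordinate labels the same points along `main_dim`, any single one of them can be promoted to index that dimension with `swap_dims`, without building a MultiIndex. The sketch below uses a tiny mock dataset; the TUID strings are hypothetical placeholders, not real dataset references.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"t1": ("main_dim", np.array([30e-6, 25e-6, 40e-6]))},
    coords={
        "flux_bias": ("main_dim", [-0.04, 0.0, 0.04]),
        # Hypothetical TUIDs of the datasets each T1 value was extracted from
        "t1_tuids": ("main_dim", ["tuid-a", "tuid-b", "tuid-c"]),
    },
)

# Either coordinate can serve as the index of main_dim ...
by_flux = ds.swap_dims({"main_dim": "flux_bias"})
by_tuid = ds.swap_dims({"main_dim": "t1_tuids"})

# ... and the same data point can be selected through either label
t1_from_flux = float(by_flux.t1.sel(flux_bias=0.0))
t1_from_tuid = float(by_tuid.t1.sel(t1_tuids="tuid-b"))
```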

fig, axs = plt.subplots(3, 1, figsize=(10, 10), sharex=True)

_ = dataset.t1.plot(x="flux_bias", marker="o", ax=axs[0].twiny(), color="C0")
x = "t1_tuids"
_ = dataset.t1.plot(x=x, marker="o", ax=axs[0], color="C0")
_ = dataset.resonator_freq.plot(x=x, marker="o", ax=axs[1], color="C1")
_ = dataset.qubit_freq.plot(x=x, marker="o", ax=axs[2], color="C2")
for tick in axs[2].get_xticklabels():
    tick.set_rotation(15)  # avoid tuid labels overlapping
[Figure: T1, resonator frequency and qubit frequency versus the T1 dataset TUIDs; the top panel also shows T1 versus flux bias]

It is possible to work with an explicit MultiIndex within a (python) xarray object:

dataset_multi_indexed = dataset.set_index({"main_dim": tuple(dataset.t1.coords.keys())})
dataset_multi_indexed
<xarray.Dataset>
Dimensions:               (main_dim: 7)
Coordinates:
  * main_dim              (main_dim) MultiIndex
  - flux_bias             (main_dim) float64 -0.04 -0.02667 ... 0.02667 0.04
  - resonator_freq_tuids  (main_dim) object '20211101-201108-789-b66dec' ... ...
  - qubit_freq_tuids      (main_dim) object '20211101-201108-790-8c34a0' ... ...
  - t1_tuids              (main_dim) object '20211101-201108-790-a8d674' ... ...
Data variables:
    resonator_freq        (main_dim) float64 7e+09 7.05e+09 ... 7.25e+09 7.3e+09
    qubit_freq            (main_dim) float64 4.5e+09 4.583e+09 ... 5e+09
    t1                    (main_dim) float64 4.238e-05 3.867e-05 ... 4.154e-05
Attributes:
    tuid:                      20211101-201108-791-c63776
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []

The MultiIndex is very handy for selecting data in different ways, e.g.:

index = 2
dataset_multi_indexed.qubit_freq.sel(
    qubit_freq_tuids=dataset_multi_indexed.qubit_freq_tuids.values[index]
)
<xarray.DataArray 'qubit_freq' (main_dim: 1)>
array([4.66666667e+09])
Coordinates:
  * main_dim              (main_dim) MultiIndex
  - flux_bias             (main_dim) float64 -0.01333
  - resonator_freq_tuids  (main_dim) object '20211101-201108-790-e5cdce'
  - t1_tuids              (main_dim) object '20211101-201108-790-8387bc'
Attributes:
    unit:                    Hz
    long_name:               Qubit frequency
    is_main_var:             True
    uniformly_spaced:        True
    grid:                    True
    is_dataset_ref:          False
    has_repetitions:         False
    json_serialize_exclude:  []
dataset_multi_indexed.qubit_freq.sel(t1_tuids=dataset.t1_tuids.values[index])
<xarray.DataArray 'qubit_freq' (main_dim: 1)>
array([4.66666667e+09])
Coordinates:
  * main_dim              (main_dim) MultiIndex
  - flux_bias             (main_dim) float64 -0.01333
  - resonator_freq_tuids  (main_dim) object '20211101-201108-790-e5cdce'
  - qubit_freq_tuids      (main_dim) object '20211101-201108-790-0ae41c'
Attributes:
    unit:                    Hz
    long_name:               Qubit frequency
    is_main_var:             True
    uniformly_spaced:        True
    grid:                    True
    is_dataset_ref:          False
    has_repetitions:         False
    json_serialize_exclude:  []

Known limitations

Unfortunately, the MultiIndex is currently not compatible with the NetCDF format used to write datasets to disk:

try:
    assert dataset_multi_indexed == round_trip_dataset(dataset_multi_indexed)
except NotImplementedError as exp:
    print(exp)
variable 'main_dim' is a MultiIndex, which cannot yet be serialized to netCDF files (https://github.com/pydata/xarray/issues/1077). Use reset_index() to convert MultiIndex levels into coordinate variables instead.

We could make our load/write utilities take care of setting and resetting the index under the hood, but this has some nuances as well: extra metadata would need to be stored in order to restore the MultiIndex when loading. At the moment, the MultiIndex is not supported when writing a Quantify dataset to disk. Below we show a few of the complications involved.
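A minimal sketch of what storing that extra metadata could look like, assuming the level names are kept as a JSON string in the dataset attributes (a NetCDF-friendly attribute type). The helpers `prepare_for_write` and `restore_after_load` and the attribute name `multi_index_levels` are hypothetical and not part of Quantify; the sketch stays in memory and does not write to disk.

```python
import json

import pandas as pd
import xarray as xr


def prepare_for_write(ds, dim):
    """Hypothetical helper: drop the MultiIndex on `dim`, remembering its levels."""
    levels = list(ds.indexes[dim].names)
    ds = ds.reset_index(dim)  # levels become plain coordinates
    ds.attrs["multi_index_levels"] = json.dumps({dim: levels})  # extra metadata
    return ds


def restore_after_load(ds):
    """Hypothetical helper: rebuild the MultiIndex from the stored metadata."""
    ds = ds.copy()  # avoid mutating the attrs of the input dataset
    spec = json.loads(ds.attrs.pop("multi_index_levels"))
    return ds.set_index(spec)


ds = xr.Dataset(
    {"t1": ("main_dim", [30e-6, 25e-6])},
    coords={
        "flux_bias": ("main_dim", [-0.04, 0.04]),
        "t1_tuids": ("main_dim", ["tuid-a", "tuid-b"]),  # placeholder TUIDs
    },
)
ds_multi = ds.set_index({"main_dim": ["flux_bias", "t1_tuids"]})

writable = prepare_for_write(ds_multi, "main_dim")  # what would go to disk
restored = restore_after_load(writable)  # what loading would give back
assert isinstance(restored.indexes["main_dim"], pd.MultiIndex)
```

Even with such metadata, details like the dtype of the restored levels may still differ from the original dataset.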

Fortunately, the MultiIndex can be reset back:

dataset_multi_indexed.reset_index(dims_or_levels="main_dim")
<xarray.Dataset>
Dimensions:               (main_dim: 7)
Coordinates:
    flux_bias             (main_dim) float64 -0.04 -0.02667 ... 0.02667 0.04
    resonator_freq_tuids  (main_dim) object '20211101-201108-789-b66dec' ... ...
    qubit_freq_tuids      (main_dim) object '20211101-201108-790-8c34a0' ... ...
    t1_tuids              (main_dim) object '20211101-201108-790-a8d674' ... ...
Dimensions without coordinates: main_dim
Data variables:
    resonator_freq        (main_dim) float64 7e+09 7.05e+09 ... 7.25e+09 7.3e+09
    qubit_freq            (main_dim) float64 4.5e+09 4.583e+09 ... 5e+09
    t1                    (main_dim) float64 4.238e-05 3.867e-05 ... 4.154e-05
Attributes:
    tuid:                      20211101-201108-791-c63776
    dataset_name:              
    dataset_state:             None
    timestamp_start:           None
    timestamp_end:             None
    quantify_dataset_version:  2.0.0
    software_versions:         {}
    relationships:             []
    json_serialize_exclude:    []
all(dataset_multi_indexed.reset_index("main_dim").t1_tuids == dataset.t1_tuids)
True

Note, however, that the dtype of the TUIDs has changed from fixed-length string (<U26) to object:

dataset.t1_tuids.dtype, dataset_multi_indexed.reset_index("main_dim").t1_tuids.dtype
(dtype('<U26'), dtype('O'))
dataset.t1_tuids.dtype == dataset_multi_indexed.reset_index("main_dim").t1_tuids.dtype
False
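If downstream code relies on fixed-length strings, one possible workaround after `reset_index` is to cast the coordinate back with `astype(str)`. A minimal sketch, assuming a mock dataset whose object-dtype TUID coordinate (hypothetical placeholder values) stands in for the reset one:

```python
import numpy as np
import xarray as xr

# Mock coordinate with object dtype, as obtained after reset_index
tuids = np.array(["tuid-a", "tuid-b"], dtype=object)
ds = xr.Dataset(coords={"t1_tuids": ("main_dim", tuids)})

# Cast the Python strings back to a NumPy fixed-length unicode dtype
ds = ds.assign_coords(t1_tuids=ds.t1_tuids.astype(str))
```

The resulting length of the unicode dtype is set by the longest string, so real TUIDs would again give `<U26`.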