HDF5 output structure (v2.2)

Important

This version of the HDF5 output structure is outdated. To see the current HDF5 output structure, click here.

The output of a NuRadioMC simulation is saved in the HDF5 file format, as well as (optionally) in .nur files. The data structure of .nur files is explained here. This page outlines the structure of the HDF5 files.

Opening the HDF5 file

The HDF5 file can be opened using the h5py module:

import h5py

f = h5py.File("/path/to/hdf5_file", mode='r')
attributes = f.attrs

# ... read attributes and datasets here ...
f.close()
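
A context manager is a common alternative to the explicit f.close() call, because the file is closed even if an exception is raised while reading. The sketch below is self-contained: it first writes a tiny stand-in file and then reads it back. The file name demo_output.hdf5 and all values are made up for illustration and are not real NuRadioMC output.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a tiny stand-in file so the example runs on its own.
# File name and values are illustrative, not real NuRadioMC output.
path = os.path.join(tempfile.mkdtemp(), "demo_output.hdf5")
with h5py.File(path, mode="w") as fout:
    fout.attrs["Emin"] = 1e17
    fout.attrs["Emax"] = 1e19
    fout.create_dataset("energies", data=np.array([2e17, 5e18]))

# Read it back; the file is closed automatically when the block exits.
with h5py.File(path, mode="r") as f:
    attributes = dict(f.attrs)   # copy the attributes into a plain dict
    energies = f["energies"][:]  # load the dataset into a NumPy array

print(sorted(attributes))
print(energies)
```

Copying the attributes and datasets inside the with block matters: once the file is closed, the h5py objects themselves can no longer be read.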

If you have many HDF5 files, for example because you ran a simulation parallelized over multiple energy bins, NuRadioMC contains a convenience function to correctly merge these files - see here for instructions.

What’s behind the HDF5 files

The HDF5 file is created in NuRadioMC/simulation/simulation.py. The event generator provides a list of vertices with different arrival directions (zenith and azimuth) and energies. Starting from the vertex, several sub-showers are created along the track. These are not simulated, but the electric field per sub-shower is provided. Sub-showers that occur within a certain time interval arrive at the antenna simultaneously and interfere constructively, so their electric fields are summed.

The event_group_id is the same for all showers that follow from the same first interaction. The shower_id is unique for every shower. Showers that interfere constructively are combined into one event and share the same event_id, which is counted from 0.
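
To make the relation between the three ids concrete, here is a small sketch with hand-written id arrays. The values are hypothetical and only mirror the description above: three showers from one first interaction, two from another, with some showers merged into a common event.

```python
from collections import defaultdict

import numpy as np

# Hypothetical per-shower ids (values are made up for illustration).
event_group_ids = np.array([7, 7, 7, 12, 12])  # same first interaction
shower_ids = np.array([0, 1, 2, 3, 4])         # unique for every shower
event_ids = np.array([0, 0, 1, 0, 0])          # counted from 0 in each group

# Number of showers that follow from each first interaction.
groups, showers_per_group = np.unique(event_group_ids, return_counts=True)

# Showers merged into one event share the pair (event_group_id, event_id).
showers_per_event = defaultdict(list)
for gid, eid, sid in zip(event_group_ids, event_ids, shower_ids):
    showers_per_event[(int(gid), int(eid))].append(int(sid))

print(dict(zip(groups.tolist(), showers_per_group.tolist())))
print(dict(showers_per_event))
```

Because event_id is counted from 0 within an event group, an event is only identified unambiguously by the pair (event_group_id, event_id), which is why the sketch uses that pair as the dictionary key.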

../../../_images/event_sketch.png

HDF5 structure

The HDF5 files can be thought of as a structured dictionary:

  • The top level attributes, which can be accessed through f.attrs, contain some top-level information about the simulation.

  • The individual keys contain some properties (energy, vertex, …) for each stored event or shower.

  • Finally, the station_<station_id> key contains slightly more detailed information (triggers, propagation times, amplitudes…) at the level of individual channels for each station.
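
The three levels listed above can be inspected programmatically. The following is a minimal sketch, not part of NuRadioMC: the helper summarize is a hypothetical name, and the in-memory file, the station id 101 and all values are stand-ins so that the example runs without a real output file.

```python
import h5py
import numpy as np


def summarize(f):
    """Split an open file into attribute names, dataset shapes and station keys."""
    attribute_names = sorted(f.attrs.keys())
    dataset_shapes = {}
    station_keys = []
    for key, item in f.items():
        if isinstance(item, h5py.Group) and key.startswith("station_"):
            station_keys.append(key)
        elif isinstance(item, h5py.Dataset):
            dataset_shapes[key] = item.shape
    return attribute_names, dataset_shapes, sorted(station_keys)


# In-memory stand-in file (never written to disk); contents are illustrative.
with h5py.File("structure_demo.hdf5", mode="w", driver="core",
               backing_store=False) as f:
    f.attrs["n_events"] = 100
    f.create_dataset("energies", data=np.full(3, 1e18))
    f.create_dataset("zeniths", data=np.zeros(3))
    station = f.create_group("station_101")  # hypothetical station id
    station.create_dataset("maximum_amplitudes", data=np.zeros((2, 4)))
    attribute_names, dataset_shapes, station_keys = summarize(f)

print(attribute_names, dataset_shapes, station_keys)
```

On a real file, the same helper can be called on a file opened with mode='r'.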

HDF5 file attributes

The top-level attributes can be accessed using f.attrs. These contain:

  • Emax, Emin

    maximum and minimum energy simulated

  • NuRadioMC_EvtGen_version, NuRadioMC_EvtGen_version_hash

  • NuRadioMC_version, NuRadioMC_version_hash

  • Tnoise

    (explicit) noise temperature used in simulation

  • Vrms

  • area

  • bandwidth

  • config

    the (yaml-style) config file used for the simulation

  • deposited

  • detector

    the (json-format) detector description used for the simulation

  • dt

    the time resolution, i.e. the inverse of the sampling rate used for the simulation. This is not necessarily the same as the sampling rate of the simulated channels!

  • fiducial_rmax, fiducial_rmin, fiducial_zmax, fiducial_zmin

    Specify the simulated fiducial volume

  • flavors

    a list of particle flavors that were simulated, using the PDG convention.

  • n_events

    total number of events simulated (including those that did not trigger)

  • n_samples

  • phimax, phimin

  • rmax, rmin

  • start_event_id

    event_id of the first event in the file

  • thetamax, thetamin

  • trigger_names

    list of the names of the different triggers simulated

  • volume

  • zmax, zmin
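
As an illustration of how a few of these attributes might be read into plain Python types, consider the sketch below. The helper read_run_info is a hypothetical name, and the in-memory file and its values are made up. How trigger_names is encoded on disk is an assumption here, so the code accepts both bytes and str entries.

```python
import h5py
import numpy as np


def read_run_info(f):
    """Collect a few top-level attributes into plain Python types."""
    names = []
    for name in f.attrs["trigger_names"]:
        # Depending on how the strings were written, h5py returns bytes or str.
        names.append(name.decode() if isinstance(name, bytes) else str(name))
    return {
        "energy_range": (float(f.attrs["Emin"]), float(f.attrs["Emax"])),
        # dt is the time resolution, so its inverse is the sampling rate
        # used for the simulation (in the inverse of the unit of dt).
        "sampling_rate": 1.0 / float(f.attrs["dt"]),
        "trigger_names": names,
    }


# In-memory stand-in file with made-up attribute values.
with h5py.File("attrs_demo.hdf5", mode="w", driver="core",
               backing_store=False) as f:
    f.attrs["Emin"] = 1e17
    f.attrs["Emax"] = 1e19
    f.attrs["dt"] = 0.5
    f.attrs["trigger_names"] = np.array([b"trigger_a", b"trigger_b"])
    info = read_run_info(f)

print(info)
```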

HDF5 file contents

The HDF5 file contains the following items. Listed are the key and the shape of each HDF5 dataset, where n_events is the number of events in the file, n_showers is the number of showers (which may be larger than the number of events), and n_triggers is the number of different triggers simulated.

  • azimuths: (n_events,)

  • energies: (n_events,)

  • event_group_ids: (n_events,)

  • flavors: (n_events,)

  • inelasticity: (n_events,)

  • interaction_type: (n_events,)

  • multiple_triggers: (n_events, n_triggers)

  • n_interaction: (n_events,)

  • shower_energies: (n_showers,)

  • shower_ids: (n_showers,)

  • shower_realization_ARZ: (n_showers,)

    Which realization from the ARZ shower library was used for each shower (only if ARZ was used for signal generation).

  • shower_type: (n_showers,)

  • triggered: (n_events,)

    boolean; True if the event triggered on any trigger, False otherwise

  • vertex_times: (n_events,)

  • weights: (n_events,)

  • xx: (n_events,)

  • yy: (n_events,)

  • zeniths: (n_events,)

  • zz: (n_events,)
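
One way these arrays might be combined is sketched below: the columns of multiple_triggers are matched to the trigger_names attribute, and the weights of the rows that fulfilled each trigger are summed and divided by the n_events attribute (the total number of simulated events). The helper weighted_trigger_fractions is a hypothetical name, the normalization is an illustrative choice rather than a prescribed analysis, and the in-memory file contains made-up values.

```python
import h5py
import numpy as np


def weighted_trigger_fractions(f):
    """Sum of weights of triggered rows per trigger, divided by n_events."""
    n_simulated = int(f.attrs["n_events"])
    weights = f["weights"][:]
    multiple_triggers = f["multiple_triggers"][:].astype(bool)
    names = [n.decode() if isinstance(n, bytes) else str(n)
             for n in f.attrs["trigger_names"]]
    fractions = {}
    for i, name in enumerate(names):
        # Column i of multiple_triggers belongs to trigger_names[i].
        fractions[name] = float(np.sum(weights[multiple_triggers[:, i]])
                                / n_simulated)
    return fractions


# In-memory stand-in file with made-up values.
with h5py.File("contents_demo.hdf5", mode="w", driver="core",
               backing_store=False) as f:
    f.attrs["n_events"] = 10
    f.attrs["trigger_names"] = np.array([b"trigger_a", b"trigger_b"])
    f.create_dataset("weights", data=np.array([1.0, 0.5, 0.5]))
    f.create_dataset("multiple_triggers",
                     data=np.array([[True, False],
                                    [True, True],
                                    [False, False]]))
    fractions = weighted_trigger_fractions(f)

print(fractions)
```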

Station data

In addition, the HDF5 file contains a key for each station in the simulation. Each station entry contains more detailed information for each event that triggered it: here, n_events and n_showers refer to the number of events and showers that triggered the station. The event_group_id is the same as in the global dictionary. For a given event_group_id, you can therefore check which stations contain that id and retrieve, for example, which station triggered and with which amplitude. The same approach works for shower_id.

  • event_group_ids: (n_events,)

  • event_group_id_per_shower: (n_showers,)

    event group ids of the triggered events

  • event_ids: (n_events,)

  • event_id_per_shower: (n_showers,)

    the event ids of each event. These are unique only within each separate event group, and start from 0.

  • focusing_factor: (n_showers, n_channels, n_ray_tracing_solutions)

  • launch_vectors: (n_showers, n_channels, n_ray_tracing_solutions, 3)

    3D (Cartesian) coordinates of the launch vector of each ray tracing solution, per shower and channel.

  • max_amp_shower_and_ray: (n_showers, n_channels, n_ray_tracing_solutions)

    Maximum amplitude per shower, channel and ray tracing solution.

  • maximum_amplitudes: (n_events, n_channels)

    Maximum amplitude per event and channel

  • maximum_amplitudes_envelope: (n_events, n_channels)

    Maximum amplitude of the Hilbert envelope for each event and channel

  • multiple_triggers: (n_showers, n_triggers)

    a boolean array that specifies if a shower contributed to an event that fulfills a certain trigger. The index of the trigger can be translated to the trigger name via the attribute trigger_names.

  • multiple_triggers_per_event: (n_events, n_triggers)

    a boolean array that specifies if each event fulfilled a certain trigger. The index of the trigger can be translated to the trigger name via the attribute trigger_names.

  • polarization: (n_showers, n_channels, n_ray_tracing_solutions, 3)

    3D (Cartesian) coordinates of the polarization vector

  • ray_tracing_C0: (n_showers, n_channels, n_ray_tracing_solutions)

    One of two parameters specifying the analytic ray tracing solution. Can be used to retrieve the solutions without having to re-run the ray tracer.

  • ray_tracing_C1: (n_showers, n_channels, n_ray_tracing_solutions)

    One of two parameters specifying the analytic ray tracing solution. Can be used to retrieve the solutions without having to re-run the ray tracer.

  • ray_tracing_reflection: (n_showers, n_channels, n_ray_tracing_solutions)

  • ray_tracing_reflection_case: (n_showers, n_channels, n_ray_tracing_solutions)

  • ray_tracing_solution_type: (n_showers, n_channels, n_ray_tracing_solutions)

  • receive_vectors: (n_showers, n_channels, n_ray_tracing_solutions, 3)

    3D (Cartesian) coordinates of the receive vector of each ray tracing solution, per shower and channel.

  • shower_id: (n_showers,)

  • time_shower_and_ray: (n_showers, n_channels, n_ray_tracing_solutions)

  • travel_distances: (n_showers, n_channels, n_ray_tracing_solutions)

    The distance travelled by each ray tracing solution to a specific channel

  • travel_times: (n_showers, n_channels, n_ray_tracing_solutions)

    The travel time of each ray tracing solution to a specific channel

  • triggered: (n_showers,)

    Whether or not each shower contributed to an event that satisfied any trigger condition

  • triggered_per_event: (n_events,)

    Whether or not each event fulfilled any trigger condition.
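
The station arrays can be cross-referenced with the global arrays through the shower ids, and the per-solution arrays can be reduced along their last axis. The sketch below shows both under stated assumptions: the helpers strongest_solution and global_rows are hypothetical names, missing ray tracing solutions are assumed to be stored as NaN, every station shower_id is assumed to appear in the global shower_ids, and the in-memory file with station id 101 contains made-up values.

```python
import h5py
import numpy as np


def strongest_solution(station):
    """Per shower and channel: index and travel time of the strongest solution."""
    amp = station["max_amp_shower_and_ray"][:]
    times = station["travel_times"][:]
    # Assumption: missing solutions are NaN; map them below any real amplitude.
    best = np.argmax(np.nan_to_num(np.abs(amp), nan=-1.0), axis=-1)
    best_times = np.take_along_axis(times, best[..., np.newaxis], axis=-1)[..., 0]
    return best, best_times


def global_rows(f, station):
    """Rows of the global per-shower arrays that match the station's showers."""
    global_ids = f["shower_ids"][:]
    order = np.argsort(global_ids)
    pos = np.searchsorted(global_ids, station["shower_id"][:], sorter=order)
    return order[pos]


# In-memory stand-in file: 4 global showers, 2 of them seen by the station,
# with 2 channels and 2 ray tracing solutions each (all values made up).
with h5py.File("station_demo.hdf5", mode="w", driver="core",
               backing_store=False) as f:
    f.create_dataset("shower_ids", data=np.array([10, 11, 12, 13]))
    f.create_dataset("shower_energies", data=np.array([1e17, 2e17, 3e17, 4e17]))
    st = f.create_group("station_101")
    st.create_dataset("shower_id", data=np.array([13, 11]))
    st.create_dataset("max_amp_shower_and_ray",
                      data=np.array([[[0.1, 0.3], [0.5, 0.2]],
                                     [[0.4, np.nan], [0.0, 0.7]]]))
    st.create_dataset("travel_times",
                      data=np.array([[[100.0, 110.0], [120.0, 130.0]],
                                     [[200.0, 210.0], [220.0, 230.0]]]))
    best, best_times = strongest_solution(st)
    rows = global_rows(f, st)
    energies = f["shower_energies"][:][rows]

print(best, best_times, energies)
```

The same row indices can be used to look up any other global per-shower quantity, such as shower_type, for the showers stored in the station.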