HDF5 output structure
The output of a NuRadioMC simulation is saved in the HDF5 file format, as well as (optionally) in .nur
files.
The data structure of .nur
files is explained here.
This page outlines the structure of the HDF5 files v3.0. Find the structure of v2.2 here.
Opening the HDF5 file
The HDF5 file can be opened using the h5py
module:
import h5py
f = h5py.File("/path/to/hdf5_file", mode='r')
attributes = f.attrs
(...)
f.close()
If you have many HDF5 files, for example because you ran a simulation parallelized over multiple energy bins, NuRadioMC contains a convenience function to correctly merge these files - see here for instructions.
What’s behind the HDF5 files
The hdf5 file is created in NuRadioMC/simulation/simulation.py A list of vertices with different arrival direction (zenith and azimuth) and energy is provided by the event generator. Starting from the vertex, several sub-showers are created along the track. These are not simulated, but the electric field per sub-shower is provided. Sub-showers that happen within a certain time interval arrive at the antenna simultaneous and interfere constructively, therefore, they are summed up.
The event_group_id
is the same for all showers that follow the same first interaction.
The shower_id
is unique for every shower. Shower which interfere constructively are combined into one event and have
the same event_id
starting from 0.
HDF5 structure
The HDF5 files can be thought of as a structured dictionary:
The top level attributes, which can be accessed through
f.attrs
, contain some top-level information about the simulation.The individual keys contain some properties (energy, vertex, …) for each stored event or shower.
Finally, the
station_<station_id>
key contains slightly more detailed information (triggers, propagation times, amplitudes…) at the level of individual channels for each station. Each station group has its own attributes (f[station_<station_id>].attrs
)
HDF5 file attributes
The top-level attributes can be accessed using f.attrs
. These contain:
HDF5 attributes Key
Description
NuRadioMC_EvtGen_version
NuRadioMC_EvtGen_version_hash
NuRadioMC_version
NuRadioMC_version_hash
Versions of the generator/framework as integer as hash
Emin
Emax
Define energy range for neutrino energies
phimax
phimin
Define azimuth range for incoming neutrino directions
thetamax
thetamin
Define zenith range for incoming neutrino directions
flavors
A list of particle flavors that were simulated, using the PDG convention.
n_events
Total number of generated/simulated events(including those that did not trigger)
fiducial_xmax
fiducial_xmin
fiducial_ymax
fiducial_ymin
fiducial_zmax
fiducial_zmin
/fiducial_rmax
fiducial_rmin
fiducial_zmax
fiducial_zmin
Specify the simulated qubic/cylindrical fiducial volume. An event has to produce an interaction within this volume. However, in case of a muon or tau CC interaction the first interaction can occur outside
rmax
rmin
zmax
zmin
/xmax
xmin
ymax
ymin
zmax
zmin
Specify the qubic/cylindrical volume in which neutrino interactions are generated
volume
Volume of the above specified volume
area
Surface area of the above specified volume
start_event_id
event_id
of the first event in the file
trigger_names
List of the names of the different triggers simulated
Tnoise
(explicit) noise temperature used in simulation
n_samples
Samples of the to-be generated antenna signals
config
The (yaml-style) config file used for the simulation
deposited
detector
The (json-format) detector description used for the simulation
dt
The time resolution, i.e. the inverse of the sampling rate used for the simulation. This is not necessarily the same as the sampling rate of the simulated channels!
The station-level attributes can be accessed using f[station_<station_id>].attrs
. The first two attributes Vrms
and bandwidth
also exist on the top-level and refer to the corresponding to the first station/channel pair.
HDF5 station attributes Key
Description
Vrms
RMS of the voltage used as thermal noise floor \(v_{n} = (k_{B} \, R \, T \, \Delta f) ^ {0.5}\). See the relevant section “Maximum transfer of noise power” in this wiki article . Determine from
Tnoise
andbandwidth
(see below).
Vrms_trigger
(Optional) Same as
Vrms
but for the trigger channels if they were simulated with a different response.
bandwidth
Bandwidth is above equation. Calculated as the integral over the simulated filter response (filt) squared: \(\Delta f = np.trapz(np.abs(filt) ** 2, ff)\).
antenna_positions
Relative position of all simulated antennas (channels)
HDF5 file contents
The HDF5 file contains the following items. Listed are the key
and the shape
of each HDF5 dataset, where n_events
is the number of events stored in the file and n_showers
is the number of showers (which may be larger than the number of events), and n_triggers
is the number of different triggers simulated. Each “row” correspond to a particle shower which can produce radio emission.
HDF5 items Key
Shape
Description
event_group_ids
(
n_showers
)Specifies the event id to which the corresponding shower belongs (
n_events = len(unique(event_group_ids)))
)
xx
yy
zz
(
n_showers
)Specifying coordinates of interaction vertices
vertex_times
(
n_showers
)Time at the interaction vertex. The neutrino interaction (= first interaction) is defined as time 0
azimuths
zeniths
(
n_showers
)Angle Specifying the neutrino incoming direction (
azimuths = 0
points east)
energies
(
n_showers
)Energy of the parent particle of a shower. This is typically the energy of the neutrino (for showers produced at the first interaction: all flavor NC, electron CC interactions) or the energy of a muon or tau lepton when those are producing secondary energy losses
shower_energies
(
n_showers
)Energy of the shower which is used to determine the radio emission
flavors
(
n_showers
)Same as above (the parent of an electromagnetic cascade in an electron CC interaction is the neutrino)
inelasticity
(
n_showers
)Inelasticity of the first interaction
interaction_type
(
n_showers
)Interaction type producing the shower (for the first interaction that can be “nc” or “cc”)
multiple_triggers
(
n_showers
,n_triggers
)Information which exact trigger fired each shower. The different triggers are specified in the attributes (
f.attrs["triggers"]
). The order off.attrs["triggers"]
matches that inmultiple_triggers
triggered
(
n_showers
)A boolean;
True
if any trigger fired for this shower,False
otherwise
trigger_times
(
n_showers
,n_triggers
)The trigger times (relative to the first interaction) at which each shower triggered. If there are multiple stations, this will be the earliest trigger time.
n_interaction
(
n_showers
)Hierarchical counter for the number of showers per event (also accounts for showers which did not trigger and might not be saved)
shower_ids
(
n_showers
)Hierarchical counter for the number of triggered showers
shower_realization_ARZ
(
n_showers
)Which realization from the ARZ shower library was used for each shower (only if ARZ was used for signal generation).
shower_type
(
n_showers
)Type of the shower (so far we only have “em” and “had”)
weights
(
n_showers
)Weight for the probability that the neutrino reached the interaction vertex taking into account the attenuation from the earth (Does not include interaction probability in the volume)
Station data
In addition, the HDF5 file contains a key for each station in the simulation.
The station contains more detailed information for each station. Some parameters are per event and
some parameters are per shower. See https://doi.org/10.22323/1.395.1231 for a description of how showers relate to events.
m_events
and m_showers
refer to the number of events and showers that triggered the station. NOTE: The simple table
structure of hdf5 files can not capture the complex relation between events and showers in all cases. Some fields can be ambiguous
(e.g. trigger_times that only lists the last trigger that a shower generated).
For more advanced analyses, please use the *.nur
files.
The event_group_id
is the same as in the global dictionary. Therefore you can check for one event with
an event_group_id
which stations contain the same event_group_id
and retrieve the information, which
station triggered, with which amplitude, etc. The same approach works for shower_id
.
HDF5 station items Key
Shape
Description
event_group_ids
(
m_events
)The event group ids of the triggered events in the selected station
event_group_id_per_shower
(
m_showers
)The event group id of every shower that triggered the selected station
event_ids
(
m_events
)The event ids of each event that triggered in that station for every event group id. These are unique only within each separate event group, and start from 0.
event_id_per_shower
(
m_showers
)The event ids of each event that triggered in that station. This one is for every shower
shower_id
(
m_showers
)The Shower ids of showers that triggered the selected station
max_amp_shower_and_ray
(
m_showers
,n_channels
,n_ray_tracing_solutions
)Maximum amplitude per shower, channel and ray tracing solution.
maximum_amplitudes
(
m_events
,n_channels
)Maximum amplitude per event and channel
maximum_amplitudes_envelope
(
m_events
,n_channels
)Maximum amplitude of the hilbert envelope for each event and channel
multiple_triggers
(
m_showers
,n_triggers
)A boolean array that specifies if a shower contributed to an event that fulfills a certain trigger. The index of the trigger can be translated to the trigger name via the attribute
trigger_names
.
multiple_triggers_per_event
(
m_events
,n_triggers
)A boolean array that specifies if each event fulfilled a certain trigger. The index of the trigger can be translated to the trigger name via the attribute
trigger_names
.
polarization
(
m_showers
,n_channels
,n_ray_tracing_solutions
,3
)3D coordinates of the polarization vector at the antenna in cartesian coordinates. (The receive vector (which is opposite to the propagation direction) was used to rotate from spherical/on-sky coordinates to cartesian coordinates). The polarization vector does not include any propagation effects that could change the polarization, such as different reflectivities at the surface for the p and s polarization component.
ray_tracing_C0
(
m_showers
,n_channels
,n_ray_tracing_solutions
)One of two parameters specifying the analytic ray tracing solution. Can be used to retrieve the solutions without having to re-run the ray tracer.
ray_tracing_C1
(
m_showers
,n_channels
,n_ray_tracing_solutions
)One of two parameters specifying the analytic ray tracing solution. Can be used to retrieve the solutions without having to re-run the ray tracer.
ray_tracing_reflection
(
m_showers
,n_channels
,n_ray_tracing_solutions
)The number of bottom reflections (This variable is only non-zero if a reflection layer was defined in the ice model and if ‘propagation.n_reflections’ was set to a value larger than 0 in the config.yaml file.)
ray_tracing_reflection_case
(
m_showers
,n_channels
,n_ray_tracing_solutions
)Only relevant for bottom reflections. 1: rays start upwards, 2: rays start downwards
ray_tracing_solution_type
(
m_showers
,n_channels
,n_ray_tracing_solutions
)The type of the ray tracing solution. 0: direct, 1: refracted, 2: reflected (off the surface) (A refracted ray is defined as a ray that has a turning point, i.e. if it transitions from upward going to downward going; a reflected ray is defined if it has a surface reflection.)
focusing_factor
(
m_showers
,n_channels
,n_ray_tracing_solutions
)The focusing factor calculated by the propagation module.
launch_vectors
(
m_showers
,n_channels
,n_ray_tracing_solutions
,3
)3D (Cartesian) coordinates of the launch vector of each ray tracing solution, per shower and channel.
receive_vectors
(
m_showers
,n_channels
,n_ray_tracing_solutions
,3
)3D (Cartesian) coordinates of the receive vector of each ray tracing solution, per shower and channel.
time_shower_and_ray
(
m_showers
,n_channels
,n_ray_tracing_solutions
)The “signal time” per shower and raytracing solution. I.e., the time of the signal arriving at the DAQ including, e.g., cable delay, …
travel_distances
(
m_showers
,n_channels
,n_ray_tracing_solutions
)The distance travelled by each ray tracing solution to a specific channel
travel_times
(
m_showers
,n_channels
,n_ray_tracing_solutions
)The time travelled by each ray tracing solution to a specific channel
triggered
(
m_showers
)Whether each shower contributed to an event that satisfied any trigger condition
triggered_per_event
(
m_events
)Whether each event fulfilled any trigger condition.
trigger_times
(
m_showers
,n_triggers
)The trigger times for each shower and trigger. IMPORTANT: A shower can potentially generate multiple events. Then this field is ambiguous, as only a single trigger time per shower can be saved. In that case, the latest trigger time is saved into this field.
trigger_times_per_event
(
m_events
,n_triggers
)The trigger times per event.