Athena I/O and the xAOD EDM

This page gives an overview of the Athena I/O system and of the event data model used to write analysis-level data. This is a large and technical topic, so these pages are not intended to be an in-depth manual. Rather, they aim to give a broad overview, introduce the main terms and components, and provide links to relevant resources.

Persistent data storage

Data manipulated by Athena are usually in the form of C++ objects. These must be represented on disk in such a way that they can be read back and regenerated in memory at a later time, potentially by a newer version of the software. ATLAS accomplishes this using a persistency layer called POOL, whose ATLAS implementation is APR, with ROOT providing the back-end I/O. Athena interacts with POOL via software in the AthenaPOOL packages.

Any class whose instances need to be written to disk must be provided with a "POOL converter". It is generally not necessary to write a POOL converter by hand: it can be generated automatically by the ATLAS CMake function atlas_add_poolcnv_library. A step-by-step example is given here (internal). See the CMake guide for information on the CMake setup in ATLAS.

Transient/persistent conversion and schema evolution

The transient/persistent (T/P) separation mechanism allows classes to evolve in newer versions of the software while data written with existing persistent versions can still be read back; this is known as "schema evolution". It is accomplished by means of "T/P converters". A T/P converter converts between the latest (current) transient version of a class and one of its persistent versions. Unlike POOL converters, T/P converters need to be written by hand. They are only required for non-xAOD classes (see below), and then only when “versioning” is involved in some way; at this stage in the lifetime of ATLAS, however, that applies to almost all such classes.
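To make the T/P idea concrete, here is a minimal, self-contained C++ sketch. All names (Hit, Hit_p1, Hit_p2, HitCnv_p1, HitCnv_p2, persToTrans, transToPers) are hypothetical and chosen for illustration only; real ATLAS T/P converters are built on Athena framework classes that are not shown here. The point is that one current transient class can be filled from any persistent version that was ever written, while new files are written only with the latest persistent version.

```cpp
#include <cassert>

// Current transient class: what algorithms work with in memory.
// Hypothetical example, not an actual ATLAS class.
struct Hit {
  double x = 0.0, y = 0.0, z = 0.0;
  double energy = 0.0;  // member added in a later software release
};

// Persistent versions: frozen once files have been written with them.
struct Hit_p1 {  // original on-disk layout, without energy
  float x, y, z;
};
struct Hit_p2 {  // current on-disk layout
  float x, y, z, energy;
};

// Converter for the old version. Only the read direction is needed,
// because new files are always written with the latest persistent version.
struct HitCnv_p1 {
  static constexpr double kNoEnergy = -1.0;  // marks "not stored in p1"

  void persToTrans(const Hit_p1& pers, Hit& trans) const {
    trans.x = pers.x;
    trans.y = pers.y;
    trans.z = pers.z;
    trans.energy = kNoEnergy;
  }
};

// Converter for the current version: both directions.
struct HitCnv_p2 {
  void persToTrans(const Hit_p2& pers, Hit& trans) const {
    trans.x = pers.x;
    trans.y = pers.y;
    trans.z = pers.z;
    trans.energy = pers.energy;
  }

  void transToPers(const Hit& trans, Hit_p2& pers) const {
    // The persistent form uses float to save space, so precision is reduced on write.
    pers.x = static_cast<float>(trans.x);
    pers.y = static_cast<float>(trans.y);
    pers.z = static_cast<float>(trans.z);
    pers.energy = static_cast<float>(trans.energy);
  }
};
```

In this sketch, files written with Hit_p1 remain readable: their hits arrive in memory as current Hit objects, with the missing energy set to a sentinel value.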

A full guide to the T/P mechanism is given here (internal).

xAOD event data model

An event data model is a collection of classes (interfaces and concrete types) and their relationships which together provide a representation of a physics event recorded by the ATLAS detector. It defines how these classes are represented both in memory (transient form) and on disk (persistent form). Using the same, coherent set of classes throughout the software improves commonality across the experiment, encourages the use of shared, higher-quality software, and allows for common object definitions (in both a software and a physics sense).

The xAOD EDM has been developed by ATLAS to represent analysis-level data objects in AOD and DAOD files. As such, xAOD objects are the data structures that most ATLAS members encounter most often. The objects in the xAOD are split into two types: interface objects and payload objects. The interface objects provide a user interface to the type, allowing operations such as electron->pt(), but do not contain the numerical data themselves. The numerical payload is instead held in the payload objects, which allocate contiguous memory for the data and give the interface objects access to it. The payload is currently written to disk as a ROOT TTree, but the EDM itself is not bound to a specific storage technology. Instances of the payload classes are referred to as "auxiliary stores".

xAOD classes consequently have a "dual personality". In Athena and other applications with an explicit event loop, the interface objects are used: the programmer works with containers of objects of a given class (e.g. xAOD::TrackParticle, xAOD::Muon), with each data member provided by an accessor method as shown above. In ROOT or similar column-based applications, however, the payload can be read directly, since each variable is stored contiguously in memory and ultimately written as a branch of a TTree. This means that files written in the xAOD format can be opened in ROOT without any ATLAS libraries, and plots can be made from the event data in exactly the same way as from a plain TTree. The xAOD has also enabled the development of non-Athena analysis frameworks, since Athena libraries are no longer needed to read the files.
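The split between interface and payload, and the resulting dual personality, can be illustrated with a toy model. This is a simplified sketch, not the real xAOD API: ToyAuxStore is a hypothetical stand-in for an auxiliary store (one contiguous column per variable), and ToyElectron is a hypothetical interface class that holds only a pointer to the store and its own index. The same pt column is summed once through interface objects, as in an event loop, and once directly, the way a columnar tool would read a TTree branch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy payload ("auxiliary store"): one contiguous column per variable name.
// Simplified model only; not the real SG::IAuxStore interface.
struct ToyAuxStore {
  std::map<std::string, std::vector<float>> columns;

  float get(const std::string& name, std::size_t index) const {
    auto it = columns.find(name);
    assert(it != columns.end() && index < it->second.size());  // checked only in debug builds
    return it->second[index];
  }
};

// Toy interface object: holds no numbers itself, only a pointer to the
// store and its own index, and forwards accessor calls to the payload.
class ToyElectron {
 public:
  ToyElectron(const ToyAuxStore* store, std::size_t index)
      : m_store(store), m_index(index) {}

  float pt() const { return m_store->get("pt", m_index); }
  float eta() const { return m_store->get("eta", m_index); }

 private:
  const ToyAuxStore* m_store;
  std::size_t m_index;
};

// Object-wise access, as in an explicit event loop.
float sumPtObjectWise(const ToyAuxStore& store, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    sum += ToyElectron(&store, i).pt();
  }
  return sum;
}

// Column-wise access: read the contiguous "pt" column directly,
// without creating any interface objects.
float sumPtColumnWise(const ToyAuxStore& store) {
  float sum = 0.0f;
  for (float pt : store.columns.at("pt")) {
    sum += pt;
  }
  return sum;
}
```

Both functions compute the same quantity; only the access pattern differs, which is why the same payload can serve both event-loop code and column-based tools.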

For examples on how to use the xAOD in analysis, please refer to the analysis tutorial.

xAOD design

A formal description of the xAOD design can be found here. The key components, concepts and features are:

SG::AuxElement: all xAOD interface objects inherit from this class. It provides a consistent means of accessing the payload from all interface types.
xAOD::IParticle: most, but not all, xAOD objects inherit from this class (which itself inherits from SG::AuxElement); xAOD::EventInfo is an obvious exception. It provides a uniform interface for accessing the four-momentum and other particle information of different types.
DataVector<T>: this represents containers of objects of type T. It behaves much like std::vector<T*> but has a number of additional features, most importantly covariance: since xAOD::Muon inherits from xAOD::IParticle, DataVector<xAOD::Muon> also inherits from DataVector<xAOD::IParticle>. It also includes code implementing the separation of interface and payload.
SG::IAuxStore: the abstract interface to the auxiliary stores, which connects the DataVectors to the numerical payload.
Versioning: concrete xAOD types always have a name ending in _vX (e.g. xAOD::Muon_v1), but these names are never used directly in user code; versionless typedefs (e.g. xAOD::Muon) are used instead. This allows an old _vX version on disk to be converted to a newer _vY version in memory when the file is read with a more recent version of the software.
Static and dynamic auxiliary stores: the static auxiliary stores contain the variables that are regarded as class members. In addition, each static store references a dynamic store, which can hold arbitrary data added on the fly. The static store forwards any request for a variable it does not manage to the dynamic store.
Shallow copies: a shallow copy can be made of a DataVector with auxiliary data. The auxiliary store of the copied vector is of type xAOD::ShallowAuxContainer and maintains a reference to the original store. All write requests are carried out in the xAOD::ShallowAuxContainer, while read requests for variables not held in the xAOD::ShallowAuxContainer are forwarded to the original store. This allows one to copy a container and change a few variables while still sharing the storage for most of the data.
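The read/write forwarding described for static, dynamic and shallow-copy stores can be sketched as follows. This is a hypothetical toy model: it tracks the variables of a single element rather than a whole container, and the class names (IToyAuxStore, DynamicStore, StaticStore, ShallowStore) are illustrative only, not the real SG::IAuxStore or xAOD::ShallowAuxContainer interfaces.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy interface for anything that answers read/write requests.
struct IToyAuxStore {
  virtual ~IToyAuxStore() = default;
  virtual std::optional<double> read(const std::string& name) const = 0;
  virtual void write(const std::string& name, double value) = 0;
};

// Dynamic store: arbitrary variables added on the fly.
struct DynamicStore {
  std::map<std::string, double> vars;
};

// Static store: fixed "class member" variables; any other name is
// forwarded to the dynamic store it owns.
class StaticStore : public IToyAuxStore {
 public:
  double pt = 0.0;
  double eta = 0.0;

  std::optional<double> read(const std::string& name) const override {
    if (name == "pt") return pt;
    if (name == "eta") return eta;
    auto it = m_dynamic.vars.find(name);  // forward unknown names
    if (it == m_dynamic.vars.end()) return std::nullopt;
    return it->second;
  }

  void write(const std::string& name, double value) override {
    if (name == "pt") {
      pt = value;
    } else if (name == "eta") {
      eta = value;
    } else {
      m_dynamic.vars[name] = value;  // new variable goes to the dynamic store
    }
  }

 private:
  DynamicStore m_dynamic;
};

// Shallow copy: writes stay local; reads of variables it does not hold
// are forwarded to the original store, which is never modified.
class ShallowStore : public IToyAuxStore {
 public:
  explicit ShallowStore(const IToyAuxStore& original) : m_original(original) {}

  std::optional<double> read(const std::string& name) const override {
    auto it = m_local.find(name);
    if (it != m_local.end()) return it->second;
    return m_original.read(name);
  }

  void write(const std::string& name, double value) override {
    m_local[name] = value;
  }

 private:
  const IToyAuxStore& m_original;
  std::map<std::string, double> m_local;
};
```

Writing to a ShallowStore never modifies the original, and any variable not written to the copy is still read from the original store, so the two share storage for everything that was not changed.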

Several of these features, notably the dynamic auxiliary stores and shallow copies, are only possible because the interface and payload are separated.

In-file metadata

In-file metadata plays a key role in describing both the events in a file and the file itself. As this information is essential across all workflows, it must be accurate and reliable. Examples of how in-file metadata is used in workflows include:

Configuring jobs based on input file metadata
Initialising software components
Mapping of names to data objects or values
Decoding trigger information
Keeping track of event selection and luminosity blocks
Allowing for user-specific annotations

The types of in-file metadata in ATLAS are categorized into distinct domains as summarized in the table below:

Domain	Description
EventStreamInfo	Event sample description, used for production
EventFormat	Summary of event layout, used for analysis
FileMetaData	Event and provenance summary
ByteStream	Run parameters
Interval of Validity	Information with lifetime other than event or file
BookKeeping	Event selections, cuts
LumiBlock	Luminosity blocks stored in file
TriggerMenu	Trigger configuration
Truth	MC weights, generator details

MetaDataSvc is the Athena service that orchestrates the metadata propagation tools through file incidents. The domain-specific tools create or propagate metadata when an input file is opened (beginInputFile) or after each processed event. During a job, an input object store is dynamically populated with the content of each new input file, while an output store accumulates new content by appending to the output metadata containers. The metadata is stored in the files in a dedicated TTree named MetaData, containing a single entry.
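The incident-driven flow can be sketched roughly as below. Everything here is a hypothetical simplification: ToyMetaDataSvc, IToyMetaDataTool, EventCountTool and the map-based MetaStore are illustrative names, not the Athena MetaDataSvc or its tool interfaces. The sketch shows the two stores described above: the input store is replaced whenever a new input file is opened, while the output store accumulates content across the whole job.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metadata store: variable name -> value.
using MetaStore = std::map<std::string, long>;

// Hypothetical interface for a domain-specific metadata tool.
struct IToyMetaDataTool {
  virtual ~IToyMetaDataTool() = default;
  // Called when a new input file is opened; `input` holds that file's metadata.
  virtual void beginInputFile(const MetaStore& input, MetaStore& output) = 0;
  // Called after each processed event; does nothing by default.
  virtual void afterEvent(MetaStore& /*output*/) {}
};

// Example tool: sums the event counts recorded in each input file and
// counts the events actually processed in this job.
struct EventCountTool : IToyMetaDataTool {
  void beginInputFile(const MetaStore& input, MetaStore& output) override {
    auto it = input.find("nEventsInInput");
    if (it != input.end()) output["nEventsInInputs"] += it->second;
  }
  void afterEvent(MetaStore& output) override { ++output["nEventsProcessed"]; }
};

class ToyMetaDataSvc {
 public:
  void addTool(std::unique_ptr<IToyMetaDataTool> tool) {
    m_tools.push_back(std::move(tool));
  }

  // File incident: refresh the input store, then let every tool react.
  void onBeginInputFile(const MetaStore& fileMetadata) {
    m_input = fileMetadata;
    for (auto& tool : m_tools) tool->beginInputFile(m_input, m_output);
  }

  // Event incident: let every tool update the output store.
  void onEndEvent() {
    for (auto& tool : m_tools) tool->afterEvent(m_output);
  }

  const MetaStore& output() const { return m_output; }

 private:
  std::vector<std::unique_ptr<IToyMetaDataTool>> m_tools;
  MetaStore m_input;   // replaced for each input file
  MetaStore m_output;  // accumulates across the whole job
};
```

Keeping the per-domain logic in separate tools lets the service stay generic: it only dispatches incidents, and each tool decides how its own domain is propagated.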

When merging files, metadata needs to be handled in different ways depending on its type. Metadata merging follows one of these scenarios:

Unique accumulation: new values are appended to existing ones, with duplicates removed (e.g. EventFormat, TriggerMenu)
Natural addition: values from the inputs are simply summed (e.g. event counts in EventStreamInfo)
First-value priority: when values differ, the first value encountered is kept (e.g. FileMetaData)
Processing-dependent: the result depends on the number of events processed relative to the number expected (e.g. LumiBlock)
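The four merge scenarios can be sketched as plain functions. These are hypothetical illustrations operating on simple standard containers, not the actual Athena metadata tools, which work on the metadata objects themselves. In particular, the luminosity-block rule is simplified here, as an assumption, to two states: complete when every expected event was processed, incomplete otherwise.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Unique accumulation: append new values, dropping duplicates
// (e.g. trigger menu entries seen in several input files).
// Values keep the order in which they were first seen.
std::vector<std::string> mergeUnique(const std::vector<std::vector<std::string>>& inputs) {
  std::vector<std::string> merged;
  std::set<std::string> seen;
  for (const auto& input : inputs) {
    for (const auto& value : input) {
      if (seen.insert(value).second) merged.push_back(value);  // true if new
    }
  }
  return merged;
}

// Natural addition: sum the values (e.g. event counts).
long mergeSum(const std::vector<long>& inputs) {
  long total = 0;
  for (long n : inputs) total += n;
  return total;
}

// First-value priority: keep the first value; flag whether any input differed.
std::string mergeFirst(const std::vector<std::string>& inputs, bool& conflict) {
  conflict = false;
  if (inputs.empty()) return {};
  for (const auto& value : inputs) {
    if (value != inputs.front()) conflict = true;
  }
  return inputs.front();
}

// Processing-dependent (simplified): status of one luminosity block.
enum class LbStatus { Complete, Incomplete };

LbStatus lumiBlockStatus(long processed, long expected) {
  return processed == expected ? LbStatus::Complete : LbStatus::Incomplete;
}
```

The first-value rule silently discards later values, which is why the sketch reports a conflict flag; a real merge step would likely want to log such differences.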

If a file has no events, event-dependent metadata fields may be absent or empty. Such an eventless file must still retain its other metadata attributes so that downstream workflows can read it. When files with and without events are merged, the metadata is set according to its type, following the scenarios described above.

(Note: at present, merging with eventless files can be problematic for FileMetaData, which uses the first-value priority approach; this is to be improved in the future.)