Athena Persistency Example
This is a simple exercise showing how to introduce very basic persistency into an Athena job. It builds on an existing transient Athena example used in the ATLAS Software Tutorial and adds writing functionality to it, so that the transient objects used in that example are stored in an output file for all processed events.
This exercise shows how to:
- create a class dictionary (for an existing Athena class)
- trigger an automatic creation of an AthenaPool converter for this class
- modify job options to produce an output file
- inspect produced file with PyROOT
Prerequisites
We will re-use the same build directory and athena clone as used elsewhere in this tutorial.
Setup runtime
If you've started a fresh terminal/ssh session, remember to setup the Athena development environment for the latest nightly build of the Athena main branch:
setupATLAS
asetup main,Athena,latest
Make a new branch
Now cd to the athena source directory and make a new branch:
cd athena
git fetch upstream
git checkout -b persistency-exercise upstream/main --no-track
Create the new package
By convention, persistency support for classes defined in a given package (named Package) goes into a new package named PackageAthenaPool. We will implement persistency for AthExHive, so let's create a package called AthExHiveAthenaPool.
From the athena directory do the following:
# Make the package directory:
mkdir AthExHiveAthenaPool
cd AthExHiveAthenaPool
# Make the headers and python subdirectories
mkdir AthExHiveAthenaPool python
Then create a CMakeLists.txt file in the package root directory containing just the package name:
# Declare the package name
atlas_subdir( AthExHiveAthenaPool )
- CMake in the context of ATLAS is covered in this tutorial.
Add class dictionary
All classes for which instances are to be persistified require dictionaries. Dictionaries are created by the build system and added to the release.
First let's create the selection.xml file - an XML file listing which types should be included in a given dictionary. This file usually resides in the include directory of a package (here that would be: AthExHiveAthenaPool/selection.xml).
We want a dictionary for the HiveDataObj class defined and used in the AthExHive example, so the file should contain the following lines:
<lcgdict>
<class name="HiveDataObj" id="8AF5C571-6D5E-46A7-918F-145FB3AA2C43"/>
</lcgdict>
Note: all types that are to be persistified on their own (that is, not as part of another object) should be given an identifier, as shown in the above example. It is an XML attribute with the name id and a value that is a unique identifier, also called a Globally Unique Identifier (GUID). This identifier can be generated randomly with the standard Unix command uuidgen, but it should never be changed once it has been used to write into a file (as long as we want to be able to read that file back later on).
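If uuidgen is not at hand, the same kind of identifier can be generated with Python's standard uuid module; a minimal sketch (the uppercase formatting matches the convention used in selection.xml files):
# generate a random GUID suitable for the "id" attribute in selection.xml
import uuid
print(str(uuid.uuid4()).upper())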
Next we will create a C++ header file that combines all the C++ includes for the types that should have dictionaries. Following the convention, let's call it AthExHiveAthenaPoolDict.h and also put it into the package include directory (here AthExHiveAthenaPool/AthExHiveAthenaPoolDict.h).
#include "AthExHive/HiveDataObj.h"
Finally we need to add a CMake command to the CMakeLists.txt file in the package root directory, which so far has been mostly empty. This is the command that creates a dictionary using the selection.xml and AthExHiveAthenaPoolDict.h files as input:
atlas_add_dictionary( AthExHiveAthenaPoolDict
AthExHiveAthenaPool/AthExHiveAthenaPoolDict.h
AthExHiveAthenaPool/selection.xml
LINK_LIBRARIES AthExHiveLib )
AthExHiveAthenaPoolDict is the name of the dictionary that will be created.
At this point we should already be able to build the dictionary. Go to your working directory (the one where the build and run subdirectories were created earlier). Edit the package_filters.txt file (or whatever you called it) there, listing the newly created package (the last line says: "do not build other packages from the release"):
+ AthExHiveAthenaPool
- .*
Enter the build directory and execute the following cmake command; if everything goes well, source the environment setup.sh script and execute make:
cd build
cmake -DATLAS_PACKAGE_FILTER_FILE=../package_filters.txt ../athena/Projects/WorkDir
source $LCG_PLATFORM/setup.sh
make
After make finishes, the dictionary should be ready and usable (thanks to the environment settings added by the sourced setup.sh script). It can also be used by PyROOT, so it should be possible to perform the following quick verification from the command line (you may remember that HiveDataObj wraps a single integer and provides the val() accessor to it):
build% python
>>> import ROOT
>>> obj = ROOT.HiveDataObj(123)
>>> obj.val()
123
You can inspect the results of the dictionary build in the lib subdirectory of your build location:
build% ls -l $LCG_PLATFORM/lib
-rw-r--r--. 1 mnowak zp 110 Nov 16 19:26 WorkDir.rootmap
-rwxr-xr-x. 1 mnowak zp 29424 Nov 16 19:26 libAthExHiveAthenaPoolDict.so
-rwxr-xr-x. 1 mnowak zp 146208 Nov 16 19:26 libAthExHiveAthenaPoolDict.so.dbg
-rw-r--r--. 1 mnowak zp 1096 Nov 16 19:26 libAthExHiveAthenaPoolDict_rdict.pcm
build% cat $LCG_PLATFORM/lib/WorkDir.rootmap
[ libAthExHiveAthenaPoolDict.so ]
# List of selected classes
class HiveDataObj
header AthExHive/HiveDataObj.h
What is shown here is the rootmap file, which tells the runtime dictionary discovery system which library contains the dictionary for the HiveDataObj class. The dynamic library contains, among other things, the class factory functions, and the .pcm file contains C++ reflection information about those classes.
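As a quick cross-check of the rootmap-based autoloading, you can ask ROOT's type system where the class comes from; a minimal PyROOT sketch (run in the environment set up above):
import ROOT
# looking the class up by name triggers the rootmap-driven library autoload
cl = ROOT.TClass.GetClass("HiveDataObj")
print(cl.GetDeclFileName())  # header recorded in the dictionary
print(cl.GetSharedLibs())    # library that provides the dictionary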
Add AthenaPool converter
As mentioned in the tutorial, ATLAS CMake provides a command to build AthenaPool converters for a particular class. The following lines should be added to the CMakeLists.txt file:
atlas_add_poolcnv_library( AthExHiveAthenaPoolCnv
FILES AthExHive/HiveDataObj.h
LINK_LIBRARIES AthExHiveLib AthenaPoolCnvSvcLib )
For this exercise we will let CMake create the converter automatically - CMake will do that by default if it can't find converter source files in a well-known location.
After adding the atlas_add_poolcnv_library command to the CMakeLists.txt file, go back to the build directory and execute make again.
Afterwards, a quick look into the lib directory shows new files:
build% ls -l $LCG_PLATFORM/lib
-rw-r--r--. 1 mnowak zp 50 Nov 16 20:02 WorkDir.components
-rw-r--r--. 1 mnowak zp 110 Nov 16 19:26 WorkDir.rootmap
-rw-r--r--. 1 mnowak zp 50 Nov 16 20:02 libAthExHiveAthenaPoolCnv.components
-rwxr-xr-x. 1 mnowak zp 132064 Nov 16 20:02 libAthExHiveAthenaPoolCnv.so
-rwxr-xr-x. 1 mnowak zp 1236032 Nov 16 20:02 libAthExHiveAthenaPoolCnv.so.dbg
-rwxr-xr-x. 1 mnowak zp 29424 Nov 16 19:26 libAthExHiveAthenaPoolDict.so
-rwxr-xr-x. 1 mnowak zp 146208 Nov 16 19:26 libAthExHiveAthenaPoolDict.so.dbg
-rw-r--r--. 1 mnowak zp 1096 Nov 16 19:26 libAthExHiveAthenaPoolDict_rdict.pcm
build% cat $LCG_PLATFORM/lib/WorkDir.components
v2::libAthExHiveAthenaPoolCnv.so:CNV_256_37539154
Here we can see the components manifest file, which lists all the components that Athena can load dynamically at runtime, as needed. In particular, there is a converter for a class with CLID=37539154 in the libAthExHiveAthenaPoolCnv shared library. CLID assignments can be found in the class header files: 37539154 is assigned to HiveDataObj.
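To see how the manifest maps converter libraries to CLIDs, here is a small parsing sketch; it assumes the entries follow the v2:: format shown above, and the platform directory name is only an example (use your own $LCG_PLATFORM):
# decode converter entries like "v2::libAthExHiveAthenaPoolCnv.so:CNV_256_37539154"
import re
with open("x86_64-el9-gcc13-opt/lib/WorkDir.components") as f:  # adjust the platform directory
    for line in f:
        m = re.match(r"v2::(\S+):CNV_\d+_(\d+)", line)
        if m:
            print(f"library {m.group(1)} provides a converter for CLID {m.group(2)}")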
Prepare job options for writing
Job config files that are to be executed at runtime must be stored in the python subdirectory of the package. We created this subdirectory at the beginning; now let's add the job config there by copying the WriteHiveDataObjConfig.py file into the python/ directory:
# Copyright (C) 2002-2025 CERN for the benefit of the ATLAS collaboration
from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator


def WriteHiveDataObjCfg(flags):
    """
    Configure and return a ComponentAccumulator with all Hive algorithms.
    """
    from AthExHive.AthExHiveConfig import (
        HiveAlgAConf,
        HiveAlgBConf,
        HiveAlgCConf,
        HiveAlgDConf,
        HiveAlgEConf,
        HiveAlgFConf,
        HiveAlgGConf,
        HiveAlgVConf,
    )

    # Merge algs into CA
    alg_configs = [
        HiveAlgAConf,
        HiveAlgBConf,
        HiveAlgCConf,
        HiveAlgDConf,
        HiveAlgEConf,
        HiveAlgFConf,
        HiveAlgGConf,
        HiveAlgVConf,
    ]
    cfg = ComponentAccumulator()
    for alg_conf in alg_configs:
        cfg.merge(alg_conf(flags))
    return cfg


if __name__ == "__main__":
    # Setup configuration flags
    from AthenaConfiguration.MainServicesConfig import MainEvgenServicesCfg
    from AthenaConfiguration.AllConfigFlags import initConfigFlags

    flags = initConfigFlags()
    flags.Input.RunNumbers = [284500]
    flags.Input.TimeStamps = [1]  # dummy value
    # workaround for building xAOD::EventInfo without input files
    flags.Input.TypedCollections = []
    flags.Exec.MaxEvents = 20
    flags.fillFromArgs()
    flags.lock()

    # The example runs with no input file. We configure it with the McEventSelector
    cfg = MainEvgenServicesCfg(flags, withSequences=True)
    cfg.merge(WriteHiveDataObjCfg(flags))

    # Output stream
    from OutputStreamAthenaPool.OutputStreamConfig import OutputStreamCfg
    cfg.merge(OutputStreamCfg(flags, "ExampleStream", ItemList=["HiveDataObj#*"]))

    # Execute
    import sys
    sys.exit(cfg.run().isFailure())
To make the config available at runtime, we need to instruct CMake to install the Python modules in the release. This can be done by adding the following command to the CMakeLists.txt file:
atlas_install_python_modules( python/*.py )
Go to the build directory and execute make once more. After it finishes, we should finally be ready to execute the Athena job.
Execute the Athena job to create an output file
Run the Athena job with the new job config. Do it from the run directory so we can clearly see the output:
cd ../run
python -m AthExHiveAthenaPool.WriteHiveDataObjConfig
Inspect the results
The directory should now contain two files produced by the job: myExampleStream.pool.root and PoolFileCatalog.xml. You can check the content of the XML file catalog, but the ROOT file is a bit more complicated. It can be opened directly with ROOT and browsed (hint: Athena event data is stored in the CollectionTree TTree). But for the purpose of this exercise we may take advantage of PyROOT.
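For a quick first look before running the full script below, you can simply open the file and print the tree structure; a minimal sketch, assuming the output file name produced by the job above:
import ROOT
# open the POOL output file and print a summary of the event data tree
f = ROOT.TFile.Open("myExampleStream.pool.root")
f.Get("CollectionTree").Print()  # lists all branches, including the HiveDataObj ones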
Save the ReadHiveDataObjs.py PyROOT script to the run directory:
# Copyright (C) 2002-2025 CERN for the benefit of the ATLAS collaboration
import ROOT

file = ROOT.TFile.Open("myExampleStream.pool.root")
# CollectionTree is the default TTree name for Event Data
tree = file.Get("CollectionTree")
tree.GetEntry(0)

# Find the branches containing HiveDataObj
brNames = [
    b.GetName()
    for b in tree.GetListOfBranches()
    if b.GetName().startswith("HiveDataObj")
]
print(" ---- ".join([" Event"] + [name[12:] for name in brNames]))

# Loop over all rows (events) - get objects from the HiveDataObj branches with "getattr(tree,b)"
lineformat = "{:>8}" * (len(brNames) + 1)
for evt in range(tree.GetEntries()):
    tree.GetEntry(evt)
    print(lineformat.format(evt + 1, *[getattr(tree, b).val() for b in brNames]))
and execute it with python ReadHiveDataObjs.py to see a printout of the data from the ROOT file created earlier.
Bonus: adding in-file metadata
While writing out event data, we are usually also interested in the metadata describing those events and the produced output file. We can achieve this by adding the relevant configuration to the job options. In this exercise, we will add two helper tools that create EventFormat and FileMetaData objects and persist them in the output file.
To do that, use the configuration from WriteHiveWithMetaData.py (note the additions relative to the configuration used previously in this exercise):
# Copyright (C) 2002-2025 CERN for the benefit of the ATLAS collaboration
import sys

from OutputStreamAthenaPool.OutputStreamConfig import (
    OutputStreamCfg,
    addToMetaData,
    outputStreamName,
)
from AthExHiveAthenaPool.WriteHiveDataObjConfig import WriteHiveDataObjCfg
from AthenaConfiguration.MainServicesConfig import MainEvgenServicesCfg
from AthenaConfiguration.AllConfigFlags import initConfigFlags
from AthenaConfiguration.ComponentFactory import CompFactory

if __name__ == "__main__":
    flags = initConfigFlags()
    flags.Input.RunNumbers = [284600]
    flags.Input.TimeStamps = [1]  # dummy value
    # workaround for building xAOD::EventInfo without input files
    flags.Input.TypedCollections = []
    flags.Exec.MaxEvents = 20

    streamName = "TestStream"
    flags.addFlag(
        f"Output.{streamName}FileName",
        f"{streamName}.pool.root",
    )
    flags.addFlag(f"Output.doWrite{streamName}", True)
    flags.fillFromArgs()
    flags.lock()

    # The example runs with no input file. We configure it with the McEventSelector
    cfg = MainEvgenServicesCfg(flags, withSequences=True)
    cfg.merge(WriteHiveDataObjCfg(flags))

    # Output stream
    cfg.merge(
        OutputStreamCfg(
            flags,
            streamName=streamName,
            ItemList=[
                "xAOD::EventInfo#EventInfo",
                "xAOD::EventAuxInfo#EventInfoAux.",
                "HiveDataObj#*",
            ],
        )
    )
    cfg.merge(
        addToMetaData(
            flags,
            streamName=streamName,
            itemOrList=[
                f"xAOD::EventFormat#EventFormat{outputStreamName(streamName)}",
                "xAOD::FileMetaData#FileMetaData",
                "xAOD::FileMetaDataAuxInfo#FileMetaDataAux.",
            ],
            HelperTools=[
                CompFactory.xAODMaker.EventFormatStreamHelperTool(
                    f"{outputStreamName(streamName)}_EventFormatStreamHelperTool",
                    Key=f"EventFormat{outputStreamName(streamName)}",
                    DataHeaderKey=f"{outputStreamName(streamName)}",
                    TypeNames=["HiveDataObj#*"],
                ),
                CompFactory.xAODMaker.FileMetaDataCreatorTool(
                    f"{outputStreamName(streamName)}_FileMetaDataCreatorTool",
                    OutputKey="FileMetaData",
                    StreamName=f"{outputStreamName(streamName)}",
                ),
            ],
        )
    )

    # Execute
    sys.exit(cfg.run().isFailure())
To inspect the metadata in the output file, you can use the following command:
meta-reader -m full TestStream.pool.root
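If meta-reader is not at hand, the in-file metadata can also be glanced at with PyROOT; a minimal sketch, assuming the metadata is stored in the usual MetaData TTree of the POOL file:
import ROOT
f = ROOT.TFile.Open("TestStream.pool.root")
meta = f.Get("MetaData")  # in-file metadata lives in its own TTree
for br in meta.GetListOfBranches():
    print(br.GetName())  # branch names include the FileMetaData and EventFormat payloads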