Athena Persistency Example
This is a simple exercise showing how to introduce very basic persistency into an Athena job. It builds on an existing transient Athena example used in the ATLAS Software Tutorial and adds writing functionality to it, so that the transient objects used in that example are stored in an output file for all processed events.
This exercise shows how to:
- create a class dictionary (for an existing Athena class)
- trigger an automatic creation of an AthenaPool converter for this class
- modify job options to produce an output file
- inspect produced file with PyROOT
Prerequisites
We will re-use the same build directory and athena clone as used elsewhere in this tutorial.
Setup runtime
If you've started a fresh terminal/ssh session, remember to setup the Athena development environment for the latest nightly build of the Athena main branch:
setupATLAS
asetup main,Athena,latest
Make a new branch
Now cd to the athena source directory and make a new branch:
cd athena
git fetch upstream
git checkout -b persistency-exercise upstream/main --no-track
Create the new package
By convention, persistency support for classes defined in a given package (named Package) goes into a new package named PackageAthenaPool. We will implement persistency for AthExHive, so let's create a package called AthExHiveAthenaPool.
From the athena directory do the following:
# Make the package directory:
mkdir AthExHiveAthenaPool
cd AthExHiveAthenaPool
# Make the headers and python subdirectories
mkdir AthExHiveAthenaPool python
Then create a CMakeLists.txt file in the package root directory containing just the package name:
# Declare the package name
atlas_subdir( AthExHiveAthenaPool )
- CMake in the context of ATLAS is covered in this tutorial.
Add class dictionary
All classes for which instances are to be persistified require dictionaries. Dictionaries are created by the build system and added to the release.
First let's create the selection.xml file - an XML file listing which types should be included in a given dictionary. This file usually resides in the include directory of a package (here that would be: AthExHiveAthenaPool/selection.xml).
We want a dictionary for the HiveDataObj class defined and used in the AthExHive example, so the file should contain the following lines:
<lcgdict>
<class name="HiveDataObj" id="8AF5C571-6D5E-46A7-918F-145FB3AA2C43"/>
</lcgdict>
Note: all types that are to be persistified on their own (that is, not as part of another object) should be given an identifier, as shown in the above example. It is an XML attribute with the name id and a value that is a unique identifier, also called a Globally Unique Identifier (GUID). This identifier can be generated randomly with the standard Unix command uuidgen, but it should never be changed once it has been used to write into a file (as long as we want to be able to read that file back later on).
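If uuidgen is not at hand, the same kind of identifier can be generated with Python's standard uuid module; a minimal sketch (the uppercase formatting matches the convention used in selection.xml files):
# generate a random GUID suitable for the "id" attribute in selection.xml
import uuid
print(str(uuid.uuid4()).upper())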
Next we will create a C++ header file that combines all the C++ includes for the types that should have dictionaries. Following the convention, let's call it AthExHiveAthenaPoolDict.h and also put it into the package include directory (here AthExHiveAthenaPool/AthExHiveAthenaPoolDict.h).
#include "AthExHive/HiveDataObj.h"
Finally we need to add a CMake command to the CMakeLists.txt file in the package root directory, which so far has been mostly empty. This is the command that creates a dictionary using the selection.xml and AthExHiveAthenaPoolDict.h files as input:
atlas_add_dictionary( AthExHiveAthenaPoolDict
AthExHiveAthenaPool/AthExHiveAthenaPoolDict.h
AthExHiveAthenaPool/selection.xml
LINK_LIBRARIES AthExHiveLib )
AthExHiveAthenaPoolDict is the name of the dictionary that will be created.
At this point we should already be able to build the dictionary. Go to your working directory (the one where the build and run subdirectories were created earlier). Edit the package_filters.txt file (or whatever you called it) there, listing the newly created package (the last line says: "do not build other packages from the release"):
+ AthExHiveAthenaPool
- .*
Enter the build directory and execute the following cmake command; if everything goes well, source the environment setup.sh script and execute make:
cd build
cmake -DATLAS_PACKAGE_FILTER_FILE=../package_filters.txt ../athena/Projects/WorkDir
source $LCG_PLATFORM/setup.sh
make
After make finishes, the dictionary should be ready and usable (thanks to the environment settings added by the sourced setup.sh script). It can also be used by PyROOT, so it should be possible to perform the following quick verification from the command line (you may remember that HiveDataObj wraps a single integer and provides the val() accessor to it):
build% python
>>> import ROOT
>>> obj = ROOT.HiveDataObj(123)
>>> obj.val()
123
You can inspect the results of the dictionary build in the lib subdirectory of your build location:
build% ls -l $LCG_PLATFORM/lib
-rw-r--r--. 1 mnowak zp 110 Nov 16 19:26 WorkDir.rootmap
-rwxr-xr-x. 1 mnowak zp 29424 Nov 16 19:26 libAthExHiveAthenaPoolDict.so
-rwxr-xr-x. 1 mnowak zp 146208 Nov 16 19:26 libAthExHiveAthenaPoolDict.so.dbg
-rw-r--r--. 1 mnowak zp 1096 Nov 16 19:26 libAthExHiveAthenaPoolDict_rdict.pcm
build% cat $LCG_PLATFORM/lib/WorkDir.rootmap
[ libAthExHiveAthenaPoolDict.so ]
# List of selected classes
class HiveDataObj
header AthExHive/HiveDataObj.h
What is shown here is the rootmap file, which tells the runtime dictionary discovery system which library contains the dictionary for the HiveDataObj class. The dynamic library contains, among other things, the class factory functions, and the .pcm file contains C++ reflection information about those classes.
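As a quick cross-check of the rootmap-based autoloading, you can ask ROOT's type system where the class comes from; a minimal PyROOT sketch (run in the environment set up above):
import ROOT
# looking the class up by name triggers the rootmap-driven library autoload
cl = ROOT.TClass.GetClass("HiveDataObj")
print(cl.GetDeclFileName())  # header recorded in the dictionary
print(cl.GetSharedLibs())    # library that provides the dictionary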
Add AthenaPool converter
As mentioned in the tutorial, ATLAS CMake provides a command to build AthenaPool converters for a particular class. The following lines should be added to the CMakeLists.txt file:
atlas_add_poolcnv_library( AthExHiveAthenaPoolCnv
FILES AthExHive/HiveDataObj.h
LINK_LIBRARIES AthExHiveLib AthenaPoolCnvSvcLib )
For this exercise we will let CMake create the converter automatically - CMake will do that by default if it can't find converter source files in a well-known location.
After adding the atlas_add_poolcnv_library command to the CMakeLists.txt file, go back to the build directory and execute make again.
Afterwards, a quick look into the lib directory shows new files:
build% ls -l $LCG_PLATFORM/lib
-rw-r--r--. 1 mnowak zp 50 Nov 16 20:02 WorkDir.components
-rw-r--r--. 1 mnowak zp 110 Nov 16 19:26 WorkDir.rootmap
-rw-r--r--. 1 mnowak zp 50 Nov 16 20:02 libAthExHiveAthenaPoolCnv.components
-rwxr-xr-x. 1 mnowak zp 132064 Nov 16 20:02 libAthExHiveAthenaPoolCnv.so
-rwxr-xr-x. 1 mnowak zp 1236032 Nov 16 20:02 libAthExHiveAthenaPoolCnv.so.dbg
-rwxr-xr-x. 1 mnowak zp 29424 Nov 16 19:26 libAthExHiveAthenaPoolDict.so
-rwxr-xr-x. 1 mnowak zp 146208 Nov 16 19:26 libAthExHiveAthenaPoolDict.so.dbg
-rw-r--r--. 1 mnowak zp 1096 Nov 16 19:26 libAthExHiveAthenaPoolDict_rdict.pcm
build% cat $LCG_PLATFORM/lib/WorkDir.components
v2::libAthExHiveAthenaPoolCnv.so:CNV_256_37539154
Here we can see the components manifest file, which lists all the components that Athena can load dynamically at runtime, as needed. In particular, there is a converter for a class with CLID=37539154 in the libAthExHiveAthenaPoolCnv shared library. CLID assignments can be found in the class header files: 37539154 is assigned to HiveDataObj.
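To see how the manifest maps converter libraries to CLIDs, here is a small parsing sketch; it assumes the entries follow the v2:: format shown above, and the platform directory name is only an example (use your own $LCG_PLATFORM):
# decode converter entries like "v2::libAthExHiveAthenaPoolCnv.so:CNV_256_37539154"
import re
with open("x86_64-el9-gcc13-opt/lib/WorkDir.components") as f:  # adjust the platform directory
    for line in f:
        m = re.match(r"v2::(\S+):CNV_\d+_(\d+)", line)
        if m:
            print(f"library {m.group(1)} provides a converter for CLID {m.group(2)}")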
Prepare job options for writing
Job config files that are to be executed at runtime must be stored in the python subdirectory of the package. We created this subdirectory at the beginning; now let's add the job config there by copying the WriteHiveDataObjConfig.py file into the python/ directory:
# Copyright (C) 2002-2025 CERN for the benefit of the ATLAS collaboration
from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator


def WriteHiveDataObjCfg(flags):
    """
    Configure and return a ComponentAccumulator with all Hive algorithms.
    """
    from AthExHive.AthExHiveConfig import (
        HiveAlgAConf,
        HiveAlgBConf,
        HiveAlgCConf,
        HiveAlgDConf,
        HiveAlgEConf,
        HiveAlgFConf,
        HiveAlgGConf,
        HiveAlgVConf,
    )

    # Merge algs into CA
    alg_configs = [
        HiveAlgAConf,
        HiveAlgBConf,
        HiveAlgCConf,
        HiveAlgDConf,
        HiveAlgEConf,
        HiveAlgFConf,
        HiveAlgGConf,
        HiveAlgVConf,
    ]
    cfg = ComponentAccumulator()
    for alg_conf in alg_configs:
        cfg.merge(alg_conf(flags))
    return cfg


if __name__ == "__main__":
    # Setup configuration flags
    from AthenaConfiguration.MainServicesConfig import MainEvgenServicesCfg
    from AthenaConfiguration.AllConfigFlags import initConfigFlags

    flags = initConfigFlags()
    flags.Input.RunNumbers = [284500]
    flags.Input.TimeStamps = [1]  # dummy value
    # workaround for building xAOD::EventInfo without input files
    flags.Input.TypedCollections = []
    flags.Exec.MaxEvents = 20
    flags.fillFromArgs()
    flags.lock()

    # The example runs with no input file. We configure it with the McEventSelector
    cfg = MainEvgenServicesCfg(flags, withSequences=True)
    cfg.merge(WriteHiveDataObjCfg(flags))

    # Output stream
    from OutputStreamAthenaPool.OutputStreamConfig import OutputStreamCfg
    cfg.merge(OutputStreamCfg(flags, "ExampleStream", ItemList=["HiveDataObj#*"]))

    # Execute
    import sys
    sys.exit(cfg.run().isFailure())
To make the config available at runtime, we need to instruct CMake to install the Python modules in the release. This can be done by adding the following command to the CMakeLists.txt file:
atlas_install_python_modules( python/*.py )
Go to the build directory and execute make once more. After it finishes, we should finally be ready to execute the Athena job.
Execute the Athena job to create an output file
Run the Athena job with the new job config. Do it from the run directory so we can clearly see the output:
cd ../run
python -m AthExHiveAthenaPool.WriteHiveDataObjConfig
Inspect the results
The directory should now contain two files produced by the job: myExampleStream.pool.root and PoolFileCatalog.xml. You can check the content of the XML file catalog, but the ROOT file is a bit more complicated. It can be opened directly with ROOT and browsed (hint: Athena event data is stored in the CollectionTree TTree). But for the purpose of this exercise we may take advantage of PyROOT.
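For a quick first look before running the full script below, you can simply open the file and print the tree structure; a minimal sketch, assuming the output file name produced by the job above:
import ROOT
# open the POOL output file and print a summary of the event data tree
f = ROOT.TFile.Open("myExampleStream.pool.root")
f.Get("CollectionTree").Print()  # lists all branches, including the HiveDataObj ones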
Save the ReadHiveDataObjs.py PyROOT script to the run directory:
# Copyright (C) 2002-2025 CERN for the benefit of the ATLAS collaboration
import ROOT

file = ROOT.TFile.Open("myExampleStream.pool.root")
# CollectionTree is the default TTree name for Event Data
tree = file.Get("CollectionTree")
tree.GetEntry(0)

# Find the branches containing HiveDataObj
brNames = [
    b.GetName()
    for b in tree.GetListOfBranches()
    if b.GetName().startswith("HiveDataObj")
]
print(" ---- ".join([" Event"] + [name[12:] for name in brNames]))

# Loop over all rows (events) - get objects from the HiveDataObj branches with "getattr(tree,b)"
lineformat = "{:>8}" * (len(brNames) + 1)
for evt in range(tree.GetEntries()):
    tree.GetEntry(evt)
    print(lineformat.format(evt + 1, *[getattr(tree, b).val() for b in brNames]))
and execute it with python ReadHiveDataObjs.py to see a printout of the data from the ROOT file created earlier.
Bonus: adding in-file metadata
While writing out event data, we are usually also interested in the metadata describing those events and the produced output file. We can achieve this by adding the relevant configuration to the job options. In this exercise, we will add two helper tools that create EventFormat and FileMetaData objects and persist them in the output file.
To do that, use the configuration from WriteHiveWithMetaData.py (note the additions relative to the configuration used previously in this exercise):
# Copyright (C) 2002-2025 CERN for the benefit of the ATLAS collaboration
import sys

from OutputStreamAthenaPool.OutputStreamConfig import (
    OutputStreamCfg,
    addToMetaData,
    outputStreamName,
)
from AthExHiveAthenaPool.WriteHiveDataObjConfig import WriteHiveDataObjCfg
from AthenaConfiguration.MainServicesConfig import MainEvgenServicesCfg
from AthenaConfiguration.AllConfigFlags import initConfigFlags
from AthenaConfiguration.ComponentFactory import CompFactory

if __name__ == "__main__":
    flags = initConfigFlags()
    flags.Input.RunNumbers = [284600]
    flags.Input.TimeStamps = [1]  # dummy value
    # workaround for building xAOD::EventInfo without input files
    flags.Input.TypedCollections = []
    flags.Exec.MaxEvents = 20

    streamName = "TestStream"
    flags.addFlag(
        f"Output.{streamName}FileName",
        f"{streamName}.pool.root",
    )
    flags.addFlag(f"Output.doWrite{streamName}", True)
    flags.fillFromArgs()
    flags.lock()

    # The example runs with no input file. We configure it with the McEventSelector
    cfg = MainEvgenServicesCfg(flags, withSequences=True)
    cfg.merge(WriteHiveDataObjCfg(flags))

    # Output stream
    cfg.merge(
        OutputStreamCfg(
            flags,
            streamName=streamName,
            ItemList=[
                "xAOD::EventInfo#EventInfo",
                "xAOD::EventAuxInfo#EventInfoAux.",
                "HiveDataObj#*",
            ],
        )
    )
    cfg.merge(
        addToMetaData(
            flags,
            streamName=streamName,
            itemOrList=[
                f"xAOD::EventFormat#EventFormat{outputStreamName(streamName)}",
                "xAOD::FileMetaData#FileMetaData",
                "xAOD::FileMetaDataAuxInfo#FileMetaDataAux.",
            ],
            HelperTools=[
                CompFactory.xAODMaker.EventFormatStreamHelperTool(
                    f"{outputStreamName(streamName)}_EventFormatStreamHelperTool",
                    Key=f"EventFormat{outputStreamName(streamName)}",
                    DataHeaderKey=f"{outputStreamName(streamName)}",
                    TypeNames=["HiveDataObj#*"],
                ),
                CompFactory.xAODMaker.FileMetaDataCreatorTool(
                    f"{outputStreamName(streamName)}_FileMetaDataCreatorTool",
                    OutputKey="FileMetaData",
                    StreamName=f"{outputStreamName(streamName)}",
                ),
            ],
        )
    )

    # Execute
    sys.exit(cfg.run().isFailure())
To inspect the metadata in the output file, you can use the following command:
meta-reader -m full TestStream.pool.root
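If meta-reader is not at hand, the in-file metadata can also be glanced at with PyROOT; a minimal sketch, assuming the metadata is stored in the usual MetaData TTree of the POOL file:
import ROOT
f = ROOT.TFile.Open("TestStream.pool.root")
meta = f.Get("MetaData")  # in-file metadata lives in its own TTree
for br in meta.GetListOfBranches():
    print(br.GetName())  # branch names include the FileMetaData and EventFormat payloads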