Configuration Blocks¶
The configuration block is the basic unit of configuration, with each block generally mapping onto a single section or subsection in the YAML file.
The approach here differs from how the ComponentAccumulator configuration (the main mechanism in Athena) works: for the ComponentAccumulator, the configuration is provided as a function that gets all options and global settings passed in and returns the configured algorithms as a ComponentAccumulator object. Configuration blocks, on the other hand, are objects that declare all relevant options; the options get set on the object, and the configuration is passed a ConfigAccumulator that gives access to global settings and receives the configured algorithms.
Example ConfigBlock¶
To illustrate how ConfigBlocks work, here is a stripped-down version of
the tau calibration block; below we will explore how it works. A block
like this will typically sit in a file like TauAnalysisConfig.py, and
you will need to define a block factory to use it:
from AnalysisAlgorithmsConfig.ConfigAccumulator import DataType
from AnalysisAlgorithmsConfig.ConfigBlock import ConfigBlock
from AthenaConfiguration.Enums import LHCPeriod
class TauCalibrationConfig (ConfigBlock):
    """the ConfigBlock for the tau four-momentum correction"""

    def __init__ (self) :
        super (TauCalibrationConfig, self).__init__ ()
        self.setBlockName('Taus')
        # Options are declared in __init__ and will be set before makeAlgs is called
        self.addOption ('inputContainer', '', type=str,
                        info="select tau input container, by default set to TauJets")
        self.addOption ('containerName', '', type=str,
                        noneAction='error',
                        info="the name of the output container after calibration.")
        self.addOption ('rerunTruthMatching', True, type=bool,
                        info="whether to rerun truth matching (sets up an instance of "
                        "CP::TauTruthMatchingAlg). The default is True.")
        self.addOption ('decorateExtraVariables', True, type=bool,
                        info="decorate extra variables for the reconstructed tau")

    def instanceName (self) :
        # Used to make algorithm names unique when multiple instances are scheduled
        return self.containerName

    def makeAlgs (self, config) :
        # The config object (ConfigAccumulator) provides access to global settings,
        # manages created algorithms, and tracks bookkeeping across blocks

        # Determine and register the source container
        inputContainer = "AnalysisTauJets" if config.isPhyslite() else "TauJets"
        if self.inputContainer:
            inputContainer = self.inputContainer
        config.setSourceName (self.containerName, inputContainer)

        # Set up the tau truth matching algorithm (MC only)
        if self.rerunTruthMatching and config.dataType() is not DataType.Data:
            alg = config.createAlgorithm( 'CP::TauTruthMatchingAlg',
                                          'TauTruthMatchingAlg' )
            config.addPrivateTool( 'matchingTool',
                                   'TauAnalysisTools::TauTruthMatchingTool' )
            alg.matchingTool.TruthJetContainerName = 'AntiKt4TruthDressedWZJets'
            alg.taus = config.readName (self.containerName)
            alg.preselection = config.getPreselection (self.containerName, '')

        # Decorate extra variables
        if self.decorateExtraVariables:
            alg = config.createAlgorithm( 'CP::TauExtraVariablesAlg',
                                          'TauExtraVariablesAlg' )
            alg.taus = config.readName (self.containerName)
            config.addOutputVar (self.containerName, 'nTracksCharged', 'nTracksCharged', noSys=True)

        # Set up the tau 4-momentum smearing algorithm
        alg = config.createAlgorithm( 'CP::TauSmearingAlg', 'TauSmearingAlg' )
        config.addPrivateTool( 'smearingTool', 'TauAnalysisTools::TauSmearingTool' )
        alg.smearingTool.useFastSim = config.dataType() is DataType.FastSim
        alg.smearingTool.Campaign = "mc23" if config.geometry() is LHCPeriod.Run3 else "mc20"
        alg.taus = config.readName (self.containerName)
        alg.tausOut = config.copyName (self.containerName)
        alg.preselection = config.getPreselection (self.containerName, '')

        # Register output variables
        config.addOutputVar (self.containerName, 'pt', 'pt')
        config.addOutputVar (self.containerName, 'eta', 'eta', noSys=True)
        config.addOutputVar (self.containerName, 'phi', 'phi', noSys=True)
        config.addOutputVar (self.containerName, 'charge', 'charge', noSys=True)
Option Declaration¶
The addOption method supports several parameters beyond the basic ones
shown above.
noneAction Values¶
The noneAction parameter controls what happens when an option is set
to the value of None (or left at a default value of None):
- 'ignore' (default): The option will not be changed from its default value (the second parameter).
- 'error': Raises an error if the option is not set. Use this for options that must be provided by the user. Alternatively you can specify required=True, but that is less common.
- 'set': Allows setting the option to None explicitly. This is rarely needed, because options that allow None typically have None as their default value already.
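As an illustration, here is a hypothetical set of option declarations inside a block's __init__ (the option names are invented for this sketch) showing the three noneAction behaviours:
# hypothetical options, purely to illustrate noneAction
self.addOption ('postfix', '', type=str,
                info="noneAction defaults to 'ignore': setting this option "
                "to None simply keeps the default value ''")
self.addOption ('containerName', '', type=str, noneAction='error',
                info="must be provided by the user, otherwise configuration fails")
self.addOption ('truthJetContainer', 'AntiKt4TruthDressedWZJets', type=str,
                noneAction='set',
                info="may explicitly be set to None to disable truth-jet matching")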
Expert Mode Options¶
Options can be marked with expertMode=True:
self.addOption ('advancedSetting', 0.5, type=float, expertMode=True,
                info="an advanced setting most users should not change")
Expert mode options can only be set when expert mode is specifically enabled, in which case a warning will be generated. These options typically enable features and settings that should not be used in physics analysis, but may be used e.g. for CP studies or cross-checks.
Built-in Options¶
All ConfigBlocks automatically have several built-in options that do not need to be declared:
- skipOnData: if True, skip this block when running on data
- skipOnMC: if True, skip this block when running on MC
- onlyForDSIDs: a list of DSIDs; if set, only run this block for those specific MC samples
- groupName: override the default group name for the block
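For illustration, here is a hypothetical block that declares only its own options; the built-in options come from the ConfigBlock base class and never appear in __init__:
class MyExampleConfig (ConfigBlock):
    """hypothetical block, only used to illustrate the built-in options"""
    def __init__ (self) :
        super (MyExampleConfig, self).__init__ ()
        self.setBlockName ('MyExample')
        # only block-specific options are declared; skipOnData, skipOnMC,
        # onlyForDSIDs and groupName are added automatically by the base class
        self.addOption ('containerName', '', type=str, noneAction='error',
                        info="the name of the container this block runs on")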
Common makeAlgs Patterns¶
Querying Configuration State¶
The config object (a ConfigAccumulator) provides methods to query
the current configuration state:
def makeAlgs(self, config):
    # Check data type (DataType imported from AnalysisAlgorithmsConfig.ConfigAccumulator)
    if config.dataType() is DataType.Data:
        # Data-specific configuration
        pass
    elif config.dataType() is DataType.FullSim:
        # Full simulation MC
        pass
    elif config.dataType() is DataType.FastSim:
        # Fast simulation MC
        pass

    # Check if running on PHYSLITE
    if config.isPhyslite():
        # PHYSLITE-specific handling
        pass

    # Get LHC run period (LHCPeriod imported from AthenaConfiguration.Enums)
    if config.geometry() >= LHCPeriod.Run3:
        # Run 3 specific configuration
        pass

    # Get MC channel number (0 for data)
    dsid = config.dsid()
Preselections and Selections¶
Most algorithms support preselections, which allow them to skip objects that have already failed earlier cuts. Configure this by asking the config accumulator for the current preselection:
alg.preselection = config.getPreselection (self.containerName, '')
The second argument is the selection name (empty string for the default selection). As more selection cuts get added by upstream blocks, the preselection string is updated automatically.
When an algorithm adds new selection cuts, register them via
addSelection immediately after creating the algorithm:
alg = config.createAlgorithm('CP::TauSelectionAlg', 'TauSelectionAlg')
alg.selectionDecoration = 'selected_tau,as_char'
alg.particles = config.readName (self.containerName)
alg.preselection = config.getPreselection (self.containerName, self.selectionName)
config.addSelection (self.containerName, self.selectionName, alg.selectionDecoration,
                     preselection=True)
The preselection=True parameter indicates that subsequent algorithms
should include this cut in their preselection.
Container Management¶
In general for any container to which we apply corrections or
decorations, we will make a shallow copy first. Or often multiple
shallow copies as we move to the processing chain and add additional
momentum systematics. The ConfigAccumulator will track these shallow
copies and make sure you always access the latest one.
For most algorithms you simply want to pass in the name of the current
container copy, which you can do using readName:
alg.taus = config.readName (self.containerName)
Some algorithms need to create a shallow copy as part of their operation
(usually because they add momentum systematics). For that you can then
switch the container to a new copy via copyName:
alg.taus = config.readName (self.containerName)
alg.tausOut = config.copyName (self.containerName)
The order is important here. After you have called copyName the name
will be updated and readName will return the new name.
Note that readName (or copyName) won't usually return a name in the
event store, but a name that ends in _%SYS%, because that's what the
systematics handles need.
Connecting the source container¶
IMPORTANT: This is only relevant for the first block running on each container. In most cases that's the calibration block. Subsequent blocks will already have the source container set up from an upstream block.
The user will generally give each container its own name in the
configuration file (typically following a pattern of AnaJets,
AnaMuons, etc.). Before the container is first used (usually at the
beginning of the calibration block), you need to declare the source
container your container connects to via setSourceName:
inputContainer = "AnalysisTauJets" if config.isPhyslite() else "TauJets"
if self.inputContainer:
inputContainer = self.inputContainer
config.setSourceName (self.containerName, inputContainer)
If your first algorithm does not create a shallow copy, you will have to
check whether a shallow copy is required (via wantCopy) and, if so,
create it yourself (via CP::AsgShallowCopyAlg):
if config.wantCopy (self.containerName) :
    alg = config.createAlgorithm( 'CP::AsgShallowCopyAlg', 'TauShallowCopyAlg' )
    alg.input = config.readName (self.containerName)
    alg.output = config.copyName (self.containerName)
Output Variables¶
Most configuration blocks will create some variables that eventually
should be added to the output n-tuple. These are registered in the block
that creates them (via addOutputVar()), so that they directly
correspond to the variables the configured blocks generate. See the
output documentation for details.
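As a minimal sketch, reusing the calls from the tau example above (where the noSys=True flag appears to mark variables that do not vary under systematics):
# branch that picks up systematic variations (e.g. from the smearing algorithm)
config.addOutputVar (self.containerName, 'pt', 'pt')
# branch that is not affected by systematics, written only once
config.addOutputVar (self.containerName, 'eta', 'eta', noSys=True)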
Block Dependencies¶
Normally blocks are ordered based on the order in which they are declared in the config factory (i.e. not the order in the user configuration file). In most situations that is sufficient, but when it isn't you can add extra data dependencies to enforce a specific ordering.
ConfigBlocks can declare dependencies on other blocks using
addDependency. This ensures blocks are scheduled in the correct order.
def __init__(self):
    super(MyAnalysisConfig, self).__init__()
    self.setBlockName('MyAnalysis')
    self.addDependency('Electrons', required=True)
    self.addDependency('Jets', required=False)
With required=True (the default), an error is raised if the dependency
is not present in the configuration. With required=False, the
dependency serves only as an ordering hint—if the other block is
present, this block will run after it.
When you declare dependencies, an ignoreDependencies option is
automatically added to the block. Users can set this to True to bypass
dependency checks, which is occasionally useful for advanced
configurations.
Notes from the Developers¶
We did actually start with a design somewhat similar to the
ComponentAccumulator design, in which there was a function that took all
options via arguments and returned an algorithm sequence. However, in
practice we reached a point in which these functions then needed to be
wrapped into objects to provide the needed functionality. And while that
may still be fine, those wrapper objects were custom for each function
and tightly coupled to them, including needing to replicate if
statements for every major conditional inside the function. Merging the
function into the object made the implementation simpler and more
maintainable.
One of the important differences to the ComponentAccumulator is that the blocks are not truly isolated. Each block needs to be able to pass information about selections, container names, and output variables to subsequent blocks. This is typically fairly low-level information, but also quite a lot of information, so the approach is to pass it in a way that keeps coupling low (instead of completely isolating blocks). This is currently tightly integrated into the ConfigAccumulator, but there has been discussion of splitting that off (and turning ConfigAccumulator into a facade).
The naming convention of *Config.py is very unfortunate, as it invites
regular confusion with the ComponentAccumulator configuration files. In
principle we intend to switch this over to *Block.py to make it
unique, but we never seem to be able to find a good time at which it
won't break a lot of pending merge requests. For reference: ConfigBlocks
live in *Config.py files within the *AnalysisAlgorithms packages,
while ComponentAccumulator configuration files are typically named
*Cfg.py.
There is a tension in the design that a ConfigBlock maps both to a specific section in the configuration and to a series of algorithms to run as a block. For the most part that is fine, but sometimes you may want to run some algorithms at a later point. In practice that would mean running some algorithms early in the sequence, then giving other blocks a chance to schedule their algorithms, and then scheduling some more algorithms. At the moment there are no good general mechanisms for handling such cases. If there were, one could e.g. run the good-object selection at the current point, but delay efficiency scale factors until after all event selection algorithms are applied and we are sure we actually need the scale factors.
There is currently a mechanism that leads to makeAlgs() being called once on every block, after which the output sequence gets reset and makeAlgs() gets called again. For the most part that won't matter for your block, and we intend to remove this mechanism. If you however change member variables on your block (e.g. keep a counter), this becomes problematic:
def makeAlgs(self, config):
    # BAD: member variable modified in makeAlgs
    self.counter += 1  # Will be incremented twice!
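A safer pattern is to keep any per-call state in local variables inside makeAlgs, as in this sketch (the workingPoints option and the algorithm name are hypothetical):
def makeAlgs(self, config):
    # GOOD: per-call state lives in a local variable, so a second call
    # to makeAlgs starts from a clean slate
    counter = 0
    for workingPoint in self.workingPoints:  # hypothetical option
        counter += 1
        alg = config.createAlgorithm ('CP::SomeAlg',  # hypothetical algorithm
                                      'SomeAlg' + str(counter))
        alg.particles = config.readName (self.containerName)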