Configuration Blocks¶
The configuration block is the basic unit of configuration, with each block generally mapping onto a single section or subsection in the YAML file.
The approach here differs from how the ComponentAccumulator configuration (the main mechanism in Athena) works: for the ComponentAccumulator, the configuration is provided as a function that gets all options and global settings passed in and returns the configured algorithms as a ComponentAccumulator object. Configuration blocks, on the other hand, are objects that declare all relevant options; the options get set on the object, and the configuration is passed a ConfigAccumulator that gives access to global settings and receives the configured algorithms.
Example ConfigBlock¶
To illustrate how ConfigBlocks work, here is a stripped-down version of
the tau calibration block; below we will explore how it works. A block
like this will typically sit in a file like TauAnalysisConfig.py, and
you will need to define a block factory to use it:
from AnalysisAlgorithmsConfig.ConfigAccumulator import DataType
from AnalysisAlgorithmsConfig.ConfigBlock import ConfigBlock
from AthenaConfiguration.Enums import LHCPeriod
class TauCalibrationConfig (ConfigBlock):
    """the ConfigBlock for the tau four-momentum correction"""

    def __init__ (self) :
        super (TauCalibrationConfig, self).__init__ ()
        self.setBlockName('Taus')
        # Options are declared in __init__ and will be set before makeAlgs is called
        self.addOption ('inputContainer', '', type=str,
                        info="select tau input container, by default set to TauJets")
        self.addOption ('containerName', '', type=str,
                        noneAction='error',
                        info="the name of the output container after calibration.")
        self.addOption ('rerunTruthMatching', True, type=bool,
                        info="whether to rerun truth matching (sets up an instance of "
                        "CP::TauTruthMatchingAlg). The default is True.")
        self.addOption ('decorateExtraVariables', True, type=bool,
                        info="decorate extra variables for the reconstructed tau")

    def instanceName (self) :
        # Used to make algorithm names unique when multiple instances are scheduled
        return self.containerName

    def makeAlgs (self, config) :
        # The config object (ConfigAccumulator) provides access to global settings,
        # manages created algorithms, and tracks bookkeeping across blocks

        # Determine and register the source container
        inputContainer = "AnalysisTauJets" if config.isPhyslite() else "TauJets"
        if self.inputContainer:
            inputContainer = self.inputContainer
        config.setSourceName (self.containerName, inputContainer)

        # Set up the tau truth matching algorithm (MC only)
        if self.rerunTruthMatching and config.dataType() is not DataType.Data:
            alg = config.createAlgorithm( 'CP::TauTruthMatchingAlg',
                                          'TauTruthMatchingAlg' )
            config.addPrivateTool( 'matchingTool',
                                   'TauAnalysisTools::TauTruthMatchingTool' )
            alg.matchingTool.TruthJetContainerName = 'AntiKt4TruthDressedWZJets'
            alg.taus = config.readName (self.containerName)
            alg.preselection = config.getPreselection (self.containerName, '')

        # Decorate extra variables
        if self.decorateExtraVariables:
            alg = config.createAlgorithm( 'CP::TauExtraVariablesAlg',
                                          'TauExtraVariablesAlg' )
            alg.taus = config.readName (self.containerName)
            config.addOutputVar (self.containerName, 'nTracksCharged', 'nTracksCharged', noSys=True)

        # Set up the tau 4-momentum smearing algorithm
        alg = config.createAlgorithm( 'CP::TauSmearingAlg', 'TauSmearingAlg' )
        config.addPrivateTool( 'smearingTool', 'TauAnalysisTools::TauSmearingTool' )
        alg.smearingTool.useFastSim = config.dataType() is DataType.FastSim
        alg.smearingTool.Campaign = "mc23" if config.geometry() is LHCPeriod.Run3 else "mc20"
        alg.taus = config.readName (self.containerName)
        alg.tausOut = config.copyName (self.containerName)
        alg.preselection = config.getPreselection (self.containerName, '')

        # Register output variables
        config.addOutputVar (self.containerName, 'pt', 'pt')
        config.addOutputVar (self.containerName, 'eta', 'eta', noSys=True)
        config.addOutputVar (self.containerName, 'phi', 'phi', noSys=True)
        config.addOutputVar (self.containerName, 'charge', 'charge', noSys=True)
Option Declaration¶
The addOption method supports several parameters beyond the basic ones
shown above.
noneAction Values¶
The noneAction parameter controls what happens when an option is set
to the value of None (or left at a default value of None):
- 'ignore' (default): The option will not be changed from its default value (the second parameter).
- 'error': Raises an error if the option is not set. Use this for options that must be provided by the user. Alternatively you can specify required=True, but that is less common.
- 'set': Allows setting the option to None explicitly. This is rarely needed, because options that allow None typically have None as their default value already.
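As an illustration, here is a hypothetical set of option declarations inside a block's __init__ (the option names are invented for this sketch) showing the three noneAction behaviours:
# hypothetical options, purely to illustrate noneAction
self.addOption ('postfix', '', type=str,
                info="noneAction defaults to 'ignore': setting this option "
                "to None simply keeps the default value ''")
self.addOption ('containerName', '', type=str, noneAction='error',
                info="must be provided by the user, otherwise configuration fails")
self.addOption ('truthJetContainer', 'AntiKt4TruthDressedWZJets', type=str,
                noneAction='set',
                info="may explicitly be set to None to disable truth-jet matching")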
Expert Mode Options¶
Options can be marked with expertMode=True:
self.addOption ('advancedSetting', 0.5, type=float, expertMode=True,
                info="an advanced setting most users should not change")
Expert mode options can only be set when expert mode is specifically enabled, in which case a warning will be generated. These options typically enable features and settings that should not be used in physics analysis, but may be used e.g. for CP studies or cross-checks.
Built-in Options¶
All ConfigBlocks automatically have several built-in options that do not need to be declared:
- skipOnData: if True, skip this block when running on data
- skipOnMC: if True, skip this block when running on MC
- onlyForDSIDs: a list of DSIDs; if set, only run this block for those specific MC samples
- groupName: override the default group name for the block
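For illustration, here is a hypothetical block that declares only its own options; the built-in options come from the ConfigBlock base class and never appear in __init__:
class MyExampleConfig (ConfigBlock):
    """hypothetical block, only used to illustrate the built-in options"""
    def __init__ (self) :
        super (MyExampleConfig, self).__init__ ()
        self.setBlockName ('MyExample')
        # only block-specific options are declared; skipOnData, skipOnMC,
        # onlyForDSIDs and groupName are added automatically by the base class
        self.addOption ('containerName', '', type=str, noneAction='error',
                        info="the name of the container this block runs on")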
Common makeAlgs Patterns¶
Querying Configuration State¶
The config object (a ConfigAccumulator) provides methods to query
the current configuration state:
def makeAlgs(self, config):
    # Check data type (DataType imported from AnalysisAlgorithmsConfig.ConfigAccumulator)
    if config.dataType() is DataType.Data:
        # Data-specific configuration
        pass
    elif config.dataType() is DataType.FullSim:
        # Full simulation MC
        pass
    elif config.dataType() is DataType.FastSim:
        # Fast simulation MC
        pass

    # Check if running on PHYSLITE
    if config.isPhyslite():
        # PHYSLITE-specific handling
        pass

    # Get LHC run period (LHCPeriod imported from AthenaConfiguration.Enums)
    if config.geometry() >= LHCPeriod.Run3:
        # Run 3 specific configuration
        pass

    # Get MC channel number (0 for data)
    dsid = config.dsid()
Preselections and Selections¶
Most algorithms support preselections, which allow them to skip objects that have already failed earlier cuts. Configure this by asking the config accumulator for the current preselection:
alg.preselection = config.getPreselection (self.containerName, '')
The second argument is the selection name (empty string for the default selection). As more selection cuts get added by upstream blocks, the preselection string is updated automatically.
When an algorithm adds new selection cuts, register them via
addSelection immediately after creating the algorithm:
alg = config.createAlgorithm('CP::TauSelectionAlg', 'TauSelectionAlg')
alg.selectionDecoration = 'selected_tau,as_char'
alg.particles = config.readName (self.containerName)
alg.preselection = config.getPreselection (self.containerName, self.selectionName)
config.addSelection (self.containerName, self.selectionName, alg.selectionDecoration,
                     preselection=True)
The preselection=True parameter indicates that subsequent algorithms
should include this cut in their preselection.
Container Management¶
In general for any container to which we apply corrections or
decorations, we will make a shallow copy first. Or often multiple
shallow copies as we move to the processing chain and add additional
momentum systematics. The ConfigAccumulator will track these shallow
copies and make sure you always access the latest one.
For most algorithms you simply want to pass in the name of the current
container copy, which you can do using readName:
alg.taus = config.readName (self.containerName)
Some algorithms need to create a shallow copy as part of their operation
(usually because they add momentum systematics). For that you can then
switch the container to a new copy via copyName:
alg.taus = config.readName (self.containerName)
alg.tausOut = config.copyName (self.containerName)
The order is important here. After you have called copyName the name
will be updated and readName will return the new name.
Note that readName (or copyName) won't usually return a name in the
event store, but a name that ends in _%SYS%, because that's what the
systematics handles need.
Connecting the source container¶
IMPORTANT: This is only relevant for the first block running on each container. In most cases that's the calibration block. Subsequent blocks will already have the source container set up from an upstream block.
The user will generally give each container its own name in the
configuration file (typically following a pattern of AnaJets,
AnaMuons, etc.). Before the container is first used (usually at the
beginning of the calibration block), you need to declare the source
container your container connects to via setSourceName:
inputContainer = "AnalysisTauJets" if config.isPhyslite() else "TauJets"
if self.inputContainer:
inputContainer = self.inputContainer
config.setSourceName (self.containerName, inputContainer)
If your first algorithm does not create a shallow copy, you will have to
check whether a shallow copy is required (via wantCopy) and, if so,
create it yourself (via CP::AsgShallowCopyAlg):
if config.wantCopy (self.containerName) :
    alg = config.createAlgorithm( 'CP::AsgShallowCopyAlg', 'TauShallowCopyAlg' )
    alg.input = config.readName (self.containerName)
    alg.output = config.copyName (self.containerName)
Output Variables¶
Most configuration blocks will create some variables that eventually
should be added to the output n-tuple. These are registered in the block
that creates them (via addOutputVar()), so that they directly
correspond to the variables the configured blocks generate. See the
output documentation for details.
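As a minimal sketch, reusing the calls from the tau example above (where the noSys=True flag appears to mark variables that do not vary under systematics):
# branch that picks up systematic variations (e.g. from the smearing algorithm)
config.addOutputVar (self.containerName, 'pt', 'pt')
# branch that is not affected by systematics, written only once
config.addOutputVar (self.containerName, 'eta', 'eta', noSys=True)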
Block Dependencies¶
Normally blocks are ordered based on the order in which they are declared in the config factory (i.e. not the order in the user configuration file). In most situations that is sufficient, but when it isn't you can add extra data dependencies to enforce a specific ordering.
ConfigBlocks can declare dependencies on other blocks using
addDependency. This ensures blocks are scheduled in the correct order.
def __init__(self):
    super(MyAnalysisConfig, self).__init__()
    self.setBlockName('MyAnalysis')
    self.addDependency('Electrons', required=True)
    self.addDependency('Jets', required=False)
With required=True (the default), an error is raised if the dependency
is not present in the configuration. With required=False, the
dependency serves only as an ordering hint—if the other block is
present, this block will run after it.
When you declare dependencies, an ignoreDependencies option is
automatically added to the block. Users can set this to True to bypass
dependency checks, which is occasionally useful for advanced
configurations.
Notes from the Developers¶
We did actually start with a design somewhat similar to the
ComponentAccumulator design, in which there was a function that took all
options via arguments and returned an algorithm sequence. However, in
practice we reached a point in which these functions then needed to be
wrapped into objects to provide the needed functionality. And while that
may still be fine, those wrapper objects were custom for each function
and tightly coupled to them, including needing to replicate if
statements for every major conditional inside the function. Merging the
function into the object made the implementation simpler and more
maintainable.
One of the important differences to the ComponentAccumulator is that the blocks are not truly isolated. Each block needs to be able to pass information about selections, container names, and output variables to subsequent blocks. This is typically fairly low-level information, but also quite a lot of information, so the approach is to pass it in a way that keeps coupling low (instead of completely isolating blocks). This is currently tightly integrated into the ConfigAccumulator, but there has been discussion of splitting that off (and turning ConfigAccumulator into a facade).
The naming convention of *Config.py is very unfortunate, as it invites
regular confusion with the ComponentAccumulator configuration files. In
principle we intend to switch this over to *Block.py to make it
unique, but we never seem to be able to find a good time at which it
won't break a lot of pending merge requests. For reference: ConfigBlocks
live in *Config.py files within the *AnalysisAlgorithms packages,
while ComponentAccumulator configuration files are typically named
*Cfg.py.
There is a tension in the design that a ConfigBlock maps both to a specific section in the configuration and to a series of algorithms to run as a block. For the most part that is fine, but sometimes you may want to run some algorithms at a later point. In practice that would mean running some algorithms early in the sequence, then giving other blocks a chance to schedule their algorithms, and then scheduling some more algorithms. At the moment there are no good general mechanisms for handling such cases. If there were, one could e.g. run the good-object selection at the current point, but delay efficiency scale factors until after all event selection algorithms are applied and we are sure we actually need the scale factors.
There is currently a mechanism that leads to makeAlgs() being called once on every block, after which the output sequence gets reset and makeAlgs() gets called again. For the most part that won't matter for your block, and we intend to remove this mechanism. If you however change member variables on your block (e.g. keep a counter), this becomes problematic:
def makeAlgs(self, config):
    # BAD: member variable modified in makeAlgs
    self.counter += 1  # Will be incremented twice!
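A safer pattern is to keep any per-call state in local variables inside makeAlgs, as in this sketch (the workingPoints option and the algorithm name are hypothetical):
def makeAlgs(self, config):
    # GOOD: per-call state lives in a local variable, so a second call
    # to makeAlgs starts from a clean slate
    counter = 0
    for workingPoint in self.workingPoints:  # hypothetical option
        counter += 1
        alg = config.createAlgorithm ('CP::SomeAlg',  # hypothetical algorithm
                                      'SomeAlg' + str(counter))
        alg.particles = config.readName (self.containerName)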